Generative AI models are sparking excitement among business leaders, with the promise of automating tasks and potentially replacing millions of jobs. However, researchers at the Massachusetts Institute of Technology (MIT) caution that, while AI may provide plausible answers, it lacks an understanding of complex systems, remaining limited to predictive functions. In areas like logical reasoning, navigation, chemistry, and gaming, these limitations become particularly evident.
Testing AI’s “Understanding” Abilities
Large language models (LLMs) such as GPT-4 often give the impression of producing thoughtful responses to intricate queries. Yet these models primarily predict the most probable next word from the surrounding context rather than genuinely understanding the information. To determine whether AI can truly grasp real-world contexts, MIT scientists developed metrics designed to test AI models’ “intelligence” objectively.
In one test, researchers evaluated AI’s capability to generate step-by-step directions for navigating the streets of New York City. Although generative AI models show some degree of “implicit” learning of real-world rules, that is not the same as genuinely understanding those rules. To make their analysis more precise, the MIT team developed formal methods for assessing how accurately AI perceives and interprets real-world scenarios.
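The intuition behind this kind of check can be sketched in a few lines of Python. The snippet below is not the MIT team’s evaluation code; the intersections, turn names, and the route_is_valid helper are invented for illustration. The idea is simply that generated directions can be replayed against a known street map, and a route counts as correct only if every step uses a street that actually exists.

```python
# Toy street graph: intersection -> {turn instruction: next intersection}.
# All names and edges are hypothetical, for illustration only.
street_graph = {
    "A": {"go straight": "B", "turn right": "C"},
    "B": {"turn left": "D"},
    "C": {"go straight": "D"},
    "D": {},
}

def route_is_valid(start, directions, goal):
    """Replay step-by-step directions on the graph; the route counts as
    correct only if every step uses a real edge and ends at the goal."""
    here = start
    for step in directions:
        options = street_graph.get(here, {})
        if step not in options:  # the directions reference a street that doesn't exist
            return False
        here = options[step]
    return here == goal

print(route_is_valid("A", ["go straight", "turn left"], "D"))  # True: every step exists
print(route_is_valid("A", ["turn left", "go straight"], "D"))  # False: no such turn at A
```

A route that sounds plausible can still fail this check, which is exactly the gap between fluent output and an accurate internal map.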
Findings on AI’s Limitations
The MIT study focused on transformers, the architecture behind popular generative AI models such as GPT-4. Transformers are trained on vast amounts of text, which lets them predict likely word sequences and produce coherent, plausible responses.
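As a deliberately tiny stand-in (not a real transformer), the sketch below shows the core mechanism: given the preceding words, return whichever continuation has the highest estimated probability. The probability table is made up for illustration.

```python
# Hypothetical next-word probabilities, keyed by the preceding context.
next_word_probs = {
    ("turn",): {"left": 0.55, "right": 0.40, "around": 0.05},
    ("the", "next"): {"street": 0.6, "light": 0.3, "exit": 0.1},
}

def predict_next(context):
    """Pick the most probable next word for a known context."""
    probs = next_word_probs[context]
    return max(probs, key=probs.get)

print(predict_next(("turn",)))        # 'left': plausible, chosen purely by probability
print(predict_next(("the", "next")))  # 'street'
```

Nothing in this procedure requires the system to know what a turn or a street is; it only requires that the statistics of past text point to a likely continuation.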
To probe AI capabilities more deeply, the researchers turned to a class of problems that can be modeled as deterministic finite automata (DFAs), covering fields such as logic, navigation, chemistry, and game strategy. They chose two distinct tasks—guiding a car through New York City and playing the board game Othello—to test AI’s ability to reconstruct the internal logic of a complex system. As noted by Harvard University postdoc Keyon Vafa, “We needed testbeds where we knew exactly what the model of the world looked like. Now we can think rigorously about what it means to reconstruct that model of the world.”
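For readers unfamiliar with the term, a DFA is a system whose states and transition rules are fully specified, so the “true” model of the world is known exactly. The minimal example below is purely illustrative, not one of the automata used in the study; it accepts move sequences containing an even number of “b” moves.

```python
# A minimal deterministic finite automaton (DFA): every state and transition
# is written down, so any generated sequence can be checked against the rules.
start_state = "even"
accepting_states = {"even"}
transition = {
    ("even", "a"): "even", ("even", "b"): "odd",
    ("odd", "a"): "odd",   ("odd", "b"): "even",
}

def dfa_accepts(sequence):
    """Return True if the DFA ends in an accepting state after the sequence."""
    state = start_state
    for symbol in sequence:
        state = transition[(state, symbol)]
    return state in accepting_states

print(dfa_accepts("aabba"))  # True: two 'b' moves
print(dfa_accepts("ab"))     # False: one 'b' move
```

Because the automaton’s rules are fully known, any sequence a model produces can be judged unambiguously, which is what makes such problems useful as testbeds.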
The study showed that transformers could generate correct routes and Othello moves when task conditions were straightforward. However, once complications such as detours were introduced into the New York task, the models produced illogical route suggestions, proposing nonexistent overpasses and inaccurate directions, notes NIXsolutions.
MIT’s research underscores fundamental limitations of generative AI models, especially in tasks that require adaptive, flexible thinking. While these models impress with their capacity to generate plausible responses, they remain tools of prediction rather than fully intelligent systems.