MIT News November 5, 2024
A team of US researchers (Harvard University, MIT, Cornell University, University of Chicago) tested whether large language models implicitly learn world models, using settings in which the underlying reality is governed by a deterministic finite automaton. They proposed new evaluation metrics for world model recovery, inspired by the classic Myhill-Nerode theorem from language theory, and illustrated their utility in three domains: game playing, logic puzzles, and navigation. In all three domains, the generative models they considered did well on existing diagnostics for assessing world models, but the new metrics revealed those world models to be far less coherent than they appeared. According to the researchers, such incoherence creates fragility: a generative model used to solve related but subtly different tasks could fail badly. They add that building generative models that meaningfully capture the underlying logic of the domains they model would be immensely valuable, and that their results suggest new ways to assess how close a given model is to that goal. (Open-access technical article.)
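To make the Myhill-Nerode idea concrete, here is a minimal sketch in Python. It is not the researchers' code or their exact metrics. The toy automaton (a light that "a" toggles, with "b" legal only while the light is on), the deliberately flawed model, and every function name are assumptions made up for illustration. The sketch checks two things: whether a model treats two prefixes that reach the same true state identically (compression), and what share of the suffixes that truly separate two different states the model also separates (distinction).

```python
from itertools import product

# Hypothetical toy DFA: "a" toggles a light, "b" is legal only while it is on.
ALPHABET = ("a", "b")
START = "off"
TRUE_TRANSITIONS = {("off", "a"): "on", ("on", "a"): "off", ("on", "b"): "on"}
# Hypothetical flawed world model: it wrongly believes "b" switches the light off.
FLAWED_TRANSITIONS = {("off", "a"): "on", ("on", "a"): "off", ("on", "b"): "off"}


def run(transitions, seq):
    """Return the state reached after seq, or None if seq is illegal."""
    state = START
    for tok in seq:
        state = transitions.get((state, tok))
        if state is None:
            return None
    return state


def make_model(transitions):
    """Stand-in for a generative model: judges whether a whole sequence is legal."""
    return lambda seq: run(transitions, seq) is not None


def accepted_suffixes(accepts, prefix, max_len):
    """All suffixes up to max_len that the model accepts after prefix."""
    return {
        s
        for n in range(1, max_len + 1)
        for s in product(ALPHABET, repeat=n)
        if accepts(prefix + s)
    }


def compression_ok(accepts, p1, p2, max_len=3):
    """p1 and p2 reach the same true state, so a coherent model must
    accept exactly the same suffixes after both."""
    state1, state2 = run(TRUE_TRANSITIONS, p1), run(TRUE_TRANSITIONS, p2)
    assert state1 is not None and state1 == state2
    return accepted_suffixes(accepts, p1, max_len) == accepted_suffixes(accepts, p2, max_len)


def distinction_recall(accepts, p1, p2, max_len=3):
    """p1 and p2 reach different true states. Return the share of truly
    distinguishing suffixes that the model also treats as distinguishing."""
    truth = make_model(TRUE_TRANSITIONS)
    true_diff = accepted_suffixes(truth, p1, max_len) ^ accepted_suffixes(truth, p2, max_len)
    model_diff = accepted_suffixes(accepts, p1, max_len) ^ accepted_suffixes(accepts, p2, max_len)
    return len(true_diff & model_diff) / len(true_diff)


true_model = make_model(TRUE_TRANSITIONS)
flawed_model = make_model(FLAWED_TRANSITIONS)

same_state = (("a",), ("a", "b"))   # both prefixes leave the light on
diff_state = ((), ("a",))           # light off versus light on

print(compression_ok(true_model, *same_state))
print(compression_ok(flawed_model, *same_state))
print(distinction_recall(true_model, *diff_state))
print(distinction_recall(flawed_model, *diff_state))
```

In this toy setup the flawed model names the correct legal next tokens after the empty prefix and after ("a",), so a one-step validity check on those prefixes would not expose it. The compression check does, because the model treats two prefixes that lead to the same true state as if they led to different ones.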