MIT News November 5, 2024 A team of researchers in the US (Harvard University, MIT, Cornell University, University of Chicago) used a case where the underlying reality was governed by deterministic finite automaton to test the possibility of large language models implicitly learning world models. They proposed new evaluation metrics for world model recovery inspired by the classic Myhill-Nerode theorem from language theory and illustrated their utility in three domains: game playing, logic puzzles, and navigation. In all domains, the generative models they considered did well on existing diagnostics for assessing world models, but their evaluation metrics revealed their world […]