Critiques of World Models
The world model, the supposed algorithmic surrogate of the real-world environment that biological agents experience and act upon, has become an emerging topic in recent years owing to the rising need to develop virtual agents with artificial (general) intelligence. There has been much debate on what a world model really is, how to build it, how to use it, and how to evaluate it. In this essay, starting from the imagination in the famed science-fiction classic Dune, and drawing inspiration from the concept of “hypothetical thinking” in the psychology literature, we offer critiques of several schools of thought on world modeling, and argue that the primary goal of a world model is to simulate all actionable possibilities of the real world for purposeful reasoning and acting. Building on these critiques, we propose a new architecture for a general-purpose world model, based on hierarchical, multi-level, and mixed continuous/discrete representations, together with a generative and self-supervised learning framework, and offer an outlook on a Physical, Agentic, and Nested (PAN) AGI system enabled by such a model.
How should we create such a general WM? The key desiderata for building and training a WM span the following five aspects:
• identifying and preparing training data that carry the desired world information;
• adopting a general representation space for the latent world state, with possibly richer meaning than the observation data in plain sight;
• designing an architecture that allows effective reasoning over the representations;
• choosing an objective that properly guides model training;
• determining how to use the world model in a decision-making system.
Recent years have seen a surge of efforts toward world models. In this paper, we provide both empirical and technical critiques of several such efforts, including some very vocal schools of thought on WMs that have offered systematic proposals on the five aforementioned aspects.
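To make the five aspects concrete, the sketch below shows how they might slot together in code. This is a minimal, purely illustrative skeleton under stated assumptions: the dimensions, function names, and the linear stand-ins for the encoder, transition, and decoder are all made up here, and are not the architecture proposed later in the essay.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions: raw observations (OBS_DIM) are encoded into a
# smaller latent world state (STATE_DIM); actions are ACT_DIM-dimensional.
OBS_DIM, STATE_DIM, ACT_DIM = 16, 4, 2

# Random stand-in parameters; a real WM would learn these from
# training data carrying the desired world information (aspect 1).
W_enc = rng.normal(size=(STATE_DIM, OBS_DIM)) * 0.1   # encoder
W_s = rng.normal(size=(STATE_DIM, STATE_DIM)) * 0.1   # state transition
W_a = rng.normal(size=(STATE_DIM, ACT_DIM)) * 0.1     # action effect
W_dec = rng.normal(size=(OBS_DIM, STATE_DIM)) * 0.1   # decoder

def encode(obs):
    """Aspect 2: map raw observations into a latent world-state space."""
    return np.tanh(W_enc @ obs)

def transition(state, action):
    """Aspect 3: an architecture that reasons over latent representations,
    here a one-step latent dynamics prediction."""
    return np.tanh(W_s @ state + W_a @ action)

def decode(state):
    """Aspect 4: reconstructed observations give a self-supervised
    prediction target for the training objective."""
    return W_dec @ state

def rollout(obs, actions):
    """Aspect 5: use the WM in decision making by simulating a candidate
    action sequence without touching the real environment."""
    state = encode(obs)
    imagined = []
    for a in actions:
        state = transition(state, a)
        imagined.append(decode(state))
    return imagined

obs = rng.normal(size=OBS_DIM)
plan = [rng.normal(size=ACT_DIM) for _ in range(3)]
predicted = rollout(obs, plan)
print(len(predicted), predicted[0].shape)  # 3 imagined future observations
```

The point of the sketch is only the division of labor: each of the five aspects corresponds to one piece (data, representation, architecture, objective, usage), which the rest of the paper critiques in existing proposals.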
A general-purpose WM enables the simulation of diverse possibilities across a wide range of domains, allowing agents to reason about outcomes without direct interaction with the environment. These possibilities include, but are not limited to, the following examples:
• Physical dynamics: Mechanics of the real world, such as how water pours, how an object moves when thrown, or how a machine operates under varying conditions.
• Embodied experiences: Internal bodily states (e.g., balance, posture), sensations (e.g., heat, pain, dizziness), and complex motor activities like getting dressed or tying shoes.
• Emotional states: Affective responses such as happiness, sadness, or fear, which can facilitate planning in emotionally charged contexts (e.g., therapy or social interactions).
• Social situations: The actions and internal states of other individuals, including their embodied or emotional experiences, needs, intentions, and expectations.
• Mental world: Abstract “thought processes” such as logistics, tactics, and strategies, potentially in multi-agent or adversarial settings.
• Counterfactual world: Alternative realities or “what if” scenarios to guide better decision making under uncertainty or incomplete information.
• Evolutionary world: Generational dynamics such as genetic inheritance, adaptation, and survival of organisms.
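The counterfactual "what if" reasoning above can be illustrated with a deliberately tiny, fully discrete sketch. The state labels, transition table, and rewards below are invented for illustration (loosely echoing the emotionally charged social settings mentioned earlier) and are not part of the essay's proposal.

```python
# A toy discrete world model over made-up social "mood" states.
# Each (state, action) pair maps to a hypothesized next state.
TRANSITIONS = {
    ("tense", "apologize"): "calm",
    ("tense", "argue"): "hostile",
    ("calm", "apologize"): "calm",
    ("calm", "argue"): "tense",
    ("hostile", "apologize"): "tense",
    ("hostile", "argue"): "hostile",
}
# An assumed preference over outcomes, used only to rank branches.
REWARD = {"calm": 1.0, "tense": 0.0, "hostile": -1.0}

def simulate(state, actions):
    """Roll a hypothetical action sequence forward in the model,
    without any real-world interaction."""
    total = 0.0
    for a in actions:
        state = TRANSITIONS[(state, a)]
        total += REWARD[state]
    return state, total

def best_plan(state, candidate_plans):
    """Counterfactual comparison: score every 'what if' branch
    in simulation and keep the highest-scoring one."""
    return max(candidate_plans, key=lambda p: simulate(state, p)[1])

plans = [("argue", "apologize"), ("apologize", "apologize")]
print(best_plan("tense", plans))  # → ('apologize', 'apologize')
```

Starting from "tense", the first plan passes through "hostile" (score -1.0) while the second stays in "calm" (score 2.0), so the simulator prefers the second branch; this is the sense in which a WM supports decision making under alternative realities rather than by trial and error in the world itself.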