What should a world model actually be designed to do?
Current AI research treats world models as either video predictors or RL dynamics learners, but what if their real purpose is simulating actionable possibilities for decision-making rather than predicting next observations?
What a world model is supposed to do has been contested in recent AI research. Some schools treat it as a video predictor: a system that generates the next frame given current observations. Others treat it as a learned latent dynamics model for model-based RL. The Critiques of World Models essay argues that both formulations miss what makes a world model useful: hypothetical thinking, the capacity to simulate alternatives that did not happen and may never happen.
The argument draws on hypothetical thinking in psychology and on the science-fiction imagination of Dune (where the Bene Gesserit and the Mentat exemplify different modes of inner simulation). The proposed primary goal: a world model should simulate all actionable possibilities of the real world for purposeful reasoning and acting. The key word is actionable: possibilities an agent could choose between, not all metaphysical possibilities. This grounds the world model (WM) in decision-making rather than in passive prediction.
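To make the contrast concrete, here is a minimal interface sketch; all names (NextFramePredictor, PossibilitySimulator, decide) are hypothetical illustrations, not anything the essay specifies. The point is the type signature: a predictor maps an observation to the next observation, while a possibility simulator maps a state and candidate actions to hypothetical outcomes that a decision procedure can compare.

```python
from typing import Callable, Sequence

# Hypothetical types for illustration; the essay does not specify an API.
State = dict         # latent world state
Action = str         # a candidate action the agent could take
Observation = bytes  # raw sensory data, e.g. a video frame

class NextFramePredictor:
    """The 'video predictor' reading: map an observation to the next one."""
    def predict_next(self, obs: Observation) -> Observation:
        ...

class PossibilitySimulator:
    """The essay's reading: roll out the alternatives an agent could choose,
    including alternatives that will never actually be taken."""
    def simulate(self, state: State, actions: Sequence[Action]) -> list[State]:
        ...

def decide(wm: PossibilitySimulator, state: State,
           candidates: Sequence[Action],
           utility: Callable[[State], float]) -> Action:
    """Decision-making consumes simulated possibilities, not predicted frames."""
    outcomes = wm.simulate(state, candidates)
    best = max(range(len(candidates)), key=lambda i: utility(outcomes[i]))
    return candidates[best]
```

Nothing in `decide` ever needs a predicted next observation; it only needs outcomes for each choice, which is exactly what "actionable" buys.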
The simulation scope is broader than in typical proposals. A general-purpose WM must simulate physical dynamics (how water pours, how machines operate), embodied experience (balance, posture, motor sequences), emotional states (affective responses for therapy or social interaction), social situations (other agents' internal states and intentions), the mental world (logistics, tactics, strategies in adversarial settings), the counterfactual world (what-if scenarios for decisions under uncertainty), and the evolutionary world (generational dynamics such as adaptation and inheritance). Each is a domain of actionable possibility; a WM that cannot simulate a given domain cannot support reasoning in it.
This breadth implies that no single uniform representation will suffice. The essay's architectural answer is hierarchical, multi-level, mixed continuous/discrete representations within a generative self-supervised framework. Its Physical, Agentic, and Nested (PAN) AGI concept names the architectural commitment: world models should be nested (worlds within worlds, simulations of agents that themselves simulate), agentic (a WM exists for an agent, not as a passive predictor), and physical (grounded in dynamics rather than only in symbol manipulation). This commitment aligns with the five-aspect WM decomposition by privileging aspect 5 (decision integration) and aspect 3 (reasoning architecture for compositional and counterfactual operations).
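As a rough sketch of the nested and agentic commitments, assuming invented names (WorldModel, SimulatedAgent, physical_step, score) rather than PAN's actual design: the outer model contains simulated agents that carry inner world models of their own, and every simulation step exists to serve some agent's choice.

```python
class WorldModel:
    """Nested: the simulated world contains agents that themselves simulate."""
    def __init__(self, agents: list["SimulatedAgent"]):
        self.agents = agents

    def rollout(self, state: dict, my_action: str) -> dict:
        """Advance one hypothetical step: apply my action, then let each
        simulated agent act according to its own inner world model."""
        state = physical_step(state, my_action)  # physical: grounded dynamics
        for agent in self.agents:
            state = physical_step(state, agent.choose(state))
        return state

class SimulatedAgent:
    def __init__(self, inner_wm: "WorldModel | None" = None):
        # Worlds within worlds: the recursion bottoms out at None.
        self.inner_wm = inner_wm

    def choose(self, state: dict) -> str:
        if self.inner_wm is None:
            return "wait"  # depth-limited base case
        # Agentic: the inner WM serves this agent's decision, not prediction.
        return max(["wait", "move"],
                   key=lambda a: score(self.inner_wm.rollout(state, a)))

def physical_step(state: dict, action: str) -> dict:
    """Stand-in for learned physical dynamics."""
    return {**state, "last": action}

def score(state: dict) -> float:
    """Stand-in utility over simulated outcomes."""
    return float(state.get("last") == "move")
```

The string-typed actions and dict state are placeholders; the essay's actual proposal is mixed continuous/discrete latent representations learned within a generative self-supervised framework.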
Source: World Models
Related concepts in this collection
- What five design choices compose a world model?
World models are often presented as monolithic systems, but they actually involve five distinct design decisions—data preparation, representation, reasoning architecture, training objective, and decision integration—that can each fail independently. Understanding this decomposition helps diagnose why world model proposals fall short.
extends: companion piece — this note picks the goal; the five-aspect note shows what design choices that goal forces
- Do LLMs actually have world models or just facts?
The term 'world model' conflates two different capabilities: factual representation versus mechanistic understanding. Understanding which one LLMs actually possess matters for assessing their reasoning reliability.
complements: same disambiguation move — actionable-possibility simulation is exactly the mechanism reading, not the fact reading
- Can language models simulate belief change in people?
Current LLM social simulators treat behavior as input-output mappings without modeling internal belief formation or revision. Can they be redesigned to actually track how people think and change their minds?
extends: applies the simulate-possibility goal to the social domain — behaviorist LLMs predict observations; cognitivist agents simulate alternative beliefs
- Can non-reasoning models catch up with more compute?
Explores whether inference-time compute budget can close the performance gap between standard models and those trained for reasoning, and what training mechanisms might enable this.
exemplifies: imitation-learning ceiling argument is a special case — without a WM that simulates possibility space, the model cannot exceed observed-trajectory quality
- What makes linguistic agency impossible for language models?
From an enactive perspective, does linguistic agency require embodied participation and real stakes that LLMs fundamentally lack? This matters because it challenges whether LLMs can truly engage in language or only generate text.
tension: PAN's "physical" commitment aims at embodied grounding, but symbolic-on-pixels generative models may not satisfy enactivist embodiment
- Can computation exist without a conscious mapmaker?
Explores whether algorithmic processes can generate the semantic interpretation and symbol selection they require, or whether conscious agents must precede all computation.
tension: the agentic-WM commitment presupposes the very experiencing agent computation cannot generate
- Does behavioral speech output prove communicative subjecthood?
Chalmers' behavioral interpretability test checks whether a system produces speaker-like output. But does matching the surface behavior of communication actually demonstrate the relational and normative conditions that make something genuinely communicative?
complements: same surface-vs-structure move — behavioral output passes plausibility tests without simulating possibility space
Original note title
the primary goal of a world model is to simulate all actionable possibilities of the real world for purposeful reasoning — not to predict the next observation