Can world models form from aggregated partial information across training distributions?
This explores whether a coherent model of how the world works can be assembled out of scattered, incomplete signals a model absorbs across its training data, rather than learned from direct experience. The corpus splits sharply on this, and the disagreement is the interesting part. One line of work says yes, partly: LLMs can extract structured world representations from text produced by causally grounded humans, so the model inherits an indirect, secondhand version of how the world hangs together. One note calls this 'indirect causal grounding' and adds that the chain has gaps that limit real-time verification and model updating (see: Can large language models develop genuine world models without direct environmental contact?). The world model here really is aggregated partial information: regularities pooled from millions of human-written fragments, none of which alone contains the whole picture.
But the skeptical line argues the aggregation often produces something that *looks* like a world model without being one. Foundation models trained on orbital mechanics or games tend to learn task-specific heuristics (predictive shortcuts that score well) rather than a unified generative structure, and inductive-bias probes show that the 'laws' they've absorbed are nonsensical and change depending on which slice of data you test (see: Do foundation models learn world models or task-specific shortcuts?). The deeper point is the standard a real world model has to meet: it must let you simulate interventions and counterfactuals, not just predict the next observation (see: What makes a world model actually useful for reasoning?). Aggregating partial information gets you good prediction cheaply; it does not automatically get you a model you can reason *with*.
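The gap between predicting and intervening can be made concrete with a toy example. This is a minimal sketch under invented assumptions, not drawn from any of the cited notes: a hidden variable `Z` drives both `X` and `Y`, `X` has no effect on `Y`, and the 0.9 coupling strengths and the `sample_world` helper are hypothetical.

```python
import random

def sample_world(rng, do_x=None):
    """One draw from a toy causal world: hidden Z causes both X and Y.

    X has no causal effect on Y. Passing do_x forces X to that value,
    which cuts the Z -> X link (an intervention).
    """
    z = rng.random() < 0.5
    x = z if rng.random() < 0.9 else (not z)   # X tracks Z 90% of the time
    if do_x is not None:
        x = do_x                               # set X by hand, ignoring Z
    y = z if rng.random() < 0.9 else (not z)   # Y tracks Z 90% of the time
    return x, y

def p_y_given_x_true(samples):
    """Empirical frequency of Y being true among samples where X is true."""
    ys = [y for x, y in samples if x]
    return sum(ys) / len(ys)

rng = random.Random(0)
observed = [sample_world(rng) for _ in range(20000)]
intervened = [sample_world(rng, do_x=True) for _ in range(20000)]

# What a pure next-observation predictor would learn from passive data.
predicted = p_y_given_x_true(observed)
# What actually happens to Y when X is forced to be true.
actual = p_y_given_x_true(intervened)

print(f"P(Y | X observed true) ~ {predicted:.2f}")
print(f"P(Y | X forced true)   ~ {actual:.2f}")
```

In this setup the observational estimate sits near 0.82 while the interventional one sits near 0.5, so a learner that only pools co-occurrences predicts well on passive data and still gives the wrong answer about what happens when `X` is changed.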
What bridges these views is the question of how the pieces are stored. Neural networks don't blend everything into mush: pruning experiments suggest they tend to decompose compositional tasks into isolated, modular subnetworks, and pretraining makes that modular structure more consistent and reliable (see: Do neural networks naturally learn modular compositional structure?). That's a mechanism for partial information to accumulate into reusable parts rather than collapse together, which is closer to what 'forming a world model from fragments' would actually require.
The catch is what happens *across* distributions, where your question really lives. Training doesn't treat every distribution evenhandedly: RL post-training tends to amplify a single dominant pretraining format and suppress the alternatives, often within the first epoch, and the winner depends on model scale, not necessarily on which format performs best (see: Does RL training collapse format diversity in pretrained models?). Training order compounds this; the sequence in which domains are presented reshapes what survives, with structured domains driving output entropy down and creative ones pushing it up (see: Does training order reshape how models handle different task types?). So aggregation is real, but it's lossy and biased: a world model can form from pooled partial information, yet the pooling process quietly privileges some sources and erases others, which may be exactly why the resulting models so often reason like heuristic-stitchers rather than simulators.
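One way to put a number on "collapse" is the Shannon entropy of the distribution over output formats. The sketch below is illustrative only: the format names and the before/after shares are made up for the example and are not measurements from the cited experiments.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete distribution (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical share of outputs in each format before and after RL post-training.
before = {"format_a": 0.40, "format_b": 0.35, "format_c": 0.25}
after = {"format_a": 0.97, "format_b": 0.02, "format_c": 0.01}

h_before = shannon_entropy(before.values())
h_after = shannon_entropy(after.values())
h_max = math.log2(len(before))  # entropy of a uniform split over the formats

print(f"entropy before: {h_before:.2f} bits (max {h_max:.2f})")
print(f"entropy after:  {h_after:.2f} bits")
```

A drop from near the maximum toward zero is what "one format wins" looks like in this measure; the same quantity, tracked per domain, is one way to read the claim that structured domains lower output entropy while creative ones raise it.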
Sources (6 notes)
LLMs form structured world representations by extracting regularities from training data produced by causally grounded humans. This constitutes indirect causal grounding mediated through text, though the chain has gaps that limit real-time verification and model updating.
Inductive bias probes show transformers trained on orbital mechanics and games learn predictive patterns, not unified world structure. Fine-tuning reveals nonsensical, slice-dependent laws; circuit analysis shows arithmetic relies on range-matching heuristics, not algorithms.
Research shows LLMs may achieve high prediction accuracy through task-specific heuristics without developing coherent generative models of how the world works. True world models must enable reasoning about interventions and counterfactuals, not merely capture surface regularities.
Pruning experiments reveal that neural networks implement compositional subroutines in isolated subnetworks, with ablations affecting only their corresponding function. Pretraining substantially increases the consistency and reliability of this modular structure across architectures and domains.
Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.
Omni-Thinker shows structured domains decrease output entropy while creative domains increase it. BWT-guided scheduling—training structured tasks first—yields 6.2% gains over joint training by preventing entropy collapse from damaging open-ended capabilities.