Can a world model have rich representations without adequate data coverage?
This explores whether a model's internal representations can look rich and detailed while gaps in its training-data coverage leave those representations hollow, fractured, or merely borrowed from better-covered regions.
The question is whether 'rich representation' and 'adequate data coverage' can come apart: whether a model can look like it has deep structure while sitting on thin or skewed data. The corpus suggests they come apart constantly, and that this gap is precisely what standard evaluation hides. The sharpest version of the point: a model can hold every linearly decodable feature a task needs while its internal organization is fundamentally broken, with perfect accuracy riding on top of structure that shatters under perturbation or distribution shift (see 'Can models be smart without organized internal structure?'). Richness you can read off with a probe is not the same as richness that holds together.
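The gap between probe-readable richness and richness that holds together can be sketched with a toy example. This is a minimal illustration under stated assumptions, not an analysis of any real model: the 2-D "representations", the `margin` value, and the helper names are all hypothetical. The label is perfectly linearly decodable from dimension 0, but only by a margin that is tiny next to the variation in dimension 1, so a small shift breaks the probe.

```python
# Toy illustration only: hypothetical 2-D "representations", not taken from any real model.
import random


def make_representations(n, margin, seed=0):
    """Return (vector, label) pairs whose label lives in dimension 0 with a tiny margin."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.choice([-1, 1])
        # Dimension 0 encodes the label with a small margin;
        # dimension 1 is large nuisance variation unrelated to the label.
        data.append(((label * margin, rng.uniform(-1.0, 1.0)), label))
    return data


def probe_accuracy(data, shift=0.0):
    """Accuracy of the fixed linear probe sign(h[0]) after shifting dimension 0 by `shift`."""
    correct = 0
    for (h0, _h1), label in data:
        pred = 1 if h0 + shift > 0 else -1
        correct += pred == label
    return correct / len(data)


data = make_representations(n=1000, margin=0.01)

# On unperturbed data the probe reads the label off perfectly.
clean_acc = probe_accuracy(data)

# A shift five times the margin (still small next to dimension 1's range)
# pushes every point to the positive side, so the probe falls to roughly chance.
shifted_acc = probe_accuracy(data, shift=0.05)

print(f"clean probe accuracy:   {clean_acc:.2f}")
print(f"shifted probe accuracy: {shifted_acc:.2f}")
```

The clean score is the number a standard evaluation would report; the shifted score is the one it never computes.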
Where coverage is thin, the model doesn't leave a blank; it borrows. Mechanistic interpretability analysis shows that low-resource cultures like Ethiopia and Algeria get represented internally through high-resource cultural proxies, so the representation is 'rich' only in the sense that it's densely populated with the wrong neighbors. The model produces correct surface answers while its internal states quietly route the under-covered case through a dominant stand-in (see 'Do LLMs represent low-resource cultures through dominant cultural proxies?'). That's the failure mode in miniature: coverage gaps get papered over by proxy structure rather than by honest uncertainty.
The world-model research extends this from culture to physics. Transformers trained on orbital mechanics or board games reach high predictive accuracy but, when probed, turn out to hold task-specific heuristics rather than a unified model of how the system works. Fine-tuning reveals nonsensical, slice-dependent 'laws' that change depending on which corner of the data you poke (see 'Do foundation models learn world models or task-specific shortcuts?'). The apparent richness was a patchwork of regularities, each valid only where the data was dense. A genuine world model is supposed to let you reason about interventions and counterfactuals, not just match observed regularities (see 'What makes a world model actually useful for reasoning?'), and simulating actionable possibilities is exactly the capability that a representation built over coverage gaps can't fake (see 'What should a world model actually be designed to do?').
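A rough analogy for slice-dependent 'laws', offered as a sketch rather than a reproduction of the probing method in the cited work: take a single true law (an inverse-square relation), fit a local straight line separately on two dense slices, and compare what each slice "thinks" the law is. The slice boundaries, the linear fit, and all function names below are assumptions chosen for illustration.

```python
# Toy analogy only: a straight-line fit stands in for a learned heuristic.
# Nothing here reproduces the experiments summarized in the text.


def inverse_square(r):
    """The single true law that generates all of the data."""
    return 1.0 / r**2


def slice_points(lo, hi, n=50):
    """Evenly spaced sample points covering one dense slice [lo, hi]."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


near = slice_points(1.0, 2.0)   # densely covered region close to the source
far = slice_points(4.0, 5.0)    # a second densely covered region farther out

near_slope, near_intercept = fit_line(near, [inverse_square(r) for r in near])
far_slope, far_intercept = fit_line(far, [inverse_square(r) for r in far])

# Each slice recovers a different local "law" from the same underlying system.
print(f"near-slice slope: {near_slope:.4f}")
print(f"far-slice slope:  {far_slope:.4f}")

# Carrying the near-slice law into the far slice gives a physically absurd value.
r_probe = 4.5
extrapolated = near_slope * r_probe + near_intercept
actual = inverse_square(r_probe)
print(f"near-slice law at r={r_probe}: {extrapolated:.4f} (true value {actual:.4f})")
```

Both fits are accurate inside their own slice, which is why in-distribution accuracy alone can't distinguish one unified law from two local patches.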
There's a deeper reason that faithful richness can't fully decouple from coverage. LLM world models are a form of indirect causal grounding: structure extracted secondhand from text produced by causally grounded humans, with gaps in the chain that limit real-time verification and updating (see 'Can large language models develop genuine world models without direct environmental contact?'). The representation can only be as causally faithful as the coverage of that mediating text. This is why 'theory-free' high accuracy is treated as a trap: a 95%-accurate model can still be systematically wrong wherever its training never reached, and the accuracy figure alone won't tell you where (see 'Can AI models be truly free from human bias?').
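The arithmetic behind that trap is simple enough to sketch. The counts below are made up for illustration (950 cases from a covered region, 50 from an uncovered one), and `accuracy_by_slice` is a hypothetical helper, not a function from any library.

```python
# Illustrative numbers only: a hypothetical evaluation set with two slices.
from collections import defaultdict


def accuracy_by_slice(records):
    """Map each slice name to the fraction of its records marked correct."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for slice_name, correct in records:
        totals[slice_name] += 1
        hits[slice_name] += int(correct)
    return {name: hits[name] / totals[name] for name in totals}


# Assume the model is right on every covered case and wrong on every uncovered one.
records = [("covered", True)] * 950 + [("uncovered", False)] * 50

overall = sum(correct for _, correct in records) / len(records)
per_slice = accuracy_by_slice(records)

print(f"overall accuracy: {overall:.2%}")
for name, acc in per_slice.items():
    print(f"{name:>10}: {acc:.2%}")
```

The headline figure and the per-slice breakdown come from the same predictions; only the breakdown shows that every error sits in the region training never reached.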
The useful turn here: one framework argues that a world model isn't one thing but five distinct design choices (data preparation, latent representation, reasoning architecture, training objective, and decision integration), and that failures get misdiagnosed when you treat them as a single blob (see 'What five design choices compose a world model?'). Under that lens the question stops being 'can representation outrun data?' and becomes 'where do representation and coverage, as separate design axes, misalign?' Rich-but-uncovered is the signature of that misalignment. One constructive response is to stop pretending the gap isn't there: stochastic latent reasoning lets a model represent a distribution over solutions instead of one confident answer, which is closer to honestly holding the uncertainty that sparse coverage should produce (see 'Can stochastic latent reasoning help models explore multiple solutions?').
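Why a distribution over solutions can be more honest than one confident answer shows up even in a toy problem. The sketch below is not GRAM and not any published method; it assumes an ambiguous task (solve x² = 4, where both -2 and 2 are valid) and swaps in the simplest possible stand-ins: a squared-error point estimate for the deterministic case, and sampling from the empirical distribution for the stochastic case.

```python
# Toy stand-in only: this is not GRAM or any specific stochastic latent model.
import random


def is_solution(x, tol=1e-9):
    """True if x solves the ambiguous toy problem x**2 == 4."""
    return abs(x * x - 4.0) < tol


rng = random.Random(0)

# Observed answers to the same question: both valid solutions appear in the data.
observed = [rng.choice([-2.0, 2.0]) for _ in range(1000)]

# Deterministic stand-in: the single value minimizing squared error is the mean,
# which lands between the two modes and solves nothing.
point_estimate = sum(observed) / len(observed)

# Stochastic stand-in: draw from the (here, empirical) distribution over answers,
# so each draw is one of the valid solutions.
samples = [rng.choice(observed) for _ in range(20)]

print(f"point estimate: {point_estimate:.3f}, valid: {is_solution(point_estimate)}")
print(f"all samples valid: {all(is_solution(s) for s in samples)}")
```

The point estimate is confident and wrong; the samples are individually uncertain but each one is a real answer.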
Sources (9 notes)
Models trained with SGD can contain all the linearly decodable features needed for a task while maintaining fundamentally broken internal organization. This makes them vulnerable to perturbation and distribution shift invisible to standard evaluation metrics.
Mechanistic interpretability analysis reveals that low-resource cultures like Ethiopia and Algeria are structurally represented through high-resource cultural proxies in internal model states, not just output. This architectural bias persists even when models can produce correct surface-level answers.
Inductive bias probes show transformers trained on orbital mechanics and games learn predictive patterns, not unified world structure. Fine-tuning reveals nonsensical, slice-dependent laws; circuit analysis shows arithmetic relies on range-matching heuristics, not algorithms.
Research shows LLMs may achieve high prediction accuracy through task-specific heuristics without developing coherent generative models of how the world works. True world models must enable reasoning about interventions and counterfactuals, not surface regularities.
Drawing on hypothetical thinking in psychology, world models are most useful when designed to simulate all actionable possibility spaces—physical, embodied, emotional, social, mental, counterfactual, and evolutionary—grounded in agent decision-making rather than passive prediction.
LLMs form structured world representations by extracting regularities from training data produced by causally grounded humans. This constitutes indirect causal grounding mediated through text, though the chain has gaps that limit real-time verification and model updating.
Research shows that 'theory-free' AI models mask bigotry behind high accuracy metrics while committing fundamental statistical errors. A 95% accurate criminal justice system would wrongly convict thousands, demonstrating that model sophistication does not validate causal inference.
World model design comprises five distinct dimensions: data preparation, latent representation, reasoning architecture, training objective, and decision-system integration. Each can misalign with the others, and treating them as a single problem obscures where failures originate and prevents proper evaluation.
GRAM replaces deterministic latent updates with stochastic sampling, enabling models to represent distributions over solutions rather than single predictions. This allows handling of ambiguous problems and multiple valid strategies that deterministic designs cannot represent.