LLM Reasoning and Architecture

Do foundation models learn world models or task-specific shortcuts?

When transformer models predict sequences accurately, are they building genuine world models that capture underlying physics and logic? Or are they exploiting narrow patterns that fail under distribution shift?

Note · 2026-02-21 · sourced from Philosophy Subjectivity
What kind of thing is an LLM really? How should researchers navigate LLM reasoning research?

The inductive bias probe paper distinguishes what foundation models learn to predict from what they learn to be. A transformer trained on planetary orbital mechanics can predict trajectories across solar systems it has never seen. But when fine-tuned to predict force vectors — a cornerstone of Newtonian mechanics — it produces nonsensical laws of gravitation, different laws depending on which slice of data it is applied to.

The test is precise: a world model (Newtonian mechanics) has a specific inductive bias. If the model has internalized that world model, fine-tuning on a small dataset should leverage it — the model should extrapolate using Newtonian state. The probe reveals it does not. The inductive bias is not toward Newtonian mechanics; it is toward task-specific heuristics that work locally but do not generalize as a unified world model would.

The pattern holds across domains: Othello game positions, lattice models, orbital mechanics. In each case, models learn to predict legal next states without developing inductive bias toward the underlying state structure. They appear to work on prediction tasks because they recover "coarsened state representations or non-parsimonious representations" — compact shortcuts that are not the world model.

The no-free-lunch theorem grounds this. Every learning algorithm has an inductive bias — the functions it tends to learn when extrapolating from limited data. A world model is a restriction on possible functions; a learning algorithm with that world model should extrapolate within it. Sequence prediction does not impose this restriction. The model finds other functions that fit the training distribution without committing to the world model's structure.
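The point can be made concrete without any neural network: many functions fit the same training data, and which one a learner picks when extrapolating is its inductive bias. A minimal sketch (our own toy example, not the paper's setup): two fits to the same near-linear observations agree on the training range and diverge wildly off it.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through the points at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Training data: y = 2x plus small measurement noise, x in [0, 5].
xs = [0, 1, 2, 3, 4, 5]
ys = [0.0, 2.1, 3.9, 6.05, 8.0, 10.1]

slope, intercept = linear_fit(xs, ys)
lin_pred = slope * 20 + intercept             # extrapolate to x = 20
poly_pred = lagrange_interpolate(xs, ys, 20)  # degree-5 interpolant, same data

print(lin_pred)   # ~40.2, close to the true 2 * 20 = 40
print(poly_pred)  # off by tens of thousands
```

Both functions fit the training points; only the restriction to lines (the "world model" here) yields sane extrapolation. Prediction accuracy on the training distribution cannot distinguish them.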

"Reasoning or Reciting?" provides systematic evidence from a different angle. By constructing counterfactual variants of 11 standard tasks — variants that deviate from default assumptions — the paper shows that LLMs exhibit nontrivial performance on counterfactual versions but consistently degrade compared to default conditions. The degradation is not task-specific: it appears across all 11 tasks, suggesting a general reliance on narrow, non-transferable procedures rather than abstract reasoning. This is the behavioral signature of task-specific heuristics: they work on default (training-distribution-aligned) cases but fail when the task is logically equivalent but distributionally shifted.
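The counterfactual design can be illustrated with the arithmetic task: addition in base 10 (the default) versus an unfamiliar base. A toy sketch of the behavioral signature (ours, not the paper's harness), using a stand-in "model" that always applies base-10 digit arithmetic:

```python
def to_base(n, base):
    """Render a non-negative integer as a digit string in the given base."""
    digits = ""
    while n:
        digits = str(n % base) + digits
        n //= base
    return digits or "0"

def shortcut_model(a_str, b_str):
    """Stand-in heuristic: adds the digit strings as if they were base 10."""
    return str(int(a_str) + int(b_str))

def accuracy(base, operand_range):
    correct = total = 0
    for x in operand_range:
        for y in operand_range:
            truth = to_base(x + y, base)
            guess = shortcut_model(to_base(x, base), to_base(y, base))
            correct += (guess == truth)
            total += 1
    return correct / total

print(accuracy(10, range(81)))  # default condition: perfect
print(accuracy(9, range(81)))   # counterfactual: degrades sharply
```

The logically identical task, shifted away from the training-aligned default, exposes the procedure as base-10-specific rather than an abstract addition algorithm.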

Circuit-level mechanistic evidence: "Arithmetic Without Algorithms" (2410.21272) provides the most granular evidence yet for the heuristics claim. Using causal analysis to identify the arithmetic circuit in LLMs, the authors discover a sparse set of important neurons that implement simple heuristics — each neuron activates when an operand falls within a certain numerical range and outputs corresponding answers. The unordered combination of these heuristic types explains most of the model's arithmetic accuracy. The model is not running an addition algorithm. It is combining pattern-matching rules — a bag of heuristics that produces correct answers for common cases without any generalizable procedure.
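A cartoon of what such a circuit amounts to (our simplification; the paper's neurons are sparser and vote over result tokens rather than returning a single answer): each "rule" fires when the operands fall in its range patch and votes a memorized value. Coverage of the training distribution yields high accuracy with no addition algorithm anywhere.

```python
def build_heuristic_bag(lo, hi, patch=5):
    """One rule per operand patch: fires on its range, votes the patch-centre sum."""
    rules = []
    for a0 in range(lo, hi, patch):
        for b0 in range(lo, hi, patch):
            centre_sum = (a0 + patch // 2) + (b0 + patch // 2)
            rules.append(((a0, a0 + patch), (b0, b0 + patch), centre_sum))
    return rules

def heuristic_add(a, b, rules):
    """Combine whichever rules fire; no carrying, no algorithm."""
    votes = [ans for (alo, ahi), (blo, bhi), ans in rules
             if alo <= a < ahi and blo <= b < bhi]
    return round(sum(votes) / len(votes)) if votes else 0  # silent off-distribution

rules = build_heuristic_bag(0, 50)   # "training distribution": operands 0..49
print(heuristic_add(12, 31, rules))  # 44, near the true 43
print(heuristic_add(23, 24, rules))  # 44, near the true 47; no carry logic
print(heuristic_add(120, 7, rules))  # 0: no rule covers this range
```

In-distribution the bag is always within a few units of the true sum, which is enough to look competent on a benchmark; outside the covered ranges it has nothing to say.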

This creates an apparent tension with Can large language models develop genuine world models without direct environmental contact? — that note claims text training does extract world structure. The resolution may be level of analysis: coarse semantic regularities (the note) vs. precise generative-mechanistic structure (the probe). Or it may be a genuine tension requiring empirical resolution.

The familiar vs novel dimension. François Chollet and Subbarao Kambhampati's exchange clarifies the boundary: it's not complexity per se but familiarity at the instance level that determines whether heuristics suffice. LRMs can handle arbitrarily complex tasks as long as they've been covered during training — but show an unfamiliar task, even a simple one requiring just a handful of reasoning steps, and they fail. Scaling up problem variables is a "roundabout way to generate novelty" — the complexity increase forces the model into unfamiliar territory where heuristics break. Kambhampati's rejoinder sharpens this: "we showed that LRMs do indeed lose accuracy as the size of familiar instances grow — they don't learn algorithms." Both agree transformers fit instance-based patterns, not generalizable algorithms. The delineation matters for evaluation: testing on familiar problem types at increasing scale conflates two effects (novel instances vs. algorithmic generalization).

Compositional tasks provide the clearest evidence. "Faith and Fate" (Dziri et al., 2023) shows that on multi-digit multiplication, logic grid puzzles, and dynamic programming problems, transformers solve compositional tasks by reducing multi-step reasoning to linearized subgraph matching. When test problems share computation subgraphs with training data, models succeed; when the composition is novel, they fail. Training yields near-perfect performance at low complexity but "fails drastically" outside the training distribution. Error propagation in early stages compounds to prevent correct solutions at high complexity. In the terms of Do transformers actually learn systematic compositional reasoning?, the heuristic IS subgraph matching, and it works well enough within distribution to create the illusion of systematic reasoning. Source: Arxiv/Evaluations.
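The compounding claim is just multiplication of per-step success probabilities. A back-of-envelope model (ours, assuming independent step errors, which Dziri et al. do not claim exactly):

```python
def chance_fully_correct(p_step, n_steps):
    """Probability every step of a compositional chain succeeds, assuming independence."""
    return p_step ** n_steps

# A per-step accuracy that looks excellent locally...
p = 0.99
print(chance_fully_correct(p, 4))    # ~0.96: shallow problems appear solved
print(chance_fully_correct(p, 100))  # ~0.37: deep compositions mostly fail
```

Multiplying two w-digit numbers takes on the order of w^2 primitive steps, so chain depth grows quickly with problem size; near-perfect low-complexity performance and drastic high-complexity failure are exactly what this curve predicts.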


Source: Philosophy Subjectivity; enriched from LLM Architecture, Flaws

Original note title: foundation models develop task-specific heuristics rather than world models even when sequence prediction accuracy is high