Why do foundation models develop task-specific heuristics instead of causal understanding?
This explores why foundation models tend to learn narrow, task-specific shortcuts (heuristics that work for the trained task) instead of building a unified, causal model of how the world actually works.
The corpus suggests the answer is less about a flaw in the models and more about what the training signal actually rewards. The clearest direct evidence: when transformers are trained on tasks like orbital mechanics or board games, probes show they pick up predictive patterns that win on the task without ever assembling the underlying laws. Fine-tune the same model on a slice of the problem and it produces nonsensical, slice-dependent 'laws'; look inside its circuits and arithmetic turns out to run on range-matching heuristics, not anything resembling an algorithm (Do foundation models learn world models or task-specific shortcuts?). The model is optimizing prediction, and prediction can be satisfied by a bag of shortcuts that happens to cover the training distribution.
The deeper 'why' comes into focus when you look at what models latch onto in their training data. Causal structure gets learned well when it's explicit and frequent: LLMs handle causal relations far better than temporal ordering precisely because causal connectives are stated outright in text, while temporal order is usually left implicit and must be inferred from context (Why do LLMs handle causal reasoning better than temporal reasoning?). So a model doesn't build causal understanding from scratch; it absorbs the causal patterns that were spelled out for it and shortcuts the rest. What generalizes, meanwhile, isn't fact retrieval but procedure: analysis of 5 million pretraining documents shows reasoning draws on broad, transferable procedural knowledge, whereas factual recall depends on narrow, document-specific memorization (Does procedural knowledge drive reasoning more than factual retrieval?). Heuristics are what you get when the signal favors pattern-matching over procedure.
There's a worthwhile twist here: the heuristics may be a symptom of how capability gets *elicited*, not of whether it exists. Several lines of work, spanning RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR, find that base models already carry latent reasoning ability that minimal training merely selects and surfaces: post-training picks reasoning out of existing activations rather than creating it (Do base models already contain hidden reasoning ability?). And training quality, not just quantity, decides whether that latent capacity becomes genuine analysis or performative filler: RL training can flip the same 'thinking' mechanism from counterproductive self-doubt into productive gap analysis (Does extended thinking help or hurt model reasoning?). Fine-tuning can even pull in the wrong direction: it makes chain-of-thought less faithful, so the stated reasoning steps stop actually driving the answer and become decorative (Does fine-tuning disconnect reasoning steps from final answers?).
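One of the faithfulness tests named in the sources, early termination, can be sketched as follows. This is a minimal illustration, not the cited study's harness: the `AnswerFn` interface, the two toy models, and the choice to score every strict prefix of the chain are all assumptions made for the example.

```python
from typing import Callable, Sequence

# Hypothetical interface (an assumption for this sketch): takes a question and
# a possibly truncated list of reasoning steps, returns the final answer.
AnswerFn = Callable[[str, Sequence[str]], str]


def early_termination_invariance(
    answer_fn: AnswerFn, question: str, steps: Sequence[str]
) -> float:
    """Fraction of truncation points where the answer equals the full-chain answer.

    Values near 1.0 mean cutting the reasoning short rarely changes the answer,
    which suggests the stated steps are not driving the output.
    """
    if not steps:
        raise ValueError("need at least one reasoning step")
    full_answer = answer_fn(question, steps)
    # k = 0 .. len(steps) - 1 covers every strict prefix of the chain.
    unchanged = sum(
        answer_fn(question, steps[:k]) == full_answer for k in range(len(steps))
    )
    return unchanged / len(steps)


def decorative_model(question: str, steps: Sequence[str]) -> str:
    """Toy model that ignores its reasoning entirely."""
    return "42"


def dependent_model(question: str, steps: Sequence[str]) -> str:
    """Toy model whose answer depends on how many steps it was shown."""
    return str(len(steps))


if __name__ == "__main__":
    chain = ["step a", "step b", "step c"]
    print(early_termination_invariance(decorative_model, "q", chain))
    print(early_termination_invariance(dependent_model, "q", chain))
```

Under this framing, a fine-tuned model that scores closer to the decorative toy than its base model did would match the "performative rather than functional" pattern the sources describe.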
The behavioral signature of 'heuristics over understanding' also shows up at solve-time. Reasoning models don't fail mainly from lack of compute. They wander into invalid territory and abandon promising paths too early, lacking the validity, effectiveness, and necessity that systematic search needs, so success probability drops exponentially as problems get deeper (Why do reasoning LLMs fail at deeper problem solving?; Why do reasoning models abandon promising solution paths?). That's what a heuristic library looks like under pressure: fine on shallow, familiar cases, brittle the moment a problem demands an actual model of the domain.
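A toy calculation shows why depth is so punishing. The sketch below assumes each reasoning step succeeds independently with the same probability; that independence assumption, the function name, and the 0.9 figure are illustrative choices, not values from the cited notes.

```python
def success_probability(per_step_success: float, depth: int) -> float:
    """Chance that all `depth` steps succeed, assuming independent steps."""
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be in [0, 1]")
    if depth < 0:
        raise ValueError("depth must be non-negative")
    # Independent steps multiply, so success decays geometrically with depth.
    return per_step_success ** depth


if __name__ == "__main__":
    # Illustrative per-step reliability of 0.9 (an assumed value).
    for depth in (1, 5, 10, 20, 40):
        print(f"depth={depth:>2}  success={success_probability(0.9, depth):.4f}")
```

Even a fairly reliable step compounds badly: at 0.9 per step, a 10-step problem succeeds roughly a third of the time, which fits the picture of medium problems being solvable while deep ones become far harder.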
What's genuinely interesting is that the corpus points to fixes that don't require waiting for emergent causal understanding. Training abstractions alongside solutions forces structured, breadth-first exploration that heuristic-driven depth-only chains lack (Can abstractions guide exploration better than depth alone?); making latent reasoning stochastic lets a model hold uncertainty and explore multiple strategies instead of committing to one shortcut (Can stochastic latent reasoning help models explore multiple solutions?); and treating chain-of-thought as a rewarded action *during pretraining* plants reasoning earlier rather than bolting it on afterward (Can chain-of-thought reasoning be learned during pretraining itself?). One caveat worth carrying away: even a perfect causal model wouldn't be the whole story, because causal networks can't represent the associative, analogical, and emotion-driven moves that real reasoning also uses (Can causal models alone capture how humans actually reason?). So the heuristics aren't only a bug to be trained away; part of intelligence may genuinely live outside the causal frame.
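The idea of rewarding a chain-of-thought by how much it improves log-likelihood can be sketched in a few lines. The sources say only that the reward is a verifier-free log-likelihood improvement; the exact form used here (summed token log-probabilities with the thought minus the same sum without it), the function name, and the sample numbers are assumptions for illustration.

```python
from typing import Sequence


def log_likelihood_gain(
    logp_with_cot: Sequence[float], logp_without_cot: Sequence[float]
) -> float:
    """Reward for a sampled thought: how much it raises the log-likelihood of
    the observed continuation, compared with predicting it without the thought.

    Both inputs are per-token log-probabilities of the same target tokens.
    """
    if len(logp_with_cot) != len(logp_without_cot):
        raise ValueError("both sequences must score the same target tokens")
    return sum(logp_with_cot) - sum(logp_without_cot)


if __name__ == "__main__":
    # Made-up log-probs for a two-token continuation.
    helpful = log_likelihood_gain([-1.0, -0.5], [-2.0, -1.0])
    harmful = log_likelihood_gain([-2.0], [-1.0])
    print(helpful, harmful)
```

A positive value means the thought made the real continuation more predictable and a negative value means it hurt, so no external verifier is needed to score it.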
Sources (12 notes)
Inductive bias probes show transformers trained on orbital mechanics and games learn predictive patterns, not unified world structure. Fine-tuning reveals nonsensical, slice-dependent laws; circuit analysis shows arithmetic relies on range-matching heuristics, not algorithms.
ChatGPT excels at causal relations but struggles with temporal ordering because causal connectives are explicit and frequent in training data, while temporal order is often implicit and must be inferred contextually.
Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.
Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.
Vanilla models use thinking mode counterproductively, inducing self-doubt that degrades performance. RL training reverses this, transforming the same mechanism into beneficial gap analysis. Training mediates reasoning quality, not just quantity.
Three faithfulness tests show fine-tuned models generate reasoning chains that less reliably influence final outputs. Early termination, paraphrasing, and filler substitution all produce invariant answers more often after fine-tuning, suggesting reasoning becomes performative rather than functional.
Current reasoning models lack the three properties of systematic exploration: validity, effectiveness, and necessity. This causes success probability to drop exponentially with problem depth, making medium problems solvable but deep problems catastrophically harder.
Reasoning LLMs exhibit two reinforcing failures: wandering (invalid exploration) and underthinking (premature path-switching). Decoding-level interventions like thought-switching penalties improve accuracy without fine-tuning, suggesting viable solutions exist but are abandoned prematurely.
RLAD jointly trains abstraction and solution generators, showing that allocating test-time compute to diverse abstractions outperforms parallel solution sampling at large budgets. Abstractions create structured breadth-first exploration that prevents the underthinking failure mode of depth-only reasoning chains.
GRAM replaces deterministic latent updates with stochastic sampling, enabling models to represent distributions over solutions rather than single predictions. This allows handling of ambiguous problems and multiple valid strategies that deterministic designs cannot represent.
RLP treats CoT as exploratory action during pretraining, using log-likelihood improvement as verifier-free reward. Applied to Qwen3-1.7B and Nemotron-Nano-12B, the method improves math and science benchmarks substantially, suggesting reasoning can be planted earlier in training.
Causal belief networks excel at modeling causal reasoning but cannot represent associative links, analogical mappings, or emotion-driven belief shifts. The GenMinds framework itself acknowledges this as a tractable starting point rather than a complete theory.