Why do models learn reasoning form instead of actual abstract inference?

This explores why models seem to learn the *shape* of reasoning — reproducing familiar step-by-step patterns — rather than performing genuine symbolic or abstract inference, and what the corpus says is actually happening under the hood.

The corpus offers a surprisingly coherent answer: chain-of-thought largely works by reproducing reasoning *form* the model has already seen, not by inventing new inference. The clearest statement is that CoT is constrained imitation: models reproduce familiar reasoning schemata from training and degrade predictably under distribution shift, which is the signature of imitation rather than emergent capability (see "Does chain-of-thought reasoning reveal genuine inference or pattern matching?"). The deeper reason is that LLMs reason *semantically*, not symbolically. When the familiar semantic content is stripped out of a task and only the logical structure remains, performance collapses even when the correct rules are given in the context (see "Do large language models reason symbolically or semantically?"). The models lean on token associations and commonsense absorbed in training rather than manipulating abstract symbols.
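One way to picture "stripping the semantic content" is a small rewrite that swaps meaningful relation names for opaque symbols while leaving the logical structure untouched. The sketch below is only an illustration under that assumption: the `symbolize` helper, the `p1`/`p2` symbol names, and the example rule are hypothetical and are not taken from the cited notes.

```python
import re


def symbolize(text, terms):
    """Replace each content term with an opaque symbol, keeping the logic intact."""
    mapping = {term: f"p{i}" for i, term in enumerate(terms, start=1)}
    # Try longer terms first and match whole words only, so "parent"
    # is never rewritten inside "grandparent".
    alternatives = "|".join(
        re.escape(t) for t in sorted(mapping, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text), mapping


rule = "If X is the parent of Y and Y is the parent of Z, then X is the grandparent of Z."
abstract_rule, mapping = symbolize(rule, ["parent", "grandparent"])
print(abstract_rule)
```

The rewritten rule states exactly the same inference, but the model can no longer lean on what it knows about parents and grandparents. That is the condition under which the note reports performance collapsing.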

The most striking evidence that form is doing the work comes from corrupting the reasoning itself. Models trained on systematically *wrong* or irrelevant traces perform about as well as those trained on correct ones, and sometimes generalize better out of distribution (see "Do reasoning traces need to be semantically correct?"). If garbage steps teach as well as valid ones, the steps aren't where the reasoning lives; they're computational scaffolding. The same picture appears when you treat traces as explanations: invalid logical steps perform nearly as well as valid ones, so the visible trace is a persuasive *appearance* rather than a faithful record of the computation (see "Do reasoning traces show how models actually think?").
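To make the corruption idea concrete, here is a minimal sketch of one way to build an "irrelevant trace" training set: each problem keeps its own answer but receives a trace written for a different problem. This is an assumed construction for illustration only. The notes do not say how the traces were corrupted, and the `with_irrelevant_traces` helper and its field names are hypothetical.

```python
import random


def with_irrelevant_traces(examples, seed=0):
    """Pair each question with another example's trace; answers stay unchanged."""
    n = len(examples)
    if n < 2:
        raise ValueError("need at least two examples to swap traces")
    rng = random.Random(seed)
    # Rotating by a non-zero offset guarantees no example keeps its own trace.
    offset = rng.randrange(1, n)
    return [
        {
            "question": ex["question"],
            "trace": examples[(i + offset) % n]["trace"],
            "answer": ex["answer"],
        }
        for i, ex in enumerate(examples)
    ]


examples = [
    {"question": "2 + 3 = ?", "trace": "Add 2 and 3.", "answer": "5"},
    {"question": "4 * 6 = ?", "trace": "Multiply 4 by 6.", "answer": "24"},
    {"question": "9 - 1 = ?", "trace": "Subtract 1 from 9.", "answer": "8"},
]
corrupted = with_irrelevant_traces(examples)
```

Training one model on `examples` and another on `corrupted`, then comparing final-answer accuracy, is the shape of the comparison the paragraph describes.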

This raises an obvious puzzle: if the words don't carry the reasoning, where does it happen? Mechanistic work suggests the real computation is somewhere other than the tokens you read. Transformers trained with hidden CoT tokens can compute the correct answer in their earliest layers and then actively *suppress* it in the final layers to emit format-compliant filler (see "Do transformers hide reasoning before producing filler tokens?"), and models can scale test-time reasoning entirely in continuous latent space with no verbalized steps at all (see "Can models reason without generating visible thinking tokens?"). In other words, the verbal trace may be a learned *output format* layered on top of computation that doesn't need it. That reframes "learning reasoning form" not as a bug but as the model optimizing for the surface it was rewarded on.
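The "answer computed early, then overwritten" finding rests on logit lens analysis, which decodes intermediate hidden states with the model's output matrix. The toy sketch below shows only the decoding step. The two-token vocabulary, the identity unembedding matrix, and the hidden states are made-up numbers rather than values from any real model, and the final layer norm that a real logit lens usually applies is omitted.

```python
def logit_lens(hidden_states, unembed, vocab):
    """Decode each layer's hidden state with the output matrix; return the top token per layer."""
    top_tokens = []
    for h in hidden_states:
        # One logit per vocabulary token: dot product of its unembedding row with h.
        logits = [sum(w * x for w, x in zip(row, h)) for row in unembed]
        best = max(range(len(logits)), key=logits.__getitem__)
        top_tokens.append(vocab[best])
    return top_tokens


vocab = ["7", "..."]                # toy vocabulary: an answer token and a filler token
unembed = [[1.0, 0.0], [0.0, 1.0]]  # one row per vocabulary token
hidden_states = [[0.9, 0.1], [0.8, 0.3], [0.2, 1.1]]  # made-up states, earliest layer first
print(logit_lens(hidden_states, unembed, vocab))  # ['7', '7', '...']
```

In this toy, the answer token leads in the first two layers and the filler token takes over in the last one, which mirrors the pattern the note describes.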

But here's the turn that makes this more than a debunking. Several lines of work suggest the abstract capability *is* present; it is just not what form-imitation trains. Base models already contain latent reasoning that minimal interventions can elicit, implying that post-training *selects* reasoning rather than creating it (see "Do base models already contain hidden reasoning ability?"). And what generalizes seems to be *procedural* knowledge absorbed broadly across pretraining documents, not memorized facts or memorized templates (see "Does procedural knowledge drive reasoning more than factual retrieval?"). So the gap may be an *elicitation* failure: models default to mimicking reasoning form because that's the cheap, well-rewarded path, while genuine procedural inference sits underused.

If that diagnosis holds, the fix is to train the *act* of inference rather than its appearance. Treating CoT as an exploratory action during pretraining, rewarded by how much it improves the likelihood of the text that follows, raises math and science benchmark scores (see "Can chain-of-thought reasoning be learned during pretraining itself?"). Forcing models to generate and explore diverse *abstractions*, breadth-first, beats simply sampling more depth-only chains that fall into shallow pattern-matching (see "Can abstractions guide exploration better than depth alone?"). The thread running through all of it: models learn form because form is what gets rewarded and what's cheapest to imitate. The abstract inference is latent, and the open problem is making training reach for it instead of its costume.
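The information-gain reward can be read as a difference of log-likelihoods: how much more probable the upcoming tokens become once the model has produced a thought. The sketch below assumes the baseline is the same prediction made without the thought. The `information_gain_reward` function and the probabilities are illustrative and do not reproduce the exact RLP objective.

```python
import math


def information_gain_reward(logp_with_thought, logp_without_thought):
    """Sum over target tokens of log p(token | context, thought) - log p(token | context)."""
    if len(logp_with_thought) != len(logp_without_thought):
        raise ValueError("both inputs must score the same target tokens")
    return sum(a - b for a, b in zip(logp_with_thought, logp_without_thought))


# Hypothetical per-token probabilities for two target tokens.
with_thought = [math.log(0.5), math.log(0.25)]
without_thought = [math.log(0.25), math.log(0.25)]
reward = information_gain_reward(with_thought, without_thought)
print(reward)  # positive: the thought made the next tokens more likely
```

A reward like this needs no external verifier: a thought that does not help predict what comes next earns nothing, however much it looks like reasoning.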

Sources (10 notes)

Does chain-of-thought reasoning reveal genuine inference or pattern matching?

CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Do reasoning traces need to be semantically correct?

Models trained on systematically irrelevant traces maintain solution accuracy and sometimes improve out-of-distribution generalization, suggesting traces function as computational scaffolding rather than meaningful reasoning steps.

Do reasoning traces show how models actually think?

LLM reasoning traces function as persuasive appearances rather than reliable explanations of computation. Invalid logical steps perform nearly as well as valid ones, and corrupted traces generalize comparably, showing that semantic correctness is not what produces the performance gains.

Do transformers hide reasoning before producing filler tokens?

Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.

Can models reason without generating visible thinking tokens?

Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Does procedural knowledge drive reasoning more than factual retrieval?

Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall, which depends on narrow, document-specific memorization of target facts.

Can chain-of-thought reasoning be learned during pretraining itself?

RLP treats CoT as exploratory action during pretraining, using log-likelihood improvement as verifier-free reward. Applied to Qwen3-1.7B and Nemotron-Nano-12B, the method improves math and science benchmarks substantially, suggesting reasoning can be planted earlier in training.

Can abstractions guide exploration better than depth alone?

RLAD jointly trains abstraction and solution generators, showing that allocating test-time compute to diverse abstractions outperforms parallel solution sampling at large budgets. Abstractions create structured breadth-first exploration that prevents the underthinking failure mode of depth-only reasoning chains.
