Why do SFT models memorize patterns instead of learning generalizable reasoning?

This explores why models trained by supervised fine-tuning (SFT) tend to copy the surface form of reasoning rather than acquire reasoning that transfers to new problems — and what the corpus says is actually being learned when that happens.

The sharpest answer in the collection is that chain-of-thought reasoning is, mechanically, constrained imitation rather than genuine inference: models learn to reproduce familiar reasoning *schemata* from their training data, not to perform novel logical steps (see "Does chain-of-thought reasoning reveal genuine inference or pattern matching?", "Why does chain-of-thought reasoning fail in predictable ways?", and "What makes chain-of-thought reasoning actually work?"). The tell is how failure happens: when you shift the task, the length, or the format away from what was trained, performance degrades in a predictable, systematic way, and the model keeps producing fluent prose that is logically inconsistent underneath (see "Does chain-of-thought reasoning actually generalize beyond training data?"). That distribution-boundedness is the fingerprint of imitation: real reasoning wouldn't care whether the problem was phrased the way the training set phrased it.
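The distribution-boundedness described above can be made concrete with a toy evaluation. The sketch below is an illustration only, not the setup used in any of the cited notes: the string-reversal task, the `Imitator` lookup table, and the split names are all hypothetical stand-ins. It compares a pure imitator against the underlying rule on an in-distribution split and on length- and format-shifted splits.

```python
from itertools import product

ALPHABET = "abc"  # hypothetical toy vocabulary


def target_rule(s: str) -> str:
    """The rule a model would ideally learn: reverse the input string."""
    return s[::-1]


def all_strings(length: int) -> list[str]:
    """Every string of the given length over ALPHABET."""
    return ["".join(p) for p in product(ALPHABET, repeat=length)]


class Imitator:
    """Toy stand-in for pure imitation: a lookup table over training pairs."""

    def __init__(self, train_inputs):
        self.table = {x: target_rule(x) for x in train_inputs}
        # Unseen inputs get some memorized output: well-formed, but unrelated.
        self.default = next(iter(self.table.values()))

    def predict(self, x: str) -> str:
        return self.table.get(x, self.default)


def accuracy(predict, inputs) -> float:
    """Fraction of inputs whose prediction matches the target rule."""
    return sum(predict(x) == target_rule(x) for x in inputs) / len(inputs)


train = all_strings(3)
splits = {
    "in_distribution": train,                    # same inputs as training
    "length_shift": all_strings(5),              # longer than anything seen
    "format_shift": [x.upper() for x in train],  # same content, new surface form
}

imitator = Imitator(train)
report = {
    name: {
        "imitator": accuracy(imitator.predict, xs),
        "rule": accuracy(target_rule, xs),
    }
    for name, xs in splits.items()
}

for name, scores in report.items():
    print(name, scores)
```

In this toy, the imitator is perfect where the surface form matches training and fails completely once length or format changes, while the rule is indifferent to both. A real SFT model sits somewhere between these two extremes, which is why comparing accuracy across shifted splits is a useful diagnostic.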

A second thread points to *where* the memorization actually lives. One analysis decomposes chain-of-thought errors into local, mid-range, and long-range sources and finds that local memorization (predicting the next token mostly from the immediately preceding tokens) accounts for up to two-thirds of reasoning errors, and gets worse as problems grow more complex or drift from the training distribution (see "Where do memorization errors arise in chain-of-thought reasoning?"). In other words, the model leans on short-range pattern completion exactly when it most needs to reason globally. This dovetails with a striking finding from pretraining analysis: factual recall depends on narrow, document-specific memorization, while genuine reasoning generalization rides on *procedural* knowledge spread across many diverse documents (see "Does procedural knowledge drive reasoning more than factual retrieval?"). If SFT mostly reinforces narrow target-specific traces, you'd expect it to push the model toward the memorization regime rather than the procedural one.

Here's the part that should reframe the question itself. There's strong evidence the reasoning traces SFT teaches don't even need to be *correct* to work: models trained on deliberately corrupted, semantically irrelevant traces match the accuracy of models trained on clean ones, and sometimes generalize better out of distribution (see "Do reasoning traces need to be semantically correct?"). That suggests the trace functions as computational scaffolding, a structural prompt to spend more compute in a certain shape, rather than as meaningful content the model absorbs. So "memorizing patterns instead of reasoning" may be less a bug in SFT and more a description of what the trace was ever doing: supplying form, not inference.
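One way to picture the corrupted-traces experiment is as a dataset transformation: keep each question and its answer, but swap in a trace that belongs to a different problem. The note does not say how the cited work built its irrelevant traces, so the sketch below is only one hypothetical construction; the `Example` record and the `with_irrelevant_traces` helper are names invented here for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """Hypothetical SFT record: a question, a reasoning trace, a final answer."""
    question: str
    trace: str
    answer: str


def with_irrelevant_traces(examples, seed=0):
    """Return a copy of the dataset where each trace comes from another example.

    Traces are rotated by a random nonzero offset, so every question keeps its
    own answer but is paired with a trace written for a different question.
    """
    n = len(examples)
    if n < 2:
        raise ValueError("need at least two examples to swap traces")
    offset = random.Random(seed).randrange(1, n)
    return [
        Example(ex.question, examples[(i + offset) % n].trace, ex.answer)
        for i, ex in enumerate(examples)
    ]


clean = [
    Example("2 + 3 = ?", "Add 2 and 3 to get 5.", "5"),
    Example("4 * 6 = ?", "Multiply 4 by 6 to get 24.", "24"),
    Example("9 - 7 = ?", "Subtract 7 from 9 to get 2.", "2"),
]
corrupted = with_irrelevant_traces(clean, seed=0)

for ex in corrupted:
    print(ex.question, "|", ex.trace, "|", ex.answer)
```

Training one model on `clean` and another on `corrupted`, then comparing final-answer accuracy in and out of distribution, is the shape of comparison the note describes. If the two models score about the same, the trace is contributing structure rather than content.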

Which raises the deeper twist. Several independent results argue the reasoning capability was largely *already there* in the base model, latent in its activations, and that post-training merely selects or elicits it rather than installing it: RL steering, critique fine-tuning, decoding tweaks, sparse-autoencoder (SAE) feature steering, and RLVR all unlock the same underlying ability (see "Do base models already contain hidden reasoning ability?"). On this view, SFT memorizes because it's the wrong tool for the job: it's a strong imitation signal, so it efficiently teaches the *surface* of reasoning while doing little to expand the underlying capability. That's also why approaches that reward exploration or information gain get treated as the antidote, whether by planting chain-of-thought during pretraining itself (see "Can chain-of-thought reasoning be learned during pretraining itself?") or by using reinforcement learning to route between fast answers and extended thinking (see "Can models learn when to think versus respond quickly?"): they reward *doing* reasoning rather than reproducing its shape.
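The "information gain" idea can be written down in a few lines. The source note on pretraining-time chain-of-thought says only that log-likelihood improvement serves as a verifier-free reward; the exact baseline, averaging, and function names below are assumptions made for illustration, not the published method.

```python
import math


def information_gain_reward(logp_with_thought: float, logp_without_thought: float) -> float:
    """Reward for one token: how much a sampled thought raised the
    log-likelihood of the observed next token (assumed form)."""
    return logp_with_thought - logp_without_thought


def mean_information_gain(logps_with, logps_without) -> float:
    """Average the per-token reward over a span of observed tokens."""
    if len(logps_with) != len(logps_without):
        raise ValueError("log-probability lists must have the same length")
    if not logps_with:
        raise ValueError("need at least one token")
    gains = [information_gain_reward(a, b) for a, b in zip(logps_with, logps_without)]
    return sum(gains) / len(gains)


# A thought that doubles the probability of the true next token earns log(2).
helpful = information_gain_reward(math.log(0.5), math.log(0.25))
# A thought that halves it earns -log(2), so unhelpful thinking is penalized.
harmful = information_gain_reward(math.log(0.1), math.log(0.2))

print(round(helpful, 3), round(harmful, 3))
```

The contrast with SFT is that nothing here asks the thought to resemble a reference trace. The reward depends only on whether thinking made the observed text more predictable, which is what "rewarding the doing rather than the shape" amounts to under these assumptions.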

If you want the doorway that most changes how you think about this, start with the corrupted-traces result ("Do reasoning traces need to be semantically correct?") and the latent-capability synthesis ("Do base models already contain hidden reasoning ability?"): together they suggest the real question isn't why SFT memorizes, but why we expected imitation of a trace to ever teach reasoning in the first place.

Sources (10 notes)

Does chain-of-thought reasoning reveal genuine inference or pattern matching?

CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.

Why does chain-of-thought reasoning fail in predictable ways?

CoT guides models to pattern-match reasoning structure rather than perform genuine inference. This explains distribution-bounded failures, why structural coherence matters more than content correctness, and why performance optimizes against interpretability.

What makes chain-of-thought reasoning actually work?

CoT systems reproduce the form of reasoning through pattern matching rather than performing genuine logical inference. This explains why format effects dominate content, why structurally invalid prompts succeed, and why stronger reasoning models become less instruction-compliant.

Does chain-of-thought reasoning actually generalize beyond training data?

DataAlchemy experiments show CoT fails systematically under distributional shifts in task, length, and format. Models produce fluent but logically inconsistent reasoning — imitating reasoning form without valid underlying logic.

Where do memorization errors arise in chain-of-thought reasoning?

STIM framework identifies local, mid-range, and long-range memorization sources in CoT reasoning. Local memorization—based on preceding tokens—accounts for up to 67% of reasoning errors, especially as complexity increases and distributional shift occurs.

Does procedural knowledge drive reasoning more than factual retrieval?

Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.

Do reasoning traces need to be semantically correct?

Models trained on systematically irrelevant traces maintain solution accuracy and sometimes improve out-of-distribution generalization, suggesting traces function as computational scaffolding rather than meaningful reasoning steps.

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Can chain-of-thought reasoning be learned during pretraining itself?

RLP treats CoT as exploratory action during pretraining, using log-likelihood improvement as verifier-free reward. Applied to Qwen3-1.7B and Nemotron-Nano-12B, the method improves math and science benchmarks substantially, suggesting reasoning can be planted earlier in training.

Can models learn when to think versus respond quickly?

Thinkless trains a single model to select between extended reasoning and direct responses using DeGRPO, which decouples mode selection from answer refinement. This prevents mode collapse and enables self-calibrated routing without explicit difficulty labels.
