Why do knowledge and reasoning train in different network layers?

This explores why LLMs appear to localize factual recall in their lower layers while reasoning adjustments happen in higher layers — and what that split tells us about how the two abilities are actually built.

The most direct answer in the collection comes from a two-phase view of inference: lower layers handle knowledge retrieval, and higher layers handle reasoning adjustment (Why does reasoning training help math but hurt medical tasks?). The practical consequence is that reasoning training improves math but can quietly degrade knowledge-heavy domains like medicine. Training mostly adjusts the upper layers, so the facts those domains depend on, stored lower down, gain nothing and may even be disturbed.
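One way to check where a round of reasoning training actually landed is to compare each layer's weights before and after it. The sketch below is a hypothetical diagnostic, not the method used in the cited note: it assumes each layer is represented by a single weight matrix (a list of rows) and reports the relative Frobenius-norm change per layer. Larger values in the upper layers would be consistent with the two-phase picture; a real model has many matrices per layer, which you would aggregate.

```python
import math
from typing import List

Matrix = List[List[float]]


def relative_change(before: Matrix, after: Matrix) -> float:
    """Return ||after - before||_F / ||before||_F for one weight matrix."""
    diff_sq = sum(
        (a - b) ** 2
        for row_b, row_a in zip(before, after)
        for b, a in zip(row_b, row_a)
    )
    base_sq = sum(b ** 2 for row in before for b in row)
    if base_sq == 0:
        raise ValueError("base weights have zero norm")
    return math.sqrt(diff_sq) / math.sqrt(base_sq)


def per_layer_change(base: List[Matrix], tuned: List[Matrix]) -> List[float]:
    """Relative change for each layer, ordered from lowest to highest."""
    if len(base) != len(tuned):
        raise ValueError("models must have the same number of layers")
    return [relative_change(b, t) for b, t in zip(base, tuned)]


# Toy two-layer example with made-up weights: only the upper layer moves.
base_model = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
tuned_model = [[[1.0, 0.0], [0.0, 1.0]], [[1.1, 0.0], [0.0, 1.1]]]
changes = per_layer_change(base_model, tuned_model)
# changes[0] is 0.0; changes[1] is roughly 0.1
```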

But the deeper 'why' is that knowledge and reasoning aren't the same kind of thing to begin with, so they don't get learned the same way. One analysis of five million pretraining documents found that factual recall depends on narrow, document-specific memorization (the model essentially has to have seen the fact), while reasoning draws on broad, transferable *procedural* knowledge spread across many sources (Does procedural knowledge drive reasoning more than factual retrieval?). Memorized facts and generalizable procedures leave different statistical footprints in the training data, so it is unsurprising that they end up encoded in different places.
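The "different statistical footprints" idea can be made concrete with a simple concentration measure over per-document influence scores for a single query. This is an illustrative sketch, not the cited analysis's metric: `top_k_share` is a hypothetical helper, and the scores below are invented toy numbers. A fact-recall query whose influence is dominated by one document yields a high share, while a reasoning query drawing on many documents yields a low one.

```python
from typing import Sequence


def top_k_share(scores: Sequence[float], k: int) -> float:
    """Fraction of total positive influence held by the k highest-scoring documents.

    Values near 1.0 mean a few documents dominate (narrow, memorization-like);
    values near k / n for n equally weighted documents mean influence is
    spread out (broad, procedural-like).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    positive = sorted((s for s in scores if s > 0), reverse=True)
    total = sum(positive)
    if total == 0:
        return 0.0
    return sum(positive[:k]) / total


# Invented toy scores, for illustration only.
factual_query = [9.0, 0.5, 0.25, 0.25]  # one document carries the fact
reasoning_query = [1.0] * 10            # many documents contribute a little
print(top_k_share(factual_query, 1))    # 0.9
print(top_k_share(reasoning_query, 1))  # 0.1
```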

There's a second clue that reasoning is structurally distinct: it already appears to be latent in the base model before any reasoning-specific training. Five independent lines of evidence, including RL steering, decoding changes, and sparse-autoencoder (SAE) feature steering, all elicit reasoning that was already present in base-model activations, which suggests that post-training *selects* reasoning rather than creating it (Do base models already contain hidden reasoning ability?). Relatedly, RL post-training seems to teach a model *when* to deploy reasoning rather than *how* to reason: hybrid models recover about 91% of the performance gains just by routing tokens (Does RL post-training create reasoning or just deploy it?). If reasoning is a deployment skill layered on top of pre-existing machinery, it makes sense that training touches different (higher) layers than the ones holding stored facts.
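Feature steering, one of the lines of evidence above, usually amounts to activation addition: at some layer, a scaled direction is added to the model's hidden state, and that direction is one the model already represents (for example, an SAE decoder vector). The sketch below shows only that arithmetic on plain Python lists; the vector, the layer it would be applied at, and the scale `alpha` are illustrative assumptions, not values from the cited work. Nothing in it adds new weights, which is part of why successful steering is read as evidence that the capability was already latent.

```python
from typing import List

Vector = List[float]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Activation addition: return hidden + alpha * direction."""
    if len(hidden) != len(direction):
        raise ValueError("hidden state and direction must have the same width")
    return [h + alpha * d for h, d in zip(hidden, direction)]


def feature_activation(hidden: Vector, direction: Vector) -> float:
    """Dot-product readout of how strongly `direction` is present in `hidden`."""
    return sum(h * d for h, d in zip(hidden, direction))


# Toy 3-d hidden state and a hypothetical unit-norm "reasoning" feature direction.
h = [0.5, -1.0, 0.0]
reasoning_dir = [0.0, 0.0, 1.0]
steered = steer(h, reasoning_dir, alpha=4.0)
# feature_activation(h, reasoning_dir) is 0.0; after steering it is 4.0
```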

That framing also changes what the 'reasoning layers' are understood to be doing. Some notes in the collection argue that chain-of-thought is closer to computational scaffolding than to genuine inference: models trained on systematically irrelevant reasoning traces perform nearly as well as those trained on correct ones (Do reasoning traces need to be semantically correct?), and CoT often reproduces the *form* of reasoning through pattern matching, with format effects dominating content (What makes chain-of-thought reasoning actually work?). If the higher layers run a learned procedural process rather than retrieve content, then keeping them separate from the factual lower layers is a feature, not an accident.
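An irrelevant-trace control of this kind can be built in a few lines. The sketch below is one simple construction, assuming each training example is a (problem, trace, answer) triple; it is not necessarily how the cited experiments built their data, and `swap_traces` is a hypothetical helper. A cyclic shift gives every problem another problem's trace while keeping its own correct answer, so only the trace's semantic relevance changes.

```python
from typing import List, Tuple

Example = Tuple[str, str, str]  # (problem, trace, answer)


def swap_traces(examples: List[Example]) -> List[Example]:
    """Pair each problem with the next example's trace (cyclic shift).

    Problems and answers stay aligned; traces become unrelated to the problem.
    """
    if len(examples) < 2:
        raise ValueError("need at least two examples to swap traces")
    n = len(examples)
    return [
        (examples[i][0], examples[(i + 1) % n][1], examples[i][2])
        for i in range(n)
    ]


data = [("p1", "t1", "a1"), ("p2", "t2", "a2"), ("p3", "t3", "a3")]
corrupted = swap_traces(data)
# [("p1", "t2", "a1"), ("p2", "t3", "a2"), ("p3", "t1", "a3")]
```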

The forward-looking thread worth knowing about: if reasoning is procedural and can partly be instilled directly, you don't have to wait for post-training to install it. RLP treats chain-of-thought as an exploratory action *during pretraining*, rewarding the information gain a thought provides (its log-likelihood improvement, a verifier-free reward), and it substantially improves math and science benchmarks on Qwen3-1.7B and Nemotron-Nano-12B (Can chain-of-thought reasoning be learned during pretraining itself?). This hints that the layer-level division of labor between knowing and reasoning is something we can shape early, not just inherit.
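The reward described above can be sketched as a difference of two log-likelihoods: how well the model predicts the observed continuation with the sampled thought in context, minus how well it predicts it without one. This is a minimal sketch assuming a hypothetical `prob(prefix, token)` interface and a hypothetical toy model; the published method's exact baseline (for example, which model scores the no-thought case) and any normalization may differ.

```python
import math
from typing import Callable, Sequence

# Hypothetical interface: probability the model assigns to `token`
# given the tokens that precede it.
TokenProb = Callable[[Sequence[str], str], float]


def log_likelihood(prob: TokenProb, prefix: Sequence[str], targets: Sequence[str]) -> float:
    """Sum of log p(target_t | prefix, targets_<t), scored with teacher forcing."""
    context = list(prefix)
    total = 0.0
    for token in targets:
        total += math.log(prob(context, token))
        context.append(token)
    return total


def information_gain_reward(
    prob: TokenProb,
    context: Sequence[str],
    thought: Sequence[str],
    continuation: Sequence[str],
) -> float:
    """Reward = log p(continuation | context, thought) - log p(continuation | context).

    Positive when the thought makes the observed continuation more predictable;
    no external verifier is needed, only the text itself.
    """
    with_thought = log_likelihood(prob, list(context) + list(thought), continuation)
    without_thought = log_likelihood(prob, context, continuation)
    return with_thought - without_thought


def toy_prob(prefix: Sequence[str], token: str) -> float:
    """Toy stand-in for a model (not a normalized distribution):
    any token is likelier when 'carry' appears in the prefix."""
    return 0.8 if "carry" in prefix else 0.2


reward = information_gain_reward(toy_prob, ["7", "+", "5", "="], ["carry", "one"], ["12"])
# reward equals log(0.8) - log(0.2), i.e. log(4), about 1.386
```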

Sources (7 notes)

Why does reasoning training help math but hurt medical tasks?

Two-phase inference model shows knowledge retrieval operates in lower network layers while reasoning adjustment happens in higher layers. This separation explains why reasoning training improves math but can degrade knowledge-intensive domains like medicine.

Does procedural knowledge drive reasoning more than factual retrieval?

Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Does RL post-training create reasoning or just deploy it?

Evidence shows base models already contain reasoning capability in latent form; RL training optimizes deployment timing rather than capability creation. Hybrid models recover 91% of performance gains by routing tokens only, and activation vectors for reasoning strategies pre-exist before any RL.

Do reasoning traces need to be semantically correct?

Models trained on systematically irrelevant traces maintain solution accuracy and sometimes improve out-of-distribution generalization, suggesting traces function as computational scaffolding rather than meaningful reasoning steps.

What makes chain-of-thought reasoning actually work?

CoT systems reproduce the form of reasoning through pattern matching rather than performing genuine logical inference. This explains why format effects dominate content, why structurally invalid prompts succeed, and why stronger reasoning models become less instruction-compliant.

Can chain-of-thought reasoning be learned during pretraining itself?

RLP treats CoT as exploratory action during pretraining, using log-likelihood improvement as verifier-free reward. Applied to Qwen3-1.7B and Nemotron-Nano-12B, the method improves math and science benchmarks substantially, suggesting reasoning can be planted earlier in training.
