What separates knowledge from reasoning in neural network layers?

This explores whether 'knowing facts' and 'reasoning over them' live in physically different parts of a network — and what the corpus says actually separates them.

The most direct answer in the corpus is geographic. A two-phase inference model finds that knowledge retrieval happens in the *lower* layers while reasoning adjustment happens in the *higher* layers (see "Why does reasoning training help math but hurt medical tasks?"). That split isn't just tidy; it has a cost. It's why training a model harder on reasoning can sharpen math performance while quietly degrading knowledge-heavy domains like medicine: you're tuning the upper floors and disturbing the foundation.

But depth isn't the only axis of separation. The two also differ in *where they come from* during training. One analysis of five million pretraining documents shows reasoning draws on broad, transferable procedural knowledge spread across many sources, while factual recall depends on narrow, document-specific memorization of the exact target fact (see "Does procedural knowledge drive reasoning more than factual retrieval?"). So 'knowledge' is a lookup against something memorized; 'reasoning' is a procedure generalized from many examples. Separate layers, separate learning signals.

What's stranger is that the reasoning machinery seems to be *already built* and largely just waiting to be switched on. Five independent methods (RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR) all elicit reasoning that's already latent in base-model activations rather than installing anything new (see "Do base models already contain hidden reasoning ability?"). The provocative follow-on is that RL post-training mostly teaches a model *when* to deploy reasoning, not *how* to reason; hybrid models recover 91% of the performance gains through token routing alone (see "Does RL post-training create reasoning or just deploy it?"). If that's right, the knowledge/reasoning boundary is less about acquiring two different things and more about a deployment layer sitting on top of pre-existing capability.
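The 'when, not how' idea can be sketched in a few lines of plain Python. This is a toy illustration under loud assumptions, not the hybrid model from the cited note: `base_logits`, `STEER`, `router`, and `hybrid_generate` are all hypothetical names, the vocabulary is made up, and the steering vector is added to output logits for simplicity where the note speaks of activation vectors. The point is only structural: the base predictor is frozen, and the sole thing that changes behavior is the router's decision about which steps get steered.

```python
VOCAB = ["the", "answer", "wait", "check"]

# Hypothetical steering direction that favours the token "wait".
STEER = [0.0, 0.0, 1.0, 0.0]


def base_logits(tokens):
    """Hypothetical stand-in for a frozen base model: it always prefers "answer"."""
    return [0.5, 2.0, 1.5, 0.0]


def router(tokens):
    """Hypothetical gate: steer only on the step right after "answer"."""
    return bool(tokens) and tokens[-1] == "answer"


def hybrid_generate(prompt, max_new_tokens):
    """Greedy decoding where the router decides, per step, whether to steer."""
    tokens = list(prompt)
    steered_steps = 0
    for _ in range(max_new_tokens):
        logits = base_logits(tokens)
        if router(tokens):
            # Same base model, nudged along a pre-existing direction.
            logits = [value + shift for value, shift in zip(logits, STEER)]
            steered_steps += 1
        best = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(VOCAB[best])
    return tokens, steered_steps


output, steered = hybrid_generate(["the"], 3)
print(output)   # ['the', 'answer', 'wait', 'answer']
print(steered)  # 1
```

Only one of the three generated tokens is steered, yet that single routed step is what inserts the 'wait' behavior. Nothing about the base predictor was retrained.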

You can even watch the separation happen token by token. Logit-lens analysis catches transformers trained with hidden chain-of-thought tokens computing a correct answer in layers 1–3 and then actively suppressing it in the final layers to emit format-compliant filler (see "Do transformers hide reasoning before producing filler tokens?"). Separately, a 'deep-thinking ratio' tracks genuine reasoning as the proportion of tokens whose predictions are significantly revised across layers, a signal that correlates with accuracy (see "Can we measure how deeply a model actually reasons?"). Both treat layers as a timeline where retrieval and revision are distinguishable events, not a uniform smear.
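To make the layers-as-timeline idea concrete, here is a minimal sketch in plain Python. It is not the code from either study. It assumes you already have one hidden-state vector per layer for each token; `logit_lens`, `settle_layer`, and `revision_ratio` are hypothetical helper names, the identity `UNEMBED` matrix and the 0.5 depth threshold are toy assumptions, and a real logit lens would normally apply the model's final normalization before unembedding. `revision_ratio` is only a rough proxy for the deep-thinking ratio, whose published definition may differ.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_lens(hidden_states, unembed):
    """Project each layer's hidden state through the unembedding matrix.

    hidden_states: one vector per layer; unembed: one row per vocabulary token.
    Returns one probability distribution over the vocabulary per layer.
    """
    per_layer = []
    for hidden in hidden_states:
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in unembed]
        per_layer.append(softmax(logits))
    return per_layer


def top_token(probs):
    return max(range(len(probs)), key=probs.__getitem__)


def settle_layer(per_layer):
    """Earliest layer from which the top-1 token equals the final top-1 and stays."""
    final = top_token(per_layer[-1])
    layer = len(per_layer) - 1
    while layer > 0 and top_token(per_layer[layer - 1]) == final:
        layer -= 1
    return layer


def revision_ratio(tokens_hidden, unembed, depth_fraction=0.5):
    """Share of tokens whose prediction settles only in the deeper part of the stack."""
    deep = 0
    for hidden_states in tokens_hidden:
        per_layer = logit_lens(hidden_states, unembed)
        if settle_layer(per_layer) >= depth_fraction * len(per_layer):
            deep += 1
    return deep / len(tokens_hidden)


# Toy data: 3-token vocabulary (0 = answer, 1 = other, 2 = filler), 4 layers.
UNEMBED = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
stable = [[2.0, 0.0, 0.0]] * 4
overwritten = [[2.0, 0.0, 0.0]] * 3 + [[0.0, 0.0, 2.0]]

print([top_token(p) for p in logit_lens(overwritten, UNEMBED)])  # [0, 0, 0, 2]
print(revision_ratio([stable, overwritten], UNEMBED))            # 0.5
```

The second toy token carries the 'answer' token through three layers and flips to the 'filler' token only at the last one, which is the overwrite pattern the logit-lens note describes. The first token never changes its mind, so half the tokens count as late-settling.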

The doorway worth walking through: this whole separation may be modular rather than fuzzy. Pruning experiments show networks naturally decompose compositional tasks into isolated subnetworks, with pretraining making that modularity more reliable (see "Do neural networks naturally learn modular compositional structure?"). Yet the 'imposter intelligence' work warns that the internal structure can be fractured and entangled even when outputs look perfect, and standard benchmarks can't tell the difference (see "Can AI pass every test while understanding nothing?"). So 'what separates knowledge from reasoning' has a clean textbook answer (lower vs. higher layers) and an unsettling research-frontier answer (we can't always trust that the separation we measure reflects what the network is actually doing).
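The ablation test behind the modularity claim is easy to sketch. The example below is a hand-built toy in plain Python, not the pruning method from the cited note: `forward`, `ablation_effect`, `W_IN`, `W_OUT`, and `PROBES` are hypothetical names, and the network is wired by hand so that hidden units 0 and 1 compute |a| while units 2 and 3 compute |b|. It shows what 'ablations affect only their corresponding function' looks like when modularity holds; it says nothing about whether a trained network would pass the same test.

```python
# Hidden units 0-1 read input a; hidden units 2-3 read input b.
W_IN = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
# Output 0 sums units 0-1 (giving |a|); output 1 sums units 2-3 (giving |b|).
W_OUT = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
PROBES = [[1.0, -2.0], [-3.0, 0.5], [2.0, 4.0]]


def forward(x, weights_in, weights_out, mask):
    """One ReLU hidden layer; mask entries of 0.0 switch hidden units off."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x))) * keep
        for row, keep in zip(weights_in, mask)
    ]
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights_out]


def ablation_effect(ablated_units, probes):
    """Mean absolute change in each output when the given hidden units are removed."""
    full = [1.0] * len(W_IN)
    mask = [0.0 if i in ablated_units else 1.0 for i in range(len(W_IN))]
    effect = [0.0] * len(W_OUT)
    for x in probes:
        intact = forward(x, W_IN, W_OUT, full)
        cut = forward(x, W_IN, W_OUT, mask)
        for k in range(len(effect)):
            effect[k] += abs(intact[k] - cut[k])
    return [total / len(probes) for total in effect]


print(ablation_effect({0, 1}, PROBES))  # [2.0, 0.0]
print(ablation_effect({2, 3}, PROBES))
```

Removing units 0 and 1 wipes out the first output and leaves the second untouched, and the reverse holds for units 2 and 3. In a fractured, entangled network the same test would show both outputs moving under either ablation, even if the intact outputs were identical.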

Sources (8 notes)

Why does reasoning training help math but hurt medical tasks?

Two-phase inference model shows knowledge retrieval operates in lower network layers while reasoning adjustment happens in higher layers. This separation explains why reasoning training improves math but can degrade knowledge-intensive domains like medicine.

Does procedural knowledge drive reasoning more than factual retrieval?

Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Does RL post-training create reasoning or just deploy it?

Evidence shows base models already contain reasoning capability in latent form; RL training optimizes deployment timing rather than capability creation. Hybrid models recover 91% of performance gains by routing tokens only, and activation vectors for reasoning strategies pre-exist before any RL.

Do transformers hide reasoning before producing filler tokens?

Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.

Can we measure how deeply a model actually reasons?

Deep-thinking ratio (DTR) measures the proportion of tokens whose predictions undergo significant revision across model layers, correlating robustly with accuracy across AIME, HMMT, and GPQA benchmarks. Think@n, a test-time strategy using DTR, matches self-consistency performance while reducing inference costs.

Do neural networks naturally learn modular compositional structure?

Pruning experiments reveal that neural networks implement compositional subroutines in isolated subnetworks, with ablations affecting only their corresponding function. Pretraining substantially increases the consistency and reliability of this modular structure across architectures and domains.

Can AI pass every test while understanding nothing?

The Fractured Entangled Representation hypothesis holds that SGD-trained networks can produce identical outputs across all inputs while maintaining radically different internal representations. Standard benchmarks cannot detect this structural difference.
