LLM Reasoning and Architecture

Does reasoning rely on procedural knowledge or factual memorization?

Explores whether LLMs learn reasoning through general procedural patterns across documents or through memorizing specific facts. Understanding this distinction matters for training data strategy.

Note · 2026-02-22 · sourced from Training Fine Tuning
Related questions: What kind of thing is an LLM really? · How do you build domain expertise into general AI models? · How should researchers navigate LLM reasoning research?

The "Procedural Knowledge in Pretraining Drives Reasoning" paper analyzes which pretraining documents most influence LLM reasoning by ranking 5 million documents by their influence on model completions. The finding: models' approach to reasoning is unlike retrieval. For reasoning tasks, the positively influential documents contain procedural knowledge — descriptions of how to get to a solution — rather than the specific facts needed for the answer.
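The influence-ranking step can be sketched in miniature. The paper uses EK-FAC influence functions over a full LLM; the toy below substitutes the simpler first-order gradient dot-product approximation on a logistic-regression stand-in, with entirely synthetic "documents" — an illustration of the ranking idea, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_logistic(w, x, y):
    """Gradient of the logistic loss -log p(y|x) with respect to weights w."""
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (p - y) * x

# Synthetic "documents" (feature vectors) with labels, and a rough weight vector.
docs = rng.normal(size=(1000, 8))
labels = (docs[:, 0] > 0).astype(float)
w = rng.normal(size=8) * 0.1

# A query whose completion we want to attribute back to training documents.
query_x, query_y = rng.normal(size=8), 1.0
g_query = grad_logistic(w, query_x, query_y)

# First-order influence score: alignment between each document's gradient
# and the query gradient (the full method would also apply an inverse-Hessian).
scores = np.array([g_query @ grad_logistic(w, x, y) for x, y in zip(docs, labels)])
top_docs = np.argsort(scores)[::-1][:5]  # indices of the most positively influential docs
print(top_docs)
```

Ranking all documents by this score and inspecting the top of the list is the analysis pattern the paper scales up to millions of pretraining documents.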

Three contrasts with factual recall:

  1. Generality: models rely on a broader, more general set of documents when reasoning than when answering factual questions. Factual recall draws on a narrow set of documents containing the target fact. Reasoning draws on a diffuse set of documents performing similar procedures.

  2. Transferability: documents have similar influence on reasoning queries that require applying the same procedure to different numbers. The procedural knowledge transfers across specific instances — it's the method, not the content, that the model has learned.

  3. Reliance distribution: the model needs to see factual information more often (across more documents) to memorize it, while procedural patterns can be learned from fewer but more diverse demonstrations.
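The transferability contrast suggests a concrete check: per-document influence scores for two queries that apply the same procedure to different numbers should be rank-correlated, while scores for an unrelated factual query should not be. A hypothetical sketch with synthetic influence scores (the shared-signal structure is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def spearman(a, b):
    """Spearman rank correlation, using numpy only."""
    ra, rb = np.argsort(np.argsort(a)), np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

n_docs = 500
shared = rng.normal(size=n_docs)                    # shared procedural signal
infl_q1 = shared + 0.3 * rng.normal(size=n_docs)    # query 1: procedure on some numbers
infl_q2 = shared + 0.3 * rng.normal(size=n_docs)    # query 2: same procedure, other numbers
infl_fact = rng.normal(size=n_docs)                 # unrelated factual query

print(round(spearman(infl_q1, infl_q2), 2))    # high: procedure shared
print(round(spearman(infl_q1, infl_fact), 2))  # near zero: nothing shared
```

High rank correlation between same-procedure queries, and none with the factual query, is exactly the signature of method-level rather than content-level learning.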

This connects to the knowledge/reasoning layer separation. Together with "Why does reasoning training help math but hurt medical tasks?", the procedural knowledge finding supplies the data-level explanation for the architectural one: lower layers store memorized facts (requiring document-specific exposure), while higher layers encode procedural strategies (learnable from diverse general demonstrations).

The implication for training data curation: reasoning capability benefits more from diverse demonstrations of procedures than from exhaustive factual coverage. Quality and diversity of reasoning demonstrations may matter more than raw volume, consistent with "Can models improve themselves on tasks without verifiable answers?".


Source: Training Fine Tuning

Original note title: procedural knowledge in pretraining documents drives reasoning generalization, unlike factual retrieval, which requires document-specific memorization