Can pretraining signals unlock latent reasoning that post-training merely activates?
This explores whether the reasoning that post-training (RL, fine-tuning) appears to 'teach' is actually planted during pretraining and merely switched on later — and whether we could plant it earlier and better.
The corpus leans hard toward yes: the bottleneck is elicitation, not acquisition. Multiple independent lines of evidence converge here. Base models already carry latent reasoning that minimal interventions unlock: RL steering, critique fine-tuning, decoding tweaks, and feature steering all elicit reasoning that was already sitting in the activations rather than creating it (see "Do base models already contain hidden reasoning ability?"). RLVR in particular seems to sharpen sampling efficiency within existing limits rather than expand them: a single training example can suffice to activate the behavior, and even spurious rewards work nearly as well, as long as pretraining laid the groundwork (see "What does reward learning actually do to model reasoning?"). The sharpest framing of your question is that RL post-training teaches a model *when* to reason, not *how*: reasoning strategies pre-exist as activation vectors, and hybrid models recover 91% of the gains just by routing tokens (see "Does RL post-training create reasoning or just deploy it?").
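The phrase "sharpen sampling efficiency within existing limits" can be made concrete with the pass@k metric, which estimates the chance that at least one of k sampled answers is correct. The following is a minimal sketch, assuming the standard unbiased pass@k estimator from the code-generation evaluation literature. The sample counts are hypothetical and chosen only to show the shape of the claim; they are not taken from any of the notes.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples is correct),
    given c correct answers among n drawn samples."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k incorrect samples exist,
    # so the estimate becomes exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical counts for a single problem, 100 samples per model.
N = 100
BASE_CORRECT = 5     # base model: rarely right on any single draw
TUNED_CORRECT = 40   # RL-tuned model: right far more often per draw

for k in (1, 10, 100):
    base = pass_at_k(N, BASE_CORRECT, k)
    tuned = pass_at_k(N, TUNED_CORRECT, k)
    print(f"k={k:3d}  base={base:.3f}  tuned={tuned:.3f}  gap={tuned - base:.3f}")
```

Under these made-up counts the tuned model is far ahead at k=1, while the gap closes as k grows because the base model can already reach the answer given enough draws. That is the pattern the notes describe as better sampling efficiency without a wider capability boundary.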
But the more interesting part of your question is the word 'unlock.' If the capability is already latent, can you skip the RL middleman entirely? Several notes say yes. Steering a single SAE-identified reasoning feature matches or beats chain-of-thought across six model families, with no training at all; the reasoning mode activates early in generation and overrides surface instructions (see "Can we trigger reasoning without explicit chain-of-thought prompts?"). Modular 'cognitive tools' lifted GPT-4.1 on AIME 2024 from 26.7% to 43.3% with zero RL, just by isolating reasoning operations into structured calls (see "Can modular cognitive tools unlock reasoning without training?"). Even reasoning *verbosity* turns out to be a single linear direction you can steer without retraining (see "Can we steer reasoning toward brevity without retraining?"). These all point the same way: post-training is one of many keys to a lock that pretraining built.
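To illustrate what "a single linear direction you can steer" can mean in practice, here is a minimal sketch of generic difference-of-means activation steering. It is an assumption that this matches the notes' methods in spirit only: the function names, the toy 3-dimensional activations, and the `alpha` scale are all hypothetical, and a real setup would read and write hidden states inside a model rather than plain lists.

```python
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_vector(target: List[Vector], baseline: List[Vector]) -> Vector:
    """Difference of means: points from baseline behaviour toward target behaviour."""
    if len(target) != len(baseline):
        raise ValueError("expected paired examples")
    t, b = mean(target), mean(baseline)
    return [ti - bi for ti, bi in zip(t, b)]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Shift one hidden state along the steering direction, scaled by alpha."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy activations (hypothetical): paired concise vs. verbose reasoning traces.
concise = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
verbose = [[1.0, 2.0, 2.0], [3.0, 4.0, 2.0]]

direction = steering_vector(concise, verbose)
steered = steer([2.0, 3.0, 2.0], direction, alpha=1.0)
print(direction, steered)
```

No weights change in this sketch. The only learned object is one vector computed from paired examples, which is why this family of methods counts as training-free.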
So what actually makes pretraining a good substrate for reasoning? The most concrete answer: *procedural* knowledge, not facts. Analysis of 5 million pretraining documents found that reasoning generalizes from broad, transferable procedures drawn from diverse sources, while factual recall depends on narrow memorization of specific documents (see "Does procedural knowledge drive reasoning more than factual retrieval?"). That suggests the latent reasoning you later unlock is literally the accumulated 'how-to' diffused across the corpus. And it has an architectural home: knowledge sits in lower network layers and reasoning adjustments in higher ones, which is why reasoning training helps math but can damage knowledge-heavy domains like medicine (see "Why does reasoning training help math but hurt medical tasks?").
If reasoning is procedural and plantable, why wait for post-training at all? RLP treats chain-of-thought as an exploratory action *during* pretraining and rewards it by how much it improves next-token prediction, so no verifier is needed; it lifts math and science benchmark scores by roughly 19% (see "Can chain-of-thought reasoning be learned during pretraining itself?"). This is your question turned into a method: stop activating reasoning afterward, start growing it from the beginning. Related training-time tricks, such as jointly training backward reasoning to strengthen forward reasoning, suggest the capability can be deepened, not just switched on (see "Can backward reasoning during training improve forward reasoning?").
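The verifier-free reward described above can be sketched as a log-likelihood difference: how much more probable the true next token becomes once the model conditions on its own chain-of-thought. This is a minimal sketch under stated assumptions. The notes say only that RLP uses log-likelihood improvement as the reward, so the function name, the choice of a no-thought baseline, and the probabilities below are hypothetical.

```python
import math


def information_gain_reward(logp_with_thought: float,
                            logp_without_thought: float) -> float:
    """Reward = log p(next token | context, thought) - log p(next token | context).

    Positive when the sampled thought made the true next token more likely,
    negative when it made the token less likely.
    """
    return logp_with_thought - logp_without_thought


# Hypothetical probabilities assigned to the true next token.
helpful = information_gain_reward(math.log(0.50), math.log(0.25))
harmful = information_gain_reward(math.log(0.10), math.log(0.25))

print(f"helpful thought reward: {helpful:+.4f}")
print(f"harmful thought reward: {harmful:+.4f}")
```

Because the signal comes from the pretraining text itself, ordinary corpus documents can supply it; no answer checker or labeled solution is needed.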
The one honest crack in the 'merely activates' story is worth knowing: it's domain-conditional. For standard reasoning, RL activates latent ability, but for complex multi-step planning, RL appears to generate genuinely novel strategies the base model can't reach even with heavy sampling (see "Does reinforcement learning create new reasoning abilities or activate existing ones?"). And training does more than gate: vanilla models use extended 'thinking' counterproductively, spiraling into self-doubt, and RL actively rewires that same mechanism into productive analysis (see "Does extended thinking help or hurt model reasoning?"). So the cleanest synthesis: pretraining builds the reasoning substrate and post-training mostly elicits it, but at the frontier of hard planning, and in fixing *how* thinking is used, post-training still does real creative work.
Sources (12 notes)
Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.
Research shows RLVR improves sampling efficiency within existing capability boundaries without expanding them. A single training example suffices for activation, and spurious rewards work nearly as well as correct ones for models with appropriate pretraining.
Evidence shows base models already contain reasoning capability in latent form; RL training optimizes deployment timing rather than capability creation. Hybrid models recover 91% of performance gains by routing tokens only, and activation vectors for reasoning strategies pre-exist before any RL.
SAE-identified reasoning features can be directly steered to match or exceed chain-of-thought performance across six model families. This reasoning mode activates early in generation and overrides surface-level instructions, suggesting latent reasoning is a fundamental capability independent of explicit prompting.
Four cognitive tools implemented as sandboxed LLM calls improved GPT-4.1 on AIME2024 from 26.7% to 43.3% without any RL training. Modularity enforces operation isolation that pure prompting cannot guarantee, eliciting pre-existing reasoning capability.
Activation-Steered Compression extracts a single vector from 50 paired examples to reduce chain-of-thought length by 67% while maintaining accuracy and achieving 2.73x speedup. The method is training-free and generalizes across model sizes and domains.
Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.
Two-phase inference model shows knowledge retrieval operates in lower network layers while reasoning adjustment happens in higher layers. This separation explains why reasoning training improves math but can degrade knowledge-intensive domains like medicine.
RLP treats CoT as exploratory action during pretraining, using log-likelihood improvement as verifier-free reward. Applied to Qwen3-1.7B and Nemotron-Nano-12B, the method improves math and science benchmarks substantially, suggesting reasoning can be planted earlier in training.
Training models simultaneously on forward reasoning, backward question generation, and backward reasoning improves forward-only performance by an average of 13.53% across 12 datasets. The mechanism: generating backward questions forces models to understand the inverse relationship between problem and solution, deepening understanding that transfers to forward reasoning without test-time overhead.
For standard reasoning tasks, RL activates latent abilities already present in base models. For complex planning requiring multi-step coordination, RL generates genuinely novel strategies inaccessible to base models even with extensive sampling.
Vanilla models use thinking mode counterproductively, inducing self-doubt that degrades performance. RL training reverses this, transforming the same mechanism into beneficial gap analysis. Training mediates reasoning quality, not just quantity.