Can pretraining signals unlock latent reasoning that post-training merely activates?

This explores whether the reasoning that post-training (RL, fine-tuning) appears to 'teach' is actually planted during pretraining and merely switched on later — and whether we could plant it earlier and better.

The corpus leans hard toward yes: the bottleneck is elicitation, not acquisition. Multiple independent lines of evidence converge here. Base models already carry latent reasoning that minimal interventions unlock: RL steering, critique fine-tuning, decoding tweaks, and feature steering all elicit reasoning that was already sitting in the activations rather than creating it (see 'Do base models already contain hidden reasoning ability?'). RLVR in particular seems to sharpen sampling efficiency within existing limits rather than expand them: a single training example can suffice to activate the behavior, and even spurious rewards work nearly as well as correct ones, as long as the pretraining laid the groundwork (see 'What does reward learning actually do to model reasoning?'). The sharpest framing of your question is that RL post-training teaches a model *when* to reason, not *how*: reasoning strategies pre-exist as activation vectors, and hybrid models recover 91% of the gains just by routing (see 'Does RL post-training create reasoning or just deploy it?').
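One way to make "sharpens sampling efficiency within existing limits" concrete is the standard unbiased pass@k estimator, sketched below. The sample counts are hypothetical and chosen only to illustrate the pattern; they are not results from any of the cited notes.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples, drawn
    without replacement from n attempts of which c are correct, is correct."""
    if k > n:
        raise ValueError("k must not exceed n")
    if n - c < k:
        # Fewer than k incorrect attempts exist, so any k-subset contains a hit.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical counts for a single problem, 256 samples per model.
N_SAMPLES = 256
BASE_CORRECT = 8     # assumed: base model is rarely right, but not never
TUNED_CORRECT = 96   # assumed: RL-tuned model is right far more often

for k in (1, 16, 256):
    base = pass_at_k(N_SAMPLES, BASE_CORRECT, k)
    tuned = pass_at_k(N_SAMPLES, TUNED_CORRECT, k)
    print(f"k={k:>3}  base={base:.3f}  tuned={tuned:.3f}")
```

Under these assumed counts the tuned model is far ahead at k=1, while both models solve the problem once k is large. That is the signature of better sampling efficiency without a wider capability boundary.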

But the more interesting part of your question is the word 'unlock.' If the capability is already latent, can you skip the RL middleman entirely? Several notes say yes. Steering a single SAE-identified reasoning feature matches or beats chain-of-thought across six model families with no training at all; the reasoning mode activates early in generation and overrides surface instructions (see 'Can we trigger reasoning without explicit chain-of-thought prompts?'). Modular 'cognitive tools' lifted GPT-4.1 on AIME 2024 from 26.7% to 43.3% with zero RL, just by isolating reasoning operations into structured calls (see 'Can modular cognitive tools unlock reasoning without training?'). Even reasoning *verbosity* turns out to be a single linear direction you can steer without retraining (see 'Can we steer reasoning toward brevity without retraining?'). These all point the same way: post-training is one of many keys to a lock that pretraining built.
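The steering results above share a simple core: find a direction in activation space, then shift hidden states along it. The sketch below uses a generic difference-of-means direction on synthetic vectors, not the SAE feature extraction or the compression method from the cited notes. All names, dimensions, and values are hypothetical, and plain lists keep it dependency-free.

```python
import random

Vector = list[float]


def mean_vector(rows: list[Vector]) -> Vector:
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def steering_direction(acts_on: list[Vector], acts_off: list[Vector]) -> Vector:
    """Unit vector pointing from the 'off' mean activation to the 'on' mean."""
    diff = [a - b for a, b in zip(mean_vector(acts_on), mean_vector(acts_off))]
    norm = dot(diff, diff) ** 0.5
    return [x / norm for x in diff]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Shift one hidden state by alpha along the steering direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


random.seed(0)
DIM = 8          # toy hidden size (assumption)
N_PAIRS = 50     # number of paired examples (assumption)

# Synthetic stand-ins for activations: the 'on' set equals the 'off' set
# plus a fixed shift on coordinate 3, which plays the role of the behavior.
acts_off = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_PAIRS)]
acts_on = [[x + (2.0 if i == 3 else 0.0) for i, x in enumerate(row)]
           for row in acts_off]

direction = steering_direction(acts_on, acts_off)
hidden = [random.gauss(0, 1) for _ in range(DIM)]
steered = steer(hidden, direction, alpha=4.0)

# The projection onto the direction moves by alpha; nothing was trained.
print(round(dot(steered, direction) - dot(hidden, direction), 6))
```

In a real model the vectors would come from a chosen layer's activations and the shift would be applied during generation; choosing the layer and the scale `alpha` is where most of the practical difficulty tends to sit.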

So what actually makes pretraining a good substrate for reasoning? The most concrete answer: *procedural* knowledge, not facts. An analysis of 5 million pretraining documents found that reasoning generalizes from broad, transferable procedures drawn from diverse sources, while factual recall depends on narrow memorization of specific documents (see 'Does procedural knowledge drive reasoning more than factual retrieval?'). That suggests the latent reasoning you later unlock is the accumulated 'how-to' diffused across the corpus. It also has an architectural home: knowledge retrieval sits in lower network layers and reasoning adjustments in higher ones, which is why reasoning training helps math but can damage knowledge-heavy domains like medicine (see 'Why does reasoning training help math but hurt medical tasks?').

If reasoning is procedural and plantable, why wait for post-training at all? RLP treats chain-of-thought as an exploratory action *during* pretraining, rewarding it by how much it improves next-token prediction, with no verifier needed, and lifts reasoning benchmark scores by roughly 19% (see 'Can chain-of-thought reasoning be learned during pretraining itself?'). This is your question turned into a method: stop activating reasoning afterward, start growing it from the beginning. Related training-time tricks, like jointly training backward reasoning to strengthen forward reasoning, suggest the capability can be deepened, not just switched on (see 'Can backward reasoning during training improve forward reasoning?').
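The verifier-free reward can be sketched as a log-likelihood difference: how much more probable the true next token becomes once the model has produced a thought. The notes describe the reward only as log-likelihood improvement, so the baseline and the probabilities below are assumptions for illustration, not the method's actual implementation.

```python
import math


def information_gain_reward(logp_with_thought: float,
                            logp_without_thought: float) -> float:
    """Reward for a thought: gain in log-probability of the observed next token.

    Positive when the thought made the true token more likely, negative when
    it made it less likely. No external verifier is involved.
    """
    return logp_with_thought - logp_without_thought


# Hypothetical probabilities the model assigns to the true next token.
P_NO_THOUGHT = 0.10        # predicting directly from the context
P_HELPFUL_THOUGHT = 0.40   # after a thought that helps
P_UNHELPFUL_THOUGHT = 0.05 # after a thought that misleads

helpful = information_gain_reward(math.log(P_HELPFUL_THOUGHT),
                                  math.log(P_NO_THOUGHT))
unhelpful = information_gain_reward(math.log(P_UNHELPFUL_THOUGHT),
                                    math.log(P_NO_THOUGHT))

print(f"helpful thought reward:   {helpful:+.3f}")
print(f"unhelpful thought reward: {unhelpful:+.3f}")
```

Because the signal comes from ordinary text rather than checked answers, a reward of this shape can in principle be computed on any pretraining document, which is what makes it usable before post-training.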

The one honest crack in the 'merely activates' story is worth knowing: it's domain-conditional. For standard reasoning, RL activates latent ability, but for complex multi-step planning, RL appears to generate genuinely novel strategies the base model can't reach even with heavy sampling (see 'Does reinforcement learning create new reasoning abilities or activate existing ones?'). And training does more than gate: vanilla models use extended 'thinking' counterproductively, spiraling into self-doubt, and RL actively rewires that same mechanism into productive analysis (see 'Does extended thinking help or hurt model reasoning?'). So the cleanest synthesis: pretraining builds the reasoning substrate and post-training mostly elicits it, but at the frontier of hard planning, and in fixing *how* thinking is used, post-training still does real creative work.

Sources (12 notes)

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

What does reward learning actually do to model reasoning?

Research shows RLVR improves sampling efficiency within existing capability boundaries without expanding them. A single training example suffices for activation, and spurious rewards work nearly as well as correct ones for models with appropriate pretraining.

Does RL post-training create reasoning or just deploy it?

Evidence shows base models already contain reasoning capability in latent form; RL training optimizes deployment timing rather than capability creation. Hybrid models recover 91% of performance gains by routing tokens only, and activation vectors for reasoning strategies pre-exist before any RL.

Can we trigger reasoning without explicit chain-of-thought prompts?

SAE-identified reasoning features can be directly steered to match or exceed chain-of-thought performance across six model families. This reasoning mode activates early in generation and overrides surface-level instructions, suggesting latent reasoning is a fundamental capability independent of explicit prompting.

Can modular cognitive tools unlock reasoning without training?

Four cognitive tools implemented as sandboxed LLM calls improved GPT-4.1 on AIME2024 from 26.7% to 43.3% without any RL training. Modularity enforces operation isolation that pure prompting cannot guarantee, eliciting pre-existing reasoning capability.

Can we steer reasoning toward brevity without retraining?

Activation-Steered Compression extracts a single vector from 50 paired examples to reduce chain-of-thought length by 67% while maintaining accuracy and achieving 2.73x speedup. The method is training-free and generalizes across model sizes and domains.

Does procedural knowledge drive reasoning more than factual retrieval?

Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall, which depends on narrow, document-specific memorization of target facts.

Why does reasoning training help math but hurt medical tasks?

A two-phase inference model shows knowledge retrieval operates in lower network layers while reasoning adjustment happens in higher layers. This separation explains why reasoning training improves math but can degrade knowledge-intensive domains like medicine.

Can chain-of-thought reasoning be learned during pretraining itself?

RLP treats CoT as an exploratory action during pretraining, using log-likelihood improvement as a verifier-free reward. Applied to Qwen3-1.7B and Nemotron-Nano-12B, the method improves math and science benchmarks substantially, suggesting reasoning can be planted earlier in training.

Can backward reasoning during training improve forward reasoning?

Training models simultaneously on forward reasoning, backward question generation, and backward reasoning improves forward-only performance by an average of 13.53% across 12 datasets. The mechanism: generating backward questions forces models to understand the inverse relationship between problem and solution, deepening understanding that transfers to forward reasoning without test-time overhead.

Does reinforcement learning create new reasoning abilities or activate existing ones?

For standard reasoning tasks, RL activates latent abilities already present in base models. For complex planning requiring multi-step coordination, RL generates genuinely novel strategies inaccessible to base models even with extensive sampling.

Does extended thinking help or hurt model reasoning?

Vanilla models use thinking mode counterproductively, inducing self-doubt that degrades performance. RL training reverses this, transforming the same mechanism into beneficial gap analysis. Training mediates reasoning quality, not just quantity.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about reasoning in LLMs. The question remains open: does pretraining embed latent reasoning that post-training merely activates, or does post-training create genuinely novel reasoning strategies?

What a curated library found, and when. Findings span 2024–26; treat them as evidence from their publication date, not current truth:
• Base models carry latent reasoning activated by minimal interventions (RL, steering, decoding tweaks); RLVR sharpens sampling efficiency within existing limits rather than expanding them (2024–25).
• Feature steering of a single SAE-identified reasoning direction matches chain-of-thought without training; modular 'cognitive tools' lifted GPT-4.1 AIME from 26.7% → 43.3% with zero RL (2025).
• Procedural knowledge (broad transferable 'how-to' operations) in pretraining drives reasoning generalization; reasoning adjustments sit in higher network layers, knowledge in lower ones (2024–25).
• RLP: treating chain-of-thought as exploratory pretraining action (rewarded by next-token prediction gain) lifts reasoning ~19% without verifiers (2025).
• Domain-conditional exception: complex multi-step planning appears to require RL-generated *novel* strategies beyond base-model reach; RL actively rewires counterproductive 'thinking' spirals into productive analysis (2025–26).

Anchor papers (verify; mind their dates):
• arXiv:2411.12580 (Nov 2024): Procedural knowledge substrate
• arXiv:2506.12115 (Jun 2025): Cognitive tools and zero-RL reasoning lift
• arXiv:2510.01265 (Sep 2025): RLP pretraining-time reasoning growth
• arXiv:2601.08058 (Jan 2026): Latent computational modes in LLMs

Your task:
(1) RE-TEST the 'mostly elicits' claim. For each finding above, judge whether newer models (GPT-4.5+, o3, etc.), scaled SAE interpretability, or advances in mechanistic steering since Jan 2026 have relaxed the constraint that post-training is necessary for reasoning. Separately: has the domain-conditional exception (planning requiring novelty) held or eroded? Cite what evidence resolved or upheld each constraint.
(2) Surface the strongest work from the last ~6 months that *contradicts* the 'latent reasoning' framing—either showing post-training creates truly novel reasoning strategies, or showing pretraining signals are more malleable/destructible than this library suggests.
(3) Propose 2 research questions that assume the regime may have shifted: (a) If reasoning is ~fully latent and elicitable, what is the *minimal* intervention to activate complex reasoning in a new domain? (b) Can steering alone replace post-training-loop RL for planning tasks, or does planning reasoning require gradient-based adaptation?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
