Does latent reasoning capability exist in base models before any training?
This explores whether reasoning ability is already sitting inside a base model's weights before any RL or instruction tuning — and what 'already there' actually means once you look at how the corpus probes it.
The corpus answers with a fairly strong yes, but with an important twist about what kind of reasoning is sitting there. The clearest line of evidence is that five independent techniques (RL steering, critique fine-tuning, decoding tricks, sparse-autoencoder feature steering, and RLVR) all manage to draw out reasoning that is already detectable in base-model activations (see 'Do base models already contain hidden reasoning ability?'). The takeaway is that post-training mostly *selects* a capability rather than *creating* it; the bottleneck is elicitation, not acquisition. A companion finding sharpens this into a slogan: RL teaches a model *when* to reason, not *how*. Hybrid setups recover about 91% of the performance gains just by routing which tokens get the reasoning treatment, and the activation directions for reasoning strategies exist before any RL touches the model (see 'Does RL post-training create reasoning or just deploy it?').
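To make the idea of an activation direction concrete, here is a minimal sketch of steering a hidden state along a fixed direction. It assumes toy four-dimensional vectors; the names `steer`, `projection`, and `reasoning_direction` are illustrative and do not come from any of the cited notes, and a real setup would read hidden states from a model layer rather than hard-coding them.

```python
import math

def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]

def projection(hidden, direction):
    """Scalar component of `hidden` along `direction`."""
    return sum(h * u for h, u in zip(hidden, unit(direction)))

def steer(hidden, direction, alpha):
    """Shift `hidden` by `alpha` units along `direction` (no weights change)."""
    return [h + alpha * u for h, u in zip(hidden, unit(direction))]

# Toy values, assumed for illustration only.
hidden = [0.5, -1.0, 2.0, 0.0]
reasoning_direction = [3.0, 0.0, 4.0, 0.0]

steered = steer(hidden, reasoning_direction, alpha=2.0)
gain = projection(steered, reasoning_direction) - projection(hidden, reasoning_direction)
print(round(gain, 6))  # the projection grows by exactly alpha
```

The point of the sketch is that nothing is learned: the direction is assumed to exist already, and the intervention only changes how strongly it is expressed.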
If reasoning is latent, you'd expect to be able to surface it without training at all, and you can. Modular 'cognitive tools' implemented as sandboxed model calls lifted GPT-4.1 on AIME 2024 from 26.7% to 43.3% with zero RL, just by enforcing isolation between reasoning operations that plain prompting can't guarantee (see 'Can modular cognitive tools unlock reasoning without training?'). In the same spirit, latent-space reasoning architectures scale test-time compute by iterating on hidden states rather than emitting visible thinking tokens, which suggests the verbalized chain-of-thought we see is partly a training artifact layered on top of computation that doesn't need words (see 'Can models reason without generating visible thinking tokens?').
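The isolation idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the cited work: the tool names, the prompt templates, and the `ModelFn` callable are all hypothetical placeholders, and `fake_model` stands in for a real model call.

```python
from typing import Callable, Dict, List

# Hypothetical interface: a function that takes a prompt and returns a completion.
ModelFn = Callable[[str], str]

# Placeholder tool prompts, assumed for illustration.
TOOL_PROMPTS: Dict[str, str] = {
    "understand": "Restate the problem and list the known quantities:\n{payload}",
    "recall": "Recall a similar solved problem for:\n{payload}",
    "examine": "Check this reasoning for errors:\n{payload}",
    "backtrack": "Propose an alternative approach to:\n{payload}",
}

def run_tool(model: ModelFn, tool: str, payload: str) -> str:
    """Run one tool as a fresh call: the model sees only this tool's prompt."""
    return model(TOOL_PROMPTS[tool].format(payload=payload))

def solve(model: ModelFn, problem: str, plan: List[str]) -> str:
    """Call each tool in isolation, then combine the notes in one final call."""
    notes = [f"[{tool}] {run_tool(model, tool, problem)}" for tool in plan]
    final_prompt = (
        "Problem:\n" + problem
        + "\n\nTool notes:\n" + "\n".join(notes)
        + "\n\nGive the final answer."
    )
    return model(final_prompt)

# Stub model that records what it was shown, so isolation can be inspected.
seen_prompts: List[str] = []

def fake_model(prompt: str) -> str:
    seen_prompts.append(prompt)
    return f"output-{len(seen_prompts)}"

answer = solve(fake_model, "What is 2 + 2?", ["understand", "examine"])
```

Each tool call receives only the problem, never another tool's output; only the final call sees the collected notes. With plain prompting, all of these operations would share one context and could bleed into each other.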
Where does the latent capability come from in the first place? Pretraining itself plants it. Quiet-STaR shows reasoning competence can emerge as a side effect of better next-token prediction on arbitrary internet text, with no task-specific reasoning dataset required (see 'Can models learn reasoning from predicting any text?'). And an analysis of five million pretraining documents found that reasoning leans on broad, transferable *procedural* knowledge spread across many sources, distinct from factual recall, which depends on narrow memorization of specific documents (see 'Does procedural knowledge drive reasoning more than factual retrieval?'). So the raw material for reasoning is being accumulated during pretraining, which is why minimal post-training can later 'unlock' it.
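The idea of scoring a rationale by predictive accuracy rather than by labeled correctness can be sketched like this. It is a simplified illustration, not Quiet-STaR's exact objective: the function name, the input log-probabilities, and the choice of the mean over sampled rationales as a baseline are assumptions made for the example.

```python
from statistics import mean
from typing import List

def rationale_rewards(logp_with_rationale: List[float]) -> List[float]:
    """Score each sampled rationale by how much it helps predict the true next text.

    Each input is the log-probability the model assigns to the actual
    continuation after generating one candidate rationale. The baseline is
    the mean over candidates (an assumed simplification), so a rationale is
    rewarded only if it predicts better than its siblings.
    """
    baseline = mean(logp_with_rationale)
    return [lp - baseline for lp in logp_with_rationale]

# Toy log-probabilities for three sampled rationales (illustrative values).
logps = [-2.0, -1.0, -3.0]
rewards = rationale_rewards(logps)
print(rewards)  # [0.0, 1.0, -1.0]
```

No label says whether a rationale is "correct"; the only signal is whether the text that actually follows became more predictable.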
But there is a complication: several notes push back on how much of this 'latent reasoning' is genuine inference versus learned imitation. When semantic content is stripped from a task, LLM performance collapses even with the correct rules handed to the models in context; they are reasoning by semantic association, not symbolic logic (see 'Do large language models reason symbolically or semantically?'). Chain-of-thought turns out to be constrained reproduction of familiar reasoning *forms* from training, and it degrades predictably the moment you shift task, length, or format, which is the signature of imitation, not a portable capability (see 'Does chain-of-thought reasoning reveal genuine inference or pattern matching?' and 'Does chain-of-thought reasoning actually generalize beyond training data?'). Even more unsettling: models trained on deliberately corrupted, irrelevant reasoning traces do about as well as those trained on correct ones, implying the traces work as computational scaffolding rather than meaningful logic (see 'Do reasoning traces need to be semantically correct?').
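One way to picture 'stripping semantic content' is to swap every content word for an arbitrary symbol while keeping the logical form intact. The sketch below is an assumed illustration of that manipulation, not the procedure used in the cited note; the helper name `decouple` and the nonsense vocabulary are invented for the example.

```python
import re
from typing import Dict

def decouple(text: str, mapping: Dict[str, str]) -> str:
    """Replace each content word with an abstract symbol (whole words only)."""
    if not mapping:
        return text
    # Longest words first so a longer key is never shadowed by a shorter one.
    words = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)

rule = "If something is a bird then it can fly. A sparrow is a bird."
mapping = {"bird": "zorp", "fly": "blick", "sparrow": "quen"}

abstract = decouple(rule, mapping)
print(abstract)
# If something is a zorp then it can blick. A quen is a zorp.
```

The inference needed to conclude that the quen can blick is identical to the original one; only the familiar associations are gone. A system doing symbolic logic should be indifferent to the swap, while one leaning on semantic association should not.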
Put together, the corpus lands on a nuanced position: yes, a base model carries latent reasoning machinery before any post-training, and that machinery is real enough that diverse, lightweight interventions can elicit it. But what's latent is closer to a vast store of procedural patterns bounded by the training distribution than a free-standing logical engine. That is also why a counter-thread argues that training regime still matters more than raw inference compute, since non-reasoning models can't simply 'think longer' to close the gap (see 'Can non-reasoning models catch up with more compute?'). The capability is there; whether it's reasoning or a very good impression of it is the question the corpus genuinely disagrees on.
Sources (11 notes)
Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.
Evidence shows base models already contain reasoning capability in latent form; RL training optimizes deployment timing rather than capability creation. Hybrid models recover 91% of performance gains by routing tokens only, and activation vectors for reasoning strategies pre-exist before any RL.
Four cognitive tools implemented as sandboxed LLM calls improved GPT-4.1 on AIME2024 from 26.7% to 43.3% without any RL training. Modularity enforces operation isolation that pure prompting cannot guarantee, eliciting pre-existing reasoning capability.
Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.
Quiet-STaR trains language models to generate rationales at every token position during pretraining on arbitrary internet text, enabling general reasoning without task-specific datasets. Rationale quality is judged by predictive accuracy rather than labeled correctness, allowing reasoning competence to emerge as a side effect of improved language modeling.
Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.
When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.
CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.
DataAlchemy experiments show CoT fails systematically under distributional shifts in task, length, and format. Models produce fluent but logically inconsistent reasoning — imitating reasoning form without valid underlying logic.
Models trained on systematically irrelevant traces maintain solution accuracy and sometimes improve out-of-distribution generalization, suggesting traces function as computational scaffolding rather than meaningful reasoning steps.
Reasoning models persistently outperform non-reasoning models regardless of inference budget because training instills a reasoning protocol that makes additional tokens productive. The gap is fundamentally about deployment mechanisms and training structure, not raw capability.