What mechanisms activate latent reasoning capabilities already present in base models?
This explores the growing evidence that reasoning isn't something post-training installs from scratch — it's already latent in base models, and the real question is which levers wake it up.
The corpus is unusually convergent here. Across at least five independent methods (reinforcement learning steering, critique fine-tuning, decoding changes, sparse-autoencoder feature steering, and RLVR, i.e. reinforcement learning with verifiable rewards), the same conclusion keeps surfacing: post-training *selects* reasoning rather than *creates* it [Do base models already contain hidden reasoning ability?]. The bottleneck is elicitation, not acquisition. That reframing is the through-line for everything below.
The most striking activation mechanism is the smallest. Researchers found a single reasoning-related feature inside the model, identified with a sparse autoencoder, that matches or beats chain-of-thought prompting across six model families when directly steered. It fires early in generation and even overrides surface instructions [Can we trigger reasoning without explicit chain-of-thought prompts?]. You don't need to *ask* the model to think step by step; you can flip an internal switch. At the other end of the intervention spectrum, you can elicit the same latent capability with no training at all: wrapping reasoning operations in modular "cognitive tools" (sandboxed LLM calls that isolate one operation each) lifted GPT-4.1 on the AIME 2024 math benchmark from 26.7% to 43.3% [Can modular cognitive tools unlock reasoning without training?]. Both findings point the same way: the ability was there; the trigger was structural.
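To make "flipping an internal switch" concrete, here is a minimal sketch of feature steering on toy activations. It assumes steering means adding a scaled unit vector to each position's hidden state. The function names, the 3-dimensional vectors, and the strength `alpha` are illustrative assumptions, not the cited work's implementation.

```python
import math

def unit_vector(direction):
    """Scale a direction to unit length so alpha has a consistent meaning."""
    norm = math.sqrt(sum(d * d for d in direction))
    return [d / norm for d in direction]

def steer(hidden, direction, alpha):
    """Add alpha * unit(direction) to every position's hidden state."""
    unit = unit_vector(direction)
    return [[h + alpha * u for h, u in zip(row, unit)] for row in hidden]

def projection(row, direction):
    """Component of one hidden state along the feature direction."""
    unit = unit_vector(direction)
    return sum(h * u for h, u in zip(row, unit))

# Toy activations: 2 positions, 3 hidden dimensions (made-up values).
hidden = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
# Hypothetical "reasoning" feature direction (assumption for illustration).
direction = [3.0, 0.0, 4.0]
alpha = 2.0

steered = steer(hidden, direction, alpha)
shifts = [projection(s, direction) - projection(h, direction)
          for s, h in zip(steered, hidden)]
print(shifts)  # each position moves by alpha along the feature direction
```

Only the component along the feature direction changes, and it changes by `alpha` at every position. In a real model the direction would come from a trained sparse autoencoder and the edit would be applied inside the forward pass.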
This is where the most counterintuitive result lands. If reasoning were being *taught*, the reward signal would have to be correct. But RLVR improves sampling efficiency *within* existing capability boundaries without expanding them: a single training example can suffice, and spurious (even random) rewards work nearly as well as correct ones, as long as pretraining laid the groundwork [What does reward learning actually do to model reasoning?]. The complementary framing puts it crisply: RL post-training teaches a model *when* to reason, not *how*. Hybrid models recover 91% of the gains just by routing tokens, and the activation vectors for reasoning strategies already exist before any RL touches the weights [Does RL post-training create reasoning or just deploy it?].
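One common way to make "sampling efficiency within existing capability boundaries" measurable is pass@k, the chance that at least one of k sampled answers is correct. The sketch below uses the standard unbiased estimator computed from n samples of which c are correct. Treating pass@k as the yardstick here is an assumption, and the per-problem counts are invented for illustration rather than taken from the cited notes.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k wrong samples, so every k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical counts for one problem, 100 samples per model (illustrative only).
n = 100
base_correct = 5    # base model: rarely right on a single try
tuned_correct = 40  # RL-tuned model: right much more often on a single try

for k in (1, 64):
    base = pass_at_k(n, base_correct, k)
    tuned = pass_at_k(n, tuned_correct, k)
    print(f"k={k:>2}  base={base:.3f}  tuned={tuned:.3f}")
```

With these made-up counts the tuned model is far ahead at k=1, but the gap nearly closes at k=64. That is the pattern the "selects rather than creates" reading predicts: the base model could already reach the answer, just less reliably per sample.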
So where does the latent capability actually come from? The trail leads back into pretraining. Analysis of five million pretraining documents shows reasoning draws on broad, transferable *procedural* knowledge spread across many sources, unlike factual recall, which depends on narrow memorization of specific documents [Does procedural knowledge drive reasoning more than factual retrieval?]. That's the deposit later mechanisms withdraw from. It also explains a sobering limit: when semantic content is stripped from a task, performance collapses even with correct rules in hand, suggesting what gets activated is association-driven "semantic" reasoning rather than formal symbolic logic [Do large language models reason symbolically or semantically?]. That constraint is echoed by evidence that models lean on whether a conclusion *looks* attested rather than whether the premises support it [Do LLMs predict entailment based on what they memorized?].
The part you didn't know you wanted to know: activation needn't happen in words at all. A separate line of work shows test-time compute can scale through hidden-state iteration with no verbalized intermediate steps. Depth-recurrent models, Coconut, and Heima all reason in latent space, implying that the visible chain-of-thought may be a training artifact, not a requirement of reasoning itself [Can models reason without generating visible thinking tokens?]. Push further and you can make those latent transitions *stochastic*, letting a model hold uncertainty and explore multiple solution paths [Can stochastic latent reasoning help models explore multiple solutions?], or scale reasoning in *width* by sampling parallel trajectories instead of only going deeper [Can reasoning systems scale wider instead of only deeper?]. Taken together, the corpus suggests the frontier isn't building a reasoner. It's finding cleaner switches to turn the latent one on, and better-shaped spaces for it to run in.
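The depth-versus-width distinction above can be shown with a toy analogy. In the sketch below a scalar "latent state" is refined by a repeated update with no intermediate output, and width comes from sampling several noisy trajectories and keeping the best one. The update rule (a Newton step toward a square root), the residual used as a selector, and every name are assumptions chosen for illustration. This is not how depth-recurrent models, Coconut, Heima, or GRAM are implemented.

```python
import random

def latent_step(h, x, noise=0.0, rng=None):
    """One update of a scalar latent state (Newton step toward sqrt(x))."""
    h_next = 0.5 * (h + x / h)
    if noise and rng is not None:
        # Stochastic transition: perturb the state, keeping it positive.
        h_next = max(h_next + rng.gauss(0.0, noise), 1e-6)
    return h_next

def run_depth(x, steps, h0=1.0):
    """Depth scaling: iterate the state more times, emitting nothing in between."""
    h = h0
    for _ in range(steps):
        h = latent_step(h, x)
    return h

def run_width(x, steps, k, noise, seed=0, h0=1.0):
    """Width scaling: sample k noisy trajectories, keep the lowest-residual one."""
    rng = random.Random(seed)
    finals = []
    for _ in range(k):
        h = h0
        for _ in range(steps):
            h = latent_step(h, x, noise, rng)
        finals.append(h)
    return min(finals, key=lambda h: abs(h * h - x))

def residual(h, x):
    """How far the final state is from solving h * h == x."""
    return abs(h * h - x)

x = 2.0
print("depth 1:", residual(run_depth(x, 1), x))
print("depth 6:", residual(run_depth(x, 6), x))
print("width 1:", residual(run_width(x, 3, 1, 0.05), x))
print("width 8:", residual(run_width(x, 3, 8, 0.05), x))
```

More iterations shrink the residual without producing any visible intermediate steps. With a fixed seed, the best of eight noisy trajectories can never be worse than the first one alone, and the eight could run in parallel rather than serially.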
Sources (11 notes)
Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.
SAE-identified reasoning features can be directly steered to match or exceed chain-of-thought performance across six model families. This reasoning mode activates early in generation and overrides surface-level instructions, suggesting latent reasoning is a fundamental capability independent of explicit prompting.
Four cognitive tools implemented as sandboxed LLM calls improved GPT-4.1 on AIME2024 from 26.7% to 43.3% without any RL training. Modularity enforces operation isolation that pure prompting cannot guarantee, eliciting pre-existing reasoning capability.
Research shows RLVR improves sampling efficiency within existing capability boundaries without expanding them. A single training example suffices for activation, and spurious rewards work nearly as well as correct ones for models with appropriate pretraining.
Evidence shows base models already contain reasoning capability in latent form; RL training optimizes deployment timing rather than capability creation. Hybrid models recover 91% of performance gains by routing tokens only, and activation vectors for reasoning strategies pre-exist before any RL.
Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.
When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.
McKenna et al. (2023) identified attestation bias: LLMs predict entailment based on whether the hypothesis appears in training data, not whether the premise actually supports it. Random premise experiments show models maintain high entailment predictions when hypotheses are attested, indicating that they respond to memorized propositions rather than premise-hypothesis relationships.
Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.
GRAM replaces deterministic latent updates with stochastic sampling, enabling models to represent distributions over solutions rather than single predictions. This allows handling of ambiguous problems and multiple valid strategies that deterministic designs cannot represent.
GRAM shows that stochastic latent transitions enable parallel trajectory sampling, which sidesteps the serial latency cost of depth-only scaling. Width offers benefits comparable to token-level parallelism: independent paths sample the solution space without variance inflation.