Do base models contain latent reasoning that minimal training can unlock?

This explores whether reasoning ability is already sitting inside base models before any reasoning-specific training — so that methods like RL, fine-tuning, or clever prompting merely *reveal* it rather than build it from scratch.

The strongest version of the claim that post-training unlocks reasoning rather than installs it comes from work showing that five quite different interventions (RL steering, critique fine-tuning, decoding changes, sparse-autoencoder feature steering, and RLVR) all surface the *same* reasoning that was already latent in base-model activations (see: Do base models already contain hidden reasoning ability?). The punchline is that post-training *selects* rather than *creates*: the bottleneck isn't acquiring the capability, it's eliciting it. Several other notes in the corpus independently converge on this. Modular 'cognitive tools', which are just sandboxed LLM calls with no RL at all, lifted GPT-4.1 on AIME2024 competition math from 26.7% to 43.3% purely by isolating reasoning operations the model could already perform (see: Can modular cognitive tools unlock reasoning without training?). And reasoning verbosity turns out to be a single linear direction you can steer with a vector extracted from 50 paired examples, with no retraining (see: Can we steer reasoning toward brevity without retraining?). When behavior bends to a handful of examples or a steering vector, that's the signature of something already present being redirected.
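The steering-vector idea can be made concrete with a small sketch. The notes here do not say how the vector is computed, so the difference-of-means recipe below is an assumption (a common choice for activation steering), and the activations are synthetic stand-ins rather than real model states. The names `steering_vector` and `steer` are hypothetical.

```python
import random

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def steering_vector(verbose_acts, concise_acts):
    """Assumed recipe: mean concise activation minus mean verbose activation."""
    mean_verbose = mean_vector(verbose_acts)
    mean_concise = mean_vector(concise_acts)
    return [c - v for c, v in zip(mean_concise, mean_verbose)]

def steer(hidden, vector, strength=1.0):
    """Shift one hidden state along the steering direction."""
    return [h + strength * d for h, d in zip(hidden, vector)]

# Synthetic stand-in for activations from 50 paired examples (not real data).
# Each concise activation equals its verbose partner shifted by +2.0 on axis 0.
random.seed(0)
DIM = 4
verbose = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(50)]
concise = [[row[0] + 2.0] + row[1:] for row in verbose]

vec = steering_vector(verbose, concise)
nudged = steer([0.0] * DIM, vec, strength=0.5)
```

Because the pairs differ only along one axis, the recovered vector points along that axis. In a real model the activations would come from a chosen layer, and `strength` would need tuning.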

Where it gets more interesting is *what kind* of reasoning is actually latent. A skeptical thread argues that what gets unlocked is pattern-completion, not formal inference. When semantic content is stripped from a task, LLM performance collapses even with the correct rules sitting in context: models lean on learned associations, not symbolic logic (see: Do large language models reason symbolically or semantically?). Chain-of-thought looks similar under scrutiny: it reproduces familiar reasoning *schemata* from training and degrades predictably under distribution shift, which is the tell of imitation rather than emergent capability (see: Does chain-of-thought reasoning reveal genuine inference or pattern matching?). Entailment judgments turn out to track whether the hypothesis was memorized, not whether the premise supports it (see: Do LLMs predict entailment based on what they memorized?). So 'latent reasoning' might be real *and* shallow at once: minimal training reliably unlocks the form of reasoning the model saw in pretraining, but doesn't conjure logic that was never in the distribution.
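The entailment claim rests on a simple diagnostic: swap in unrelated premises and see whether predictions move. The sketch below illustrates only that logic. The predictor, the `ATTESTED` set, and the example pairs are all hypothetical toys, not the setup or data of any cited study.

```python
import random

# Hypothetical stand-in for propositions a model has "memorized".
ATTESTED = {"Paris is in France", "Ice is frozen water"}

def premise_blind_model(premise, hypothesis):
    """Toy predictor: says 'entailed' whenever the hypothesis is attested."""
    return hypothesis in ATTESTED

def entailment_rate(model, pairs):
    """Fraction of (premise, hypothesis) pairs predicted as entailed."""
    return sum(model(p, h) for p, h in pairs) / len(pairs)

def randomize_premises(pairs, rng):
    """Re-pair each hypothesis with a premise drawn by shuffling the premises."""
    premises = [p for p, _ in pairs]
    rng.shuffle(premises)
    return list(zip(premises, [h for _, h in pairs]))

# Illustrative pairs only (not from any benchmark).
pairs = [
    ("The Eiffel Tower is in Paris, France", "Paris is in France"),
    ("The pond froze overnight", "Ice is frozen water"),
    ("Ana bought a red bicycle", "Ana owns a bicycle"),
    ("The meeting ran late", "The meeting happened"),
]

original_rate = entailment_rate(premise_blind_model, pairs)
random_rate = entailment_rate(
    premise_blind_model, randomize_premises(pairs, random.Random(0))
)
```

A predictor that actually used the premise should see its entailment rate fall once premises are randomized. One whose rate stays put, as this toy's does by construction, is responding to the hypothesis alone.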

There's also a limit on how far 'minimal' can go. Non-reasoning models can't simply be handed more inference compute to close the gap with reasoning models, because the training regime instills a *protocol* that makes extra tokens productive in the first place (see: Can non-reasoning models catch up with more compute?). That sharpens the question: minimal training may unlock latent capability, but *some* structural training is apparently still doing real work that prompting alone can't replace.

The most provocative adjacent idea is that verbalized chain-of-thought was never the reasoning itself, just a visible byproduct. Depth-recurrent architectures, Heima, and Coconut all scale test-time reasoning by iterating in hidden state, with no verbalized intermediate steps, which suggests verbalization is a training artifact rather than a requirement (see: Can models reason without generating visible thinking tokens?). Pushed further, Quiet-STaR shows reasoning competence can emerge as a *side effect* of ordinary next-token prediction when the model is trained to generate rationales at every token (see: Can models learn reasoning from predicting any text?), and Energy-Based Transformers reach System-2-style deliberation from unsupervised learning alone, via gradient-descent energy minimization at inference with no domain scaffolding (see: Can energy minimization unlock reasoning without domain-specific training?). If reasoning can fall out of plain language modeling and out of inference-time optimization, the latent-capability story stops being surprising and starts looking like the default.
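Inference by energy minimization is easy to show on a toy problem. In the sketch below the energy is a fixed quadratic chosen for illustration, which is an assumption: an Energy-Based Transformer would use a learned network as the energy function. The names `energy`, `energy_grad_y`, and `infer` are hypothetical.

```python
def energy(x, y, w=3.0):
    """Toy energy: low when the candidate prediction y matches w * x."""
    return (y - w * x) ** 2

def energy_grad_y(x, y, w=3.0):
    """Gradient of the toy energy with respect to the prediction y."""
    return 2.0 * (y - w * x)

def infer(x, y0=0.0, steps=50, lr=0.1):
    """Refine a prediction by gradient descent on the energy, not a single pass."""
    y = y0
    for _ in range(steps):
        y -= lr * energy_grad_y(x, y)
    return y

quick = infer(2.0, steps=5)       # little "thinking"
prediction = infer(2.0, steps=50) # more "thinking"
```

The number of descent steps acts as an inference-compute knob: `prediction` lands much closer to the energy minimum at 6.0 than `quick` does, which is the sense in which more steps buy more deliberation in this toy.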

The thing worth carrying away: 'unlocking latent reasoning' is best read as *elicitation engineering*, not capability creation, and the open frontier is whether you can elicit reasoning the base model never imitated. Latent-thought language models hint at the next move: adding scaling dimensions beyond raw parameters, so the latent capacity itself can be grown rather than just tapped (see: Can latent thought vectors scale language models beyond parameters?).

Sources 11 notes

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Can modular cognitive tools unlock reasoning without training?

Four cognitive tools implemented as sandboxed LLM calls improved GPT-4.1 on AIME2024 from 26.7% to 43.3% without any RL training. Modularity enforces operation isolation that pure prompting cannot guarantee, eliciting pre-existing reasoning capability.

Can we steer reasoning toward brevity without retraining?

Activation-Steered Compression extracts a single vector from 50 paired examples to reduce chain-of-thought length by 67% while maintaining accuracy and achieving 2.73x speedup. The method is training-free and generalizes across model sizes and domains.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Does chain-of-thought reasoning reveal genuine inference or pattern matching?

CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.

Do LLMs predict entailment based on what they memorized?

McKenna et al. (2023) identified attestation bias: LLMs predict entailment based on whether the hypothesis appears in training data, not whether the premise actually supports it. Random premise experiments show models maintain high entailment predictions when hypotheses are attested, proving they respond to memorized propositions rather than premise-hypothesis relationships.

Can non-reasoning models catch up with more compute?

Reasoning models persistently outperform non-reasoning models regardless of inference budget because training instills a reasoning protocol that makes additional tokens productive. The gap is fundamentally about deployment mechanisms and training structure, not raw capability.

Can models reason without generating visible thinking tokens?

Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.

Can models learn reasoning from predicting any text?

Quiet-STaR trains language models to generate rationales at every token position during pretraining on arbitrary internet text, enabling general reasoning without task-specific datasets. Rationale quality is judged by predictive accuracy rather than labeled correctness, allowing reasoning competence to emerge as a side effect of improved language modeling.

Can energy minimization unlock reasoning without domain-specific training?

Energy-Based Transformers assign energy values to input-prediction pairs and use gradient descent minimization for inference, yielding 35% higher training scaling rates and 29% more inference-compute gains than Transformer++, while generalizing better on out-of-distribution data without domain-specific scaffolding.

Can latent thought vectors scale language models beyond parameters?

Latent-Thought Language Models achieve superior sample and parameter efficiency by coupling fast local variational learning with slow global decoder learning. This dual-rate scheme scales few-shot reasoning across both model and latent size, creating independent scaling dimensions beyond traditional parameter scaling.
