What latent reasoning capability do base models already possess before post-training?

This explores the growing body of evidence that reasoning ability is mostly *already present* in a base model after pretraining, and that post-training (RL, fine-tuning) and prompting unlock it rather than install it.

The surprising answer the corpus converges on is: most of it. The strongest claim is that base models carry reasoning in latent form, and that the job of post-training is *elicitation, not acquisition*. One synthesis finds five independent levers (RL steering, critique fine-tuning, decoding changes, sparse-autoencoder (SAE) feature steering, and RLVR) that all pull out reasoning already present in base-model activations (Do base models already contain hidden reasoning ability?). A complementary framing sharpens this into a slogan: RL teaches a model *when* to reason, not *how*. Hybrid models recover ~91% of RL's performance gains by routing tokens alone, and the activation directions for reasoning strategies exist before any RL touches the weights (Does RL post-training create reasoning or just deploy it?).
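The claim that reasoning-strategy directions exist before RL is typically probed with activation steering. The sketch below is a minimal, hypothetical illustration of the generic difference-of-means recipe, not the method of either source note: it assumes we already have base-model hidden states for inputs that do and do not show a reasoning strategy (tiny made-up 3-d vectors stand in for them), takes the difference of class means as the direction, and adds a scaled copy to a new hidden state at inference time.

```python
from typing import List

Vector = List[float]

def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def steering_direction(with_strategy: List[Vector], without_strategy: List[Vector]) -> Vector:
    """Difference of class means: points from 'strategy absent' toward 'strategy present'."""
    mu_pos, mu_neg = mean(with_strategy), mean(without_strategy)
    return [p - q for p, q in zip(mu_pos, mu_neg)]

def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Add a scaled direction to one hidden state; model weights are never modified."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Hypothetical base-model activations (made-up numbers for illustration only).
strategy_present = [[1.0, 0.5, 0.0], [1.2, 0.7, 0.1]]
strategy_absent = [[0.0, 0.5, 0.0], [0.2, 0.3, 0.1]]

direction = steering_direction(strategy_present, strategy_absent)  # ~[1.0, 0.2, 0.0]
steered = steer([0.1, 0.4, 0.0], direction, alpha=2.0)             # ~[2.1, 0.8, 0.0]
```

Only the forward pass is nudged; the weights stay as pretraining left them. In practice the direction would be computed at a chosen layer and `alpha` tuned empirically, both of which this sketch omits.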

If reasoning is latent, the natural next question is *where* it lives. Several notes point to a continuous, pre-verbal substrate rather than the visible chain-of-thought text. Looped pretraining shows that iterative reasoning can happen in latent space during pretraining itself, with intermediate latent states that track the real computation more faithfully than written-out CoT (Can reasoning happen in latent space during pretraining?). Latent-thought vectors and stochastic latent transitions extend the same idea: reasoning can be scaled and diversified along a hidden dimension independent of parameter count (Can latent thought vectors scale language models beyond parameters?; Can stochastic latent reasoning help models explore multiple solutions?). So the latent capability isn't just a metaphor for 'untapped skill'; there is a literal hidden computation layer doing work.
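Looped (weight-tied) models get extra reasoning depth by reapplying the same block rather than by adding parameters. A toy sketch, assuming a scalar latent state and a hand-written update rule (one Newton step toward √x) standing in for a learned transformer block, neither of which comes from the Ouro note: the same `block` runs on every loop, and a readout after each loop exposes intermediate predictions that converge on the final answer.

```python
import math
from typing import Callable, List

def looped_forward(x: float,
                   block: Callable[[float, float], float],
                   readout: Callable[[float], float],
                   n_loops: int,
                   h0: float = 1.0) -> List[float]:
    """Reapply one shared block n_loops times and read out a prediction after every loop."""
    h = h0
    predictions = []
    for _ in range(n_loops):
        h = block(h, x)              # same "weights" on every iteration (weight tying)
        predictions.append(readout(h))
    return predictions

def newton_block(h: float, x: float) -> float:
    """Toy stand-in for a learned block: one Newton step toward sqrt(x)."""
    return 0.5 * (h + x / h)

preds = looped_forward(2.0, newton_block, readout=lambda h: round(h, 6), n_loops=5)
# More loops give more depth at zero parameter cost; later readouts agree with the final one.
error = abs(preds[-1] - math.sqrt(2.0))
```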

Here's the twist the reader probably didn't expect: the visible reasoning text may be doing far less than it looks. Models trained on *deliberately corrupted* reasoning traces perform about as well as those trained on correct ones, suggesting the trace is computational scaffolding rather than a sequence of meaningful steps (Do reasoning traces need to be semantically correct?). Chain-of-Draft matches full CoT accuracy using 7.6% of the tokens; the removed 92.4% served style and documentation, not thought (Can minimal reasoning chains match full explanations?). And CoT itself looks like constrained imitation of reasoning *form* learned in training: it degrades predictably under distribution shift, the signature of pattern replay rather than fresh inference (Does chain-of-thought reasoning reveal genuine inference or pattern matching?).
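The 7.6% figure is a ratio of trace lengths. A minimal sketch of that measurement, assuming whitespace-separated words as a crude proxy for model tokens; the two traces are hypothetical examples written for illustration, so their ratio is not expected to match the reported figure.

```python
def token_share(draft: str, full: str) -> float:
    """Fraction of the full trace's length kept by the draft (whitespace words as a token proxy)."""
    return len(draft.split()) / len(full.split())

# Hypothetical traces for the same word problem.
full_cot = ("Jason had 20 lollipops. He gave Denny some. Now he has 12. "
            "So he gave 20 - 12 = 8 lollipops. The answer is 8.")
draft = "20 - x = 12; x = 8. #### 8"

share = token_share(draft, full_cot)  # 10 / 25 on this single toy pair
reported_share = 0.076                # figure from the source note
removed = 1.0 - reported_share        # ~0.924 of the tokens dropped
```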

This points to a hard boundary on what 'latent reasoning' actually is. The corpus is blunt that it is semantic, not symbolic: strip the familiar semantic content from a task and performance collapses, even when the correct logical rules are supplied in context (Do large language models reason symbolically or semantically?). Entailment judgments depend on whether a hypothesis was *attested* in training rather than whether the premise supports it (Do LLMs predict entailment based on what they memorized?). So the latent capability is real but distribution-bound: a vast reservoir of pattern completion and commonsense association, not a general logic engine.
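The random-premise test behind attestation bias can be sketched as follows. Everything here is hypothetical: `biased_model` is a stub that mimics the bias by ignoring the premise and checking a memorized set, standing in for a real LLM call. Each hypothesis is paired with an unrelated premise, so a premise-sensitive model should almost never answer 'entailed'; a high rate only for attested hypotheses is the bias signature.

```python
import random
from typing import Callable, List

def random_premise_probe(model: Callable[[str, str], bool], hypotheses: List[str],
                         unrelated_premises: List[str], seed: int = 0) -> float:
    """Share of hypotheses judged 'entailed' when paired with an unrelated premise.
    A model that actually reads the premise should score near 0."""
    rng = random.Random(seed)
    hits = sum(model(rng.choice(unrelated_premises), h) for h in hypotheses)
    return hits / len(hypotheses)

# Hypothetical stub exhibiting attestation bias: it never looks at the premise.
MEMORIZED = {"Paris is the capital of France.", "Water boils at 100 C at sea level."}

def biased_model(premise: str, hypothesis: str) -> bool:
    return hypothesis in MEMORIZED

attested = sorted(MEMORIZED)
unattested = ["Blorvek is the capital of Quenth.", "Zinc melts in warm tea."]
premises = ["The cat slept on the mat.", "It rained in the afternoon."]

rate_attested = random_premise_probe(biased_model, attested, premises)
rate_unattested = random_premise_probe(biased_model, unattested, premises)
# A large gap between the two rates is the signature of attestation bias.
```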

The practical upshot ties the whole territory together: elicitation methods can only retrieve what pretraining already deposited. Cognitive tools (reasoning operations wrapped as isolated tool calls) lifted GPT-4.1 on AIME 2024 from 26.7% to 43.3% with *no* RL, purely by structuring access to latent ability (Can modular cognitive tools unlock reasoning without training?). But prompt optimization can only reorganize existing knowledge; it cannot inject what isn't there, which creates a hard ceiling (Can prompt optimization teach models knowledge they lack?). So the answer to 'what can a base model already do' is: it already holds the reasoning machinery, much of it running in latent space but bounded by the semantics of its training data, and nearly everything we call 'teaching it to reason' is really learning how to flip the right switch.
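The key design point of cognitive tools is isolation: each operation runs as its own call with a fresh context, and only its output flows back. A minimal sketch under loud assumptions: `LLM` is any prompt-to-text function, the four tool names and prompts are illustrative rather than the exact ones from the source, and the tool sequence is fixed by the caller, whereas the described system lets the model choose which tool to invoke.

```python
from typing import Callable, Dict, List

LLM = Callable[[str], str]  # hypothetical: prompt in, completion out

# Hypothetical tool prompts; each tool sees only its own instruction plus its input.
TOOL_PROMPTS: Dict[str, str] = {
    "understand_question": "Identify the key quantities and goal of this problem:\n{input}",
    "recall_related": "Recall a solved problem that uses a similar technique:\n{input}",
    "examine_answer": "Check this candidate solution step by step for errors:\n{input}",
    "backtracking": "Find the first wrong step in this attempt and propose an alternative:\n{input}",
}

def call_tool(llm: LLM, name: str, tool_input: str) -> str:
    """Run one cognitive operation in a fresh, isolated call (no shared transcript)."""
    return llm(TOOL_PROMPTS[name].format(input=tool_input))

def solve(llm: LLM, question: str, plan: List[str]) -> List[str]:
    """Apply tools in order; each tool sees only the latest step, and only outputs are kept."""
    trace = [question]
    for name in plan:
        result = call_tool(llm, name, trace[-1])
        trace.append(f"[{name}] {result}")
    return trace

# Recording stub so we can inspect exactly what each isolated call received.
seen_prompts: List[str] = []

def fake_llm(prompt: str) -> str:
    seen_prompts.append(prompt)
    return "ok"

trace = solve(fake_llm, "What is 17 * 23?", ["understand_question", "examine_answer"])
```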

Sources (12 notes)

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Does RL post-training create reasoning or just deploy it?

Evidence shows base models already contain reasoning capability in latent form; RL training optimizes deployment timing rather than capability creation. Hybrid models recover 91% of performance gains by routing tokens only, and activation vectors for reasoning strategies pre-exist before any RL.
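The routing result can be pictured as token-level switching between two models. A toy sketch, assuming a hypothetical `prefix -> next token` interface, stub models, and an arbitrary gate that routes every fourth position to the reasoner; the note does not specify the router, and a real gate would be learned or heuristic. The point is structural: the base model writes most tokens, and the post-trained side only decides at a few positions.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[str]], str]  # hypothetical interface: prefix -> next token

def hybrid_generate(prompt: List[str], base: NextToken, reasoner: NextToken,
                    gate: Callable[[List[str]], bool], max_new: int) -> List[Tuple[str, str]]:
    """Decode token by token; the gate picks which model emits each token.
    Returns (token, source) pairs so the reasoner's share is visible."""
    tokens = list(prompt)
    out: List[Tuple[str, str]] = []
    for _ in range(max_new):
        use_reasoner = gate(tokens)
        model = reasoner if use_reasoner else base
        tok = model(tokens)
        tokens.append(tok)
        out.append((tok, "reasoner" if use_reasoner else "base"))
    return out

# Toy stand-ins: the base model always continues; the "reasoner" inserts a reflection token.
def base_model(prefix: List[str]) -> str:
    return "step"

def reasoning_model(prefix: List[str]) -> str:
    return "wait"

def every_fourth(prefix: List[str]) -> bool:  # hypothetical gate
    return len(prefix) % 4 == 0

out = hybrid_generate(["Q"], base_model, reasoning_model, every_fourth, max_new=8)
reasoner_share = sum(src == "reasoner" for _, src in out) / len(out)
```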

Can reasoning happen in latent space during pretraining?

Ouro models achieve 2–3× efficiency gains by performing iterative reasoning in latent space during pretraining, not through extra capacity. Their intermediate predictions align faithfully with final outputs, making latent traces more honest than explicit chain-of-thought reasoning.

Can latent thought vectors scale language models beyond parameters?

Latent-Thought Language Models achieve superior sample and parameter efficiency by coupling fast local variational learning with slow global decoder learning. This dual-rate scheme scales few-shot reasoning across both model and latent size, creating independent scaling dimensions beyond traditional parameter scaling.
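The dual-rate idea can be shown with a deliberately tiny model. This sketch makes several simplifying assumptions that differ from Latent-Thought Language Models: a point estimate of each example's latent `z` instead of a full variational posterior, a one-parameter decoder `y ≈ b + z`, and a squared-error loss with a prior weight `lam`. The fast loop takes many gradient steps on each example's latent with the decoder frozen; the slow loop makes one small update to the shared parameter per pass.

```python
from typing import List, Tuple

def fit_latent(y: float, b: float, steps: int = 30, lr_fast: float = 0.1, lam: float = 1.0) -> float:
    """Fast, per-example loop: gradient steps on this example's latent z with the decoder frozen.
    Loss: (y - (b + z))**2 + lam * z**2, i.e. squared error plus a Gaussian-style prior on z."""
    z = 0.0
    for _ in range(steps):
        grad_z = -2.0 * (y - b - z) + 2.0 * lam * z
        z -= lr_fast * grad_z
    return z

def train(data: List[float], epochs: int = 300, lr_slow: float = 0.05) -> Tuple[float, List[float]]:
    """Slow, global loop: one small update of the shared decoder parameter b per pass."""
    b = 0.0
    for _ in range(epochs):
        latents = [fit_latent(y, b) for y in data]
        grad_b = sum(-2.0 * (y - b - z) for y, z in zip(data, latents)) / len(data)
        b -= lr_slow * grad_b
    return b, [fit_latent(y, b) for y in data]

b, latents = train([1.0, 2.0, 3.0, 6.0])
# b settles near the data mean (3.0); each z holds a shrunken per-example deviation, (y - b) / 2.
```

The split of roles is the point: shared structure accumulates slowly in the decoder, while per-example specifics are fitted quickly in the latent.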

Can stochastic latent reasoning help models explore multiple solutions?

GRAM replaces deterministic latent updates with stochastic sampling, enabling models to represent distributions over solutions rather than single predictions. This allows handling of ambiguous problems and multiple valid strategies that deterministic designs cannot represent.
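The deterministic-versus-stochastic contrast can be seen on a toy energy with two equally good solutions, z = -1 and z = +1. This is an illustrative assumption, not GRAM's architecture: a deterministic gradient update from a fixed start always lands on the same solution, while adding Gaussian noise to each latent update (with a scale that decays to zero) turns repeated runs into samples from a distribution covering both.

```python
import random
from typing import List, Optional

def grad_energy(z: float) -> float:
    """Gradient of the double-well energy E(z) = (z**2 - 1)**2, minimized at z = -1 and z = +1."""
    return 4.0 * z * (z * z - 1.0)

def run_latent(z0: float, steps: int = 200, eta: float = 0.05, sigma: float = 0.0,
               rng: Optional[random.Random] = None) -> float:
    """Iterate z <- z - eta * dE/dz; if sigma > 0, also add Gaussian noise whose scale decays to 0."""
    if rng is None:
        rng = random.Random()
    z = z0
    for t in range(steps):
        z -= eta * grad_energy(z)
        if sigma > 0.0:
            z += sigma * (1.0 - t / steps) * rng.gauss(0.0, 1.0)
    return z

# Deterministic updates: the same start always reaches the same single solution.
deterministic = run_latent(0.1)

# Stochastic updates: repeated runs from the same start sample both solutions.
rng = random.Random(0)
samples: List[float] = [run_latent(0.1, sigma=0.3, rng=rng) for _ in range(50)]
n_positive = sum(s > 0 for s in samples)
```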

Do reasoning traces need to be semantically correct?

Models trained on systematically irrelevant traces maintain solution accuracy and sometimes improve out-of-distribution generalization, suggesting traces function as computational scaffolding rather than meaningful reasoning steps.
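One plausible way to build 'systematically irrelevant' training data, assumed here rather than taken from the source: keep each problem and its correct answer, but attach the trace of a different problem. Rotating the trace list by a fixed offset guarantees that no example keeps its own trace; the field names are hypothetical.

```python
from typing import Dict, List

def with_irrelevant_traces(examples: List[Dict[str, str]], shift: int = 1) -> List[Dict[str, str]]:
    """Keep each problem and its correct answer, but swap in another problem's trace.
    Rotating by `shift` (0 < shift < n) guarantees no example keeps its own trace."""
    n = len(examples)
    if not 0 < shift < n:
        raise ValueError("shift must satisfy 0 < shift < len(examples)")
    return [
        {
            "problem": ex["problem"],
            "trace": examples[(i + shift) % n]["trace"],  # trace from a different problem
            "answer": ex["answer"],                        # correct answer left unchanged
        }
        for i, ex in enumerate(examples)
    ]

examples = [{"problem": f"p{i}", "trace": f"t{i}", "answer": f"a{i}"} for i in range(3)]
corrupted = with_irrelevant_traces(examples)
```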

Can minimal reasoning chains match full explanations?

Chain of Draft matches the accuracy of standard chain-of-thought on arithmetic, symbolic, and commonsense tasks while using only 7.6% of the tokens. The removed 92.4% served style and documentation, not computation.

Does chain-of-thought reasoning reveal genuine inference or pattern matching?

CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Do LLMs predict entailment based on what they memorized?

McKenna et al. (2023) identified attestation bias: LLMs predict entailment based on whether the hypothesis appears in training data, not whether the premise actually supports it. Random premise experiments show models maintain high entailment predictions when hypotheses are attested, proving they respond to memorized propositions rather than premise-hypothesis relationships.

Can modular cognitive tools unlock reasoning without training?

Four cognitive tools implemented as sandboxed LLM calls improved GPT-4.1 on AIME2024 from 26.7% to 43.3% without any RL training. Modularity enforces operation isolation that pure prompting cannot guarantee, eliciting pre-existing reasoning capability.

Can prompt optimization teach models knowledge they lack?

Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.
