Do base models truly possess latent reasoning capability?

This explores whether base models already contain reasoning ability before any post-training — and whether what we call 'reasoning' is genuine inference or something shallower wearing reasoning's clothes.

The corpus answers this question in two voices that are worth holding side by side. The first voice says yes, emphatically: reasoning is already latent in the base model, and training merely unlocks it. Five independent mechanisms (reinforcement learning (RL) steering, critique fine-tuning, decoding tweaks, sparse-autoencoder feature steering, and RLVR) all surface reasoning that was already sitting in base-model activations (see 'Do base models already contain hidden reasoning ability?'). The sharpest version of this claim is that RL post-training teaches a model *when* to reason, not *how*: hybrid models recover 91% of the performance gain just by routing tokens, and the activation vectors for reasoning strategies exist before any RL touches the weights (see 'Does RL post-training create reasoning or just deploy it?'). By this account the bottleneck was never capability acquisition; it was elicitation.

But the second voice asks a harder question: latent *what*, exactly? Several notes argue that what gets elicited is not abstract inference but pattern-completion dressed as logic. When you strip the familiar semantics out of a task and leave only the formal rules, LLM performance collapses: models lean on commonsense token associations, not symbolic manipulation (see 'Do large language models reason symbolically or semantically?'). Entailment judgments turn out to track whether the *hypothesis* looks memorized rather than whether the premise actually supports it (see 'Do LLMs predict entailment based on what they memorized?'). And chain-of-thought, the visible signature of reasoning, degrades predictably under distribution shift, which is the tell of imitation rather than emergent capability (see 'Does chain-of-thought reasoning reveal genuine inference or pattern matching?'). Most unsettling: reasoning traces with invalid logical steps perform almost as well as valid ones, which suggests the trace is persuasive theater, not a window into computation (see 'Do reasoning traces show how models actually think?').

So is the latent capability real or illusory? A third cluster suggests the framing itself is wrong, and this is where the corpus gets genuinely surprising. Some apparent 'reasoning failures' aren't reasoning failures at all. When models hit a supposed complexity cliff, the bottleneck is often *execution bandwidth*: a text-only model can't carry out a long procedure it actually knows, and giving it tools lets it sail past the cliff (see 'Are reasoning model collapses really failures of reasoning?'). Failures cluster not at complexity thresholds but at *novelty* boundaries: models succeed on reasoning chains that resemble their training instances and stumble on unfamiliar ones, the fingerprint of instance-fitting rather than general algorithms (see 'Do language models fail at reasoning due to complexity or novelty?'). Reasoning even decays quietly as inputs get longer, dropping from 92% to 68% accuracy with 3,000 tokens of padding, far below the context limit (see 'Does reasoning ability actually degrade with longer inputs?').

The honest synthesis: base models *do* possess latent reasoning, but 'reasoning' here means a rich repertoire of learned reasoning-shaped patterns keyed to familiar semantics, not a general inference engine. Post-training selects and schedules that repertoire; it doesn't create symbolic competence. The frontier work now asks what *more* can be pulled from the base: making latent reasoning stochastic so a model can hold uncertainty and explore multiple solution paths rather than commit early (see 'Can stochastic latent reasoning help models explore multiple solutions?'), or scaling test-time computation through hidden-state iteration with no verbalized steps at all, which hints that the spoken-out-loud chain of thought was a training artifact rather than the reasoning itself (see 'Can models reason without generating visible thinking tokens?'). The takeaway: the reasoning was there all along, but so were its limits, and most of what training adds is knowing *when* to reach for it, not learning *how* to do it.

Sources (11 notes)

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Does RL post-training create reasoning or just deploy it?

Evidence shows base models already contain reasoning capability in latent form; RL training optimizes when that capability is deployed rather than creating it. Hybrid models recover 91% of performance gains by routing tokens only, and activation vectors for reasoning strategies exist before any RL is applied.
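One way to read the 91% figure is as a fraction of the gap between base-model and RL-trained performance. The sketch below assumes that definition, which the note does not spell out; the function name `gap_recovered` and the accuracy values are hypothetical and chosen only to illustrate the arithmetic.

```python
def gap_recovered(base_acc: float, hybrid_acc: float, rl_acc: float) -> float:
    """Fraction of the base-to-RL performance gain that a hybrid model recovers.

    Assumed definition (not confirmed by the source note):
        (hybrid - base) / (rl - base)
    """
    gain = rl_acc - base_acc
    if gain <= 0:
        raise ValueError("the RL model must outperform the base model")
    return (hybrid_acc - base_acc) / gain


# Hypothetical accuracies, for illustration only.
fraction = gap_recovered(base_acc=0.40, hybrid_acc=0.582, rl_acc=0.60)
print(f"{fraction:.0%} of the gain recovered")
```

Under this reading, a value near 1.0 would mean routing alone accounts for almost all of the improvement attributed to RL.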

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Do LLMs predict entailment based on what they memorized?

McKenna et al. (2023) identified attestation bias: LLMs predict entailment based on whether the hypothesis appears in training data, not whether the premise actually supports it. Random-premise experiments show that models maintain high entailment predictions when hypotheses are attested, indicating that they respond to memorized propositions rather than premise-hypothesis relationships.
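The logic of a random-premise check can be sketched in a few lines. This is a minimal illustration, not McKenna et al.'s actual protocol: `biased_predict` is a hypothetical stand-in for a model that ignores the premise, and the example pairs and the `ATTESTED` set are invented for the demonstration.

```python
import random
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (premise, hypothesis)


def entailment_rate(predict: Callable[[str, str], bool], pairs: List[Pair]) -> float:
    """Share of pairs that the predictor labels as entailment."""
    return sum(predict(p, h) for p, h in pairs) / len(pairs)


def randomize_premises(pairs: List[Pair], seed: int = 0) -> List[Pair]:
    """Keep every hypothesis but attach a premise taken from a different pair."""
    rng = random.Random(seed)
    premises = [p for p, _ in pairs]
    randomized = []
    for i, (_, hypothesis) in enumerate(pairs):
        others = premises[:i] + premises[i + 1:]
        randomized.append((rng.choice(others), hypothesis))
    return randomized


# Hypothetical "memorized" hypotheses and a predictor that never reads the premise.
ATTESTED = {"Paris is in France", "Water is wet"}


def biased_predict(premise: str, hypothesis: str) -> bool:
    return hypothesis in ATTESTED


pairs = [
    ("Paris is the capital of France", "Paris is in France"),
    ("The lake water soaked her shoes", "Water is wet"),
    ("Ann bought a car", "Ann owns a boat"),
]
original = entailment_rate(biased_predict, pairs)
randomized = entailment_rate(biased_predict, randomize_premises(pairs))
print(original, randomized)
```

A predictor that actually used the premise should see its entailment rate fall once premises are randomized; a rate that barely moves is the pattern the note describes as attestation bias.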

Does chain-of-thought reasoning reveal genuine inference or pattern matching?

CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.

Do reasoning traces show how models actually think?

LLM reasoning traces function as persuasive appearances rather than reliable explanations of computation. Invalid logical steps perform nearly as well as valid ones, and corrupted traces generalize comparably, showing that semantic correctness is not what produces the performance gains.

Are reasoning model collapses really failures of reasoning?

Models confined to text-only generation cannot execute multi-step procedures at scale, even when they know the underlying algorithm. Tool-enabled models solve problems beyond the supposed reasoning cliff, suggesting the bottleneck is procedural execution bandwidth.

Do language models fail at reasoning due to complexity or novelty?

Large reasoning models (LRMs) don't break at complexity thresholds but at instance-novelty boundaries. Models fit instance-based patterns rather than generalizable algorithms, so a reasoning chain of any length succeeds if the model was trained on similar instances.

Does reasoning ability actually degrade with longer inputs?

FLenQA shows reasoning accuracy drops from 92% to 68% at just 3000 tokens of padding, far below context window capacity. The degradation is task-agnostic, uncorrelated with language modeling performance, and persists even with chain-of-thought prompting.
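To make the setup concrete, the sketch below pads a short reasoning problem with irrelevant filler up to a target length and computes the size of the reported drop. It is an assumption-laden illustration rather than FLenQA's actual construction: tokens are approximated by whitespace-separated words, and `pad_input`, `accuracy_drop`, and the example sentences are hypothetical.

```python
from itertools import cycle
from typing import List, Tuple


def pad_input(key_sentences: List[str], filler_sentences: List[str], target_tokens: int) -> str:
    """Append filler sentences until the text reaches at least target_tokens.

    Tokens are approximated by whitespace-separated words (an assumption;
    a real experiment would use the model's own tokenizer).
    """
    parts = list(key_sentences)
    count = sum(len(s.split()) for s in parts)
    if count < target_tokens and not any(s.split() for s in filler_sentences):
        raise ValueError("filler_sentences must contain at least one word")
    for sentence in cycle(filler_sentences):
        if count >= target_tokens:
            break
        parts.append(sentence)
        count += len(sentence.split())
    return " ".join(parts)


def accuracy_drop(before: float, after: float) -> Tuple[float, float]:
    """Return the (absolute, relative) drop in accuracy."""
    absolute = before - after
    return absolute, absolute / before


padded = pad_input(
    key_sentences=["Ann is taller than Bo.", "Bo is taller than Cy."],
    filler_sentences=["The weather was mild that week."],
    target_tokens=3000,
)
absolute, relative = accuracy_drop(0.92, 0.68)
print(len(padded.split()), f"{absolute:.2f}", f"{relative:.1%}")
```

With the figures quoted in the note, the drop is 24 percentage points, or roughly a quarter of the original accuracy.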

Can stochastic latent reasoning help models explore multiple solutions?

GRAM replaces deterministic latent updates with stochastic sampling, enabling models to represent distributions over solutions rather than single predictions. This allows handling of ambiguous problems and multiple valid strategies that deterministic designs cannot represent.

Can models reason without generating visible thinking tokens?

Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.
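The idea of spending more compute by iterating a hidden state, rather than by emitting more tokens, can be shown with a toy loop. This is only a sketch of the general pattern and does not reproduce depth-recurrent models, Heima, or Coconut: `recurrent_block`, its `tanh` update, and the fixed `weight` are hypothetical choices made so the example stays self-contained.

```python
import math
from typing import List


def recurrent_block(hidden: List[float], inputs: List[float], weight: float = 0.5) -> List[float]:
    """One shared update step; the same parameters are reused at every iteration."""
    return [math.tanh(weight * h + x) for h, x in zip(hidden, inputs)]


def iterate_hidden_state(inputs: List[float], num_iterations: int) -> List[float]:
    """Refine a hidden state for num_iterations steps without producing any tokens."""
    hidden = [0.0] * len(inputs)
    for _ in range(num_iterations):
        hidden = recurrent_block(hidden, inputs)
    return hidden


inputs = [0.2, -0.4, 1.0]
shallow = iterate_hidden_state(inputs, num_iterations=1)
deep = iterate_hidden_state(inputs, num_iterations=50)
print(shallow)
print(deep)
```

Here `num_iterations` plays the role of test-time compute: the input and the parameters are unchanged, and only the number of hidden-state updates grows. In this toy the update is a contraction, so the state settles toward a fixed point as iterations increase.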
