Can we trigger reasoning without explicit chain-of-thought prompts?
This research asks whether models possess latent reasoning capabilities that can be activated through direct feature steering, independent of chain-of-thought instructions. Understanding this matters for making reasoning more efficient and controllable.
The method uses Sparse Autoencoders (SAEs) to decompose model activations into interpretable features, then runs a two-stage pipeline to identify latent features causally associated with reasoning behavior. First, SAEs extract sparse features from activations collected under CoT and non-CoT prompting conditions, and the two conditions are contrasted to nominate candidate features. Second, targeted steering interventions modulate the candidate features and measure downstream reasoning performance.
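A minimal sketch of the identification stage, assuming an already-trained SAE whose encoder weights `W_enc`, `b_enc` are available as tensors; the differential mean activation used as the ranking score is an illustrative stand-in, not necessarily the paper's exact criterion:

```python
import torch

def sae_encode(acts, W_enc, b_enc):
    # Standard SAE encoder: sparse, non-negative feature activations.
    return torch.relu(acts @ W_enc + b_enc)

def rank_reasoning_features(cot_acts, plain_acts, W_enc, b_enc, top_k=10):
    """Nominate SAE latents that fire more under CoT prompting.

    cot_acts, plain_acts: [n_prompts, d_model] residual-stream activations
    collected at a fixed layer and token position under the two conditions.
    Returns indices of the top_k features by mean activation difference.
    """
    f_cot = sae_encode(cot_acts, W_enc, b_enc).mean(dim=0)    # [n_features]
    f_plain = sae_encode(plain_acts, W_enc, b_enc).mean(dim=0)
    return torch.topk(f_cot - f_plain, top_k).indices
```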
The central result: steering a single reasoning-related latent feature at the first generation step substantially improves accuracy without explicit CoT prompting. For large models, latent steering achieves performance comparable to standard CoT while producing more efficient outputs — fewer tokens, same accuracy.
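A sketch of the intervention itself, assuming a HuggingFace-style decoder model with KV caching; the layer index, coefficient, and the use of the SAE decoder row as the steering direction are illustrative choices, not values reported in the paper:

```python
import torch

def make_first_step_hook(direction, coeff):
    """Forward hook that adds coeff * direction to the residual stream at
    the last prompt position. With KV caching, only the prefill pass sees
    a sequence longer than one token, so this biases exactly the hidden
    state that produces the first generated token."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        if hidden.shape[1] > 1:  # prefill pass only
            hidden[:, -1, :] += coeff * direction.to(hidden.dtype)
        return output
    return hook

# Usage sketch (model loading omitted; LAYER and coeff are hypothetical):
# direction = W_dec[feature_idx]          # SAE decoder row for the feature
# handle = model.model.layers[LAYER].register_forward_hook(
#     make_first_step_hook(direction, coeff=8.0))
# output_ids = model.generate(**inputs, max_new_tokens=512)
# handle.remove()
```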
Three properties of this reasoning mode are striking:
Early triggering. The reasoning-oriented internal state is triggered early in generation, not built up through sequential token production. This contrasts with the H2 hypothesis (from Where does LLM reasoning actually happen during generation?) that reasoning emerges through the step-by-step construction of a chain.
Override robustness. The latent reasoning mode can override prompt-level instructions that discourage explicit reasoning, including the /no_think soft switch used in Qwen models. The internal state takes precedence over surface directives, suggesting the latent mechanism operates at a deeper level than prompt compliance.
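A usage-level sketch of how this override could be probed, reusing the steering hook above; appending the raw /no_think switch to the question is a simplification of how Qwen3's chat template normally applies it:

```python
def probe_override(model, tokenizer, question, attach_steering):
    """Generate with an explicit /no_think switch, with and without
    first-step steering, to see whether the latent mode wins out."""
    inputs = tokenizer(question + " /no_think", return_tensors="pt")
    inputs = inputs.to(model.device)

    baseline = model.generate(**inputs, max_new_tokens=512)

    handle = attach_steering()            # e.g. registers the hook above
    steered = model.generate(**inputs, max_new_tokens=512)
    handle.remove()

    # Per the paper's finding, the steered output contains reasoning
    # despite the instruction; the baseline answers directly.
    return tokenizer.decode(baseline[0]), tokenizer.decode(steered[0])
```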
Cross-model generality. The finding replicates across six model families up to 70B parameters, suggesting this is not an architecture-specific artifact but a general property of how large language models organize reasoning capability.
The implication is that CoT prompting is one effective but not unique way of activating an underlying reasoning mechanism. Other triggers include altered decoding procedures (CoT-decoding, which shows base models already possess latent reasoning capability that minimal training signals can unlock), soft continuous representations (from Can we explore multiple reasoning paths without committing to one token?), and now direct feature steering. The multiplicity of triggers, all converging on the same capability, is the strongest evidence that the capability is latent and the triggers are interchangeable surface-level activators.
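For context, CoT-decoding works roughly as sketched below (a simplification of Wang and Zhou, 2024): instead of decoding greedily from the first token, branch on the top-k first tokens and keep the continuation the model is most confident about. The mean top-1/top-2 probability gap here stands in for that paper's answer-span confidence:

```python
import torch

@torch.no_grad()
def cot_decode(model, tokenizer, prompt, k=5, max_new_tokens=128):
    """Branch on the top-k first tokens, continue each branch greedily,
    and return the continuation with the highest confidence score."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    first_logits = model(**inputs).logits[0, -1]

    best_text, best_conf = None, float("-inf")
    for tok in torch.topk(first_logits, k).indices:
        ids = torch.cat([inputs.input_ids, tok.view(1, 1)], dim=-1)
        out = model.generate(ids, max_new_tokens=max_new_tokens,
                             do_sample=False, output_scores=True,
                             return_dict_in_generate=True)
        # Confidence: mean gap between the top-2 token probabilities
        # across the generated continuation.
        gaps = [p.topk(2).values.diff().abs().item()
                for p in (torch.softmax(s[0], dim=-1) for s in out.scores)]
        conf = sum(gaps) / len(gaps)
        if conf > best_conf:
            best_text, best_conf = tokenizer.decode(out.sequences[0]), conf
    return best_text, best_conf
```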
The SAE-steering result also extends the repertoire of steerable behavioral dimensions, from Can we steer reasoning toward brevity without retraining? (reasoning verbosity), Can we track and steer personality shifts during model finetuning? (personality), and Can high-level concepts replace circuit-level analysis in AI? (truthfulness, honesty, morality), to include reasoning activation itself, arguably the most consequential dimension yet.
Source: Reasoning Beyond Chain-of-Thought: A Latent Computational Mode in Large Language Models
Related concepts in this collection
- Can we steer reasoning toward brevity without retraining?
  This explores whether model reasoning style occupies learnable geometric directions in activation space, and whether we can shift toward concise thinking by steering through that space without expensive retraining.
  → extends steerable dimensions from verbosity to reasoning activation

- Can high-level concepts replace circuit-level analysis in AI?
  Instead of reverse-engineering individual circuits, can we study AI reasoning by treating concepts as directions in activation space? This matters because circuit analysis hits practical limits at scale.
  → SAE steering for reasoning adds a new dimension to the RepE paradigm

- Can we explore multiple reasoning paths without committing to one token?
  Standard language models pick one token at each step, collapsing uncertainty and forcing single reasoning trajectories. Could preserving the full probability distribution across token embeddings enable implicit parallel exploration instead?
  → another non-CoT trigger for latent reasoning

- Where does LLM reasoning actually happen during generation?
  Does multi-step reasoning emerge from visible chain-of-thought text, hidden layer dynamics, or simply more computation? Three competing hypotheses make different predictions and can be empirically tested.
  → provides causal evidence for H1

- Can latent thought vectors scale language models beyond parameters?
  Explores whether explicit latent thought vectors with dual-rate learning create new scaling dimensions independent of model size. This matters because it suggests alternatives to simply building larger models.
  → LTMs make latent thought vectors explicit architectural components; SAE steering shows reasoning features are already implicit in standard architectures

- Does RL teach reasoning or just when to use it?
  Does reinforcement learning in thinking models actually create new reasoning abilities, or does it simply teach existing capabilities when to activate? This matters for understanding where reasoning truly emerges.
  → converges: RL teaches when to activate reasoning; SAE steering shows the reasoning mechanism is a single activatable feature

- Do language models actually use their encoded knowledge?
  Probes can detect that LMs encode facts internally, but do those encoded facts causally influence what the model generates? This explores the gap between knowing and doing.
  → SAE steering closes this gap for reasoning: the identified feature IS causally active, unlike many encoded-but-unused representations
Steering a single SAE-identified reasoning feature matches CoT performance while bypassing explicit chain-of-thought: CoT is one trigger for latent reasoning, not its cause.