LLM Reasoning and Architecture

Can we trigger reasoning without explicit chain-of-thought prompts?

This research asks whether models possess latent reasoning capabilities that can be activated through direct feature steering, independent of chain-of-thought instructions. Understanding this matters for making reasoning more efficient and controllable.

Note · 2026-04-20 · sourced from Cognitive Models Latent

A two-stage pipeline uses sparse autoencoders (SAEs) to decompose model activations into interpretable features and identify latents causally associated with reasoning behavior. First, SAEs extract sparse features from activations under CoT versus non-CoT prompting conditions. Second, targeted steering interventions modulate candidate features and measure downstream reasoning performance.

The central result: steering a single reasoning-related latent feature at the first generation step substantially improves accuracy without explicit CoT prompting. For large models, latent steering achieves performance comparable to standard CoT while producing more efficient outputs — fewer tokens, same accuracy.

Three properties of this reasoning mode are striking:

Early triggering. The reasoning-oriented internal state is triggered early in generation, not built up through sequential token production. This contrasts with the H2 assumption that reasoning emerges through the step-by-step construction of a chain.

Override robustness. The latent reasoning mode can override prompt-level instructions that discourage explicit reasoning — including the /no_think switch used in Qwen models. The internal state takes precedence over surface directives, suggesting the latent mechanism operates at a deeper level than prompt compliance.

Cross-model generality. The finding replicates across six model families up to 70B parameters, suggesting this is not an architecture-specific artifact but a general property of how large language models organize reasoning capability.

The implication is that CoT prompting is one effective but not unique way of activating an underlying reasoning mechanism. Other triggers include: altered decoding procedures (CoT-decoding, which shows that base models already possess latent reasoning capability that minimal signals can unlock), soft continuous representations (from Can we explore multiple reasoning paths without committing to one token?), and now direct feature steering. The multiplicity of triggers, all converging on the same capability, is the strongest evidence that the capability is latent and the triggers are interchangeable surface-level activators.

This extends the repertoire of steerable behavioral dimensions from Can we steer reasoning toward brevity without retraining? (reasoning verbosity), Can we track and steer personality shifts during model finetuning? (personality), and Can high-level concepts replace circuit-level analysis in AI? (truthfulness, honesty, morality) to include reasoning activation itself — arguably the most consequential dimension yet.


Source: Cognitive Models Latent · Paper: Reasoning Beyond Chain-of-Thought: A Latent Computational Mode in Large Language Models

Steering a single SAE-identified reasoning feature matches CoT performance while bypassing explicit chain-of-thought — CoT is one trigger for latent reasoning, not its cause.