Reasoning Beyond Chain-of-Thought: A Latent Computational Mode in Large Language Models

Paper · arXiv 2601.08058 · Published January 12, 2026

Chain-of-Thought (CoT) prompting has improved the reasoning performance of large language models (LLMs), but it remains unclear why it works and whether it is the unique mechanism for triggering reasoning in these models. In this work, we study this question by directly analyzing and intervening on the internal representations of LLMs with Sparse Autoencoders (SAEs), identifying a small set of latent features that are causally associated with LLM reasoning behavior. Across multiple model families and reasoning benchmarks, we find that steering a single reasoning-related latent feature can substantially improve accuracy without explicit CoT prompting. For large models, latent steering achieves performance comparable to standard CoT prompting while producing more efficient outputs. We further observe that this reasoning-oriented internal state is triggered early in generation and can override prompt-level instructions that discourage explicit reasoning. Overall, our results suggest that multi-step reasoning in LLMs is supported by latent internal activations that can be externally activated, and that CoT prompting is one effective, but not the only, way of engaging this mechanism rather than its necessary cause.

Large language models exhibit substantially improved performance on complex reasoning tasks with Chain-of-Thought (CoT) prompting, where intermediate reasoning steps are explicitly verbalized (Wei et al., 2022; Kojima et al., 2022). Since its introduction, CoT prompting has consistently enhanced performance across a wide range of arithmetic (Lewkowycz et al., 2022), symbolic, and logical reasoning (Talmor et al., 2019) benchmarks and has inspired numerous variants, such as self-consistency (Wang et al., 2022). These successes have led to the widespread view that step-by-step prompting plays a central role in enabling multi-step reasoning in large language models.

However, the causal role of CoT prompting in multi-step reasoning remains unclear: CoT may be a convenient trigger rather than the unique pathway to reasoning behavior (Figure 1, top). Recent work shows that reasoning-relevant trajectories can be induced without CoT-style prompts, for example, by modifying the decoding process to surface latent CoT paths (Wang and Zhou, 2024), or by injecting continuous “soft thought” representations instead of generating explicit reasoning text (Xu et al., 2025). Moreover, causal analyses suggest that CoT traces are not necessarily the mechanism producing the final answer (Bao et al., 2024). These studies raise a fundamental question: does multi-step reasoning in LLMs correspond to a latent internal mechanism that can be selectively activated, and is CoT prompting uniquely responsible for activating this mechanism or merely one of several effective triggers (Figure 1, bottom)?

In this work, we answer this question by directly analyzing and intervening on the internal representations of large language models. Using Sparse Autoencoders (SAEs) to identify latent features associated with reasoning behavior, we show that targeted modulation of these features can influence the model’s reasoning performance without explicit CoT prompting. Together, these findings suggest that multi-step reasoning reflects a latent capability inherent to the model, and that CoT prompting is not the fundamental cause of this capability but one of several ways to activate the underlying reasoning mechanism. Our contributions are:

• Methodological: We develop a two-stage pipeline using Sparse Autoencoders (SAEs) to identify reasoning-related latent features and causally validate their role through targeted steering interventions.

• Empirical: Experiments across six model families (up to 70B) demonstrate that steering a single latent feature at the first generation step matches or exceeds CoT performance while substantially reducing token overhead.

• Mechanistic: We show that this internal reasoning mode can be triggered early in generation and is robust enough to override prompt-level constraints such as the /no_think instruction used in Qwen models.

2 Related Work

Chain-of-Thought and Multi-step Reasoning. Chain-of-Thought (CoT) prompting has been widely adopted as a practical approach for improving performance on tasks that require multi-step reasoning (Wei et al., 2022). Prior work has demonstrated that encouraging models to produce intermediate reasoning steps can lead to substantial gains across a variety of domains, including arithmetic problem solving (Lewkowycz et al., 2022), symbolic manipulation, and logical inference (Talmor et al., 2019). As a result, CoT-style prompting has become a common component in reasoning benchmarks and evaluation protocols for large language models, and has inspired numerous extensions such as self-consistency (Wang et al., 2022) and structured reasoning prompts (Kojima et al., 2022).

Reasoning Beyond Explicit CoT Prompting. Beyond the standard CoT prompting paradigm, a growing line of work suggests that multi-step reasoning behavior in large language models need not be uniquely tied to explicit CoT prompts. For instance, recent studies show that CoT-style reasoning trajectories can be elicited by altering the decoding process without using explicit prompting (Wang and Zhou, 2024), and that implicit reasoning leveraging internal hidden states can support complex reasoning without generating step-by-step text (Deng et al., 2023). Moreover, methods based on continuous or latent representations, such as soft thought tokens, demonstrate enhanced reasoning capability without relying on explicit verbal reasoning steps (Xu et al., 2025). Complementary empirical analyses further indicate that the effectiveness of CoT prompting does not strictly depend on correct or valid intermediate chains, suggesting that the internal drives for reasoning extend beyond the surface verbal structure (Wang et al., 2023).

Internal Representation Analysis of Reasoning. Beyond output-based analyses, prior work has investigated reasoning by examining internal representations of language models. Early approaches rely on probing, activation analysis, and causal tracing to associate hidden states with reasoning-relevant behaviors, suggesting that substantial computation occurs internally even when it is not explicitly verbalized (Burns et al.; Meng et al., 2022).

More recent work in mechanistic interpretability aims to decompose superposed activations into more interpretable components. Sparse autoencoders (SAEs) have been proposed as a scalable tool for extracting monosemantic or behaviorally meaningful features from language model activations, enabling finer-grained analysis of internal mechanisms (Cunningham et al., 2023). These representations have been used to study internal structure and to support activation-level interventions and steering, including in reasoning-related settings (Xin et al., 2025; Wang et al., 2025).
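To make this concrete, the following is a minimal sketch of the kind of sparse autoencoder commonly used in this literature, assuming a single linear encoder/decoder pair with a ReLU nonlinearity and an L1 sparsity penalty; the architecture and training objective of the pretrained SAEs referenced here may differ.

```python
# Minimal SAE sketch (illustrative, not the exact SAE used in this work):
# a linear encoder/decoder with a ReLU nonlinearity and an L1 penalty that
# encourages sparse, more interpretable feature activations.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_latent: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_latent)  # model activation -> feature activations
        self.decoder = nn.Linear(d_latent, d_model)  # feature activations -> reconstruction

    def forward(self, h: torch.Tensor):
        f = torch.relu(self.encoder(h))  # non-negative, sparse feature activations
        h_hat = self.decoder(f)          # reconstruction of the original activation
        return h_hat, f

def sae_loss(h: torch.Tensor, h_hat: torch.Tensor, f: torch.Tensor, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 sparsity penalty on the feature activations.
    return ((h - h_hat) ** 2).mean() + l1_coeff * f.abs().mean()
```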

However, most existing studies remain primarily correlational, identifying internal features associated with reasoning-like behavior without establishing whether such representations play a causal role or correspond to a distinct reasoning mode that can be selectively engaged.

In this section, we present a two-stage pipeline for identifying and intervening on latent features associated with reasoning-related behaviors in large language models. Chain-of-Thought prompting is used as a contrasting condition to reveal prompt-dependent differences in latent activations. As illustrated in Figure 2, we first extract sparse latent features using a pretrained sparse autoencoder (SAE) and identify candidate features (Section 3.3). We then define a latent steering procedure to estimate the intervention sensitivity of individual features on training data, and evaluate the selected features on a held-out test set (Section 3.4).
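As a rough illustration of what such a latent steering intervention can look like in code, the sketch below adds a scaled feature direction to the hidden state of one transformer layer via a forward hook during generation. The model name, layer index, steering direction, and coefficient are hypothetical placeholders rather than values from this paper (in practice the direction would come from the decoder column of a selected reasoning-related SAE feature), and for brevity the hook steers every forward pass rather than only the first generation step.

```python
# Hypothetical latent-steering sketch: add alpha * (a feature's decoder direction)
# to the hidden state of one transformer layer during generation. Model name,
# layer index, direction, and alpha are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

layer_idx, alpha = 16, 4.0                 # placeholder layer index and steering strength
d_model = model.config.hidden_size
# Placeholder direction; in practice this would be the decoder column of a
# reasoning-related SAE feature, e.g. sae.decoder.weight[:, feature_idx].
steer_dir = torch.randn(d_model)
steer_dir = steer_dir / steer_dir.norm()

def steering_hook(module, inputs, output):
    # Decoder layers return either a tensor or a tuple whose first element
    # is the hidden states; add the scaled direction to those hidden states.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + alpha * steer_dir.to(hidden.dtype).to(hidden.device)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.model.layers[layer_idx].register_forward_hook(steering_hook)
prompt = "Q: A train travels 120 km in 90 minutes. What is its average speed in km/h? A:"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
handle.remove()
print(tokenizer.decode(out[0], skip_special_tokens=True))
```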