Why does latent reasoning override no-think instructions in models?

This explores why models seem to 'think' even when told not to — and what that reveals about whether reasoning is something prompts switch on or a capability already baked into the weights.

The corpus suggests an answer that is uncomfortable for anyone who treats reasoning as a prompt-level feature: the reasoning does not live in the visible 'think' tokens at all. The most direct evidence is that steering a single internal feature, identified through a sparse autoencoder (SAE), can trigger reasoning that matches or beats chain-of-thought across six model families. Crucially, this mode activates early in generation and overrides surface-level instructions (see "Can we trigger reasoning without explicit chain-of-thought prompts?"). If one latent direction can flip reasoning on, a 'no-think' instruction in the prompt is operating at the wrong layer to turn it off.
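To make "steering a single internal feature" concrete, here is a minimal sketch of the arithmetic involved, under loud assumptions: the hidden state, the feature direction, and the strength are all made-up numbers, and the function names are hypothetical. It is not the procedure from the cited note, only the general shape of adding a direction to an activation.

```python
# Toy sketch of activation steering. All vectors and names are illustrative
# assumptions, not values or APIs taken from any real model or SAE.

def unit(direction):
    """Return the direction scaled to unit length."""
    norm = sum(d * d for d in direction) ** 0.5
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [d / norm for d in direction]

def feature_activation(hidden, direction):
    """Projection of a hidden state onto the unit feature direction."""
    return sum(h * u for h, u in zip(hidden, unit(direction)))

def steer(hidden, direction, strength):
    """Add `strength` units of the feature direction to the hidden state."""
    return [h + strength * u for h, u in zip(hidden, unit(direction))]

hidden = [0.2, -0.5, 1.0, 0.1]         # hypothetical residual-stream activation
reasoning_dir = [0.0, 3.0, 4.0, 0.0]   # hypothetical SAE decoder direction

before = feature_activation(hidden, reasoning_dir)
steered = steer(hidden, reasoning_dir, strength=2.0)
after = feature_activation(steered, reasoning_dir)

# The projection onto the feature rises by `strength`, whatever the prompt said.
print(before, after)
```

Note that the prompt text appears nowhere in this operation. The intervention acts on the activation directly, which is one way to read the claim that a prompt-level instruction sits at the wrong layer.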

Why is it already there to begin with? Several independent lines converge on the idea that base models *contain* latent reasoning rather than acquiring it from post-training. Five different elicitation methods (RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR) all unlock reasoning that was already present in base-model activations, suggesting post-training selects reasoning rather than creating it (see "Do base models already contain hidden reasoning ability?"). An instruction can ask the model not to *show* its work, but the computation it is asking to suppress is a default capability of the network, not an optional subroutine.

The deeper reason 'no-think' fails is that thinking and verbalizing are separable. Architectures like depth-recurrent models, Heima, and Coconut scale test-time compute through iteration over hidden states, with no visible intermediate tokens at all, implying verbalization is a training artifact, not a requirement for reasoning (see "Can models reason without generating visible thinking tokens?"). A 27M-parameter recurrent model solved Sudoku-Extreme and 30×30 mazes perfectly while token-based CoT scored zero (see "Can models reason without generating visible thinking steps?"). So when you forbid 'thinking,' you are really only forbidding the narration; the latent computation proceeds underneath it.
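As a toy analogy for "iteration over hidden states," the sketch below solves a tiny maze by repeatedly applying one shared update to an internal grid and reading out only the final answer. It is an illustrative assumption, not how depth-recurrent models, Heima, or Coconut are implemented; the function names and the 3×3 maze are invented for the example.

```python
# Toy analogy: compute scales with the number of hidden-state iterations,
# and nothing intermediate is ever emitted. Names and maze are hypothetical.

def recurrent_step(state, walls):
    """One application of the shared block: spread reachability to open neighbours."""
    rows, cols = len(state), len(state[0])
    new_state = [row[:] for row in state]
    for r in range(rows):
        for c in range(cols):
            if not state[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and not walls[nr][nc]:
                    new_state[nr][nc] = True
    return new_state

def solve(walls, start, goal, n_iters):
    """Iterate the hidden state n_iters times, then read out a single answer."""
    state = [[False] * len(walls[0]) for _ in walls]
    state[start[0]][start[1]] = True
    for _ in range(n_iters):          # more iterations = more test-time compute
        state = recurrent_step(state, walls)
    return state[goal[0]][goal[1]]    # the only "visible" output

walls = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
too_shallow = solve(walls, start=(0, 0), goal=(0, 2), n_iters=3)
deep_enough = solve(walls, start=(0, 0), goal=(0, 2), n_iters=6)
print(too_shallow, deep_enough)
```

The shortest path here is six moves long, so three iterations cannot reach the goal and six can. The intermediate grids play the role of reasoning steps, yet none of them is ever written out.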

There is also a reason the instruction loses the tug-of-war specifically as reasoning ability grows. The MathIF benchmark shows that the very training that improves reasoning *degrades* instruction-following: longer chains create contextual distance that dilutes the model's attention to the original instruction (see "Why do better reasoning models ignore instructions?"). A 'no-think' command is itself an instruction, so it is subject to the same erosion: the more reasoning-capable the model, the weaker its grip on the constraint telling it to stop.
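The dilution argument can be illustrated with a deliberately crude softmax calculation. The sketch assumes every instruction token gets one fixed attention score and every reasoning token another. That assumption is mine, not something MathIF measures, and real attention scores vary by head, layer, and position.

```python
import math

# Toy model of attention dilution. The constant per-token scores are an
# assumption made for illustration only.

def instruction_attention_share(n_instruction, n_reasoning,
                                instr_score=1.0, reason_score=0.0):
    """Fraction of softmax attention mass that lands on the instruction tokens."""
    instr_mass = n_instruction * math.exp(instr_score)
    reason_mass = n_reasoning * math.exp(reason_score)
    return instr_mass / (instr_mass + reason_mass)

# A 10-token instruction followed by ever longer reasoning chains.
shares = {n: instruction_attention_share(10, n) for n in (0, 100, 1000, 10000)}
for n_reasoning, share in shares.items():
    print(f"{n_reasoning:>6} reasoning tokens -> {share:.4f} of attention on instruction")
```

Even with instruction tokens scored higher than reasoning tokens, their share of attention shrinks as the chain lengthens, because the softmax denominator grows with every added token.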

One further point matters for where this leads: 'thinking' is not a single switch but a steerable region of activation space. Verbose and concise reasoning occupy distinct linear directions you can push along without retraining (see "Can we steer reasoning toward brevity without retraining?"), and RL training can flip the *same* thinking mechanism from counterproductive self-doubt into productive analysis (see "Does extended thinking help or hurt model reasoning?"). The implication: if you actually want to control reasoning, the lever is in the activations, not in a polite request at the prompt.
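One common way to obtain a linear direction from paired examples is a difference of means over activations. The sketch below assumes that technique; the cited note does not spell out its extraction procedure, so treat this as a stand-in. The activations are invented two-dimensional toys and the function names are hypothetical.

```python
# Difference-of-means steering vector from paired activations.
# Assumption: this is a generic technique, not necessarily the cited method.

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def steering_vector(concise_acts, verbose_acts):
    """Direction pointing from the verbose mean toward the concise mean."""
    concise_mean = mean_vector(concise_acts)
    verbose_mean = mean_vector(verbose_acts)
    return [c - v for c, v in zip(concise_mean, verbose_mean)]

# Hypothetical activations recorded on paired concise / verbose traces.
concise_acts = [[1.0, 0.0], [3.0, 2.0]]
verbose_acts = [[0.0, 1.0], [2.0, 3.0]]

direction = steering_vector(concise_acts, verbose_acts)

# Pushing a new activation along the direction with a chosen strength.
alpha = 0.5
activation = [2.0, 2.0]
nudged = [a + alpha * d for a, d in zip(activation, direction)]
print(direction, nudged)
```

No weights are updated anywhere in this sketch, which is the sense in which such steering is training-free: the direction is computed once from recorded activations and then added at inference time.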

Sources (7 notes)

Can we trigger reasoning without explicit chain-of-thought prompts?

SAE-identified reasoning features can be directly steered to match or exceed chain-of-thought performance across six model families. This reasoning mode activates early in generation and overrides surface-level instructions, suggesting latent reasoning is a fundamental capability independent of explicit prompting.

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Can models reason without generating visible thinking tokens?

Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.

Can models reason without generating visible thinking steps?

Depth-recurrent and compressed-token architectures solve reasoning tasks through hidden computation rather than output tokens. A 27M-parameter model solved Sudoku-Extreme and 30×30 mazes perfectly while CoT methods scored zero.

Why do better reasoning models ignore instructions?

The MathIF benchmark shows that SFT and RL training improve reasoning but reduce instruction adherence, particularly as chain-of-thought length increases. Longer reasoning chains create contextual distance that dilutes the model's attention to original instructions.

Can we steer reasoning toward brevity without retraining?

Activation-Steered Compression extracts a single vector from 50 paired examples to reduce chain-of-thought length by 67% while maintaining accuracy and achieving 2.73x speedup. The method is training-free and generalizes across model sizes and domains.

Does extended thinking help or hurt model reasoning?

Vanilla models use thinking mode counterproductively, inducing self-doubt that degrades performance. RL training reverses this, transforming the same mechanism into beneficial gap analysis. Training mediates reasoning quality, not just quantity.
