
Can models recognize question difficulty before they reason?

Do reasoning language models encode implicit knowledge of problem difficulty in their hidden states, even before generating solution steps? And if so, why don't they act on this knowledge?

Note · 2026-05-18 · sourced from Reasoning Methods CoT ToT
Why does chain-of-thought reasoning fail in predictable ways? How should we allocate compute budget at inference time?

S1-Bench's probing analysis shows that difficulty information is already present in LRM representations. A single-layer MLP trained on the final-layer hidden state of the last token of the encoded question predicts difficulty, with accuracy increasing monotonically across difficulty levels. The structure is implicit but linearly decodable: beyond fitting this lightweight probe, no fine-tuning of the model, no specialized probe architecture, and no auxiliary signal is required. The model knows.
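A minimal sketch of this kind of probe, under stated assumptions: the "hidden states" here are synthetic vectors in which difficulty shifts points along a single direction, and the probe is taken to be one linear layer with a softmax, fitted by plain gradient descent. The function names, dimensions, and hyperparameters are illustrative and are not taken from S1-Bench; with a real model, `X` would hold the final-layer hidden state of each question's last token.

```python
import numpy as np

def make_synthetic_states(n=600, d=32, num_classes=3, noise=0.5, seed=0):
    """Synthetic stand-in for hidden states (an assumption, not real model data)."""
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    y = rng.integers(0, num_classes, size=n)          # difficulty level per question
    center = (num_classes - 1) / 2.0
    # Difficulty moves each point along one fixed direction; the rest is noise.
    X = rng.normal(scale=noise, size=(n, d)) + np.outer((y - center) * 3.0, direction)
    return X, y

def train_linear_probe(X, y, num_classes, lr=0.2, steps=1000):
    """Fit a one-layer softmax probe with full-batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]                        # one-hot targets
    for _ in range(steps):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - Y) / n                        # gradient of mean cross-entropy
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def probe_predict(X, W, b):
    """Return the predicted difficulty level for each row of X."""
    return np.argmax(X @ W + b, axis=1)

X, y = make_synthetic_states()
W, b = train_linear_probe(X[:400], y[:400], num_classes=3)
held_out_acc = (probe_predict(X[400:], W, b) == y[400:]).mean()
```

Evaluating on a held-out split matters here: a probe that only fits its training questions says nothing about what the representation encodes.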

The behavioral result contradicts this internal knowledge. On simple questions that the linear probe correctly classifies as easy, LRMs still produce redundant solution rounds, repeatedly reverify already-correct answers, and emit higher average token entropy than necessary. The hidden-state signal that says "this is easy" is overridden during generation by exploratory behavior that says "let me check again."

The authors' interpretation — and the most plausible mechanism — is that models exhibit self-doubt about their own early difficulty judgments. The model perceives the question is simple, then second-guesses that perception, then engages in exploratory generation to compensate for the imagined possibility that its initial assessment was wrong. This is a structural failure mode: the architecture lacks a mechanism to commit to an early difficulty assessment and act on it.

The deeper insight is that LRM overthinking is not a perception failure (the model fails to recognize a simple question) but an action failure (the model recognizes the question is simple but cannot translate that recognition into terminating behavior). This distinction matters for fixes: prompt-engineering for "shorter answers on easy questions" treats it as a perception problem and produces brittle results. Mechanistic fixes that route generation through the difficulty representation — for example, conditioning continued-thinking decisions on the probe output — treat it as the action problem it appears to be.
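One way to picture "conditioning continued-thinking decisions on the probe output" is a small stopping rule that reads the probe's easy-class probability. This is a hypothetical sketch, not a method from the source: `ThinkingPolicy`, its thresholds, and the notion of a discrete "solution round" are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class ThinkingPolicy:
    """Hypothetical stopping rule driven by a difficulty probe (illustrative values)."""
    easy_threshold: float = 0.8      # probe probability above which a question counts as easy
    max_rounds_easy: int = 1         # commit after one solution round on easy questions
    max_rounds_default: int = 4      # cap on rounds for everything else

    def max_rounds(self, p_easy: float) -> int:
        if p_easy >= self.easy_threshold:
            return self.max_rounds_easy
        return self.max_rounds_default

    def should_continue(self, p_easy: float, rounds_done: int,
                        last_round_changed_answer: bool) -> bool:
        if rounds_done >= self.max_rounds(p_easy):
            return False                      # budget for this difficulty level is spent
        if rounds_done == 0:
            return True                       # always produce at least one solution round
        # Past the first round, keep verifying only while the answer is still moving.
        return last_round_changed_answer

policy = ThinkingPolicy()
stop_early_on_easy = not policy.should_continue(0.95, rounds_done=1,
                                                last_round_changed_answer=True)
```

The design choice is that the probe sets the budget before generation starts, so later exploratory behavior cannot reopen the difficulty judgment.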

The methodology generalizes. A linear probe on a hidden state is a cheap diagnostic for any property the model is suspected to track implicitly. If the probe succeeds and the behavior contradicts it, the gap localizes the failure to the perception-to-action interface — not to representation, not to capacity.
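The diagnostic can be reduced to a single number: among items the probe labels easy, how often does the behavior still look like overthinking? The sketch below assumes overthinking is operationalized as exceeding a token budget, which is an illustrative choice rather than the source's metric; the function name and the sample values are made up.

```python
import numpy as np

def perception_action_gap(probe_says_easy, response_tokens, easy_token_budget):
    """Fraction of probe-easy items whose response exceeds the easy-question token budget."""
    probe_says_easy = np.asarray(probe_says_easy, dtype=bool)
    response_tokens = np.asarray(response_tokens)
    if not probe_says_easy.any():
        return float("nan")                   # no easy items, so the gap is undefined
    overthinks = response_tokens[probe_says_easy] > easy_token_budget
    return float(overthinks.mean())

# Made-up example: three items are probe-easy, and two of them run long.
gap = perception_action_gap(
    probe_says_easy=[True, True, False, True],
    response_tokens=[50, 900, 1200, 700],
    easy_token_budget=200,
)
```

A gap near zero would mean behavior follows the representation; a large gap alongside an accurate probe points at the perception-to-action interface.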

Original note title

problem difficulty is linearly decodable from LRM hidden states before formal reasoning begins — yet models override this signal with exploratory overthinking suggesting architectural self-doubt