What critical thinking skills do reasoning models actually lose?
Training for step-by-step reasoning optimizes narrow deductive thinking while degrading meta-cognitive abilities such as recognizing when thinking is futile and keeping inferences tentative. Understanding this tradeoff matters for deploying reasoning models reliably.
Post angle: Medium
We trained AI to think. In doing so, we trained it not to think in two specific and important ways.
Failure mode 1: It can't recognize when thinking is futile
Give a reasoning model a question with a missing premise — a question that cannot be answered because essential information is absent. A non-reasoning model quickly produces a short response acknowledging the problem. A reasoning model produces a response five times longer, cycling through "alternatively," "wait," "but..." — generating elaborate chains that never converge because there's nothing to converge on.
Non-reasoning models show better critical thinking about when to think at all. Reasoning-specific training optimizes for deploying thinking patterns; it does not develop the meta-capability to disengage when engagement is inappropriate.
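A minimal sketch of how this gap could be measured, assuming an OpenAI-compatible chat API; the model names, the example question, and the length comparison are illustrative assumptions, not the protocol from the source:

```python
# Sketch: measure the overthinking gap on a missing-premise question.
# Assumes an OpenAI-compatible API; the model names and the example
# question are placeholders, not the setup from the source study.
from openai import OpenAI

client = OpenAI()

# Essential information (the train's speed) is deliberately absent.
ILL_POSED = (
    "A train leaves the station and travels for 3 hours. "
    "How far has it gone?"
)

def completion_tokens(model: str, question: str) -> int:
    """Ask one model the question and return its output length in tokens."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return resp.usage.completion_tokens

base = completion_tokens("gpt-4o-mini", ILL_POSED)  # non-reasoning baseline
deep = completion_tokens("o3-mini", ILL_POSED)      # reasoning model

# The failure mode above predicts a large ratio: the reasoning model
# spirals instead of converging on "this question is unanswerable."
print(f"length ratio (reasoning / non-reasoning): {deep / base:.1f}x")
```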
Failure mode 2: It reasons its way to the wrong rule
Give models four games, each with a hidden special rule that breaks the normal pattern. Non-reasoning models infer those exception-based rules 55-65% of the time; reasoning models score below 25%. The detailed thinking chains actively make things worse: models apply arithmetic to symbols, overgeneralize from two examples, or invent rules that weren't in the data.
Inductive reasoning from sparse, exception-containing observations requires a different kind of thinking: tentative, minimal, defeasible. The CoT pattern forces positive, elaborating chains that work against the task.
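To make the task concrete, here is a hedged sketch of what an exception-based rule inference probe could look like; the game, its hidden rule, and the prompt are invented for illustration and are not the benchmark from the source:

```python
# Sketch of an exception-based rule inference probe. The game and its
# hidden rule are invented for illustration; they are not the actual
# benchmark games from the source.

# Hidden rule: a play scores its face value, EXCEPT that 7 always
# scores zero. The exception appears only once, so inference must stay
# tentative and defeasible rather than overgeneralize "score = value".
OBSERVATIONS = [
    ("play 3", 3),
    ("play 5", 5),
    ("play 7", 0),  # the lone exception
    ("play 2", 2),
]

def build_prompt(query_move: str) -> str:
    """Render the observations and ask for the inferred rule."""
    shown = "\n".join(f"{move} -> score {score}" for move, score in OBSERVATIONS)
    return (
        "You observe these plays in an unfamiliar game:\n"
        f"{shown}\n"
        f"What does '{query_move}' score? State the rule you infer."
    )

# The failure mode above: long CoT chains tend to rationalize around
# "play 7 -> score 0" (e.g., inventing digit arithmetic) instead of
# inferring the minimal rule "face value, except 7 scores 0".
print(build_prompt("play 7"))
```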
The pattern: Training for deductive, step-by-step reasoning improves that specific skill while degrading adjacent cognitive capabilities — the ability to disengage, the ability to remain tentative, the ability to recognize an exception rather than rationalize around it.
The implication: Reasoning models have a narrower cognitive profile than their benchmark performance suggests. The benchmarks are in-distribution, CoT-suited tasks. The real-world distribution also contains ill-posed questions, hidden rules, and problems where the correct response is to stop thinking.
Source: Reasoning Critiques
Related concepts in this collection
- Why do reasoning models overthink ill-posed questions? (the first failure mode)
  Explores why models trained for extended reasoning produce drastically longer, less useful responses to unanswerable questions, and whether this represents a fixable training deficit or an inherent limitation.
- Why do reasoning models fail at exception-based rule inference? (the second failure mode)
  Explores why chain-of-thought models systematically underperform on tasks requiring inductive rule inference from exceptions in game-based settings, despite excelling at normal rule patterns.
- When does explicit reasoning actually help model performance? (the existing note establishing the first evidence for this pattern)
  Explicit reasoning improves some tasks but hurts others. What determines whether step-by-step reasoning chains are beneficial or harmful for a given problem?
- Does extended thinking help or hurt model reasoning? (proof that the critical thinking deficit is partly reversible)
  Explores whether activating thinking mode improves reasoning performance, and what role training plays in determining whether extended internal reasoning chains are productive or counterproductive. RL training can redirect extended thinking from counterproductive self-doubt toward productive gap analysis; the mechanism flips from harmful to helpful, but only for the specific capability trained.
- Can models learn to ask clarifying questions instead of guessing? (the trainable solution to failure mode 1; a reward-shaping sketch follows this list)
  Explores whether large language models can be trained to detect incomplete queries and actively request missing information rather than hallucinating answers or refusing to respond. This matters because conversational agents today remain passive, responding only when prompted. RL training raises missing-information detection from 0.15% to 73.98%; the critical thinking deficit is not fundamental but a consequence of what gets trained.
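Below is a minimal sketch of the kind of reward shaping such RL training could use, assuming a verifier that labels queries as answerable and detects clarifying questions; the reward magnitudes and the boolean verifier interface are illustrative assumptions, not the method from the source.

```python
# Sketch: reward shaping for training a model to prefer clarifying
# questions on incomplete queries. The reward magnitudes and the
# boolean verifier interface are assumptions for illustration; they
# are not the actual training setup from the source.

def clarify_reward(query_answerable: bool,
                   asked_clarifying: bool,
                   answer_correct: bool = False) -> float:
    """Score one episode of (query, model response)."""
    if not query_answerable:
        # On a missing-premise query, the right move is to ask for the
        # missing information, not to guess or to overthink.
        return 1.0 if asked_clarifying else -1.0
    if asked_clarifying:
        # Asking on an answerable query is a mild stall, penalized
        # lightly so the model does not learn to always ask.
        return -0.2
    return 1.0 if answer_correct else 0.0

# Example episodes: the incentive rewards detecting missing premises.
print(clarify_reward(query_answerable=False, asked_clarifying=True))   # 1.0
print(clarify_reward(query_answerable=True, asked_clarifying=False,
                     answer_correct=True))                             # 1.0
```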
Original note title: the critical thinking problem — what reasoning models sacrifice when trained to think step by step