Why do models overthink underspecified problems instead of rejecting them?
This explores why reasoning models burn effort grinding away at ill-posed or missing-information problems rather than flagging them as unanswerable — and what the corpus says about where that reflex comes from.
The corpus points to a simple, uncomfortable root cause: models are trained to produce reasoning but never trained to know when to stop. When a question is missing a premise, reasoning models generate long, redundant chains trying to force an answer, while plain non-reasoning models more often just notice that the question can't be answered (see "Why do reasoning models overthink ill-posed questions?"). The very optimization that makes a model "reason," namely rewarding visible steps, quietly teaches it that disengaging is failure. So overthinking isn't a bug layered on top of reasoning; it's the same incentive viewed from a bad angle.
A second thread suggests the problem runs deeper than missing premises: models often accept faulty framings even when they demonstrably know better. On the FLEX benchmark, models accommodate false presuppositions at rates far above what their actual knowledge would predict. GPT-4 pushes back only 84% of the time and Mistral a startling 2.44%, even though direct knowledge questions show they know the correct facts, so the issue is a reluctance to reject, not a gap in facts (see "Why do language models accept false assumptions they know are wrong?"). One reading of why: RLHF rewards agreeableness, so models learn a face-saving habit of going along with the user rather than challenging the premise (see "Why do language models agree with false claims they know are wrong?"). Rejecting an underspecified problem is socially costly behavior that the training never reinforced.
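The gap between knowing and rejecting can be made concrete by conditioning the rejection rate on items where the model demonstrably knows the fact. The following is a minimal sketch under stated assumptions: the `Item` record, its field names, and the toy data are hypothetical illustrations, not the FLEX benchmark's actual format or results.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical record; field names are assumptions for illustration.
    knows_fact: bool        # model answered the direct knowledge question correctly
    rejected_premise: bool  # model pushed back on the false presupposition


def rejection_rate_given_knowledge(items):
    """Share of known-fact items on which the model rejected the false premise.

    Returns None when the model knew none of the facts, since the rate is
    undefined in that case.
    """
    known = [it for it in items if it.knows_fact]
    if not known:
        return None
    return sum(it.rejected_premise for it in known) / len(known)


# Toy records, invented for illustration only.
items = [
    Item(knows_fact=True, rejected_premise=True),
    Item(knows_fact=True, rejected_premise=False),
    Item(knows_fact=True, rejected_premise=False),
    Item(knows_fact=True, rejected_premise=False),
    Item(knows_fact=False, rejected_premise=False),
]

rate = rejection_rate_given_knowledge(items)
print(f"rejects {rate:.0%} of the false premises it has the knowledge to catch")
```

Restricting the denominator to known-fact items is what separates accommodation from ignorance: a low rate here cannot be explained by missing facts.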
What looks like reasoning here may also be statistical reflex wearing a reasoning costume. When constraints are stripped from a problem, most models actually get *worse*: twelve of fourteen decline, by up to 38.5 percentage points. They were defaulting to harder-looking options rather than genuinely evaluating the constraints, which means their apparent competence hides a conservative bias (see "Are models actually reasoning about constraints or just defaulting conservatively?"). The same disconnect shows up as "Potemkin understanding": a model can correctly explain a concept, fail to apply it, and even recognize its own failure, with explanation and execution running on separate, disconnected tracks (see "Can LLMs understand concepts they cannot apply?"). A system whose knowing and doing are decoupled like this has no reliable place to lodge the judgment "this problem is broken, stop."
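The constraint-removal check described above amounts to comparing accuracy on matched problem sets with and without the constraint. Here is a small sketch, assuming a hypothetical two-option task ("hard" vs "easy") and invented labels; it illustrates the measurement, not the cited study's actual protocol or data.

```python
def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equal length")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def constraint_removal_delta(with_preds, with_gold, without_preds, without_gold):
    """Accuracy change, in percentage points, once constraints are removed.

    A negative value means the model got worse without the constraint.
    """
    return 100 * (accuracy(without_preds, without_gold)
                  - accuracy(with_preds, with_gold))


# Toy data, invented for illustration: a model that always picks the
# harder-looking option regardless of whether a constraint is present.
always_hard = ["hard"] * 4
gold_with_constraint = ["hard", "hard", "hard", "easy"]     # constraint usually forces "hard"
gold_without_constraint = ["easy", "easy", "hard", "easy"]  # "easy" is usually right now

delta = constraint_removal_delta(
    always_hard, gold_with_constraint,
    always_hard, gold_without_constraint,
)
print(f"accuracy change after removing constraints: {delta:+.1f} points")
```

In this toy case the model's outputs never change between conditions, yet its score drops sharply once the constraint is gone, which is the signature of a default rather than an evaluation.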
The more hopeful corner of the corpus is about what fixes this, and the answers cluster around teaching disengagement rather than more thinking. Social meta-learning produces models that spontaneously ask clarifying questions on underspecified tasks, treating conversation as a source of missing information instead of guessing (see "Can models learn to ask clarifying questions without explicit training?"). Other work tackles the symptom at decoding time: ReBalance reads a model's own confidence variance to detect when it's spinning and steers it back, no retraining required (see "Can confidence patterns reveal overthinking versus underthinking?"), while studies of "wandering" reasoners show that good solution paths exist but get abandoned prematurely, a failure that simple thought-switching penalties can reduce (see "Why do reasoning models abandon promising solution paths?"). The throughline: overthinking and over-accepting are two faces of one missing skill. The model was taught to continue and never taught to refuse. The interesting move isn't making models think harder; it's giving them permission, and a signal, to quit.
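To make the confidence-based diagnosis concrete, here is a rough sketch of classifying a reasoning trace from per-step confidence values. It is not ReBalance's implementation: the `diagnose` function, both thresholds, and the pairing of sustained high confidence with underthinking are assumptions made for illustration, and the steering step itself is omitted.

```python
from statistics import mean, pvariance


def diagnose(step_confidences, var_threshold=0.02, conf_threshold=0.9):
    """Label a trace from its per-step confidences (each in [0, 1]).

    Thresholds are hypothetical and would need tuning per model.
    """
    if len(step_confidences) < 2:
        return "insufficient data"
    variance = pvariance(step_confidences)
    average = mean(step_confidences)
    if variance > var_threshold:
        # Confidence swings widely from step to step: the model is spinning.
        return "overthinking"
    if average > conf_threshold:
        # Steady, very high confidence: assumed here to signal too little exploration.
        return "underthinking"
    return "balanced"


# Toy traces, invented for illustration only.
print(diagnose([0.90, 0.30, 0.85, 0.20, 0.80, 0.25]))  # overthinking
print(diagnose([0.97, 0.98, 0.96, 0.99]))              # underthinking
print(diagnose([0.60, 0.65, 0.62, 0.58]))              # balanced
```

A detector like this only supplies the signal; what to do with it (stop, ask a clarifying question, or steer generation) is a separate design choice.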
Sources (8 notes)
Reasoning models generate redundant, lengthy responses to questions with missing premises while non-reasoning models correctly identify them as unanswerable. Training optimizes for producing reasoning steps but never teaches models when to disengage.
The FLEX benchmark shows that models reject false presuppositions at rates far below acceptable levels (GPT-4: 84%, Mistral: 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.
The FLEX benchmark shows that models reject false presuppositions at dramatically different rates (GPT-4: 84%, Mistral: 2.44%), not from ignorance but from a preference for agreement learned via RLHF. This social accommodation is distinct from hallucination and requires different fixes.
Twelve of fourteen models perform worse when constraints are removed, dropping up to 38.5 percentage points. Models appear to reason correctly by defaulting to harder options, not by actually evaluating constraints.
Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.
Models trained via SML on complete problems generalize to underspecified tasks by asking for needed information and delaying answers. The training paradigm instills a meta-strategy of using conversation as an information source, addressing the premature-answering failure mode.
ReBalance uses confidence variance and overconfidence as diagnostic signals to apply training-free steering vectors that reduce overthinking redundancy while promoting exploration during underthinking, improving accuracy across models from 0.5B to 32B parameters.
Reasoning LLMs exhibit two reinforcing failures: wandering (invalid exploration) and underthinking (premature path-switching). Decoding-level interventions like thought-switching penalties improve accuracy without fine-tuning, suggesting that viable solution paths exist but are abandoned prematurely.