Why do reasoning models fail at learning hidden rules from sparse exceptions?

This explores why models built to 'reason' (chain-of-thought, long deliberation) do worse at inferring a hidden rule when the evidence is mostly confirming cases plus a few telling exceptions — and the corpus suggests the reasoning machinery itself is the culprit, not a lack of capability.

The most direct answer in the collection is counterintuitive: the very chain-of-thought that is supposed to help is what hurts. Across four game-based tasks, reasoning models scored below 25% on exception-based rules while plain non-reasoning models hit 55–65% (see: Why do reasoning models fail at exception-based rule inference?). The extended deliberation invites math overuse, overgeneralization, and hallucinated constraints, exactly the moves that bury the negative evidence a rare exception is trying to teach. In this setting, more thinking produces more confident wrongness.
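To make "mostly confirming cases plus a few telling exceptions" concrete, here is a minimal toy sketch. It is not taken from the cited tasks: the hidden rule, the candidate hypotheses, and the observation list are all invented for illustration. It shows how a hypothesis that ignores the exceptions can still fit most of the evidence, while a strict consistency check rejects it on a single counterexample.

```python
from typing import Callable, Dict, List, Tuple

Hypothesis = Callable[[int], bool]

# Hypothetical hidden rule (an assumption for this sketch):
# accept even numbers, EXCEPT multiples of 10.
def hidden_rule(n: int) -> bool:
    return n % 2 == 0 and n % 10 != 0

# Two invented candidate hypotheses a learner might entertain.
HYPOTHESES: Dict[str, Hypothesis] = {
    "even": lambda n: n % 2 == 0,
    "even_not_multiple_of_10": lambda n: n % 2 == 0 and n % 10 != 0,
}

def observations() -> List[Tuple[int, bool]]:
    # Mostly confirming cases, plus two sparse exceptions (10 and 20).
    numbers = [2, 4, 6, 8, 12, 14, 16, 18, 3, 7, 10, 20]
    return [(n, hidden_rule(n)) for n in numbers]

def accuracy(h: Hypothesis, data: List[Tuple[int, bool]]) -> float:
    # Fraction of observations the hypothesis labels correctly.
    return sum(h(n) == label for n, label in data) / len(data)

def consistent(h: Hypothesis, data: List[Tuple[int, bool]]) -> bool:
    # Treat every exception as falsifying: one miss rejects the hypothesis.
    return all(h(n) == label for n, label in data)

if __name__ == "__main__":
    data = observations()
    for name, h in HYPOTHESES.items():
        print(name, round(accuracy(h, data), 3), consistent(h, data))
```

In this toy setup the overgeneralized "even" hypothesis matches 10 of the 12 observations, so a learner that scores hypotheses by overall fit has little pressure to revise it. Only the consistency check, which treats the two exceptions as negative evidence, separates it from the correct rule.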

Why would more reasoning make exception-handling worse? A few notes converge on a shared mechanism. Models tend to fit the specific instances they've seen rather than extracting a transferable rule, so they succeed on familiar-looking cases and break at the boundary of novelty rather than at any 'complexity' threshold (see: Do language models fail at reasoning due to complexity or novelty?). A sparse exception is, almost by definition, an unfamiliar instance, and worse, it's a *negative* signal. The frame-problem work shows the same blind spot from another angle: models fail not from missing knowledge but from not bringing the relevant background constraint forward, and simply forcing them to enumerate preconditions lifts accuracy from 30% to 85% (see: Do language models fail at identifying unstated preconditions?). An exception is precisely an unstated precondition the model never promotes to 'relevant.'
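One way to picture "forcing them to enumerate preconditions" is a prompt wrapper that makes the model list background conditions before it answers. The notes do not give the wording used in the cited work, so the function name `build_precondition_prompt`, its parameters, and the step text below are hypothetical, offered only as a sketch of the idea.

```python
def build_precondition_prompt(task: str, n_preconditions: int = 3) -> str:
    """Wrap a task so unstated preconditions are listed before any answer.

    The step wording is an assumption for illustration, not the prompt
    used in the cited study.
    """
    steps = [
        f"Task: {task}",
        f"Step 1: List at least {n_preconditions} background conditions "
        "that must hold for any answer to be valid.",
        "Step 2: For each condition, say whether the task description "
        "confirms it, violates it, or leaves it open.",
        "Step 3: Only then give your answer, and name every condition "
        "it depends on.",
    ]
    return "\n".join(steps)


if __name__ == "__main__":
    print(build_precondition_prompt("Decide whether 20 is accepted by the hidden rule."))
```

The design choice is to move the exception-relevant constraint into the visible text before the answer is produced, rather than hoping the model brings it forward on its own.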

Underneath this sits a deeper claim worth knowing: these models reason by semantic association, not symbolic logic. When you strip the familiar semantics out of a task, performance collapses even when the correct rule is sitting right there in the prompt (see: Do large language models reason symbolically or semantically?). A hidden rule defined by exceptions demands exactly the symbolic, 'this case violates the pattern, therefore revise' move that token-association reasoning doesn't natively do. The same surface-vs-structure gap shows up in language itself, where models capture surface patterns but miss deep grammatical rules, degrading predictably as structure deepens (see: Why do large language models fail at complex linguistic tasks?).

The collection also offers a useful lateral tension: not everyone agrees the failure is 'reasoning' at all. One line of work argues that apparent reasoning collapses are really *execution* failures: models that know the algorithm can't carry out the steps at scale in text, and tool access dissolves the supposed cliff (see: Are reasoning model collapses really failures of reasoning?). Another finds that models trained on systematically irrelevant reasoning traces keep their solution accuracy, implying the traces are computational scaffolding rather than genuine inference (see: Do reasoning traces need to be semantically correct?). Read together with the exploration work showing models wander and abandon promising paths prematurely (see: Why do reasoning models abandon promising solution paths?), a picture emerges: the reasoning chain is less a logical search than a fluent performance, which is fine until a rare exception requires the model to actually backtrack and overturn the pattern it has committed to.

A further connection is 'Potemkin understanding,' where a model can correctly *explain* a concept, fail to *apply* it, and even *recognize* its own failure. No human shows this combination, which points to explanation and execution running on disconnected pathways (see: Can LLMs understand concepts they cannot apply?). That's the heart of the exception problem. A model can articulate 'I should watch for cases that break the rule' and still steamroll the one case that breaks it, because the part that talks about reasoning and the part that does it aren't the same system. Sparse exceptions are simply the cleanest place to watch those two pathways come apart.

Sources (9 notes)

Why do reasoning models fail at exception-based rule inference?

Across four game-based tasks, reasoning models scored below 25% on exception rules versus 55–65% for non-reasoning models. Chain-of-thought introduces math overuse, overgeneralization, and hallucinated constraints that amplify errors in negative evidence recognition.

Do language models fail at reasoning due to complexity or novelty?

Large reasoning models (LRMs) break not at complexity thresholds but at instance-novelty boundaries. They fit instance-based patterns rather than generalizable algorithms, so a reasoning chain of any length can succeed when the model was trained on similar instances.

Do language models fail at identifying unstated preconditions?

LLMs struggle not from lacking world knowledge but from failing to bring background conditions forward as relevant constraints. Prompting that forces explicit enumeration of preconditions raises accuracy from 30% to 85%, revealing the frame problem persists in statistical systems.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Why do large language models fail at complex linguistic tasks?

Top-tier LLMs like Llama3-70b consistently misidentify embedded clauses, verb phrases, and complex nominals. Performance degrades predictably as syntactic depth increases, revealing that statistical learning captures surface patterns but not deep grammatical rules.

Are reasoning model collapses really failures of reasoning?

Models confined to text-only generation cannot execute multi-step procedures at scale, even when they know the underlying algorithm. Tool-enabled models solve problems beyond the supposed reasoning cliff, suggesting the bottleneck is procedural execution bandwidth.

Do reasoning traces need to be semantically correct?

Models trained on systematically irrelevant traces maintain solution accuracy and sometimes improve out-of-distribution generalization, suggesting traces function as computational scaffolding rather than meaningful reasoning steps.

Why do reasoning models abandon promising solution paths?

Reasoning LLMs exhibit two reinforcing failures: wandering (invalid exploration) and underthinking (premature path-switching). Decoding-level interventions like thought-switching penalties improve accuracy without fine-tuning, suggesting viable solutions exist but are abandoned prematurely.

Can LLMs understand concepts they cannot apply?

Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.
