INQUIRING LINE

Why do non-reasoning models work better under extreme decomposition than reasoning models?

This explores why, when a task is sliced into many tiny well-posed subproblems, a plain model can beat a 'thinking' model — the corpus suggests the reasoning protocol becomes dead weight once the hard planning has already been done externally.


This reads the question as: under extreme decomposition, each subtask is small and clearly specified, and that is exactly the regime where a reasoning model's trained-in habits become liabilities. The clearest mechanism is that reasoning models are trained to produce reasoning steps but never taught when to disengage. When a subproblem is trivial or even ill-posed, they keep generating. On questions with missing premises, reasoning models produce long, redundant chains where a non-reasoning model simply flags the question as unanswerable (Why do reasoning models overthink ill-posed questions?). Decomposition multiplies these easy subproblems, so it multiplies the overthinking tax.
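
A toy cost model makes the multiplication concrete. It is a sketch under one loud assumption: a reasoning model spends a roughly fixed block of reasoning tokens on every call, however easy the call is. The names (`CallCost`, `pipeline_tokens`, `overthinking_tax`) are hypothetical, and the 20- and 300-token figures are invented for illustration, not measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallCost:
    """Token cost of one model call on one leaf subproblem (illustrative units)."""
    work_tokens: int      # tokens that actually carry the answer
    overhead_tokens: int  # reasoning tokens emitted whether or not the leaf needs them


def pipeline_tokens(num_calls: int, cost: CallCost) -> int:
    """Total tokens when every leaf is a separate call that pays its own overhead."""
    return num_calls * (cost.work_tokens + cost.overhead_tokens)


def overthinking_tax(num_calls: int, plain: CallCost, reasoning: CallCost) -> int:
    """Extra tokens the reasoning model spends over the plain model on the same leaves."""
    return pipeline_tokens(num_calls, reasoning) - pipeline_tokens(num_calls, plain)


# ASSUMPTION: same useful work per leaf, fixed per-call overhead; numbers are invented.
PLAIN = CallCost(work_tokens=20, overhead_tokens=0)
REASONING = CallCost(work_tokens=20, overhead_tokens=300)

if __name__ == "__main__":
    for leaves in (1, 10, 100):
        print(f"{leaves:>3} leaves -> {overthinking_tax(leaves, PLAIN, REASONING)} extra tokens")
```

Under these made-up numbers the tax is 300 tokens for one leaf and 30,000 for a hundred leaves. The per-call overhead never changes; decomposition only multiplies how many times it is paid.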

The second mechanism is that 'more thinking' often isn't more computing. On constraint-bound numerical work such as optimal power flow, extended chain-of-thought produces more text but not more iterative computation, and reasoning variants show no consistent edge over standard models (Do reasoning models actually beat standard models on optimization?). Relatedly, when models collapse on long procedures, the bottleneck turns out to be execution bandwidth rather than reasoning: text-only models cannot carry out multi-step procedures at scale even when they know the algorithm, while tool-enabled models get past the supposed cliff (Are reasoning model collapses really failures of reasoning?). Extreme decomposition takes over precisely the part reasoning helps with, the planning, by handing each fragment to the solver pre-carved. What's left is execution, where the reasoning protocol adds tokens, not accuracy. Many of those tokens are decorative anyway: Chain of Draft matches standard chain-of-thought accuracy on arithmetic, symbolic, and commonsense tasks using only 7.6% of the tokens, meaning the other 92.4% of a verbose chain served style and documentation, not computation (Can minimal reasoning chains match full explanations?).

The deepest framing comes from work separating the decomposer from the solver (Does separating planning from execution improve reasoning accuracy?): decomposition and solving are different skills, since decomposition ability transfers across domains while solving ability does not, and keeping them in separate models prevents planning-execution interference. Extreme decomposition is that separation pushed to its limit. The orchestration layer becomes the 'reasoner,' so a reasoning model at the leaf level does redundant planning on a problem that no longer needs it, and its inclination to re-plan every fragment becomes interference rather than help.
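
The decomposer/solver split can be sketched as a two-stage pipeline. This is a minimal illustration under stated assumptions: `run_pipeline`, `toy_decompose`, and `toy_solve` are hypothetical names, and the toy functions are plain Python stand-ins for what would be two separate model calls. The point is structural: planning happens once, in the orchestration layer, and the solver only ever receives a pre-carved fragment.

```python
from typing import Callable, List

Subtask = str
Decomposer = Callable[[str], List[Subtask]]
Solver = Callable[[Subtask], str]


def run_pipeline(task: str, decompose: Decomposer, solve: Solver) -> List[str]:
    """Plan once with the decomposer, then hand each fragment to the solver.

    The solver never sees the full task, so it has nothing left to plan:
    it only executes the fragment it is given.
    """
    plan = decompose(task)
    return [solve(step) for step in plan]


# Hypothetical stand-ins for the two models; a real system would call LLMs here.
def toy_decompose(task: str) -> List[Subtask]:
    # Assumed task format for illustration: "square each of 3,4,5"
    numbers = task.rsplit(" ", 1)[-1].split(",")
    return [f"square {n}" for n in numbers]


def toy_solve(subtask: Subtask) -> str:
    # Each subtask is tiny and fully specified: "square <int>"
    n = int(subtask.split()[1])
    return str(n * n)


if __name__ == "__main__":
    print(run_pipeline("square each of 3,4,5", toy_decompose, toy_solve))
```

Swapping `toy_solve` for a reasoning model would not change the plan; it would only add tokens inside each `solve` call, which is the redundancy the paragraph describes.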

The honest tension is the finding that reasoning models persistently beat non-reasoning models regardless of inference budget, because training instills a protocol that makes their extra tokens productive (Can non-reasoning models catch up with more compute?). That isn't a contradiction; it's the boundary condition. Reasoning's advantage shows up when the problem demands integrated, multi-step thinking held in one head. Decompose that away and you've removed the very thing the training was good for, leaving only overhead. This is also why routing approaches like Thinkless (Can models learn when to think versus respond quickly?) matter: the real win isn't 'reasoning' or 'no reasoning' but knowing which fragments deserve thought.
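
Per-fragment routing can be sketched as a dispatch step in front of two answer modes. This is an assumption-laden illustration, not Thinkless itself: Thinkless learns the mode choice inside a single model via DeGRPO, whereas here `choose_mode` is a pluggable function and `length_policy` is a deliberately crude placeholder heuristic. All names (`Mode`, `route_fragments`, `length_policy`, `HANDLERS`) are hypothetical.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Mode(Enum):
    THINK = "think"    # extended reasoning chain
    DIRECT = "direct"  # short answer, no chain


def route_fragments(
    fragments: List[str],
    choose_mode: Callable[[str], Mode],
    handlers: Dict[Mode, Callable[[str], str]],
) -> List[Tuple[str, Mode, str]]:
    """Choose a mode for each fragment, then answer it with that mode's handler."""
    results = []
    for fragment in fragments:
        mode = choose_mode(fragment)
        results.append((fragment, mode, handlers[mode](fragment)))
    return results


def length_policy(fragment: str, max_direct_words: int = 6) -> Mode:
    """Placeholder policy (hypothetical): short fragments go DIRECT, long ones THINK.

    This crude heuristic is NOT how Thinkless decides; Thinkless learns the
    choice without explicit difficulty labels.
    """
    return Mode.DIRECT if len(fragment.split()) <= max_direct_words else Mode.THINK


# Stand-in handlers; a real system would call the model in each mode.
HANDLERS: Dict[Mode, Callable[[str], str]] = {
    Mode.THINK: lambda f: f"<long reasoning for: {f}>",
    Mode.DIRECT: lambda f: f"<direct answer for: {f}>",
}

if __name__ == "__main__":
    fragments = [
        "add 2 and 3",
        "plan a schedule that satisfies all of these overlapping constraints",
    ]
    for fragment, mode, answer in route_fragments(fragments, length_policy, HANDLERS):
        print(mode.value, "|", answer)
```

The dispatch structure is the part that carries over; in Thinkless the equivalent of `choose_mode` is learned by the model itself rather than written by hand.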

The thing you may not have expected to learn: the question isn't really about model quality at all. It's that decomposition relocates the reasoning out of the model and into the task structure — and once it lives there, a model that insists on reasoning anyway is solving a problem that's already been solved.


Sources (7 notes)

Why do reasoning models overthink ill-posed questions?

Reasoning models generate redundant, lengthy responses to questions with missing premises while non-reasoning models correctly identify them as unanswerable. Training optimizes for producing reasoning steps but never teaches models when to disengage.

Do reasoning models actually beat standard models on optimization?

Reasoning variants with extended CoT show no consistent advantage over standard models on constraint-bound numerical tasks like optimal power flow. Extended thinking produces more text, not more iterative computation, suggesting the bottleneck is numeric procedure rather than reasoning steps.

Are reasoning model collapses really failures of reasoning?

Models confined to text-only generation cannot execute multi-step procedures at scale, even when they know the underlying algorithm. Tool-enabled models solve problems beyond the supposed reasoning cliff, suggesting the bottleneck is procedural execution bandwidth.

Can minimal reasoning chains match full explanations?

Chain of Draft achieves equivalent accuracy to standard chain-of-thought on arithmetic, symbolic, and commonsense tasks while using only 7.6% of tokens. The 92.4% of removed tokens served style and documentation, not computation.

Does separating planning from execution improve reasoning accuracy?

Modular architectures with separate decomposer and solver models outperform monolithic LLMs, with decomposition ability transferring across domains while solving ability does not. The separation prevents planning-execution interference and produces more generalizable skills.

Can non-reasoning models catch up with more compute?

Reasoning models persistently outperform non-reasoning models regardless of inference budget because training instills a reasoning protocol that makes additional tokens productive. The gap is fundamentally about deployment mechanisms and training structure, not raw capability.

Can models learn when to think versus respond quickly?

Thinkless trains a single model to select between extended reasoning and direct responses using DeGRPO, which decouples mode selection from answer refinement. This prevents mode collapse and enables self-calibrated routing without explicit difficulty labels.
