How do reasoning training methods sacrifice some thinking skills while improving others?
This explores the trade-offs in training models to reason step-by-step — what gets better, what quietly gets worse, and why the same training that sharpens one skill can dull another.
The clearest statement of the problem is that reasoning training narrows cognitive ability while appearing to broaden it. Models get better at in-distribution logical tasks but lose the judgment to disengage from ill-posed questions, overthinking them instead, and they confidently reason their way to wrong rules on inductive problems (see "What critical thinking skills do reasoning models actually lose?"). So the sacrifice isn't random: it trades flexible judgment for procedural depth.
A big part of why this happens is structural. Knowledge and reasoning live in different places inside the network, with factual retrieval in the lower layers and reasoning adjustment in the higher ones, so training that tunes the reasoning layers can degrade knowledge-heavy domains. This is why the same reasoning training that lifts math scores can hurt medical performance (see "Why does reasoning training help math but hurt medical tasks?"). You're not adding a skill on top; you're reweighting a shared system, and knowledge recall pays part of the bill.
The other recurring sacrifice is calibration: knowing how much to think. More thinking is not better past a point. Pushing thinking tokens from ~1,100 to ~16,000 dropped accuracy from 87% to 70%, because models overthink easy problems and underthink hard ones (see "Does more thinking time always improve reasoning accuracy?"). Whether extended thinking helps at all depends on what training did to it: models without RL training use "thinking mode" to spiral into self-doubt that hurts performance, while RL training redirects that same machinery into productive gap analysis (see "Does extended thinking help or hurt model reasoning?"). And sometimes the right amount of reasoning is none. For simple questions, direct question-to-answer flow beats step-by-step prompting, so a model trained to always reason loses the ability to take the shortcut (see "Why do some questions perform better without step-by-step reasoning?").
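One way to picture the calibration problem is as a policy that decides, per question, whether to answer directly or to reason, and with how large a thinking budget. The sketch below is illustrative only and does not come from the cited notes: the `difficulty` score, the `direct_below` threshold, and the budget bounds are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThinkingPlan:
    mode: str                 # "direct" or "reason"
    max_thinking_tokens: int  # 0 when answering directly


def plan_thinking(
    difficulty: float,
    direct_below: float = 0.3,   # hypothetical cutoff for "simple" questions
    min_budget: int = 256,       # hypothetical smallest thinking budget
    max_budget: int = 4096,      # hypothetical cap on thinking tokens
) -> ThinkingPlan:
    """Map an estimated difficulty in [0, 1] to a thinking plan."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be in [0, 1]")
    if difficulty < direct_below:
        # Simple question: take the shortcut and skip step-by-step reasoning.
        return ThinkingPlan("direct", 0)
    # Harder question: scale the budget linearly, but never past the cap.
    scale = (difficulty - direct_below) / (1.0 - direct_below)
    budget = int(min_budget + scale * (max_budget - min_budget))
    return ThinkingPlan("reason", budget)
```

For example, `plan_thinking(0.1)` returns a direct plan with no thinking tokens, while `plan_thinking(1.0)` returns a reasoning plan at the capped budget. How difficulty would actually be estimated is left open here.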
Here's the reframe that makes the trade-offs feel less inevitable. A growing body of work argues that reasoning training mostly doesn't create capability; it selects and deploys what's already latent in the base model. Five independent methods all elicit reasoning that base models already contain, suggesting the bottleneck is elicitation, not acquisition (see "Do base models already contain hidden reasoning ability?"). RL in particular looks less like teaching reasoning and more like teaching *when* to use it: a hybrid model that combined base-model reasoning with thinking-model steering recovered 91% of the performance gains using only 12% of the tokens (see "Does RL teach reasoning or just when to use it?"). If that's right, then the "sacrifice" is often a deployment-policy problem: training over-applies one skill rather than destroying another. The encouraging corollary from the critical-thinking work is that the narrowing is partly reversible through targeted RL (see "What critical thinking skills do reasoning models actually lose?").
That reframing points to gentler ways to add reasoning without the collateral damage. Modular cognitive tools lifted GPT-4.1's AIME 2024 score from 26.7% to 43.3% with no RL at all, by isolating reasoning operations rather than retraining the weights (see "Can modular cognitive tools unlock reasoning without training?"). Training on backward reasoning improves forward reasoning by building in consistency-checking (see "Can backward reasoning during training improve forward reasoning?"). And planting reasoning earlier, either during pretraining via information-gain rewards or by reconstructing experts' hidden thought processes, produces skills that transfer across domains and adapt depth to difficulty, rather than locking in one rigid procedure (see "Can chain-of-thought reasoning be learned during pretraining itself?" and "Can reconstructing expert thinking improve reasoning transfer?"). The throughline: the methods that sacrifice the least are the ones that elicit and route reasoning rather than overwrite the rest of the model to install it.
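To make "isolating reasoning operations" concrete, here is a minimal sketch of the general idea, assuming each tool is a separate model call that sees only its own instruction and input rather than the full conversation. It is not the cited work's implementation: the `LLM` callable type, the tool names `restate` and `check`, and their instruction strings are all hypothetical placeholders.

```python
from typing import Callable, Dict

# Hypothetical model interface: takes a prompt string, returns a completion.
LLM = Callable[[str], str]


def make_tool(llm: LLM, instruction: str) -> Callable[[str], str]:
    """Wrap one reasoning operation as an isolated model call."""
    def tool(payload: str) -> str:
        # Build a fresh prompt from the instruction and payload only, so no
        # conversation history or other tool output leaks into this call.
        return llm(f"{instruction}\n\n{payload}")
    return tool


def build_tools(llm: LLM) -> Dict[str, Callable[[str], str]]:
    """Return a small registry of tools (names are illustrative only)."""
    return {
        "restate": make_tool(llm, "Restate the problem in your own words."),
        "check": make_tool(llm, "Check the proposed answer for errors."),
    }
```

Because every call builds its prompt from scratch, the isolation is enforced by the code structure rather than requested in a prompt, which is the property the modular approach relies on.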
Sources (11 notes)
Models trained for step-by-step reasoning excel at in-distribution logical tasks but lose critical abilities: they overthink ill-posed questions instead of disengaging, and reason their way to wrong rules on inductive tasks. This cognitive narrowing is partly reversible through targeted RL training.
Two-phase inference model shows knowledge retrieval operates in lower network layers while reasoning adjustment happens in higher layers. This separation explains why reasoning training improves math but can degrade knowledge-intensive domains like medicine.
Increasing thinking tokens from ~1,100 to ~16K reduced benchmark accuracy from 87.3% to 70.3%, revealing a non-monotonic relationship where models overthink easy problems and underthink hard ones.
Vanilla models use thinking mode counterproductively, inducing self-doubt that degrades performance. RL training reverses this, transforming the same mechanism into beneficial gap analysis. Training mediates reasoning quality, not just quantity.
Saliency analysis reveals that CoT prompting fails when question information doesn't aggregate into the prompt structure before reasoning begins. For simple questions, direct question-to-answer flow outperforms step-by-step reasoning, showing the optimal prompt depends on question type, not just task category.
Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.
Pre-training acquires reasoning capability; RL teaches efficient deployment. A hybrid model combining base reasoning with thinking model steering recovered 91% of performance gains using only 12% of tokens, suggesting RL acts as a deployment optimizer rather than a capability creator.
Four cognitive tools implemented as sandboxed LLM calls improved GPT-4.1 on AIME2024 from 26.7% to 43.3% without any RL training. Modularity enforces operation isolation that pure prompting cannot guarantee, eliciting pre-existing reasoning capability.
Training models simultaneously on forward reasoning, backward question generation, and backward reasoning improves forward-only performance by 13.53% average across 12 datasets. The mechanism: generating backward questions forces models to understand the inverse relationship between problem and solution, deepening understanding that transfers to forward reasoning without test-time overhead.
RLP treats CoT as exploratory action during pretraining, using log-likelihood improvement as verifier-free reward. Applied to Qwen3-1.7B and Nemotron-Nano-12B, the method improves math and science benchmarks substantially, suggesting reasoning can be planted earlier in training.
Training on expert texts augmented with reconstructed thought processes (self-talk, knowledge recall, verification) produces reasoning skills that transfer across domains and adapt depth to problem difficulty, outperforming standard continual pretraining by up to 8 points on hard problems.