Does internal self-revision actually degrade reasoning accuracy in models?

This explores whether a model reworking its own reasoning (self-revision, reflection, second-guessing) makes its answers worse. The corpus says the answer turns on *who* is doing the revising and *how the model was trained*, not on the act of revision itself.

The short version from the corpus: revising is not the problem; revising *yourself* is. The cleanest statement comes from work showing that the revision source determines the outcome. When an external model critiques the reasoning, accuracy improves. When a model second-guesses its own uncertain output, it usually amplifies confidence in the wrong answer instead of fixing it [Does revising your own reasoning actually help or hurt?]. Direct evidence from o1-style reasoning models backs this up. Across QwQ, R1, and LIMO, most revisions keep the wrong answer. Smaller models often flip *correct* answers to incorrect mid-revision, and longer chains with more revisions correlate with lower accuracy [Does self-revision actually improve reasoning in language models?].
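Phrases like "most revisions keep the wrong answer" and "flip correct answers to incorrect" describe transition counts. Here is a minimal sketch of how one might tally them from (answer before, answer after, gold answer) triples. The `RevisionRecord` format and the exact-string-match grading are assumptions for illustration. This is not the cited studies' evaluation code.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RevisionRecord:
    """One revision: the answer before and after revising, plus the gold answer.

    Hypothetical record format used only for this illustration.
    """
    before: str
    after: str
    gold: str


def classify(rec: RevisionRecord) -> str:
    """Label a revision by whether it started and ended correct."""
    was_right = rec.before == rec.gold
    now_right = rec.after == rec.gold
    if was_right and now_right:
        return "correct->correct"
    if was_right:
        return "correct->incorrect"   # the harmful flip
    if now_right:
        return "incorrect->correct"   # a genuine fix
    return "incorrect->incorrect"     # error kept (or swapped for another error)


def revision_profile(records):
    """Fraction of revisions falling into each transition category."""
    counts = Counter(classify(r) for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: counts[label] / total for label in sorted(counts)}


# Toy data, not results from any study.
records = [
    RevisionRecord(before="4", after="4", gold="4"),
    RevisionRecord(before="4", after="5", gold="4"),
    RevisionRecord(before="3", after="3", gold="4"),
    RevisionRecord(before="3", after="6", gold="4"),
]
print(revision_profile(records))
```

Exact string matching is the simplest possible grader. Real evaluations usually normalize answers first, for example by stripping formatting or comparing numerically.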

Why does this happen? Two mechanisms show up repeatedly. First, models have a built-in bias toward trusting what they themselves generated. A high-probability self-generated answer simply *feels* more correct when the same model evaluates it, so self-checking collapses into self-agreement [Why do models trust their own generated answers?]. Second, when a single model keeps arguing with its own prior reasoning, it slides into a failure mode where it grows *more* certain of its errors rather than less. The fix is diversity: debate between genuinely different models reverses the pattern and improves both accuracy and calibration [Does a model improve by arguing with itself?]. The common thread is that a model has no independent vantage point on itself; the corrective signal has to come from outside the loop.
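A toy model shows the self-agreement point. Suppose the generator picks the highest-scoring candidate and the verifier judges with the *same* scoring function. Verification then passes by construction, and only a verifier with a different scoring function can disagree. The functions and scores below are hypothetical stand-ins for real model likelihoods.

```python
def generate(candidates, score):
    """Pick the candidate the generator scores highest."""
    return max(candidates, key=score)


def verify(answer, candidates, score):
    """Accept `answer` if no candidate scores strictly higher under `score`."""
    return all(score(answer) >= score(c) for c in candidates)


candidates = ["A", "B", "C"]
# Hypothetical scores; suppose the correct answer is "B".
self_score = {"A": 0.7, "B": 0.2, "C": 0.1}.get    # the generator's own view
other_score = {"A": 0.1, "B": 0.8, "C": 0.1}.get   # an independent model's view

answer = generate(candidates, self_score)
print(answer, verify(answer, candidates, self_score), verify(answer, candidates, other_score))
# prints: A True False
```

In this toy, self-verification can never fail, whatever the scores are, because the verifier's criterion is the one the generator already maximized. Real models are less rigid than this, but the structural bias described above pushes same-model evaluation in the same direction.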

There's an even more deflating finding worth sitting with: much of what looks like self-correction isn't correction at all. An analysis of eight reasoning models found that reflection rarely changes the final answer and mostly serves as post-hoc confirmation of the first one. Training on longer reflection chains improves the *first answer's* quality, not the model's ability to fix itself [Is reflection in reasoning models actually fixing mistakes?]. In the same spirit, DeepSeek-R1 and o1-preview sound fluent while reflecting. Yet they reach only 20–23.6% exact match on 850 constraint-satisfaction problems that demand genuine backtracking. Reflective *fluency* doesn't translate into reflective *competence* [Can reasoning models actually sustain long-chain reflection?].
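One way to quantify "reflection as post-hoc confirmation" is to record the candidate answer at each reflection point and check whether later candidates ever depart from the first. This sketch assumes each trace has already been reduced to a non-empty list of extracted answers. That is a hypothetical format; extracting answers from raw reasoning text is the hard part and is not shown.

```python
def reflection_stats(traces):
    """Summarize how often reflection actually moves the answer.

    Each trace is a non-empty list of extracted answers. Element 0 is the first
    answer; later elements are the answer after each reflection (hypothetical format).
    """
    if not traces:
        return {"confirmatory": 0.0, "final_changed": 0.0}
    n = len(traces)
    # Every later answer repeats the first: reflection only confirmed it.
    confirmatory = sum(1 for t in traces if all(a == t[0] for a in t[1:]))
    # The final answer differs from the first, whatever happened in between.
    final_changed = sum(1 for t in traces if t[-1] != t[0])
    return {"confirmatory": confirmatory / n, "final_changed": final_changed / n}


# Toy traces, not data from the cited analysis.
traces = [
    ["7", "7", "7"],   # pure confirmation
    ["7", "8", "7"],   # wavers, then returns to the first answer
    ["3", "5"],        # reflection changes the final answer
    ["2"],             # no reflection at all
]
print(reflection_stats(traces))
```

A trace that wavers and then returns to its first answer counts as neither confirmatory nor changed, so the two fractions need not sum to 1. Deciding whether a changed answer was a fix or a new error still requires a gold answer.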

But here's the turn that makes this more than a 'self-revision is bad' story: the behavior is trainable. Vanilla models use extended thinking counterproductively, because it induces self-doubt that degrades performance. Yet RL training redirects that same mechanism into productive gap analysis. Training, not the act of thinking, mediates quality [Does extended thinking help or hurt model reasoning?]. Other approaches close the loop from inside in disciplined ways. Using the model's own answer-span confidence as a reward signal strengthens step-by-step reasoning while *restoring* the calibration that RLHF tends to degrade [Can model confidence work as a reward signal for reasoning?]. Post-completion learning trains genuine self-evaluation into the model during training rather than letting it improvise self-critique at inference [Can models learn to evaluate their own work during training?].
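The confidence-as-reward idea can be sketched in two steps. First, rank sampled reasoning traces by how confident the model is in the final answer span. Then turn that ranking into (preferred, rejected) pairs for preference training, with no human labels involved. This sketch assumes confidence is the geometric-mean probability of the answer-span tokens. `Trace` and the all-pairs rule are illustrative choices, not RLSF's published recipe.

```python
import math
from dataclasses import dataclass


@dataclass
class Trace:
    """A sampled reasoning trace (hypothetical container for this sketch)."""
    text: str
    answer_logprobs: list[float]  # log-probabilities of the answer-span tokens only


def answer_confidence(trace: Trace) -> float:
    """Geometric-mean probability of the answer span (an assumed confidence measure)."""
    lps = trace.answer_logprobs
    return math.exp(sum(lps) / len(lps))


def preference_pairs(traces):
    """Turn a confidence ranking into (preferred, rejected) text pairs.

    Each trace is paired with every strictly less confident trace, so no labels
    or external verifier are needed; ties produce no pair.
    """
    ranked = sorted(traces, key=answer_confidence, reverse=True)
    pairs = []
    for i, better in enumerate(ranked):
        for worse in ranked[i + 1:]:
            if answer_confidence(better) > answer_confidence(worse):
                pairs.append((better.text, worse.text))
    return pairs


traces = [
    Trace("trace a", [-0.1, -0.1]),   # confident answer
    Trace("trace b", [-2.0, -1.0]),   # unconfident answer
    Trace("trace c", [-0.5]),
]
print(preference_pairs(traces))
```

Pairing every trace with every less-confident one is only one design choice. Pairing just the most and least confident traces is a cheaper alternative.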

The thing you might not have expected to learn: even setting aside *who* revises, more revision is its own hazard. It usually means more thinking, and thinking has an optimum. Accuracy follows an inverted-U with chain length. One model dropped from 87.3% to 70.3% as thinking tokens climbed from ~1,100 to ~16K [Does more thinking time always improve reasoning accuracy?] [Why does chain of thought accuracy eventually decline with length?]. Much of that waste comes from models abandoning good reasoning paths too early. A simple decoding-time penalty on thought-switching can reduce this without retraining [Do reasoning models switch between ideas too frequently?]. So 'does self-revision degrade accuracy?' resolves into something sharper. Unguided self-revision tends to degrade it, and longer self-revision degrades it past a point. Externally guided or training-instilled revision is where the gains live.
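The thought-switching penalty works purely at decoding time. It lowers the logits of tokens that open a new line of thought (for example "Alternatively") until the current thought has run for some number of tokens. The sketch below shows that logit adjustment on plain Python lists. `alpha` (penalty strength), `beta` (penalty window), and the switch-token ids are illustrative assumptions, not the TIP paper's settings.

```python
def penalize_switch_tokens(logits, step, thought_start, switch_token_ids, alpha, beta):
    """Return a copy of next-token `logits` with thought-switching tokens penalized.

    The penalty applies only while the current thought, which began at
    position `thought_start`, is younger than `beta` tokens.
    """
    adjusted = list(logits)
    if step < thought_start + beta:
        for tok in switch_token_ids:
            adjusted[tok] -= alpha
    return adjusted


def next_thought_start(token, step, thought_start, switch_token_ids):
    """Emitting a switch token begins a new thought at the current position."""
    return step if token in switch_token_ids else thought_start


def greedy_pick(logits):
    """Index of the highest logit (greedy decoding)."""
    return max(range(len(logits)), key=logits.__getitem__)


# Toy vocabulary: 0 = "so", 1 = "Alternatively", 2 = "therefore".
SWITCH_IDS = {1}
raw = [1.0, 2.0, 0.5]

early = penalize_switch_tokens(raw, step=10, thought_start=0,
                               switch_token_ids=SWITCH_IDS, alpha=3.0, beta=600)
late = penalize_switch_tokens(raw, step=700, thought_start=0,
                              switch_token_ids=SWITCH_IDS, alpha=3.0, beta=600)
print(greedy_pick(early), greedy_pick(late))
# prints: 0 1
```

Early in a thought, the penalty flips the greedy choice away from "Alternatively". Once the window has passed, the model is free to switch. Because only logits change, the same idea could sit inside any sampling loop without touching model weights.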

Sources (12 notes)

Does revising your own reasoning actually help or hurt?

Revision guided by external models improves accuracy, but a model revising its own uncertain output typically amplifies confidence in wrong answers rather than correcting them. The revision source, not the revision act itself, determines the outcome.

Does self-revision actually improve reasoning in language models?

Evidence from QwQ, R1, and LIMO shows most revisions retain wrong answers rather than correcting them. Smaller models frequently switch correct answers to incorrect during revision, and longer chains with more revisions correlate with lower accuracy.

Why do models trust their own generated answers?

LLMs exhibit structural bias toward validating their own outputs because high-probability generated answers feel more correct during evaluation. Comparing answers against broader alternatives breaks this self-agreement loop.

Does a model improve by arguing with itself?

Models that reconsider answers based on their own previous reasoning become more confident in errors, not less. Multi-agent debate with genuinely different models reverses this pattern, improving both accuracy and calibration.

Is reflection in reasoning models actually fixing mistakes?

Analysis of 8 reasoning models shows reflections rarely change answers and primarily serve as post-hoc confirmation. Training on longer reflection chains improves first-answer quality, not self-correction capability.

Can reasoning models actually sustain long-chain reflection?

DeepSeek-R1 and o1-preview achieve only 20–23.6% exact match on 850 constraint satisfaction problems requiring genuine backtracking. This ceiling reveals that reflective reasoning fluency does not translate to actual problem-solving competence on unfamiliar instance structures.

Does extended thinking help or hurt model reasoning?

Vanilla models use thinking mode counterproductively, inducing self-doubt that degrades performance. RL training reverses this, transforming the same mechanism into beneficial gap analysis. Training mediates reasoning quality, not just quantity.

Can model confidence work as a reward signal for reasoning?

RLSF uses answer-span confidence to rank reasoning traces, creating synthetic preferences that strengthen step-by-step reasoning while reversing RLHF's calibration degradation—without requiring human labels or external verifiers.

Can models learn to evaluate their own work during training?

Post-Completion Learning exploits unused sequence space after model output to train self-assessment capabilities during training while maintaining zero inference cost. The model learns to compute its own reward functions, internalizing evaluation rather than relying on external reward models.

Does more thinking time always improve reasoning accuracy?

Increasing thinking tokens from ~1,100 to ~16K reduced benchmark accuracy from 87.3% to 70.3%, revealing a non-monotonic relationship where models overthink easy problems and underthink hard ones.

Why does chain of thought accuracy eventually decline with length?

Task accuracy peaks at intermediate CoT length, with optimal length increasing alongside task difficulty but decreasing with model capability. RL training naturally gravitates toward shorter chains as models improve, revealing that simplicity emerges from reward signals rather than explicit training.

Do reasoning models switch between ideas too frequently?

o1-like models frequently abandon reasoning paths mid-exploration, wasting tokens on incomplete approaches. A decoding-only penalty on thought-transition tokens (TIP strategy) discourages switching, improving accuracy on challenging math without model fine-tuning.
