Does revising your own reasoning actually help or hurt?
Self-revision in reasoning models often degrades accuracy, while external critique improves it. Understanding what makes revision helpful or harmful could reshape how we design systems that need to correct themselves.
The self-revision literature contains an apparent contradiction. Critique-in-the-loop approaches (AutoMathCritique, Agent-R, Meta-Reasoner) show that revision guided by step-level feedback improves actor-model performance. The reasoning-model evidence shows the opposite: self-revision degrades accuracy — more revision tokens correlate with wrong answers, and smaller models primarily switch correct answers to incorrect during revision, not the reverse. Revision helps in one literature and harms in the other.
The resolution: revision source is the determining variable, not the revision act itself.
Externally guided revision: A separate model — potentially better calibrated, trained on critique quality, operating with fresh context — evaluates the current response and provides correction signals. The actor revises against these signals. The quality of the revision is bounded by the quality of the external critic, which can be better than the actor's self-evaluation capacity.
Internally driven revision: The same model second-guesses its own output. The self-evaluation is bounded by the same uncertain capabilities that produced the uncertain output in the first place. A model that got an answer wrong does not have a reliable mechanism for knowing it got it wrong — if it did, it would not have produced the wrong answer. Internal revision therefore adds noise without a reliable correction signal.
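The two loops differ only in where the correction signal comes from, which a minimal sketch makes concrete. Everything here is a hypothetical stand-in — `model` and `critic` are placeholder callables, not any real API — and the loop shapes are the point, not the prompts.

```python
def internally_driven_revision(model, prompt, rounds=2):
    """Same-source loop: the model critiques its own draft, so the
    correction signal is bounded by the same capabilities that
    produced the draft in the first place."""
    answer = model(prompt)
    for _ in range(rounds):
        critique = model(f"Critique this answer: {answer}")
        answer = model(f"Revise given this critique: {critique}\nAnswer: {answer}")
    return answer


def externally_guided_revision(model, critic, prompt, rounds=2):
    """External-source loop: a separate critic, with fresh context,
    supplies the signal; revision quality is bounded by the critic,
    which can exceed the actor's self-evaluation capacity."""
    answer = model(prompt)
    for _ in range(rounds):
        feedback = critic(prompt, answer)
        if feedback is None:  # critic accepts the answer; stop revising
            break
        answer = model(f"Revise per feedback: {feedback}\nAnswer: {answer}")
    return answer
```

With a toy actor that confirms its own wrong draft, the internal loop never moves off the initial error, while the external loop converges once the critic flags it — the failure mode the note describes, reproduced in miniature.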
As Does a model improve by arguing with itself? argues, the mechanism of internal harm is confidence amplification rather than correction: the model does not revise toward correct answers; it revises toward more confidently stated incorrect ones. External debate prevents this by supplying genuine challenge.
The practical implication for reasoning system design: do not rely on internal revision loops. If revision is needed, provide an external critic. Do critique models improve diversity during training itself? is the training-time version of the same principle — external critique is more valuable than self-critique across both training and inference.
Source: Test Time Compute
Related concepts in this collection
- Does self-revision actually improve reasoning in language models? When o1-like models revise their own reasoning through tokens like 'Wait' or 'Alternatively', does this reflection catch and fix errors, or does it introduce new mistakes? This matters because self-revision is marketed as a key capability. (The internal-revision pole; this note explains the mechanism.)
- Does a model improve by arguing with itself? When models revise their own reasoning in response to self-generated criticism, do they converge on better answers or worse ones? And how does that compare to challenge from other models? (Extends this note: single-model self-revision amplifies confidence; the same-source problem at the multi-turn level.)
- Do critique models improve diversity during training itself? Explores whether critique integrated into the training loop, beyond test-time scoring, actively maintains solution diversity and prevents the model from converging too narrowly during iterative self-training. (Training-time analog: external critique is the fix in RL training; the same principle at a different timescale.)
- Does reflection in reasoning models actually correct errors? When reasoning models reflect on their answers, do they genuinely fix mistakes, or merely confirm what they already decided? Understanding this matters for designing better training and inference strategies. (Explains why internal revision fails: most reflection tokens are confirmatory, not evaluative — the model is not actually generating revision signals, making external critique the only path to genuine correction.)
- Why does self-correction training on offline data fail? Can language models learn to correct their own mistakes through supervised training on correction examples? This explores whether distribution mismatch and behavior collapse prevent self-correction from emerging. (SCoRe shows internal revision can work if properly trained: multi-turn online RL under the model's own error distribution converts internal revision from a harmful default into a trained capability, challenging the conclusion that external critique is the only path.)
- Can a model's partial response guide what to retrieve next? Can generation reveal implicit information needs that the original query cannot express? This explores whether using in-progress responses as retrieval signals outperforms upfront query formulation. (ITER-RETGEN is a retrieval-layer implementation of externally guided revision: instead of the same model critiquing its own output, the response is used to retrieve new external documents that guide regeneration; the external information source plays the role of the external critic.)
Original note title: revision source determines accuracy outcome — external critique-guided revision improves performance while internal self-assessment-driven revision degrades it.