Does revising your own reasoning actually help or hurt?
Self-revision in reasoning models often degrades accuracy, while external critique improves it. Understanding what makes revision helpful or harmful could reshape how we design systems that need to correct themselves.
The self-revision literature contains an apparent contradiction. Critique-in-the-loop approaches (AutoMathCritique, Agent-R, Meta-Reasoner) show that revision guided by step-level feedback improves actor-model performance. The reasoning-model evidence shows the opposite: self-revision degrades accuracy — more revision tokens correlate with wrong answers, and smaller models primarily switch correct answers to incorrect during revision, not the reverse. Revision helps in one literature and harms in the other.
The resolution: revision source is the determining variable, not the revision act itself.
Externally guided revision: A separate model — potentially better calibrated, trained on critique quality, operating with fresh context — evaluates the current response and provides correction signals. The actor revises against these signals. The quality of the revision is bounded by the quality of the external critic, which can be better than the actor's self-evaluation capacity.
Internally driven revision: The same model second-guesses its own output. The self-evaluation is bounded by the same uncertain capabilities that produced the uncertain output in the first place. A model that got an answer wrong does not have a reliable mechanism for knowing it got it wrong — if it did, it would not have produced the wrong answer. Internal revision therefore adds noise without a reliable correction signal.
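The two loops differ only in where the correction signal comes from, which a minimal sketch makes concrete. Everything here is a hypothetical stand-in — `model` and `critic` are placeholder callables, not any real API — and the loop shapes are the point, not the prompts.

```python
def internally_driven_revision(model, prompt, rounds=2):
    """Same-source loop: the model critiques its own draft, so the
    correction signal is bounded by the same capabilities that
    produced the draft in the first place."""
    answer = model(prompt)
    for _ in range(rounds):
        critique = model(f"Critique this answer: {answer}")
        answer = model(f"Revise given this critique: {critique}\nAnswer: {answer}")
    return answer


def externally_guided_revision(model, critic, prompt, rounds=2):
    """External-source loop: a separate critic, with fresh context,
    supplies the signal; revision quality is bounded by the critic,
    which can exceed the actor's self-evaluation capacity."""
    answer = model(prompt)
    for _ in range(rounds):
        feedback = critic(prompt, answer)
        if feedback is None:  # critic accepts the answer; stop revising
            break
        answer = model(f"Revise per feedback: {feedback}\nAnswer: {answer}")
    return answer
```

With a toy actor that confirms its own wrong draft, the internal loop never moves off the initial error, while the external loop converges once the critic flags it — the failure mode the note describes, reproduced in miniature.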
As Does a model improve by arguing with itself? argues, the mechanism of internal harm is confidence amplification rather than correction: the model does not revise toward correct answers; it revises toward more confidently stated incorrect ones. External debate prevents this by supplying genuine challenge.
The practical implication for reasoning system design: do not rely on internal revision loops. If revision is needed, provide an external critic. Do critique models improve diversity during training itself? is the training-time version of the same principle — external critique is more valuable than self-critique across both training and inference.
Source: Test Time Compute
Related concepts in this collection
- Does self-revision actually improve reasoning in language models? When o1-like models revise their own reasoning through tokens like 'Wait' or 'Alternatively', does this reflection catch and fix errors, or does it introduce new mistakes? This matters because self-revision is marketed as a key capability. (The internal-revision pole; this note explains the mechanism.)
- Does a model improve by arguing with itself? When models revise their own reasoning in response to self-generated criticism, do they converge on better answers or worse ones? And how does that compare to challenge from other models? (Extends this note: single-model self-revision amplifies confidence; the same-source problem at the multi-turn level.)
- Do critique models improve diversity during training itself? Explores whether critique integrated into the training loop, beyond test-time scoring, actively maintains solution diversity and prevents the model from converging too narrowly during iterative self-training. (Training-time analog: external critique is the fix in RL training; the same principle at a different timescale.)
- Does reflection in reasoning models actually correct errors? When reasoning models reflect on their answers, do they genuinely fix mistakes, or merely confirm what they already decided? Understanding this matters for designing better training and inference strategies. (Explains why internal revision fails: most reflection tokens are confirmatory, not evaluative — the model is not actually generating revision signals, making external critique the only path to genuine correction.)
- Why does self-correction training on offline data fail? Can language models learn to correct their own mistakes through supervised training on correction examples? This explores whether distribution mismatch and behavior collapse prevent self-correction from emerging. (SCoRe shows internal revision can work if properly trained: multi-turn online RL under the model's own error distribution converts internal revision from a harmful default into a trained capability, challenging the conclusion that external critique is the only path.)
- Can a model's partial response guide what to retrieve next? Can generation reveal implicit information needs that the original query cannot express? This explores whether using in-progress responses as retrieval signals outperforms upfront query formulation. (ITER-RETGEN is a retrieval-layer implementation of externally guided revision: instead of the same model critiquing its own output, the response is used to retrieve new external documents that guide regeneration; the external information source plays the role of the external critic.)
Original note title: revision source determines accuracy outcome — external critique-guided revision improves performance while internal self-assessment-driven revision degrades it.