Language Understanding and Pragmatics · LLM Reasoning and Architecture · Psychology and Social Cognition

Does a model improve by arguing with itself?

When models revise their own reasoning in response to self-generated criticism, do they converge on better answers or worse ones? And how does that compare to challenge from other models?

Note · 2026-02-21 · sourced from Argumentation
How should we allocate compute budget at inference time? How should researchers navigate LLM reasoning research?

ReConcile (multi-LLM round-table with confidence-weighted voting) isolates a failure mode that earlier work had observed but not mechanistically explained: Degeneration-of-Thought.
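The confidence-weighted voting idea can be sketched in a few lines. This is a minimal illustration of the aggregation step only; ReConcile's actual scheme rescales raw self-reported confidences before weighting, so the function name and the flat summation here are simplifying assumptions, not the paper's implementation:

```python
from collections import defaultdict

def confidence_weighted_vote(answers):
    """Aggregate (answer, confidence) pairs from several agents.

    Each agent reports an answer plus a self-assessed confidence in [0, 1];
    the group answer is the one with the highest summed confidence.
    (Illustrative sketch; ReConcile additionally recalibrates confidences.)
    """
    scores = defaultdict(float)
    for answer, confidence in answers:
        scores[answer] += confidence
    return max(scores, key=scores.get)

# Three agents: the two "B" votes outweigh the lone "A" despite its majority of one agent.
votes = [("B", 0.9), ("A", 0.6), ("B", 0.7)]
print(confidence_weighted_vote(votes))  # → B
```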

The pattern: when a model is asked to reconsider its answer in response to a challenge from itself — its own previous reasoning reframed as external criticism — it doesn't maintain its position or improve it. It capitulates. And crucially, it does so with increasing confidence. The model ends more certain of the wrong answer than it was before self-revision began.

This is worse than no revision at all. Single-model self-reflection degrades not just accuracy but calibration. The model convinces itself.
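The calibration claim can be made concrete with a Brier score on hypothetical numbers: for the same wrong answer, raising confidence from 0.6 to 0.9 makes the calibration penalty worse, which is exactly the self-revision failure described above. The numbers are illustrative, not from any cited experiment:

```python
def brier_score(confidence, correct):
    """Squared error between stated confidence and the 0/1 outcome; lower is better."""
    return (confidence - (1.0 if correct else 0.0)) ** 2

# Hypothetical: the model gives the same wrong answer before and after self-revision,
# but revision has inflated its confidence.
before = brier_score(0.6, correct=False)  # ≈ 0.36
after = brier_score(0.9, correct=False)   # ≈ 0.81 — more confident, worse calibrated
print(before, after)
```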

The contrast with multi-agent debate is sharp. When diverse models challenge each other's reasoning, accuracy improves. The same model that capitulates to its own previous reasoning holds up better when genuinely different reasoning challenges it. The diversity of the external challenge is load-bearing — homogeneous multi-agent systems (same model, multiple instances) degrade similarly to self-revision.

The mechanism: self-revision exposes the model to its own rhetorical patterns. It finds its own argument familiar and well framed, the same cues it reads as confidence signals in external arguments, and so mistakes stylistic familiarity for correctness. Diverse multi-agent debate introduces framing and vocabulary the model did not generate, which it must evaluate on logical rather than stylistic grounds.

This sits alongside Does self-revision actually improve reasoning in language models? but adds the contrastive finding: self-revision degrades, diverse debate improves. The key variable is not the number of revision steps but the source of the challenge. Why does parallel reasoning outperform single chain thinking? maps the same pattern at the token level; here it plays out at the agent level, where parallel diversity beats sequential revision.

The implication: "self-reflection" as a prompting technique is not a universal improvement. It is specifically harmful when the model is the only source of disagreement. Genuine improvement requires external diversity — either multiple distinct models or structured dissent mechanisms.

Three root causes of DoT (from Arxiv/Agents Multi, MAD framework): the Multi-Agent Debate paper identifies three specific causes of Degeneration-of-Thought:

(1) Bias and distorted perception: self-perception is influenced by biases and preconceived notions learned from pretraining data, leading to instinctively inaccurate conclusions.

(2) Rigidity and resistance to change: the model holds rigid beliefs and struggles to engage in self-reflection that challenges its assumptions.

(3) Limited external feedback: self-reflection is purely internal, so it misses the alternative viewpoints and blind spots that external feedback provides.

Multi-agent debate is explicitly framed as an "encouragement of divergent thinking": it creates the external pressure that breaks rigidity and supplies the feedback loop that self-reflection lacks. The three causes map to three failure dimensions: epistemic (biased priors), motivational (change resistance), and architectural (no external signal).

Society of Minds foundation (Du et al.): The Du et al. "Improving Factuality and Reasoning through Multiagent Debate" paper provides the foundational empirical grounding and the "Society of Mind" framing (after Minsky). In their setup, multiple model instances individually propose responses, then each reads and critiques all others' responses and updates its own answer over multiple rounds. The key structural element: each agent must construct an answer consistent with both its internal critic AND sensible peer assessments — dual coherence requirements that single-model self-revision lacks. This paper documents significant gains in mathematical and strategic reasoning across multiple tasks, and was an early demonstration that diverse external challenge is load-bearing for reasoning improvement.
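The Du et al. propose-critique-update loop can be sketched as follows. The agent interface (a callable taking the question and the peers' previous answers) and the helper names are illustrative assumptions for this note, not the paper's API; the real system prompts an LLM with peers' full responses rather than bare answers:

```python
def debate(agents, question, rounds=2):
    """Sketch of a multi-round multiagent debate (Du et al.-style structure).

    Round 0: each agent answers independently. In each later round, every
    agent re-answers after reading all peers' previous answers. The final
    answer is the majority vote over the last round.
    """
    answers = [agent(question, []) for agent in agents]  # independent proposals
    for _ in range(rounds):
        answers = [
            agent(question, answers[:i] + answers[i + 1:])  # peers' answers only
            for i, agent in enumerate(agents)
        ]
    return max(set(answers), key=answers.count)  # majority vote

# Toy agents (hypothetical): one holds a fixed answer, one defers to the peer majority.
def stubborn(ans):
    return lambda question, peers: ans

def conformist(ans):
    def agent(question, peers):
        return max(set(peers), key=peers.count) if peers else ans
    return agent

agents = [stubborn("4"), stubborn("4"), conformist("5")]
print(debate(agents, "2+2?"))  # → 4 (the dissenter updates after seeing peers)
```

The structural point the note makes is visible even in this toy: the dissenting agent only changes its answer because it receives signals it did not generate itself.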




Degeneration-of-Thought is a distinct failure mode where single-model self-revision amplifies confidence in wrong answers while multi-agent debate prevents it.