Why do error avalanches accelerate in self-training loops without verification?

This explores why, when a model trains on its own outputs with no external check on correctness, small errors don't just persist but compound faster and faster.

The corpus points to a single root cause with several reinforcing mechanisms. The headline finding is blunt: errors don't accumulate linearly. They avalanche, compounding round over round, often stalling improvement within just two or three self-training iterations, and the loop settles at an error floor set by the quality of verification rather than by the model's actual capability (see "How quickly do errors compound during model self-training?"). The acceleration is the point: each round trains on data dirtier than the last, so the next round's outputs are dirtier still.
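The compounding described above can be made concrete with a toy recurrence. This is a minimal sketch, not a model taken from the corpus: the function names, the `amplification` factor (how strongly the next model magnifies errors present in its training data), the `fresh` error term, and the verifier `recall` are all illustrative assumptions.

```python
def filtered_error(error_rate, recall):
    """Fraction of wrong samples left after a verifier removes `recall` of them.

    Assumption: the verifier never rejects correct samples.
    """
    wrong_kept = error_rate * (1.0 - recall)
    right_kept = 1.0 - error_rate
    total = wrong_kept + right_kept
    return wrong_kept / total if total > 0 else 0.0


def simulate(initial_error, rounds, recall, amplification=1.8, fresh=0.01):
    """Toy self-training loop: each round trains on the previous round's outputs.

    Hypothetical dynamics: the next error rate is the (filtered) data error
    scaled by `amplification`, plus a small amount of `fresh` error, capped at 1.
    """
    errors = [initial_error]
    for _ in range(rounds):
        data_error = filtered_error(errors[-1], recall)
        errors.append(min(1.0, amplification * data_error + fresh))
    return errors


if __name__ == "__main__":
    print("no verifier:  ", simulate(0.05, 5, recall=0.0))
    print("90% verifier: ", simulate(0.05, 5, recall=0.9))
```

Under these assumed parameters, the unverified run takes larger steps each round until it saturates, while the verified run settles at a low floor whose height depends mainly on `recall`, which mirrors the claim that the floor is set by verification quality.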

What makes the loop vicious is that the model is the worst possible judge of its own work. LLMs carry a structural bias toward trusting answers they generated themselves, because their own high-probability outputs simply *feel* more correct during evaluation (see "Why do models trust their own generated answers?"). So the very filter you'd hope would catch bad samples is tilted to wave them through. Worse, once errors enter the context they actively bias what comes next: prior mistakes in the history amplify future error rates non-linearly, and, strikingly, scaling the model doesn't fix it (see "Do models fail worse when their own errors fill the context?"). There's also a calibration trap underneath: training on binary correct/wrong signals rewards confident guessing, because a confident wrong answer costs no more than a hesitant one, so the model gets more sure of itself exactly as it gets more wrong (see "Does binary reward training hurt model calibration?").
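The calibration trap can be checked with a few lines of arithmetic. The sketch below assumes one particular way of adding a Brier term to a correctness reward (`correct - (confidence - correct)**2`); the source notes only say that a Brier score is added as a second reward term, so the exact form and all function names here are illustrative.

```python
def binary_reward(correct, confidence):
    """Plain correctness reward: stated confidence is ignored entirely."""
    return 1.0 if correct else 0.0


def brier_reward(correct, confidence):
    """Correctness reward minus a Brier penalty (assumed form, see lead-in)."""
    y = 1.0 if correct else 0.0
    return y - (confidence - y) ** 2


def expected_reward(reward_fn, p_correct, confidence):
    """Expected reward for a model that is right with probability `p_correct`."""
    return (p_correct * reward_fn(True, confidence)
            + (1.0 - p_correct) * reward_fn(False, confidence))


def best_confidence(reward_fn, p_correct, steps=100):
    """Grid-search the stated confidence that maximizes expected reward."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: expected_reward(reward_fn, p_correct, q))
```

With the binary reward, `expected_reward` is the same for every stated confidence, so nothing discourages claiming certainty. With the Brier term, the expected reward peaks when the stated confidence equals the true accuracy, which is why the squared penalty pushes toward calibration in this toy setting.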

The deepest framing in the corpus says this isn't a tuning problem you can engineer around; it's a formal bound. Self-improvement is limited by the generation-verification gap: every reliable fix requires *something external* to validate and enforce it, and no amount of metacognition lets a model escape this on its own (see "What stops large language models from improving themselves?"). That single idea explains why the avalanche accelerates without verification: the loop has no outside reference, so it converges on its own biases instead of on the truth.

The flip side is the genuinely useful discovery: the same loop becomes powerfully *self-correcting* the moment you bolt on a verifier. Transformers learning addition generalize from 10-digit to 100-digit problems with exponential out-of-distribution gains, not because self-training is magic, but because they generate, *filter for correctness*, and only then retrain (see "Can transformers improve exponentially by learning from their own correct solutions?"). Asymmetric self-play (SQLM) works without any human labels by using majority-vote verification as its truth signal (see "Can language models improve themselves without any external training data?"), and the Darwin Gödel Machine (DGM) improves open-endedly by replacing formal proofs of improvement with empirical benchmarking (see "Can AI systems improve themselves through trial and error?"). The variable that decides whether a loop avalanches or ascends is the same one in every case: the verifier.
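The generate, filter, retrain pattern can be sketched in a few lines. This is a toy illustration under stated assumptions, not the procedure from any of the cited notes: `self_training_round`, `make_noisy_adder`, and `exact_sum_check` are hypothetical names, the "model" is a random adder that is sometimes off by one, and the retraining step itself is left out.

```python
import random


def self_training_round(problems, generate, verify):
    """Generate one answer per problem and keep only the pairs `verify` accepts.

    The returned list is what the next round would train on.
    """
    kept = []
    for problem in problems:
        answer = generate(problem)
        if verify(problem, answer):
            kept.append((problem, answer))
    return kept


def make_noisy_adder(error_rate, seed=0):
    """Stand-in for a model: adds two integers, but is off by one
    with probability `error_rate` (hypothetical toy generator)."""
    rng = random.Random(seed)

    def generate(problem):
        a, b = problem
        total = a + b
        return total + 1 if rng.random() < error_rate else total

    return generate


def exact_sum_check(problem, answer):
    """External verifier: recompute the sum independently of the generator."""
    a, b = problem
    return answer == a + b


if __name__ == "__main__":
    problems = [(i, 2 * i) for i in range(10)]
    clean = self_training_round(problems, make_noisy_adder(0.3), exact_sum_check)
    print(f"kept {len(clean)} of {len(problems)} samples, all verified correct")
```

The design point is that `verify` does not consult the generator at all. Swapping in a check that asks the same model whether its answer looks right would reintroduce the self-trust problem the text describes.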

What's less obvious is *how cheap* that verifier can be. You don't need a bigger model; you need an external check, and the corpus is full of inexpensive ones. Asynchronous verifiers can police reasoning traces with near-zero latency on correct runs (see "Can verifiers monitor reasoning without slowing generation down?"); models can be trained to compute their own reward in unused post-output sequence space at zero inference cost (see "Can models learn to evaluate their own work during training?"); and extreme task decomposition with per-step voting lets even small non-reasoning models execute a million steps error-free by catching mistakes before they propagate (see "Can extreme task decomposition enable reliable execution at million-step scale?"). The lesson the corpus leaves you with: the avalanche isn't caused by self-training; it's caused by self-*trust*. Insert any honest external signal and the same dynamics run in reverse.
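A short calculation shows why per-step voting makes million-step runs plausible. The sketch assumes a plain fixed-size majority vote over independent votes with a single wrong alternative; that is a simplification chosen for illustration and may differ from the voting rule the cited system actually uses. The function names and the 1% per-vote error rate are likewise assumptions.

```python
from math import comb


def majority_error(per_vote_error, n_votes):
    """Probability that a strict majority of `n_votes` independent votes is wrong.

    Assumes an odd number of votes and a binary right/wrong outcome per vote.
    """
    if n_votes % 2 == 0:
        raise ValueError("n_votes must be odd so a strict majority always exists")
    needed = n_votes // 2 + 1
    return sum(
        comb(n_votes, k)
        * per_vote_error ** k
        * (1.0 - per_vote_error) ** (n_votes - k)
        for k in range(needed, n_votes + 1)
    )


def run_success(per_vote_error, n_votes, n_steps):
    """Probability that every one of `n_steps` voted steps comes out right."""
    return (1.0 - majority_error(per_vote_error, n_votes)) ** n_steps


if __name__ == "__main__":
    for votes in (1, 5, 11):
        print(votes, run_success(0.01, votes, 1_000_000))
```

Under these assumptions a single 1%-error vote per step almost never survives a million steps, while eleven votes per step succeed in the large majority of runs. The independence assumption carries the whole result, which is consistent with the source note's emphasis on flagging correlated errors.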

Sources (11 notes)

How quickly do errors compound during model self-training?

Small inaccuracies in model-generated training data amplify rapidly across iterations, degrading performance unless self-consistency checks filter outputs. The effect stalls improvement within a few steps, setting an error floor based on verification quality rather than actual capability.

Why do models trust their own generated answers?

LLMs exhibit structural bias toward validating their own outputs because high-probability generated answers feel more correct during evaluation. Comparing answers against broader alternatives breaks this self-agreement loop.

Do models fail worse when their own errors fill the context?

Error accumulation in context causes non-linear performance degradation in long-horizon tasks. Model scaling does not fix this; only test-time compute through thinking models reduces the effect by preventing error-contaminated context from biasing reasoning.

Does binary reward training hurt model calibration?

Binary correctness rewards incentivize high-confidence guessing because they don't penalize confident wrong answers. Adding the Brier score as a second reward term mathematically guarantees joint optimization of accuracy and calibration without trade-off.

What stops large language models from improving themselves?

Self-improvement in LLMs is formally bounded by the generation-verification gap, meaning every reliable fix requires something external to validate and enforce it. Models cannot escape this constraint through metacognition alone.

Can transformers improve exponentially by learning from their own correct solutions?

Standard transformers generalize from 10-digit to 100-digit addition by repeatedly generating solutions, filtering for correctness, and retraining—showing exponential (not linear) out-of-distribution improvement across rounds without saturation.

Can language models improve themselves without any external training data?

SQLM uses a proposer-solver framework where the proposer generates calibrated problems and the solver learns via majority-vote verification. Both agents improve through RL alone, creating an automatic curriculum that scales without human labels or ground-truth answers.

Can AI systems improve themselves through trial and error?

DGM replaces formal proofs with empirical benchmarking and maintains an evolutionary archive of agent variants, achieving 2.5× improvement on SWE-bench and 2.2× on Polyglot by discovering capabilities like better code editing and context management.

Can verifiers monitor reasoning without slowing generation down?

Decoupling verification from generation lets verifiers run alongside a single trace, forking to extract verifiable state and intervening only on violations. On correct runs the latency penalty is near-zero; interwhen matches or beats CoT across benchmarks at similar token budgets.

Can models learn to evaluate their own work during training?

Post-Completion Learning exploits unused sequence space after model output to train self-assessment capabilities during training while maintaining zero inference cost. The model learns to compute its own reward functions, internalizing evaluation rather than relying on external reward models.

Can extreme task decomposition enable reliable execution at million-step scale?

MAKER solves million-step tasks with zero errors by decomposing into minimal subtasks, applying voting at each step, and flagging correlated errors. Surprisingly, small non-reasoning models suffice when decomposition is extreme enough, inverting the standard approach to hard problems.
