Can transformers improve exponentially by learning from their own correct solutions?
Can standard transformers achieve extreme length generalization by iteratively filtering and training on their own correct outputs? This explores whether self-correction loops enable unbounded out-of-distribution improvement without architectural changes.
"Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges" (2502.01612) demonstrates that standard transformer architectures can achieve extreme out-of-distribution generalization through a self-improvement loop: generate solutions, filter for correctness, train on the correct ones, repeat.
The results across arithmetic, string manipulation, and maze solving show generalization far beyond the training distribution — 10-digit to 100-digit addition without apparent saturation. The critical mechanism: filtering for correct self-generated examples produces exponential improvement in OOD performance across training rounds. Not linear. Exponential.
This is achieved without any modification to the base transformer architecture. No external verifiers beyond a correctness check. No curriculum design. No reward models. The model's own ability to occasionally solve harder problems (via sampling variance) provides the training signal for the next round. The correctness filter is the critical factor that distinguishes this from How quickly do errors compound during model self-training? — without verification, small errors compound exponentially in the wrong direction; with verification, correct solutions compound exponentially in the right direction.
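The loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy addition task, the `model_sample` callable, and all names are assumptions for the sketch.

```python
import random

def verify(prompt: str, answer: str) -> bool:
    # Automated correctness check for the toy addition task:
    # exact match against the ground-truth sum.
    a, b = map(int, prompt.split("+"))
    return answer == str(a + b)

def self_improvement_round(model_sample, prompts, k=8):
    """One round: sample up to k candidates per prompt, keep only
    verified-correct (prompt, answer) pairs for the next training set."""
    kept = []
    for prompt in prompts:
        for _ in range(k):
            answer = model_sample(prompt)   # stochastic generation
            if verify(prompt, answer):      # the critical correctness filter
                kept.append((prompt, answer))
                break
    return kept  # fine-tune on `kept`, then repeat at a harder scale

def make_prompts(n_digits, n=100):
    # Harder-than-training prompts: operands with n_digits digits.
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    return [f"{random.randint(lo, hi)}+{random.randint(lo, hi)}" for _ in range(n)]
```

The design point the sketch captures: the model never needs to solve harder prompts reliably, only occasionally, because the verifier discards every failure before it can enter the next round's training data.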
The finding sharpens What limits how much models can improve themselves?. The generation-verification gap says self-improvement is bounded because the model cannot verify better than it generates. But for tasks with automated verification (arithmetic, string manipulation), verification is perfect — the gap vanishes. This is exactly the class of tasks where self-improvement can proceed without an apparent bound.
Compared with Can language models improve themselves without any external training data?, the self-improving transformer uses a different but related mechanism: the model serves as both proposer (generating candidate solutions at harder scales) and solver (learning from its own correct solutions). The asymmetry comes from the fact that generating one correct solution to a harder problem is easier than reliably solving all harder problems.
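That asymmetry can be made concrete with a little sampling arithmetic. If a single sample solves a harder problem with probability p, the chance that at least one of k independent samples succeeds is 1 − (1 − p)^k, which approaches 1 even for small p. The probability values below are illustrative, not figures from the paper.

```python
def hit_probability(p: float, k: int) -> float:
    """Probability that at least one of k independent samples,
    each correct with probability p, is correct."""
    return 1 - (1 - p) ** k

# Even a weak solver (p = 0.05 per sample) produces a usable
# verified training example fairly often at modest sample budgets.
for k in (1, 16, 64):
    print(k, round(hit_probability(0.05, k), 3))
```

This is why sampling variance alone can seed the next training round: the filter only needs one hit per prompt, not reliable accuracy.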
The exponential improvement finding may also explain the result behind Can a single training example unlock mathematical reasoning?. If a single correct example at the boundary can seed an exponential self-improvement cascade, then the minimal signal needed for activation is genuinely minimal.
Source: LLM Architecture
Related concepts in this collection
- What limits how much models can improve themselves? — Explores whether self-improvement has fundamental boundaries set by how well models can verify versus generate solutions, and what this means across different task types. (link: self-improving transformers exploit the vanishing gap for verifiable tasks)
- Can language models improve themselves without any external training data? — Explores whether two language models playing against each other — one generating questions, one solving them — can create a self-improving loop. Matters because it would eliminate dependence on human-labeled datasets. (link: related self-improvement mechanism)
- Can a single training example unlock mathematical reasoning? — Does minimal data suffice to activate latent reasoning capabilities in language models? This explores whether one example can produce dramatic performance gains comparable to much larger datasets. (link: exponential cascade may explain minimal activation thresholds)
- How quickly do errors compound during model self-training? — When LLMs train on their own outputs without verification, do small mistakes amplify exponentially? This matters because it determines whether unsupervised self-improvement is even feasible. (link: direct tension — error avalanching predicts that self-training collapses rapidly, while self-improving transformers achieve exponential improvement. The resolution is verification quality: self-improving transformers filter for correctness with automated verification (arithmetic, string matching), which prevents error accumulation, whereas error avalanching arises when self-training uses unverified outputs and small errors compound. The boundary between the two regimes is the verification gap.)
- Can AI systems improve themselves through trial and error? — Explores whether replacing formal proof requirements with empirical benchmark testing enables AI systems to successfully modify and improve their own code iteratively, and what mechanisms prevent compounding failures. (link: extends self-improvement from task-specific domains (arithmetic, string manipulation) to general code-writing capability; DGM's evolutionary archive enables open-ended exploration while self-improving transformers follow a single improvement trajectory — population diversity vs. correctness filtering as alternative mechanisms for sustaining improvement)
Original note title: self-improving transformers achieve extreme length generalization through iterative self-generated solutions with exponential out-of-distribution improvement