INQUIRING LINE

Can AI systems improve themselves without external feedback?

This explores whether AI can bootstrap its own improvement using only internal signals — and the corpus splits sharply on whether 'no external feedback' is ever truly possible, or just feedback in disguise.


The most honest answer the corpus offers is: partly, but watch where the feedback is actually coming from. The sharpest framing comes from Can models reliably improve themselves without external feedback?, which argues that pure self-improvement is structurally circular. A model can't reliably grade work it wasn't able to do in the first place (the generation-verification gap), and left alone it tends toward diversity collapse and reward hacking. Its provocative claim is that every method that *looks* like self-improvement is quietly smuggling in an external anchor: a frozen past version of the model, a third-party judge, user corrections, or tool results. So the real question becomes not 'can it improve without feedback?' but 'where is the feedback hiding?'
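A toy numerical sketch makes the diversity-collapse failure concrete. It is not drawn from any of the notes and rests on two assumptions. First, a "model" is just a probability distribution over four candidate answers. Second, self-training means re-weighting toward whatever the model already prefers: the hypothetical `self_train_step` squares the probabilities and renormalizes them. With nothing outside the loop to push back, entropy falls every round.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def self_train_step(p, sharpness=2.0):
    """Hypothetical stand-in for fine-tuning on self-selected samples.

    Raising probabilities to a power > 1 and renormalizing over-weights the
    outputs the model already favors; no outside signal is consulted.
    """
    weighted = [x ** sharpness for x in p]
    total = sum(weighted)
    return [w / total for w in weighted]


def simulate(p, rounds=5):
    """Apply the self-training step repeatedly, recording entropy each round."""
    history = [entropy(p)]
    for _ in range(rounds):
        p = self_train_step(p)
        history.append(entropy(p))
    return p, history


# Four candidate answers; the model starts with only a mild preference for answer 0.
final_p, entropies = simulate([0.4, 0.3, 0.2, 0.1])
print([round(h, 3) for h in entropies])
print([round(x, 3) for x in final_p])
```

Squaring probabilities is only a caricature of training on one's own outputs. The point is the direction of travel, not the rate. The external anchors listed above are what is supposed to interrupt this squeeze.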

Many of the most striking results in the collection answer the 'where is the feedback hiding?' question by manufacturing feedback internally rather than importing it. Self-play approaches build an opponent inside the system. Can language models improve themselves without any external training data? pits a problem-proposer against a solver that grades itself by majority vote. Can language models learn skills without human supervision? adds a neutral judge, so a Challenger ratchets up difficulty while a Judge supplies binary rewards. Can models learn to judge themselves without external rewards? has one model alternate between answering and ranking its own answers, mining reward from how *consistent* its judgments are. And Can models learn to evaluate their own work during training? trains a model to compute its own reward in the otherwise unused sequence space after its output. These genuinely remove the external reward model, but notice that they all replace it with an internal tournament, vote, or consistency check. The signal is invented, not absent.
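The majority-vote trick behind the proposer-solver setup can be sketched in a few lines. This is a minimal illustration that assumes answers are short strings compared by exact match. The names `majority_vote_rewards` and `proposer_reward` are hypothetical. The proposer rule is also an illustrative assumption, not SQLM's published formula: it rewards problems on which the solver's self-agreement is intermediate, using 0.2 and 0.8 thresholds.

```python
from collections import Counter


def majority_vote_rewards(answers):
    """Reward each sampled answer by agreement with the samples' majority.

    The pseudo-label is simply the most common answer among the solver's own
    samples; no ground-truth answer is ever consulted.
    """
    counts = Counter(answers)
    label, top_count = counts.most_common(1)[0]
    rewards = [1.0 if a == label else 0.0 for a in answers]
    agreement = top_count / len(answers)
    return label, rewards, agreement


def proposer_reward(agreement, low=0.2, high=0.8):
    """Hypothetical proposer reward: 1.0 only for problems that are neither
    trivial (near-unanimous) nor hopeless (almost no agreement)."""
    return 1.0 if low < agreement < high else 0.0


# Eight solver samples for one proposed problem; the true answer is unknown.
samples = ["42", "42", "41", "42", "40", "42", "41", "42"]
label, rewards, agreement = majority_vote_rewards(samples)
print(label, rewards, agreement, proposer_reward(agreement))
```

Notice what the sketch never touches: a correct answer. If every sample converged on the same wrong answer, that wrong answer would receive full reward. This is the generation-verification problem described above, and it is why these systems still need safeguards against collapse.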

A second cluster sidesteps reward entirely by treating the *consequences of the model's own actions* as supervision. Can agents learn from their own actions without external rewards? calls this a third paradigm between imitation and RL: an agent acts, observes the resulting states, and learns from them. This matches expert-dependent baselines with half the data and no reward at all. Can agents learn from failure without updating their weights? (Reflexion) goes further and leaves the weights untouched. After each failure the agent writes a verbal self-diagnosis and stores it as memory, improving across episodes without any parameter update. Both still lean on something external, though: an *environment* that reports what happened, whether as resulting states or as a success/failure verdict. That is feedback the world supplies for free. It is exactly the loophole Can models reliably improve themselves without external feedback? leaves open when it counts tool results as an external anchor.
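The Reflexion loop can be sketched as a toy. The sketch assumes a task with a small set of candidate actions and an environment that returns only success or failure. In real Reflexion an LLM writes the reflections and reads them back from its prompt in the next episode. Here, as a simplifying assumption, a set of ruled-out attempts (`ruled_out`) stands in for that reading step. All names (`ReflexionAgent`, `run_episodes`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ReflexionAgent:
    """Toy agent that improves across episodes using stored text reflections.

    Nothing is trained: the only state that changes between episodes is the
    reflection memory (plus a set that stands in for an LLM reading it).
    """
    candidates: list
    memory: list = field(default_factory=list)   # verbal self-diagnoses, kept uncompressed
    ruled_out: set = field(default_factory=set)  # stand-in for "what the reflections say to avoid"

    def act(self):
        # Choose the first candidate that no earlier reflection ruled out.
        for candidate in self.candidates:
            if candidate not in self.ruled_out:
                return candidate
        raise RuntimeError("every candidate has been ruled out")

    def reflect(self, attempt):
        # Write a plain-text diagnosis after a failure and remember it.
        self.memory.append(f"Attempt '{attempt}' failed; do not repeat it.")
        self.ruled_out.add(attempt)


def run_episodes(agent, environment, max_episodes=10):
    """The environment supplies the only feedback: a binary success flag."""
    for episode in range(1, max_episodes + 1):
        attempt = agent.act()
        if environment(attempt):
            return episode, attempt
        agent.reflect(attempt)
    return None, None


agent = ReflexionAgent(candidates=["open door", "pull lever", "push button"])
episodes, solution = run_episodes(agent, environment=lambda a: a == "push button")
print(episodes, solution)
print(agent.memory)
```

The unambiguous success flag is what keeps the reflections honest. Without it, the agent would have no basis for deciding which attempts to rule out.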

The trial-and-error evolutionary route pushes furthest toward autonomy. Can AI systems improve themselves through trial and error? describes a system (DGM) that rewrites its own code and keeps an archive of variants. It validates them empirically on benchmarks for gains of 2.5× on SWE-bench and 2.2× on Polyglot. Can an AI system improve its own search methods automatically? has an outer loop read and rewrite its own inner search algorithm, discovering new optimization methods that improved performance on GPT pretraining by 5×. Impressive, but again the benchmark or task score *is* the external anchor. That is why Can AI systems improve their own learning strategies? names a deeper gap: today's self-improvement runs on fixed, human-designed metacognitive loops that break under domain shift. Truly autonomous improvement would require an agent to invent its *own* criteria for what counts as better, and the corpus flags that as a still-open research frontier.
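The archive-plus-benchmark pattern can be sketched as a toy evolutionary loop. It is a loose simplification of the idea, not DGM itself:

- a "variant" here is a tuple of integers rather than agent code;
- `mutate` nudges one value instead of rewriting source;
- parents are drawn uniformly from the archive, and DGM's actual selection rule is not reproduced;
- `evaluate`, with its hidden `TARGET`, is a hypothetical benchmark.

The one faithful part is structural: every improvement is judged by a score the agent does not compute for itself.

```python
import random

TARGET = (3, 7, 2)  # hypothetical: hidden from the agent, known only to the benchmark


def evaluate(variant):
    """Stand-in for an external benchmark score: the hidden anchor.

    Higher is better; a perfect variant scores 0.
    """
    return -sum(abs(v - t) for v, t in zip(variant, TARGET))


def mutate(variant, rng):
    # Nudge one "design choice" by +/-1, standing in for a self-edit of the agent's code.
    i = rng.randrange(len(variant))
    delta = rng.choice([-1, 1])
    return variant[:i] + (variant[i] + delta,) + variant[i + 1:]


def evolve(initial, generations=200, rng_seed=0):
    rng = random.Random(rng_seed)
    # Keep every evaluated variant, not just the current best, so weaker
    # branches stay available as stepping stones.
    archive = [(evaluate(initial), initial)]
    for _ in range(generations):
        _, parent = rng.choice(archive)  # any archived variant may be the next parent
        child = mutate(parent, rng)
        archive.append((evaluate(child), child))
    return max(archive), archive


best, archive = evolve(initial=(0, 0, 0))
print(best, len(archive))
```

Keeping the whole archive, rather than only the top scorer, is the design choice that distinguishes this from plain hill-climbing. A variant that scores worse today can still be the parent of a later improvement.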

The thing worth carrying away is that the field has largely stopped asking whether AI can improve with *zero* feedback. It now asks how cheaply and internally feedback can be *generated*. Whether through self-play, action-consequence learning, or evolutionary self-modification, the winning recipe is the same: build a reliable internal signal and guard obsessively against collapse. Once a model fully trusts its own judgment of itself, it tends to drift, hack its own reward, or shrink its range of behavior. Two supporting notes show the practical edge of the same idea. Can agents learn new skills without forgetting old ones? stores skills as executable code outside the weights and refines them with environmental feedback. Can breaking down instructions into checklists improve AI reward signals? breaks fuzzy goals into checkable pieces. The more verifiable you can make your own internal signal, the further self-improvement gets before circularity bites.
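The checklist idea can be sketched as a reward function. This is a minimal illustration, not RLCF or RaR as published. It assumes each sub-criterion can be checked by a plain Python predicate, whereas in practice items may be graded by a judge model. `Criterion`, `checklist_reward`, the example instruction, and the weights are all hypothetical.

```python
import re
from typing import Callable, NamedTuple


class Criterion(NamedTuple):
    description: str
    check: Callable[[str], bool]  # a verifiable yes/no test on the response
    weight: float = 1.0


def checklist_reward(response, criteria):
    """Reward = weighted fraction of checklist items the response passes."""
    total = sum(c.weight for c in criteria)
    passed = sum(c.weight for c in criteria if c.check(response))
    return passed / total


# Hypothetical decomposition of the fuzzy instruction
# "reply briefly and politely, and give the meeting date".
criteria = [
    Criterion("under 40 words", lambda r: len(r.split()) < 40),
    Criterion("gives a date as YYYY-MM-DD",
              lambda r: re.search(r"\d{4}-\d{2}-\d{2}", r) is not None, weight=2.0),
    Criterion("thanks the reader", lambda r: "thank" in r.lower()),
]

good = "Thanks for asking! The meeting is on 2025-03-14."
vague = "The meeting is sometime next month."
print(checklist_reward(good, criteria), checklist_reward(vague, criteria))
```

Each item is checked separately, so a response cannot earn full credit just by sounding good overall. The vague reply collects only the length item.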


Sources (12 notes)

Can models reliably improve themselves without external feedback?

Pure self-improvement stalls due to the generation-verification gap, diversity collapse, and reward hacking. Reliable improvement methods succeed by smuggling in external anchors: past model versions, third-party judges, user corrections, or tool feedback.

Can language models improve themselves without any external training data?

SQLM uses a proposer-solver framework where the proposer generates calibrated problems and the solver learns via majority-vote verification. Both agents improve through RL alone, creating an automatic curriculum that scales without human labels or ground-truth answers.

Can language models learn skills without human supervision?

Ctx2Skill's three-role self-play loop manufactures missing feedback through internal signals: the Challenger escalates difficulty as curriculum, the Judge gives binary verdicts as reward, and both sides evolve via natural-language skill edits. Success requires balancing adversarial pressure against a generalization safeguard to prevent collapse.

Can models learn to judge themselves without external rewards?

SERL enables self-improving language models by having them alternate between generating responses and judging them pairwise, deriving rewards from the ranking consistency and self-consistency of those judgments. On AlpacaEval, this reached a 59.90% win rate without external signals, up from 52.37%.

Can models learn to evaluate their own work during training?

Post-Completion Learning exploits unused sequence space after model output to train self-assessment capabilities during training while maintaining zero inference cost. The model learns to compute its own reward functions, internalizing evaluation rather than relying on external reward models.

Can agents learn from their own actions without external rewards?

Research across eight environments shows that agents can use future states from their own actions as supervision without external rewards, matching expert-dependent baselines with half the data and providing superior warm-starts for subsequent RL training.

Can agents learn from failure without updating their weights?

Reflexion demonstrates that unambiguous environmental feedback (success/failure) enables agents to write useful self-diagnoses and improve across episodes without parameter updates. The binary signal prevents rationalization, and keeping reflections uncompressed preserves their usability.

Can AI systems improve themselves through trial and error?

DGM replaces formal proofs with empirical benchmarking and maintains an evolutionary archive of agent variants, achieving 2.5× improvement on SWE-bench and 2.2× on Polyglot by discovering capabilities like better code editing and context management.

Can an AI system improve its own search methods automatically?

An outer loop successfully read inner loop code, identified bottlenecks, and generated new Python mechanisms at runtime, discovering combinatorial optimization and bandit methods that broke the inner loop's deterministic patterns and improved performance on GPT pretraining by 5x.

Can AI systems improve their own learning strategies?

Current self-improvement methods use extrinsic, fixed metacognitive loops designed by humans that fail under domain shift or capability changes. True self-improvement requires agents to generate their own adaptive metacognitive knowledge, planning, and evaluation—a gap confirmed as a neglected research area across neuro-symbolic AI.

Can agents learn new skills without forgetting old ones?

VOYAGER demonstrates that storing executable skills in an embedding-indexed library and composing complex skills from simpler ones allows agents to learn continuously while avoiding the forgetting that occurs with weight-update-based methods. Environmental feedback refines skills while an automatic curriculum drives continual exploration.

Can breaking down instructions into checklists improve AI reward signals?

RLCF and RaR methods decompose instruction quality into verifiable sub-criteria, improving performance on benchmarks like FollowBench and HealthBench. This decomposition principle reduces overfitting to superficial artifacts that plague holistic reward models.
