Agentic and Multi-Agent Systems Reinforcement Learning for LLMs

Can AI systems improve themselves through trial and error?

Explores whether replacing formal proof requirements with empirical benchmark testing lets AI systems iteratively modify and improve their own code, and what mechanisms prevent failures from compounding.

Note · 2026-02-23 · sourced from Novel Architectures

The original Gödel Machine proposed self-improving AI via provably beneficial self-modifications. In practice, formally proving the impact of most self-modifications is impossible. The Darwin Gödel Machine (DGM) replaces formal proofs with empirical validation: try modifications, test them on benchmarks, keep what works. This mirrors biological evolution — mutations are not verified in advance but produced, trialed, and selected.

DGM alternates between self-modification and evaluation phases. During self-modification, agents from the archive generate modified versions of themselves — rewriting their own code. During evaluation, each modified agent is tested on coding benchmarks. The key assumption: improvement on coding benchmarks indicates better coding capabilities, which in turn indicates better ability to self-modify. This creates a meta-competence loop: better coding → better self-modification → better coding.
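The alternation above can be sketched as a simple outer loop. This is a minimal toy sketch, not the actual implementation: `Agent` and `evaluate` are hypothetical stand-ins (the real DGM rewrites actual agent code and scores it on real coding benchmarks), and the "meta-competence loop" is caricatured as noise whose mean shifts with current capability.

```python
import random

class Agent:
    """Toy stand-in for a coding agent; capability is a single number."""
    def __init__(self, quality):
        self.quality = quality
        self.score = None

    def self_modify(self):
        # Meta-competence loop, caricatured: a more capable agent makes
        # (on average) better edits to its own code.
        delta = random.gauss(0.05 * self.quality, 0.1)
        return Agent(self.quality + delta)

def evaluate(agent):
    # Empirical validation: a benchmark score stands in for a formal proof.
    return agent.quality

def dgm_loop(generations=50, seed=0):
    random.seed(seed)
    archive = [Agent(1.0)]                 # growing archive of all variants
    for _ in range(generations):
        parent = random.choice(archive)    # parent: any archived variant
        child = parent.self_modify()       # self-modification phase
        child.score = evaluate(child)      # evaluation phase
        archive.append(child)              # keep even suboptimal variants
    return archive

archive = dgm_loop()
best = max(archive, key=lambda a: a.score if a.score is not None else 0)
```

Note that the archive never discards variants: selection pressure comes only from which parents get sampled, not from pruning.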

Results: SWE-bench from 20.0% to 50.0%, Polyglot from 14.2% to 30.7%.

The evolutionary archive is critical. Inspired by open-endedness research, DGM maintains a growing library of all generated agent variants — including suboptimal but interesting ones. These serve as stepping stones for future generations, enabling diverse exploration paths. The system doesn't just optimize for immediate performance; it accumulates diverse capabilities that may enable future breakthroughs. This is fundamentally different from single-trajectory self-improvement.
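One way to see how suboptimal variants stay useful is in parent selection. The weighting below is illustrative, not the paper's exact formula: it mixes benchmark score with a bonus for less-expanded branches, so "interesting but suboptimal" agents still get sampled as stepping stones.

```python
import random

def sample_parent(archive, novelty_weight=0.5):
    """Sample a parent from the archive (hypothetical weighting scheme):
    favor higher-scoring variants, but bias toward branches that have
    produced few children so diverse paths keep getting explored."""
    weights = []
    for agent in archive:
        exploit = agent["score"]                 # benchmark performance
        explore = 1.0 / (1 + agent["children"])  # fewer children => bigger bonus
        weights.append(exploit + novelty_weight * explore)
    parent = random.choices(archive, weights=weights, k=1)[0]
    parent["children"] += 1
    return parent

# Usage: a weak but unexplored variant still has a real chance of selection.
archive = [{"score": 0.5, "children": 0}, {"score": 0.2, "children": 0}]
random.seed(1)
parent = sample_parent(archive)
```

This is what separates the archive from greedy hill-climbing: a single-trajectory optimizer would only ever expand the current best agent.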

Concrete improvements discovered include better code editing tools, long-context window management, and peer-review mechanisms — capabilities the original agent lacked that emerged through the self-improvement process.

The Python-based implementation makes the self-modification space Turing-complete in principle. The current version modifies agent design (tools, workflows) with frozen foundation models. Full self-improvement — rewriting training scripts, training new foundation models — is left as future work.

This directly addresses "What limits how much models can improve themselves?": DGM circumvents the formal proof requirement with empirical validation, but inherits a different bound: improvement is limited by what the benchmark can measure. The archive approach partially addresses "How quickly do errors compound during model self-training?" by maintaining diverse populations rather than following a single improvement trajectory.



Related concepts in this collection

darwin godel machine achieves open-ended self-improvement by replacing formal proofs with empirical validation and evolutionary archives