
Can agents learn continuously without forgetting old skills?

Can lifelong learning systems retain previously acquired skills while acquiring new ones? This note explores whether externalizing learned behaviors as retrievable code programs, rather than as parameter updates, solves catastrophic forgetting.

Note · 2026-02-23 · sourced from Agents

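VOYAGER, an LLM-powered embodied agent that plays Minecraft, introduces an architecture for lifelong learning that sidesteps catastrophic forgetting through externalization rather than internal parameter management. GPT-4 is queried as a black box and never fine-tuned, so new learning has no weights to overwrite. Three components work together: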

  1. Automatic curriculum: proposes tasks suited to the agent's current skill level and world state (an agent that finds itself in a desert should harvest sand before pursuing iron). The curriculum is generated by GPT-4 under the overarching goal of "discovering as many diverse things as possible", an in-context form of novelty search.

  2. Ever-growing skill library: each successfully completed task produces an executable code program that is stored in the library, indexed by the embedding of its description. When similar situations arise, relevant skills are retrieved by semantic similarity. This externalizes learned behavior as retrievable artifacts rather than weight updates.

  3. Iterative prompting with environment feedback: execution errors, environment feedback, and a self-verification check are fed back into the prompt to improve the program. The agent refines each skill based on what actually happens when the program runs in the game.
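A minimal Python sketch of the skill library in item 2, under stated assumptions: VOYAGER stores JavaScript programs and indexes them with a learned text-embedding model, whereas here the programs are placeholder strings and `embed` is a hypothetical bag-of-words stand-in with a tiny stop-word list. The names `SkillLibrary`, `add`, and `retrieve` are illustrative, not VOYAGER's API.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field

# Assumption: a tiny stop-word list so that filler words do not dominate similarity.
STOP_WORDS = {"a", "an", "and", "for", "from", "i", "in", "it", "of", "on", "the", "to", "with"}


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a learned embedding: a bag-of-words count vector."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class SkillLibrary:
    """Hypothetical skill library: skill name -> (key embedding, description, program)."""
    skills: dict[str, tuple[Counter, str, str]] = field(default_factory=dict)

    def add(self, name: str, description: str, program: str) -> None:
        # The key is the embedding of the description, not of the code itself.
        self.skills[name] = (embed(description), description, program)

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        # Rank every stored skill by similarity to the query; no exact name match is needed.
        q = embed(query)
        ranked = sorted(self.skills.items(), key=lambda item: cosine(q, item[1][0]), reverse=True)
        return [name for name, _ in ranked[:k]]


library = SkillLibrary()
library.add("mineWoodLog", "Mine a wood log from a nearby tree", "<program source>")
library.add("craftStoneSword", "Craft a stone sword using sticks and cobblestone", "<program source>")
library.add("collectSand", "Collect sand blocks in a desert", "<program source>")

print(library.retrieve("I need a sword to fight a zombie"))  # ['craftStoneSword']
print(library.retrieve("gather sand in the desert"))  # ['collectSand']
```

This toy embedding only rewards shared words, so "fight a zombie" reaches the sword skill through the word "sword". A learned embedding would also match paraphrases with no word overlap, which is what makes retrieval useful in novel but related situations.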

The compounding mechanism is the key insight: complex skills are synthesized by composing simpler programs. Fighting zombies builds on combat primitives; navigating a cave builds on movement and resource-gathering skills. Because capability accumulates in stored programs rather than in shared weights, adding a new skill does not overwrite old ones. Capability can therefore grow rapidly without the forgetting that plagues continual-learning methods based on weight updates.
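To make the compounding mechanism concrete, here is a toy Python sketch in which each skill is a plain function over an inventory and higher-level skills call lower-level ones. It is an illustration under assumptions: VOYAGER's skills are JavaScript programs that drive the game through the Mineflayer API, whereas here the world is reduced to a `Counter` inventory and every function name is hypothetical. Recipe quantities follow standard Minecraft crafting (1 log to 4 planks, 2 planks to 4 sticks, 1 stick plus 2 cobblestone to a stone sword).

```python
from collections import Counter

# Hypothetical toy world: the agent's inventory is the only state.


def mine_wood(inv: Counter, n: int = 1) -> None:
    """Primitive skill: add n wood logs to the inventory."""
    inv["wood_log"] += n


def mine_cobblestone(inv: Counter, n: int = 1) -> None:
    """Primitive skill: add n cobblestone to the inventory."""
    inv["cobblestone"] += n


def craft_planks(inv: Counter) -> None:
    """Composed skill: reuses mine_wood if no log is available, then 1 log -> 4 planks."""
    if inv["wood_log"] < 1:
        mine_wood(inv)
    inv["wood_log"] -= 1
    inv["planks"] += 4


def craft_sticks(inv: Counter) -> None:
    """Composed skill: reuses craft_planks until 2 planks exist, then 2 planks -> 4 sticks."""
    while inv["planks"] < 2:
        craft_planks(inv)
    inv["planks"] -= 2
    inv["stick"] += 4


def craft_stone_sword(inv: Counter) -> None:
    """Higher-level skill built only from existing skills: 1 stick + 2 cobblestone -> 1 sword."""
    if inv["stick"] < 1:
        craft_sticks(inv)
    if inv["cobblestone"] < 2:
        mine_cobblestone(inv, 2 - inv["cobblestone"])
    inv["stick"] -= 1
    inv["cobblestone"] -= 2
    inv["stone_sword"] += 1


# Registering a new composite skill adds an entry; earlier entries are left untouched.
skills = {f.__name__: f for f in (mine_wood, mine_cobblestone, craft_planks, craft_sticks)}
skills["craft_stone_sword"] = craft_stone_sword

inv = Counter()
skills["craft_stone_sword"](inv)
print(inv["stone_sword"], inv["stick"], inv["planks"])  # 1 3 2
```

Starting from an empty inventory, one call to `craft_stone_sword` triggers the whole chain of simpler skills. Adding it required no change to any existing function: new capability is appended to the library rather than written into shared parameters.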

VOYAGER meets three requirements for lifelong learning: (1) propose suitable tasks based on current capability and context, (2) refine skills from environmental feedback and commit them to memory, and (3) continually explore in a self-driven manner. These parallel the three requirements of the proactive-agent framework (see "When should proactive agents push toward their goals versus accommodate users?"): goal awareness, context adaptation, and initiative.
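The three requirements can be read as one loop: a curriculum proposes a task, the agent drafts a program and revises it using execution errors and a verification check, and a verified program is committed to the library. The sketch below is a hypothetical Python rendering of that loop under heavy simplification. In VOYAGER these roles are played by GPT-4 prompts; here `propose_task`, `generate_program`, and `verify` are injected callables, and the toy "LLM" simply returns a buggy draft first and a corrected one once it has feedback. All names are illustrative.

```python
from typing import Callable, Optional

Program = Callable[[dict], None]


def lifelong_loop(
    propose_task: Callable[[set[str], set[str]], Optional[str]],
    generate_program: Callable[[str, list[str]], Program],
    verify: Callable[[str, dict], bool],
    world: dict,
    max_rounds: int = 4,
    max_tasks: int = 10,
) -> tuple[dict[str, Program], dict]:
    library: dict[str, Program] = {}
    completed: set[str] = set()
    failed: set[str] = set()
    # (3) Self-driven exploration: the outer loop keeps going until the curriculum stops.
    for _ in range(max_tasks):
        # (1) Curriculum: propose the next task from what has and has not worked so far.
        task = propose_task(completed, failed)
        if task is None:
            break
        feedback: list[str] = []
        for _ in range(max_rounds):
            # (2) Iterative prompting: each draft sees all feedback gathered so far.
            program = generate_program(task, feedback)
            trial = dict(world)  # run on a copy so a failed attempt leaves the world unchanged
            try:
                program(trial)
            except Exception as err:
                feedback.append(f"execution error: {err!r}")
                continue
            if verify(task, trial):
                world = trial
                library[task] = program  # commit the verified skill to memory
                completed.add(task)
                break
            feedback.append("self-verification failed")
        else:
            failed.add(task)  # give up for now; the curriculum can steer elsewhere
    return library, world


# --- Toy demo with hypothetical stand-ins for the GPT-4 components ---
TASKS = ["mine wood", "craft planks"]


def toy_curriculum(completed: set[str], failed: set[str]) -> Optional[str]:
    return next((t for t in TASKS if t not in completed and t not in failed), None)


def mine_wood(w: dict) -> None:
    w["log"] = w.get("log", 0) + 1


def craft_planks_draft(w: dict) -> None:
    w["log"] -= 1
    w["planks"] += 4  # bug: "planks" does not exist yet, so this raises KeyError


def craft_planks_fixed(w: dict) -> None:
    w["log"] -= 1
    w["planks"] = w.get("planks", 0) + 4


def toy_generate(task: str, feedback: list[str]) -> Program:
    if task == "mine wood":
        return mine_wood
    return craft_planks_fixed if feedback else craft_planks_draft


def toy_verify(task: str, w: dict) -> bool:
    return {"mine wood": w.get("log", 0) >= 1, "craft planks": w.get("planks", 0) >= 4}[task]


library, world = lifelong_loop(toy_curriculum, toy_generate, toy_verify, world={})
print(sorted(library), world)  # ['craft planks', 'mine wood'] {'log': 0, 'planks': 4}
```

Because each attempt runs on a copy of the world, the buggy first draft leaves no trace. Only the verified program is stored, and the error it raised is what prompts the corrected version.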

Given that agents can learn from failure without updating their weights (see "Can agents learn from failure without updating their weights?"), VOYAGER's skill library is a more structured version of the same principle: externalize learning as retrievable artifacts. Because retrieval is embedding-indexed, skills are found by semantic similarity rather than exact match, which enables transfer to novel but related situations.

If communication pressure can drive agents to learn shared abstractions (see "Can communication pressure drive agents to learn shared abstractions?"), the skill-library pattern may generalize: agents under performance pressure may develop reusable, composable abstractions on their own.



Original note title

compositional skill libraries that compound through synthesis enable lifelong learning without catastrophic forgetting