Can agents learn continuously without forgetting old skills?
Can lifelong learning systems retain previously acquired skills while acquiring new ones? This explores whether externalizing learned behaviors as retrievable code programs rather than parameter updates solves catastrophic forgetting.
VOYAGER introduces an architecture for lifelong learning that solves the catastrophic forgetting problem through externalization rather than internal parameter management. Three components work together:
Automatic curriculum — proposes tasks based on the agent's current skill level and world state (finding yourself in a desert means harvesting sand before iron). Generated by GPT-4 with the overarching goal of "discovering as many diverse things as possible" — an in-context form of novelty search.
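The curriculum step can be pictured as prompt construction over the agent's current state. A minimal sketch, assuming hypothetical state fields (inventory, biome, completed tasks); VOYAGER's real prompt includes far more context, such as nearby blocks, equipment, and past failures:

```python
def propose_task_prompt(inventory, biome, completed_tasks):
    """Build a task-proposal prompt from the agent's current state.

    Illustrative only: the field names and wording are assumptions,
    not VOYAGER's actual prompt template.
    """
    return (
        "You are a Minecraft curriculum designer. "
        "Overarching goal: discover as many diverse things as possible.\n"
        f"Current biome: {biome}\n"
        f"Inventory: {', '.join(inventory) or 'empty'}\n"
        f"Completed tasks: {', '.join(completed_tasks) or 'none'}\n"
        "Propose ONE next task that is feasible given the current state."
    )

prompt = propose_task_prompt(["wooden_pickaxe"], "desert", ["chop tree"])
print(prompt)
```

Sending this to GPT-4 (not shown) would yield a state-appropriate next task, e.g. harvesting sand in a desert before attempting iron.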
Ever-growing skill library — each successfully completed task produces an executable code program stored in the library, indexed by the embedding of its description. When similar situations arise, relevant skills are retrieved by semantic similarity. This externalizes learned behavior as retrievable artifacts rather than weight updates.
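The skill-library mechanism can be sketched as a store of code strings keyed by the embedding of each skill's description, retrieved by cosine similarity. The toy bag-of-words embedding below stands in for a real sentence-embedding model, and the class and method names are illustrative, not VOYAGER's API:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding; a real system would use a
    learned sentence-embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SkillLibrary:
    """Skills stored as executable code strings, indexed by the
    embedding of their natural-language description."""
    def __init__(self):
        self.skills = []  # (description, embedding, program)

    def add(self, description, program):
        self.skills.append((description, embed(description), program))

    def retrieve(self, query, k=1):
        """Return the k skills whose descriptions are most similar
        to the query; no exact match required."""
        ranked = sorted(self.skills,
                        key=lambda s: cosine(embed(query), s[1]),
                        reverse=True)
        return [(d, p) for d, _, p in ranked[:k]]

lib = SkillLibrary()
lib.add("craft a wooden pickaxe", "def craft_wooden_pickaxe(bot): ...")
lib.add("fight a zombie with a sword", "def fight_zombie(bot): ...")
print(lib.retrieve("combat a zombie enemy")[0][0])
# → fight a zombie with a sword
```

The query "combat a zombie enemy" never appears verbatim in the library, yet retrieval still surfaces the relevant combat skill; this is the similarity-not-exact-match property that enables transfer.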
Iterative prompting with environment feedback — incorporates execution errors, environment feedback, and self-verification for program improvement. The agent refines skills based on real-world outcomes.
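The refinement loop above can be sketched as: execute the candidate program, feed any error back to a reviser, and repeat until verification passes. Here `execute` and `revise` are stand-ins for the Minecraft environment and the GPT-4 refinement call; their behavior is invented for illustration:

```python
def refine_until_success(program, execute, revise, max_rounds=4):
    """Iterative prompting: run the program, collect feedback,
    revise, and retry until self-verification passes or the
    round budget is exhausted."""
    for _ in range(max_rounds):
        ok, feedback = execute(program)
        if ok:
            return program
        program = revise(program, feedback)
    return None

# Toy environment: the task only succeeds once the program
# places a crafting table.
def execute(program):
    if "crafting_table" in program:
        return True, "task complete"
    return False, "error: no crafting table placed"

def revise(program, feedback):
    # A real system would ask the LLM to patch the code given
    # the execution error; here we apply the fix directly.
    return program + "\nplace('crafting_table')"

final = refine_until_success("collect('wood')", execute, revise)
print(final)
```

The first execution fails, the error is folded into the next revision, and the second execution verifies, mirroring the error/feedback/self-verification cycle the note describes.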
The compounding mechanism is the key insight: complex skills are synthesized by composing simpler programs. Fighting zombies builds on combat primitives; navigating a cave builds on movement and resource-gathering skills. This composition enables rapid capability growth without the forgetting that plagues weight-update-based continual learning methods.
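Composition can be made concrete with a few lines of Python: once stored, simple skills become callable primitives that a more complex skill invokes directly. The function names below are illustrative, not VOYAGER's generated code:

```python
# Simple skills: primitives a later skill can call.
def equip_sword(log):
    log.append("equipped sword")
    return log

def approach(log, target):
    log.append(f"approached {target}")
    return log

def attack(log, target):
    log.append(f"attacked {target}")
    return log

# Complex skill synthesized by composing the primitives above;
# nothing is relearned, so the primitives cannot be forgotten.
def fight_zombie(log):
    log = equip_sword(log)
    log = approach(log, "zombie")
    return attack(log, "zombie")

actions = fight_zombie([])
print(actions)
# → ['equipped sword', 'approached zombie', 'attacked zombie']
```

Because the complex skill references the simple ones rather than re-deriving them, improving a primitive automatically improves every skill built on it, which is what lets capability compound.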
Three lifelong learning requirements are met: (1) propose suitable tasks based on current capability and context, (2) refine skills from environmental feedback and commit them to memory, (3) continually explore in a self-driven manner. These parallel the three requirements of the proactive-dialogue framework ("When should proactive agents push toward their goals versus accommodate users?"): goal awareness, context adaptation, and initiative.
Because agents can learn from failure without updating their weights (see "Can agents learn from failure without updating their weights?"), VOYAGER's skill library reads as a more structured version of the same principle: externalize learning as retrievable artifacts. Embedding-indexed retrieval means skills are found by semantic similarity rather than exact match, enabling transfer to novel but related situations.
If communication pressure can drive agents to learn shared abstractions (see "Can communication pressure drive agents to learn shared abstractions?"), the skill-library pattern may generalize: agents under performance pressure naturally develop reusable, composable abstractions.
Source: Agents
Related concepts in this collection
- Can agents learn from failure without updating their weights? Explores whether language models can improve through trial and error by storing reflections in memory rather than through gradient-based parameter updates. Tests whether environmental feedback alone can drive learning. (Related architecture: episodic memory as external learning.)
- Can communication pressure drive agents to learn shared abstractions? Under what conditions do AI agents develop compact, efficient shared languages? Explores whether cooperative task pressure, rather than explicit optimization, naturally drives abstraction formation, mirroring human collaborative communication. (Same pattern: reusable abstractions under optimization pressure.)
- When should proactive agents push toward their goals versus accommodate users? Proactive dialogue agents face a tension between reaching their objectives efficiently and keeping users satisfied. Explores whether these two aims can coexist or require constant negotiation. (Parallel requirements for autonomous goal setting.)
- Does self-generated training data improve model learning? Can models learn more effectively from training data they generate themselves rather than data created by external sources? Explores whether a learner's own restructuring process produces better learning outcomes. (SEAL: model-specific data as capability building blocks.)
- Can agents learn continuously through memory without updating weights? Explores whether LLM agents can adapt to new tasks and failures by retrieving and updating past experiences stored in memory, rather than requiring expensive parameter fine-tuning. (AgentFly composes cases where VOYAGER composes skills; both achieve continual learning without parameter updates, but AgentFly adds a Q-function for principled case retrieval beyond static similarity.)
- Can neural networks learn compositional skills without symbolic mechanisms? Do neural networks need explicit symbolic architecture to compose learned concepts, or can scaling alone enable compositional generalization? Asks whether compositionality is an architectural feature or an emergent property of scale. (VOYAGER's skill library implements compositional generalization externally: complex skills are synthesized from simpler skill programs, achieving the linear-scaling efficiency the MLP proof demonstrates; embedding-indexed retrieval ensures the training distribution covers the compositional space.)
- Can we teach LLMs to form linguistic conventions in context? Humans naturally shorten references as conversations progress, but LLMs don't adapt their language for efficiency even when they understand their partners do. Can training on coreference patterns teach this convention-forming behavior? (Both VOYAGER and convention formation involve agents developing compact, reusable abstractions through interaction: skills are behavioral conventions for task completion, and linguistic conventions are communicative skills for efficient reference; the shared mechanism is repeated interaction under performance pressure driving abstraction.)
Original note title: compositional skill libraries that compound through synthesis enable lifelong learning without catastrophic forgetting