Can agents learn reusable sub-task routines from past experience?
Does extracting and abstracting sub-task workflows from previous trajectories enable web agents to build complex skills compositionally? This matters because it could explain why agents fail at long-horizon tasks despite strong reasoning abilities.
Agent Workflow Memory (AWM) takes the human heuristic of abstracting routines from past experience and operationalizes it for web agents. The diagnostic claim is that current agents fail at long-horizon tasks not because they lack reasoning but because they cannot extract and reuse sub-task workflows shared across similar tasks — they solve each task in isolation and never accumulate transferable skill structure.
AWM's intervention has two design choices that matter. First, granularity is below the task level: rather than memorizing "Buy dry cat food on Amazon and deliver to my address," the system induces "search for a product on Amazon" — a sub-task that re-appears across many top-level tasks. Second, example-specific contexts are abstracted out — "dry cat food" becomes "{product-name}" — so the workflow is reusable rather than overfit to its source trace.
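The second design choice, abstracting example-specific values into named variables, can be illustrated with a minimal sketch. The note does not describe how AWM performs induction, so plain string substitution stands in for it here. The `Step` type, the `abstract_steps` and `instantiate` helpers, and the element descriptions are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str      # e.g. "type" or "click"
    target: str      # hypothetical description of the page element
    value: str = ""  # text entered, if any


def abstract_steps(steps, bindings):
    """Replace example-specific strings with named placeholders.

    `bindings` maps a concrete string (e.g. "dry cat food") to a
    variable name (e.g. "product-name"). This is an illustrative
    stand-in for workflow induction, not AWM's actual procedure.
    """
    abstracted = []
    for step in steps:
        value = step.value
        for concrete, variable in bindings.items():
            value = value.replace(concrete, "{" + variable + "}")
        abstracted.append(Step(step.action, step.target, value))
    return abstracted


def instantiate(steps, values):
    """Fill placeholders back in so the workflow can serve a new task."""
    filled = []
    for step in steps:
        value = step.value
        for variable, concrete in values.items():
            value = value.replace("{" + variable + "}", concrete)
        filled.append(Step(step.action, step.target, value))
    return filled


# Sub-task trace taken from one top-level task (hypothetical steps).
trace = [
    Step("type", "search box", "dry cat food"),
    Step("click", "search button"),
]

# "Search for a product on Amazon", with the product abstracted out.
workflow = abstract_steps(trace, {"dry cat food": "product-name"})

# The same workflow reused for a different top-level task.
reused = instantiate(workflow, {"product-name": "usb-c cable"})
```

The abstracted workflow carries no trace of "dry cat food", which is what lets it recur across top-level tasks instead of staying tied to its source trajectory.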
The compounding effect is the key behavior. Once "find a place by its name" exists, it serves as a building block for "get the zip code of a place." Skill memory therefore grows hierarchically: complex workflows are constructed on top of previously acquired ones. Empirically, this yields relative success-rate gains of 24.6% on Mind2Web and 51.1% on WebArena, with a 22.5-point gap over the baseline on WebArena after only tens of examples. Critically, online AWM's advantage widens as the train-test gap grows, from 8.9 to 14.0 absolute points, because workflow abstractions transfer where memorized trajectories do not.
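The hierarchical growth described above can be sketched as a memory in which a workflow's steps are either primitive actions or references to workflows already stored. This is a toy model under stated assumptions, not AWM's implementation: the `Workflow` and `WorkflowMemory` classes, the `("use", name)` reference convention, and the action strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    # Each step is a primitive action string or a ("use", workflow_name)
    # reference to a previously acquired workflow.
    steps: list


class WorkflowMemory:
    def __init__(self):
        self._workflows = {}

    def add(self, workflow):
        # A new workflow may only build on routines already in memory,
        # which also rules out circular references.
        for step in workflow.steps:
            if isinstance(step, tuple) and step[1] not in self._workflows:
                raise KeyError(f"unknown sub-workflow: {step[1]}")
        self._workflows[workflow.name] = workflow

    def expand(self, name):
        """Flatten a workflow into primitive actions, recursively."""
        actions = []
        for step in self._workflows[name].steps:
            if isinstance(step, tuple):
                actions.extend(self.expand(step[1]))
            else:
                actions.append(step)
        return actions


memory = WorkflowMemory()
memory.add(Workflow("find a place by its name", [
    "type {place-name} into the map search box",
    "click the first search result",
]))
# The more complex skill reuses the simpler one as a building block.
memory.add(Workflow("get the zip code of a place", [
    ("use", "find a place by its name"),
    "read the zip code from the address panel",
]))

plan = memory.expand("get the zip code of a place")
```

Expanding the second workflow yields the two steps of the first followed by its own step, so each newly stored routine raises the ceiling on what later routines can express.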
The implication is that the right unit of agent memory is the sub-task routine with abstracted variables, not the full task trajectory and not generic helpful hints. The unit should be small enough to recur, abstracted enough to transfer, and structured enough to compose. This position contrasts directly with the note "Does state-indexed memory outperform high-level workflow memory for web agents?", where PRAXIS argues the opposite: state-indexed local procedures outperform abstracted workflows precisely because abstraction loses the click-by-click specifics that web environments demand.
Source: Action Models
Related concepts in this collection
- Does state-indexed memory outperform high-level workflow memory for web agents?
  Should procedural memory for web agents be organized around specific environment states and actions, or abstracted into higher-level workflows? This matters because web automation demands precise, context-sensitive recall that workflows might lose.
  tension with: AWM claims abstracted workflows transfer best; PRAXIS claims state-indexed local procedures beat abstracted workflows because abstraction loses the click-by-click specifics. Both target web agents on similar benchmarks.
- Can frozen language models learn without updating their parameters?
  If agents built on frozen models can't change their weights, what kind of memory structure would let them keep improving across trials and transfer to new tasks? This challenges assumptions about how continual learning must work.
  complements: CLIN abstracts causal-rule memory; AWM abstracts sub-task workflow memory; both argue the *shape* of textual memory matters more than the model. Three-way memory-granularity tension when paired with PRAXIS.
- Can agents learn continuously without forgetting old skills?
  Can lifelong learning systems retain previously acquired skills while acquiring new ones? This explores whether externalizing learned behaviors as retrievable code programs rather than parameter updates solves catastrophic forgetting.
  extends: Voyager builds an ever-growing skill library by synthesis; AWM operationalizes the same compounding principle for web-agent sub-task routines.
- Can agents learn from failure without updating their weights?
  Explores whether language models can improve through trial-and-error by storing reflections in memory rather than through gradient-based parameter updates. Tests if environmental feedback alone can drive learning.
  extends: Reflexion stores raw trial outcomes; AWM stores abstracted sub-task workflows. The progression is generic-hint → causal-rule → workflow-routine.
- How can agent systems share learned skills across users?
  Individual users operating autonomous agents independently rediscover solutions because systems lack mechanisms to propagate discoveries. Can centralized aggregation and automatic evolution convert isolated experiences into shared capabilities?
  complements: AWM is single-agent skill compounding; SkillClaw is cross-agent skill propagation.
Original note title
agent workflow memory induces reusable sub-task routines and compounds them — yielding 24-51 percent relative success gains and snowballing skill complexity