Agentic and Multi-Agent Systems

Can agents learn reusable sub-task routines from past experience?

Does extracting and abstracting sub-task workflows from previous trajectories enable web agents to build complex skills compositionally? This matters because it could explain why agents fail at long-horizon tasks despite strong reasoning abilities.

Note · 2026-05-03 · sourced from Action Models

Agent Workflow Memory (AWM) takes the human heuristic of abstracting routines from past experience and operationalizes it for web agents. The diagnostic claim is that current agents fail at long-horizon tasks not because they lack reasoning but because they cannot extract and reuse sub-task workflows shared across similar tasks — they solve each task in isolation and never accumulate transferable skill structure.

AWM's intervention has two design choices that matter. First, granularity is below the task level: rather than memorizing "Buy dry cat food on Amazon and deliver to my address," the system induces "search for a product on Amazon" — a sub-task that re-appears across many top-level tasks. Second, example-specific contexts are abstracted out — "dry cat food" becomes "{product-name}" — so the workflow is reusable rather than overfit to its source trace.
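The second design choice, abstracting example-specific values into variables, can be sketched in a few lines of Python. This is a minimal illustration, not AWM's induction procedure: plain string replacement stands in for whatever abstraction step the system uses. The `Workflow` class, the `abstract_steps` helper, and the trace text are all hypothetical. The placeholder is written `${product_name}` rather than `{product-name}` only because Python's `string.Template` requires identifier-style names.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class Workflow:
    """A sub-task routine whose steps hold ${placeholders}, not literal values."""
    name: str
    steps: tuple[str, ...]

    def instantiate(self, **values: str) -> list[str]:
        # Fill every placeholder; substitute() raises KeyError if one is missing.
        return [Template(step).substitute(**values) for step in self.steps]


def abstract_steps(steps: list[str], bindings: dict[str, str]) -> tuple[str, ...]:
    """Replace example-specific literals with placeholders.

    `bindings` maps a literal found in the source trace to a variable name,
    e.g. {"dry cat food": "product_name"}. (Hypothetical helper.)
    """
    abstracted = []
    for step in steps:
        for literal, var in bindings.items():
            step = step.replace(literal, "${" + var + "}")
        abstracted.append(step)
    return tuple(abstracted)


# A concrete trace fragment, invented for illustration.
trace = [
    "type 'dry cat food' into the search box",
    "click the search button",
]

search_product = Workflow(
    name="search for a product on Amazon",
    steps=abstract_steps(trace, {"dry cat food": "product_name"}),
)

# The same routine now serves a different top-level task.
reused = search_product.instantiate(product_name="usb-c cable")
```

Because the stored steps no longer mention "dry cat food", the routine can be instantiated for any product, which is the sense in which it is reusable rather than overfit to its source trace.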

The compounding effect is the key behavior. Once "find a place by its name" exists, it serves as a building block for "get the zip code of a place." Skill memory therefore grows hierarchically: complex workflows are constructed on top of previously acquired ones. Empirically, this yields relative success-rate gains of 24.6% on Mind2Web and 51.1% on WebArena, with a 22.5-point gap on WebArena after only tens of examples. Critically, online AWM's advantage widens as the train-test gap grows (from 8.9 to 14.0 absolute points), because workflow abstractions transfer where memorized trajectories do not.
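The hierarchical growth described above can be sketched as a memory in which a workflow step may itself name a previously stored workflow. This is an assumed data structure for illustration only; the `WorkflowMemory` class, its methods, and the step wording are hypothetical and are not taken from AWM.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowMemory:
    """Maps a workflow name to its steps; a step may name another stored workflow."""
    workflows: dict[str, list[str]] = field(default_factory=dict)

    def add(self, name: str, steps: list[str]) -> None:
        self.workflows[name] = steps

    def expand(self, name: str, _active: tuple[str, ...] = ()) -> list[str]:
        """Flatten a workflow into primitive actions, expanding nested workflows."""
        if name in _active:
            raise ValueError(f"cyclic workflow reference: {name}")
        actions: list[str] = []
        for step in self.workflows[name]:
            if step in self.workflows:
                # The step is itself a stored workflow: reuse it as a building block.
                actions.extend(self.expand(step, _active + (name,)))
            else:
                actions.append(step)
        return actions


memory = WorkflowMemory()
memory.add("find a place by its name", [
    "type {place-name} into the map search box",
    "click the first search result",
])
# The more complex workflow is built on top of the earlier one.
memory.add("get the zip code of a place", [
    "find a place by its name",
    "read the zip code from the address panel",
])

flat = memory.expand("get the zip code of a place")
```

Here `flat` contains the two actions of "find a place by its name" followed by the zip-code step, so the later workflow stores only what is new and delegates the rest to the earlier one.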

The implication is that the right unit of agent memory is the sub-task routine with abstracted variables, not the full task trajectory and not generic helpful hints. The unit should be small enough to recur, abstracted enough to transfer, and structured enough to compose. This position contrasts directly with the note "Does state-indexed memory outperform high-level workflow memory for web agents?", where PRAXIS argues the opposite: that state-indexed local procedures outperform abstracted workflows precisely because abstraction loses the click-by-click specifics web environments demand.


Source: Action Models

Original note title

agent workflow memory induces reusable sub-task routines and compounds them — yielding 24-51 percent relative success gains and snowballing skill complexity