Does agent memory work better at one level of abstraction?
Three competing architectures claim superior memory transfer for agents, each operating at a different abstraction level. Do all three work, or does one genuinely outperform the others across domains?
Three papers from the agentic cluster, AWM, CLIN, and PRAXIS, each propose a different shape for agent memory, and each reports transfer gains: AWM extracts abstracted sub-task workflows ("search for a {product-name} on Amazon"), CLIN extracts causal abstractions ("opening doors may be necessary for movement between rooms"), and PRAXIS stores state-dependent local action memories, indexing past actions by the concrete environment state in which they succeeded. The claims look incompatible only because the papers implicitly answer different questions. The resolution is not "one wins" but "each wins in the domain where its abstraction matches the structure of the task."
Three domain-shape signatures predict three memory shapes (a minimal data-structure sketch of each follows the list):

- Routine-rich domains (e-commerce flows, customer-service scripts, repetitive browser tasks): the variance is in the arguments, not the topology. The same workflow recurs with different parameters. Workflow-routine memory compounds because complex workflows are built by composing simpler ones, and the composition graph stays stable across instances. AWM wins.
- Environment-rich domains (embodied agents, scientific simulators, novel game environments): the variance is in the causal structure, not the arguments. Action consequences depend on environmental state in ways that can be summarized as causal rules. Workflow memory fails because no workflows recur; state-action memory fails because the state space is too large for local recall. Causal-rule memory transfers because the causal structure is the invariant. CLIN wins.
- Spatially-rich web tasks (modern web UIs with dense local affordances, dynamic menus, context-dependent actions): the variance is in fine-grained UI state. Workflow abstractions throw away the local visual cues that distinguish a working action from a broken one; state-action local recall preserves exactly what AWM compresses out. PRAXIS wins.
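To make the contrast concrete, here is a minimal Python sketch of the three memory shapes as data structures. All class and field names are illustrative assumptions for this note, not APIs from the AWM, CLIN, or PRAXIS papers.

```python
from dataclasses import dataclass

@dataclass
class WorkflowMemory:
    """AWM-style shape: a reusable sub-task routine; variance lives in the arguments."""
    name: str             # e.g. "search_product"
    steps: list[str]      # step templates with {placeholder} slots
    parameters: list[str] # the slots that vary across instances

@dataclass
class CausalRule:
    """CLIN-style shape: an environment invariant; variance lives in causal structure."""
    condition: str        # e.g. "door is closed"
    action: str           # e.g. "open door"
    effect: str           # e.g. "movement between rooms becomes possible"
    hedge: str = "may be necessary"  # CLIN-style rules are hedged, not absolute

@dataclass
class StateActionMemory:
    """PRAXIS-style shape: local recall; variance lives in fine-grained UI state."""
    state_fingerprint: str  # summary of the concrete UI state
    action: str             # the action that worked in exactly that state
    outcome: str            # observed result, kept for later verification

# One illustrative instance of each shape:
search = WorkflowMemory(
    name="search_product",
    steps=["click search bar", "type {product_name}", "press enter"],
    parameters=["product_name"],
)
doors = CausalRule(condition="door is closed", action="open door",
                   effect="movement between rooms becomes possible")
menu = StateActionMemory(state_fingerprint="nav-menu:expanded,cart:2-items",
                         action="click 'Checkout' in dropdown",
                         outcome="checkout page loaded")
```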
The deeper claim: agent memory design is not a horse race between architectures but a domain-classification problem. Before choosing a memory architecture, classify the deployment domain along the routine-richness, environment-causality, and spatial-density axes — each axis predicts a memory shape. Reframing the AWM/CLIN/PRAXIS contest this way also explains why parallel benchmark wins coexisted: the benchmarks differed along these axes too, so each architecture won in its native habitat. A composite memory system that selects abstraction level per task class would likely beat any single-architecture system on a heterogeneous workload.
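A minimal sketch of that per-task-class selection policy, assuming the three axes can each be scored on a common [0, 1] scale; the scoring itself is the hard, unsolved part and is left to the caller here. All names are hypothetical.

```python
from enum import Enum

class MemoryShape(Enum):
    WORKFLOW = "workflow-routine (AWM-style)"
    CAUSAL_RULE = "causal-rule (CLIN-style)"
    STATE_ACTION = "state-action (PRAXIS-style)"

def select_memory_shape(routine_richness: float,
                        causal_variance: float,
                        spatial_density: float) -> MemoryShape:
    """Pick the memory shape whose axis dominates the domain signature.

    Each argument is a score in [0, 1] for one axis; how to estimate
    these scores from a deployment domain is the open problem.
    """
    scores = {
        MemoryShape.WORKFLOW: routine_richness,
        MemoryShape.CAUSAL_RULE: causal_variance,
        MemoryShape.STATE_ACTION: spatial_density,
    }
    return max(scores, key=scores.get)

# E-commerce flows: recurring workflows, stable causal structure, simple UI state.
assert select_memory_shape(0.9, 0.2, 0.3) is MemoryShape.WORKFLOW
# Scientific simulator: few recurring routines, rich causal structure.
assert select_memory_shape(0.1, 0.8, 0.2) is MemoryShape.CAUSAL_RULE
# Dense modern web UI: context-dependent local actions dominate.
assert select_memory_shape(0.3, 0.2, 0.9) is MemoryShape.STATE_ACTION
```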
Source: Action Models
Related concepts in this collection
- Can agents learn reusable sub-task routines from past experience? Does extracting and abstracting sub-task workflows from previous trajectories enable web agents to build complex skills compositionally? This matters because it could explain why agents fail at long-horizon tasks despite strong reasoning abilities. (AWM evidence; workflow-level memory wins in routine-rich domains.)
- Can frozen language models learn without updating their parameters? If agents built on frozen models can't change their weights, what kind of memory structure would let them keep improving across trials and transfer to new tasks? This challenges assumptions about how continual learning must work. (CLIN evidence; causal-rule memory wins in environment-rich domains.)
- Does state-indexed memory outperform high-level workflow memory for web agents? Should procedural memory for web agents be organized around specific environment states and actions, or abstracted into higher-level workflows? This matters because web automation demands precise, context-sensitive recall that workflows might lose. (PRAXIS evidence; state-action memory wins in spatially-rich domains.)
- How do agentic AI systems decompose into adaptation paradigms? What are the core dimensions that distinguish different approaches to adapting agents and tools in agentic systems? Understanding this taxonomy could clarify which adaptation strategy fits which problem. (Adjacent design taxonomy; suggests memory granularity is a third dimension that should compose with these.)
- How should agents decide what memories to keep? Agent memory management splits between agents autonomously recognizing important information versus programmatic triggers. Understanding this choice reveals why different memory architectures prioritize different information types. (Orthogonal axis, the recall mechanism, which interacts with the granularity choice.)
Original note title: agent memory granularity is domain-conditional — workflow-level for routine-rich tasks, causal-level for environment-rich tasks, state-action-level for spatially-rich web tasks