Does state-indexed memory outperform high-level workflow memory for web agents?
Should procedural memory for web agents be organized around specific environment states and actions, or abstracted into higher-level workflows? This matters because web automation demands precise, context-sensitive recall that workflows might lose.
PRAXIS distinguishes two kinds of agent knowledge — facts (atomic units that hold independently of context) and procedures (state-dependent sequences of actions) — and argues procedures are at least as important as facts for real-world deployment, yet remain underexplored compared to factual memory frameworks like Mem0 and Letta.
The standard alternative — a priori procedural specification, where humans write SOPs and include them in the agent's context — fails for three structural reasons. Many procedures are never fully documented, because humans learn them by observation rather than by reading SOPs. Enumerating all states and edge cases in a combinatorial space is intractable. And procedures become obsolete quickly as environments change. The brittleness intensifies as AI design tools generate novel interfaces that push agents into out-of-distribution states.
PRAXIS's response is a posteriori learning of procedures from demonstrations or experience, indexed by environment state. The key differentiation from Agent Workflow Memory (see Can agents learn reusable sub-task routines from past experience?), Synapse, and ExpeL — which abstract workflows from successful trajectories at the high-level natural-language workflow tier — is that PRAXIS performs local state-based recall, grounded primarily in the live environment state and secondarily in the goal. Memories are indexed with explicit state and action descriptors rather than high-level trajectory summaries, enabling precise recall of the minute details that web environments require.
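The state-indexed recall described above can be sketched in a few lines. This is a minimal illustration, not PRAXIS's actual implementation: the `ProceduralMemory` structure, the `recall` function, and the bag-of-words similarity (standing in for whatever embedding model a real agent would use) are all hypothetical names invented here. The point it demonstrates is the ranking order — live state first, goal second:

```python
from dataclasses import dataclass

@dataclass
class ProceduralMemory:
    """One stored step: explicit state + action descriptors, not a trajectory summary."""
    state_desc: str   # e.g. "checkout page, shipping form visible"
    action_desc: str  # e.g. "fill address then click Continue"
    goal: str         # task-level goal the step served

def _overlap(a: str, b: str) -> float:
    """Crude bag-of-words Jaccard similarity; a real system would embed the text."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(1, len(ta | tb))

def recall(memories, live_state: str, goal: str, k: int = 3):
    """Rank memories primarily by live-state match, secondarily by goal match."""
    scored = sorted(
        memories,
        key=lambda m: (_overlap(m.state_desc, live_state),
                       _overlap(m.goal, goal)),
        reverse=True,
    )
    return scored[:k]

mems = [
    ProceduralMemory("login page, username field focused",
                     "type username then press Tab", "log in"),
    ProceduralMemory("checkout page, shipping form visible",
                     "fill address then click Continue", "buy item"),
]
top = recall(mems, "checkout page, shipping form visible", "buy item", k=1)
print(top[0].action_desc)  # → fill address then click Continue
```

The `k` parameter corresponds to the retrieval breadth varied in the ablation mentioned below: a larger `k` surfaces more candidate state-action pairs for the agent to condition on.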
This is a direct architectural disagreement with Can frozen language models learn without updating their parameters? (CLIN: causal-rule memory transfers best) and AWM (workflow-routine memory compounds best). All three target the question "what shape should agent memory take?" and pick different answers — causal rules, abstracted workflows, or local state-action pairs — with PRAXIS arguing the first two abstract too far from the specifics web automation demands.
Empirically, integrating state-dependent memory into the Altrina web agent yields consistent improvements on the REAL benchmark across diverse VLM backbones: higher average accuracy, higher best-of-5, better reliability, fewer steps to completion. An ablation shows gains increase with retrieval breadth k. The structural claim is that reusable local state-to-action priors are what guide robust generalizable behavior — not abstracted workflows that transfer the gist but lose the click-by-click specifics web automation demands.
This note is tagged type: tension because the disagreement with AWM and CLIN is real and unresolved — see ops/tensions/agent memory granularity tension across AWM CLIN and PRAXIS for the cross-paper tension capture.
Source: Tool Computer Use
Related concepts in this collection
- Can agents learn reusable sub-task routines from past experience?
  Does extracting and abstracting sub-task workflows from previous trajectories enable web agents to build complex skills compositionally? This matters because it could explain why agents fail at long-horizon tasks despite strong reasoning abilities.
  contradicts: PRAXIS argues local state-action recall beats workflow abstraction; AWM argues abstracted sub-task workflows compound and transfer best. Direct architectural disagreement on memory granularity.
- Can frozen language models learn without updating their parameters?
  If agents built on frozen models can't change their weights, what kind of memory structure would let them keep improving across trials and transfer to new tasks? This challenges assumptions about how continual learning must work.
  contradicts: CLIN advocates causal-rule abstractions; PRAXIS argues abstractions of any kind lose the specifics web environments need. Three-way memory-shape tension when paired with AWM.
- How can GUI agents adapt when software constantly changes?
  Can desktop automation agents stay current by combining real-time web documentation with learned task patterns and concrete execution memories? This explores how to avoid training obsolescence in open-world software environments.
  partial agreement: Agent S's episodic memory is closer to PRAXIS than its narrative memory; PRAXIS would predict Agent S's gains come from the episodic layer and that the narrative layer is dispensable for web automation.
- How should multimodal agents organize their memory?
  Can organizing agent memory around entities and separating episodic events from semantic knowledge enable more natural, preference-aware assistance without constant clarification?
  complements: M3-Agent splits episodic vs semantic at the storage layer; PRAXIS focuses on the procedural memory dimension that M3-Agent leaves underspecified.
- Why do LLM agents ignore condensed experience summaries?
  LLM agents faithfully learn from raw experience but systematically disregard condensed summaries of the same experience. This study investigates whether the problem lies in how summaries are made, how models process them, or whether models simply don't need them.
  supports: agents systematically ignore condensed experience, which would predict that high-level workflow abstractions degrade exactly the way PRAXIS observes — supporting evidence for state-dependent local memory over abstracted summaries.
Original note title
state-dependent procedural memory beats workflow-level memory for web agents — local state-action recall captures details that high-level trajectory abstractions lose