Real-Time Procedural Learning From Experience for AI Agents
AI agents are artificial intelligence systems capable of observing an environment and taking actions within it. As adoption spreads across industries, there is a growing need for AI agents to quickly learn domain- or user-specific information. There are two main classes of information that agents are typically expected to learn:
(1) Facts: atomic pieces of information independent of the state of the agent or environment (e.g., a user’s name, preferences, organizational charts). Facts can change over time, but at any given moment they are generally context-independent.
(2) Procedures: established conventions for doing things (e.g., "how to troubleshoot a failed login" or "how to guide a customer through a sales process to the most fitting product"). Procedures can be viewed as a sequence of state-dependent requirements or preferences over actions.
In real-world applications, learning and optimizing procedures in real time are at least as important as learning facts. While frameworks such as Mem0 [4] and Letta [12] focus on long-term factual memory, effective post-training learning of procedures in AI agents remains relatively underexplored.
A naïve approach is a priori procedural specification: a human writes rules or standard operating procedures (SOPs) that are included in the agent's context at inference time. This approach effectively reduces procedures to a large bundle of facts. In practice, it faces three challenges: (1) many procedures are not fully documented, as humans are often trained by observation rather than by reading long SOPs; (2) enumerating all states and edge cases in a combinatorial space is difficult; and (3) procedures can become obsolete quickly as environments change. We argue that a more robust approach is to learn procedures a posteriori from demonstrations or experience. Inspired by state-dependent memory in psychology [3, 14], we propose PRAXIS, a concrete method for procedural recall, and show that it improves agent accuracy, reliability, and efficiency in web browsing settings. Our method is compatible both with experiences demonstrated by a human expert and with actual trajectories generated by the AI agent itself.
1.2 Web Agents and the Browser Environment
Human-facing web applications almost always require multi-step interactions to accomplish meaningful goals (e.g., purchasing an item online requires searching, filtering, logging in, completing forms, and checking out); in other words, a procedure. These procedures must also adapt to changing environments (e.g., an e-commerce site may have seasonal pop-ups or redesigned interfaces), making web browsing a natural environment in which to study a posteriori procedural learning. Importantly, even when tasks are obvious to humans, comprehensive procedures are rarely documented, and high personalization limits pretraining coverage in foundation models. As AI-based design tools increasingly generate and update web platforms, economic value shifts to novel, previously unseen interfaces and interaction flows, pushing agents into out-of-distribution states and rendering a priori SOPs brittle. A post-training, state-indexed procedural memory thus becomes essential for robust web automation, allowing agents to acquire and reuse procedures precisely when new states appear.
2 Related Work
External Memory for AI Chatbots. A broad class of systems augments LLMs with non-parametric memory in conversational environments. Retrieval-augmented generation (RAG) attaches a document store to provide factual knowledge at inference time [8]. In agentic settings, persistent memory frameworks such as Letta (formerly MemGPT) [12] provide hierarchical storage and dynamic context management. Mem0 [4] provides a queryable, cross-session memory for user preferences and long-range conversational context. Academic frameworks include MemoryBank [19], which mimics human long-term memory with continual decay and reinforcement, and A-MEM [16], which dynamically links and evolves structured notes. These approaches typically focus on factual memory for agents in conversational environments. In contrast, our method focuses on learning action policies in stateful visual environments, such as the web, that are significantly more complex and only partially observable.
Experience-Based Self-Improvement and Workflow Memories. A complementary line of work improves agents via self-reflection. Reflexion [13] maintains verbal reflections in an episodic buffer to guide subsequent trials; Self-Refine [10] iteratively critiques and edits its own outputs; and CLIN [11] performs continual task adaptation with a persistent textual memory of causal abstractions. These methods are effective but are not well-tested in visual environments and generally do not encode information about environmental state. There has also been a line of work on experience- or workflow-based memories for agents. Agent Workflow Memory [15], Synapse [18], and ExpeL [17] induce abstracted, natural-language workflows from successful trajectories and retrieve them to augment prompts at test time. In contrast, our method performs local state-based recall grounded primarily in the live environment state, a grounding absent from prior work, and only secondarily in the goal. Moreover, we index memories with explicit state and action descriptors, rather than high-level trajectories, enabling precise recall and learning of the fine-grained details that environments like the web demand.
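To make the indexing scheme concrete, the following is a minimal sketch of a memory that stores local interaction traces under explicit state and action descriptors and recalls them by feature overlap with the live state. All names (MemoryEntry, ProceduralMemory, the descriptor fields) are hypothetical illustrations, not the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    """One local interaction trace, indexed by state and action descriptors."""
    state_descriptor: frozenset  # salient features of the environment state
    action_descriptor: str       # e.g., "click submit-btn"
    goal: str                    # the objective active when the action was taken

class ProceduralMemory:
    """Stores traces and recalls those whose indexed state overlaps the live state."""

    def __init__(self):
        self.entries: list[MemoryEntry] = []

    def store(self, entry: MemoryEntry) -> None:
        self.entries.append(entry)

    def recall(self, live_state: frozenset, min_overlap: int = 1) -> list[MemoryEntry]:
        # Return entries whose state descriptor shares at least
        # min_overlap features with the live environment state.
        return [e for e in self.entries
                if len(e.state_descriptor & live_state) >= min_overlap]
```

Because each entry is keyed to a local state rather than a whole trajectory, recall can fire at exactly the step where a previously seen state reappears.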
5 Discussion
Personalized learning as a critical component of AI agents in the economy. We envision a future where, instead of replacing humans, AI agents work alongside them. To encompass the diverse activities across our economy and maintain high collaboration efficiency, users will need to customize their AI agents with their own data and procedures. Even if it becomes possible to quickly and efficiently train everything into a single, universal model, we may not wish to, as each user should be able to decide whether to share their private knowledge with the world. In this context, personalized learning methods like PRAXIS, which customize agents not merely at a superficial level but in terms of real capabilities, will be critical to the adoption of agents in the economy.
Summary of contributions. This paper introduces state-dependent memory, an a posteriori learning mechanism that stores local interaction traces and retrieves them by jointly matching the environment state and the agent's internal objective. When integrated into the Altrina web agent, our method yields consistent improvements on the REAL web browsing benchmark across diverse VLM backbones: higher average accuracy, higher best-of-5 accuracy, better reliability, and fewer steps to completion. An ablation shows that gains increase with retrieval breadth k. Together, our results indicate that state-dependent memory provides reusable local state-to-action priors that guide AI agents towards robust, generalizable behavior.
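The joint state-and-goal retrieval described above can be sketched as a weighted ranking over stored traces, where state similarity is primary and goal similarity secondary, and the top k matches are returned. The weighting alpha, the similarity measures, and the entry layout are illustrative assumptions, not the paper's exact formulation.

```python
import heapq
from difflib import SequenceMatcher

def jaccard(a: set, b: set) -> float:
    """Feature-overlap similarity between two state descriptors."""
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve(entries: list[dict], live_state: set, goal: str,
             k: int = 3, alpha: float = 0.7) -> list[dict]:
    """Rank stored traces by a blend of state match (primary, weight alpha)
    and goal match (secondary); return the k highest-scoring entries."""
    def score(entry: dict) -> float:
        state_sim = jaccard(entry["state"], live_state)
        goal_sim = SequenceMatcher(None, entry["goal"], goal).ratio()
        return alpha * state_sim + (1 - alpha) * goal_sim
    return heapq.nlargest(k, entries, key=score)
```

Increasing k widens the retrieval breadth, surfacing more candidate state-to-action priors for the agent to weigh, which is consistent with the ablation's finding that gains grow with k.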
Future directions. Beyond web environments: state-dependent memory is conceptually agnostic to the environment, and the same idea extends naturally to general agentic computer use. Richer state encoding: our proof-of-concept implementation of state-dependent memory uses basic visual and DOM feature overlap with simple similarity metrics; a richer encoder could improve both retrieval quality and invariance to superficial changes.
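As one illustration of the kind of basic DOM feature overlap mentioned above, the sketch below fingerprints a page by coarse element signatures (tag plus id/class) and compares two snapshots by Jaccard overlap. This is an assumed stand-in using only the standard library, not the paper's actual encoder; a richer encoder would replace both the feature extraction and the similarity metric.

```python
from html.parser import HTMLParser

class DOMFeatureExtractor(HTMLParser):
    """Collects coarse element signatures (tag plus id/class) as a state fingerprint."""

    def __init__(self):
        super().__init__()
        self.features: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        signature = tag
        if "id" in attrs:
            signature += f"#{attrs['id']}"
        if "class" in attrs:
            signature += "." + ".".join(attrs["class"].split())
        self.features.add(signature)

def dom_features(html: str) -> set[str]:
    parser = DOMFeatureExtractor()
    parser.feed(html)
    return parser.features

def state_similarity(html_a: str, html_b: str) -> float:
    """Jaccard overlap of DOM feature sets; ignores text-only changes."""
    a, b = dom_features(html_a), dom_features(html_b)
    return len(a & b) / len(a | b) if a | b else 1.0
```

Note that because only structural signatures are kept, two snapshots that differ solely in visible text score as identical, which gives some invariance to superficial changes but would miss visual cues a richer encoder could capture.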