INQUIRING LINE

How does PRAXIS differ architecturally from Agent Workflow Memory and causal rule learning?

This explores how agents accumulate and reuse experience — and what architectural unit each approach extracts: Agent Workflow Memory stores reusable sub-task routines, while the corpus's adjacent work stores skills, episodic cases, or pruned reasoning trees instead.


This reads the question as being about the *unit of reuse*: what an agent extracts from past experience, and how it stores and recomposes it. One caveat up front: the corpus doesn't contain a note named PRAXIS or one specifically on causal rule learning, so this synthesizes the architectural landscape they'd sit inside rather than comparing all three head-to-head. What the corpus does have is a sharp spread of design choices along exactly that axis.

Agent Workflow Memory is the clearest anchor: it extracts *sub-task routines* at finer granularity than whole tasks, strips out example-specific values to make them reusable, and compounds them hierarchically. Its gains grow as the gap between training and test situations widens (see: Can agents learn reusable sub-task routines from past experience?). The architectural commitment is procedural: the reusable thing is a *how-to*, abstracted from the specifics. VOYAGER makes a sibling choice but stores executable *skills* in an embedding-indexed library and composes complex skills from simpler ones, which is what lets it learn continuously without the catastrophic forgetting that weight-update methods suffer (see: Can agents learn new skills without forgetting old ones?). Both externalize procedure into a library rather than baking it into model weights; the difference is granularity and how composition happens.
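The abstraction step described above, stripping example-specific values so a routine can be replayed on a new task, can be sketched in a few lines. This is a toy illustration under stated assumptions, not Agent Workflow Memory's actual implementation: the names `Step`, `abstract_routine`, and `instantiate` are hypothetical, and the mapping from concrete values to slot names is supplied by hand here rather than induced by a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str      # e.g. "type" or "click"
    target: str      # UI element the action applies to
    value: str = ""  # example-specific value, if any


def abstract_routine(steps, slots):
    """Replace example-specific values with named placeholders.

    `slots` maps a concrete value (e.g. "Boston") to a slot name (e.g. "city").
    Hypothetical helper: a real system would induce this mapping itself.
    """
    def generalize(value):
        return "{" + slots[value] + "}" if value in slots else value

    return [Step(s.action, s.target, generalize(s.value)) for s in steps]


def instantiate(routine, bindings):
    """Fill the placeholders of a stored routine with values for a new task."""
    return [Step(s.action, s.target, s.value.format(**bindings)) for s in routine]


# One observed trace, specific to a single example.
trace = [Step("type", "search_box", "Boston"), Step("click", "search_button")]

# Stored form: the value "Boston" becomes the slot "{city}".
routine = abstract_routine(trace, {"Boston": "city"})

# Reuse on a different task by binding the slot to a new value.
replayed = instantiate(routine, {"city": "Denver"})
print(replayed)
```

The stored routine is plain data, so it can be inspected, edited, or concatenated with other routines to build larger ones, which is the sense in which the procedure lives in a library rather than in model weights.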

The interesting contrast is what *else* an agent could store. AgentFly keeps three memory modules (case, subtask, and tool) and treats the whole thing as a memory-augmented decision process, improving its policy entirely through memory operations with zero parameter updates (see: Can agents learn continuously from experience without updating weights?). DeepAgent folds raw interaction history into episodic, working, and tool schemas to stay efficient under long horizons (see: Can agents compress their own memory without losing critical details?). A 2025 survey argues these aren't really different memory *types* at all: it reframes agent memory along forms, functions, and dynamics, showing the familiar short-term/long-term split is an emergent temporal pattern rather than an architectural fact (see: Can three axes replace the short-term long-term memory split?). That's the lens that makes the PRAXIS-vs-AWM-vs-rules question crisp: they differ in which *form* (routine, skill, case, rule) and which *function* (experiential vs procedural) they externalize.
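To make "improving a policy entirely through memory operations" concrete, here is a minimal sketch of a case memory wrapped around a frozen policy. It is an assumption-laden toy, not AgentFly's design: `CaseMemory`, `act`, and the word-overlap `similarity` function are hypothetical names, and Jaccard overlap stands in for whatever learned or embedding-based retrieval a real system would use.

```python
from dataclasses import dataclass, field


def similarity(a, b):
    """Jaccard overlap of word sets; a toy stand-in for embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


@dataclass
class CaseMemory:
    cases: list = field(default_factory=list)  # (task, plan, reward) tuples

    def write(self, task, plan, reward):
        """Record the outcome of one episode."""
        self.cases.append((task, plan, reward))

    def read(self, task, k=1):
        """Return the k most similar past cases that succeeded (reward > 0)."""
        good = [c for c in self.cases if c[2] > 0]
        return sorted(good, key=lambda c: similarity(task, c[0]), reverse=True)[:k]


def act(task, memory, default_plan="explore"):
    """Frozen policy: reuse the closest successful plan, else fall back."""
    hits = memory.read(task)
    return hits[0][1] if hits else default_plan


memory = CaseMemory()
before = act("book a flight to Paris", memory)

# Behavior changes only because memory changed; no parameters are updated.
memory.write("book a flight to Rome", "open airline site; search; pay", 1.0)
memory.write("book a hotel in Paris", "guess randomly", 0.0)
after = act("book a flight to Paris", memory)
print(before, "->", after)
```

The failed case is stored but never retrieved, so the only thing that differs between `before` and `after` is the content of the memory.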

If 'causal rule learning' is the third leg, the corpus's nearest territory is the move to externalize *cognitive burden into structure* rather than rely on model scale: reliable agents push memory, skills, and protocols into a harness layer so the model stops re-solving the same problems (see: Where does agent reliability actually come from?). There's also a thread that replaces stored routines with *control flow*. LLM Programs embed the model inside an explicit algorithm that hides step-irrelevant context (see: Can algorithms control LLM reasoning better than LLMs alone?), and the Thread Inference Model dispenses with a separate memory store entirely by structuring reasoning as recursive subtask trees with KV-cache pruning, letting one model do internally what multi-agent systems do across components (see: Can recursive subtask trees overcome context window limits?).
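The control-flow idea above, an explicit algorithm that decides what each model call gets to see, can be sketched as follows. This is an illustrative assumption rather than the LLM Programs authors' code: `run_program` is a hypothetical two-stage program, and `call_model` is a hypothetical callable taking a prompt string and returning a string, replaced here by a fake model so the example runs without any LLM client.

```python
def run_program(question, documents, call_model):
    """Two-stage program: filter each passage, then answer from the survivors.

    `call_model` is a hypothetical (prompt: str) -> str callable standing in
    for any LLM client.
    """
    evidence = []
    for doc in documents:
        # Each filtering call sees exactly one passage, never the others.
        verdict = call_model(f"Question: {question}\nPassage: {doc}\nRelevant? yes/no")
        if verdict.strip().lower().startswith("yes"):
            evidence.append(doc)
    # The final call sees only the filtered evidence, not the rejected passages.
    context = "\n".join(evidence)
    return call_model(f"Question: {question}\nEvidence:\n{context}\nAnswer:")


seen = []  # every prompt the fake model receives, for inspection


def fake_model(prompt):
    """Deterministic stand-in for an LLM so the sketch is runnable."""
    seen.append(prompt)
    if "Relevant?" in prompt:
        passage = prompt.split("Passage:")[1]
        return "yes" if "France" in passage else "no"
    return "Paris"


docs = ["Paris is the capital of France.", "Bananas are yellow."]
answer = run_program("What is the capital of France?", docs, fake_model)
print(answer)
print(len(seen), "calls; final prompt mentions bananas:", "Bananas" in seen[-1])
```

The loop and the filtering decision live in ordinary code, so the program's state and branching can be debugged without inspecting the model, and the last prompt never contains the rejected passage.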

So the architectural fault line the question points at is real, and the corpus maps it well: do you reuse *procedures* (AWM, VOYAGER), *episodes/cases* (AgentFly, DeepAgent), *control structure* (LLM Programs, Thread Inference Model), or *abstracted rules*, and is that store external and editable, or compiled into weights? The main takeaway is the survey's claim that these look like distinct architectures but are better understood as different settings of form and function on a shared substrate. That means the 'difference' between approaches like these is often a choice of granularity and storage medium, not a deep architectural divide (see: Can three axes replace the short-term long-term memory split?).


Sources (8 notes)

Can agents learn reusable sub-task routines from past experience?

Agent Workflow Memory induces sub-task routines at finer granularity than full tasks, abstracts example-specific values, and compounds them hierarchically. This produces 24.6% relative gain on Mind2Web and 51.1% on WebArena, with larger gains as train-test gaps widen.

Can agents learn new skills without forgetting old ones?

VOYAGER demonstrates that storing executable skills in an embedding-indexed library and composing complex skills from simpler ones allows agents to learn continuously while avoiding the forgetting that occurs with weight-update-based methods. Environmental feedback refines skills while an automatic curriculum drives continual exploration.

Can agents learn continuously from experience without updating weights?

AgentFly formalizes agent learning as a Memory-augmented MDP with three memory modules (case, subtask, tool) that enable credit assignment and policy improvement entirely through memory operations. The approach achieved 87.88% on GAIA validation without modifying LLM parameters.

Can agents compress their own memory without losing critical details?

DeepAgent's autonomous memory folding consolidates interaction history into episodic, working, and tool memory schemas. This reduces token overhead while letting agents pause to reconsider strategies. Together, the autonomy and the structure avoid the degradation that plagues poorly designed consolidation.

Can three axes replace the short-term long-term memory split?

A 2025 survey reframes agent memory along forms (token/parametric/latent), functions (factual/experiential/working), and dynamics (formation/evolution/retrieval), showing that short/long-term phenomena emerge from temporal patterns rather than architectural separation. This enables precise system comparison and replaces vague implementation-based claims.

Where does agent reliability actually come from?

Research shows reliable LLM agents externalize three cognitive burdens—memory (state persistence), skills (procedural components), and protocols (structured interaction)—into a harness layer rather than relying on model scale alone. The harness unifies these externalities and eliminates the need for the model to solve the same problems repeatedly.

Can algorithms control LLM reasoning better than LLMs alone?

LLM Programs embed LLMs within explicit algorithms that manage control flow and state, presenting only step-specific context to each LLM call. This information hiding addresses capability and context window limits while treating complex reasoning as modular, debuggable sub-tasks.

Can recursive subtask trees overcome context window limits?

The Thread Inference Model demonstrates that reasoning structured as recursive subtask trees with rule-based KV cache pruning sustains accurate reasoning beyond context limits, even when manipulating 90% of the cache. This enables single models to replace multi-agent systems by handling full recursive reasoning internally.
