
How should agent memory split across time scales?

Explores whether agent working memory should be organized by temporal scope, with some components persisting across a conversation and others refreshed each turn. Understanding this distinction could reveal why some memory designs fail.

Note · 2026-05-18 · sourced from Memory

Most agent architectures describe their memory as one undifferentiated working buffer plus an external store. RAISE (arXiv:2401.02777) refines the working layer into four components, but the contribution that is easy to miss is not the four components themselves; it is the two granularities underneath them.

The four components are the system prompt (role identity, objectives, tool descriptions, few-shot anchors); the context (conversation history plus task trajectory); the scratchpad (background information, intermediate reasoning, observations from tool calls); and the examples (query-response pairs retrieved for the current task to fill knowledge gaps).

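The granularity split is the under-noticed structural claim. Conversation history and scratchpad are dialogue-level: they accumulate across the entire conversation and persist between turns. Examples and task trajectory are turn-level: they are recalled and replaced each turn based on the current query. Note that conversation history and task trajectory are the two halves of the context component, and the system prompt is not assigned to either granularity. Crossing scope (dialogue vs turn) with update style (continuous accumulation vs retrieval and replacement) gives a 2×2 design space in which these four memory items can be placed.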
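One way to make the 2×2 concrete is to tag each working-memory item with its scope and its update style, then group the items by cell. This is illustrative Python, not code from RAISE; the enum names, the item keys, and the `design_grid` helper are hypothetical, and the labels follow this note's reading of the paper (task trajectory is treated as accumulating within a turn).

```python
from enum import Enum


class Scope(Enum):
    DIALOGUE = "dialogue"  # persists across the whole conversation
    TURN = "turn"          # rebuilt for each user turn


class Update(Enum):
    ACCUMULATE = "accumulate"  # grows by appending or editing in place
    REPLACE = "replace"        # discarded and re-retrieved


# Hypothetical labels for the four working-memory items, following this
# note's reading of RAISE. The system prompt is left out of the split.
ITEMS = {
    "conversation_history": (Scope.DIALOGUE, Update.ACCUMULATE),
    "scratchpad": (Scope.DIALOGUE, Update.ACCUMULATE),
    "examples": (Scope.TURN, Update.REPLACE),
    "task_trajectory": (Scope.TURN, Update.ACCUMULATE),
}


def design_grid(items):
    """Group item names by their (scope, update) cell in the 2x2 grid."""
    grid = {(s, u): [] for s in Scope for u in Update}
    for name, cell in items.items():
        grid[cell].append(name)
    return grid


for (scope, update), names in design_grid(ITEMS).items():
    print(f"{scope.value:>8} x {update.value:<10} {names}")
```

Under these labels the dialogue-level/replace cell comes out empty; that is a consequence of the tagging chosen here, not a finding reported by the paper.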

The granularity distinction matters because it predicts which failure mode each component is prone to. Dialogue-level components grow monotonically and create context-length pressure, so they need pruning policies. Turn-level components risk recall failure when the retrieval index is stale or the retrieval signal is weak, so they need refresh policies. Treating all working memory as one buffer hides both problems; RAISE makes them addressable as separate concerns.
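A minimal sketch of one policy of each kind, assuming a recency-based token budget for dialogue-level pruning and a top-k cut with a similarity floor for turn-level refresh. RAISE does not specify either policy; the function names, the `Example` record, and the whitespace token count (a crude stand-in for a real tokenizer) are all hypothetical. A stale index is a separate problem that this sketch does not address; it would need periodic re-indexing of the example pool.

```python
from dataclasses import dataclass


@dataclass
class Example:
    query: str
    response: str
    score: float  # similarity to the current query; higher is better


def prune_history(history, max_tokens, count_tokens=lambda s: len(s.split())):
    """Pruning policy for a dialogue-level buffer: keep the most recent
    messages whose combined approximate token count fits the budget."""
    kept, used = [], 0
    for message in reversed(history):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))


def refresh_examples(candidates, k, min_score):
    """Refresh policy for a turn-level slot: replace the slot wholesale with
    the top-k candidates, dropping any below a similarity floor so that a
    weak retrieval signal yields fewer examples rather than poor ones."""
    ranked = sorted(candidates, key=lambda e: e.score, reverse=True)
    return [e for e in ranked[:k] if e.score >= min_score]


history = ["hi there", "how do I reset my password", "click forgot password"]
print(prune_history(history, max_tokens=9))
```

Dropping the oldest messages first is only one choice; summarizing them into the scratchpad is another common option that keeps more information at the cost of an extra model call.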

The update protocol shows the granularity in action. On each turn the controller (1) appends the user query to the conversation history (dialogue-level append); (2) recalls the top-k relevant examples from a separate example pool via vector retrieval (turn-level replace); (3) updates current entity information in the scratchpad, if applicable (dialogue-level update); and (4) during execution, updates the agent trajectory and tool results in task memory (turn-level append within the turn). Each component has its own update rule and lifecycle, and the controller coordinates them all.
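The four-step protocol can be sketched as a small controller over a working-memory record. This is a hypothetical Python illustration, not RAISE's implementation: example vectors are assumed to be precomputed, retrieval is plain cosine similarity over an in-memory list, entity extraction is assumed to happen elsewhere and arrive as a dict, and clearing the trajectory at the start of each turn is an assumption drawn from its turn-level scope.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class WorkingMemory:
    # Dialogue-level: persists across turns.
    history: list = field(default_factory=list)
    scratchpad: dict = field(default_factory=dict)
    # Turn-level: rebuilt for each turn.
    examples: list = field(default_factory=list)
    trajectory: list = field(default_factory=list)


def start_turn(mem, query, query_vec, example_pool, k=2, entities=None):
    """Steps (1)-(3). example_pool holds (vector, (query, response)) pairs;
    vectors and extracted entities are assumed to be computed elsewhere."""
    # (1) Dialogue-level append: the query joins the running history.
    mem.history.append(("user", query))
    # (2) Turn-level replace: last turn's examples are discarded wholesale.
    ranked = sorted(example_pool, key=lambda ex: cosine(query_vec, ex[0]), reverse=True)
    mem.examples = [pair for _, pair in ranked[:k]]
    # (3) Dialogue-level update: merge new entity facts into the scratchpad.
    if entities:
        mem.scratchpad.update(entities)
    # Assumption: the task trajectory starts empty for each new turn.
    mem.trajectory = []


def record_step(mem, action, observation):
    """Step (4): append an action and its tool result within the current turn."""
    mem.trajectory.append((action, observation))


# Usage with toy 2-d vectors.
pool = [
    ([1.0, 0.0], ("reset password?", "Use the reset link.")),
    ([0.0, 1.0], ("opening hours?", "9 to 5.")),
    ([0.9, 0.1], ("forgot login?", "Try a password reset.")),
]
mem = WorkingMemory()
start_turn(mem, "I can't log in", [1.0, 0.0], pool, entities={"issue": "login"})
record_step(mem, "lookup_account", "account locked")
print(mem.examples)
```

On a second turn the history and scratchpad carry over while the examples and trajectory are rebuilt, which is the lifecycle difference the protocol encodes.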

The implication for agent design is that the question "where does this go in memory?" decomposes into two sub-questions: what is its temporal scope, and what is its update policy? Architectures that conflate the two end up with either bloated dialogue buffers (everything is dialogue-level append) or lossy turn-level memory (everything is replaced each turn).
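The two failure modes can be reproduced with a toy simulation that applies one update rule to every item, compared with routing items by scope. Everything here is hypothetical and illustrative: the item kinds, the `run` function, and the three-turn transcript are invented for the sketch, and a real system would route on richer signals than a kind label.

```python
# Hypothetical item kinds: which ones are dialogue-level vs turn-level.
DIALOGUE_KINDS = {"message", "entity"}    # persist and accumulate
TURN_KINDS = {"example", "tool_result"}   # rebuilt every turn

# A toy three-turn transcript of (kind, text) items.
TURNS = [
    [("message", "my order is #123"), ("entity", "order=123"), ("example", "refund FAQ")],
    [("message", "where is it?"), ("example", "shipping FAQ")],
    [("message", "can I cancel?"), ("example", "cancel FAQ")],
]


def run(turns, policy):
    """Return the working memory after all turns under the given policy."""
    dialogue, turn_mem = [], []
    for items in turns:
        if policy == "all_append":      # everything treated as dialogue-level append
            dialogue.extend(items)
        elif policy == "all_replace":   # everything replaced each turn
            dialogue = list(items)
        elif policy == "split":         # route each item by its temporal scope
            dialogue.extend(i for i in items if i[0] in DIALOGUE_KINDS)
            turn_mem = [i for i in items if i[0] in TURN_KINDS]
        else:
            raise ValueError(f"unknown policy: {policy}")
    return dialogue + turn_mem


for policy in ("all_append", "all_replace", "split"):
    print(policy, run(TURNS, policy))
```

Under `all_append` the stale refund and shipping examples pile up (the bloated buffer); under `all_replace` the order number from the first turn is gone (the lossy memory); the `split` policy keeps the entity and only the current example.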

Paper: From LLM to Conversational Agent: RAISE

Related concepts in this collection

How should agents decide what memories to keep? Agent memory management splits between agents autonomously recognizing important information versus programmatic triggers. Understanding this choice reveals why different memory architectures prioritize different information types.
Letta's hot/cold path is about *who triggers updates* (agent vs system); RAISE's two granularities are about *what temporal scope is updated*. These are orthogonal design axes.
Can three axes replace the short-term long-term memory split? Does breaking agent memory into forms, functions, and dynamics provide a clearer framework than the traditional short-term/long-term distinction? This matters because current agent-memory literature lacks a unified vocabulary, making comparison between systems nearly impossible.
RAISE's components occupy specific positions along the functions axis: scratchpad is working, examples are factual, conversation history is experiential
Can a single model replace retrieval for long-term conversation memory? COMEDY proposes collapsing the standard retrieval pipeline into one unified model that generates, compresses, and responds. But does eliminating the retriever actually improve performance, or does compression lose critical information?
COMEDY collapses these distinctions by merging everything into one compressive store; RAISE preserves them
Can interleaving reasoning with real-world feedback prevent hallucination? Does grounding language model reasoning in external world observations rather than internal associations help prevent error propagation and false outputs? This explores whether breaking the static chain-of-thought pattern can catch and correct mistakes in real time.
RAISE is a ReAct enhancement; the four-component memory is what makes the reasoning-action loop trackable across long dialogues


Original note title

agent working memory decomposes into four components at two granularities — dialogue-level history and scratchpad versus turn-level examples and task trajectory
