How should agent memory split across time scales?

Explores whether agent working memory should be organized by temporal scope—some components persisting across a conversation, others refreshed each turn. Understanding this distinction could reveal why some memory designs fail.

Note · 2026-05-18 · sourced from Memory

Most agent architectures describe their memory as one undifferentiated working buffer plus an external store. RAISE (2401.02777) refines the working layer into four components — but the contribution that gets missed is not the four components, it is the two granularities underneath them.

The four components: system prompt (role identity, objectives, tool descriptions, few-shot anchors), context (conversation history plus task trajectory), scratchpad (background information, intermediate reasoning, observations from tool calls), examples (query-response pairs retrieved for the current task to supplement knowledge gaps).

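The granularity split is the under-noticed structural claim. Conversation history and scratchpad are dialogue-level: they accumulate across the entire conversation and persist between turns. Examples and task trajectory are turn-level: examples are recalled and replaced each turn based on the current query, and the trajectory is built up only within the current turn. The context component therefore straddles both levels, since its conversation history persists while its task trajectory does not. Taken together, these four memory elements span a 2×2 design space: dialogue-vs-turn × continuous-accumulation-vs-retrieval-replacement.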
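One way to make the two granularities visible in code is to give each its own container, so that lifetime is a property of the type rather than a convention. The sketch below is illustrative only: the class names (`DialogueMemory`, `TurnMemory`, `WorkingMemory`) and the `start_turn` method are assumptions made for this note, not names taken from RAISE.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueMemory:
    """Dialogue-level state: persists across turns for the whole conversation."""
    conversation_history: list[str] = field(default_factory=list)
    scratchpad: dict[str, str] = field(default_factory=dict)


@dataclass
class TurnMemory:
    """Turn-level state: rebuilt at the start of every turn."""
    examples: list[tuple[str, str]] = field(default_factory=list)
    task_trajectory: list[str] = field(default_factory=list)


@dataclass
class WorkingMemory:
    """Hypothetical wrapper that keeps the two lifetimes apart."""
    system_prompt: str
    dialogue: DialogueMemory = field(default_factory=DialogueMemory)
    turn: TurnMemory = field(default_factory=TurnMemory)

    def start_turn(self) -> None:
        # Turn-level state is replaced wholesale; dialogue-level state is untouched.
        self.turn = TurnMemory()
```

With this layout, anything written to `wm.turn` disappears at the next `start_turn()`, while `wm.dialogue` keeps growing until something prunes it.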

The granularity distinction matters because it predicts which failure mode each component is prone to. Dialogue-level components grow monotonically and trigger context-length pressure; they need pruning policies. Turn-level components risk recall failure if the retrieval index is stale or the retrieval signal is weak; they need refresh policies. Treating all working memory as one buffer makes both problems invisible. RAISE makes them addressable as separate concerns.
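As a rough illustration of how the two concerns separate, the sketch below pairs a simple pruning policy for dialogue-level history with a simple usability check for turn-level retrieval. Both functions are hypothetical and not drawn from RAISE: the word-count budget stands in for a real token budget, and the similarity floor `min_score` is an assumed parameter.

```python
def prune_history(history: list[str], max_words: int) -> list[str]:
    """Keep the most recent messages whose combined word count fits the budget.

    Word count is a crude stand-in for a tokenizer-based budget (assumption).
    """
    kept: list[str] = []
    used = 0
    for message in reversed(history):
        cost = len(message.split())
        if used + cost > max_words:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def examples_are_usable(scores: list[float], min_score: float) -> bool:
    """Treat a retrieval as failed when no example clears the similarity floor."""
    return any(score >= min_score for score in scores)
```

The point of keeping these as two functions is that each targets one granularity: the first bounds growth, the second flags a weak retrieval signal so the caller can refresh or fall back.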

The update protocol shows the granularity in action. On each turn: (1) append the user query to conversation history (dialogue-level append), (2) recall top-k relevant examples from a separate example pool via vector retrieval (turn-level replace), (3) update current entity information in the scratchpad if applicable (dialogue-level update), (4) update agent trajectory and tool results in task memory during execution (turn-level append-within-turn). Different components, different update rules, different lifecycles — all coordinated by the controller.
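The four steps above can be sketched as a small controller. This is a minimal sketch under stated assumptions, not the RAISE implementation: the `Controller` class and its method names are hypothetical, a bag-of-words cosine similarity stands in for a real embedding model and vector index, and clearing the trajectory at the start of each turn is an assumption that follows from treating it as turn-level.

```python
import math
from collections import Counter
from typing import Optional


def _embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class Controller:
    """Hypothetical controller coordinating the per-turn memory updates."""

    def __init__(self, example_pool: list[tuple[str, str]], top_k: int = 2):
        self.example_pool = example_pool            # separate pool of (query, response)
        self.top_k = top_k
        self.conversation_history: list[str] = []   # dialogue-level
        self.scratchpad: dict[str, str] = {}        # dialogue-level
        self.examples: list[tuple[str, str]] = []   # turn-level
        self.task_trajectory: list[str] = []        # turn-level

    def begin_turn(self, query: str, entities: Optional[dict[str, str]] = None) -> None:
        # (1) Dialogue-level append: the query joins the running history.
        self.conversation_history.append(query)
        # (2) Turn-level replace: recall the top-k examples for this query only.
        q = _embed(query)
        ranked = sorted(
            self.example_pool,
            key=lambda ex: _cosine(q, _embed(ex[0])),
            reverse=True,
        )
        self.examples = ranked[: self.top_k]
        # (3) Dialogue-level update: merge entity information if any was supplied.
        if entities:
            self.scratchpad.update(entities)
        # Assumption: the trajectory starts empty each turn.
        self.task_trajectory = []

    def record_step(self, step: str) -> None:
        # (4) Turn-level append within the turn: actions and tool results.
        self.task_trajectory.append(step)
```

A caller would invoke `begin_turn` once per user message and `record_step` for each action or tool result during execution; after several turns, the history and scratchpad reflect the whole conversation while the examples and trajectory reflect only the latest query.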

The implication for agent design: the question "where does this go in memory?" decomposes into two sub-questions — what is its temporal scope, and what is its update policy? Architectures that conflate these end up with either bloated dialogue buffers (everything is dialogue-level append) or lossy turn-level memory (everything is replaced each turn).


Paper: From LLM to Conversational Agent: RAISE

Original note title

agent working memory decomposes into four components at two granularities — dialogue-level history and scratchpad versus turn-level examples and task trajectory