Should agents update memory after every turn or batch process sessions?
This explores whether agents should fold new information into memory continuously (every turn) or accumulate raw interactions and consolidate them in batches — and the corpus suggests the framing is a false binary.
The most useful thing the corpus has to say is that cadence is the wrong axis to optimize on. The real split isn't now vs. later; it's *what kind* of update you're doing and *what triggers it*. One paper maps memory management into two distinct paths: an explicit "hot path" where the agent itself decides to write something mid-task via a tool call, and an implicit background path that fires on programmatic triggers (see "How should agents decide what memories to keep?"). Per-turn updating and batch consolidation aren't competitors. They map onto these two paths, and a well-built agent runs both.
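To make the two paths concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the cited paper's implementation: `MemoryStore`, `remember`, `on_turn_end`, and the `should_consolidate` callback are all hypothetical names, and the "digest" string stands in for whatever summarizer a real system would call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MemoryStore:
    """Toy store with two write paths. All names are hypothetical."""
    entries: list[str] = field(default_factory=list)   # durable memory
    raw_log: list[str] = field(default_factory=list)   # unconsolidated turns

    def remember(self, note: str) -> None:
        """Hot path: exposed to the agent as a tool it may call mid-task."""
        self.entries.append(note)

    def on_turn_end(
        self,
        turn_text: str,
        should_consolidate: Callable[[list[str]], bool],
    ) -> None:
        """Background path: invoked by the runtime, never by the agent."""
        self.raw_log.append(turn_text)
        if should_consolidate(self.raw_log):
            # Placeholder for a real consolidation step (assumption).
            self.entries.append(f"session digest of {len(self.raw_log)} turns")
            self.raw_log.clear()


store = MemoryStore()
store.remember("user prefers metric units")          # agent-initiated write
for turn in ["turn a", "turn b", "turn c"]:
    # Runtime-initiated; here the trigger is a simple log-length check.
    store.on_turn_end(turn, lambda log: len(log) >= 3)
```

The point of the split is that the trigger for the background path is a parameter. A turn counter is only one choice, and later sections argue it is usually the wrong one.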
Where the corpus gets sharp is on the danger of doing continuous consolidation naively. One study found that LLM-summarized textual memory follows an inverted-U: it helps at first, then actively degrades as experience piles up, eventually performing *worse* than just keeping raw episodes. A model failed 54% of problems it had previously solved after consolidation, via misgrouping, stripping away applicability conditions, and overfitting to narrow streams (see "Does agent memory degrade when continuously consolidated?"). So "summarize after every turn" is a real trap. But continuous updating isn't doomed; it depends on what you continuously update. FluxMem shows that constantly creating and pruning *links* between memory units based on closed-loop execution feedback reaches state of the art (see "Should agent memory adapt dynamically based on execution feedback?"), because what makes memory useful is connectivity, not the freshness of summaries. Storage is inert, and topology decides whether the right memory is reachable at decision time (see "Is agent memory a storage problem or a connectivity problem?").
Notice the apparent contradiction: continuous consolidation degrades, continuous link-rewiring wins. The two results reconcile because they operate on different things. Compressing content every turn loses the conditions that made each memory applicable; re-wiring connections every turn, guided by whether the agent actually succeeded, sharpens retrieval. The lesson isn't "batch vs. live". It's that the *trigger* should be execution feedback, not the clock.
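A rough sketch of what feedback-triggered link rewiring could look like follows. This is not FluxMem's actual algorithm, which the notes here do not specify; `LinkGraph`, the additive `step`, and the `prune_below` threshold are assumptions chosen only to show the shape of the idea: memory contents are never rewritten, only the links between them change, and only in response to task outcomes.

```python
from collections import defaultdict
from itertools import combinations


class LinkGraph:
    """Undirected weighted links between memory IDs (hypothetical design)."""

    def __init__(self, step: float = 0.2, prune_below: float = 0.05):
        self.weights: dict[frozenset, float] = defaultdict(float)
        self.step = step                  # assumed reinforcement size
        self.prune_below = prune_below    # assumed pruning threshold

    def update(self, used_ids: list[str], succeeded: bool) -> None:
        """Strengthen links among co-used memories on success, weaken on failure."""
        delta = self.step if succeeded else -self.step
        for a, b in combinations(sorted(set(used_ids)), 2):
            key = frozenset((a, b))
            new_weight = self.weights[key] + delta
            if new_weight < self.prune_below:
                self.weights.pop(key, None)   # prune weak or negative links
            else:
                self.weights[key] = new_weight

    def neighbors(self, memory_id: str) -> list[str]:
        """Memories linked to memory_id, strongest link first."""
        linked = [
            (weight, next(iter(pair - {memory_id})))
            for pair, weight in self.weights.items()
            if memory_id in pair
        ]
        return [other for _, other in sorted(linked, key=lambda item: -item[0])]


graph = LinkGraph()
graph.update(["m1", "m2", "m3"], succeeded=True)   # all three pairs linked
graph.update(["m1", "m2"], succeeded=True)         # m1-m2 reinforced
graph.update(["m1", "m3"], succeeded=False)        # m1-m3 weakened and pruned
```

Nothing in `update` depends on the turn count or wall-clock time. The only input that changes the topology is whether the task that used those memories succeeded.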
The cadence question also dissolves once you stop treating memory as one thing. RAISE decomposes working memory into four components across two time scales, dialogue-level (conversation history, scratchpad) and turn-level (examples, task trajectory), and argues each demands its own update policy and has its own failure modes (see "How should agent memory split across time scales?"). A scratchpad wants per-turn writes; a distilled skill or episodic schema wants deliberate, less frequent consolidation. Done well, batch-style consolidation is powerful: DeepAgent's autonomous memory folding compresses interaction history into episodic, working, and tool schemas on its own initiative, cutting token overhead while letting the agent pause to rethink strategy (see "Can agents compress their own memory without losing critical details?"). The key word is *autonomous*: the agent chooses when, rather than following a fixed every-N-turns rule.
Underneath all of it is the reframe worth taking away: the bottleneck is quality, not capacity or timing. The hard problem in agent memory is preventing staleness, drift, contamination, and over-generalization; adding more without curation, whether more frequent writes or bigger batches, makes performance worse (see "Is agent memory capacity or quality the real bottleneck?"). So the answer to "every turn or batch?" is threefold: write cheap, reversible scratch continuously; consolidate durable knowledge deliberately and only when feedback says it earned its place; and let the trigger be the agent's own judgment plus execution signal, not a schedule.
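That closing policy can be sketched as a two-tier store. The code below is an assumption-laden illustration, not a method from any of the cited sources: `TieredMemory`, `promote_after`, and the rule that a failure resets the success count are all hypothetical choices made to show scratch writes that are cheap and reversible, with promotion to durable memory gated on execution feedback.

```python
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    """Scratch tier plus durable tier. All names are hypothetical."""
    promote_after: int = 2   # assumed number of consecutive successes required
    scratch: list[str] = field(default_factory=list)
    durable: list[str] = field(default_factory=list)
    _successes: dict[str, int] = field(default_factory=dict)

    def write_scratch(self, note: str) -> None:
        """Cheap per-turn write; nothing is compressed or merged."""
        self.scratch.append(note)

    def discard_scratch(self, note: str) -> None:
        """Scratch is reversible: a note can be dropped without side effects."""
        if note in self.scratch:
            self.scratch.remove(note)
        self._successes.pop(note, None)

    def record_outcome(self, note: str, succeeded: bool) -> None:
        """Promote a scratch note only after it has repeatedly helped."""
        if note not in self.scratch:
            return
        if not succeeded:
            self._successes[note] = 0   # a failure resets the streak (assumption)
            return
        self._successes[note] = self._successes.get(note, 0) + 1
        if self._successes[note] >= self.promote_after:
            self.scratch.remove(note)
            self.durable.append(note)
            del self._successes[note]


mem = TieredMemory()
mem.write_scratch("retry with smaller batch")
mem.write_scratch("API v1 is fine")
mem.record_outcome("retry with smaller batch", succeeded=True)
mem.record_outcome("retry with smaller batch", succeeded=True)   # promoted here
mem.record_outcome("API v1 is fine", succeeded=False)            # stays in scratch
```

No schedule appears anywhere in this sketch. A note moves to the durable tier only because outcomes said it earned its place, which is the behavior the paragraph above argues for.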
Sources (7 notes)
Memory management decomposes into explicit hot-path (agent decides via tool calling) and implicit background (programmatically triggered) paths. Each approach trades context-sensitivity for reliability differently across generation, storage, retrieval, and deletion.
LLM-consolidated textual memory degrades as experience accumulates, eventually performing worse than episodic-only retention. GPT-5.4 failed 54% of previously-solved problems after consolidation, with three mechanisms identified: misgrouping, applicability stripping, and overfitting on narrow streams.
FluxMem demonstrates that adaptive memory topology—where links form, refine, and consolidate based on closed-loop execution feedback—consistently reaches state-of-the-art across three distinct benchmarks. Dynamic connectivity outperforms fixed retrieval by aligning abstraction and eliminating interference.
FluxMem shows that memory usefulness is determined by links between co-activated units forming an accessible subgraph, not by what is stored. Storage is necessary but inert; topology determines whether useful memories are reachable at decision time.
RAISE shows that agent memory consists of four components organized by two design axes: dialogue-level (conversation history, scratchpad) versus turn-level (examples, task trajectory). This granularity distinction predicts different failure modes and update policies for each component.
DeepAgent's autonomous memory folding consolidates interaction history into episodic, working, and tool memory schemas. This reduces token overhead while letting agents pause to reconsider strategies—the autonomy and structure together avoid degradation that plagues poorly designed consolidation.
The core challenge in agent memory is not accumulating more data but managing what exists—preventing staleness, drift, contamination, and over-generalization. Adding capacity without curation actively makes performance worse.