Is agent memory capacity or quality the real bottleneck?
More storage seems like the obvious fix for agent memory problems. But the real constraint may be curation: deciding what to keep, what to discard, and what to retrieve without degrading performance.
The intuitive picture of agent memory is a storage problem: give the agent more room and it remembers more. The system-scaling analysis rejects this. "Memory is not merely a storage layer; the harder problem is memory quality" — what to store, what to discard, how to retrieve the right information at the right time, and how to avoid staleness, drift, contamination, and over-generalization. Adding capacity without curation makes things worse: more stored material means more stale entries to retrieve, more opportunities for contaminated content to surface, and more room for over-generalized lessons to misfire on cases they do not fit.
This reframes memory engineering as a discarding and curation problem, not an accumulation one. The failure modes are specific and diagnosable — staleness (kept too long), drift (slowly diverging from ground truth), contamination (bad entries poisoning retrieval), over-generalization (a narrow lesson applied too broadly) — and each calls for different hygiene. The open question, which the paper leaves unresolved, is what discarding policy avoids all four without throwing away genuinely useful long-tail knowledge. The counterpoint is that aggressive forgetting risks losing rare-but-critical information, so quality is a trade-off, not a free win. This matters because it redirects effort from bigger memory stores toward better forgetting — the part of memory design that is hardest and least solved.
— "From Model Scaling to System Scaling: Scaling the Harness in Agentic AI", https://arxiv.org/abs/2605.26112
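The four failure modes named above can be sketched as per-entry hygiene checks. The following is a minimal illustration, not taken from the cited paper: the `MemoryEntry` fields, the two thresholds, and the action names (`retrieve`, `skip`, `reverify`, `discard`) are all hypothetical choices made for this sketch. It also shows one possible way to respect the trade-off with long-tail knowledge, by routing stale entries marked `critical` to re-verification instead of deletion.

```python
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    text: str
    age_days: float                 # time since the entry was written
    days_since_verified: float      # time since it was last checked against ground truth
    trusted_source: bool            # False if provenance is unknown or suspect
    scope: frozenset = frozenset()  # task tags the lesson was actually learned on
    critical: bool = False          # rare-but-important knowledge to protect from pruning


# Hypothetical thresholds; real values would need tuning per deployment.
MAX_AGE_DAYS = 90
MAX_UNVERIFIED_DAYS = 30


def diagnose(entry: MemoryEntry, query_tags: frozenset) -> list[str]:
    """Return the failure modes an entry is at risk of for a given query."""
    issues = []
    if entry.age_days > MAX_AGE_DAYS:
        issues.append("staleness")
    if entry.days_since_verified > MAX_UNVERIFIED_DAYS:
        issues.append("drift")
    if not entry.trusted_source:
        issues.append("contamination")
    # A scoped lesson applied to an unrelated query is treated as over-generalization.
    if entry.scope and not (entry.scope & query_tags):
        issues.append("over-generalization")
    return issues


def decide(entry: MemoryEntry, query_tags: frozenset) -> str:
    """Map the diagnosed issues to one action (assumed policy, one of many possible)."""
    issues = diagnose(entry, query_tags)
    if "contamination" in issues:
        return "discard"
    if "staleness" in issues:
        # Protect long-tail knowledge: re-check it rather than throw it away.
        return "reverify" if entry.critical else "discard"
    if "drift" in issues:
        return "reverify"
    if "over-generalization" in issues:
        return "skip"  # keep the entry, but do not surface it for this query
    return "retrieve"
```

Each failure mode maps to a different action here, which mirrors the point that each calls for different hygiene. Whether any fixed rule set of this kind avoids all four without losing useful long-tail entries is exactly the question the note describes as open.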
Related concepts in this collection
- How should agents decide what memories to keep?
Agent memory management splits into two approaches: agents autonomously recognizing important information, and programmatic triggers. Understanding this choice reveals why different memory architectures prioritize different information types.
describes the management machinery through which memory-quality decisions get made
- Does including all conversation history actually help retrieval?
Conversational search systems typically use all previous context to understand current queries. But do topic switches in multi-turn conversations inject noise that degrades performance rather than helps it?
concrete instance where more stored context degrades quality, supporting discard-over-accumulate
Original note title
the real memory problem is quality not storage — what to discard and how to avoid drift contamination and over-generalization