Could a single agent system switch memory granularity between tasks?
This explores whether one agent could shift the *shape* of its memory — coarse workflow recipes for one task, fine-grained step-by-step state for another — instead of being locked into a single fixed format, and what the corpus says about whether that's even desirable.
The corpus suggests not only that a single agent could change *how* it remembers depending on the task in front of it, but that it probably should. The strongest case comes from work showing that memory granularity isn't a one-size-fits-all setting: "Does agent memory work better at one level of abstraction?" finds that the best abstraction is *domain-conditional*. Workflow-level memory wins in routine-heavy tasks, causal-rule memory wins where the environment is the source of difficulty, and fine-grained state-action memory wins in web tasks where the UI details matter. If the optimal granularity changes with the task, then an agent locked to a single level leaves performance on the table on every task that doesn't match its default.
What makes switching plausible rather than aspirational is that several systems already maintain *multiple* memory granularities at once. "How should agent memory split across time scales?" shows that agent working memory naturally splits across two time scales, dialogue-level (the running conversation, a scratchpad) and turn-level (examples, the current task trajectory), each with its own update rules and failure modes. "Can agents compress their own memory without losing critical details?" goes further: DeepAgent folds raw history into distinct episodic, working, and tool schemas. Once an agent holds several representations side by side, "switching granularity between tasks" becomes a routing decision (which store to consult and update) rather than a rebuild.
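The routing idea above can be sketched in a few lines. This is a minimal illustration, not any cited system's implementation: the `Granularity` enum, the `MemoryRouter` class, the difficulty labels, and the choice to fall back to the finest store are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Granularity(Enum):
    WORKFLOW = "workflow"          # coarse recipes for routine-heavy tasks
    CAUSAL_RULE = "causal_rule"    # rules about how the environment behaves
    STATE_ACTION = "state_action"  # fine-grained step-by-step state


# Hypothetical mapping from where a task's difficulty comes from to the
# granularity the notes say tends to win there.
DIFFICULTY_TO_GRANULARITY = {
    "arguments": Granularity.WORKFLOW,
    "causal_structure": Granularity.CAUSAL_RULE,
    "ui_state": Granularity.STATE_ACTION,
}


@dataclass
class MemoryRouter:
    """Holds one store per granularity and routes reads and writes."""

    stores: dict = field(default_factory=lambda: {g: [] for g in Granularity})

    def route(self, difficulty_source: str) -> Granularity:
        # Assumption: unknown sources fall back to the finest store, on the
        # view that detail can be coarsened later but not recovered.
        return DIFFICULTY_TO_GRANULARITY.get(
            difficulty_source, Granularity.STATE_ACTION
        )

    def write(self, difficulty_source: str, entry: str) -> Granularity:
        granularity = self.route(difficulty_source)
        self.stores[granularity].append(entry)
        return granularity

    def read(self, difficulty_source: str) -> list:
        # Return a copy so callers cannot mutate the store by accident.
        return list(self.stores[self.route(difficulty_source)])
```

Switching granularity here never rebuilds anything: the same `MemoryRouter` instance serves every task, and only the store selected by `route` changes.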
The missing piece is the *decision* about when to switch, and the corpus addresses that too. "How should agents decide what memories to keep?" separates an explicit hot path (the agent decides what to keep via tool calls, which is context-sensitive) from an implicit background path (programmatic triggers, which are reliable but blind to context). A granularity-switching agent is essentially using the hot path to pick its abstraction per task. And "Should agent memory adapt dynamically based on execution feedback?" (FluxMem) shows that letting the memory's *topology* adapt from execution feedback, by forming, refining, and consolidating links as results come back, beats fixed retrieval precisely because it "aligns abstraction" to what the work needs. That's granularity-switching by another name, driven by outcomes rather than declared up front.
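To make "driven by outcomes rather than declared up front" concrete, here is a deliberately simple sketch that picks a granularity per task type from recorded successes and failures. It is a plain success-rate tally standing in for the idea, not FluxMem's method; the class name `FeedbackGranularityPicker` and its methods are hypothetical.

```python
from collections import defaultdict


class FeedbackGranularityPicker:
    """Pick a memory granularity per task type from observed outcomes.

    Illustrative only: a success-rate tally, not FluxMem's algorithm.
    """

    def __init__(self, granularities, default):
        if default not in granularities:
            raise ValueError("default must be one of the granularities")
        self.granularities = list(granularities)
        self.default = default
        # (task_type, granularity) -> [successes, attempts]
        self.stats = defaultdict(lambda: [0, 0])

    def record(self, task_type, granularity, success):
        entry = self.stats[(task_type, granularity)]
        entry[0] += int(success)
        entry[1] += 1

    def pick(self, task_type):
        # With no evidence for this task type, use the declared default.
        best, best_rate = self.default, -1.0
        for g in self.granularities:
            successes, attempts = self.stats.get((task_type, g), (0, 0))
            if attempts == 0:
                continue
            rate = successes / attempts
            if rate > best_rate:  # ties keep the earlier granularity
                best, best_rate = g, rate
        return best
```

A real system would also need some exploration so that an untried granularity gets a chance; the sketch omits that to keep the feedback loop itself visible.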
There's a cautionary thread worth knowing before you assume more flexibility is always better. "Does agent memory degrade when continuously consolidated?" shows that aggressive consolidation can backfire: an agent re-coarsening its memory failed 54% of the problems it had previously solved, through misgrouping and through stripping away the conditions that made a memory applicable. So switching granularity isn't free: collapse fine detail too eagerly and you lose the very specifics a later task needs. The skill is knowing which axis the current task's difficulty lives on (arguments, causal structure, or fine-grained state), which is exactly the diagnostic "Does agent memory work better at one level of abstraction?" offers.
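One way to guard against that failure is to treat a coarsened memory as a candidate and replay previously solved problems against it before committing. The sketch below is an assumption about how such a check could look, not something the cited note prescribes; `accept_consolidation`, `solves_with_candidate`, and `max_regression_rate` are hypothetical names.

```python
from typing import Callable, Iterable


def accept_consolidation(
    solved_problems: Iterable[str],
    solves_with_candidate: Callable[[str], bool],
    max_regression_rate: float = 0.0,
) -> bool:
    """Return True only if the candidate memory keeps old wins intact.

    solved_problems: problems the agent solved with its current memory.
    solves_with_candidate: hypothetical callback that re-runs one problem
        using the consolidated (coarser) memory and reports success.
    max_regression_rate: largest tolerated fraction of regressions.
    """
    problems = list(solved_problems)
    if not problems:
        return True  # nothing to regress against
    failures = sum(1 for p in problems if not solves_with_candidate(p))
    return failures / len(problems) <= max_regression_rate
```

If the check fails, the agent keeps the fine-grained episodic entries and discards the candidate, so re-coarsening can only happen when it demonstrably preserves earlier solutions.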
If you want the broader bet underneath all of this: "Where does agent reliability actually come from?" argues that reliability comes from pushing memory, skills, and protocols out of the model and into a harness layer. A harness that holds several memory formats and routes between them is the natural home for granularity-switching. "Can recursive subtask trees overcome context window limits?" hints at why a *single* agent is enough to do it: it shows that one model with structured, prunable internal memory can absorb work that used to require a whole multi-agent crew.
Sources (8 notes)
Workflow-level memory wins in routine-rich domains, causal-rule memory in environment-rich domains, and state-action memory in spatially-rich web tasks. The optimal abstraction depends on whether task variance comes from arguments, causal structure, or fine-grained UI state.
RAISE shows that agent memory consists of four components organized at two levels: dialogue-level (conversation history, scratchpad) and turn-level (examples, task trajectory). This granularity distinction predicts different failure modes and update policies for each component.
DeepAgent's autonomous memory folding consolidates interaction history into episodic, working, and tool memory schemas. This reduces token overhead while letting agents pause to reconsider strategies—the autonomy and structure together avoid degradation that plagues poorly designed consolidation.
Memory management decomposes into explicit hot-path (agent decides via tool calling) and implicit background (programmatically triggered) paths. Each approach trades context-sensitivity for reliability differently across generation, storage, retrieval, and deletion.
FluxMem demonstrates that adaptive memory topology—where links form, refine, and consolidate based on closed-loop execution feedback—consistently reaches state-of-the-art across three distinct benchmarks. Dynamic connectivity outperforms fixed retrieval by aligning abstraction and eliminating interference.
LLM-consolidated textual memory degrades as experience accumulates, eventually performing worse than episodic-only retention. GPT-5.4 failed 54% of previously-solved problems after consolidation, with three mechanisms identified: misgrouping, applicability stripping, and overfitting on narrow streams.
Research shows reliable LLM agents externalize three cognitive burdens—memory (state persistence), skills (procedural components), and protocols (structured interaction)—into a harness layer rather than relying on model scale alone. The harness unifies these externalities and eliminates the need for the model to solve the same problems repeatedly.
The Thread Inference Model demonstrates that reasoning structured as recursive subtask trees with rule-based KV cache pruning sustains accurate reasoning beyond context limits, even when manipulating 90% of the cache. This enables single models to replace multi-agent systems by handling full recursive reasoning internally.