Rethinking Memory as Continuously Evolving Connectivity
Existing memory-augmented LLM agents often treat memory as a static repository with pre-defined representations and fixed retrieval pipelines. This design is brittle in dynamic agentic environments, where feedback, task variation, and heterogeneous signals continuously reshape what should be remembered and how it should be connected. To address this, we propose FluxMem, a connectivity-evolving memory framework that models memory as a heterogeneous graph and progressively refines its topology through three stages: initial connection formation, feedback-driven refinement, and long-term consolidation. During execution, FluxMem repairs missing links, prunes interference, aligns abstraction granularity, and distills recurrent successful trajectories into reusable procedural circuits, guided by a single metric of memory generalizability and evolutionary maturity. Across three fundamentally distinct benchmarks (LoCoMo, Mind2Web, and GAIA), FluxMem achieves consistent state-of-the-art performance, demonstrating strong adaptation and generalization in complex agentic environments. The code will be open-sourced in the near future.
For long-horizon agents, the memory mechanism plays a central role: it distills useful factual information, reusable experiences, and skills from the agent's past interaction trajectories, stores them in diverse memory forms, and retrieves relevant memories when similar tasks arise, supporting downstream problem solving and agent evolution. Memory effectiveness ultimately depends on whether the most useful memories can be accessed at each decision step, since sufficiently useful memory context substantially improves subtask success. We formalize this usefulness as a problem of memory connectivity. Drawing on cognitive science, we define memory as the long-term sedimentation of memory units and their connections, continuously shaped through environmental interaction. Mirroring human cognitive processes, this structural evolution operates on two levels. At the unit level, the brain generates new units for novel information and continuously reshapes existing units by modifying their internal content. At the connection level, operations are strictly task-centric: the system establishes links between co-activated units to form functional associations and prunes links that prove irrelevant, maintaining an efficient associative network.
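The two levels of structural evolution described above can be illustrated with a minimal sketch. It is not FluxMem's implementation: the class name `MemoryGraph`, the scalar link weights, and the `gain`, `penalty`, and `min_weight` parameters are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class MemoryGraph:
    """Toy memory: units plus undirected weighted links (hypothetical design)."""
    units: dict[str, str] = field(default_factory=dict)         # unit id -> content
    links: dict[frozenset, float] = field(default_factory=dict)  # {a, b} -> weight

    def upsert_unit(self, unit_id: str, content: str) -> None:
        # Unit level: create a new unit, or reshape an existing one
        # by overwriting its internal content.
        self.units[unit_id] = content

    def co_activate(self, unit_ids: list[str], gain: float = 1.0) -> None:
        # Connection level: strengthen (or create) a link between every
        # pair of units that were active together in the same task.
        unknown = set(unit_ids) - self.units.keys()
        if unknown:
            raise KeyError(f"unknown units: {sorted(unknown)}")
        for a, b in combinations(sorted(set(unit_ids)), 2):
            key = frozenset((a, b))
            self.links[key] = self.links.get(key, 0.0) + gain

    def penalize(self, a: str, b: str, penalty: float = 1.0) -> None:
        # Weaken a link that proved irrelevant for the current task.
        key = frozenset((a, b))
        if key in self.links:
            self.links[key] -= penalty

    def prune(self, min_weight: float = 0.5) -> int:
        # Remove links whose weight fell below the (assumed) threshold
        # and report how many were dropped.
        weak = [key for key, weight in self.links.items() if weight < min_weight]
        for key in weak:
            del self.links[key]
        return len(weak)
```

In this sketch, co-activating three units creates three pairwise links, and penalizing one of them before calling `prune()` removes only that link. A real system would need a principled signal for relevance; the scalar weight is only a stand-in for it.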
To address these challenges, we propose FluxMem, a connectivity-evolving framework that models memory as a dynamically editable heterogeneous graph across semantic, episodic, and procedural layers. Context is formalized as an activated subgraph refined through a three-stage evolutionary pipeline. (1) Initial Connection Formation rapidly establishes tentative cross-layer associations for novel tasks. (2) Feedback-Driven Refinement employs a closed-loop mechanism to iteratively edit subgraph topology, creating missing links, pruning interference, or conditionally bypassing memory until execution succeeds. (3) Long-Term Consolidation clusters successful trajectories to induce stable procedural circuits, monitored by a convergence maturity metric. As high-utility pathways crystallize, recurring tasks bypass redundant retrieval and directly activate mature subgraphs. This pipeline transforms static memory storage into a self-optimizing connectivity substrate that continuously adapts to evolving task demands.
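The control flow of the three stages can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' method: keyword overlap stands in for initial connection formation, a caller-supplied `execute` callback returning a hypothetical `Feedback` object stands in for environment feedback, and an exact-match success counter with `maturity_threshold` stands in for trajectory clustering and the convergence maturity metric. All names are invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Feedback:
    """Hypothetical execution feedback for one activated subgraph."""
    success: bool
    missing: set[str] = field(default_factory=set)      # units that should be linked in
    interfering: set[str] = field(default_factory=set)  # units that should be pruned


class ToyFluxMemory:
    LAYERS = {"semantic", "episodic", "procedural"}

    def __init__(self, maturity_threshold: int = 2):
        self.units: dict[str, tuple[str, set[str]]] = {}  # id -> (layer, keywords)
        self.success_counts: dict[tuple[str, frozenset[str]], int] = {}
        self.circuits: dict[str, frozenset[str]] = {}     # task type -> mature subgraph
        self.maturity_threshold = maturity_threshold

    def add_unit(self, unit_id: str, layer: str, keywords: set[str]) -> None:
        if layer not in self.LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        self.units[unit_id] = (layer, set(keywords))

    def initial_subgraph(self, task_keywords: set[str]) -> set[str]:
        # Stage 1: tentatively activate units from any layer that share
        # a keyword with the task (assumed matching rule).
        return {uid for uid, (_, kws) in self.units.items() if kws & task_keywords}

    def refine(self, subgraph: set[str], execute: Callable[[set[str]], Feedback],
               max_rounds: int = 3) -> tuple[set[str], bool]:
        # Stage 2: closed loop. Execute, then add missing units and prune
        # interfering ones; give up after max_rounds executions.
        active = set(subgraph)
        for _ in range(max_rounds):
            feedback = execute(active)
            if feedback.success:
                return active, True
            known_missing = {u for u in feedback.missing if u in self.units}
            active = (active | known_missing) - feedback.interfering
        return active, False

    def consolidate(self, task_type: str, subgraph: set[str]) -> None:
        # Stage 3: count identical successful subgraphs; once the count
        # reaches the threshold, store the subgraph as a mature circuit.
        key = (task_type, frozenset(subgraph))
        self.success_counts[key] = self.success_counts.get(key, 0) + 1
        if self.success_counts[key] >= self.maturity_threshold:
            self.circuits[task_type] = key[1]

    def solve(self, task_type: str, task_keywords: set[str],
              execute: Callable[[set[str]], Feedback]) -> tuple[set[str], bool]:
        # Recurring tasks try the mature circuit first and skip retrieval.
        if task_type in self.circuits:
            circuit = set(self.circuits[task_type])
            if execute(circuit).success:
                return circuit, True
        active, ok = self.refine(self.initial_subgraph(task_keywords), execute)
        if ok:
            self.consolidate(task_type, active)
        return active, False
```

`solve` returns the subgraph it ended with and a flag indicating whether a mature circuit was used directly. With the default threshold, a task type that succeeds twice with the same subgraph is served from its circuit on the third call, which mirrors the idea that crystallized pathways bypass redundant retrieval.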
We introduced FluxMem, an evolutionary framework that conceptualizes agent memory as dynamic connectivity. Through a three-stage evolution, FluxMem enables autonomous memory adaptation. Its state-of-the-art results across LoCoMo, Mind2Web, and GAIA provide a principled foundation for self-evolving agents in dynamic environments. Despite the demonstrated effectiveness, several limitations in our experimental design warrant acknowledgment: the computational overhead of closed-loop operations, since Stages II and III rely on iterative LLM calls for context verification, topological editing, and skill induction; static benchmark protocols that do not fully simulate continuous, open-world distribution shifts or streaming environments; and hyperparameter sensitivity across several control thresholds. Future work should systematically evaluate the robustness of these parameters under varying computational budgets.