Toward Efficient Agents: A Survey of Memory, Tool Learning, and Planning
Recent years have witnessed increasing interest in extending large language models into agentic systems. While the effectiveness of agents has continued to improve, efficiency, which is crucial for real-world deployment, has often been overlooked. This paper therefore investigates efficiency across three core components of agents (memory, tool learning, and planning), considering costs such as latency, token usage, and number of steps. To cover the efficiency of the agentic system itself comprehensively, we review a broad range of recent approaches that differ in implementation yet frequently converge on shared high-level principles, including but not limited to bounding context via compression and management, designing reinforcement learning rewards to minimize tool invocation, and employing controlled search mechanisms to enhance efficiency. We characterize efficiency in two complementary ways: comparing effectiveness under a fixed cost budget, and comparing cost at a comparable level of effectiveness. This trade-off can also be viewed through the Pareto frontier between effectiveness and cost.
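The two views of efficiency and the Pareto frontier can be made concrete with a minimal sketch. The `Run` record, the method labels, and all numbers below are hypothetical illustrations, not results from any surveyed paper; cost stands in for whichever measure (tokens, latency, steps) is being reported.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Run:
    name: str             # hypothetical method label
    cost: float           # e.g. total tokens, seconds, or steps (lower is better)
    effectiveness: float  # e.g. task success rate (higher is better)


def best_under_budget(runs: List[Run], budget: float) -> Optional[Run]:
    """View 1: highest effectiveness among runs whose cost fits the budget."""
    feasible = [r for r in runs if r.cost <= budget]
    return max(feasible, key=lambda r: r.effectiveness, default=None)


def cheapest_at_level(runs: List[Run], target: float) -> Optional[Run]:
    """View 2: lowest cost among runs that reach the target effectiveness."""
    good = [r for r in runs if r.effectiveness >= target]
    return min(good, key=lambda r: r.cost, default=None)


def pareto_frontier(runs: List[Run]) -> List[Run]:
    """Runs for which no other run is both at most as costly and at least as effective.

    Exact duplicates are collapsed to a single entry.
    """
    frontier: List[Run] = []
    best_seen = float("-inf")
    # Scan from cheapest to most expensive; on cost ties, the better run comes first.
    for r in sorted(runs, key=lambda r: (r.cost, -r.effectiveness)):
        if r.effectiveness > best_seen:
            frontier.append(r)
            best_seen = r.effectiveness
    return frontier


# Made-up numbers purely for illustration.
runs = [
    Run("A", 1000, 0.60),
    Run("B", 2500, 0.72),
    Run("C", 3000, 0.70),
    Run("D", 6000, 0.75),
]
print(best_under_budget(runs, budget=3000).name)   # B
print(cheapest_at_level(runs, target=0.70).name)   # B
print([r.name for r in pareto_frontier(runs)])     # ['A', 'B', 'D']
```

In this toy data, method C is dominated by B (more costly and less effective), so it drops off the frontier, while A and D remain because each is the best option at its own cost level.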
An efficient agent is not a smaller model, but an agentic system optimized to maximize task success rates while minimizing resource consumption, including token usage, inference latency, and computational cost across memory, tool usage, and planning modules. Unlike a standard LLM that typically operates in a linear, single-turn query-response format, an agent can consume far more resources because it runs recursively: the output of each step is fed back as input to the next, so both the context and the number of model calls grow with task length. To automate intricate real-world tasks, agents must perform extensive memory management, iterative tool usage, and complex planning over multiple steps. This multi-step execution leads to prohibitive latency, context window saturation, and excessive token consumption, raising profound concerns regarding the long-term sustainability and equitable accessibility of these increasingly capable systems.
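A toy cost model gives a feel for how multi-step execution inflates token consumption. The sketch below assumes the simplest possible agent loop, in which every step re-reads the initial prompt plus everything appended so far; it ignores output tokens, prompt caching, and any compression, and the function name and numbers are hypothetical.

```python
def cumulative_input_tokens(steps: int, prompt_tokens: int, tokens_per_step: int) -> int:
    """Total input tokens when each step re-reads the prompt plus all earlier step outputs."""
    total = 0
    context = prompt_tokens
    for _ in range(steps):
        total += context             # the model reads the whole context at this step
        context += tokens_per_step   # this step's reasoning/tool result is appended
    return total


# Hypothetical sizes: a 1000-token prompt and 500 tokens appended per step.
print(cumulative_input_tokens(1, 1000, 500))    # 1000
print(cumulative_input_tokens(10, 1000, 500))   # 32500
print(cumulative_input_tokens(20, 1000, 500))   # 115000
```

Under these assumptions the total equals `steps * prompt_tokens + tokens_per_step * steps * (steps - 1) / 2`, so doubling the number of steps from 10 to 20 multiplies input tokens by roughly 3.5 rather than 2. This is why bounding the context matters more as trajectories get longer.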
Our survey categorizes existing work into three strategic directions: 1) Efficient Memory: Techniques for compressing historical context, managing memory storage, and optimizing context retrieval. 2) Efficient Tool Learning: Strategies to minimize the number of tool calls and reduce the latency of external interactions. 3) Efficient Planning: Strategies to reduce the number of execution steps and API calls required to solve a problem. Across all three areas, many seemingly different methods converge on shared high-level ideas, suggesting that efficiency gains are not component-specific accidents but reflect deeper structural constraints on agentic computation.
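As one illustration of the first direction, the sketch below bounds context by keeping only the most recent turns verbatim and folding older turns into a running summary. It is a minimal sketch under stated assumptions, not the method of any surveyed system: `BoundedMemory` and `truncate_summarizer` are hypothetical names, and the character-truncating summarizer is a stand-in for whatever compressor (for example, an LLM call) a real system would use.

```python
from collections import deque
from typing import Callable, Deque, List


def truncate_summarizer(summary: str, evicted: str, max_chars: int = 200) -> str:
    """Placeholder compressor: append the evicted turn and keep the last `max_chars` characters.

    A real system would likely call an LLM or a learned compressor here.
    """
    merged = (summary + " " + evicted).strip()
    return merged[-max_chars:]


class BoundedMemory:
    """Keeps the last `window` turns verbatim and folds older turns into one summary string."""

    def __init__(self, window: int,
                 summarize: Callable[[str, str], str] = truncate_summarizer) -> None:
        self.window = window
        self.summarize = summarize
        self.summary = ""
        self.recent: Deque[str] = deque()

    def add(self, turn: str) -> None:
        self.recent.append(turn)
        # Evict the oldest turns into the summary once the window is exceeded.
        while len(self.recent) > self.window:
            evicted = self.recent.popleft()
            self.summary = self.summarize(self.summary, evicted)

    def context(self) -> List[str]:
        """Context passed to the model: the summary (if any) followed by recent turns."""
        head = [f"[summary] {self.summary}"] if self.summary else []
        return head + list(self.recent)


mem = BoundedMemory(window=2)
for i in range(5):
    mem.add(f"turn {i}")
print(mem.context())
# ['[summary] turn 0 turn 1 turn 2', 'turn 3', 'turn 4']
```

The context size here is capped by the window plus the summary limit regardless of how many turns have occurred, at the price of losing detail from older turns; how much detail to keep is exactly the effectiveness-versus-cost trade-off the survey discusses.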
Key open challenges include the lack of a unified efficiency evaluation framework, which leaves existing efficiency numbers not directly comparable across papers, and the underexplored efficiency of agents based on multimodal LLMs (MLLMs), where maintaining visual history creates a trade-off between memory retention and inference speed that is far more severe than in LLM-based agents. We advocate efficiency-aware agent design and evaluation by jointly considering performance and cost, including latency, interaction steps, and tool-call overhead.