Does agent efficiency really break down into three distinct components?
Can we understand agent efficiency as three independent optimization problems—memory, tool use, and planning—each with separate cost drivers? This matters because it could explain why point optimizations keep missing the bigger picture.
The agent-efficiency literature has historically been a collection of point optimizations — a paper about better tool selection here, a paper about prompt compression there. The Toward Efficient Agents survey argues the field is better understood through a three-component decomposition that maps to where the costs actually accumulate.
Efficient Memory: techniques for compressing historical context, managing memory storage, and optimizing context retrieval. The cost driver here is context length. Long agent trajectories accumulate context that grows linearly with the number of steps, which pushes against token budgets and increases latency per turn. When the full history is re-sent on every turn, cumulative token spend grows roughly quadratically with the number of steps. Compression, summarization, structured episodic memory, and retrieval-on-demand all reduce this cost.
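The sketch below makes the linear-growth point concrete by comparing prompt sizes under two memory policies. The first re-sends the full history every turn. The second keeps a short verbatim window and replaces older steps with a fixed-size summary. The step sizes, window length, and summary size are illustrative assumptions. `prompt_sizes_windowed` is a hypothetical policy, not a technique taken from the survey.

```python
from collections import deque


def prompt_sizes_full_history(step_tokens: list[int]) -> list[int]:
    """Prompt size at each turn when the entire trajectory is re-sent every turn."""
    sizes, history = [], 0
    for tokens in step_tokens:
        history += tokens          # history grows by this step's new tokens
        sizes.append(history)
    return sizes


def prompt_sizes_windowed(step_tokens: list[int], window: int, summary_tokens: int) -> list[int]:
    """Prompt size at each turn under a hypothetical policy: keep the last `window`
    steps verbatim and replace everything older with a fixed-size summary."""
    recent = deque(maxlen=window)  # deque silently evicts the oldest step
    sizes = []
    for i, tokens in enumerate(step_tokens):
        recent.append(tokens)
        evicted = i >= window      # true once at least one step has fallen out
        sizes.append(sum(recent) + (summary_tokens if evicted else 0))
    return sizes


steps = [500] * 20                 # assumed: 20 steps, 500 new tokens each
full = prompt_sizes_full_history(steps)
windowed = prompt_sizes_windowed(steps, window=4, summary_tokens=300)
print(max(full), sum(full))          # 10000 105000
print(max(windowed), sum(windowed))  # 2300 41800
```

With full history, the per-turn prompt grows linearly and the cumulative total grows quadratically. The windowed policy caps each prompt, at the price of whatever detail the summary loses.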
Efficient Tool Learning: strategies to minimize the number of tool calls and reduce the latency of external interactions. The cost driver is external API latency. Each tool call is a round trip to a system the agent does not control, often with seconds of latency and subject to rate limits. Reducing the number of tool calls (caching, batching, smarter selection) therefore often cuts wall-clock time more than internal optimizations can.
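The sketch below illustrates caching, one tool-learning lever. It assumes the tool is deterministic and side-effect free for a given argument tuple. `CachedTool`, `lookup_weather`, and the fixed 1.5 s per-call latency are hypothetical. Latency is modeled as a constant per real call rather than measured.

```python
from collections.abc import Callable, Hashable
from typing import Any


class CachedTool:
    """Memoizing wrapper: identical arguments are answered from a local cache,
    so only cache misses pay the external round trip."""

    def __init__(self, fn: Callable[..., Any], latency_s: float) -> None:
        self.fn = fn
        self.latency_s = latency_s      # assumed fixed cost per real call
        self.cache: dict[tuple[Hashable, ...], Any] = {}
        self.real_calls = 0

    def __call__(self, *args: Hashable) -> Any:
        if args not in self.cache:
            self.real_calls += 1        # only misses reach the external system
            self.cache[args] = self.fn(*args)
        return self.cache[args]

    def simulated_latency_s(self) -> float:
        return self.real_calls * self.latency_s


def lookup_weather(city: str) -> str:   # hypothetical stand-in for a remote API
    return f"forecast for {city}"


tool = CachedTool(lookup_weather, latency_s=1.5)
queries = ["Paris", "Oslo", "Paris", "Paris", "Oslo"]
answers = [tool(q) for q in queries]
print(tool.real_calls, tool.simulated_latency_s())  # 2 3.0 (vs. 5 calls, 7.5 s uncached)
```

Caching is only safe for read-only, idempotent tools. A call that books, sends, or writes something must not be served from cache. Batching and better tool selection attack the same cost from other angles, by merging calls or by avoiding unnecessary ones.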
Efficient Planning: strategies to reduce the number of execution steps and API calls required to solve a problem. The cost driver is multi-step amplification. Every step adds another model call, plus the tool calls and accumulated context it carries, so per-step costs compound over the trajectory. Better planning means fewer steps to the same outcome, and the savings multiply across the trajectory.
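A toy cost model can illustrate the amplification effect. It assumes each step makes one model call over the full history so far, plus one tool call with a fixed latency. The model and its parameters are assumptions for illustration, not figures from the survey.

```python
from dataclasses import dataclass


@dataclass
class TrajectoryCost:
    steps: int
    prompt_tokens: int
    latency_s: float


def trajectory_cost(steps: int, tokens_per_step: int,
                    llm_latency_s: float, tool_latency_s: float) -> TrajectoryCost:
    """Toy model (an assumption, not the survey's): each step makes one model call
    over the full history so far and one tool call."""
    # Step k re-reads the k steps of history accumulated so far.
    prompt_tokens = sum(tokens_per_step * k for k in range(1, steps + 1))
    latency_s = steps * (llm_latency_s + tool_latency_s)
    return TrajectoryCost(steps, prompt_tokens, latency_s)


long_plan = trajectory_cost(12, tokens_per_step=400, llm_latency_s=2.0, tool_latency_s=1.0)
short_plan = trajectory_cost(6, tokens_per_step=400, llm_latency_s=2.0, tool_latency_s=1.0)
print(long_plan)   # TrajectoryCost(steps=12, prompt_tokens=31200, latency_s=36.0)
print(short_plan)  # TrajectoryCost(steps=6, prompt_tokens=8400, latency_s=18.0)
```

In this model, halving the step count halves latency and tool calls. It also cuts prompt tokens to about 27% of the original, because the shorter trajectory never reaches the long-history steps.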
The three axes are orthogonal in the sense that a technique improving one does not automatically improve the others. A memory-compression technique does not reduce the number of tool calls. A tool-selection improvement does not shorten the context. A planning improvement leaves per-step context size and per-call tool latency unchanged. Efficient agent design therefore requires optimization along all three axes. The shared cost metrics (latency, tokens, steps) give a common basis for comparing techniques regardless of which component they target.
The methodological consequence is that benchmarks should report effectiveness under fixed cost budgets and cost at comparable effectiveness — the Pareto frontier between effectiveness and cost. Single-number rankings of agent quality miss the structure of the actual deployment trade-off.
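Both views are straightforward to report once each run is recorded as a (cost, effectiveness) pair. The sketch below assumes a single scalar cost metric per comparison (tokens, seconds, or steps). It computes the Pareto frontier, effectiveness under a fixed budget, and cost at a target effectiveness. `Run`, the agent names, and the numbers are made up for illustration.

```python
from typing import NamedTuple, Optional


class Run(NamedTuple):
    name: str
    cost: float           # one cost metric at a time: tokens, seconds, or steps
    effectiveness: float  # e.g. task success rate on the benchmark


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Sweep runs from cheapest to most expensive (ties: most effective first) and
    keep each one more effective than everything kept before it. The result is the
    non-dominated cost/effectiveness frontier; exact duplicates collapse to one."""
    frontier: list[Run] = []
    best = float("-inf")
    for run in sorted(runs, key=lambda r: (r.cost, -r.effectiveness)):
        if run.effectiveness > best:
            frontier.append(run)
            best = run.effectiveness
    return frontier


def best_under_budget(runs: list[Run], budget: float) -> Optional[Run]:
    """Effectiveness under a fixed cost budget: most effective affordable run."""
    return max((r for r in runs if r.cost <= budget),
               key=lambda r: r.effectiveness, default=None)


def cheapest_at_target(runs: list[Run], target: float) -> Optional[Run]:
    """Cost at comparable effectiveness: cheapest run reaching `target`."""
    return min((r for r in runs if r.effectiveness >= target),
               key=lambda r: r.cost, default=None)


# Made-up numbers for illustration only.
runs = [
    Run("agent_a", 120_000, 0.62),
    Run("agent_b", 40_000, 0.55),
    Run("agent_c", 90_000, 0.50),
    Run("agent_d", 200_000, 0.64),
]
print([r.name for r in pareto_frontier(runs)])   # ['agent_b', 'agent_a', 'agent_d']
print(best_under_budget(runs, 100_000).name)     # agent_b
print(cheapest_at_target(runs, 0.60).name)       # agent_a
```

Runs that fall off the frontier (here `agent_c`) are dominated, meaning some other run is both cheaper and more effective. A single-number leaderboard would rank `agent_d` first without showing that `agent_a` comes close for 60% of the cost.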
Related concepts in this collection
-
Why does agent efficiency differ from model size reduction?
Explores why making models smaller doesn't solve agent cost problems. Agents loop recursively, compounding costs multiplicatively, so efficiency requires system-level design, not just parameter reduction.
same paper, the framing
-
Do efficiency techniques across agent components reveal shared structural constraints?
Despite targeting different parts of agentic systems, efficiency techniques converge on similar principles. This raises a question: are these convergences independent discoveries, or do they reflect deeper architectural constraints that all agent systems face?
same paper, the convergence observation
-
Can three axes replace the short-term long-term memory split?
Does breaking agent memory into forms, functions, and dynamics provide a clearer framework than the traditional short-term/long-term distinction? This matters because current agent-memory literature lacks a unified vocabulary, making comparison between systems nearly impossible.
adjacent: complementary three-axis decomposition of agent memory specifically
Original note title
agent efficiency decomposes into three orthogonal axes — memory tool learning and planning — each measured by latency tokens and steps