Does agent efficiency really break down into three distinct components?

Can we understand agent efficiency as three independent optimization problems—memory, tool use, and planning—each with separate cost drivers? This matters because it could explain why point optimizations keep missing the bigger picture.

Note · 2026-05-18 · sourced from Agents

The agent-efficiency literature has historically been a collection of point optimizations — a paper about better tool selection here, a paper about prompt compression there. The Toward Efficient Agents survey argues the field is better understood through a three-component decomposition that maps to where the costs actually accumulate.

Efficient Memory: techniques for compressing historical context, managing memory storage, and optimizing context retrieval. The cost driver is context-window growth: long agent trajectories accumulate context that grows roughly linearly with the number of steps, which exhausts token budgets and raises per-turn latency. Compression, summarization, structured episodic memory, and retrieval-on-demand all reduce this cost.
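
As a rough illustration of the memory axis, the sketch below keeps the most recent turns verbatim and folds older ones into a running summary once a token budget is exceeded. It is a minimal sketch, not the survey's method: `count_tokens`, `summarize`, and `BudgetedMemory` are hypothetical names, the word-count tokenizer is a crude stand-in, and a real system would call an LLM or a proper summarizer instead of truncating.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def summarize(parts: list[str]) -> str:
    """Placeholder summarizer (assumption): keep the first 5 words of each part."""
    return " | ".join(" ".join(p.split()[:5]) for p in parts)


@dataclass
class BudgetedMemory:
    budget: int                 # maximum tokens allowed in the assembled context
    keep_recent: int = 2        # number of latest turns always kept verbatim
    summary: str = ""
    recent: list[str] = field(default_factory=list)

    def size(self) -> int:
        return count_tokens(self.summary) + sum(count_tokens(t) for t in self.recent)

    def add(self, turn: str) -> None:
        self.recent.append(turn)
        # While over budget, fold the oldest verbatim turn into the summary.
        while self.size() > self.budget and len(self.recent) > self.keep_recent:
            oldest = self.recent.pop(0)
            parts = [self.summary, oldest] if self.summary else [oldest]
            self.summary = summarize(parts)

    def context(self) -> str:
        return "\n".join(([self.summary] if self.summary else []) + self.recent)
```

With this shape the context size stays bounded by the budget (as long as the verbatim turns themselves fit), instead of growing with every step; the price is that older turns survive only in lossy form.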

Efficient Tool Learning: strategies to minimize the number of tool calls and reduce the latency of external interactions. The cost driver is external API latency: each tool call is a round trip to a system the agent does not control, often with seconds of latency and subject to rate limits. Reducing tool calls (through caching, batching, or smarter selection) can cut wall-clock time more than internal optimizations can, because that time is spent outside the agent.
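
A minimal sketch of the caching idea, assuming a deterministic tool whose result depends only on its query string. `slow_lookup` is a hypothetical stand-in for an external API, and `call_log` exists only to count real invocations; `functools.lru_cache` is the standard-library memoizer. The batch helper merely deduplicates queries; a true batch endpoint would depend on the tool.

```python
import functools

call_log: list[str] = []  # records every real (uncached) tool invocation


def slow_lookup(query: str) -> str:
    """Hypothetical external tool; a real one would make a network round trip."""
    call_log.append(query)
    return f"result for {query}"


@functools.lru_cache(maxsize=256)
def cached_lookup(query: str) -> str:
    # Repeated identical queries are answered from memory, not by the tool.
    return slow_lookup(query)


def deduplicated_lookup(queries: list[str]) -> dict[str, str]:
    """Resolve a list of queries so each distinct one triggers at most one call."""
    return {q: cached_lookup(q) for q in dict.fromkeys(queries)}
```

Caching like this only suits tools whose answers do not change between calls; for time-sensitive tools an expiry policy would be needed.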

Efficient Planning: strategies to reduce the number of execution steps and API calls required to solve a problem. The cost driver is multi-step amplification: every step pays for its own model call and any tool calls it triggers, so per-step costs add up across the whole trajectory. Better planning means fewer steps to the same outcome, and the savings multiply across the trajectory.
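
One way to see why step count matters is a toy cost model, offered here as an assumption rather than anything taken from the survey: suppose step i re-sends a fixed prompt of `base` tokens plus `growth` tokens for each earlier step. The function name and the constants are made up for illustration.

```python
def trajectory_tokens(steps: int, base: int = 500, growth: int = 200) -> int:
    """Total prompt tokens when step i sends base + growth * i tokens (i starts at 0)."""
    return sum(base + growth * i for i in range(steps))


def closed_form(steps: int, base: int = 500, growth: int = 200) -> int:
    """Same total via the arithmetic-series formula: n*base + growth*n*(n-1)/2."""
    return steps * base + growth * steps * (steps - 1) // 2
```

Under these made-up constants, a 10-step trajectory costs 14,000 prompt tokens and a 6-step one costs 6,000, so removing 40% of the steps removes more than half of the tokens in this model.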

The three axes are orthogonal in the sense that a technique improving one does not automatically improve the others. A memory-compression technique does not reduce tool calls; a tool-selection improvement does not affect context length; a planning improvement does not address either. Efficient agent design therefore requires optimization on all three axes, and the shared cost metrics (latency, tokens, steps) are the right basis for comparing techniques regardless of which component they target.

The methodological consequence is that benchmarks should report effectiveness under fixed cost budgets and cost at comparable effectiveness — the Pareto frontier between effectiveness and cost. Single-number rankings of agent quality miss the structure of the actual deployment trade-off.
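
The reporting scheme described above can be sketched in a few lines. This is an illustrative sketch only: `Result`, `pareto_frontier`, and `best_under_budget` are hypothetical names, and "cost" stands for whichever metric (tokens, latency, steps) a benchmark chooses to fix.

```python
from typing import NamedTuple, Optional


class Result(NamedTuple):
    name: str
    cost: float      # e.g. tokens, seconds, or steps per task (lower is better)
    success: float   # task success rate (higher is better)


def pareto_frontier(results: list[Result]) -> list[Result]:
    """Keep results that no cheaper-or-equal result matches or beats on success."""
    frontier: list[Result] = []
    best = float("-inf")
    # Walk from cheapest to most expensive; on cost ties, higher success first.
    for r in sorted(results, key=lambda r: (r.cost, -r.success)):
        if r.success > best:
            frontier.append(r)
            best = r.success
    return frontier


def best_under_budget(results: list[Result], budget: float) -> Optional[Result]:
    """Effectiveness under a fixed cost budget: best success among affordable runs."""
    affordable = [r for r in results if r.cost <= budget]
    return max(affordable, key=lambda r: r.success, default=None)
```

Given made-up results such as (A, 1000, 0.60), (B, 2000, 0.75), (C, 2500, 0.70), and (D, 4000, 0.80), C drops off the frontier because B is both cheaper and more effective, which a single-number ranking by success alone would not reveal.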

Original note title

agent efficiency decomposes into three orthogonal axes — memory tool learning and planning — each measured by latency tokens and steps
