Agentic Systems and Planning

Why does agent efficiency differ from model size reduction?

Explores why making models smaller doesn't solve agent cost problems. Agents loop recursively, compounding costs multiplicatively, so efficiency requires system-level design, not just parameter reduction.

Note · 2026-05-18 · sourced from Agents

A definitional point from Toward Efficient Agents that resolves a common confusion. "Efficient" in the LLM context has typically meant "smaller model" — distillation, quantization, sparser attention, anything that reduces per-token inference cost. For agentic systems, this is the wrong frame.

The reason is structural. A standard LLM in single-turn query-response operates linearly: input goes in, output comes out, and cost is proportional to context plus output length. An agent operates recursively: it queries the model, observes the response, decides on actions, executes tools, reads results, queries the model again, and so on. Total cost is the per-step cost multiplied by the number of steps, and when context accumulates each turn the per-step cost itself rises with the step count, so the total grows quadratically or worse. A 7B-parameter model running an agent loop for 50 steps with accumulating context therefore consumes far more than 50 times the resources of the same model answering one question.
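To make the scaling concrete, here is a minimal sketch of a token-count cost model. It is an illustration, not something taken from Toward Efficient Agents: the function names and all token counts are hypothetical, and it assumes the full context is re-read on every step with no prompt caching.

```python
def single_turn_tokens(prompt_tokens: int, output_tokens: int) -> int:
    """Tokens processed by one query-response call: context plus output."""
    return prompt_tokens + output_tokens


def agent_loop_tokens(
    base_context: int,
    output_per_step: int,
    tool_result_per_step: int,
    steps: int,
) -> int:
    """Tokens processed by an agent loop whose context grows every step.

    Assumption: each step re-reads the whole accumulated context, then
    appends its own output and the tool result to that context.
    """
    total = 0
    context = base_context
    for _ in range(steps):
        total += context + output_per_step
        context += output_per_step + tool_result_per_step
    return total


# Hypothetical numbers, chosen only for illustration.
one_shot = single_turn_tokens(prompt_tokens=1000, output_tokens=200)
looped = agent_loop_tokens(
    base_context=1000, output_per_step=200, tool_result_per_step=300, steps=50
)
print(one_shot, looped, looped / one_shot)
```

Under these assumed numbers the single call processes 1,200 tokens and the 50-step loop processes 672,500, roughly 560 times as many rather than 50 times. The quadratic term comes from the context that every later step has to re-read.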

This makes "smaller model" a marginal optimization for agentic systems. Halving the model size roughly halves per-call cost but does not address the multi-step accumulation. A truly efficient agent has to be optimized at the system level: what triggers the recursion, when it stops, how much state each turn carries forward, and how much can be pruned at each step.
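A rough way to compare these levers is to run the same toy loop under three interventions: a model at half the per-token price, half as many steps, and a cap on carried context. This is a sketch under stated assumptions. `loop_tokens`, the 0.5 price factor, and every token count are hypothetical, and each intervention is assumed to still complete the task.

```python
from typing import Optional


def loop_tokens(
    steps: int,
    base_context: int = 1000,
    growth_per_step: int = 500,
    output_per_step: int = 200,
    max_context: Optional[int] = None,
) -> int:
    """Tokens processed by an agent loop; context grows by a fixed amount per step.

    max_context models pruning or compression: carried context never
    exceeds this many tokens (None means unbounded growth).
    """
    total = 0
    context = base_context
    for _ in range(steps):
        total += context + output_per_step
        context += growth_per_step
        if max_context is not None:
            context = min(context, max_context)
    return total


baseline = loop_tokens(steps=50)                       # relative price 1.0
half_price_model = 0.5 * loop_tokens(steps=50)         # same loop, cheaper tokens
half_steps = loop_tokens(steps=25)                     # better planning
capped_context = loop_tokens(steps=50, max_context=3000)  # pruned state

for name, cost in [
    ("half-price model", half_price_model),
    ("half the steps", half_steps),
    ("capped context", capped_context),
]:
    print(f"{name}: {cost / baseline:.2f}x baseline")
```

With these assumed numbers the cheaper model lands at 0.50x the baseline, while halving the steps gives about 0.27x and capping context about 0.23x. The structural changes remove the quadratic term; the cheaper model only rescales it.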

The right metric is not per-token cost or throughput but the Pareto frontier between effectiveness (task success rate) and cost (latency + tokens + tool invocations + dollar cost). An agent that completes the task in 5 steps with a larger model can be more efficient than one that completes it in 50 steps with a smaller model. The model size is a knob, not the answer.
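The frontier itself is simple to compute once each configuration has a success rate and a cost. The sketch below is illustrative only: the configurations and their numbers are invented, not measured, and the several cost dimensions are collapsed into one scalar as a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AgentConfig:
    name: str
    success_rate: float  # higher is better
    cost: float          # lower is better; one scalar standing in for
                         # latency, tokens, tool calls, and dollars


def dominates(a: AgentConfig, b: AgentConfig) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.success_rate >= b.success_rate and a.cost <= b.cost
    strictly_better = a.success_rate > b.success_rate or a.cost < b.cost
    return no_worse and strictly_better


def pareto_frontier(configs: List[AgentConfig]) -> List[AgentConfig]:
    """Configurations that no other configuration dominates, in input order."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Hypothetical data points, invented for illustration.
configs = [
    AgentConfig("small-model-5-steps", success_rate=0.40, cost=0.05),
    AgentConfig("small-model-50-steps", success_rate=0.70, cost=0.40),
    AgentConfig("large-model-5-steps", success_rate=0.85, cost=0.25),
    AgentConfig("large-model-50-steps", success_rate=0.88, cost=1.90),
]

frontier_names = [c.name for c in pareto_frontier(configs)]
print(frontier_names)
```

In this made-up data the small model running 50 steps drops off the frontier, because the large model running 5 steps is both cheaper and more successful. The other three configurations stay on it, since each is a different trade between success and cost.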

For deployment, this argues against the reflexive "downsize the model" approach to agentic-system cost reduction. The right intervention is usually structural — reduce steps, compress memory, eliminate unnecessary tool calls, plan better. Model size cuts come last and offer the least leverage for the cost they impose on capability.

Original note title

efficient agent is system-level optimization for the success-versus-cost Pareto frontier, distinct from smaller model because agent recursion compounds resource use multiplicatively beyond single-turn use