Agentic and Multi-Agent Systems

Are multi-agent systems actually intelligent coordination or just token spending?

Does multi-agent performance come from better coordination strategies, or primarily from distributing tokens across parallel contexts? Understanding this distinction matters for deciding when to build multi-agent systems versus scaling single agents.

Note · 2026-02-23 · sourced from Agents Multi Architecture

Three independent findings converge on an uncomfortable thesis about multi-agent AI systems:

Finding 1: Anthropic's internal research evaluation shows token usage alone explains 80% of multi-agent performance variance. Model choice and tool calls explain a further 15%. Multi-agent systems use roughly 15× more tokens than chat interactions.
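The 80% figure is a variance-explained statistic (R² of a fit with token usage as the sole predictor). A minimal sketch of how such a number is computed, on synthetic data; the relationship, noise level, and `r_squared` helper below are invented for illustration, not Anthropic's methodology:

```python
import numpy as np

def r_squared(x, y):
    """Fraction of variance in y explained by a least-squares line on x."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1.0 - residuals.var() / y.var()

# Synthetic runs: score rises with log(tokens) plus noise, so token
# usage alone should explain most, but not all, of the variance.
rng = np.random.default_rng(0)
log_tokens = np.log(rng.uniform(1e5, 1.5e6, size=200))
score = 0.8 * log_tokens + rng.normal(0.0, 0.3, size=200)

print(f"variance explained by token usage: {r_squared(log_tokens, score):.2f}")
```

The same computation with model choice or tool-call count as the predictor is how the smaller 15% share would be attributed.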

Finding 2: The Science of Scaling Agent Systems finds coordination yields negative returns once single-agent baselines exceed 45% accuracy. The mechanism: coordination overhead exceeds diminishing improvement potential. For sequential reasoning tasks, every multi-agent variant degrades performance by 39-70%.
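The crossover can be written as a toy model: coordination converts some fraction of the headroom above the single-agent baseline into gains, but pays a flat accuracy tax. The 45% threshold is from the source; the capture rate and overhead below are free parameters chosen so the sign flips exactly there:

```python
def expected_net_gain(baseline_acc, capture_rate=0.30, overhead=0.165):
    """Toy crossover model (capture_rate and overhead are illustrative).

    headroom: accuracy still available above the single-agent baseline
    gain:     the slice of headroom coordination actually converts
    overhead: flat accuracy tax from context splitting and hand-offs
    """
    headroom = 1.0 - baseline_acc
    return capture_rate * headroom - overhead

for baseline in (0.30, 0.45, 0.60, 0.80):
    print(f"baseline {baseline:.0%}: net gain {expected_net_gain(baseline):+.3f}")
```

Above the crossover there is too little headroom left for the captured slice to cover the tax, which is the mechanism the source describes.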

Finding 3: Multi-agent systems fragment per-agent token budgets, leaving insufficient capacity for complex tool orchestration on tool-heavy tasks.
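Finding 3 is budget arithmetic: under a fixed total token budget, every subagent added shrinks the per-agent slice, and coordination traffic shrinks it further. A back-of-envelope sketch in which every number is an assumption for illustration:

```python
def tool_calls_per_agent(total_budget, n_agents, coord_overhead, tokens_per_call):
    """Tool calls each subagent can afford after the budget is split
    N ways and each agent pays for hand-off/report traffic."""
    per_agent = total_budget / n_agents - coord_overhead
    return max(0, int(per_agent // tokens_per_call))

# Assumed figures: 400k total tokens, 3k-token tool round trips,
# 20k tokens of coordination traffic per subagent.
for n in (1, 2, 4, 8):
    calls = tool_calls_per_agent(400_000, n,
                                 coord_overhead=20_000 if n > 1 else 0,
                                 tokens_per_call=3_000)
    print(f"{n} agent(s): ~{calls} tool calls each")
```

The per-agent tool capacity falls faster than 1/N because the coordination overhead is paid per agent regardless of slice size.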

Together: multi-agent systems don't primarily coordinate intelligently — they buy performance by distributing tokens across parallel context windows. The value proposition is token parallelism, not intelligent orchestration.

The counter-argument is important: Sometimes token spending IS the value. Breadth-first research genuinely requires exploring multiple directions simultaneously. Compression via parallel subagents — each exploring with its own context window — produces a kind of intelligence that a single agent with the same total budget cannot replicate. And model upgrades multiply token efficiency, making the token tax more productive per unit spent.

The escape route: LatentMAS demonstrates 70-84% token reduction while improving accuracy by up to 14.6%. If agents communicate through latent representations rather than text, the token tax drops dramatically. The tax is a property of text-based inter-agent communication, not of multi-agent coordination itself.
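A sketch of the accounting difference only: the actual LatentMAS mechanism (latent working memory shared between agents) is more involved than this, and the function names, hidden size, and report size below are all hypothetical:

```python
import numpy as np

HIDDEN_DIM = 4096        # assumed model hidden size

def text_handoff_cost(report_tokens: int, hops: int) -> int:
    """Token cost of a pipeline where each hop serializes findings as
    text that the next agent must re-read."""
    return report_tokens * hops

def latent_handoff_cost(state: np.ndarray, hops: int) -> int:
    """Cost in *text* tokens when each hop instead passes a hidden-state
    vector injected into the next agent's forward pass: zero, at the
    price of requiring white-box access to the models."""
    assert state.shape == (HIDDEN_DIM,)
    return 0

hops, report = 4, 2_000  # assumed pipeline depth and per-hop report size
print(f"text:   {text_handoff_cost(report, hops)} inter-agent tokens")
print(f"latent: {latent_handoff_cost(np.zeros(HIDDEN_DIM), hops)} inter-agent tokens")
```

The tax scales with hops and report size in the text case and vanishes in the latent case, which is why the reduction compounds in deeper pipelines.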

The practical question for anyone building multi-agent systems: Is the task valuable enough to justify 15× the compute? Does it genuinely require parallel exploration of independent directions? Or would a better single model with more tokens accomplish the same thing?
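Those three questions collapse into a rough triage rule. The 15× multiplier and 45% crossover come from the findings above; the structure and parameter names of this heuristic are my own framing, not a published procedure:

```python
def choose_architecture(single_agent_acc: float,
                        needs_parallel_breadth: bool,
                        value_of_task: float,
                        single_agent_cost: float) -> str:
    """Rough single- vs multi-agent triage (illustrative only)."""
    if single_agent_acc >= 0.45:          # Finding 2: past the crossover,
        return "single agent"             # coordination returns go negative
    if not needs_parallel_breadth:        # token parallelism only pays for
        return "single agent"             # independent research directions
    if value_of_task < 15 * single_agent_cost:  # Finding 1: ~15x token bill
        return "single agent"
    return "multi-agent"

print(choose_architecture(0.30, True, 100.0, 1.0))   # -> multi-agent
print(choose_architecture(0.60, True, 100.0, 1.0))   # -> single agent
```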




the token tax — multi-agent systems are primarily an expensive way to spend more tokens, not an intelligent way to coordinate