INQUIRING LINE

Why do multi-agent systems use 15 times more tokens than chat interactions?

This explores why multi-agent AI systems burn through so many more tokens than a single chat — and what that 15× figure actually buys you (or doesn't).


The corpus has a blunt answer to whether that spend reflects smart coordination or brute force: it is mostly brute force. Multi-agent systems use so many more tokens because running several agents in parallel *is* the mechanism by which they get better results. Anthropic's own evaluations found that token spending alone explains about 80% of the performance variance in multi-agent research systems, rather than the sophistication of how agents talk to each other (Does token spending drive multi-agent research performance?; Are multi-agent systems actually intelligent coordination or just token spending?). In other words, the 15× isn't a side effect of coordination; the extra tokens are the product. You're paying to explore more of the problem in parallel.

What makes this striking is that coordination, the thing 'multi-agent' is supposed to be about, can actively hurt. Above roughly 45% task accuracy, adding more agent-to-agent coordination yields *negative* returns (Are multi-agent systems actually intelligent coordination or just token spending?). Agents accept information from their neighbors without verifying it, so errors propagate. They also fail to coordinate strategies, either by agreeing too late or by silently changing strategy without telling anyone (Why do multi-agent systems fail to coordinate at scale?). So a lot of the token budget is spent on a chattier, more error-prone version of what a single focused agent might do, which raises the obvious question of whether you need the multiple agents at all.

The corpus suggests several ways out, and they're more interesting than the problem. One line of work shows the token tax is largely *avoidable*: the reason agents talk so much is that they serialize their thinking into natural-language messages and re-read each other's full output. LatentMAS lets agents share internal representations directly through their KV caches instead of writing text back and forth, cutting token use by 70.8–83.7% while *improving* accuracy (Can agents share thoughts without converting them to text?). A related approach extracts shared 'thoughts' from hidden states so agents coordinate at the representational level rather than through paragraphs of prose (Can agents share thoughts directly without using language?). The implication: most of those 15× tokens are translation overhead, the cost of forcing internal reasoning through the bottleneck of language.

Another angle questions whether you need separate model instances at all. Non-linear prompting, in which a single model simulates multiple personas in a branching context, reproduces the cognitive benefits of multi-agent debate without spinning up multiple agents (Can branching prompts replicate what multi-agent systems do?). On the cost side, much of agent work is repetitive and well-defined, which small language models handle at 10–30× lower cost, making a 'small by default, large only when needed' architecture the economically rational design (Can small language models handle most agent tasks?). Structured artifacts, where agents write standardized documents and pull what they need, also beat free-form conversation, trimming the noise that inflates token counts (Does structured artifact sharing outperform conversational coordination?).
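The 'small by default, large only when needed' idea can be made concrete with a little arithmetic. The sketch below is illustrative only: the `Task` class, the `well_defined` flag, the task list, and the 20× price ratio (an arbitrary point inside the 10–30× range quoted above) are all assumptions, not details from the cited notes.

```python
from dataclasses import dataclass

# Hypothetical relative prices per 1K tokens. The source gives only a
# 10-30x ratio; 20x is an assumed midpoint for illustration.
SMALL_COST_PER_1K = 1.0
LARGE_COST_PER_1K = 20.0


@dataclass
class Task:
    name: str
    tokens: int
    well_defined: bool  # assumption: some upstream rule or classifier sets this


def route(task: Task) -> str:
    """Small model by default; large model only for open-ended work."""
    return "small" if task.well_defined else "large"


def cost(task: Task, model: str) -> float:
    rate = SMALL_COST_PER_1K if model == "small" else LARGE_COST_PER_1K
    return task.tokens / 1000 * rate


def total_cost(tasks, router) -> float:
    return sum(cost(t, router(t)) for t in tasks)


# Made-up workload: two routine tasks and one open-ended task.
tasks = [
    Task("extract fields from invoice", 2000, True),
    Task("format tool call", 1000, True),
    Task("plan open-ended research", 4000, False),
]

routed = total_cost(tasks, route)
all_large = total_cost(tasks, lambda t: "large")
print(f"routed: {routed}, all-large: {all_large}")
```

In this toy workload the saving comes entirely from the two routine tasks; the open-ended task still dominates the bill, so the benefit of routing depends on how much of the real workload is actually well-defined.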

The deeper reframe worth taking away: when context persists and gets reused across a long-running task, the right cost denominator stops being tokens at all. One 115-day case study found 82.9% of tokens were cache reads, meaning the meaningful unit of cost was completed artifacts, not raw token count (Do persistent agents really cost less per token?). So the honest answer to 'why 15×?' is that today's multi-agent systems buy performance by spending tokens rather than by coordinating intelligently. The most promising research is precisely about decoupling the two, so you can keep the performance and drop most of the tax.
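To see why cache reads change the denominator, here is a minimal sketch of the arithmetic. Only the 82.9% cache-read share comes from the case study above; the total token count, the unit price, the 10% cache-read discount, and the artifact count are hypothetical placeholders, and real pricing varies by provider.

```python
def effective_cost(total_tokens: float, cache_read_share: float,
                   price_per_token: float, cache_discount: float) -> float:
    """Bill fresh tokens at full price and cache reads at a discounted rate."""
    cached = total_tokens * cache_read_share
    fresh = total_tokens - cached
    return fresh * price_per_token + cached * price_per_token * cache_discount


def cost_per_artifact(total_cost: float, artifacts: int) -> float:
    """Divide spend by completed artifacts instead of by raw tokens."""
    return total_cost / artifacts


TOTAL_TOKENS = 1_000_000   # hypothetical
CACHE_READ_SHARE = 0.829   # from the 115-day case study
PRICE_PER_TOKEN = 1.0      # hypothetical unit price
CACHE_DISCOUNT = 0.1       # assumption: cache reads billed at 10% of full price
ARTIFACTS = 50             # hypothetical count of completed artifacts

naive = TOTAL_TOKENS * PRICE_PER_TOKEN
actual = effective_cost(TOTAL_TOKENS, CACHE_READ_SHARE,
                        PRICE_PER_TOKEN, CACHE_DISCOUNT)
per_artifact = cost_per_artifact(actual, ARTIFACTS)
print(f"naive: {naive:.0f}, cache-aware: {actual:.0f}, "
      f"per artifact: {per_artifact:.0f}")
```

Under these assumed numbers the cache-aware bill is about a quarter of the naive token count, which is why counting raw tokens overstates what a persistent agent actually costs.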


Sources (9 notes)

Does token spending drive multi-agent research performance?

Anthropic's internal evals show token spending alone accounts for 80% of performance variance in multi-agent research systems. Model capability upgrades deliver larger gains than doubling token budget, suggesting efficiency matters as much as quantity.

Are multi-agent systems actually intelligent coordination or just token spending?

Research shows token usage explains 80% of multi-agent performance variance, systems use 15× more tokens than single agents, and coordination yields negative returns above 45% accuracy. Performance gains come from token distribution, not coordination sophistication.

Why do multi-agent systems fail to coordinate at scale?

AgentsNet benchmark shows agents fail to coordinate strategies either by agreeing too late or adopting strategies without informing neighbors. Agents accept neighbor information without verification, enabling error propagation while remaining capable of detecting direct conflicts.

Can agents share thoughts without converting them to text?

LatentMAS enables agents to share internal representations directly via KV caches, reaching 14.6% accuracy gains and 70.8-83.7% token reduction with no additional training. Hidden embeddings preserve reasoning fidelity that text-based systems cannot.

Can agents share thoughts directly without using language?

Research formalizes inter-agent thought sharing via sparse autoencoders that recover individual, shared, and private latent thoughts from hidden states. This approach detects alignment conflicts at the representational level before they manifest in language.

Can branching prompts replicate what multi-agent systems do?

Research shows single LLMs using dynamic persona simulation achieve multi-agent cognitive synergy without multiple model instances. Solo Performance Prompting validates that structured prompting techniques map directly to multi-agent debate architectures, enabling equivalent outcomes through structural equivalence.

Can small language models handle most agent tasks?

SLMs handle the repetitive, well-defined language tasks that constitute most agent work at 10–30× lower cost than LLMs, making heterogeneous architectures (SLMs by default, LLMs selective) the economically rational design pattern.

Does structured artifact sharing outperform conversational coordination?

MetaGPT demonstrates that agents producing standardized engineering documents achieve superior coordination compared to conversational exchange. Active information pulling from shared environments eliminates noise and mirrors efficient human workplace infrastructure.

Do persistent agents really cost less per token?

A 115-day case study found 82.9% of tokens were cache reads. When context persists and is reused, the meaningful cost denominator becomes completed artifacts, not individual tokens.
