At what task difficulty does multi-agent decomposition become worth the coordination cost?

This explores when splitting a task across multiple agents actually pays off — at what point the gains from decomposition outrun the overhead, error propagation, and token cost that coordination introduces.

This reads the question as a threshold problem: there is a task-difficulty band where multi-agent decomposition earns its keep, and bands on either side where it doesn't. The corpus is direct that the threshold is real and measurable, and that the band is narrower than most people assume. The sharpest result comes from a study of 180 configurations, which found that coordination *stops* helping once single-agent accuracy exceeds roughly 45%, while tool-coordination trade-offs actively *harm* the most complex tasks (see "When does adding more agents actually help systems?" in the sources). That's the counterintuitive part: the hardest tasks aren't where decomposition shines. They're where coordination overhead and error amplification (4–17× depending on topology) can sink you fastest.
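The threshold reading above can be turned into a rough pre-flight check. The sketch below is illustrative only: the 45% ceiling is the figure reported in the summary, but `TaskProfile`, the `tool_heavy` flag, the returned labels, and the handling of the exact boundary are assumptions made for the example, not something the cited study prescribes.

```python
from dataclasses import dataclass

# Ceiling taken from the summary: coordination stops helping above
# roughly 45% single-agent accuracy. Treating exactly 0.45 as "above"
# is an arbitrary choice for this sketch.
ACCURACY_CEILING = 0.45


@dataclass
class TaskProfile:
    """Hypothetical description of a task, for illustration only."""
    single_agent_accuracy: float  # measured single-agent baseline, in [0, 1]
    tool_heavy: bool              # assumed proxy for tool-coordination load


def decomposition_advice(task: TaskProfile) -> str:
    """Return a coarse recommendation label for a task."""
    if not 0.0 <= task.single_agent_accuracy <= 1.0:
        raise ValueError("single_agent_accuracy must be in [0, 1]")
    # A single agent is already good enough that coordination adds cost only.
    if task.single_agent_accuracy >= ACCURACY_CEILING:
        return "single-agent"
    # Hard and tool-heavy: the regime where coordination can actively hurt.
    if task.tool_heavy:
        return "pilot-before-committing"
    # Below the ceiling without heavy tool use: a candidate for decomposition.
    return "multi-agent-candidate"
```

A measured baseline is the important input here. Guessing the single-agent accuracy defeats the purpose of the check.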

Sources (6 notes)

When does adding more agents actually help systems?

Across 180 configurations, three dominant effects predict multi-agent success: tool-coordination trade-offs harm complex tasks, coordination stops helping above 45% accuracy, and topology choice controls error amplification by 4–17×. Architecture-task alignment, not agent count, determines outcomes.

When do multi-agent systems actually outperform single agents?

Empirical analysis shows the performance gap between multi-agent systems (MAS) and single-agent systems (SAS) narrows as models get stronger, with SAS coming out ahead in many cases. Three formal defect types (node-level bottlenecks, edge-level overwhelm, and path-level error propagation) explain when single agents win.
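Path-level error propagation is easiest to see with a toy calculation. The sketch below assumes each hand-off in a chain of agents succeeds independently with the same probability. That is a simplification introduced for illustration, not the formal model from the note, and `chain_success` is a hypothetical helper.

```python
def chain_success(step_accuracy: float, steps: int) -> float:
    """Probability that every step in a sequential chain succeeds.

    Assumes steps fail independently with identical accuracy, which is
    an illustrative simplification rather than a measured property.
    """
    if not 0.0 <= step_accuracy <= 1.0:
        raise ValueError("step_accuracy must be in [0, 1]")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return step_accuracy ** steps


if __name__ == "__main__":
    # Longer paths compound per-step error under the independence assumption.
    for n in (1, 2, 5, 10):
        print(n, round(chain_success(0.9, n), 3))
```

Under this assumption, adding agents to a path lowers end-to-end reliability unless each added step is more accurate than the work it replaces.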

How does test-time scaling work at the agent level?

Research shows 80% of multi-agent performance variance comes from token budget, not coordination intelligence. LatentMAS and shared-KV-cache approaches offer ways to decouple performance gains from token costs.

Does separating planning from execution improve reasoning accuracy?

Modular architectures with separate decomposer and solver models outperform monolithic LLMs, with decomposition ability transferring across domains while solving ability does not. The separation prevents planning-execution interference and produces more generalizable skills.

Why do multi-agent systems fail to coordinate at scale?

The AgentsNet benchmark shows agents fail to coordinate strategies, either by agreeing too late or by adopting strategies without informing their neighbors. Agents accept neighbor information without verification, which lets errors propagate, even though they remain capable of detecting direct conflicts.

Can small language models handle most agent tasks?

Small language models (SLMs) handle the repetitive, well-defined language tasks that constitute most agent work at 10–30× lower cost than large language models (LLMs), making heterogeneous architectures (SLMs by default, LLMs used selectively) the economically rational design pattern.
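The economics of a heterogeneous design can be sketched with simple arithmetic. The 10–30× cost ratio comes from the note above. The fraction of calls routed to the small model is an assumed input, and `relative_cost` is a hypothetical helper that ignores routing overhead and any quality difference between the models.

```python
def relative_cost(slm_fraction: float, cost_ratio: float) -> float:
    """Cost of a mixed SLM/LLM deployment relative to an all-LLM one.

    slm_fraction: share of calls handled by the small model (assumed input).
    cost_ratio:   how many times cheaper an SLM call is than an LLM call.
    """
    if not 0.0 <= slm_fraction <= 1.0:
        raise ValueError("slm_fraction must be in [0, 1]")
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    # SLM calls cost 1/cost_ratio of an LLM call; the remainder cost 1.
    return slm_fraction / cost_ratio + (1.0 - slm_fraction)


if __name__ == "__main__":
    # Illustrative only: 80% of calls routed to the SLM is a made-up figure.
    for ratio in (10, 30):
        print(ratio, round(relative_cost(0.8, ratio), 3))
```

The formula shows that once the small model is several times cheaper, the residual LLM share dominates total cost, so the routing fraction matters more than the exact ratio.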
