Agentic and Multi-Agent Systems

When do multi-agent systems actually outperform single agents?

As individual LLMs grow more capable, does the advantage of splitting work across multiple agents still hold? This note explores when coordination overhead makes multi-agent systems (MAS) counterproductive.

Note · 2026-03-28 · sourced from Agentic Research
What makes multi-agent teams actually perform better? How does test-time scaling work at the agent level?

"Single-agent or Multi-agent Systems? Why Not Both?" (2025) provides an empirical and theoretical analysis of when multi-agent systems (MAS) help versus hurt, with a finding that challenges the default toward multi-agent architectures.

The diminishing advantage. Prior studies reported that MAS achieved higher accuracy than single-agent systems (SAS) across diverse domains. However, as frontier LLMs rapidly advance in long-context reasoning, memory retention, and tool usage, many limitations that originally motivated MAS designs are being mitigated by single-agent capability improvements. The empirical study finds that across various agentic applications, the performance gap between MAS and SAS narrows with stronger models, and SAS outperforms MAS in a substantial portion of cases.

Three MAS defect types formalized as dependency graph problems:

Node-level defect: Both MAS and SAS are bottlenecked by the critical agent responsible for the most difficult subtask. MAS cannot escape the ceiling set by its weakest critical component. Adding more agents does not help if the hardest subtask remains unsolved.

Edge-level defect: Downstream agents become overwhelmed by inputs from upstream agents. In multi-way conversations or prolonged iterative refinements, high in-degree nodes (summarizers, synthesizers) receive more information than they can process effectively, leading to overthinking on edge cases. This is "analogous to the overthinking of the reasoning model, but rather than being lost in thinking, the agent becomes overwhelmed by inputs from upstream agents." MAS aggravates the problem because agents process much more data.

Path-level defect: Indecisive errors propagate through chains of agent interactions. Crucial context is lost or diluted when intermediate outputs are summarized or filtered. Even small information loss causes irreversible errors downstream via snowball effects. The specific failure mode: correct solutions proposed in earlier rounds get lost during summarization before reaching the next agent — "this loss is unrecoverable, as downstream agents no longer have access to the full previous results."
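The three defect types can be made concrete with a toy dependency graph. The sketch below is illustrative only and is not the paper's formalization: the agent names, the input budget `max_inputs`, the per-hop retention rate, and the independence assumption in `path_retention` are all hypothetical choices made for the example.

```python
from collections import defaultdict

# Hypothetical MAS workflow: each edge points from an upstream agent
# to the downstream agent that consumes its output.
EDGES = [
    ("planner", "solver_a"),
    ("planner", "solver_b"),
    ("planner", "solver_c"),
    ("solver_a", "synthesizer"),
    ("solver_b", "synthesizer"),
    ("solver_c", "synthesizer"),
    ("synthesizer", "verifier"),
]


def in_degrees(edges):
    """Count how many upstream agents feed each agent."""
    deg = defaultdict(int)
    for src, dst in edges:
        deg[src] += 0  # ensure source-only agents appear with degree 0
        deg[dst] += 1
    return dict(deg)


def overloaded_nodes(edges, max_inputs):
    """Edge-level check: agents with more inputs than an assumed budget."""
    return sorted(n for n, d in in_degrees(edges).items() if d > max_inputs)


def path_retention(per_hop_retention, hops):
    """Path-level toy model: chance that a correct detail survives `hops`
    lossy hand-offs, assuming each hop loses information independently."""
    return per_hop_retention ** hops


def critical_ceiling(success_by_agent, critical_agents):
    """Node-level toy model: the weakest critical agent caps the system."""
    return min(success_by_agent[a] for a in critical_agents)


if __name__ == "__main__":
    print(overloaded_nodes(EDGES, max_inputs=2))
    print(path_retention(0.9, hops=3))
    print(critical_ceiling({"planner": 0.95, "solver_a": 0.6}, ["planner", "solver_a"]))
```

Under these assumptions, the synthesizer is the only high in-degree node, a detail that survives each summarization with probability 0.9 survives three hops with probability about 0.73, and a critical agent that succeeds 60% of the time caps the whole pipeline at 60% no matter how many other agents are added.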

The hybrid solution. Confidence-guided routing between SAS and MAS, called request cascading, selectively offloads requests based on difficulty. The approach improves accuracy by 1.1-12% while reducing costs by up to 88%. AIME (the hardest math task) is the exception, where MAS consistently outperforms SAS, illustrating the value of MAS for extremely difficult tasks.
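A minimal sketch of request cascading, assuming the single agent runs first and reports a confidence score, with escalation to MAS only when that score falls below a threshold. The function names, the threshold of 0.8, and the relative costs are hypothetical placeholders for illustration; the paper's actual confidence signal and routing rule are not specified in this note.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Result:
    answer: str
    route: str   # "SAS" if the single agent's answer was kept, else "MAS"
    cost: float  # total cost in arbitrary units


def cascade(
    request: str,
    run_sas: Callable[[str], Tuple[str, float]],  # returns (answer, confidence)
    run_mas: Callable[[str], str],                # returns answer
    threshold: float = 0.8,   # assumed confidence cutoff
    sas_cost: float = 1.0,    # assumed cost units
    mas_cost: float = 5.0,    # assumed cost units
) -> Result:
    """Try the single agent first; escalate to MAS only on low confidence."""
    answer, confidence = run_sas(request)
    if confidence >= threshold:
        return Result(answer, "SAS", sas_cost)
    # Escalation pays for the failed SAS attempt plus the MAS run.
    return Result(run_mas(request), "MAS", sas_cost + mas_cost)


# Stub agents standing in for real model calls (hypothetical behavior).
def stub_sas(request: str) -> Tuple[str, float]:
    confidence = 0.3 if "hard" in request else 0.95
    return f"sas:{request}", confidence


def stub_mas(request: str) -> str:
    return f"mas:{request}"


if __name__ == "__main__":
    print(cascade("easy lookup", stub_sas, stub_mas))
    print(cascade("hard proof", stub_sas, stub_mas))
```

The cost saving in a scheme like this comes from the fact that most requests never reach the more expensive MAS path; the trade-off is that escalated requests pay for both attempts, so the threshold matters.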

This extends "When does adding more agents actually help systems?": the scaling laws there quantify MAS overhead, while this paper shows that overhead becoming less worthwhile as single-agent capability increases. Combined with "Why do multi-agent LLM systems converge without real debate?", the picture is that MAS suffers from both coordination overhead and pseudo-agreement, which strengthens the case for SAS with selective MAS escalation.



Original note title

multi-agent system advantages diminish as single-agent LLM capabilities improve — three defect types in MAS dependency graphs explain when single beats multi