Agentic and Multi-Agent Systems

When does adding more agents actually help?

Multi-agent systems often fail in practice, but the reasons remain unclear. This research investigates whether coordination overhead, task properties, or system architecture determines when additional agents improve or degrade performance.

Note · 2026-02-23 · sourced from Agents Multi Architecture

The question of when multi-agent systems help and when they hurt has been answered with heuristics. This paper replaces heuristics with measurement. Across 180 configurations (5 architectures × 3 LLM families × 4 benchmarks), three dominant effects emerge:

1. Tool-coordination trade-off (β=−0.330, p<0.001): tool-heavy tasks suffer disproportionately from multi-agent overhead. The mechanism is token budget fragmentation — multi-agent systems split per-agent capacity, leaving insufficient tokens for complex tool orchestration. A 16-tool software engineering task under multi-agent coordination loses more than a 2-tool financial reasoning task.
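The fragmentation mechanism can be made concrete with a toy budget calculation. This is an illustrative sketch, not the paper's model: the 32k context size, the 15% coordination overhead, and the 500-token per-call overhead are all assumed numbers chosen to show the shape of the effect.

```python
def per_agent_budget(total_tokens: int, n_agents: int,
                     coord_overhead: float = 0.15) -> int:
    """Split a shared context budget across agents, after subtracting a
    fixed fraction for coordination (protocol messages, hand-off summaries).
    The overhead fraction is an assumption for illustration."""
    usable = total_tokens * (1.0 - coord_overhead)
    return int(usable // n_agents)

def tokens_per_tool(budget: int, n_tools: int,
                    overhead_per_call: int = 500) -> float:
    """Reasoning tokens left per tool after fixed per-call overhead."""
    return budget / n_tools - overhead_per_call

# 16-tool task, 32k shared budget: a single agent keeps ~1200 tokens of
# reasoning headroom per tool; splitting across 4 agents drops it below zero.
single = tokens_per_tool(per_agent_budget(32_000, 1), n_tools=16)
multi = tokens_per_tool(per_agent_budget(32_000, 4), n_tools=16)
```

Under these assumed numbers the 16-tool task is starved by fragmentation while a 2-tool task would survive it, matching the direction of the reported trade-off.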

2. Capability saturation (β=−0.408, p<0.001): once single-agent baselines exceed approximately 45% accuracy, coordination yields diminishing or negative returns. Coordination costs exceed improvement potential. This is a measurable threshold, not a vague guideline.
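The 45% threshold is taken from the note; packaging it as a go/no-go rule, and the rule's simple form, are assumptions for illustration:

```python
def expect_multi_agent_gain(single_agent_acc: float,
                            saturation: float = 0.45) -> bool:
    """Decision rule from the capability-saturation finding: above ~45%
    single-agent accuracy, coordination costs tend to exceed the remaining
    improvement potential, so multi-agent coordination is not expected
    to help. Threshold from the note; rule form is a sketch."""
    return single_agent_acc < saturation
```

For example, a task where the single-agent baseline scores 30% is still a candidate for coordination; one at 60% is not.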

3. Topology-dependent error amplification: independent agents amplify errors 17.2× through unchecked propagation, while centralized coordination contains this to 4.4× via validation bottlenecks that catch errors before aggregation. The architecture is the error control mechanism.
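A toy propagation model shows why the topology acts as the error control. The compounding form, the step count, and the 75% validator catch rate are assumptions; the paper's 17.2× and 4.4× factors are empirical, not derived from this sketch.

```python
def amplified_error(base_err: float, n_steps: int, topology: str,
                    catch_rate: float = 0.75) -> float:
    """Toy error-propagation model (illustrative, not the paper's).
    Independent agents compound per-step errors unchecked; centralized
    coordination filters a fraction at a validation bottleneck before
    aggregation. catch_rate is an assumed parameter."""
    err = 1.0 - (1.0 - base_err) ** n_steps  # chance any step went wrong
    if topology == "centralized":
        err *= (1.0 - catch_rate)            # validator catches most errors
    return err

independent = amplified_error(0.05, n_steps=4, topology="independent")
centralized = amplified_error(0.05, n_steps=4, topology="centralized")
```

Even in this crude model, the unchecked topology's output error is several times the validated one, reproducing the qualitative gap between the two architectures.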

The practical consequences are sharp. Centralized coordination improves performance by 80.9% on parallelizable tasks (financial reasoning). Decentralized coordination excels on dynamic web navigation (+9.2% vs +0.2%). But for sequential reasoning tasks, every multi-agent variant degrades performance by 39-70%. Architecture-task alignment, not agent count, determines success.

The predictive model (R²=0.513, 87% accuracy on held-out configurations) uses measurable task properties — not post-hoc analysis. This means architecture selection can be principled rather than intuitive. The underlying mechanisms are interpretable: fragmentation, overhead exceeding marginal gains, and error propagation without validation.

Relative to the question How should we balance parallel versus sequential compute at test time?, this finding supplies the multi-agent instantiation: parallel multi-agent coordination helps on parallelizable tasks and hurts on sequential ones. The 45% saturation threshold adds a quantitative decision boundary that the test-time scaling (TTS) literature lacks.

MasRouter's per-query topology routing (from Arxiv/Routers): MasRouter directly addresses the topology-dependent error amplification finding. Rather than choosing a fixed topology and accepting its scaling limitations, MasRouter routes each query to the optimal collaboration mode (Chain/Tree/Graph) via a variational latent variable model. This transforms topology from a fixed architectural choice into a per-query routing decision — the system can use centralized coordination for tasks where error propagation matters (financial reasoning) and decentralized coordination for dynamic tasks (web navigation). The 87% prediction accuracy of the scaling laws framework suggests routing decisions could be validated: does MasRouter's topology selection correlate with what the scaling laws predict would work best? See What decisions must multi-agent routing systems optimize simultaneously?.
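A rule-based stand-in makes the per-query routing idea concrete. This is hypothetical: MasRouter actually samples the collaboration mode from a learned variational latent variable model, and the task features, thresholds, and topology assignments below are assumptions for illustration only.

```python
from dataclasses import dataclass

TOPOLOGIES = ("chain", "tree", "graph")

@dataclass
class Query:
    tool_count: int        # how many tools the task requires
    parallelizable: bool   # can subtasks proceed independently?

def route_topology(q: Query) -> str:
    """Hypothetical rule-based stand-in for MasRouter's learned router:
    map each query to a collaboration mode instead of fixing one topology."""
    if not q.parallelizable:
        return "chain"   # sequential tasks keep a single line of reasoning
    if q.tool_count <= 4:
        return "tree"    # centralized fan-out with validation at the root
    return "graph"       # decentralized peers for dynamic, tool-heavy tasks
```

The point of the sketch is the interface, not the rules: topology becomes a per-query output rather than a fixed architectural input, which is what would let routing decisions be checked against the scaling-law predictions.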

The endogeneity paradox: autonomy degree is itself a scaling variable. The largest coordination experiment to date (25,000 tasks, 8 models, 4-256 agents, Drop the Hierarchy and Roles) reveals that the optimal coordination topology is not fixed but depends on model capability. A hybrid protocol with fixed ordering but autonomous role selection outperforms both centralized (+14%) and fully autonomous (+44%) coordination. Below a capability threshold, the relationship reverses — weak models need rigid structure. This adds a fourth scaling law: the degree of endogenous coordination is capability-contingent. The topology-dependent error amplification finding from this note interacts with autonomy level: self-organizing agents with strong models develop voluntary self-abstention (agents withdraw when they lack competence) and dynamic role invention (5,006 unique roles from 8 agents), producing emergent structures that fixed topologies cannot match. See Do self-organizing agent teams outperform rigid hierarchies?.

SAS vs MAS capabilities converge as frontier models improve. "Single-agent or Multi-agent? Why Not Both?" (2025) finds that MAS benefits diminish as LLMs gain long-context reasoning, memory retention, and tool use, mitigating the limitations that originally motivated MAS designs. Three defect types are formalized as dependency-graph problems: node-level (a bottleneck agent caps overall performance), edge-level (downstream agents are overwhelmed by upstream inputs, analogous to overthinking triggered by external information), and path-level (indecisive errors propagate as crucial context is lost during inter-agent summarization). A hybrid SAS/MAS cascading approach using confidence-guided routing improves accuracy by 1.1-12% while reducing costs by up to 88%. The exception is AIME (the hardest math benchmark), where MAS consistently outperforms, confirming its value at extreme difficulty.
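The cascading idea can be sketched as a two-tier fallback. This is a sketch of the paper's concept, not its implementation: the solver interface, the confidence threshold, and the tier labels are assumptions.

```python
def cascade(task, sas_solver, mas_solver, confidence_threshold: float = 0.8):
    """Confidence-guided SAS->MAS cascade (sketch; threshold and interface
    are assumptions). Try the cheap single-agent solver first and escalate
    to the multi-agent system only when SAS confidence is low, which is
    how the hybrid keeps most of the cost savings."""
    answer, confidence = sas_solver(task)
    if confidence >= confidence_threshold:
        return answer, "SAS"          # confident: stop here, pay one solver
    return mas_solver(task), "MAS"    # uncertain: escalate to coordination
```

Because most queries resolve at the confident single-agent tier, the expensive multi-agent path runs only on the residual hard cases (the AIME-like regime where MAS still earns its cost).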



Multi-agent scaling follows three quantitative laws: the tool-coordination trade-off, capability saturation at 45 percent, and topology-dependent error amplification.