What specific network sizes trigger coordination degradation in LLM systems?
This asks for a threshold — a specific agent count where LLM coordination breaks — but the corpus's more interesting answer is that degradation is continuous and structural, not a number you cross.
The question assumes a magic number: at N agents, coordination collapses. The corpus pushes back on that premise. What it documents is degradation that scales *smoothly* with network size rather than tripping at a threshold. The note "Why do multi-agent systems fail to coordinate at scale?" is the closest thing to a direct answer: in the AgentsNet benchmark, coordination degrades *predictably* as the network grows, because agents either agree too late or adopt a strategy without telling their neighbors. The failure is graded, not sudden: a bigger network means more opportunities for late agreement and more unverified information accepted and propagated.
The one place the corpus names group size as the active variable is consensus. "Can LLM agent groups reliably reach consensus together?" finds that agreement degrades with group size even with zero malicious agents present, and crucially it fails through *liveness loss* (timeouts, stalled convergence) rather than corrupted values. So what grows with N isn't wrongness; it's the inability to ever finish. That matches the timing-failure story from the AgentsNet work: more agents means more chances for the round to stall before everyone has converged.
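A toy model makes the liveness point concrete. This is an illustrative assumption, not something the cited notes measure: suppose each agent independently converges within the round budget with probability `q`, and a single straggler is enough to stall the round. The names `liveness_probability` and `q` are hypothetical.

```python
def liveness_probability(q: float, n_agents: int) -> float:
    """Probability that every agent converges before the timeout.

    Toy assumption: agents converge independently, each with probability q,
    and the round only completes if all of them do.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability in [0, 1]")
    if n_agents < 1:
        raise ValueError("n_agents must be at least 1")
    return q ** n_agents


if __name__ == "__main__":
    # Illustrative per-agent success rate; not taken from any benchmark.
    for n in (2, 4, 8, 16, 32):
        p = liveness_probability(0.98, n)
        print(f"N={n:>2}  P(round completes) = {p:.3f}")
```

Under these assumptions the completion probability shrinks steadily as N grows, with no group size at which it suddenly breaks. Real agents are unlikely to be independent, so treat this as intuition rather than a prediction.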
If you want a number, the corpus offers a ceiling rather than a cliff. "Why do multi-agent systems fail despite individual capability?" reports real-world autonomous task completion plateauing near 30% *regardless of agent count*; adding agents doesn't push past it. That reframes the question. The interesting quantity isn't the network size that breaks coordination. It's that coordination quality stops improving with scale, and the structural failure modes (silent agreement, degeneration of thought, social accommodation) appear at group scale however many agents you add.
Why no clean threshold? Because the failures are mechanism-driven, not headcount-driven. "Why do autonomous LLM agents fail in predictable ways?" traces role-flipping, flake replies, infinite loops, and conversation drift to LLMs lacking persistent goals and stable roles, and those show up in small teams too. "Why do multi-agent LLM systems fail more than expected?" catalogs 14 failure modes across specification, inter-agent misalignment, and verification, none of which is gated on a particular N. And "Do frontier LLMs silently corrupt documents in long workflows?" shows the same compounding-error dynamic along the *time* axis: roughly 25% document degradation over 50 relay round-trips, with no plateau. That suggests the real driver is chain length and uncritical acceptance, which network size merely amplifies.
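The relay figure can be turned into a rough per-hop rate. The sketch below assumes, purely for illustration, that each round trip retains a constant fraction of the document; the source reports only the cumulative figure (about 25% degradation through 50 round-trips), so the constant-rate model and the function names are assumptions. Under that model the per-hop loss comes out at a little under 0.6%, which is small enough to go unnoticed on any single step.

```python
def per_hop_retention(total_retained: float, hops: int) -> float:
    """Constant per-hop retention r such that r ** hops == total_retained."""
    if not 0.0 < total_retained <= 1.0:
        raise ValueError("total_retained must be in (0, 1]")
    if hops < 1:
        raise ValueError("hops must be at least 1")
    return total_retained ** (1.0 / hops)


def retained_after(r: float, hops: int) -> float:
    """Fraction of the document still intact after `hops` round trips."""
    return r ** hops


if __name__ == "__main__":
    # ~25% degradation over 50 round-trips means ~75% retained.
    r = per_hop_retention(0.75, 50)
    print(f"implied per-hop loss: {(1 - r) * 100:.2f}%")
    for hops in (10, 25, 50):
        print(f"after {hops:>2} hops: {retained_after(r, hops) * 100:.1f}% retained")
```

A geometric model like this never levels off above zero, which is at least consistent with the reported absence of a plateau, but it says nothing about behavior beyond the 50 round-trips that were tested.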
The thing you might not have known you wanted: the lever isn't keeping the network small, it's changing its shape. "When do multi-agent systems actually outperform single agents?" names three structural defects (node-level bottlenecks, edge-level overwhelm, path-level error propagation) that determine when a single strong agent beats a crowd. So the honest answer to 'what size triggers degradation' is: it's the topology and the verification discipline, not the count. That's why work like "What decisions must multi-agent routing systems optimize simultaneously?" treats agent count as just one of four things to optimize jointly, alongside topology, role allocation, and per-agent LLM assignment. Scale is a knob, not a tripwire.
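To illustrate why shape can matter more than count, here is a minimal sketch of path-level error propagation only. It assumes a fixed per-hop fidelity and compares the longest relay path in three idealized topologies; the topology names, the functions, and the 0.95 fidelity value are all hypothetical and not drawn from the cited notes.

```python
def worst_case_hops(topology: str, n_agents: int) -> int:
    """Longest shortest path between two agents, measured in hops."""
    if n_agents < 2:
        raise ValueError("need at least 2 agents")
    if topology == "chain":
        return n_agents - 1          # message relayed end to end
    if topology == "star":
        return min(2, n_agents - 1)  # leaf -> hub -> leaf
    if topology == "complete":
        return 1                     # every pair talks directly
    raise ValueError(f"unknown topology: {topology}")


def end_to_end_fidelity(topology: str, n_agents: int, per_hop: float) -> float:
    """Fidelity along the worst-case path, assuming independent hops."""
    return per_hop ** worst_case_hops(topology, n_agents)


if __name__ == "__main__":
    for n in (4, 16, 64):
        row = ", ".join(
            f"{t}={end_to_end_fidelity(t, n, 0.95):.3f}"
            for t in ("chain", "star", "complete")
        )
        print(f"N={n:>2}: {row}")
```

In this model the chain degrades with every added agent while the star and complete graphs do not, even at the same N. The sketch ignores the other two defects: a star concentrates load on its hub (a node-level bottleneck), and a complete graph multiplies the messages each agent must handle (edge-level overwhelm).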
Sources (8 notes)
AgentsNet benchmark shows agents fail to coordinate strategies either by agreeing too late or adopting strategies without informing neighbors. Agents accept neighbor information without verification, enabling error propagation while remaining capable of detecting direct conflicts.
Across hundreds of simulations, LLM-agent groups frequently fail to reach valid agreement due to timeouts and stalled convergence rather than subtle value corruption. Agreement degrades with group size even without Byzantine agents present.
Multi-agent systems exhibit specific failure modes—silent agreement, degeneration of thought, and social accommodation—that mirror individual reasoning failures at group scale. Real-world autonomous task completion plateaus near 30% regardless of agent count; capability gains require deliberation diversity, expertise prerequisites, and formal coordination architectures.
Research identifies role flipping, flake replies, infinite loops, and conversation deviation as LLM-specific failures in multi-agent cooperation. These occur because LLMs lack persistent goal representation and stable role identity.
Analysis of 5 frameworks across 150+ tasks identified 14 failure modes organized into 3 categories: specification issues, inter-agent misalignment, and task verification. This extends prior single-framework work and provides systematic evidence for targeted improvements.
Testing 19 models across 52 domains shows even advanced systems degrade documents by ~25% over extended relay tasks, with errors compounding silently without plateauing through 50 round-trips.
Empirical analysis shows that the performance gap between multi-agent systems (MAS) and single-agent systems (SAS) narrows with stronger models, with SAS outperforming in many cases. Three formal defect types (node-level bottlenecks, edge-level overwhelm, and path-level error propagation) explain when single agents win.
MasRouter shows that routing in multi-agent systems must jointly optimize collaboration topology, agent count, role allocation, and per-agent LLM assignment through a cascaded controller. This unified approach surpasses single-model routing by 3.51% accuracy while cutting HumanEval costs by 49%.