INQUIRING LINE

How do static team decomposition and dynamic agent selection compare in efficiency?

This explores whether it's more efficient to design a fixed team of specialized agents up front (static decomposition) or to let the system pick and prune agents on the fly during a task (dynamic selection) — and the corpus suggests the static-vs-dynamic framing matters less than how much each approach spends on tokens.


This question sets static team decomposition (fix the roles and structure before the task runs) against dynamic agent selection (add, score, and drop agents while the task is in flight) and asks which is cheaper for the same quality. The collection's most useful move is to reframe it: the dominant cost driver in multi-agent systems isn't the structure at all, it's token budget. Two notes find that roughly 80% of the performance variance across multi-agent systems comes from how many tokens get spent, not from how clever the coordination is (see "How does test-time scaling work at the agent level?" and "What makes multi-agent teams actually perform better?"). So before comparing static vs. dynamic, the efficient question is: which approach burns fewer tokens to reach the same answer?

On that score, dynamic selection has a clear lever. DyLAN scores each agent's contribution mid-task and deactivates the uninformative ones, trimming the team without any task-specific tuning, so you stop paying for agents that aren't earning their tokens (see "Can multi-agent teams automatically remove their weakest members?"). That's pure efficiency: same or better output, fewer active participants. But dynamism isn't free either. Routing and discovery have their own overhead, which is why one note proposes versioned capability vectors in a searchable index, so that matching an agent to a subtask scales sub-linearly instead of growing with team size (see "Can semantic capability vectors replace manual agent routing?").
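The pruning idea in the paragraph above can be sketched in a few lines. This is a simplified illustration, not DyLAN's actual scoring procedure (which the source note describes as propagation, aggregation, and selection). The `Agent` class, the `score` field, and `prune_team` are hypothetical names, and the scores are made-up values standing in for whatever contribution metric a real system computes.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    score: float = 0.0  # hypothetical running contribution score
    active: bool = True


def prune_team(agents, keep_top_k):
    """Deactivate all but the keep_top_k highest-scoring active agents."""
    ranked = sorted(
        (a for a in agents if a.active),
        key=lambda a: a.score,
        reverse=True,
    )
    for agent in ranked[keep_top_k:]:
        agent.active = False  # this agent stops consuming tokens
    return [a.name for a in agents if a.active]


# Illustrative scores only; a real system would derive them mid-task.
team = [
    Agent("planner", 0.42),
    Agent("coder", 0.35),
    Agent("critic", 0.18),
    Agent("summarizer", 0.05),
]
active = prune_team(team, keep_top_k=2)
```

Under these assumed scores the two lowest contributors are switched off, and only the remaining agents would be prompted in later rounds.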

The surprise is that a hybrid beats both extremes. A 25,000-task experiment found that fixing the *structure* (external ordering of who-goes-when) while keeping role assignment *autonomous* (agents pick their own specialization and abstain when incompetent) outperformed centralized static hierarchies by 14% and fully autonomous dynamic systems by 44% (see "Do self-organizing agent teams outperform rigid hierarchies?"). The lesson: static decomposition wastes effort by locking in roles a task may not need; fully dynamic selection wastes effort thrashing on coordination. The efficient sweet spot is static skeleton, dynamic muscle.
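A minimal sketch of "static skeleton, dynamic muscle", assuming a fixed turn order supplied from outside and a per-agent callback that either returns a self-chosen role or abstains. The names `run_round`, `choose_role`, and the `SKILLS` table are hypothetical; in a real system the role choice would come from the model itself rather than a lookup.

```python
def run_round(order, choose_role, task):
    """Visit agents in a fixed external order; each picks its own role or abstains."""
    transcript = []
    for agent in order:
        role = choose_role(agent, task, transcript)
        if role is None:  # agent abstains, so no turn is recorded for it
            continue
        transcript.append((agent, role))
    return transcript


# Hypothetical skill table standing in for a model's self-assessment.
SKILLS = {"a1": {"sql"}, "a2": {"python", "sql"}, "a3": {"design"}}


def choose_role(agent, task, transcript):
    """Pick the first needed role this agent can fill that nobody has taken yet."""
    taken = {role for _, role in transcript}
    for need in task:
        if need in SKILLS[agent] and need not in taken:
            return need
    return None  # nothing useful to add: abstain


result = run_round(["a1", "a2", "a3"], choose_role, task=["sql", "python"])
```

The ordering never changes during the run, so no tokens go to negotiating who speaks next, while the roles themselves are decided per task.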

There's also a quieter efficiency axis the static/dynamic debate usually misses: *which model* sits in each slot. Most agentic subtasks are repetitive and well-defined, and small language models handle them at 10–30× lower cost, making a heterogeneous default (small models everywhere, large ones only where needed) the economically rational pattern regardless of whether the team is statically or dynamically composed (see "Can small language models handle most agent tasks?"). Pair that with the finding that a multi-agent setup only pays off when members carry real domain expertise, since diverse-but-shallow teams underperform a single competent agent (see "Does cognitive diversity alone improve multi-agent ideation quality?"), and the efficiency picture sharpens: cheap models for routine work, and expertise rather than head count wherever a team is used at all.
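To make the cost argument concrete, here is a back-of-the-envelope sketch. It assumes every subtask costs the same on the large model and that a small model is cheaper by a constant ratio; both are simplifications. The function name and the 70% routing share are hypothetical, and only the 10–30× range comes from the note.

```python
def relative_cost(small_fraction, cost_ratio):
    """Cost of a mixed team relative to sending every subtask to the large model.

    small_fraction: share of subtasks routed to the small model (0.0 to 1.0).
    cost_ratio: how many times cheaper the small model is (e.g. 10 or 30).
    """
    if not 0.0 <= small_fraction <= 1.0:
        raise ValueError("small_fraction must be between 0 and 1")
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    return small_fraction / cost_ratio + (1.0 - small_fraction)


# Assumed example: 70% of subtasks go to a small model that is 10x cheaper.
mixed = relative_cost(0.7, 10)
```

With these assumed inputs the mixed team costs about 0.37 of the all-large baseline. Most of what remains is the 30% still handled by the large model, which is why the routing share matters more than the exact cost ratio.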

The deepest reframe, though, is whether you should be running a multi-agent team at all. As single models get stronger, the multi-agent advantage shrinks, and a lone capable agent often wins outright. Node bottlenecks, edge overload, and error propagation along the path are the formal ways extra agents start costing more than they add (see "When do multi-agent systems actually outperform single agents?"). So the most efficient answer to "static or dynamic?" is sometimes "neither, just one good agent." If you do go multi-agent, the notes point the same way: spend dynamism on pruning waste (DyLAN-style deactivation) and cheap models on routine work, but keep the coordination structure fixed so the team isn't paying a token tax just to decide who's in charge.


Sources (8 notes)

How does test-time scaling work at the agent level?

Research shows 80% of multi-agent performance variance comes from token budget, not coordination intelligence. LatentMAS and shared-KV-cache approaches offer ways to decouple performance gains from token costs.

What makes multi-agent teams actually perform better?

Research shows 80% of performance variance across multi-agent systems stems from token budget, not coordination intelligence. Latent communication and shared cache architectures bypass this token tax by avoiding natural language bottlenecks.

Can multi-agent teams automatically remove their weakest members?

DyLAN's three-step importance scoring mechanism (propagation, aggregation, selection) quantifies individual agent contributions and automatically removes uninformative agents during inference, optimizing team composition without task-specific tuning.

Can semantic capability vectors replace manual agent routing?

Versioned Capability Vectors embedded in HNSW indices couple semantic matching with policy and budget constraints, making capability discovery a first-class operation that scales sub-linearly as agent heterogeneity increases.

Do self-organizing agent teams outperform rigid hierarchies?

A 25,000-task experiment across 8 models and multiple agent counts showed that sequential protocols with external ordering but internal role selection outperform centralized systems by 14% and fully autonomous systems by 44%. Agents spontaneously invented specialized roles and self-abstained when incompetent.

Can small language models handle most agent tasks?

SLMs handle the repetitive, well-defined language tasks that constitute most agent work at 10–30× lower cost than LLMs, making heterogeneous architectures (SLMs by default, LLMs selective) the economically rational design pattern.

Does cognitive diversity alone improve multi-agent ideation quality?

Multi-agent teams substantially outperform solo ideation, but only when members possess genuine senior knowledge. Diverse teams without expertise underperform even a single competent agent, because cognitive stimulation without expertise triggers process losses instead of insight.

When do multi-agent systems actually outperform single agents?

Empirical analysis shows MAS performance gaps narrow with stronger models, with SAS outperforming in many cases. Three formal defect types—node-level bottlenecks, edge-level overwhelm, and path-level error propagation—explain when single agents win.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a systems researcher re-evaluating claims about static vs. dynamic team efficiency in LLM-agent systems. The question remains open: which decomposition pattern minimizes token cost for a given task quality?

What a curated library found — and when (dated claims, not current truth):
Library findings span 2023–2026 and center on token budget as the dominant cost driver (~80% of performance variance, 2024–2025). Key dated claims:
• Dynamic agent deactivation (DyLAN-style contribution scoring) cuts tokens by pruning low-value agents mid-task, no task-specific tuning needed (2024–2025).
• Hybrid approach (fixed coordination skeleton + autonomous role assignment) outperforms static hierarchies by 14% and fully dynamic systems by 44% on a 25k-task benchmark (2026).
• Small language models handle most agentic subtasks at 10–30× lower cost; heterogeneous teams (small+large) dominate cost-efficiency regardless of static vs. dynamic composition (2025–2026).
• Multi-agent advantage shrinks as single-agent LLM capability improves; error propagation and coordination overhead often make one capable agent cheaper than a team (2026).
• Cognitive diversity alone does not drive quality without domain expertise; shallow-diverse teams underperform single competent agents (2025).

Anchor papers (verify; mind their dates):
• arXiv:2310.02170 (2023): Dynamic LLM-Agent Network — early framing of dynamic optimization.
• arXiv:2506.02153 (2025): Small Language Models are the Future of Agentic AI — heterogeneity as efficiency pattern.
• arXiv:2603.28990 (2026): Self-Organizing LLM Agents Outperform Designed Structure — hybrid skeleton finding.
• arXiv:2604.02460 (2026): Single-Agent LLMs Outperform Multi-Agent on Multi-Hop Reasoning — tension with multi-agent framing.

Your task:
(1) RE-TEST THE TOKEN THESIS. Does the ~80% variance attribution still hold under latest model scaling (2024–now)? Has improved single-agent reasoning (chain-of-thought, test-time compute) narrowed or widened the gap? Does dynamic deactivation remain cheaper than static teams, or have orchestration/routing overheads been reduced enough that the difference is marginal? Cite what has shifted the regime.
(2) Surface the strongest CONTRADICTING work from the last 6 months: any paper showing static hierarchies or single agents underperforming multi-agent dynamic setups, or arguing that coordination overhead is negligible.
(3) Propose 2 research questions that assume the regime may have moved: (a) whether test-time inference scaling (e.g., extended reasoning) in single agents has made multi-agent systems obsolete; (b) whether emergence of better foundation models has made role specialization (the core of dynamic selection) unnecessary.

Cite arXiv IDs; flag anything you cannot ground in a real paper.
