SYNTHESIS NOTE

Can multi-agent teams automatically remove their weakest members?

Explores whether agents can score each other's contributions during problem-solving and use those scores to deactivate underperforming teammates in real time, improving overall team efficiency.

Synthesis note · 2026-02-23 · sourced from Agents

DyLAN (Dynamic LLM-Agent Network) introduces a systematic mechanism for multi-agent team optimization that addresses three properties simultaneously: task agnosticism, efficiency, and automatic team composition.

The core mechanism is the Agent Importance Score, computed through a three-step procedure:

(1) Propagation: each agent rates its predecessors on their solution quality
(2) Aggregation: for each agent, ratings from successors are compiled to quantify its contribution
(3) Selection: after summing ratings across all time steps, top-performing agents are retained and low-performing agents deactivated
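
The three steps above can be sketched as follows. This is a minimal illustration of the procedure as this note describes it, not DyLAN's published formula: the integer rating scale, the `(rater, rated)` dictionary layout, the function names, and the alphabetical tie-break are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: one dict per time step, mapping (rater, rated) -> score.
# In each pair the rater is a successor judging a predecessor's solution.
StepRatings = Dict[Tuple[str, str], float]


def importance_scores(ratings_by_step: List[StepRatings]) -> Dict[str, float]:
    """Aggregate: sum every rating an agent received, across all time steps."""
    totals: Dict[str, float] = defaultdict(float)
    for step_ratings in ratings_by_step:
        for (_rater, rated), score in step_ratings.items():
            totals[rated] += score
    return dict(totals)


def select_top_k(scores: Dict[str, float], k: int) -> Tuple[List[str], List[str]]:
    """Select: keep the k highest-scoring agents, deactivate the rest.

    Ties are broken alphabetically (an assumption, only to keep this deterministic).
    """
    ranked = sorted(scores, key=lambda agent: (-scores[agent], agent))
    return ranked[:k], ranked[k:]


# Propagation is represented here by ratings that were already collected.
ratings = [
    {("B", "A"): 4, ("C", "A"): 5, ("A", "B"): 2, ("C", "B"): 1, ("A", "C"): 4, ("B", "C"): 3},
    {("B", "A"): 5, ("C", "A"): 4, ("A", "B"): 1, ("C", "B"): 2, ("A", "C"): 3, ("B", "C"): 4},
]
scores = importance_scores(ratings)
kept, deactivated = select_top_k(scores, k=2)
```

In this toy run agent B receives the lowest total and is the one deactivated. In a real system the ratings would come from LLM calls rather than a hard-coded table.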

This creates a dynamic interaction architecture: agents are the nodes of a network, and the messages they exchange across time steps are its edges. An LLM-powered ranker ranks agents at inference time and deactivates low-performing ones for subsequent rounds, while an early-stopping mechanism prevents unnecessary iterations.
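
A rough sketch of how inference-time deactivation and early stopping might fit into one loop is given here. Everything in it is hypothetical: agents and the ranker are plain callables standing in for LLM calls, pruning happens once after a fixed round, and the "more than two thirds agree" stopping rule is an assumption for illustration rather than something this note specifies.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Agent = Callable[[str, List[str]], str]              # (query, previous answers) -> answer
Ranker = Callable[[Dict[str, str], int], List[str]]  # (answers, keep) -> names to keep


def run_rounds(
    query: str,
    agents: Dict[str, Agent],
    ranker: Ranker,
    keep: int,
    max_rounds: int = 4,
    prune_after_round: int = 1,
    agree_frac: float = 2 / 3,  # assumed consensus threshold
) -> Tuple[str, int, List[str]]:
    active = list(agents)
    previous: List[str] = []
    top_answer = ""
    for round_idx in range(1, max_rounds + 1):
        answers = {name: agents[name](query, previous) for name in active}
        top_answer, votes = Counter(answers.values()).most_common(1)[0]
        # Early stopping: enough active agents already agree.
        if votes > agree_frac * len(active):
            return top_answer, round_idx, active
        # Deactivation: the ranker decides who continues into later rounds.
        if round_idx == prune_after_round:
            active = ranker(answers, keep)
        previous = [answers[name] for name in active]
    return top_answer, max_rounds, active


def follower(_query: str, previous: List[str]) -> str:
    """Stub agent: answers "41" alone, otherwise adopts the most common prior answer."""
    return Counter(previous).most_common(1)[0][0] if previous else "41"


agents: Dict[str, Agent] = {
    "A": lambda q, prev: "42",
    "B": lambda q, prev: "42",
    "C": follower,
    "D": lambda q, prev: "40",
}
# Stub ranker with a fixed preference order; a real ranker would be an LLM call.
fixed_order = ["A", "B", "C", "D"]
ranker: Ranker = lambda answers, keep: [n for n in fixed_order if n in answers][:keep]

answer, rounds_used, survivors = run_rounds("6 * 7?", agents, ranker, keep=3)
```

With these stubs, round one produces no consensus, agent D is dropped, and the remaining three agree in round two, so the loop exits before reaching `max_rounds`.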

The insight connects to multiple threads in multi-agent reasoning:

For the problem raised in Why do multi-agent LLM systems converge without genuine deliberation?, DyLAN's contribution scoring provides a partial solution: agents that merely agree without adding information would receive low importance scores and be deactivated. Removing them could reduce the error amplification documented in When does debate actually improve reasoning accuracy?.

The approach contrasts with Can extreme task decomposition enable reliable execution at million-step scale? (MAKER), which uses static decomposition with voting. DyLAN dynamically prunes the agent network during execution — a more adaptive but less parallelizable strategy. The trade-off maps onto How should we balance parallel versus sequential compute at test time?: static decomposition enables parallelism while dynamic selection enables adaptation.

The Agent Importance Score also provides a concrete implementation of the "contribution-based routing" that Can AI systems detect when they've genuinely reached agreement? advocates — but generalized beyond agreement detection to overall contribution quantification.

AgentVerse four-stage dynamic group adjustment (from Arxiv/Agents Multi): AgentVerse extends the dynamic team composition principle with a four-stage group problem-solving process that mirrors human group dynamics: (1) Expert Recruitment, which adjusts team composition dynamically based on current problem-solving progress; (2) Collaborative Decision-Making, in which recruited agents discuss and formulate strategies until they reach consensus; (3) Action Execution, in which agents interact with the environment to carry out the agreed actions; (4) Evaluation, which compares the current state to the desired goal and loops its feedback back to stage 1 for team re-composition. Unlike DyLAN's contribution scoring, which prunes within a fixed network, AgentVerse's recruitment stage can introduce new agent profiles that were not in the original team. The evaluation-to-recruitment feedback loop lets the team evolve over the course of problem-solving, so the team that finishes may differ substantially from the team that started.
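
The four-stage loop can be pictured as a small control loop. The sketch below only illustrates the control flow described above; it is not AgentVerse's API. The stage functions, the `Evaluation` type, and the iteration cap are all hypothetical, and each stage would be backed by LLM calls in practice.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Evaluation:
    goal_reached: bool
    feedback: str


def solve(
    goal: Any,
    recruit: Callable[[Any, Optional[str]], List[str]],
    decide: Callable[[List[str], Any, Optional[str]], str],
    act: Callable[[List[str], str], Any],
    evaluate: Callable[[Any, Any], Evaluation],
    max_iterations: int = 5,  # assumed safety cap
) -> Tuple[Any, List[List[str]]]:
    feedback: Optional[str] = None
    state: Any = None
    teams: List[List[str]] = []
    for _ in range(max_iterations):
        team = recruit(goal, feedback)       # (1) Expert Recruitment
        plan = decide(team, goal, feedback)  # (2) Collaborative Decision-Making
        state = act(team, plan)              # (3) Action Execution
        result = evaluate(state, goal)       # (4) Evaluation
        teams.append(team)
        if result.goal_reached:
            break
        feedback = result.feedback           # feeds the next recruitment round
    return state, teams


# Toy stubs: the goal is "reach a team of size 2"; feedback triggers a new recruit.
final_state, teams = solve(
    goal=2,
    recruit=lambda goal, fb: ["planner"] if fb is None else ["planner", "tester"],
    decide=lambda team, goal, fb: f"plan agreed by {len(team)} agent(s)",
    act=lambda team, plan: len(team),
    evaluate=lambda state, goal: Evaluation(state >= goal, "need a tester"),
)
```

The recorded `teams` list makes the point of the paragraph visible: the team in the last iteration contains a profile that the first team did not.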

MasRouter's four-decision MASR framework (from Arxiv/Routers): MasRouter formalizes multi-agent system routing as four simultaneous decisions: collaboration topology, agent count, role allocation, and per-agent LLM selection. This reveals that DyLAN's contribution-based agent selection addresses only runtime optimization within an already-constructed network. MasRouter constructs the network itself, choosing topology, roles, and LLM assignments from scratch via a cascaded variational-probabilistic-multinomial controller. The two approaches are complementary: MasRouter for initial construction (design-time routing), DyLAN for runtime adaptation (inference-time pruning). Composing them would create a system that starts from a routed network configuration and then adapts it during execution. See What decisions must multi-agent routing systems optimize simultaneously?.
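
To make the design-time versus runtime split concrete, the four decisions can be held in one record that a later pruning step narrows. This is a hypothetical data-structure sketch, not MasRouter's or DyLAN's code: the `RoutingDecision` type, the `prune` helper, and the topology, role, and model names are placeholders invented for the example.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Tuple


@dataclass(frozen=True)
class RoutingDecision:
    """The four design-time choices: topology, agent count, roles, per-agent LLM."""
    topology: str
    agent_count: int
    roles: Tuple[str, ...]
    llms: Tuple[str, ...]

    def __post_init__(self) -> None:
        # The four choices are interdependent: one role and one LLM per agent.
        if not (len(self.roles) == len(self.llms) == self.agent_count):
            raise ValueError("roles and llms must each have agent_count entries")


def prune(decision: RoutingDecision, keep_indices: Iterable[int]) -> RoutingDecision:
    """Runtime step: keep only the agents at keep_indices, leave topology untouched."""
    keep = sorted(set(keep_indices))
    return replace(
        decision,
        agent_count=len(keep),
        roles=tuple(decision.roles[i] for i in keep),
        llms=tuple(decision.llms[i] for i in keep),
    )


# Design time: a router would produce this record (values here are placeholders).
initial = RoutingDecision(
    topology="chain",
    agent_count=3,
    roles=("planner", "coder", "reviewer"),
    llms=("model-a", "model-b", "model-a"),
)
# Runtime: contribution scores say the agent at index 1 should be deactivated.
pruned = prune(initial, keep_indices=[0, 2])
```

Keeping the record immutable means the original routed configuration stays available for comparison after pruning.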

Related concepts in this collection 5

Why do multi-agent LLM systems converge without genuine deliberation? Multi-agent reasoning systems are designed to improve answers through debate, but often agents simply agree with early confident claims rather than genuinely disagreeing. What drives this pattern and how common is it?
the problem DyLAN partially addresses: uninformative agents get deactivated
Can extreme task decomposition enable reliable execution at million-step scale? Can breaking tasks into maximally atomic subtasks with voting-based error correction solve the fundamental reliability problem in long-horizon tasks? This challenges whether better models or better decomposition is the path to high-reliability AI systems.
contrasting approach: static decomposition vs dynamic pruning
Can AI systems detect when they've genuinely reached agreement? When multiple AI agents debate, they often converge without actually deliberating. Can a dedicated agent reliably identify true agreement versus false consensus, and would that improve debate outcomes?
agreement detection as a special case of contribution scoring
When does debate actually improve reasoning accuracy? Multi-agent debate shows promise for reasoning tasks, but under what conditions does it help versus hurt? The research explores whether debate amplifies errors when evidence verification is missing.
deactivating low-quality agents could reduce error amplification
What decisions must multi-agent routing systems optimize simultaneously? Standard LLM routing only picks which model to use. But multi-agent systems involve four interdependent choices: topology, agent count, role assignment, and per-agent model selection. Does optimizing all four together actually improve performance?
MasRouter: design-time construction of the network DyLAN then prunes at runtime
