What makes multi-agent teams actually perform better?

Navigation hub examining multi-agent reasoning, scaling laws, coordination failures, and architectural alternatives for agent collaboration.

Topic Hub · 28 linked notes · 8 sections

Multi-Agent Reasoning and Evaluation

7 notes

Can multiple agents stay diverse during training together?

Does training separate specialist agents on different data maintain the reasoning diversity that single-agent finetuning destroys? This matters because diversity correlates with accuracy and prevents models from becoming trapped in narrow response patterns.

Can AI systems design unique multi-agent workflows per individual query?

Explores whether meta-agents trained with reinforcement learning can automatically generate personalized multi-agent system architectures tailored to individual user queries, rather than applying fixed task-level templates uniformly.

Does cognitive diversity alone improve multi-agent ideation quality?

This explores whether diverse perspectives in group AI systems automatically produce better ideas, or if something else—like expertise—is equally critical for collaborative ideation to outperform solo agents.

Can tool-using agents evaluate AI outputs more reliably than passive LLM judges?

Does active evidence collection through tool use reduce judge inconsistency compared to passive, reading-only evaluation? This matters for benchmarking AI systems, where evaluation reliability directly affects research validity.

Can personas extracted from documents generalize across evaluation tasks?

This explores whether automating persona creation from domain documents—rather than hand-crafting roles—enables multi-agent evaluators to transfer across different tasks without redesign. The question matters because manual personas fail to generalize across domains.

Can branching prompts replicate what multi-agent systems do?

Explores whether non-linear prompting structures (tree-of-thought, debate prompting) can functionally replace multi-agent architectures, and whether a single LLM simulating multiple personas achieves the same cognitive benefits as multiple models collaborating.

Can tailoring queries per document improve debatable summarization?

When summarizing documents with opposing perspectives on a topic, does adapting the query to each document's unique content retrieve more balanced viewpoints than using a single uniform query?

Debate and Consensus Formation

1 note

Scaling Laws and Token Economics

4 notes

When does adding more agents actually help systems?

Multi-agent systems often fail in practice, but the reasons remain unclear. This research investigates whether coordination overhead, task properties, or system architecture determines when additional agents improve or degrade performance.

Does token spending drive multi-agent research performance?

Multi-agent systems outperform single agents substantially, but what actually accounts for that improvement? Is it intelligent coordination or simply spending more tokens on the same task?

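
One way to make this question concrete is a token-matched comparison: credit the single agent with the same token budget the multi-agent system spent (for instance via repeated sampling) and check whether the gap survives. The sketch below is illustrative only; the numbers, the `RunStats` fields, and the assumption of independent, oracle-selected samples are all invented.

```python
from dataclasses import dataclass

@dataclass
class RunStats:
    """Aggregate results of one system on a benchmark."""
    name: str
    accuracy: float         # fraction of tasks solved
    tokens_per_task: float  # mean tokens spent per task

def token_matched_gap(mas: RunStats, single: RunStats) -> float:
    """Crude confound check: scale the single agent's accuracy as if it
    re-sampled until it matched the MAS token budget, assuming each
    independent sample succeeds with the single-run accuracy and an
    oracle picks a correct one if any exists (an optimistic bound).
    Returns the MAS advantage that remains after token matching."""
    samples = mas.tokens_per_task / single.tokens_per_task
    matched_acc = 1.0 - (1.0 - single.accuracy) ** samples
    return mas.accuracy - matched_acc

# Hypothetical numbers, for illustration only.
mas = RunStats("debate-of-3", accuracy=0.62, tokens_per_task=9000)
solo = RunStats("single-agent", accuracy=0.50, tokens_per_task=3000)
print(f"raw gap: {mas.accuracy - solo.accuracy:+.2f}")
print(f"token-matched gap: {token_matched_gap(mas, solo):+.2f}")
```

With these made-up numbers the raw gap is positive but the token-matched gap is negative, which is exactly the confound the note asks about.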
Can small language models handle most agent tasks?

Explores whether smaller, cheaper models are actually sufficient for the repetitive, scoped work that dominates deployed agent systems, rather than relying on large models by default.

When do multi-agent systems actually outperform single agents?

As individual LLMs grow more capable, does the advantage of splitting work across multiple agents still hold? This explores when coordination overhead makes multi-agent systems counterproductive.

Coordination Failures and Socialization

3 notes

Why do multi-agent systems fail to coordinate at scale?

Explores how LLM agents struggle to synchronize strategy timing and validate information when coordinating across larger networks, revealing fundamental limits in distributed reasoning.

Why do multi-agent LLM systems fail more than expected?

This research asks what specific failure modes cause multi-agent systems to underperform despite their promise. Understanding these failure patterns is essential for building more reliable collaborative AI systems.

Why don't AI agents develop social structure at scale?

When millions of LLM agents interact continuously on a social platform, do they form collective norms and influence hierarchies like human societies? This tests whether scale and interaction density alone drive socialization.

Model Composition and Latent Communication

2 notes

Can language models discover new expertise through collaborative weight search?

Can model experts be composed through particle swarm optimization in weight space without training? This explores whether collaborative search can discover capabilities that no individual expert possesses.

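
The mechanism named here can be sketched in miniature: treat each particle as a candidate weight vector seeded near the experts, and let standard particle swarm updates search the space between and beyond them, with no gradients and no training. The two "experts", the quadratic fitness, and all hyperparameters below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "expert" weight vectors; neither is optimal on its own.
expert_a = np.array([1.0, 0.0, 0.5])
expert_b = np.array([0.0, 1.0, 0.5])
target   = np.array([0.6, 0.6, 0.9])  # stand-in for the "true" best weights

def fitness(w: np.ndarray) -> float:
    """Toy task score: higher is better, peaks at `target`."""
    return -float(np.sum((w - target) ** 2))

# --- particle swarm over weight space ---
n_particles, steps = 16, 100
pos = np.array([expert_a, expert_b] * (n_particles // 2))
pos = pos + 0.1 * rng.standard_normal(pos.shape)  # jitter around the experts
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_f = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_f.argmax()].copy()

inertia, c1, c2 = 0.7, 1.5, 1.5
for _ in range(steps):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    f = np.array([fitness(p) for p in pos])
    improved = f > pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[pbest_f.argmax()].copy()

print("expert A fitness:", round(fitness(expert_a), 3))
print("expert B fitness:", round(fitness(expert_b), 3))
print("swarm-composed fitness:", round(fitness(gbest), 3))
```

The swarm's best composite scores higher than either expert alone, which is the "capability no individual expert possesses" claim in toy form.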
Can agents share thoughts without converting them to text?

Can multi-agent systems exchange information through continuous hidden representations instead of language? This matters because text serialization loses information and slows inference.

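
A toy demonstration of the serialization-loss argument: agent A hands its hidden vector to agent B either directly, or round-tripped through a short decimal string as a crude stand-in for decoding into tokens and re-reading them. The "agents" here are random linear maps, and every dimension and precision choice is made up.

```python
import numpy as np

rng = np.random.default_rng(1)

d_model = 64
W_a = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)  # "agent A"
W_b = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)  # "agent B"

x = rng.standard_normal(d_model)
hidden = np.tanh(W_a @ x)  # agent A's internal representation

# Channel 1: pass the hidden state directly (latent communication).
latent_out = np.tanh(W_b @ hidden)

# Channel 2: serialize to text at 2 decimal places, then parse it back
# (a stand-in for decoding thoughts into language and re-encoding them).
text = " ".join(f"{v:.2f}" for v in hidden)
reparsed = np.array([float(t) for t in text.split()])
text_out = np.tanh(W_b @ reparsed)

err = float(np.linalg.norm(latent_out - text_out))
print(f"divergence caused by the text round-trip: {err:.4f}")
```

The direct channel is exact by construction; the text channel quantizes every coordinate, and the downstream agent's output drifts accordingly. Real LLM hidden states are far higher-dimensional, so the effect compounds.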
Specialized Multi-Agent Applications

1 note

Cooperation and Delegation

3 notes

Can agents learn cooperation by adapting to diverse partners?

Explores whether sequence model agents can develop mutual cooperation strategies through in-context learning when trained against varied co-players, without explicit cooperation mechanisms or hardcoded assumptions.

What makes delegation work beyond just splitting tasks?

Delegation is more than task decomposition. What dimensions of a task—like verifiability, reversibility, and subjectivity—determine whether an agent can safely and effectively handle it?

Why do protocol-based tool systems fail in production agentic workflows?

Explores whether standardized tool protocols like MCP introduce non-determinism that undermines reliable agent execution, and what causes ambiguous tool selection in production systems.
