AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs

Paper · arXiv 2507.08616 · Published July 11, 2025

Large language models (LLMs) have demonstrated powerful problem-solving capabilities, in particular when organized in multi-agent systems. However, the advent of such systems also raises several questions about the ability of a complex network of agents to effectively self-organize and collaborate. While measuring performance on standard reasoning benchmarks indicates how well multi-agent systems can solve reasoning tasks, it is unclear whether these systems are able to leverage their topology effectively. Here, we propose AGENTSNET, a new benchmark for multi-agent reasoning. Drawing inspiration from classical problems in distributed systems and graph theory, AGENTSNET measures the ability of multi-agent systems to collaboratively form strategies for problem-solving, self-organization, and effective communication given a network topology. We evaluate a variety of baseline methods on AGENTSNET, including homogeneous networks of agents which first have to agree on basic protocols for organization and communication. We find that some frontier LLMs already demonstrate strong performance on small networks but begin to fall off as the network size grows. While existing multi-agent benchmarks cover at most 2–5 agents, AGENTSNET is practically unlimited in size and can scale with new generations of LLMs. As such, we also probe frontier models in a setup with up to 100 agents.

To systematically study how agents exchange information and collaborate, we employ a communication model that draws inspiration from classical distributed computing, while adapting to the capabilities and constraints of modern LLM-based agents. Our setup is based on the LOCAL model [25] from distributed algorithms, in which the computation proceeds in synchronous rounds and each agent can exchange messages only with its immediate neighbors on the communication graph. Agents must base their decisions exclusively on local information aggregated over multiple rounds of interaction. This model captures fundamental aspects of decentralized reasoning, where global strategies emerge from purely local exchanges without centralized control. Unlike nodes in deterministic systems, LLM-based agents exhibit stochastic behavior due to inherent randomness in their generation processes. This means that our model is most closely aligned with the randomized version of the LOCAL model.
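To make the round structure concrete, the following minimal sketch (in Python) simulates synchronous message-passing on a communication graph in the spirit of the randomized LOCAL model. The `step`/`answer` agent interface and all names are illustrative assumptions, not the benchmark's actual implementation.

```python
from collections import defaultdict

def run_local_rounds(graph, agents, num_rounds):
    """Simulate synchronous message-passing in the spirit of the LOCAL model.

    graph:  dict mapping each node to an iterable of its neighbors.
    agents: dict mapping each node to an object with
              step(round_idx, inbox) -> dict of neighbor -> outgoing message
              answer(inbox)          -> final output for the task
            (a hypothetical interface used only for this sketch).
    """
    inboxes = {node: [] for node in graph}  # messages received in the previous round
    for round_idx in range(num_rounds):
        outgoing = defaultdict(list)
        for node, neighbors in graph.items():
            # Each agent acts only on local information: its own inbox so far.
            messages = agents[node].step(round_idx, inboxes[node])
            for neighbor in neighbors:
                # Messages may only travel along edges of the communication graph.
                outgoing[neighbor].append((node, messages.get(neighbor, "")))
        inboxes = {node: list(outgoing[node]) for node in graph}

    # After the final round, each agent commits to an answer from local state only.
    return {node: agents[node].answer(inboxes[node]) for node in graph}
```

In a setup like AGENTSNET, each per-node agent object would wrap an instruction-tuned LLM, so the same skeleton naturally accommodates the stochastic behavior noted above.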

Given a communication network, each node, that is, each agent, is instantiated as an instruction-tuned LLM that interfaces with its neighbors through a structured chat history. Initially, we provide each agent with a system prompt detailing the task (for example, COLORING), the rules of message-passing, the names of its neighbors, and a notification that the agent must output a result in its final response after a fixed number of message-passing rounds; see Appendix A for the full system prompt.
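As a rough illustration of what such a system prompt might contain, here is a sketch; the wording and the helper `build_system_prompt` are hypothetical, and the actual prompt is given in Appendix A.

```python
def build_system_prompt(task_name, task_description, neighbors, num_rounds):
    """Assemble a per-agent system prompt (illustrative wording only)."""
    neighbor_list = ", ".join(neighbors)
    return (
        f"You are one agent in a network of collaborating agents.\n"
        f"Task: {task_name}. {task_description}\n"
        f"You may exchange messages only with your neighbors: {neighbor_list}.\n"
        f"Communication proceeds in {num_rounds} synchronous rounds of message-passing.\n"
        f"After the final round, you must state your part of the solution "
        f"in your last response."
    )

# Example usage for a COLORING instance (names and wording are placeholders):
prompt = build_system_prompt(
    task_name="COLORING",
    task_description=("Agree with your neighbors on colors such that no two "
                      "adjacent agents end up with the same color."),
    neighbors=["Alice", "Bob"],
    num_rounds=4,
)
```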

Our key findings are:

Finding 1: Strategy coordination poses a central challenge in AGENTSNET.

We find multiple failure cases due to issues with coordinating a strategy between agents. In some cases, agents agree on a common strategy too late during message-passing, leaving too few remaining rounds to implement it. In other cases, agents do not coordinate their strategy at all: they assume some strategy in their initial chain-of-thought and then follow it throughout message-passing without ever informing their neighbors.

Finding 2: Agents generally accept information sent by neighbors.

This includes key information about the network, proposed strategies, or candidate solutions. While this generally enables effective coordination, agents sometimes fail to question erroneous information, leading to incorrect solutions. Examples of such erroneous information include incorrect assumptions about the network topology and ineffective strategies proposed by other agents.

Finding 3: Agents help their neighbors resolve inconsistencies in candidate solutions.

We find multiple examples where agents detect conflicting color assignments between other agents in COLORING instances and assist in resolving these conflicts. We present detailed examples and failure cases in Appendix E.
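For reference, the kind of local consistency check that flags such conflicts is straightforward; the sketch below (with a hypothetical `coloring_conflicts` helper, assuming comparable node labels such as strings) reports any edge whose endpoints announced the same color, which is exactly what a valid COLORING solution must avoid.

```python
def coloring_conflicts(graph, colors):
    """Return the edges whose endpoints were assigned the same color.

    graph:  dict mapping each node to an iterable of its neighbors.
    colors: dict mapping each node to its announced color.
    A valid COLORING solution produces an empty list.
    """
    conflicts = []
    for node, neighbors in graph.items():
        for neighbor in neighbors:
            # node < neighbor avoids reporting each undirected edge twice.
            if node < neighbor and colors.get(node) == colors.get(neighbor):
                conflicts.append((node, neighbor))
    return conflicts
```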