Can AI Agents Agree?

Paper · arXiv 2603.01213

Large language models are increasingly deployed as cooperating agents, yet their behavior in adversarial consensus settings has not been systematically studied. We evaluate LLM-based agents on a Byzantine consensus game over scalar values using a synchronous all-to-all simulation. We test consensus in a no-stake setting where agents have no preferences over the final value, so evaluation focuses on agreement rather than value optimality. Across hundreds of simulations spanning model sizes, group sizes, and Byzantine fractions, we find that valid agreement is not reliable even in benign settings and degrades as group size grows. Introducing a small number of Byzantine agents further reduces success. Failures are dominated by loss of liveness, such as timeouts and stalled convergence, rather than subtle value corruption. Overall, the results suggest that reliable agreement is not yet a dependable emergent capability of current LLM-agent groups even in no-stake settings, raising caution for deployments that rely on robust coordination.
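To make the synchronous all-to-all setup concrete, the following is a minimal sketch of a round loop, not the paper's implementation. It assumes each agent is a plain function from the values broadcast in the previous round to a new proposal. The names `run_rounds`, `majority_agent`, and `constant_agent`, and the majority update rule itself, are hypothetical stand-ins for the LLM agents the study actually uses.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical agent type: maps last round's broadcast values to a new proposal.
Agent = Callable[[List[int]], int]


def majority_agent(received: List[int]) -> int:
    """Stand-in honest rule (an assumption): adopt the most common value,
    breaking ties toward the smallest value."""
    counts = Counter(received)
    top = max(counts.values())
    return min(v for v, c in counts.items() if c == top)


def constant_agent(value: int) -> Agent:
    """Stand-in agent that ignores what it receives and always sends `value`."""
    return lambda received: value


def run_rounds(
    initial: List[int],
    agents: List[Agent],
    honest: List[bool],
    max_rounds: int,
) -> Tuple[Optional[int], int]:
    """Run synchronous all-to-all rounds until the honest agents agree.

    Returns (agreed_value, rounds_used); agreed_value is None on timeout.
    """
    values = list(initial)
    for round_no in range(1, max_rounds + 1):
        # Synchronous all-to-all: every agent sees the full previous-round vector.
        values = [agent(values) for agent in agents]
        honest_values = {v for v, h in zip(values, honest) if h}
        if len(honest_values) == 1:
            return honest_values.pop(), round_no
    return None, max_rounds  # round limit reached without agreement
```

Under these assumptions, `run_rounds([3, 3, 7, 999], [majority_agent] * 3 + [constant_agent(999)], [True, True, True, False], max_rounds=5)` has the three honest agents settle on 3 in the first round, while two agents that never change their value run until the round limit and return `None`.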

LLMs are increasingly deployed as interacting agents, and recent work documents common multi-agent failure patterns. Several approaches target robustness to malicious agents via coordination mechanisms. Closest to our setting, Chen et al. (2023) study benign numeric consensus via LLM negotiation and analyze the effects of agent number, personality, and topology. Our focus differs in three ways. First, we introduce a controlled Byzantine fraction. Second, we enforce a validity constraint requiring the decided value to be an initial honest proposal. Third, we separate validity from liveness and show that failures are dominated by liveness loss. Complementary work frames multi-agent reliability through a Byzantine fault-tolerance lens and proposes aggregation mechanisms such as confidence-weighted consensus. Concurrently, Grötschla et al. (2025) also observe that multi-agent LLM coordination degrades as network size increases. In a Mixture-of-Agents setting, Wolf et al. (2025) similarly find that a single deceptive agent suffices to nullify performance gains.
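The separation of validity from liveness described above can be illustrated with a small outcome classifier. This is a sketch under stated assumptions: the function name `classify_outcome` and the three outcome labels are hypothetical, and an undecided honest agent is represented here by `None`.

```python
from typing import List, Optional


def classify_outcome(
    final_honest_values: List[Optional[int]],
    initial_honest_values: List[int],
) -> str:
    """Classify one simulation run.

    'no_consensus'      -> liveness failure: some honest agent is undecided
                           (None) or the honest agents disagree at the end.
    'invalid_consensus' -> honest agents agree, but on a value that no honest
                           agent initially proposed (validity failure).
    'valid_consensus'   -> honest agents agree on an initial honest proposal.
    """
    # Liveness check: every honest agent must have decided.
    if not final_honest_values or any(v is None for v in final_honest_values):
        return "no_consensus"
    # Agreement check: all honest agents hold the same value.
    if len(set(final_honest_values)) != 1:
        return "no_consensus"
    # Validity check: the decided value must be an initial honest proposal.
    decided = final_honest_values[0]
    if decided in set(initial_honest_values):
        return "valid_consensus"
    return "invalid_consensus"
```

Keeping the three labels distinct is what allows failures to be attributed to liveness loss (timeouts, stalled convergence) rather than to corrupted values.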

Even in benign, no-stake settings without Byzantine agents, LLM-agent groups frequently fail to reach valid consensus within the round limit, and performance declines as group size increases. Under adversarial conditions, the likelihood of valid consensus decreases further, with failures primarily resulting from no-consensus outcomes, even within our limited threat model. These findings indicate that current LLM agents are not yet reliable social decision-makers: agreement, which is essential for cooperation, delegation, and safety-critical coordination, remains fragile in our controlled, no-stake testbed. Our study is limited to a single Byzantine strategy and two model sizes from one family. Future work should investigate diverse adversarial behaviors, heterogeneous agent populations, and larger-scale deployments. We hope this work inspires further research into the fundamental question of whether AI agents can reliably reach agreement.
