Finding Common Ground: Using Large Language Models to Detect Agreement in Multi-Agent Decision Conferences
Decision conferences are structured, collaborative meetings that bring together experts from various fields to address complex issues and reach a consensus on recommendations for future actions or policies. These conferences often rely on facilitated discussions to ensure productive dialogue and collective agreement. Recently, Large Language Models (LLMs) have shown significant promise in simulating real-world scenarios, particularly through collaborative multi-agent systems that mimic group interactions. In this work, we present a novel LLM-based multi-agent system designed to simulate decision conferences, focusing specifically on detecting agreement among the participant agents. To achieve this, we evaluate six distinct LLMs on two tasks: stance detection, which identifies the position an agent takes on a given issue, and stance polarity detection, which identifies the sentiment as positive, negative, or neutral. These models are further assessed within the multi-agent system to determine their effectiveness in complex simulations. Our results indicate that LLMs can reliably detect agreement even in dynamic and nuanced debates. Incorporating an agreement-detection agent within the system can also improve the efficiency of group debates and enhance the overall quality and coherence of deliberations, making them comparable to real-world decision conferences in terms of outcomes and decision-making. These findings demonstrate the potential of LLM-based multi-agent systems to simulate group decision-making processes.
Key areas of application include societal simulation, gaming, psychology, recommender systems, economics, and policy-making [1]. A promising application of LLM-based multi-agent collaborative systems is decision conferencing and, closely related to it, expert elicitation workshops, which are used for healthcare policymaking, the classification of controlled substances, and the elicitation of informative priors for Bayesian statistical models [5]. Decision conferences are often used as a tool for addressing and resolving important issues within an organization [6]. Participants are guided by an impartial facilitator in a process without a fixed agenda and without formal presentations, debating and exchanging opinions to create a shared understanding of the issue and to commit to a way forward that all participants can agree upon [6]. Decision conferences are a form of computer-supported cooperative work (CSCW) but are inherently human-centric, with computers used only to model the issues as seen by the participants [7]. Because of this human-centric focus and their specific objectives, which prioritize fostering shared understanding and a common sense of purpose, decision conferences aim to reach an agreement among all participants on the issues discussed [7]. Group decision-making has been studied extensively, but there is relatively little research on decision conferences specifically [8].

The increasing use of LLMs to model human social behaviour and communication raises the question of whether they can simulate decision conferences and support the study of communication in social groups within specific contexts [9]. LLMs are frequently used for interaction and collaboration in social settings and for simulating real-world scenarios, such as translation, question answering, essay writing, programming, and group discussions aimed at uncovering clues [10]. Their use for group agreement and decision-making, however, remains largely unexplored, despite clear potential applications in politics and corporate decision-making. This gap leaves several open questions about agreement in decision conferences and the use of LLMs in multi-agent collaborative systems. Can LLMs detect agreement accurately? Can they detect agreement in open discussions among participants with different perspectives, where differences of opinion are actively sought? How can a decision conference with open discussion and differing participant opinions be simulated using LLMs, and which components of such a system are essential? Does agreement detection within the simulated system help the LLM agents engage more, and does it lead to a more realistic debate outcome? We aim to address these questions in this work.
Our results demonstrate that LLMs can effectively perform zero-shot agreement detection across a wide range of topics and with varying contexts, making them well-suited for decision conferences that span diverse subject areas. Additionally, we show that open-source and smaller LLMs can be applied to this task. We also find that incorporating a dedicated agent for agreement detection is crucial to ensure a detailed and conclusive debate among the participating agents, leading to outcomes comparable to real-world decision conferences.
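To illustrate what zero-shot agreement detection looks like in practice, the sketch below shows one way a single utterance could be labelled with a chat-completion LLM. The prompt wording, the `classify_stance` helper, the model name, and the use of the `openai` client are illustrative assumptions, not the exact configuration used in our experiments.

```python
# Minimal sketch of zero-shot stance / stance polarity detection with an LLM.
# Prompt text, labels, and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

STANCE_PROMPT = (
    "You are an impartial annotator. Given a debate topic and a statement, "
    "answer with exactly one word: AGREE, DISAGREE, or NEUTRAL, describing "
    "the stance the statement takes towards the topic."
)

def classify_stance(topic: str, statement: str, model: str = "gpt-4o-mini") -> str:
    """Zero-shot stance polarity detection for a single utterance."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic labels make evaluation reproducible
        messages=[
            {"role": "system", "content": STANCE_PROMPT},
            {"role": "user", "content": f"Topic: {topic}\nStatement: {statement}"},
        ],
    )
    return response.choices[0].message.content.strip().upper()

# Example usage (hypothetical inputs):
# classify_stance("Substance X should be classified as controlled",
#                 "The evidence on harm is too weak to justify scheduling it.")
# -> "DISAGREE"
```

Because the classifier receives only the topic and the statement, the same call works unchanged across subject areas, which is what makes the zero-shot setting attractive for decision conferences on diverse topics.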
There is growing interest in using these systems to simulate real-world tasks. For instance, multi-agent systems have been created to manage comprehensive business planning, involving stages such as market segmentation, customer profiling, strategy formulation, competitor analysis, and sales material creation [15]. Similarly, multi-agent systems in software development integrate LLMs to streamline key activities, allowing for seamless natural-language communication without requiring specialized models at each stage [16]. At the core of these systems is multi-agent reasoning, where agents collaborate to solve tasks. While simple debate-based approaches, in which agents discuss a topic and use the resulting conclusions, can be effective, they also face challenges, such as agents getting stuck on incorrect viewpoints or failing to reach consensus, stalling progress [4, 14, 16].
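As a purely conceptual illustration of such a debate-based approach (not a reconstruction of any of the cited systems), the sketch below shows a minimal round-robin debate loop in which agents take turns responding to a shared transcript for a fixed number of rounds; `run_debate` and the `ask_llm` callback are hypothetical names standing in for any chat-completion call.

```python
# Illustrative round-robin debate loop; not taken from the cited systems.
# ask_llm(system_prompt, transcript_text) is a placeholder for any
# chat-completion call that returns the agent's next utterance.
from typing import Callable

def run_debate(topic: str,
               agent_prompts: dict[str, str],
               ask_llm: Callable[[str, str], str],
               rounds: int = 3) -> list[tuple[str, str]]:
    """Let each agent respond to the running transcript for a fixed number of rounds."""
    transcript: list[tuple[str, str]] = [("moderator", f"Topic under discussion: {topic}")]
    for _ in range(rounds):
        for name, system_prompt in agent_prompts.items():
            history = "\n".join(f"{speaker}: {text}" for speaker, text in transcript)
            reply = ask_llm(system_prompt, history)
            transcript.append((name, reply))
    return transcript
```

A fixed-round loop like this exhibits exactly the failure modes noted above: without an explicit agreement check, agents can keep circling an incorrect viewpoint or end the allotted rounds without ever converging.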
Additionally, some research explores the social aspects of collaborative multi-agent systems, including the use of “social simulacra” to simulate realistic interactions within a virtual community of agents [19, 20]. Another emerging application is the simulation of international conflict resolution, where LLM-based multi-agent systems model conflict dynamics and potential resolutions [21].
These simulations not only assess LLM capabilities but also provide insights into the behaviour of the systems they simulate, such as identifying triggers for conflict or analyzing discourse structures in political speech [12, 21]. Similarly, this work evaluates LLMs for their ability to detect agreement within decision-making contexts. While the primary focus is on LLM performance, the findings could contribute to a deeper understanding of decision conferences and to future research in collaborative decision-making using LLM-based multi-agent systems.
2.3 Collaborative Decision-Making
Decision-making is a crucial human capability in everyday life. The use of LLMs to simulate decision-making processes is a recent development that has expanded the possibilities for exploring complex social dynamics. Current research focuses on making decision-making systems understandable, especially in contexts where their decisions could have real-world implications. For example, researchers have studied how sentiment influences opinion dynamics in multi-agent systems and have identified the determinants of LLM-assisted decision-making [22, 23].
Another area of research investigates the interactions among individual agents when solving complex tasks that involve decision-making, such as those related to team-building or survival tasks. In these contexts, the dynamics of agreement and consensus-building are evaluated, particularly in relation to how these systems mirror or differ from human decision-making processes. Studying how agents negotiate, collaborate, or conflict during task resolution provides valuable insights into the behaviour of LLM-based systems in real-world decision-making scenarios [10].
Algorithm 1 Speaker Selection Function used in the Simulated Decision Conference
if length(messages) ≤ 1 then
    Speaker ← moderator
else if last speaker = judge agent then
    if "Agreement" ∈ message content then
        Speaker ← evaluation agent
    else if "Debate" ∈ message content then
        Speaker ← moderator agent
    end if
else if last speaker = evaluation agent then
else if last speaker = moderator agent then
    Speaker ← participant1 agent
else if last speaker = participant1 agent then
    Speaker ← participant2 agent
else if last speaker = participant2 agent then
    Speaker ← judge agent
else
    Speaker ← auto
end if

In addition to the schematic in Figure 1, the speaker selection function in Algorithm 1 includes an additional evaluation agent (a Python sketch of the selection logic is given after the list below). This agent evaluates the debate between the participant agents, scoring it on a scale of one to ten according to factors such as:
- Clarity: How clear is the exchange? Are the statements and responses easy to understand?
- Relevance: Do the responses stay on topic and contribute to the conversation’s purpose?
- Conciseness: Is the dialogue free of unnecessary information or redundancy?
- Politeness: Are the participants respectful and considerate in their interaction?
- Engagement: Do the participants seem interested and actively involved in the dialogue?
- Flow: Is there a natural progression of ideas and responses? Are there awkward pauses or interruptions?
- Coherence: Does the dialogue make logical sense as a whole?
- Responsiveness: Do the participants address each other’s points adequately?
- Language Use: Is the grammar, vocabulary, and syntax appropriate for the context of the dialogue?
- Emotional Intelligence: Are the participants aware of and sensitive to the emotional tone of the dialogue?
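To make Algorithm 1 concrete, the sketch below shows how the speaker-selection logic and the evaluation agent's scoring instruction might be written in Python. The agent names, the message format, the `select_next_speaker` helper, and the wording of `EVALUATION_PROMPT` are illustrative assumptions rather than the exact implementation; the sketch only mirrors the control flow shown in Algorithm 1 and the criteria listed above.

```python
# Sketch of the speaker-selection logic from Algorithm 1 and of the evaluation
# agent's scoring instruction. Agent names, message format, and prompt wording
# are illustrative assumptions.

EVALUATION_CRITERIA = [
    "Clarity", "Relevance", "Conciseness", "Politeness", "Engagement",
    "Flow", "Coherence", "Responsiveness", "Language Use", "Emotional Intelligence",
]

# System prompt for the evaluation agent: score the finished debate on each
# criterion from one to ten.
EVALUATION_PROMPT = (
    "Rate the preceding debate on a scale of 1 to 10 for each of the following "
    "criteria, giving a one-sentence justification per criterion: "
    + ", ".join(EVALUATION_CRITERIA) + "."
)


def select_next_speaker(messages: list[dict], last_speaker: str) -> str:
    """Return the name of the next speaker, or "auto" for automatic selection.

    `messages` is the conversation history, where each message is a dict with
    at least a "content" field; `last_speaker` is the name of the agent that
    produced the most recent message.
    """
    if len(messages) <= 1:
        return "moderator"                 # the moderator opens the conference

    last_content = messages[-1].get("content", "")

    if last_speaker == "judge":
        # The judge either declares agreement (the debate is scored and closed)
        # or sends the discussion back to the moderator for another round.
        if "Agreement" in last_content:
            return "evaluation"
        if "Debate" in last_content:
            return "moderator"
    elif last_speaker == "moderator":
        return "participant1"              # moderator hands over to the first expert
    elif last_speaker == "participant1":
        return "participant2"              # the participant experts speak in turn
    elif last_speaker == "participant2":
        return "judge"                     # the judge checks for agreement each round

    # Any remaining case (including the evaluation agent, whose branch is left
    # empty in Algorithm 1) falls back to automatic speaker selection.
    return "auto"
```

In a group-chat framework such as AutoGen, similar logic can be registered as a custom speaker-selection callback using that framework's expected signature; the sketch above is deliberately kept framework-agnostic.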