Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning

Paper · arXiv 2502.06060 · Published February 9, 2025
Tags: Role Play · Theory of Mind

Communicating in natural language is a powerful tool in multi-agent settings, as it enables independent agents to share information in partially observable settings and allows zero-shot coordination with humans. However, most prior works are limited as they either rely on training with large amounts of human demonstrations or lack the ability to generate natural and useful communication strategies. In this work, we train language models to have productive discussions about their environment in natural language without any human demonstrations. We decompose the communication problem into listening and speaking. Our key idea is to leverage the agent's goal of predicting useful information about the world as a dense reward signal that guides communication. Specifically, we improve a model's listening skills by training it to predict information about the environment based on discussions, and we simultaneously improve its speaking skills with multi-agent reinforcement learning by rewarding messages based on their influence on other agents. To investigate the role and necessity of communication in complex social settings, we study an embodied social deduction game based on Among Us, where the key question to answer is the identity of an adversarial imposter. We analyze emergent behaviors due to our technique, such as accusing suspects and providing evidence, and find that it enables strong discussions, doubling the win rates compared to standard RL. We release our code and models at https://socialdeductionllm.github.io/.
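To make the listening/speaking decomposition concrete, here is a minimal sketch of how the two signals could be combined in a single update. The REINFORCE-style surrogate, the weighting coefficient `lambda_listen`, and all function names are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def combined_loss(policy_logprobs: torch.Tensor,
                  returns: torch.Tensor,
                  listening_ce: torch.Tensor,
                  lambda_listen: float = 1.0) -> torch.Tensor:
    """Hypothetical combined update: a REINFORCE-style acting/speaking
    term weighted by per-token returns, plus the supervised listening
    cross-entropy as a dense auxiliary signal.

    policy_logprobs: (T,) log-probabilities of the tokens the agent emitted.
    returns: (T,) reward-to-go for each emitted token (game reward plus
        any speaking reward); should be detached from the graph.
    listening_ce: scalar cross-entropy of the imposter-identity prediction.
    """
    rl_term = -(policy_logprobs * returns.detach()).mean()  # policy-gradient surrogate
    return rl_term + lambda_listen * listening_ce
```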

A longstanding goal of multi-agent artificial intelligence is the development of independent agents that can communicate using a shared language. Communication is especially necessary in “partially observable” settings, where each agent only has a limited view of the world and therefore benefits from sharing knowledge with other agents to achieve its goal. In particular, “social deduction” games are settings where each agent’s goal is to deduce information about the environment by communicating with other agents – requiring each player to learn how to parse messages from other players while effectively sharing important information needed for game completion.

In this work, we study the hidden-role game of Among Us [18] as a specific instance of a challenging social deduction game to investigate the importance of communication, illustrated in Fig. 1. Hidden-role games [4, 19] are a class of environments where players are split into an uninformed majority and a smaller informed hidden subteam, which we refer to as crewmates and imposters respectively. These two teams are adversaries, resulting in a zero-sum game, where the goal of the crewmates is to deduce the identity of imposters to vote them out. Unlike other popular hidden-role games such as the game of Mafia [2], where statements from players are unfalsifiable, Among Us is set in a 2D embodied environment, allowing discussions and intuitions to be grounded in specific observations. In the game, crewmates try to complete an assigned set of tasks scattered across the environment while imposters try to kill all crewmates. If a player reports the corpse of an eliminated crewmate – killed by an imposter – the game moves to a discussion phase with a free-form chat followed by a voting period, where all players vote to eject a suspected imposter. For crewmates, success in the discussion phase means correctly voting out the imposter; for imposters, success means avoiding suspicion from the crewmates and staying in the game as long as possible.
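The game loop described above can be summarized as a small state machine. The sketch below is our own rendering of that loop for illustration, not the authors' environment code; the `Phase` names and transition arguments are assumptions.

```python
from enum import Enum, auto

class Phase(Enum):
    """Hypothetical phase structure for the Among Us-style environment."""
    TASKS = auto()       # crewmates complete tasks; the imposter may kill
    DISCUSSION = auto()  # free-form chat, triggered when a corpse is reported
    VOTING = auto()      # all living players vote to eject a suspect

def next_phase(phase: Phase, body_reported: bool, votes_cast: bool) -> Phase:
    """Transitions of the loop sketched above: reporting a corpse moves
    play into discussion, discussion ends in a vote, and after the vote
    play returns to the task phase (unless the game has ended)."""
    if phase is Phase.TASKS and body_reported:
        return Phase.DISCUSSION
    if phase is Phase.DISCUSSION:
        return Phase.VOTING
    if phase is Phase.VOTING and votes_cast:
        return Phase.TASKS
    return phase
```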

Training with RL alone significantly boosts performance relative to the base models, even outperforming the 7B-parameter model. However, we find that RL without the additional listening loss still struggles to reason about the identity of imposters. When we instead train with listening only – using the loss L_L – the resulting model, π_L, does not know which actions are effective or how to discuss details about the environment, but it is an effective baseline because predicting the identity of the imposter is valuable in Among Us. Even when warm-starting the RL policy from π_L, we find that it quickly loses the ability to identify imposters, instead voting for every agent with equal probability.
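A minimal sketch of our reading of the listening loss L_L: at designated points in the episode the model is queried for the imposter's identity, and its prediction is scored against the ground truth with cross-entropy. The function name, tensor shapes, and the single-query interface are our assumptions.

```python
import torch
import torch.nn.functional as F

def listening_loss(identity_logits: torch.Tensor, imposter_id: int) -> torch.Tensor:
    """Supervised listening objective (a sketch of L_L).

    identity_logits: (num_queries, num_players) logits over player
        identities, one row per point where the model is asked
        "who is the imposter?".
    imposter_id: index of the true imposter.
    """
    targets = torch.full((identity_logits.shape[0],), imposter_id,
                         dtype=torch.long, device=identity_logits.device)
    return F.cross_entropy(identity_logits, targets)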

When combining RL and the listening loss, we find that success rates again increase dramatically, with further improvements when adding our denser speaking rewards, as agents can now differentiate between helpful and unhelpful messages during training. Ultimately, our full model achieves twice the win rate of the RL-only baseline in the base environment. Note that the difference in scores when adding the explicit speaking term is relatively small: even without the speaking reward, the language model produces coherent messages, often sharing its current suspicions during discussion rounds, so agents benefit from discussion even without additional rewards. This is an interesting emergent behavior, as it shows that speaking is indirectly improved by training the model to listen better.
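For completeness, a sketch of our reading of the influence-based speaking reward: a message is rewarded by how much it shifts the other crewmates' beliefs toward the true imposter, measured before and after the message. The function name, the mean aggregation, and the probability-shift form are illustrative assumptions, not the paper's exact reward.

```python
import torch

@torch.no_grad()
def speaking_reward(probs_before: torch.Tensor,
                    probs_after: torch.Tensor,
                    imposter_id: int) -> torch.Tensor:
    """Dense speaking reward (a sketch of the influence-based term).

    probs_before / probs_after: (num_listeners, num_players) each
        listener's predicted distribution over who the imposter is,
        evaluated just before and just after the speaker's message.
    imposter_id: index of the true imposter.
    """
    shift = probs_after[:, imposter_id] - probs_before[:, imposter_id]
    return shift.mean()  # positive if the message moved beliefs toward the truth
```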