Collaborative Reasoner: Self-Improving Social Agents with Synthetic Conversations
With increasingly powerful large language models (LLMs) and LLM-based agents tackling an ever-growing list of tasks, we envision a future where numerous LLM agents work seamlessly with other AI agents and humans to solve complex problems and enhance daily life. To achieve these goals, LLM agents must develop collaborative skills such as effective persuasion, assertiveness, and disagreement, which are often overlooked in the prevalent single-turn training and evaluation of LLMs. In this work, we present Collaborative Reasoner (Coral), a framework to evaluate and improve the collaborative reasoning abilities of language models. In particular, the tasks and metrics in Coral require agents to disagree with incorrect solutions, convince their partners of a correct solution, and ultimately agree as a team to commit to a final solution, all through a natural multi-turn conversation. Through a comprehensive evaluation on six collaborative reasoning tasks covering coding, math, scientific QA, and social reasoning, we show that current models cannot collaborate effectively due to undesirable social behaviors, failing even on problems that they can solve single-handedly. To improve the collaborative reasoning capabilities of LLMs, we propose a self-play method that generates synthetic multi-turn preference data and further trains the language models to be better collaborators. Experiments with Llama-3.1, Ministral, and Qwen-2.5 models show that our proposed self-improvement approach consistently outperforms the finetuned chain-of-thought performance of the same base model, yielding gains of up to 16.7% absolute. Human evaluations show that, after training on our synthetic interaction data, the models exhibit more effective disagreement and produce more natural conversations.
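As a rough illustration of the self-play data generation summarized above, the sketch below samples multi-turn conversations between two copies of the same model, keeps track of which rollouts end in a correct agreed answer, and pairs their turns against turns from unsuccessful rollouts to form preference data. The `generate` and `extract_answer` callables are placeholders for the model's sampling and answer-parsing steps, and the turn-pairing scheme is a simplification for exposition, not the exact procedure used in Coral.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Turn:
    speaker: str
    text: str


def self_play_preference_pairs(
    problem: str,
    reference_answer: str,
    generate: Callable[[str], str],               # assumed LLM sampling call
    extract_answer: Callable[[List[Turn]], str],  # assumed final-answer parser
    n_rollouts: int = 8,
    max_turns: int = 6,
) -> List[Tuple[str, str, str]]:
    """Sample self-play conversations and convert them into
    (context, chosen_turn, rejected_turn) preference triples."""
    rollouts: List[Tuple[List[Turn], bool]] = []
    for _ in range(n_rollouts):
        turns: List[Turn] = []
        for t in range(max_turns):
            speaker = "A" if t % 2 == 0 else "B"
            context = problem + "\n" + "\n".join(f"{u.speaker}: {u.text}" for u in turns)
            turns.append(Turn(speaker, generate(context)))
        # A rollout counts as successful if the final agreed answer is correct.
        rollouts.append((turns, extract_answer(turns) == reference_answer))

    # Pair turns from successful rollouts (chosen) with same-position turns
    # from unsuccessful rollouts (rejected), sharing the problem as context.
    good = [turns for turns, ok in rollouts if ok]
    bad = [turns for turns, ok in rollouts if not ok]
    pairs: List[Tuple[str, str, str]] = []
    for g, b in zip(good, bad):
        for gt, bt in zip(g, b):
            pairs.append((problem, f"{gt.speaker}: {gt.text}", f"{bt.speaker}: {bt.text}"))
    return pairs
```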
Coral is a comprehensive framework for evaluating and enhancing the collaborative reasoning skills of language models. More specifically, given a reasoning problem (e.g., math, physics, theory-of-mind), Coral emulates human-AI collaboration and requires two agents to work on the problem together through a multi-turn conversation. Beyond solving the problem correctly, it also requires the agents to agree with each other before committing to a final solution. Consequently, success requires learning to disagree with incorrect solutions (assertiveness), to ask clarifying questions, and to convince the partner of a correct solution (persuasiveness). We evaluate several frontier open- and closed-source LLMs on six reasoning tasks under this collaborative setting, spanning coding, math, scientific question answering, and social story comprehension. Compared with single-agent approaches such as chain-of-thought prompting, we find that even these frontier models are inconsistent at leveraging collaboration to better approach these tasks. Further analysis of social behaviors via our designed social metrics reveals a tendency for agents to be overly agreeable (> 90% agreement score) regardless of reasoning correctness, limiting their ability to challenge incorrect solutions and reducing the efficacy of collaboration.
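To make the collaborative setting concrete, one way such a two-agent rollout could be orchestrated is sketched below. The `agent_a`/`agent_b` generation calls, the `detect_agreement` judge, and the `extract_answer` parser are hypothetical placeholders; the sketch only illustrates the turn-taking and the requirement that both agents agree before committing to a final solution, not Coral's actual implementation.

```python
from typing import Callable, List, Tuple


def collaborative_rollout(
    problem: str,
    agent_a: Callable[[str], str],                        # assumed per-agent generation call
    agent_b: Callable[[str], str],                        # assumed per-agent generation call
    detect_agreement: Callable[[str, List[str]], bool],   # assumed mutual-agreement judge
    extract_answer: Callable[[List[str]], str],           # assumed final-answer extractor
    max_turns: int = 10,
) -> Tuple[List[str], str]:
    """Alternate turns between two agents on one problem until both agree
    on a solution or the turn budget runs out, then read off the answer."""
    transcript: List[str] = []
    agents = [("A", agent_a), ("B", agent_b)]
    for t in range(max_turns):
        name, agent = agents[t % 2]
        context = problem + "\n" + "\n".join(transcript)
        transcript.append(f"{name}: {agent(context)}")
        # Stop once the judge decides the two agents have committed to the same solution.
        if t > 0 and detect_agreement(problem, transcript):
            break
    return transcript, extract_answer(transcript)
```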
When modeling reasoning problems in a single turn, it is common to first generate a sequence that represents the thinking process (e.g., a chain of thought) followed by the final answer. In a multi-turn conversational setting, however, each turn may not conclude with a clear final answer, as the agents may be planning the next steps, debating a fact, or, as in Fig. 1, asking a clarifying question. Moreover, agreement can be partial (e.g., “I agree that X, but that doesn’t mean Y.”) or of higher order (e.g., “I agree that my previous disagreement is unwarranted.”), which makes measuring agreement between agents in a multi-turn setting quite challenging. The metrics below are derived automatically via belief extraction, without human annotation, enabling scalable analysis of social behaviors.
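As one possible instantiation of such an automatically derived metric, the sketch below computes a coarse agreement score by classifying each reply against the partner's preceding turn with an assumed LLM-based `classify_turn` judge. A belief-extraction pipeline would need to handle partial and higher-order agreement more carefully, so this is illustrative only.

```python
from typing import Callable, List


def agreement_score(
    transcript: List[str],
    classify_turn: Callable[[str, str], str],  # assumed judge: (partner_turn, reply) -> label
) -> float:
    """Estimate how often a reply expresses agreement with the partner's
    preceding turn; labels other than 'agree'/'disagree' are ignored."""
    labels = [
        classify_turn(prev, curr)
        for prev, curr in zip(transcript, transcript[1:])
    ]
    stances = [label for label in labels if label in ("agree", "disagree")]
    if not stances:
        return 0.0
    return stances.count("agree") / len(stances)
```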