CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
The rapid advancement of chat-based language models has led to remarkable progress in complex task-solving. However, their success heavily relies on human input to guide the conversation, which can be challenging and time-consuming. This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents, and provides insight into their “cognitive” processes. To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing. Our approach involves using inception prompting to guide chat agents toward task completion while maintaining consistency with human intentions. We showcase how role-playing can be used to generate conversational data for studying the behaviors and capabilities of a society of agents, providing a valuable resource for investigating conversational language models. In particular, we conduct comprehensive studies on instruction-following cooperation in multi-agent settings. Our contributions include introducing a novel communicative agent framework, offering a scalable approach for studying the cooperative behaviors and capabilities of multi-agent systems, and open-sourcing our library to support research on communicative agents and beyond.
This predicament raises a crucial question: can we replace human intervention with an autonomous communicative agent capable of steering the conversation toward task completion with minimal human supervision?
We propose a novel cooperative agent framework named role-playing to automate cooperation between communicative agents. Specifically, our proposed approach involves using role-playing with inception prompting to autonomously guide the communicative agents toward task completion. Only a preliminary idea is needed from a human to guide the conversations toward complex task-solving.
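The core of this framework can be sketched as a loop in which the AI user issues instructions and the AI assistant returns solutions until an end-of-task token appears. The sketch below is a minimal illustration in Python; the `chat` stub, the `TASK_DONE` constant, and all function names are our assumptions, not the paper's released API.

```python
# Minimal sketch of the role-playing loop. The LLM call is stubbed out by
# a hypothetical `chat(system_prompt, history)` function; a real
# implementation would call a chat-based language model here.

TASK_DONE = "<CAMEL_TASK_DONE>"  # end-of-task token (assumed constant name)

def chat(system_prompt, history):
    # Placeholder for a real LLM call; returns canned replies for the demo.
    if "assistant" in system_prompt:
        return "Solution: step done. Next request."
    return "Instruction: do the next step."

def role_play(assistant_sys, user_sys, max_turns=10):
    """Alternate user instructions and assistant solutions until the user
    emits the end-of-task token or the turn budget runs out."""
    history, pairs = [], []
    for _ in range(max_turns):
        instruction = chat(user_sys, history)
        if TASK_DONE in instruction:
            break
        solution = chat(assistant_sys, history + [instruction])
        history += [instruction, solution]
        # Save the streaming instruction-solution pair as it is produced.
        pairs.append((instruction, solution))
    return pairs
```

With only the two system prompts supplied by a human, the loop runs without further intervention, which is the sense in which cooperation is automated.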
Generating instruction datasets is a crucial challenge in building instruction-following LLMs, with existing datasets ranging from crowdsourced to generated. Hand-crafted instruction instances are available in [120], while leveraging previously crowdsourced NLP datasets is a less labor-intensive curation approach [121, 71, 78, 53]. LLMs have been explored for data generation in [101, 63, 68, 114], and Self-Instruct [119] proposes a semi-automated process for instruction instance generation. Unnatural-Instruction [48] collects instruction instances by prompting a language model with only three seed examples and paraphrasing the generated instances to expand the dataset. A substantial body of work has also proposed methods for automatic dataset creation [67, 57, 19, 75, 20, 98, 59, 96, 129, 62, 130, 86, 8]. Another important challenge is prompt engineering. The quality of the prompt used to guide LLMs significantly affects their performance [91, 12, 66]. While LMs pre-trained on large data can implicitly learn tasks with few-shot prompting, hand-crafted prompts may not always suffice. Automated prompt generation methods have been proposed, such as gradient-guided search [104], mining-based and paraphrasing-based techniques [54], a meta-prompt [93], and automatic instruction selection and generation [136]. In this work, we introduce a conversational LLM auto-prompting method called Inception Prompting, which enables agents to prompt each other to solve tasks through role-playing. The AI user continuously provides instructions to the AI assistant for task-solving. This enables us to save the streaming instruction-solution pairs and create diverse, instructional, conversational, and task-oriented datasets.
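Turning a saved role-playing transcript into such a dataset can be sketched as pairing each user instruction with the assistant solution that follows it. The transcript format and record field names below are illustrative assumptions, not the paper's released schema.

```python
# Sketch: convert an alternating instruction/solution transcript into
# instruction-following records suitable for fine-tuning.
import json

def transcript_to_dataset(messages):
    """Pair each 'Instruction:' turn with the 'Solution:' that follows it."""
    records = []
    for i in range(0, len(messages) - 1, 2):
        instr, sol = messages[i], messages[i + 1]
        if instr.startswith("Instruction:") and sol.startswith("Solution:"):
            records.append({
                "instruction": instr[len("Instruction:"):].strip(),
                "output": sol[len("Solution:"):].strip(),
            })
    return records

# Hypothetical two-turn transcript for illustration.
sample = ["Instruction: list three sorting algorithms.",
          "Solution: merge sort, quicksort, heapsort. Next request."]
print(json.dumps(transcript_to_dataset(sample), indent=2))
```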
Our prompt engineering occurs solely at the beginning of role-playing, for task specification and role assignment. Once the conversation phase commences, the AI assistant and AI user prompt each other automatically in a loop until termination. As such, we refer to our technique as Inception Prompting. Our Inception prompt consists of three prompts: the task specifier prompt PT , the assistant system prompt PA, and the user system prompt PU. As an example, we consider the inception prompt of the AI Society scenario. The templates for these prompts of AI Society role-playing are shown in Figure 2. The task specifier prompt contains information about the roles of the AI assistant and AI user in the role-playing session. Therefore, the task specifier agent can take a preliminary task/idea as input and generate a specific task using imagination. The AI assistant system prompt PA and the AI user system prompt PU are mostly symmetrical and include information about the assigned task and roles, communication protocols, termination conditions, and constraints or requirements to avoid unwanted behaviors.
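The assembly of these three prompts from templates can be sketched as simple string formatting. The template wording below is paraphrased from Figure 2 for illustration, not copied verbatim, and the function names are our own.

```python
# Sketch: filling the inception prompt templates. PT specifies the task;
# PA and PU are the (mostly symmetrical) system prompts.

TASK_SPECIFIER = ("Here is a task that {assistant} will help {user} to "
                  "complete: {task}. Please make it more specific.")
ASSISTANT_SYS = ("Never forget you are a {assistant} and I am a {user}. "
                 "We share a common task: {task}. Never flip roles!")
USER_SYS = ("Never forget you are a {user} and I am a {assistant}. "
            "We share a common task: {task}. You must instruct me to "
            "complete the task.")

def inception_prompts(assistant_role, user_role, specified_task):
    """Return the assistant and user system prompts for a session."""
    fill = dict(assistant=assistant_role, user=user_role, task=specified_task)
    return ASSISTANT_SYS.format(**fill), USER_SYS.format(**fill)
```

After this one-time setup, no further human prompt engineering occurs: the two system prompts drive the whole conversation.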
Prompt Engineering. To delve deeper into the details of Figure 2, we examine the various parts of the AI assistant system prompt PA shown below:
• Never forget you are a <ASSISTANT_ROLE> and I am a <USER_ROLE>. This assigns the chosen role to the assistant agent and provides it with information about the user’s role.
• Never flip roles! Never instruct me! This prevents agents from flipping roles. In some cases, we have observed the assistant and the user switching roles, where the assistant suddenly takes control and instructs the user, and the user follows those instructions.
• You must decline my instruction honestly if you cannot perform the instruction due to physical, moral, legal reasons or your capability and explain the reasons. This prohibits the agent from producing harmful, false, illegal, and misleading information.
• Unless I say the task is completed, you should always start with: Solution: <YOUR_SOLUTION>. <YOUR_SOLUTION> should be specific, and provide preferable implementations and examples for task-solving. This encourages the assistant to always respond in a consistent format, avoiding deviation from the structure of the conversation and preventing vague or incomplete responses, which we refer to as flake responses, such as "I will do something".
• Always end your solution with: Next request. This ensures that the assistant keeps the conversation going by requesting a new instruction to solve. For the AI user system prompt PU, we strive to maintain as much symmetry as possible with respect to the AI assistant system prompt. Apart from the opposite role assignment, the user system prompt differs from the assistant prompt in the following ways:
• You must instruct me ... to complete the task ONLY in the following two ways: 1. Instruct with a necessary input: ...; 2. Instruct without any input: ... This follows the typical data structure of instruction-following, which allows the generated instruction-solution pairs to be easily used for fine-tuning LLMs.
• Keep giving me instructions and necessary inputs until you think the task is completed. When the task is completed, you must only reply with a single word <CAMEL_TASK_DONE>. We introduce an end-of-task token, namely <CAMEL_TASK_DONE>, which the user emits once it believes the task is done. This ensures that the chat terminates when the user is satisfied. Without this token, the agents might fall into a chatting loop where they keep saying “thank you” or “goodbye” to each other indefinitely.
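The format rules above lend themselves to programmatic checks. The sketch below shows how the assistant's response format and the user's termination signal could be validated; the function names are ours, not part of the paper's released code.

```python
# Sketch: validating the conversational protocol enforced by the
# inception prompts.

END_TOKEN = "<CAMEL_TASK_DONE>"

def is_valid_solution(msg):
    """Assistant replies must start with 'Solution:' and end with
    'Next request.' to keep the conversation structured and moving."""
    m = msg.strip()
    return m.startswith("Solution:") and m.endswith("Next request.")

def is_task_done(user_msg):
    """The chat terminates once the user replies with only the
    end-of-task token."""
    return user_msg.strip() == END_TOKEN
```

Responses that fail `is_valid_solution` (e.g. a bare "I will do it.") are exactly the flake replies the assistant prompt is designed to prevent.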
• Role Flipping: One challenge we encountered was role flipping, where the assistant and user switch roles during the conversation. This issue typically arises when the assistant starts providing instructions or commands instead of following the user’s prompts, which can lead to confusion and a reversal of roles. To avoid role flipping, it is crucial that the assistant does not ask questions, as questions also contribute to the problem.
• Assistant Repeats Instruction: Another challenge that we observed was the assistant simply repeating the user’s instructions without any role flipping occurring.
• Flake Replies: We also observed instances where the assistant agent responds with a flake reply, often taking the form of "I will...". These messages do not contribute to the task at hand, as the assistant promises to take action but ultimately fails to follow through.
• Infinite Loop of Messages: An interesting challenge that we encountered was when the assistant and user engage in an infinite loop of meaningless conversation, such as repeatedly thanking each other or saying goodbye without progressing the task.
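The failure modes above can be flagged with simple heuristics when analyzing transcripts. The checks below are illustrative rules we wrote for this sketch, not the paper's evaluation code.

```python
# Sketch: heuristic detectors for the observed failure modes.

def is_flake(reply):
    """Flake replies promise action without content, e.g. 'I will ...'."""
    return reply.strip().lower().startswith("i will")

def repeats_instruction(instruction, reply):
    """Assistant parroting the user's instruction back instead of
    providing a solution."""
    return (instruction.strip().lower() in reply.strip().lower()
            and "solution:" not in reply.lower())

def looks_like_loop(history, window=4):
    """Crude infinite-loop detector: the last few messages are
    near-identical pleasantries (repeated thanks or goodbyes)."""
    tail = [m.strip().lower() for m in history[-window:]]
    return len(tail) == window and len(set(tail)) <= 2
```

In practice, such checks can terminate or filter degenerate conversations before they pollute the generated dataset.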
Through our analysis, we show that achieving autonomous cooperation is challenging due to issues such as conversation deviation, role flipping, and unreliable termination.