ProAgent: Building Proactive Cooperative Agents with Large Language Models

Paper · arXiv 2308.11339 · Published August 22, 2023

Building agents with adaptive behavior in cooperative tasks stands as a paramount goal in the realm of multi-agent systems. Current approaches to developing cooperative agents rely primarily on learning-based methods, whose policy generalization depends heavily on the diversity of teammates they interact with during the training phase. Such reliance, however, constrains the agents’ capacity for strategic adaptation when cooperating with unfamiliar teammates, which becomes a significant challenge in zero-shot coordination scenarios. To address this challenge, we propose ProAgent, a novel framework that harnesses large language models (LLMs) to create proactive agents capable of dynamically adapting their behavior to enhance cooperation with teammates. ProAgent can analyze the current state and infer teammates’ intentions from observations. It then updates its beliefs in alignment with the teammates’ subsequent actual behaviors.

Untapped potential lies in investigating how LLM-based agents can effectively cooperate with other AI agents or with humans.

The overview of our ProAgent framework, as depicted in Fig. 1, involves constant interaction between the agent and the environment. The inference pipeline of ProAgent is a hierarchical process that involves multiple interactions between the LLMs and the task at hand. We break down the pipeline into five key stages:

Knowledge Library and State Grounding. The pipeline starts by acquiring a Knowledge Library specific to the current task and transforming the raw tensor state into a Language-based State description that the LLM can effectively comprehend.
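A minimal sketch of the state-grounding idea: a structured game state is rendered as natural-language text an LLM can read. The field names (`self_position`, `held_object`, `pot_onions`, etc.) are illustrative assumptions, not the paper's actual state schema.

```python
def ground_state(raw_state: dict) -> str:
    """Render a structured game state as a natural-language description.

    The keys below are hypothetical; a real grounding step would be
    derived from the environment's actual observation format.
    """
    lines = [
        f"You are at {raw_state['self_position']} holding "
        f"{raw_state['held_object'] or 'nothing'}.",
        f"Your teammate is at {raw_state['teammate_position']} holding "
        f"{raw_state['teammate_held'] or 'nothing'}.",
        f"The pot contains {raw_state['pot_onions']} onion(s) and is "
        f"{'cooking' if raw_state['pot_cooking'] else 'idle'}.",
    ]
    return "\n".join(lines)

state = {
    "self_position": (1, 2), "held_object": "onion",
    "teammate_position": (3, 0), "teammate_held": None,
    "pot_onions": 2, "pot_cooking": False,
}
print(ground_state(state))
```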

High-level Skill Planning. Receiving the aligned language-based state, the LLM-based Planner analyzes the current scene, infers the teammate agent’s intentions, and plans a skill for the current agent.
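A hedged sketch of what this planning step could look like in code. `call_llm` is a stub standing in for any chat-completion API, and the skill names and prompt wording are assumptions for illustration, not the paper's actual prompts.

```python
# Hypothetical skill vocabulary for an Overcooked-style task.
SKILLS = ["pickup_onion", "put_onion_in_pot", "pickup_dish", "deliver_soup"]

def call_llm(prompt: str) -> str:
    """Stub for a real LLM call; returns a canned reply for illustration."""
    return ("Analysis: the pot needs one more onion.\n"
            "Intention: teammate will fetch a dish.\n"
            "Skill: put_onion_in_pot")

def plan_skill(state_description: str) -> tuple[str, str]:
    """Ask the LLM to analyze the scene, infer the teammate's intention,
    and select one skill from the allowed list."""
    prompt = (
        f"Current state:\n{state_description}\n"
        f"Available skills: {', '.join(SKILLS)}\n"
        "Analyze the scene, infer the teammate's intention, then answer "
        "with 'Intention: ...' and 'Skill: <one available skill>'."
    )
    reply = call_llm(prompt)
    intention = next(l for l in reply.splitlines()
                     if l.startswith("Intention:")).split(":", 1)[1].strip()
    skill = next(l for l in reply.splitlines()
                 if l.startswith("Skill:")).split(":", 1)[1].strip()
    if skill not in SKILLS:
        raise ValueError(f"planner chose unknown skill {skill!r}")
    return intention, skill
```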

Belief Correction. The belief about the teammate agent’s intention is then corrected by the Belief Correction mechanism whenever the teammate’s observed behavior deviates from it.
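The belief-correction idea can be sketched as follows: keep a belief about the teammate's intended skill and overwrite it when observed behavior contradicts it. The mapping from observed actions to implied skills is a hypothetical placeholder.

```python
# Hypothetical mapping: observed low-level action -> implied high-level skill.
ACTION_TO_SKILL = {
    "grab_onion": "put_onion_in_pot",
    "grab_dish": "deliver_soup",
}

class BeliefTracker:
    """Tracks a belief about the teammate's intended skill and revises it
    when the teammate's actual behavior implies something different."""

    def __init__(self):
        self.belief = None

    def set_belief(self, skill: str):
        self.belief = skill

    def correct(self, observed_action: str) -> str:
        implied = ACTION_TO_SKILL.get(observed_action)
        if implied is not None and implied != self.belief:
            self.belief = implied  # the teammate's actual behavior wins
        return self.belief
```

The key design choice mirrors the abstract: the teammate's subsequent actual behavior, not the initial inference, is the ground truth for the belief.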

Skill Validation and Action Execution. The selected skill is validated by the Verificator, and a replan is triggered if the current skill fails. Once a valid skill is selected, the Controller module decomposes it into low-level actions, allowing ProAgent to effectively interact with the task or environment. The controller can be rule-based or learned with RL.
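A minimal sketch of this validate-then-execute loop, assuming a rule-based controller. The preconditions, skill names, and primitive actions are all illustrative assumptions.

```python
# Hypothetical precondition checks for each skill (the "Verificator").
PRECONDITIONS = {
    "put_onion_in_pot": lambda s: s["held_object"] == "onion",
    "pickup_onion": lambda s: s["held_object"] is None,
}

# Hypothetical rule-based controller: skill -> sequence of primitive actions.
CONTROLLER = {
    "put_onion_in_pot": ["move_to_pot", "interact"],
    "pickup_onion": ["move_to_onion_pile", "interact"],
}

def validate(skill: str, state: dict) -> bool:
    """Check whether the skill's preconditions hold in the current state."""
    check = PRECONDITIONS.get(skill)
    return check is not None and check(state)

def execute(skill: str, state: dict, replan) -> list[str]:
    """Validate the skill; if it fails, ask the planner to replan,
    then decompose the (valid) skill into low-level actions."""
    if not validate(skill, state):
        skill = replan()  # invalid skill triggers one replanning step
    return CONTROLLER[skill]
```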

Memory Storage. Throughout the pipeline, all relevant information involved in the prompt, planning process, validation process, and belief correction process is stored in the Memory module. This accumulated knowledge helps the agent make informed decisions and adjust its behavior over time.
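The memory stage can be sketched as a bounded log that each pipeline stage appends to, with recent records retrievable to condition the next planning prompt. The record schema and capacity are assumptions.

```python
from collections import deque

class Memory:
    """Bounded log of pipeline events (planning, validation, belief
    correction, ...); oldest records are dropped beyond capacity."""

    def __init__(self, capacity: int = 100):
        self._records = deque(maxlen=capacity)

    def store(self, stage: str, content: str):
        self._records.append({"stage": stage, "content": content})

    def recent(self, n: int = 5) -> list[dict]:
        """Return the n most recent records, oldest first."""
        return list(self._records)[-n:]

mem = Memory()
mem.store("planning", "chose skill put_onion_in_pot")
mem.store("belief_correction", "teammate intends deliver_soup")
```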