Strategic Reasoning with Language Models

Paper · arXiv 2305.19165 · Published May 30, 2023
Reasoning Architectures · Argumentation

This paper introduces an approach that uses pretrained LLMs with few-shot chain-of-thought examples to enable strategic reasoning for AI agents. Our approach prompts the model with systematically generated demonstrations of reasoning about states, values, and beliefs. Using extensive variations of simple matrix games, we show that strategies derived from systematically generated prompts generalize almost perfectly to new game structures, alternate objectives, and hidden information. We also demonstrate that our approach can produce human-like negotiation strategies in realistic scenarios without any extra training or fine-tuning. Our results highlight the ability of LLMs, guided by systematic reasoning demonstrations, to adapt and excel in diverse strategic scenarios.

….

The LLM can then generalize to new scenarios through few-shot in-context examples of these systematically generated prompts. To capture human-like strategic reasoning, an agent needs to (1) search through the space of states and actions: for example, a bot that negotiates with a vendor must understand the space of inventory and how its offers will affect the negotiation; (2) assign values to these states and actions: the bot must understand which items are valuable to it and what the vendor values; and (3) form beliefs about the partially observable world: based on the vendor's actions, the bot must infer how much the vendor values the items. We develop an automated "prompt compiler" that systematically generates these demonstrations.
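To make the idea concrete, here is a minimal sketch of what such a prompt compiler might look like for a one-shot matrix game: it enumerates joint actions (search), states each outcome's value (value assignment), and picks a best response under a uniform belief over the opponent's play (belief formation). The function and variable names, the uniform-belief assumption, and the exact wording of the trace are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a "prompt compiler" for a one-shot matrix game.
# It emits a chain-of-thought demonstration covering search, value
# assignment, and belief formation. All names are illustrative.

def compile_demonstration(payoffs, actions):
    """Generate a worked reasoning trace for a one-shot matrix game.

    payoffs[(a, b)] -> (my_payoff, their_payoff) for my action a and
    the opponent's action b.
    """
    lines = ["Let me reason step by step."]
    # Step 1: search -- enumerate all joint actions.
    for a in actions:
        for b in actions:
            mine, theirs = payoffs[(a, b)]
            # Step 2: value assignment -- state each outcome's value.
            lines.append(
                f"If I play {a} and they play {b}, "
                f"I get {mine} and they get {theirs}."
            )

    # Step 3: belief formation -- assume a uniform belief over the
    # opponent's actions and choose the best expected-value action.
    def expected_value(a):
        return sum(payoffs[(a, b)][0] for b in actions) / len(actions)

    best = max(actions, key=expected_value)
    lines.append(
        f"Assuming they are equally likely to play any action, "
        f"{best} has the highest expected value, so I choose {best}."
    )
    return "\n".join(lines)

# Prisoner's-dilemma-style payoffs as a worked example.
payoffs = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}
demo = compile_demonstration(payoffs, ["cooperate", "defect"])
print(demo)
```

Traces like this, generated for many game variants, would then serve as the few-shot in-context examples described above.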

….

This paper aims to address the limited ability of existing AI algorithms to adapt to new contexts by exploring the potential of language models to engage in strategic reasoning: the ability to foresee the potential actions of others pursuing possibly conflicting objectives, and to devise optimal strategies accordingly. This concept, central to game theory, entails reasoning about the interplay between multiple agents with divergent interests. Large language models (LLMs) have recently been shown to express human-like strategies [1, 14] and flexibility in reasoning, potentially understanding nuanced and contextual information [35]. Further, since language models are trained on a variety of data sources, they can be adapted to different tasks and environments, making them suitable for flexible reasoning and, potentially, generalization to new scenarios. Despite these successes, however, LLMs can be brittle and unreliable in their reasoning, especially when reasoning about agents, social contexts [28], and planning [37].

Progress in game-playing AI, driven by breakthroughs in reinforcement learning (RL), self-play, and their integration with tree search, has led to successful strategic agents for Chess, Go, StarCraft, Poker, and DOTA [31, 32, 38, 17, 5].

However, these methods were limited in producing agents adept at adapting to novel situations, such as new rules or objectives [15]. Recently, Cicero [2] demonstrated how language models can be combined with strategic reasoning techniques to create a versatile dialogue-based agent that interacts and negotiates with humans while playing Diplomacy.

Large language models have been shown to be successful at reasoning [35] in a variety of contexts, especially when paired with prompting techniques that allow them to think through their steps [21, 39]. Other work has shown how reasoning can be further improved by breaking a problem down into steps [12], by combining language models as modules or cascades [10, 34], by fine-tuning or bootstrapping a model on its own reasoning [40, 36], and by tuning through human feedback [24]. In spite of these successes, LLMs are limited in their reasoning about agents [28], and there have been few attempts to apply these models to complex strategic reasoning tasks.

We provide a systematic approach for generating prompts that incorporate structure based on common strategic reasoning techniques. Specifically, we will consider prompting strategies based on search, value assignment, and belief tracking. To demonstrate our approach and prompting strategies, let us first describe two game settings that we study in this work: matrix games and negotiation games.
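The belief-tracking ingredient is the least familiar of the three, so a small worked example may help: an agent in a negotiation game can maintain a probability distribution over a partner's hidden item valuations and update it from observed offers. The Bayes-rule sketch below, including the exponential likelihood model and all names, is an illustrative assumption, not the paper's implementation.

```python
# A minimal sketch of belief tracking in a negotiation game: the agent
# keeps a distribution over the partner's hidden per-unit value for an
# item and updates it from an observed demand. The likelihood model
# (partners who value an item highly demand more of it) is a modeling
# assumption for illustration only.
import math

def update_belief(prior, candidate_values, observed_demand):
    """Bayesian update of the belief over the partner's hidden valuation.

    prior: prior probability of each candidate valuation
    candidate_values: possible per-unit values the partner might hold
    observed_demand: number of units the partner asked for
    """
    # Likelihood of the observed demand under each candidate valuation.
    likelihoods = [math.exp(v * observed_demand) for v in candidate_values]
    # Unnormalized posterior, then normalize so it sums to 1.
    posterior = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(posterior)
    return [p / total for p in posterior]

# The partner might value the item at 0, 1, or 2 points per unit,
# with a uniform prior over the three possibilities.
prior = [1 / 3, 1 / 3, 1 / 3]
values = [0.0, 1.0, 2.0]

# After the partner demands 3 units, the belief shifts toward the
# high-value hypothesis.
posterior = update_belief(prior, values, observed_demand=3)
```

Demonstrations of this kind of update, rendered in natural language, are what the belief-tracking prompting strategy would supply to the model.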