Recursive Introspection: Teaching Language Model Agents How to Self-Improve

Paper · arXiv 2407.18219 · Published July 25, 2024
Tags: Self-Refinement · Self-Consistency · Feedback

Even the strongest proprietary large language models (LLMs) do not quite exhibit the ability to continually improve their responses sequentially, even in scenarios where they are explicitly told that they are making a mistake. In this paper, we develop RISE: Recursive IntroSpEction, an approach for fine-tuning LLMs to introduce this capability, despite prior work hypothesizing that this capability may not be possible to attain. Our approach prescribes an iterative fine-tuning procedure that teaches the model how to alter its response after previously unsuccessful attempts to solve a hard test-time problem, optionally with additional environment feedback. RISE poses fine-tuning for a single-turn prompt as solving a multi-turn Markov decision process (MDP), where the initial state is the prompt. Inspired by principles in online imitation learning and reinforcement learning, we propose strategies for multi-turn data collection and training that imbue an LLM with the capability to recursively detect and correct its previous mistakes in subsequent iterations.
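
To make the MDP framing concrete, the sketch below encodes one plausible reading of it: the state is the prompt plus the history of prior attempts (and any environment feedback), the action is the model's next response, and the reward is a sparse success indicator on the task. All names here (`TurnState`, `step`) are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of the multi-turn MDP view of single-turn fine-tuning.
# Assumed names; not the paper's implementation.
from dataclasses import dataclass, field


@dataclass
class TurnState:
    """State s_t: the prompt plus prior attempts and any feedback."""
    prompt: str
    history: list[str] = field(default_factory=list)


def step(
    state: TurnState, response: str, is_correct: bool, feedback: str = ""
) -> tuple[TurnState, float, bool]:
    """One transition: the action is the model's next response; the next
    state appends that response (and optional environment feedback) to
    the history. Reward is a sparse 0/1 success indicator."""
    next_history = state.history + [response] + ([feedback] if feedback else [])
    next_state = TurnState(prompt=state.prompt, history=next_history)
    reward = 1.0 if is_correct else 0.0
    done = is_correct
    return next_state, reward, done
```

Under this reading, the initial state is just `TurnState(prompt)`, and each subsequent attempt conditions on everything accumulated in `history`.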

Our contribution is RISE: Recursive Introspection (Figure 1), an algorithm that uses these insights to improve the self-improvement capability of an LLM over the course of multiple attempts at a given prompt. In each iteration, our approach bootstraps on-policy rollouts from the learner with better responses at the next turn: it runs best-of-N (using a success indicator on the task) over multiple revision candidates, sampled either from the learner itself or from a more capable model, whichever is more convenient.
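
The best-of-N data-collection step might look roughly like the following sketch, where `sample_revision` and `is_successful` are hypothetical stand-ins for the revision sampler (the learner itself or a more capable model) and the task's success indicator.

```python
# Hedged sketch of best-of-N data collection for one failed attempt:
# sample several revision candidates and keep a successful one as the
# supervised target for the next turn. Assumed helper names throughout.
import random
from typing import Callable, Optional


def collect_improved_turn(
    prompt: str,
    failed_attempt: str,
    sample_revision: Callable[[str, str], str],  # (prompt, prior attempt) -> candidate
    is_successful: Callable[[str], bool],        # task success indicator
    n: int = 8,
) -> Optional[str]:
    """Run best-of-N over revision candidates; return a successful revision
    to serve as the next-turn training target, or None if all candidates fail."""
    candidates = [sample_revision(prompt, failed_attempt) for _ in range(n)]
    successes = [c for c in candidates if is_successful(c)]
    return random.choice(successes) if successes else None
```

Pairs of (failed attempt, successful revision) collected this way would then form the multi-turn fine-tuning data for the next training iteration.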

Typically, these works focus on building prompting techniques for effective multi-turn interaction with external tools [5, 7, 14, 32, 49, 54, 56], sequentially refining predictions by reflecting on actions [7, 15, 63], asking the model to verbalize its thoughts [33, 52, 65], asking the model to critique and revise itself [31, 40], or using other models to critique a primary model's responses [2, 12, 20, 54]. Although a subset of these works does enable a model to improve its own responses, this self-correction ability often requires access to detailed error traces (e.g., execution traces from code compilers [7, 31]) in order to succeed.