Automatic Prompt Optimization with "Gradient Descent" and Beam Search
We propose a simple and nonparametric solution to this problem, Prompt Optimization with Textual Gradients (ProTeGi), which is inspired by numerical gradient descent to automatically improve prompts, assuming access to training data and an LLM API. The algorithm uses minibatches of data to form natural language “gradients” that criticize the current prompt, much like how numerical gradients point in the direction of error ascent. The natural language gradients are then “propagated” into the prompt by editing the prompt in the opposite semantic direction of the gradient. These gradient descent steps are guided by a beam search and bandit selection procedure which significantly improves algorithmic efficiency.
we overcome the discrete optimization barrier by mirroring the steps of gradient descent within a text-based Socratic dialogue (Zeng et al., 2022), substituting differentiation with LLM feedback and backpropagation with LLM editing. In detail, we use minibatches of training data to produce “gradients” in natural language, i.e., descriptions of the current prompts’ flaws with respect to the minibatch, then edit the current prompt in the opposite semantic direction of the gradient. These steps become the expansion part of a wider beam search over the space of prompts, increasing algorithmic efficiency by treating the problem of beam candidate selection as an instance of the best arm identification problem (Audibert et al., 2010).