Large Language Model Programs
In recent years, large pre-trained language models (LLMs) have demonstrated the ability to follow instructions and perform novel tasks from a few examples. Parameterising an LLM through such in-context examples widens its capabilities at a much lower cost than finetuning. We extend this line of reasoning and present a method which further expands the capabilities of an LLM by embedding it within an algorithm or program. To demonstrate the benefits of this approach, we present an illustrative example of evidence-supported question-answering.
As an alternative, we propose embedding LLMs into a program or algorithm. Crucially, instead of the LLM being responsible for maintaining the current state of the program (i.e. its context), the LLM is presented, at each step of the program, with only a step-specific prompt and context. Hiding information that is irrelevant to the current step allows us to focus on isolated subproblems whose results are combined in later calls to the LLM. This intuitive approach lets us apply an LLM to more complex tasks that are currently out of reach, whether because the model lacks the capability or because of an architectural constraint such as an insufficiently large context.
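To make the idea concrete, the following is a minimal sketch (not from the paper; the function names and the toy model are our own) of how a step-specific prompt hides everything except the context the current step needs:

```python
# Hypothetical sketch: each program step sees only a step-specific
# instruction plus the minimal context relevant to that step, never
# the full program state.

def make_step_prompt(instruction, relevant_context):
    """Build the prompt for one step from only the information
    relevant to that step; everything else stays hidden."""
    lines = [instruction]
    lines += [f"- {item}" for item in relevant_context]
    lines.append("Answer:")
    return "\n".join(lines)

def run_step(llm, instruction, relevant_context):
    """Run a single program step as one isolated LLM call."""
    return llm(make_step_prompt(instruction, relevant_context))

# Stand-in for a real LLM call (e.g. an API request); it simply
# echoes the last context line so the sketch is self-contained.
def toy_llm(prompt):
    return prompt.splitlines()[-2].lstrip("- ")
```

The program, not the model, decides which pieces of state reach each call, so the context stays small regardless of how long the overall computation runs.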
Another drawback of training on unfiltered internet text is that such raw LLMs struggle to hold a conversation, follow instructions, use tools, or interact with an environment.
Another limitation of LLMs arises from their decoder-only Transformer architecture, whose finite context window restricts the amount of information they can process at once. While this architecture may be advantageous for tasks like translation or language modelling, it struggles to learn certain classes of algorithms accurately, which can affect its ability to generalize beyond its training distribution (Csordás et al., 2021).
To overcome such limitations, we propose LLM programs, a general approach to enhance the capability of an LLM-based system. With an LLM program, we accept the limitations of the LLM as a general agent. Instead of further training the model, we recursively decompose the expected behaviour into simpler steps that the LLM can perform to a sufficient degree. These individual steps are then strung together by a conventional computer program (e.g., written in Python) that parses the outputs of previous steps, uses control flow, and augments the prompts of succeeding steps.
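As an illustrative sketch of such a program for the evidence-supported question-answering example, consider the hypothetical decomposition below (the step functions, prompts, and the toy model are our own assumptions, not the paper's implementation): Python control flow strings together two simple LLM calls, parsing the output of the first to decide what reaches the second.

```python
# Hypothetical LLM program: a filter step and an answer step, glued
# together by ordinary Python control flow.

def relevance_step(llm, question, paragraph):
    """Step 1: an isolated subproblem — decide whether one paragraph
    is relevant to the question. The program parses the reply."""
    prompt = (f"Question: {question}\n"
              f"Paragraph: {paragraph}\n"
              "Relevant (yes/no):")
    return llm(prompt).strip().lower().startswith("yes")

def answer_step(llm, question, evidence):
    """Step 2: answer the question, prompted only with the evidence
    retained by step 1."""
    prompt = ("Evidence:\n" + "\n".join(evidence) +
              f"\nQuestion: {question}\nAnswer:")
    return llm(prompt)

def answer_with_evidence(llm, question, paragraphs):
    """The surrounding program: control flow over simple LLM calls."""
    evidence = [p for p in paragraphs
                if relevance_step(llm, question, p)]
    return answer_step(llm, question, evidence)

# Stand-in for a real LLM so the sketch runs end to end: it answers
# the yes/no prompt by keyword match and the final prompt with "Paris".
def toy_llm(prompt):
    if prompt.endswith("(yes/no):"):
        return "yes" if "Paris" in prompt else "no"
    return "Paris"
```

Because each paragraph is judged in its own call, the program can scan arbitrarily many paragraphs without ever exceeding the model's context window.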