Language models are weak learners

Paper · arXiv 2306.14101 · Published June 25, 2023

“A central notion in practical and theoretical machine learning is that of a weak learner, classifiers that achieve better-than-random performance (on any given distribution over data), even by a small margin. Such weak learners form the practical basis for canonical machine learning methods such as boosting. In this work, we illustrate that prompt-based large language models can operate effectively as said weak learners. Specifically, we illustrate the use of a large language model (LLM) as a weak learner in a boosting algorithm applied to tabular data. We show that by providing (properly sampled according to the distribution of interest) text descriptions of tabular data samples, LLMs can produce a summary of the samples that serves as a template for classification and achieves the aim of acting as a weak learner on this task. We incorporate these models into a boosting approach, which in some settings can leverage the knowledge within the LLM to outperform traditional tree-based boosting. The model outperforms both few-shot learning and occasionally even more involved fine-tuning procedures, particularly for tasks involving small numbers of data points. The results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.

We now describe the main methodology of our paper, which uses LLMs to generate weak learners, and in turn, uses these weak learners within a boosting framework. We refer to the full method as Summary Boosting, as the core learning process is one that uses a language model to create a summary of (specifically chosen) samples from the dataset; these summaries themselves function as prompts by which we can make predictions on new examples. Finally, we use boosting to construct an ensemble of these summaries that gives the overall predictions on new data points.”
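The loop described above (sample examples from the current distribution, build a weak learner from them, reweight, and combine) can be sketched with a standard AdaBoost-style update. This is a minimal sketch under stated assumptions, not the paper's implementation: `boost`, `fit_weak_learner`, and `keyword_stub` are hypothetical names, labels are assumed to be binary in {-1, +1}, and the LLM summarization step is replaced by a toy keyword rule so the example runs without any model.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, int]           # (text description, label in {-1, +1})
WeakLearner = Callable[[str], int]  # maps a text description to -1 or +1


def boost(
    data: Sequence[Example],
    fit_weak_learner: Callable[[Sequence[Example]], WeakLearner],
    rounds: int = 5,
    sample_size: int = 8,
    seed: int = 0,
) -> Callable[[str], int]:
    """AdaBoost-style loop; fit_weak_learner stands in for the LLM summary step."""
    rng = random.Random(seed)
    n = len(data)
    weights = [1.0 / n] * n
    ensemble: List[Tuple[float, WeakLearner]] = []

    for _ in range(rounds):
        # Sample examples according to the current boosting distribution.
        subset = rng.choices(data, weights=weights, k=sample_size)
        learner = fit_weak_learner(subset)

        # Weighted error of this learner on the full training set.
        preds = [learner(x) for x, _ in data]
        err = sum(w for w, p, (_, y) in zip(weights, preds, data) if p != y)
        if err >= 0.5:
            continue  # not better than random on this distribution; skip it
        err = max(err, 1e-10)  # avoid division by zero for a perfect learner
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, learner))

        # Up-weight misclassified examples, then renormalise.
        weights = [
            w * math.exp(-alpha * y * p)
            for w, p, (_, y) in zip(weights, preds, data)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]

    def predict(x: str) -> int:
        score = sum(alpha * h(x) for alpha, h in ensemble)
        return 1 if score >= 0 else -1

    return predict


def keyword_stub(subset: Sequence[Example]) -> WeakLearner:
    """Toy stand-in for an LLM summary: pick the token that best tracks the label."""
    vocab = sorted({tok for x, _ in subset for tok in x.split()})

    def agreement(tok: str) -> int:
        return sum(1 if ((tok in x.split()) == (y == 1)) else -1 for x, y in subset)

    best = max(vocab, key=lambda tok: abs(agreement(tok)))
    sign = 1 if agreement(best) >= 0 else -1
    return lambda x: sign if best in x.split() else -sign


# Hypothetical toy data: the label is +1 exactly when age=high.
data = [
    ("age=high income=high", 1),
    ("age=high income=low", 1),
    ("age=low income=high", -1),
    ("age=low income=low", -1),
]
predict = boost(data, keyword_stub)
```

In an actual Summary Boosting setup, `fit_weak_learner` would presumably prompt an LLM to summarize the sampled examples and return a classifier that prompts the LLM with that summary; the reweighting logic around it would stay the same.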

In this paper, we align these two threads of research and ask a simple question: can LLMs also serve as weak learners in a boosting framework, specifically on tabular data (where boosting methods are most commonly applied)? We answer this question largely in the affirmative. Specifically, we show that by converting tabular data to text form and asking an LLM to summarize a carefully chosen set of examples from the data, we obtain a summary that can serve as a template (i.e., a prompt) for a tabular data classifier, one that typically meets the weak-learning criterion of better-than-random performance. This in turn lets us integrate a collection of such LLM-generated weak learners into a boosting framework.
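To make the "tabular data to text" step concrete, the sketch below turns a row into a sentence-style description and assembles a summarization prompt from a few labeled rows. The template wording, the function names `row_to_text` and `build_summary_prompt`, and the column names are all illustrative assumptions; the excerpt does not specify the paper's actual conversion procedure or prompts.

```python
from typing import Mapping, Sequence, Tuple


def row_to_text(row: Mapping[str, object]) -> str:
    """Describe one tabular record as plain sentences (hypothetical template)."""
    parts = [f"The {name.replace('_', ' ')} is {value}" for name, value in row.items()]
    return ". ".join(parts) + "."


def build_summary_prompt(
    examples: Sequence[Tuple[Mapping[str, object], str]],
    target: str,
) -> str:
    """Assemble labeled descriptions plus an instruction asking for a summary."""
    described = [
        f"{row_to_text(row)} Therefore the {target} is {label}."
        for row, label in examples
    ]
    body = "\n".join(described)
    return (
        f"{body}\n"
        f"Summarize in a few sentences how the features above relate to the {target}."
    )


# Hypothetical rows; column names and values are made up for illustration.
examples = [
    ({"age": 52, "resting_bp": 130}, "positive"),
    ({"age": 31, "resting_bp": 110}, "negative"),
]
prompt = build_summary_prompt(examples, target="diagnosis")
print(row_to_text(examples[0][0]))  # The age is 52. The resting bp is 130.
```

The summary an LLM returns for such a prompt would then be prepended to the description of a new row to classify it, which is the sense in which the summary acts as a template.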