Schema-learning and rebinding as mechanisms of in-context learning and emergence
In-context learning (ICL) is one of the most powerful and most unexpected capabilities to emerge in recent transformer-based large language models (LLMs). Yet the mechanisms that underlie it are poorly understood. In this paper, we demonstrate that comparable ICL capabilities can be acquired by an alternative sequence prediction learning method using clone-structured causal graphs (CSCGs). Moreover, a key property of CSCGs is that, unlike transformer-based LLMs, they are interpretable, which considerably simplifies the task of explaining how ICL works. Specifically, we show that ICL uses a combination of (a) learning template (schema) circuits for pattern completion, (b) retrieving relevant templates in a context-sensitive manner, and (c) rebinding of novel tokens to appropriate slots in the templates. We go on to marshal evidence for the hypothesis that similar mechanisms underlie ICL in LLMs. For example, we find that, with CSCGs as with LLMs, different capabilities emerge at different levels of overparameterization, suggesting that overparameterization helps in learning more complex template (schema) circuits. By showing how ICL can be achieved with small models and datasets, we open up a path to novel architectures and take a vital step towards a more general understanding of the mechanics behind this important capability.
In a pre-trained sequence model, in-context learning (ICL), or few-shot prompting, is the ability to learn a new task from a small set of examples presented within the context (the prompt) at inference time. Surprisingly, large language models (LLMs) trained on sufficient data exhibit ICL, even though they are trained only with the objective of next-token prediction [1, 2]. A good deal of the ongoing excitement surrounding LLMs arises from this unexpected capacity, since it dramatically enlarges their set of potential applications. Attempts to understand this capability are ongoing and take a variety of forms, including higher-level normative accounts using Bayesian inference [3] and mechanistic explanations involving implicit gradient descent [4] or induction heads [5]. Despite these efforts, the mechanisms that underlie ICL in LLMs remain somewhat mysterious.
In this paper, we take an alternative approach. We reveal the conditions that drive ICL in a different sequence learning model called a clone-structured causal graph (CSCG) [6, 7]. Using a combination of new and standard datasets, we show how a CSCG assigns non-zero probabilities to sequences never seen during training in a way that, thanks to the model’s causal graph structure, is open to explicit and mechanistic interpretation. We hypothesize that similar mechanisms will exist in transformer-based LLMs, and show how this could be the case.
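To make this concrete, the sketch below illustrates one standard way a CSCG can be viewed for scoring purposes: as a hidden Markov model whose hidden states are partitioned into "clones" of each observable token, with deterministic emissions, so that a sequence's probability comes from the forward recursion restricted to the clone set of each observed token. This is a minimal illustration under that assumption, not the authors' implementation; the function and variable names are ours.

import numpy as np

def cscg_sequence_logprob(tokens, T, pi, clones_of):
    """Log-probability of a token sequence under a clone-structured HMM (CSCG).

    tokens    : list of observed token ids, e.g. [3, 1, 4]
    T         : (H, H) transition matrix over hidden (clone) states
    pi        : (H,) initial distribution over hidden states
    clones_of : dict mapping token id -> indices of its clone states
    """
    alpha = np.zeros(T.shape[0])
    first = clones_of[tokens[0]]
    alpha[first] = pi[first]                  # only clones of the first token can be active
    logprob = 0.0
    for t, v in enumerate(tokens):
        if t > 0:
            alpha = alpha @ T                 # propagate through the learned transition graph
            mask = np.zeros_like(alpha)
            mask[clones_of[v]] = 1.0
            alpha = alpha * mask              # deterministic emission: keep only clones of v
        norm = alpha.sum()
        if norm == 0.0:
            return -np.inf                    # the model assigns zero probability to this sequence
        logprob += np.log(norm)
        alpha = alpha / norm                  # rescale to avoid numerical underflow
    return logprob

Under this view, a sequence never seen verbatim during training can still receive non-zero probability whenever some path through the clone states supports it, which is the property the causal graph structure makes easy to inspect.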