System 2 Attention (is something you might need too)

Paper · arXiv 2311.11829 · Published November 20, 2023
Reasoning by Reflection · Alignment

Soft attention in Transformer-based Large Language Models (LLMs) is susceptible to incorporating irrelevant information from the context into its latent representations, which adversely affects next token generations. To help rectify these issues, we introduce System 2 Attention (S2A), which leverages the ability of LLMs to reason in natural language and follow instructions in order to decide what to attend to. S2A regenerates the input context to only include the relevant portions, before attending to the regenerated context to elicit the final response.

We posit that the underlying problem is inherent in the way the transformer itself is built, and in particular its attention mechanism. That is, soft attention tends to assign probability to a large portion of the context, including irrelevant portions, and tends to overly focus on repeated tokens, partly due to the way it is trained (Holtzman et al., 2019; Welleck et al., 2019)...
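As a toy illustration of why soft attention spreads probability over irrelevant context, consider a single softmax over attention scores. The scores below are invented for the example (one "relevant" token and nine "irrelevant" ones) and are not taken from the paper; the point is only that softmax gives every position a strictly positive weight, so many weakly scored tokens can add up to a sizeable share.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical attention scores: index 0 is the relevant token,
# the remaining nine are irrelevant distractors with a lower score.
scores = [4.0] + [1.0] * 9
weights = softmax(scores)

relevant_mass = weights[0]
irrelevant_mass = sum(weights[1:])  # about 0.31 for these scores

print(f"relevant: {relevant_mass:.3f}, irrelevant: {irrelevant_mass:.3f}")
```

Even though each distractor scores well below the relevant token, together they receive roughly a third of the attention mass in this made-up setting. A hard selection step, which S2A approximates by regenerating the context, would drop them entirely.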

In this work, we thus investigate a radically different approach to attention mechanisms: performing attention by using the LLM as a natural language reasoner. Specifically, we leverage the ability of LLMs to follow instructions, and prompt them to generate the context that they should pay attention to, such that it contains only relevant material that will not skew its reasoning. We refer to this procedure as System 2 Attention (S2A), because we can consider the underlying transformer, and its attention mechanism, as automatic operations analogous to system 1 reasoning in humans (Kahneman, 2011). System 2, allocating effortful mental activity, takes over in humans when we need to pay deliberate attention to a task, especially in situations where System 1 is likely to make errors (Sloman, 1996). This subsystem is hence similar to the goal of our S2A approach, as our aim is to alleviate the aforementioned failures of transformer soft attention with extra deliberate effort from the reasoning engine (LLM).
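The two-step procedure described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `llm` is a hypothetical callable that maps a prompt string to a completion string, and both prompt templates are illustrative wording rather than the prompts used in the paper.

```python
from typing import Callable

# Assumption: any instruction-following model wrapped as prompt -> text.
LLM = Callable[[str], str]

# Illustrative prompts only; the paper's exact wording is not reproduced here.
REWRITE_TEMPLATE = (
    "Rewrite the text below so that it keeps only the material relevant "
    "to answering the question it contains. Leave out opinions and "
    "unrelated details.\n\nText:\n{context}\n\nRewritten text:"
)
ANSWER_TEMPLATE = "{context}\n\nAnswer the question above."


def system2_attention(llm: LLM, context: str) -> str:
    """Regenerate the context, then answer from the regenerated context."""
    # Step 1: ask the model to produce the context it should attend to.
    regenerated = llm(REWRITE_TEMPLATE.format(context=context))
    # Step 2: answer using only the regenerated context, so the original
    # (possibly distracting) text never reaches the final call.
    return llm(ANSWER_TEMPLATE.format(context=regenerated))
```

The design choice worth noting is that the second call sees only the regenerated text. If the original context were passed along as well, the irrelevant portions could still influence the final response through ordinary soft attention.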

Large Language Models obtain excellent reasoning capabilities and a vast quantity of knowledge through their pre-training process. Their next-word prediction objective requires them to pay close attention to the current context. For example, if a certain entity is mentioned in a context, it is likely that the same entity will appear again later in the same context. Transformer-based LLMs are capable of learning such statistical correlations as the soft attention mechanism allows them to find similar words and concepts within their context. While this may improve the next word prediction accuracy, it also makes LLMs susceptible to being adversely affected by spurious correlations in their context. For example, it is known that the probability of a repeated phrase increases with each repetition, creating a positive feedback loop (Holtzman et al., 2019). Generalizing this issue to so-called non-trivial repetition (Roller et al., 2020), models tend to repeat related topics in the context as well, not just specific tokens, because the latent representation is likely predictive of more tokens from that same topic space. When the context contains an opinion that the model copies, this is termed sycophancy (Perez et al., 2022), but in general we argue this issue is related to any kind of context as discussed above, not just the issue of agreement with opinions.