Where to Show Demos in Your Prompt: A Positional Bias of In-Context Learning
In-context learning (ICL) is a critical emerging capability of large language models (LLMs), enabling few-shot learning during inference by including a few demonstrations (demos) in the prompt. However, it has been found that ICL's performance can be sensitive to the choice of demos and their order. This paper investigates a previously unexplored positional bias of ICL: we observe that predictions and accuracy can drift drastically when the positions of the demos, system prompt, and user message in the LLM input are varied. We refer to this bias as the DEMOS' POSITION IN PROMPT (DPP) bias. We design a systematic evaluation pipeline to study this type of positional bias across classification, QA, summarization, and reasoning tasks. We introduce two metrics, ACCURACY-CHANGE and PREDICTION-CHANGE, to quantify the net gains and the output volatility induced by changing the demos' position.
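As a minimal illustration of these two metrics (a sketch under assumed definitions, not the paper's exact formulation), ACCURACY-CHANGE can be read as the signed accuracy difference between two demo placements and PREDICTION-CHANGE as the fraction of flipped predictions; the function names and the exact-match scoring below are our own assumptions.

```python
from typing import Sequence


def accuracy_change(baseline_preds: Sequence[str],
                    shifted_preds: Sequence[str],
                    gold: Sequence[str]) -> float:
    """Net accuracy gain (or loss) after moving the demo block.

    Positive values mean the new demo position helps; negative values mean
    it hurts. Assumes exact-match scoring against the gold answers.
    """
    base_acc = sum(p == g for p, g in zip(baseline_preds, gold)) / len(gold)
    new_acc = sum(p == g for p, g in zip(shifted_preds, gold)) / len(gold)
    return new_acc - base_acc


def prediction_change(baseline_preds: Sequence[str],
                      shifted_preds: Sequence[str]) -> float:
    """Fraction of examples whose prediction flips when only the demo
    position changes; a proxy for output volatility."""
    flips = sum(b != s for b, s in zip(baseline_preds, shifted_preds))
    return flips / len(baseline_preds)
```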
We discover a novel positional bias in in-context learning (ICL): DPP bias, in which moving an unchanged block of demos from the start of a prompt to the end can swing task accuracy by up to 20 percentage points and flip almost half of a model's predictions (see Fig. 1). This effect is purely spatial and independent of demo content, and it challenges the widespread assumption that large language models learn robustly from any properly formatted context. Despite growing awareness of prompt sensitivity, the role of demo positioning, i.e., where demos are placed relative to instructions, queries, or other contextual elements, remains underexplored. Prior studies have focused primarily on demo selection (Liu et al., 2022) or template phrasing (Cho et al., 2024; Voronov et al., 2024), leaving a gap in understanding how spatial arrangements modulate ICL efficacy. This paper addresses this gap through a systematic investigation of positional effects across eight tasks spanning classification, reasoning, and generation, as sketched below.
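The sketch below shows one way such spatial rearrangements can be constructed for a chat-formatted model: the same demo block is moved across four positions relative to the system prompt and the user query. The position names and the chat-message layout are illustrative assumptions, not necessarily the paper's exact taxonomy.

```python
def build_prompt(demos: str, instruction: str, query: str,
                 position: str = "start_of_system") -> list[dict]:
    """Assemble a chat-style prompt with the demo block at one of four
    positions; everything except the demo placement stays identical."""
    if position == "start_of_system":
        return [{"role": "system", "content": f"{demos}\n\n{instruction}"},
                {"role": "user", "content": query}]
    if position == "end_of_system":
        return [{"role": "system", "content": f"{instruction}\n\n{demos}"},
                {"role": "user", "content": query}]
    if position == "start_of_user":
        return [{"role": "system", "content": instruction},
                {"role": "user", "content": f"{demos}\n\n{query}"}]
    if position == "end_of_user":
        return [{"role": "system", "content": instruction},
                {"role": "user", "content": f"{query}\n\n{demos}"}]
    raise ValueError(f"unknown position: {position}")
```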
2.2 Mechanistic Hypothesis
Recent research attributes positional bias in transformer-based models to intrinsic architectural tendencies, notably primacy bias and induction heads. Olsson et al. (2022) and Chan et al. (2022) show that transformers disproportionately emphasize early tokens due to induction-head mechanisms, so the initial context strongly steers subsequent predictions. Similarly, Xiao et al. (2024) note a sequential processing bias towards earlier context, which degrades performance when crucial information appears later in the sequence. Liu et al. (2023) further observe that tokens in the middle of a sequence receive less attention, leading to performance degradation, and Bietti et al. (2023) link primacy bias to underlying transformer memory mechanisms. While these hypotheses illuminate why order matters, empirical work on how they interact with prompt roles (system vs. user) is scarce. We provide the first role-aware stress test of these mechanisms.
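As a purely illustrative probe of these attention-based explanations (not the paper's method), one could measure how much attention the final query token assigns to the demo span when the demos sit early versus late in the prompt. The sketch below assumes a Hugging Face causal LM with a fast tokenizer; the model choice and function name are our own.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical probe: attention mass from the last (answer-generating) token
# onto the demo span, averaged over layers and heads.
model_name = "gpt2"  # any causal LM whose tokenizer supports offset mapping
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_attentions=True)
model.eval()


def demo_attention_mass(prompt: str, demo_span: tuple[int, int]) -> float:
    """Average attention from the final token to the character span
    (start, end) occupied by the demos inside `prompt`."""
    enc = tok(prompt, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]
    with torch.no_grad():
        out = model(**enc)
    # Token indices whose character offsets fall inside the demo span.
    idx = [i for i, (s, e) in enumerate(offsets.tolist())
           if s >= demo_span[0] and e <= demo_span[1]]
    # Attentions: tuple of (batch, heads, seq, seq) per layer; keep the
    # last query position and sum over the demo-token keys.
    att = torch.stack(out.attentions).squeeze(1)[:, :, -1, :]
    return att[:, :, idx].sum(-1).mean().item()
```

Comparing this quantity for prompts that place the same demos at the start versus the end of the input gives a rough, model-internal view of the primacy effects described above.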