How much does demo position alone affect in-context learning accuracy?
Moving demonstrations from prompt start to end without changing their content produces surprisingly large accuracy swings. Does spatial position in the prompt matter more than what demonstrations actually contain?
In-context learning performance is sensitive to which demos are selected and in what order they appear — that much was known. What the DPP bias paper reveals is a different and larger effect: where the entire demo block sits relative to other prompt components (system prompt, user message) matters more than the content of the demos themselves.
Moving an unchanged block of demonstrations from the start of a prompt to the end can swing task accuracy by up to 20% and flip almost half of the model's predictions. This is a purely spatial effect. The demos are identical — their position relative to instructions and queries is the only variable. The effect spans classification, QA, summarization, and reasoning tasks, measured via two metrics: ACCURACY-CHANGE (net accuracy shift) and PREDICTION-CHANGE (output volatility from repositioning).
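The two metrics are simple to compute from paired runs. A minimal sketch, assuming you have predictions from the demos-first and demos-last prompt variants on the same evaluation set (function and variable names here are illustrative, not from the paper's code):

```python
def accuracy_change(preds_start, preds_end, gold):
    """Net accuracy shift when the demo block moves from prompt start to end."""
    def acc(preds):
        return sum(p == g for p, g in zip(preds, gold)) / len(gold)
    return acc(preds_end) - acc(preds_start)

def prediction_change(preds_start, preds_end):
    """Fraction of examples whose prediction flips under repositioning,
    regardless of whether either prediction is correct."""
    flips = sum(a != b for a, b in zip(preds_start, preds_end))
    return flips / len(preds_start)
```

Note that the two metrics can diverge: flips that trade one correct answer for another incorrect one (or vice versa) cancel in ACCURACY-CHANGE but still count toward PREDICTION-CHANGE, which is why repositioning can flip nearly half the predictions while shifting net accuracy much less.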
The mechanistic hypothesis draws on three architectural tendencies: primacy bias (transformers disproportionately emphasize early tokens due to induction head mechanisms), sequential processing bias (earlier context steers subsequent predictions more strongly), and lost-in-the-middle (tokens in middle positions receive less attention). These are known individually, but the DPP paper provides the first role-aware stress test — examining how these biases interact with prompt roles (system vs. user).
This extends the ordering sensitivity documented in Why do chain-of-thought examples fail across different conditions? to a larger spatial scale. CoT exemplar brittleness finds 3.3% accuracy swings from shuffling exemplars among themselves. DPP bias finds 20% swings from repositioning the entire block — roughly 6x the magnitude, operating at prompt-architecture level rather than within-exemplar level. Both share a root cause: LLMs process prompts as sequential narratives, not as unordered information sets.
The connection to How much does the order of premises actually matter for reasoning? reinforces the pattern: ordering effects appear at every spatial granularity, within premises (30%), within exemplar sets (3.3%), and across prompt components (20%). The likely shared mechanism is the one probed in Does transformer attention architecture inherently favor repeated content?: positional prominence in the attention window is an architectural property, not a training artifact.
The practical implication for prompt engineering is that demo placement is not a formatting choice but a performance-critical decision. For production systems using ICL, the position of demonstrations relative to instructions should be treated as a hyperparameter — and one that may need task-specific tuning.
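Treating placement as a hyperparameter can be as simple as parameterizing the prompt assembly and sweeping the options on a dev set. A minimal sketch under that assumption; `evaluate` stands in for whatever accuracy harness the system already has, and all names are hypothetical:

```python
def build_prompt(system, demos, query, demo_position="start"):
    """Assemble a prompt with the demonstration block at a chosen position.

    demo_position is the tunable hyperparameter: the demo block goes either
    between the system prompt and the query, or after the query.
    """
    demo_block = "\n\n".join(demos)
    if demo_position == "start":
        parts = [system, demo_block, query]
    elif demo_position == "end":
        parts = [system, query, demo_block]
    else:
        raise ValueError(f"unknown demo_position: {demo_position!r}")
    return "\n\n".join(parts)

def tune_demo_position(system, demos, dev_queries, dev_labels, evaluate):
    """Sweep demo placement on a dev set; return the best position and all scores."""
    scores = {}
    for pos in ("start", "end"):
        prompts = [build_prompt(system, demos, q, pos) for q in dev_queries]
        scores[pos] = evaluate(prompts, dev_labels)
    return max(scores, key=scores.get), scores
```

Since the effect is task-dependent, the sweep belongs in the evaluation loop for each task rather than being fixed once globally.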
Source: Context Engineering
Related concepts in this collection
- Why do chain-of-thought examples fail across different conditions?
  Chain-of-thought exemplars show surprising sensitivity to order, complexity level, diversity, and annotator style. Understanding these brittleness dimensions could reveal what makes reasoning prompts robust or fragile.
  Relation: extends order sensitivity from within-exemplar (3.3%) to prompt-architecture level (20%); shared sequential-processing mechanism at different spatial scales.
- How much does the order of premises actually matter for reasoning?
  When you rearrange the order of logical premises in a deduction task, does it change how well language models can solve it? This tests whether LLMs reason abstractly or process input sequentially.
  Relation: ordering effects at every granularity: premises, exemplars, prompt components.
- Does transformer attention architecture inherently favor repeated content?
  Explores whether soft attention's tendency to over-weight repeated and prominent tokens explains sycophancy independent of training. Questions whether architectural bias precedes and enables RLHF effects.
  Relation: primacy bias and positional prominence as architectural root cause.
Original note title: demo position in prompt creates a spatial bias that swings ICL accuracy by up to 20 percent independent of demo content