Topics: LLM Reasoning and Architecture Design · LLM Interaction · Language Understanding and Pragmatics

How much does demo position alone affect in-context learning accuracy?

Moving demonstrations from prompt start to end without changing their content produces surprisingly large accuracy swings. Does spatial position in the prompt matter more than what demonstrations actually contain?

Note · 2026-02-23 · sourced from Context Engineering

In-context learning performance is sensitive to which demos are selected and in what order they appear — that much was known. What the DPP bias paper reveals is a different and larger effect: where the entire demo block sits relative to other prompt components (system prompt, user message) matters more than the content of the demos themselves.
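To make the manipulation concrete, here is a minimal sketch, assuming a chat-style API with system and user roles. The slot names and message format are illustrative assumptions rather than the paper's terminology; the point is that the demo string is byte-identical across all variants.

```python
def build_messages(system, demos, query, position):
    """Place an unchanged demo block into one of four prompt slots.

    The demos never change; only where they sit relative to the
    system prompt and the user query does.
    """
    if position == "system_start":
        return [{"role": "system", "content": f"{demos}\n\n{system}"},
                {"role": "user", "content": query}]
    if position == "system_end":
        return [{"role": "system", "content": f"{system}\n\n{demos}"},
                {"role": "user", "content": query}]
    if position == "user_start":
        return [{"role": "system", "content": system},
                {"role": "user", "content": f"{demos}\n\n{query}"}]
    if position == "user_end":  # demo block moved to the very end of the prompt
        return [{"role": "system", "content": system},
                {"role": "user", "content": f"{query}\n\n{demos}"}]
    raise ValueError(f"unknown position: {position}")
```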

Moving an unchanged block of demonstrations from the start of a prompt to the end can swing task accuracy by up to 20% and flip almost half of the model's predictions. This is a purely spatial effect. The demos are identical — their position relative to instructions and queries is the only variable. The effect spans classification, QA, summarization, and reasoning tasks, measured via two metrics: ACCURACY-CHANGE (net accuracy shift) and PREDICTION-CHANGE (output volatility from repositioning).
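Both metrics reduce to simple comparisons over paired predictions from two layouts. A sketch under one plausible reading of the definitions above (the paper's exact formulations may differ):

```python
def accuracy_change(preds_a, preds_b, gold):
    """ACCURACY-CHANGE: net accuracy shift when the demo block is repositioned."""
    def acc(preds):
        return sum(p == g for p, g in zip(preds, gold)) / len(gold)
    return acc(preds_b) - acc(preds_a)

def prediction_change(preds_a, preds_b):
    """PREDICTION-CHANGE: fraction of examples whose prediction flips."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)
```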

The mechanistic hypothesis draws on three architectural tendencies: primacy bias (transformers disproportionately emphasize early tokens due to induction head mechanisms), sequential processing bias (earlier context steers subsequent predictions more strongly), and lost-in-the-middle (tokens in middle positions receive less attention). These are known individually, but the DPP paper provides the first role-aware stress test — examining how these biases interact with prompt roles (system vs. user).

This extends the ordering sensitivity documented in Why do chain-of-thought examples fail across different conditions? to a larger spatial scale. CoT exemplar brittleness finds 3.3% accuracy swings from shuffling exemplars among themselves. DPP bias finds 20% swings from repositioning the entire block — roughly 6x the magnitude, operating at prompt-architecture level rather than within-exemplar level. Both share a root cause: LLMs process prompts as sequential narratives, not as unordered information sets.

The connection to How much does the order of premises actually matter for reasoning? reinforces the pattern: ordering effects appear at every spatial granularity (within premises, 30%; within exemplar sets, 3.3%; across prompt components, 20%). The shared mechanism points back to Does transformer attention architecture inherently favor repeated content?: positional prominence in the attention window is an architectural property, not a training artifact.

The practical implication for prompt engineering is that demo placement is not a formatting choice but a performance-critical decision. For production systems using ICL, the position of demonstrations relative to instructions should be treated as a hyperparameter — and one that may need task-specific tuning.
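A minimal tuning loop along those lines, reusing the build_messages sketch above. Here call_model is a hypothetical stand-in for whatever inference API is in use, and dev_set is a list of (query, gold_label) pairs:

```python
def tune_demo_position(dev_set, system, demos, call_model):
    """Treat the demo block's position as a hyperparameter: return the
    slot that maximizes accuracy on a held-out dev set."""
    positions = ["system_start", "system_end", "user_start", "user_end"]

    def accuracy(position):
        preds = [call_model(build_messages(system, demos, query, position))
                 for query, _ in dev_set]
        return sum(p == gold for p, (_, gold) in zip(preds, dev_set)) / len(dev_set)

    return max(positions, key=accuracy)
```

Even a small dev set should suffice here: the variable being tuned takes only a handful of values, and the swings the note describes are large enough to show up without many samples.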


Source: Context Engineering

Original note title: demo position in prompt creates a spatial bias that swings ICL accuracy by up to 20 percent independent of demo content