What anchoring effects shape how users rate items in sequence?

This reads the question as being about sequence-dependent rating bias: the way an item's score gets pulled by what came just before it, or by prior expectations the rater brings in, rather than about ranking algorithms per se.

The question concerns anchoring in the human sense: when people rate items one after another, does the order itself bend the numbers? The corpus doesn't have a single paper that runs the classic anchoring experiment, but it circles the same territory from several angles worth stitching together. The sharpest entry point is the finding that not every rating measures a stable inner preference at all. One line of work shows that annotation responses decompose into three different things (genuine preferences, non-attitudes, and preferences *constructed on the spot*), and you can tell them apart by whether they stay consistent when the measurement conditions change (see "Do all annotation responses measure the same underlying thing?"). That constructed-in-the-moment category is exactly where anchoring lives: if a rating is built fresh each time, the surrounding context (including the previous item) is part of what builds it.

The second thread is about priors. Whether connected products converge to similar ratings or diverge depends on the *type* of recommendation link between them: "frequently bought together" and "co-viewed" networks produce different convergence patterns, because each surfaces products to a different audience carrying different expectations (see "Do different recommender types shape opinion convergence differently?"). So the anchor isn't only the last thing you saw; it's also the expectation the system primed you with before you even arrived. Scaled up, feeds become persuasion infrastructure where these priming and contamination effects compound across a whole population (see "How do recommendation feeds shape what people see and believe?").

There's also a clean example of a non-content cue hijacking judgment: people rate AI responses higher when there are simply *more* citations, even when those citations are irrelevant. Citation count works as a decoupled trust heuristic, almost as strong when the citations are irrelevant (β=0.273) as when they are relevant (β=0.285) (see "Do users trust citations more when there are simply more of them?"). That's anchoring by a surface feature rather than by position, and it's a reminder that ratings latch onto whatever salient signal is cheapest to read.

On the machine side, sequence order turns out to be a latent variable that's easy to ignore and easy to recover. Language models doing ranking disregard the temporal order of a user's history by default, but recency-focused prompts switch the sensitivity back on (see "Why do language models ignore temporal order in ranking?"), and recency-based recall beats similarity-based retrieval when summarizing what a user actually prefers (see "Does abstract preference knowledge outperform specific interaction recall?"). Recency weighting is the algorithmic cousin of a recency anchor: a human rater gives the most recent item disproportionate weight unless you deliberately correct for it, whereas these systems have to be prompted to weight it at all. Even in conversational recommendation, the *order* in which items get mentioned carries dependency information that bag-of-mentions models throw away (see "Does conversation order matter for recommending items in dialogue?").
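To make "recency weighting" concrete, here is a minimal sketch of one common way to do it: exponential decay over a history ordered oldest to newest. This is an illustration only, not the method used in any of the cited notes; the function names, the `half_life` parameter, and the toy ratings are all assumptions.

```python
def recency_weights(n, half_life=2.0):
    """Normalized exponential-decay weights for n items ordered oldest -> newest.

    The newest item has age 0; an item `half_life` positions older gets half
    its raw weight. (Hypothetical helper, for illustration only.)
    """
    if n <= 0:
        return []
    raw = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def recency_weighted_mean(ratings, half_life=2.0):
    """Weighted mean of ratings, giving later (more recent) ratings more weight."""
    weights = recency_weights(len(ratings), half_life)
    return sum(w * r for w, r in zip(weights, ratings))


# Toy history (made up): three low ratings, then one recent high rating.
history = [2.0, 2.0, 2.0, 5.0]
uniform_mean = sum(history) / len(history)
recent_mean = recency_weighted_mean(history, half_life=1.0)
```

With this toy history the recency-weighted summary sits above the plain average, because the single most recent rating carries more than half of the total weight. A shorter `half_life` strengthens that pull; a very long one approaches the uniform mean.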

The honest synthesis: the collection has strong material on the *ingredients* of anchoring — constructed preferences, priming by prior expectation, surface-cue heuristics, recency weighting, and order-dependence — but no study that isolates a numeric anchoring effect in human sequential ratings directly. If that exact effect is what you're chasing, the constructed-preference and recommender-convergence pieces are the closest doorways, and they suggest the more useful question isn't "is there an anchor" but "which signal is the anchor borrowing its weight from."
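For readers who want to look for the missing effect in their own data, one simple first check is a lag-1 regression: how much of each rating is predicted by the rating given immediately before it. The sketch below is a hypothetical illustration, not an analysis drawn from any of the notes. It assumes items were shown in randomized order, so that item quality is unrelated to position, and the function name and toy sessions are made up.

```python
def lag1_slope(ratings):
    """Ordinary least-squares slope of rating[t] on rating[t-1] within one session.

    Under randomized presentation order (an assumption), a positive slope is
    consistent with assimilation toward the previous rating and a negative
    slope with contrast away from it.
    """
    prev = ratings[:-1]
    curr = ratings[1:]
    n = len(prev)
    if n < 2:
        raise ValueError("need at least three ratings in the session")
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    var_prev = sum((p - mean_prev) ** 2 for p in prev)
    if var_prev == 0:
        raise ValueError("previous ratings have no variance")
    cov = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    return cov / var_prev


# Toy sessions (made up) showing the two extremes.
drifting = [1, 2, 3, 4, 5]      # each rating tracks the one before it
alternating = [1, 5, 1, 5, 1]   # each rating swings away from the one before it
```

A slope near zero would suggest no carryover from the previous item. In real data this check should be pooled across many raters and orderings, since a single session can show a trend for reasons unrelated to anchoring.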

Sources (7 notes)

Do all annotation responses measure the same underlying thing?

Behavioral science reveals that annotations contain genuine preferences, non-attitudes, and constructed preferences—distinguishable by consistency across measurement conditions. Treating them uniformly contaminates reward model training and downstream alignment.

Do different recommender types shape opinion convergence differently?

Research shows that frequently-bought-together and co-viewed recommendation networks produce different opinion convergence patterns. The mechanism: each recommender type attracts different audience segments with different prior expectations, shaping both who sees products together and how they rate them.

How do recommendation feeds shape what people see and believe?

Research shows recommendation systems operate as political actors: feed weights influence producer behavior, network topology drives opinion convergence, and automation enables targeted persuasion at population scale. These effects compound through rating contamination and selection biases.

Do users trust citations more when there are simply more of them?

Analysis of 24,000 Search Arena interactions shows irrelevant citations boost user preference (β=0.273) nearly as much as relevant citations (β=0.285), indicating citation count functions as a decoupled trust heuristic.

Why do language models ignore temporal order in ranking?

LLMs can extract preferences from interaction histories but disregard temporal order by default. Recency-focused prompts and in-context examples activate latent order-sensitivity, improving ranking without retraining.

Does abstract preference knowledge outperform specific interaction recall?

PRIME framework shows semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning exceeds preference tuning methods.

Does conversation order matter for recommending items in dialogue?

TSCR models items and entities in the order they appear in CRS dialogue, using transformers to learn dependencies between sequential mentions. This recovers information that bag-of-mentions approaches discard, improving recommendation accuracy on standard benchmarks.
