Where does LLM recommendation bias actually come from?
Do conversational AI systems inherit popularity bias from their training data or from the datasets they're deployed on? Understanding the source matters for knowing how to fix it.
When GPT-4 makes recommendations on conversational-recommendation benchmarks, its most-frequently-recommended items are not the most popular items in the target dataset. They are the most popular items in some external distribution, presumably the LLM's pretraining corpus.
Empirically: on ReDIAL, popular ground-truth movies like "Avengers: Infinity War" appear about 2% of the time. On Reddit-Movie, popular ground-truth movies like "Everything Everywhere All at Once" appear in less than 0.3% of cases. But GPT-4's recommendations concentrate on different items: "The Shawshank Redemption" appears in around 5% of recommendations on ReDIAL and 1.5% on Reddit-Movie. The same items dominate across datasets even though the datasets have different popularity biases.
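This concentration is straightforward to check. Below is a minimal sketch, assuming each dataset's model outputs are collected as a flat list of recommended-item strings; the variable and function names are hypothetical placeholders, not code from the source:

```python
from collections import Counter

def top_items(recs: list[str], k: int = 5) -> list[tuple[str, float]]:
    """Return the k most-recommended items with their share of all recommendations."""
    counts = Counter(recs)
    total = sum(counts.values())
    return [(item, n / total) for item, n in counts.most_common(k)]

def topk_overlap(recs_a: list[str], recs_b: list[str], k: int = 20) -> float:
    """Fraction of top-k recommended items shared by two datasets' outputs.
    If pretraining popularity drives the model, this stays high even when
    the datasets' ground-truth popularity distributions differ."""
    a = {item for item, _ in top_items(recs_a, k)}
    b = {item for item, _ in top_items(recs_b, k)}
    return len(a & b) / k
```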
This is a different kind of popularity bias than the one collaborative filtering produces. CF popularity bias amplifies the most-clicked items in your training data; the LLM bias imports popularity from a corpus the LLM saw before any of this data existed. It cannot be debiased by the usual dataset-level correction methods because the bias source isn't in the dataset.
The risk is a bias-amplification loop: an LLM CRS deployed in a recommendation product trains future user behavior on its biased outputs, which shifts the dataset toward items that are popular in LLM pretraining, which next-generation LLMs then ingest, which deepens the concentration. Different datasets that should produce different recommendations converge on the same set of "canonical popular items" inherited from the web's general distribution.
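A toy simulation makes the convergence concrete. Assume, purely for illustration, that each generation's interaction data is a convex mix of the previous data and the recommender's pretraining-shaped output distribution; the mixing weight `alpha` and both distributions below are made up, not measured:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items = 1000
dataset_pop = rng.dirichlet(np.ones(n_items))        # dataset-specific popularity
pretrain_pop = rng.dirichlet(np.full(n_items, 0.1))  # head-heavy corpus popularity
alpha = 0.3  # assumed share of interactions driven by the biased recommender

p = dataset_pop.copy()
for gen in range(10):
    # Next-generation data drifts toward what the biased recommender surfaces.
    p = (1 - alpha) * p + alpha * pretrain_pop
    tv = 0.5 * np.abs(p - pretrain_pop).sum()  # total-variation distance
    print(f"gen {gen}: TV distance to pretraining popularity = {tv:.3f}")
```

Under this mixing assumption the distance shrinks geometrically by a factor of (1 - alpha) per generation, no matter where the dataset started.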
The implication for production systems: pretraining-corpus popularity is a domain-shift effect that LLM-as-CRS inherits by construction. Mitigating it requires either dataset-aware fine-tuning or post-hoc re-ranking against a dataset-specific popularity prior, and probably both.
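One hedged sketch of the re-ranking option, assuming the LLM returns candidates with log-probability-like scores and that the deployment dataset supplies a normalized popularity prior; the blend weight `lam` and the floor `eps` are design assumptions, not something the source prescribes:

```python
import math

def rerank(candidates: list[tuple[str, float]],
           dataset_pop: dict[str, float],
           lam: float = 0.5,
           eps: float = 1e-9) -> list[tuple[str, float]]:
    """Blend LLM log-scores with the log of dataset-specific popularity.
    Items the LLM favors but the deployment dataset rarely contains get
    pushed down; items unseen in the dataset get a small floor, not -inf."""
    rescored = []
    for item, llm_logscore in candidates:
        prior = math.log(dataset_pop.get(item, eps))
        rescored.append((item, (1 - lam) * llm_logscore + lam * prior))
    return sorted(rescored, key=lambda s: s[1], reverse=True)
```

Tuning `lam` per dataset is what makes this dataset-aware; with `lam = 0` you recover the raw LLM ranking.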
Source: Recommenders, Conversational
Related concepts in this collection
- Where do recommendation biases come from in language models?
  Do LLM-based recommenders inherit systematic biases from pretraining that differ fundamentally from traditional collaborative filtering systems? Understanding these sources matters for building fairer, more accurate recommendations.
  extends: this note is the empirical instance of that three-bias taxonomy; the pretraining popularity bias is measurable at roughly 5% vs 2%.
- Do LLMs in conversational recommendation systems use collaborative or content knowledge?
  Conversational recommenders powered by LLMs might rely on either collaborative signals (user interaction patterns) or content/context knowledge (semantic understanding). Understanding which signal dominates would reveal how to design and deploy these systems effectively.
  complements: reliance on content rather than collaborative signals is the mechanism by which pretraining popularity leaks in; LLMs use what they know, which is corpus-popular items.
- Does embedding dimensionality secretly drive popularity bias in recommenders?
  Conventional wisdom treats low-dimensional models as overfitting protection. But does this practice inadvertently cause recommenders to systematically favor popular items, reducing diversity and fairness regardless of the optimization metric used?
  complements: classical popularity overfitting and the LLM pretraining popularity leak are parallel mechanisms; both undermine the diversity assumption.
- Why do language models ignore temporal order in ranking?
  When LLMs rank items based on interaction history, do they actually use sequence order or treat it as a set? Understanding this gap matters for building effective LLM-based recommenders.
  complements: zero-shot LLM ranking inherits both popularity bias and order blindness; both are pretraining-distribution artifacts.
Original note title: LLM CRS recommendations exhibit popularity bias inherited from the pretraining corpus, not from the target dataset