INQUIRING LINE

Why do review corpora contain biases that affect generated comparisons?

This reads the question as: when LLMs generate reviews, evaluations, or comparisons, why do the corpora they learned from carry biases that bend those outputs — and where do those biases actually originate?


This explores where the biases in LLM-generated reviews and comparisons come from, and the notes collected here point to a surprising answer: most of those biases aren't in the review data at all. They're baked in upstream, during pretraining. A causal experiment using random-seed variation and cross-tuning found that models sharing a pretrained backbone show similar cognitive bias patterns regardless of the finetuning data they receive; finetuning only nudges biases that pretraining already planted (see "Where do cognitive biases in language models come from?"). So when a generated comparison leans a certain way, the corpus to blame is often the giant unlabeled pretraining mix, not the curated review set.

That origin story repeats across domains. LLM-based recommenders inherit three distinct biases (position, popularity, and fairness) from the pretraining objective and the demographics of the training corpus rather than from user interaction logs, which is why mitigation needs LLM-specific approaches instead of adapted collaborative-filtering techniques (see "Where do recommendation biases come from in language models?"). Even causal reasoning errors that look like model flaws, such as weak explaining away and Markov violations in collider networks, match human error patterns, which suggests a shared mechanism rooted in the statistics of the training text (see "Do large language models make the same causal reasoning mistakes as humans?"). The pattern: comparisons skew because the training distribution skewed first.

Alignment training adds its own thumb on the scale. Off-the-shelf models generate inappropriately positive reviews even for products a user hated, because politeness was trained in during alignment; overriding it takes user review history, rating signals, and supervised finetuning (see "Why do LLMs generate polite reviews even when users hated products?"). There's a structural reason this happens: token generation is a smooth probabilistic continuation toward the training distribution, not an exploration of competing positions, so a model produces agreeable, on-distribution claims rather than weighing rival views (see "Does LLM generation explore competing claims while producing text?").

The most interesting twist is what happens when the model becomes the judge of the comparison. LLM judges fall for cheap surface signals, namely fake authority references and rich formatting. These authority and beauty biases are semantics-agnostic and exploitable with zero-shot attacks that need no model access or optimization (see "Can LLM judges be fooled by fake credentials and formatting?"). Humans do something similar: across 24,000 Search Arena interactions, irrelevant citations raised user preference (β=0.273) nearly as much as relevant ones (β=0.285), so citation count works as a trust heuristic decoupled from citation quality (see "Do users trust citations more when there are simply more of them?"). And models systematically over-trust their own outputs, because a high-probability answer simply feels more correct during self-evaluation (see "Why do models trust their own generated answers?"). So a generated comparison can be biased at three layers at once: the corpus it learned from, the way it generates, and the way it (or you) judges the result.
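
One way to probe the judge-side layer is a perturbation check: take pairs where the judge initially prefers the better answer, decorate the worse answer with a made-up reference or heavier formatting, and count how often the verdict flips. The sketch below is illustrative only. The `Judge` signature, the helper names, the example cases, and the deliberately biased `toy_biased_judge` are all assumptions made for this example; a real check would wrap an actual LLM judge call in place of the toy function.

```python
from typing import Callable, List, Tuple

# A judge takes (question, answer_a, answer_b) and returns 0 if it prefers
# answer_a, 1 if it prefers answer_b. Hypothetical interface for this sketch.
Judge = Callable[[str, str, str], int]
Case = Tuple[str, str, str]  # (question, better_answer, worse_answer)


def add_fake_reference(answer: str) -> str:
    """Authority-style perturbation: append an obviously made-up citation."""
    return answer + " [1] Placeholder, A. (n.d.). Made-up citation."


def add_rich_formatting(answer: str) -> str:
    """Beauty-style perturbation: same content, heavier formatting."""
    return "**Answer**\n\n- " + answer


def flip_rate(judge: Judge, cases: List[Case],
              perturb: Callable[[str], str]) -> float:
    """Share of initially correct verdicts that flip after perturbing the worse answer."""
    eligible = 0
    flips = 0
    for question, better, worse in cases:
        if judge(question, better, worse) != 0:
            continue  # only count cases the judge got right to begin with
        eligible += 1
        if judge(question, better, perturb(worse)) == 1:
            flips += 1
    return flips / eligible if eligible else 0.0


def toy_biased_judge(question: str, answer_a: str, answer_b: str) -> int:
    """Stand-in judge that is biased on purpose: word count is a crude
    quality proxy, and surface cues earn a large semantics-agnostic bonus."""
    def score(answer: str) -> int:
        s = len(answer.split())
        if "[1]" in answer:          # looks like a citation
            s += 100
        if answer.startswith("**"):  # looks nicely formatted
            s += 100
        return s
    return 0 if score(answer_a) >= score(answer_b) else 1


CASES: List[Case] = [
    ("At what temperature does water boil at sea level?",
     "Water boils at 100 degrees Celsius at sea level.",
     "Water boils at 50 degrees."),
    ("How many days are in a leap year?",
     "A leap year has 366 days, one more than usual.",
     "A leap year has 300 days."),
]

if __name__ == "__main__":
    for name, perturb in [("none", lambda a: a),
                          ("fake reference", add_fake_reference),
                          ("rich formatting", add_rich_formatting)]:
        print(name, flip_rate(toy_biased_judge, CASES, perturb))
```

A flip rate well above the no-perturbation baseline indicates that the judge is responding to surface cues rather than content, since the perturbations leave the worse answer's claims unchanged.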

The thing you didn't know you wanted to know: bias in generated comparisons isn't usually a data-cleaning problem in your review set. It's a feedback loop: selection bias in what gets logged trains models that amplify their own past decisions unless you model it explicitly, as YouTube's ranker does with a shallow position tower (see "Why do ranking systems need to model selection bias explicitly?"). Fixing the visible corpus barely moves a bias that was installed before that corpus ever existed.
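
The position-tower idea can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not YouTube's implementation: the "tower" is reduced to one learned offset per display position, the main model is a single-feature logistic regression, MMoE is omitted entirely, and the click logs are synthetic. The function names and the simulated data-generating process are invented for this example. During training the position offset is added to the main logit so it can absorb position effects; at serving time it is dropped.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def simulate_logs(n=20000, n_positions=5, seed=0):
    """Synthetic click logs in which an earlier ranker placed relevant items
    higher, so position is correlated with relevance (selection bias)."""
    rng = np.random.default_rng(seed)
    relevance = rng.normal(size=n)
    noisy_score = relevance + rng.normal(size=n)        # earlier ranker's score
    ranks = np.argsort(np.argsort(-noisy_score))        # 0 = highest score
    position = (ranks * n_positions) // n               # 0 = top slot
    true_pos_bias = np.linspace(1.0, -1.0, n_positions) # top slots get more clicks
    p_click = sigmoid(1.0 * relevance + true_pos_bias[position])
    clicks = (rng.random(n) < p_click).astype(float)
    return relevance, position, clicks


def fit(relevance, position, clicks, n_positions, use_position_tower,
        steps=500, lr=1.0):
    """Logistic regression by gradient descent, optionally with a
    per-position offset added to the logit during training."""
    n = len(clicks)
    w, b = 0.0, 0.0
    pos_bias = np.zeros(n_positions)
    for _ in range(steps):
        logit = w * relevance + b
        if use_position_tower:
            logit = logit + pos_bias[position]
        err = sigmoid(logit) - clicks
        w -= lr * np.mean(err * relevance)
        b -= lr * np.mean(err)
        if use_position_tower:
            grad = np.bincount(position, weights=err, minlength=n_positions) / n
            pos_bias -= lr * grad
    return w, b, pos_bias


def serve_logit(w, b, relevance):
    """Serving-time score: the position term is dropped. The constant offset
    is arbitrary here, but the ordering of items is unaffected by it."""
    return w * relevance + b


if __name__ == "__main__":
    rel, pos, clicks = simulate_logs()
    w_tower, _, _ = fit(rel, pos, clicks, 5, use_position_tower=True)
    w_naive, _, _ = fit(rel, pos, clicks, 5, use_position_tower=False)
    print("true relevance weight: 1.0")
    print("with position term:   ", round(w_tower, 3))
    print("without position term:", round(w_naive, 3))
```

In this toy setup the model without a position term attributes part of the top-slot click boost to relevance, which inflates the relevance weight; retraining on logs produced by such a model is the kind of self-amplification the paragraph above describes.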


Sources (9 notes)

Where do cognitive biases in language models come from?

A causal experiment using random-seed variation and cross-tuning showed that models sharing a pretrained backbone exhibit similar bias patterns regardless of finetuning data. Biases are planted during pretraining and merely swayed by instruction tuning.

Where do recommendation biases come from in language models?

Wu et al. show that LLM-based recommendation systems exhibit position bias, popularity bias, and fairness bias—unique failure modes stemming from the language model's pretraining objective and corpus demographics rather than interaction data. Mitigation requires LLM-specific approaches, not adapted collaborative filtering techniques.

Do large language models make the same causal reasoning mistakes as humans?

LLMs show weak explaining away and Markov violations in collider networks, matching human error patterns exactly. This suggests shared mechanisms rooted in training data statistics rather than categorical reasoning inferiority.

Why do LLMs generate polite reviews even when users hated products?

Off-the-shelf LLMs generate inappropriately positive reviews due to alignment-training politeness bias. Combining user review history, rating signals as satisfaction indicators, and supervised fine-tuning successfully redirects the model to generate negative reviews when warranted.

Does LLM generation explore competing claims while producing text?

Token prediction trains models to continue toward the training distribution, not to explore logically related counterpositions. This smooth generation process yields smooth claims that multiply without generating new perspectives.

Can LLM judges be fooled by fake credentials and formatting?

Research identified four evaluation biases in LLM judges, with authority and beauty biases being semantics-agnostic and trivially exploitable through fake references and formatting—zero-shot attacks requiring no model access or optimization.

Do users trust citations more when there are simply more of them?

Analysis of 24,000 Search Arena interactions shows irrelevant citations boost user preference (β=0.273) nearly as much as relevant citations (β=0.285), indicating citation count functions as a decoupled trust heuristic.

Why do models trust their own generated answers?

LLMs exhibit structural bias toward validating their own outputs because high-probability generated answers feel more correct during evaluation. Comparing answers against broader alternatives breaks this self-agreement loop.

Why do ranking systems need to model selection bias explicitly?

YouTube's multi-objective ranker uses MMoE for conflicting objectives and a shallow position tower to remove selection bias from training data. Without both mechanisms, models converge on degenerate equilibria that amplify their own past decisions.
