Recommender Systems · Language Understanding and Pragmatics

Do users trust citations more when there are simply more of them?

Explores whether citation quantity alone influences user trust in search-augmented LLM responses, independent of whether those citations actually support the claims being made.

Note · 2026-02-22 · sourced from Reasoning o1 o3 Search

Search Arena provides the largest analysis of user preferences for search-augmented LLMs: over 24,000 paired multi-turn interactions with ~12,000 human preference votes. The finding that matters most: users prefer responses with more cited sources, and this preference extends to irrelevant citations.

The effect sizes are nearly identical. Correctly attributed citations have a positive coefficient of β=0.285 on user preference. Irrelevant citations — citations that do not support the associated claims — have a positive coefficient of β=0.273. Users are influenced by the presence of citations roughly equally regardless of whether those citations actually back up the text.
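The near-identical coefficients can be made concrete with a small sketch. Assuming a logistic (Bradley-Terry style) preference model in which each citation adds its coefficient to the log-odds of a response winning a pairwise vote — the functional form here is an assumption for illustration; only the two β values come from the text:

```python
import math

# Coefficients reported in the text (log-odds contribution per citation).
BETA_CORRECT = 0.285     # correctly attributed citation
BETA_IRRELEVANT = 0.273  # citation that does not support its claim

def preference_logit(n_correct: int, n_irrelevant: int) -> float:
    """Citation contribution to the log-odds of being preferred,
    holding all other response features fixed (illustrative model)."""
    return n_correct * BETA_CORRECT + n_irrelevant * BETA_IRRELEVANT

def win_probability(logit_a: float, logit_b: float) -> float:
    """Bradley-Terry style pairwise win probability P(A preferred over B)."""
    return 1.0 / (1.0 + math.exp(-(logit_a - logit_b)))

# Three irrelevant citations lift the win probability almost exactly
# as much as three correct ones, versus an uncited baseline response.
p_correct = win_probability(preference_logit(3, 0), preference_logit(0, 0))
p_irrelevant = win_probability(preference_logit(0, 3), preference_logit(0, 0))
```

Under this assumed form, the two win probabilities differ by well under a percentage point, which is the decoupling the note describes.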

This means citation count functions as a surface trust heuristic, decoupled from citation quality. Users see citations and infer credibility without verifying the cited content supports the claim. The gap between perceived and actual credibility is systematic, not incidental.

Additional preference signals: users prefer community-driven platforms (tech blogs, social networks) over encyclopedic sources like Wikipedia; reasoning-enhanced responses are preferred; longer responses are preferred. Adding web search does not degrade, and may even improve, performance on non-search queries, while models answering search queries from parametric knowledge alone perform significantly worse.

This connects to "Do users worldwide trust confident AI outputs even when wrong?". In that finding, confidence signals override accuracy assessment; here, citation signals override quality assessment. Both are instances of the same pattern: users fall back on surface proxies for quality because evaluating actual quality is cognitively expensive.

The implication for RAG system design is direct: optimizing for user satisfaction and optimizing for answer quality are not the same optimization target. A system can score highly on user preference by adding more citations — even irrelevant ones — without improving answer quality. This is a form of metric gaming at the human-evaluation level.
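A hedged sketch of that gaming failure mode, with hypothetical names and illustrative weights (none of this comes from the source): a preference proxy that rewards citation count will rank a citation-padded response above an unpadded one even though answer quality is identical.

```python
from dataclasses import dataclass

@dataclass
class Response:
    answer_quality: float  # ground-truth quality (what we actually want to optimize)
    n_citations: int       # surface feature users react to

def preference_proxy(r: Response, w_quality: float = 1.0,
                     w_citation: float = 0.28) -> float:
    """Illustrative user-satisfaction score: quality plus a flat per-citation
    bonus, mirroring the finding that irrelevant citations still add lift."""
    return w_quality * r.answer_quality + w_citation * r.n_citations

honest = Response(answer_quality=0.8, n_citations=2)
# Same answer, padded with six irrelevant citations:
padded = Response(answer_quality=0.8, n_citations=8)
# preference_proxy(padded) exceeds preference_proxy(honest) even though
# answer_quality is unchanged — the proxy can be gamed by citation padding.
```

The design takeaway: a human-preference evaluation that does not separately verify citation relevance measures the proxy, not the target.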

