Recommender Systems

Why do the same users rate items differently each time?

User ratings are assumed to be clean preference signals, but do they actually hold steady over time? This matters because recommender systems treat ratings as ground truth, yet temporal inconsistency and individual rating styles may contaminate that signal.

Note · 2026-05-03 · sourced from Recommenders General

The conventional reason recommender systems prefer explicit ratings (star ratings, thumbs up/down) over implicit feedback (clicks, watch time) is that explicit ratings are clean preference data. The user is directly stating "I like this." Amatriain, Pujol, and Oliver's experimental study evaluates this assumption and finds it doesn't hold.

The study has users rate the same items multiple times across spaced sessions. The same user gives substantially different ratings to the same item depending on when they rate. The variation is not just at the noise margin — users sometimes shift by multiple stars on the same item across sessions. The number of stars on a 5-star scale is not a stable property of the user's preference; it depends on mood, context, recently consumed alternatives, and the user's rating style at that moment.

The noise comes from multiple sources. Temporal inconsistency: the user's true preference may have shifted, but more often the rating itself fluctuates around a stable preference. Rater-specific style: some users use the full scale, some use only the top half, and these styles drift. Anchoring effects: a rating depends on what other items the user has recently rated.
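The rater-style source above is the most mechanically correctable of the three. A common remedy (a standard normalization technique, not something the paper prescribes) is per-user z-scoring, which removes each user's mean and spread so a top-half-only rater and a full-scale rater become comparable:

```python
from statistics import mean, pstdev

def normalize_user_ratings(ratings):
    """Z-score one user's ratings: subtract their mean and divide by their
    spread, so only relative preference remains, not scale usage."""
    mu = mean(ratings)
    sigma = pstdev(ratings)
    if sigma == 0:
        return [0.0 for _ in ratings]  # user rates everything identically
    return [(r - mu) / sigma for r in ratings]

# A "top-half" rater and a "full-scale" rater with the same relative ordering
# normalize to the same signal:
print(normalize_user_ratings([4, 5, 4, 5]))  # [-1.0, 1.0, -1.0, 1.0]
print(normalize_user_ratings([1, 5, 1, 5]))  # [-1.0, 1.0, -1.0, 1.0]
```

Note that this only removes stable style; it cannot fix style that drifts between sessions, which is why the temporal-inconsistency source is harder to correct.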

The implication for recommender systems: rating data is preference data plus rating-noise plus rater-style, and conflating them produces biased models. Treating "5 stars" as a categorical labeling of "liked" understates the noise; treating the difference between 4 and 5 stars as meaningful overstates user precision. The paper undermines the cleanliness assumption that justified the field's preference for explicit ratings, which combined with the implicit-feedback availability and self-selection issues elsewhere in the literature, suggests the choice between explicit and implicit signals is more nuanced than the methodological canon admits.
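The preference-plus-style-plus-noise decomposition can be made concrete with a baseline-predictor model — a standard recommender-systems technique, not one taken from the paper — where a rating is modeled as a global mean plus a user bias (absorbing rating style) plus an item bias, and the residual is treated as noise:

```python
def fit_baseline(ratings):
    """Fit r_ui ≈ mu + b_u + b_i by sequential averaging.
    ratings: list of (user, item, rating) triples. The residual
    r_ui - (mu + b_u + b_i) is treated as rating noise."""
    mu = sum(r for _, _, r in ratings) / len(ratings)

    user_devs, item_devs = {}, {}
    for u, _, r in ratings:
        user_devs.setdefault(u, []).append(r - mu)
    b_u = {u: sum(d) / len(d) for u, d in user_devs.items()}  # rating style

    for u, i, r in ratings:
        item_devs.setdefault(i, []).append(r - mu - b_u[u])
    b_i = {i: sum(d) / len(d) for i, d in item_devs.items()}  # item quality

    return mu, b_u, b_i

# Toy data: u1 rates generously, u2 harshly; i1 is the better item.
data = [("u1", "i1", 5), ("u1", "i2", 4), ("u2", "i1", 3), ("u2", "i2", 1)]
mu, b_u, b_i = fit_baseline(data)
print(mu, b_u, b_i)  # b_u separates style from the item signal in b_i
```

Under this model, comparing raw stars across users conflates b_u with b_i; the decomposition is what lets a model avoid that bias.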


Related concepts in this collection

explicit user ratings are noisy — temporal inconsistency and rater idiosyncrasy contaminate the supposed ground truth