What feedback loops form between recommender choice and review data?
This explores the closed loop where a recommender's own choices about what to show end up shaping the ratings and reviews it later trains on — and how that contaminated data then steers the next round of recommendations. The corpus treats this not as one loop but several stacked on top of each other, operating at different layers of the system.
The cleanest statement of the mechanism is the selection-bias loop. When a ranking system decides what to surface, users can only rate what they were shown, so the training data is a record of the model's past decisions, not of true preference. Left unmodeled, the system converges on "degenerate equilibria that amplify their own past decisions," which is why YouTube's ranker bolts on a separate shallow position tower specifically to remove that bias from what the main model learns (see: Why do ranking systems need to model selection bias explicitly?). A subtler version hides in the embeddings themselves: when dimensionality is too small, the model overfits toward already-popular items to maximize ranking quality, niche items get starved of exposure, and the imbalance compounds over time. It is a feedback loop disguised as a hyperparameter (see: Does embedding dimensionality secretly drive popularity bias in recommenders?).
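To make the selection-bias loop concrete, here is a minimal Python sketch. It is a toy illustration, not YouTube's ranker or its position tower: the function `run_loop`, the item qualities, the initial estimates, and the round-robin `explore_every` schedule are all assumptions invented for this example. Ratings are noise-free so that the only thing driving the outcome is which item gets shown.

```python
def run_loop(true_quality, initial_estimate, rounds, explore_every=None):
    """Greedy exposure loop: each round, only the shown item receives a rating.

    Hypothetical toy model. The initial estimate is treated as one
    pseudo-observation, and estimates are running means of observed ratings.
    """
    estimate = list(initial_estimate)
    n_obs = [1] * len(estimate)      # the initial estimate counts as one observation
    exposure = [0] * len(estimate)   # how many times each item was shown

    for t in range(rounds):
        if explore_every and t % explore_every == 0:
            # Forced round-robin exposure (an assumed exploration schedule).
            shown = (t // explore_every) % len(estimate)
        else:
            # Greedy: show whichever item currently looks best.
            shown = max(range(len(estimate)), key=lambda i: estimate[i])

        exposure[shown] += 1
        rating = true_quality[shown]  # noise-free rating, for clarity
        n_obs[shown] += 1
        estimate[shown] += (rating - estimate[shown]) / n_obs[shown]

    return exposure, estimate


true_quality = [0.6, 0.9, 0.5]       # item 1 is actually the best
initial_estimate = [0.7, 0.4, 0.3]   # but the system starts out believing in item 0

greedy_exposure, _ = run_loop(true_quality, initial_estimate, rounds=100)
explored_exposure, _ = run_loop(
    true_quality, initial_estimate, rounds=100, explore_every=10
)

print("greedy exposure:  ", greedy_exposure)
print("explored exposure:", explored_exposure)
```

Under pure greedy selection the best item is never shown, because its low initial estimate is never corrected by any rating. With occasional forced exposure the estimate gets fixed and that item takes over most of the slots. Forced exploration is a swapped-in remedy chosen for simplicity here; it is a different technique from modeling position bias with a separate tower.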
The second loop runs through the review data on the human side, and it turns out ratings were never the clean independent signal they look like. Moe and Trusov decomposed online ratings into baseline quality, social-dynamics influence, and noise, and found that prior ratings measurably push subsequent ones, with effects that compound through future ratings (see: Do online ratings actually reflect independent customer opinions?). Even at the level of a single person, the same user rates the same item differently across sessions, so the data encodes rating *behavior* as much as preference (see: Why do the same users rate items differently each time?). Implicit signals like clicks and watches don't escape this either: they carry preference and confidence as two distinct quantities that explicit stars collapse into one (see: Can implicit feedback reveal both preference and confidence?).
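The preference/confidence split for implicit signals can be written down in a few lines. The sketch below follows the commonly cited formulation from Hu, Koren, and Volinsky's implicit-feedback model, where a raw interaction count `r` yields a binary preference and a confidence of `1 + alpha * r`. The function name is hypothetical, and the default `alpha` of 40 is an illustrative value that should be treated as a tunable assumption.

```python
def preference_and_confidence(raw_count, alpha=40.0):
    """Split one implicit signal into (preference, confidence).

    raw_count: number of observed interactions (clicks, watches, purchases).
    alpha:     assumed scaling factor controlling how fast confidence grows.
    """
    preference = 1 if raw_count > 0 else 0   # did the user interact at all?
    confidence = 1.0 + alpha * raw_count     # how much to trust that preference
    return preference, confidence


# No interaction: preference 0, held with minimal confidence.
print(preference_and_confidence(0))
# Three interactions: preference 1, held with much higher confidence.
print(preference_and_confidence(3))
```

A single star rating has no slot for the second number, which is the information loss described above.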
Where it gets genuinely interesting is that the *type* of recommender determines which loop you get. "Frequently-bought-together" and "co-viewed" links don't just connect different products. They pull in different audiences with different prior expectations, so connected items' ratings either converge or diverge depending on the network structure the recommender imposes (see: Do different recommender types shape opinion convergence differently?). The recommender isn't a neutral observer of review data; its architecture is upstream of what the reviews end up saying. Seen at scale, this is why one synthesis frames feeds as persuasion infrastructure: feed weights shape producer behavior, topology drives opinion convergence, and the whole thing compounds through rating contamination and selection bias (see: How do recommendation feeds shape what people see and believe?).
The thing you might not have known you wanted to know: the corpus also shows the loop can be run *deliberately and benignly*. Instead of letting review data passively contaminate the model, newer systems treat reviews as a retrieval source they actively pull from. RevCore fetches reviews whose sentiment matches the user's stance to enrich sparse dialogue (see: Can review sentiment alignment fix sparse CRS dialogue?), and ERRA retrieves aspect-relevant reviews to explain recommendations for users with thin histories (see: Can retrieval enhancement fix explainable recommendations for sparse users?). The same review-to-recommender coupling that creates runaway bias when ignored becomes a controllable signal when the system chooses what to draw on rather than just absorbing whatever its past choices produced.
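The idea of pulling only stance-matched reviews can be sketched as a simple filter. This is not RevCore's or ERRA's actual implementation. The `Review` class, the `retrieve_aligned` function, and the assumption that each review already carries a sentiment score in [-1, 1] are all hypothetical, chosen only to show the shape of sentiment-coordinated retrieval.

```python
from dataclasses import dataclass


@dataclass
class Review:
    text: str
    polarity: float  # assumed precomputed sentiment score in [-1, 1]


def retrieve_aligned(reviews, user_polarity, k=2):
    """Return up to k reviews whose sentiment sign matches the user's stance.

    Reviews with the opposite sign are dropped so they cannot inject
    contradictory context; the rest are ordered by closeness to the
    user's polarity.
    """
    same_sign = [r for r in reviews if r.polarity * user_polarity > 0]
    same_sign.sort(key=lambda r: abs(r.polarity - user_polarity))
    return same_sign[:k]


reviews = [
    Review("loved it", 0.9),
    Review("boring", -0.7),
    Review("fine", 0.3),
]

# A user who has expressed a positive stance only sees positive reviews.
for review in retrieve_aligned(reviews, user_polarity=0.8):
    print(review.text)
```

The point of the design is that the system decides which reviews enter the context, rather than inheriting whatever mix its earlier recommendations happened to generate.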
Sources (9 notes)
YouTube's multi-objective ranker uses MMoE for conflicting objectives and a shallow position tower to remove selection bias from training data. Without both mechanisms, models converge on degenerate equilibria that amplify their own past decisions.
Research shows that when user/item embedding dimensions are too small, recommender systems overfit toward popular items to maximize ranking quality. This compounds over time as niche items receive insufficient exposure, and cannot be fixed post-hoc without treating dimensionality as a fairness hyperparameter.
Moe and Trusov decomposed ratings into baseline quality, social-dynamics influence, and error, finding that prior ratings meaningfully affect subsequent ones. These effects have both immediate sales impact and long-term compounding effects through future ratings, though high opinion variance can eventually dampen the distortion.
Amatriain et al. found that the same user gives substantially different ratings to the same item across sessions, shifting by multiple stars. This noise stems from temporal inconsistency, rater-specific biases, and anchoring effects—making ratings reflect both preference and rating-behavior rather than stable preference alone.
Hu, Koren, and Volinsky show that implicit signals (watches, purchases, clicks) encode preference and confidence as two distinct dimensions. Explicit ratings collapse these into one number, losing information about certainty in the preference estimate.
Research shows that frequently-bought-together and co-viewed recommendation networks produce different opinion convergence patterns. The mechanism: each recommender type attracts different audience segments with different prior expectations, shaping both who sees products together and how they rate them.
Research shows recommendation systems operate as political actors: feed weights influence producer behavior, network topology drives opinion convergence, and automation enables targeted persuasion at population scale. These effects compound through rating contamination and selection biases.
RevCore demonstrates that retrieving user reviews with polarity matching the user's stance—then integrating them into dialogue history and generation—produces more informative and aligned recommendations. Sentiment-coordinated filtering prevents contradictory context that random review retrieval would introduce.
ERRA combines model-agnostic review retrieval with personalized aspect selection to address data sparsity that embedded methods cannot solve. Retrieval augmentation provides richer signal when user history is sparse, while aspect personalization ensures explanations match user context rather than generic defaults.