Recommender Systems

Can user history override an LLM's politeness bias in reviews?

LLMs trained on web text tend to be systematically polite, generating positive reviews even when users are dissatisfied. Can providing a user's prior reviews and ratings as context help the model generate authentically negative reviews that match the user's actual experience?

Note · 2026-05-03 · sourced from Recommenders Personalized

LLMs trained on web text and aligned via RLHF are systematically polite. They generate "I'm not a fan of this" rather than "this is terrible". For e-commerce review generation, this is a structural problem: users are dissatisfied with many items, and their reviews should be honestly negative. A polite LLM produces inappropriately positive reviews for items the user hated.

Review-LLM attacks this with two prompt-level interventions plus fine-tuning. First, the prompt aggregates the user's behavior sequence (item titles, prior reviews, and ratings) so the LLM has access to the user's review style and habits. This addresses the corpus-level pretraining problem (LLMs learn aggregate review style, not any individual's) by giving the model in-context examples of how this user writes. Second, the user's rating of the target item is included in the prompt as a satisfaction indicator: a 2-star rating tells the model the review should be negative, overriding the politeness default. Finally, the model is supervised fine-tuned on these prompt-target pairs to lock in the personalization.
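A minimal sketch of the prompt construction described above. The field names (`title`, `rating`, `review`) and the exact template wording are illustrative assumptions, not the paper's actual schema:

```python
# Hypothetical sketch: aggregate a user's behavior sequence plus the
# target-item rating into one prompt, so the model sees both the user's
# review style (in-context examples) and an explicit satisfaction signal.

def build_prompt(history, target_item, target_rating):
    lines = ["The user has written the following reviews:"]
    for item in history:
        # Each prior interaction contributes title, rating, and review text.
        lines.append(
            f'- "{item["title"]}" rated {item["rating"]}/5: {item["review"]}'
        )
    # The target rating is the satisfaction indicator that licenses
    # an honestly negative review.
    lines.append(
        f'Now the user rates "{target_item}" {target_rating}/5. '
        "Write the review this user would write:"
    )
    return "\n".join(lines)

history = [
    {"title": "USB hub", "rating": 1, "review": "Died after a week."},
    {"title": "Desk lamp", "rating": 4, "review": "Bright and sturdy."},
]
prompt = build_prompt(history, "Wireless mouse", 2)
```

The point of the template is that a low target rating appears verbatim in the prompt, so the model cannot default to politeness without contradicting its own context.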

The result is a model that generates personalized negative reviews when the user is dissatisfied — outperforming closed-source LLMs that produce polite reviews regardless of the user's actual experience. The general principle: if the LLM's default behavior is misaligned with the task, providing both the personalized context (user history) and the explicit signal (rating as satisfaction proxy) lets fine-tuning override the default. Without both, fine-tuning alone struggles against the politeness bias.
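The fine-tuning step pairs each constructed prompt with the user's real review as the target. A common way to implement this for a causal LM, sketched here under the standard assumption that loss is computed only on the target tokens (token ids are illustrative integers, not real vocabulary):

```python
# Sketch of building one supervised fine-tuning example with the prompt
# masked out of the loss, so training pushes the model toward the user's
# actual sentiment rather than its polite default.

IGNORE_INDEX = -100  # conventional label value excluded from cross-entropy loss

def make_sft_example(prompt_ids, target_ids):
    # Input is the full prompt + review sequence; labels supervise
    # only the review portion.
    input_ids = prompt_ids + target_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + target_ids
    return {"input_ids": input_ids, "labels": labels}

ex = make_sft_example(prompt_ids=[5, 9, 13], target_ids=[42, 7])
# ex["labels"] == [-100, -100, -100, 42, 7]
```

Masking the prompt matters here: without it, the model would also be trained to reproduce the history and rating text, diluting the gradient signal from the (possibly negative) review itself.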


Review-LLM defeats LLM politeness in personalized review generation by aggregating user history and ratings as input