
Why do LLMs generate polite reviews even when users hated products?

Large language models trained with RLHF develop a politeness bias that overrides negative sentiment in review generation. Understanding this bias and how to counteract it is crucial for creating accurate, user-aligned review systems.

Note · 2026-05-03 · sourced from Recommenders Personalized

Large language models trained with RLHF or instruction tuning develop a documented "polite" tendency: they soften criticism, cushion negative judgments, and avoid blunt statements. This is generally desirable in conversation but disastrous for personalized review generation, where users are dissatisfied with many items and the generated reviews need to reflect that dissatisfaction. A polite-by-default LLM produces positive reviews for items the user hated, which is both inaccurate and useless for explanation purposes.

Review-LLM diagnoses two problems. First, the LLM doesn't know the user's review style: pretrained at the corpus level, it generates generic reviews rather than reviews that match the user's voice. Second, even given the right style, the politeness bias keeps the model from producing a negative review even when a negative review is the correct output.

The solution combines three components (sketched below). First, the prompt aggregates the user's behavioral history: item titles, the reviews the user wrote for each, and the ratings given. This teaches the model the user's review style from semantically rich text. Second, the prompt includes the rating for the target item as an explicit satisfaction signal (rating 5 → positive review, rating 1 → negative review), giving the model a direct handle on sentiment direction. Third, the model is supervised fine-tuned on the user's actual reviews to internalize the style and override the politeness default.
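Concretely, prompt assembly might look like the following sketch. The template wording, the `Interaction` record, and the function names are assumptions for illustration, not the paper's actual code; the point is that history and the target rating land in the same prompt.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    title: str    # item title
    review: str   # review the user wrote for that item
    rating: int   # rating the user gave (1-5)

def build_prompt(history: list[Interaction],
                 target_title: str, target_rating: int) -> str:
    """Aggregate the user's behavior history plus the target rating.

    The history teaches the model the user's review style; the target
    rating is the explicit satisfaction signal (5 -> positive,
    1 -> negative) that counters the politeness default.
    """
    lines = ["The user has reviewed the following items:"]
    for it in history:
        lines.append(f'- "{it.title}" (rating {it.rating}/5): {it.review}')
    lines.append(
        f'The user now rates "{target_title}" {target_rating}/5. '
        "Write the review this user would write, matching their style "
        "and the sentiment implied by the rating."
    )
    return "\n".join(lines)

def build_sft_example(history, target_title, target_rating, target_review):
    """Pair the prompt with the user's actual review as supervision."""
    return {"prompt": build_prompt(history, target_title, target_rating),
            "completion": target_review}
```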
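And a minimal sketch of the supervised fine-tuning step, assuming a Hugging Face causal LM; the base model and the prompt-masking choice are my assumptions, not the paper's documented setup. Masking the prompt tokens with -100 means only the review tokens contribute to the loss, which is what pushes the model toward the user's actual (sometimes blunt) phrasing.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder base model; Review-LLM's actual backbone may differ.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def sft_loss(example: dict) -> torch.Tensor:
    """Loss for one (prompt, completion) pair from build_sft_example."""
    prompt_ids = tokenizer(example["prompt"], return_tensors="pt").input_ids
    review_ids = tokenizer(example["completion"] + tokenizer.eos_token,
                           return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, review_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # loss only on review tokens
    return model(input_ids=input_ids, labels=labels).loss
```

In a training loop this loss would be followed by the usual `loss.backward()` and an optimizer step over batches of such pairs.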

The general lesson: LLM behavioral defaults installed during alignment training are sticky. They survive prompt engineering and require fine-tuning plus structured prompt context to override. For tasks where the alignment-trained behavior is the wrong default (review generation, candid feedback, debate, criticism), the system architecture must explicitly counter the bias rather than hoping prompt phrasing alone redirects it.


Source: Recommenders Personalized
