Why do users naturally express recommendation critiques instead of positive preferences?
This explores why people tend to react to recommendations by saying what's wrong ("that doesn't work for me") rather than stating clean positive preferences — and what the corpus says about working with that grain instead of against it.
The corpus doesn't offer a single tidy psychological theory, but read laterally it suggests something useful: critique is the *natural* unit of feedback because it's grounded in a concrete item in front of you. It's far easier to look at a suggestion and say "this doesn't look good for a date" than to introspect and produce "I prefer more romantic options." The first is a reaction; the second is an abstraction. The most direct treatment of this is the note "Can language models bridge the gap between critique and preference?", which takes the negative reaction as given and uses few-shot LLM prompting to *translate* it into the positive preference a retrieval system can actually act on, bridging the gap rather than asking users to close it themselves.
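As a rough illustration of that translation layer, here is a minimal sketch of how a few-shot prompt might be assembled. The function name `build_critique_prompt`, the prompt wording, and every example pair except the first are assumptions made for this sketch; only the "date" / "more romantic" pair comes from the notes. No model is called here, since the notes don't say which one is used.

```python
from typing import List, Tuple

# Few-shot pairs of (critique, positive preference).
# Only the first pair appears in the notes; the rest are hypothetical.
FEW_SHOT_EXAMPLES: List[Tuple[str, str]] = [
    ("doesn't look good for a date", "prefer more romantic"),
    ("way too loud to talk", "prefer quieter places"),
    ("too expensive for a weeknight", "prefer cheaper options"),
]


def build_critique_prompt(
    critique: str,
    examples: List[Tuple[str, str]] = FEW_SHOT_EXAMPLES,
) -> str:
    """Assemble a few-shot prompt asking a model to restate a critique
    as a positive preference. The wording is an assumption, not the
    prompt used in the cited work."""
    lines = ["Rewrite each critique of a recommendation as a positive preference.", ""]
    for negative, positive in examples:
        lines.append(f"Critique: {negative}")
        lines.append(f"Preference: {positive}")
        lines.append("")
    # The final slot is left open for the model to complete.
    lines.append(f"Critique: {critique.strip()}")
    lines.append("Preference:")
    return "\n".join(lines)


prompt = build_critique_prompt("the menu is all meat")
print(prompt)
```

The completed "Preference:" line would then be passed to the retrieval system as the query, which is why no fine-tuning is needed in this setup.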
There's a deeper reason positive self-report is unreliable, which makes critique not just easier but arguably more honest. The note "Why do the same users rate items differently each time?" shows that when people *do* try to state preference directly, via star ratings, the same user rates the same item differently across sessions, swinging by multiple stars, because ratings reflect mood, anchoring, and personal rating style as much as taste. So the "positive preference" we wish users would volunteer may be partly fiction. A pointed critique of a specific recommendation carries less of that noise: it's anchored to something real.
The corpus also surfaces a striking asymmetry on the *machine* side that mirrors the human one. The notes "Why do LLMs generate polite reviews even when users hated products?" and "Can user history override an LLM's politeness bias in reviews?" show that RLHF-trained models have the *opposite* bias: they default to polite positivity and have to be actively fine-tuned, with rating signals and user history, to express negativity at all. So humans lean toward critique while aligned models lean toward praise. That tension is worth sitting with: the feedback channel users find natural is exactly the one models are trained to suppress.
Finally, the corpus hints that critique isn't a deficiency to be corrected but a richer signal to be cultivated. The note "Do recommendation strategies beyond preference questions work better?" found that successful human recommenders do more than interrogate people for preferences: they share opinions, experiences, and similarity signals, and good recommendation emerges conversationally. And "Can review sentiment alignment fix sparse CRS dialogue?" shows systems do better when they match the *polarity* of what a user expresses rather than ignoring it. The throughline: stop treating critique as a failed attempt at preference-stating, and start treating it as the native language of taste, one that, with the right translation layer, points more accurately at what someone wants than a forced positive ever could.
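To make the polarity-matching idea concrete, here is a small sketch of filtering retrieved reviews so that only those agreeing with the user's stance are kept. This is not RevCore's implementation: the `Review` type, the `select_polarity_matched` function, the +1/-1 encoding, and the sample reviews are all assumptions for illustration, and polarity is taken as already computed by some upstream sentiment step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Review:
    text: str
    polarity: int  # +1 positive, -1 negative (assumed precomputed)


def select_polarity_matched(
    reviews: List[Review], user_polarity: int, k: int = 2
) -> List[Review]:
    """Keep up to k reviews whose polarity matches the user's stance,
    preserving the original retrieval order."""
    matched = [r for r in reviews if r.polarity == user_polarity]
    return matched[:k]


# Hypothetical retrieved reviews for one item.
candidates = [
    Review("Loved the quiet corner tables.", +1),
    Review("Service was slow and the room was noisy.", -1),
    Review("Overpriced for what you get.", -1),
    Review("Great wine list.", +1),
]

# The user has just criticized the item, so their stance is negative.
selected = select_polarity_matched(candidates, user_polarity=-1)
for review in selected:
    print(review.text)
```

Filtering before the reviews reach the dialogue context is what keeps contradictory material out, which is the failure mode the notes attribute to random review retrieval.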
Sources (6 notes)
Few-shot LLM prompting can convert natural negative feedback like "doesn't look good for a date" into positive preferences like "prefer more romantic," enabling retrieval systems to find better-matching recommendations without fine-tuning.
Amatriain et al. found that the same user gives substantially different ratings to the same item across sessions, shifting by multiple stars. This noise stems from temporal inconsistency, rater-specific biases, and anchoring effects—making ratings reflect both preference and rating-behavior rather than stable preference alone.
Off-the-shelf LLMs generate inappropriately positive reviews due to alignment-training politeness bias. Combining user review history, rating signals as satisfaction indicators, and supervised fine-tuning successfully redirects the model to generate negative reviews when warranted.
Review-LLM defeats the politeness bias inherent in RLHF-trained models by aggregating user behavior sequences (prior reviews, item ratings) in the prompt and fine-tuning on these contextualized examples. This dual intervention—personalized context plus explicit satisfaction signals—allows the model to generate authentically negative reviews matching user dissatisfaction.
Analysis of 1,001 human recommendation dialogues shows successful recommendations correlate with personal opinion sharing, encouragement, similarity signals, and credibility appeals—not just preference questions. Opinion and experience sharing appear in 30% and 27% of recommendation sentences respectively.
RevCore demonstrates that retrieving user reviews with polarity matching the user's stance—then integrating them into dialogue history and generation—produces more informative and aligned recommendations. Sentiment-coordinated filtering prevents contradictory context that random review retrieval would introduce.
Next inquiring lines
- What constrains LLM generation beyond default politeness in review contexts?
- Does transforming critiques into preferences change how conversational recommenders should decide when to ask versus recommend?
- What other conversation structures besides mention order carry predictive information for recommendation?