Do comparisons help users evaluate items better than isolated descriptions?
Can framing product evaluations relationally, by comparing an item to other items, ground assessment in the user's own reasoning better than absolute descriptions can? This matters because recommendation explanations often ask users to do the comparison work mentally.
Standard recommendation explanations evaluate items in isolation: "this piano sounds natural." A user has to do the comparison work in their head, judging this evaluation against their experience with other pianos. Comparative recommendations ground the evaluation by referencing another item: "This piano sounds more natural than my Sony NWZ-A855." The relational frame embeds the comparison the user would otherwise construct.
Comparing Apples to Apples generates these comparative sentences from user reviews. A BERT classifier, fine-tuned on manually labeled examples, identifies comparative sentences in product reviews. From a corpus of 258,816 comparative sentences and their associated reviews, the system extracts aspects (sound quality, price-to-value, longevity) and the sentiment attached to each aspect per item. These aspects then feed abstractive generation: conditioned on product and user information, the system generates new comparative sentences that highlight the features relevant to a particular user.
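A minimal sketch of the two-stage pipeline under stated assumptions: off-the-shelf Hugging Face pipelines stand in for the paper's fine-tuned BERT classifier and its abstractive generator, and the model names (bert-base-uncased, t5-small), the label scheme, and the prompt format are illustrative placeholders rather than the paper's actual components.

```python
from transformers import pipeline

# Stage 1: detect comparative sentences in raw review text.
# "bert-base-uncased" stands in for a BERT checkpoint fine-tuned on
# manually labeled comparative / non-comparative examples.
comparative_clf = pipeline("text-classification", model="bert-base-uncased")

review_sentences = [
    "This piano sounds more natural than my Sony NWZ-A855.",
    "Battery life is about eight hours.",
]
# Assumed label scheme: LABEL_1 = comparative, LABEL_0 = not comparative.
comparative = [
    s for s in review_sentences
    if comparative_clf(s)[0]["label"] == "LABEL_1"
]

# Stage 2: abstractive generation conditioned on the target item, a
# reference item, and aspects extracted from the comparative corpus.
generator = pipeline("text2text-generation", model="t5-small")
prompt = (
    "compare item: Yamaha P-125 | reference: Sony NWZ-A855 | "
    "aspects: sound quality, price-to-value | sentiment: positive"
)
print(generator(prompt, max_new_tokens=40)[0]["generated_text"])
```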
Two dimensions are personalized: which features matter to the user (extracted from their review history), and whether positive or negative aspects are emphasized. A user who has historically focused on price gets price comparisons; one who has focused on sound quality gets sound comparisons. Human evaluation along Comparativeness, Relevance, and Fidelity confirms that the generated sentences are both faithful to the source reviews and useful for purchase decisions.
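A hedged sketch of the personalization step, assuming a simple keyword lexicon and count-based scoring over the user's past reviews; the ASPECTS lexicon and the user_aspect_profile helper are illustrative, not the paper's method, which extracts aspects from the comparative-sentence corpus itself.

```python
from collections import Counter

# Illustrative aspect lexicon; stands in for aspects mined from reviews.
ASPECTS = {
    "price": ["price", "cheap", "expensive", "value"],
    "sound quality": ["sound", "audio", "tone", "natural"],
    "longevity": ["lasted", "durable", "broke", "years"],
}

def user_aspect_profile(review_history):
    """Count how often each aspect is mentioned across a user's reviews."""
    counts = Counter()
    for review in review_history:
        text = review.lower()
        for aspect, keywords in ASPECTS.items():
            if any(kw in text for kw in keywords):
                counts[aspect] += 1
    return counts

history = [
    "Great value for the price, though the keys feel cheap.",
    "Sounds surprisingly natural for a digital piano.",
    "Still works after three years of daily practice.",
]
profile = user_aspect_profile(history)
# The top aspects become the conditioning signal for generation,
# so a price-focused user gets price comparisons.
top_aspects = [aspect for aspect, _ in profile.most_common(2)]
print(top_aspects)  # e.g. ['price', 'sound quality']
```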
The general principle: when evaluation is the goal, relational explanations carry more information than absolute ones because relational framing matches how humans evaluate. A recommendation system producing relational descriptions is closer to user reasoning than one that lists attributes per item.
Source: Recommenders LLMs
Related concepts in this collection
- Can retrieval enhancement fix explainable recommendations for sparse users?
  When users have few historical interactions, embedded recommendation models struggle to generate personalized explanations. Can augmenting sparse histories with retrieved relevant reviews—selected by aspect—overcome this fundamental data limitation?
  extends: aspect-aware generation is the same architectural move — aspects are the bridge between sparse user signal and informative output
- Can review sentiment alignment fix sparse CRS dialogue?
  Conversational recommender systems struggle with brief dialogues that lack item-specific detail. Can retrieving reviews that match user sentiment polarity enrich both dialogue context and response generation?
  complements: both leverage review corpora to supplement sparse direct signal — comparative for evaluation depth, sentiment-coordinated for justification depth
- Why do LLMs generate polite reviews even when users hated products?
  Large language models trained with RLHF develop a politeness bias that overrides negative sentiment in review generation. Understanding this bias and how to counteract it is crucial for creating accurate, user-aligned review systems.
  complements: aspect-controlled comparative generation is one way to constrain LLM review output beyond default politeness
- Can modeling multiple user personas improve recommendation accuracy?
  Single-vector user representations compress all tastes into one place, potentially crowding out minority interests. Can representing users as multiple weighted personas adapt better to what's being scored and produce more accurate predictions?
  complements: relational explanations and persona-mixture both ground recommendation in user-specific frame — comparison-relational vs persona-relational
- Why do online reviewers publish negative ratings despite positive experiences?
  When people post reviews publicly, do they adjust their honest opinions to seem more discerning? Schlosser's experiments test whether audience awareness shifts how people rate products compared to private ratings.
  tension with: comparative-aspect generation pulls from a corpus that is itself biased — the source review pool is not a neutral substrate
Original note title: comparative recommendations ground item evaluation by referencing other items — abstractive aspect-controlled generation from review-extracted aspects