How should systems handle contradictory opinions in user reviews?
When customers disagree about a product or service, should dialogue systems present all perspectives or select one? How a system aggregates and balances diverse opinions affects whether users trust its response.
Most task-oriented dialogue (TOD) research focuses on factual knowledge: FAQs, product specifications, service guides. But in many TOD tasks, users care about subjective insights: the experiences, opinions, and preferences of other customers. Questions such as "Is the WIFI reliable?" or "Does the restaurant have a good atmosphere?" require subjective knowledge that factual databases cannot provide.
SK-TOD (Subjective-Knowledge-based Task-Oriented Dialogue) formalizes this gap. The key challenge: even for the same aspect of a product or service, customers may hold different opinions. Reviews of a hotel's WIFI might be 70% positive and 30% negative. The system's response should include BOTH perspectives along with their proportions, because two-sided responses have been recognized as more credible and valuable for customers.
This introduces three new challenges beyond standard TOD:
- Knowledge source shift — from structured databases to unstructured user reviews
- Opinion aggregation — synthesizing diverse, sometimes contradictory viewpoints
- Balanced presentation — representing both sides proportionally rather than cherry-picking
Current TOD approaches trained on factual knowledge fail at this because they are designed to retrieve single correct answers, not to aggregate and balance multiple perspectives.
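The aggregation and balanced-presentation steps can be illustrated with a minimal sketch. It assumes each review snippet for an aspect has already been labeled "positive" or "negative" by some upstream step; the function names `aggregate_opinions` and `two_sided_response` and the response wording are hypothetical, not taken from SK-TOD.

```python
from collections import Counter


def aggregate_opinions(labels):
    """Return the share of each sentiment label among reviews of one aspect."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


def two_sided_response(aspect, labels):
    """Build a response that reports both sides with their proportions."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return f"No reviews mention the {aspect}."
    # Percentages are rounded to whole numbers for readability.
    pos = round(100 * counts["positive"] / total)
    neg = round(100 * counts["negative"] / total)
    return (
        f"About {pos}% of reviewers were positive about the {aspect}, "
        f"while about {neg}% were negative."
    )


# Hypothetical labels matching the 70/30 split used as an example above.
wifi_labels = ["positive"] * 7 + ["negative"] * 3
print(two_sided_response("WIFI", wifi_labels))
```

A real system would still need to retrieve the relevant review snippets and classify their sentiment; the sketch only shows why a single retrieved "answer" is not enough once proportions matter.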
Multi-source enrichment as a partial fix: M-OS (Multi-Source Opinion Summarization) demonstrates that enriching review-based opinion summaries with technical specifications and product descriptions yields an 87% user preference over standard opinion-only summaries. The mechanism: factual enrichment enables precise product comparisons that review-only approaches lack, addressing decision fatigue and information overload. M-OS evaluates across 7 dimensions (fluency, coherence, relevance, faithfulness, aspect coverage, sentiment consistency, specificity) and achieves a Spearman correlation of ρ = 0.74 with human judgment. The implication for SK-TOD: combining subjective review aggregation WITH factual specifications creates more useful and complete responses than either alone.
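For readers unfamiliar with the Spearman figure, the sketch below shows how such a rank correlation between automatic scores and human ratings is computed. The scores are invented for illustration and are not M-OS data; `average_ranks` and `spearman` are hypothetical helper names, and this is the standard textbook definition (Pearson correlation of ranks), not the M-OS evaluation code.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Invented example: automatic scores vs. human ratings for five summaries.
metric_scores = [0.2, 0.5, 0.4, 0.9, 0.7]
human_ratings = [1, 2, 3, 5, 4]
print(round(spearman(metric_scores, human_ratings), 2))
```

Because only the ordering matters, a high ρ says the automatic evaluation ranks summaries much as humans do, not that its scores match human ratings in absolute terms.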
This connects to a broader theme in the vault. As the note "Can LLMs generate more novel ideas than human experts?" discusses, LLMs have difficulty with evaluative tasks in general. Aggregating subjective reviews requires exactly this evaluative stance: weighing perspectives, judging representativeness, and presenting a balanced view rather than a confident single answer.
Related concepts in this collection
- Can LLMs generate more novel ideas than human experts?
  Research shows LLM-generated ideas score higher for novelty than expert-generated ones, yet LLMs avoid the evaluative reasoning that characterizes expert thinking. What explains this apparent contradiction?
  subjective knowledge aggregation requires evaluative stance
- Why do readers interpret the same sentence so differently?
  How much of annotation disagreement in NLP reflects genuine interpretive multiplicity rather than error? This explores whether social position and moral framing systematically generate competing but equally valid readings.
  subjective reviews embody irreducibly multiple interpretations of the same experience
Original note title
task-oriented systems that incorporate subjective knowledge from user reviews need to aggregate diverse opinions including positive and negative perspectives for credibility