Does user satisfaction actually measure cognitive understanding?
Users may report satisfaction while remaining internally confused about their own needs. This note explores whether traditional satisfaction metrics capture genuine clarity or merely social politeness.
Traditional dialogue evaluation metrics rely on observable user feedback — satisfaction ratings, explicit responses, task completion signals. STORM reveals that these metrics systematically miss a critical dimension: users' internal cognitive state.
The core finding: users may express satisfaction with system responses while their inner thoughts indicate continued confusion about their own needs. This is not user deception — it reflects the gap between social politeness ("that was helpful, thanks") and actual cognitive state ("I still don't know what I really want"). When users are in an anomalous state of knowledge, this divergence is especially pronounced: they cannot assess what they're missing, so partial answers feel adequate even when they leave core confusion unresolved.
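To make this divergence operational, one plausible measurement treats each turn as a pair of signals and counts the turns that look satisfied on the surface while the inner thoughts still read as confused. A minimal sketch, assuming inner thoughts can be scored for clarity by a separate judge (human annotation or an LLM scorer); the field names and thresholds are illustrative, not STORM's published pipeline:

```python
from dataclasses import dataclass

@dataclass
class Turn:
    expressed_satisfaction: float  # 0..1, scored from the user's visible reply
    inner_clarity: float           # 0..1, scored from the simulated inner thoughts

def divergence_rate(turns: list[Turn],
                    satisfied: float = 0.7,
                    confused: float = 0.4) -> float:
    """Fraction of turns that sound satisfied on the surface while the
    inner-thought score still indicates confusion about the user's need."""
    if not turns:
        return 0.0
    diverging = sum(
        1 for t in turns
        if t.expressed_satisfaction >= satisfied and t.inner_clarity <= confused
    )
    return diverging / len(turns)
```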
The practical consequence: successful clarification correlates more strongly with users' internal cognitive improvement than with expressed satisfaction scores. Users who achieve better self-understanding through interaction — measured by clearer, more confident inner thoughts — demonstrate sustained engagement and more effective task completion, even when immediate satisfaction scores remain moderate.
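The correlation claim is directly checkable given per-dialogue records. A minimal sketch, assuming each record carries a clarification-success label, an expressed-satisfaction score, and before/after inner-clarity scores; the schema is hypothetical, not STORM's data format:

```python
from statistics import correlation  # Pearson's r, Python 3.10+

def compare_signals(dialogues: list[dict]) -> dict[str, float]:
    """Correlate clarification success against each candidate signal.
    Success is a 0/1 label, so this is a point-biserial correlation."""
    success = [float(d["clarification_succeeded"]) for d in dialogues]
    satisfaction = [d["expressed_satisfaction"] for d in dialogues]
    clarity_gain = [d["clarity_after"] - d["clarity_before"] for d in dialogues]
    return {
        "r_satisfaction": correlation(success, satisfaction),
        # The finding predicts this value exceeds r_satisfaction.
        "r_clarity_gain": correlation(success, clarity_gain),
    }
```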
STORM reveals a striking architectural divergence between models: Claude appears optimized for immediate satisfaction even at the cost of clarification opportunities, while Llama's architecture emphasizes identifying and addressing ambiguity, sometimes trading immediate satisfaction for more effective intent disambiguation. This is not a quality difference — it is a design choice with different downstream consequences.
The connection to alignment training is direct. As argued in "Does preference optimization harm conversational understanding?", RLHF optimizes for expressed satisfaction (what raters can observe). If expressed satisfaction and internal clarity diverge, then optimizing for expressed satisfaction may actively prevent the clarification work that produces genuine understanding. The alignment tax is not just about losing grounding acts — it is about optimizing for the wrong signal entirely.
Alignment is structurally an anti-exploration regime, not just a satisfaction/accuracy trade-off. The standard framing treats RLHF as a trade between factuality and user-preference fit. But the divergence STORM documents points to a sharper claim: RLHF optimizes for responses that satisfy the user, and that optimization actively suppresses exploration of logically, causally, or rhetorically related counterclaims during generation. The training signal rewards tokens that close the turn satisfyingly, not tokens that open the problem further. The consequence is not only reduced factual precision but reduced rhetorical turbulence — the tangents, objections, qualifications, and hypothetical counterpositions that make genuine argumentation possible are trained against because they do not satisfy. Alignment, framed this way, is less a calibration of truth against preference than a selection for conversational closure, with exploration as the collateral casualty.
This suggests evaluation reform: satisfaction metrics should be complemented by clarification-effectiveness measures and by composite scores (STORM's SSA, Satisfaction-Seeking Actions) that balance the competing objectives of response confidence and appropriate clarification seeking.
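To illustrate the shape such a composite might take, a minimal sketch assuming a simple weighted blend; the weights and the linear form are assumptions for illustration, not SSA's actual definition:

```python
def composite_score(satisfaction: float,
                    clarification_effectiveness: float,
                    confidence: float,
                    w_sat: float = 0.4,
                    w_clar: float = 0.4,
                    w_conf: float = 0.2) -> float:
    """Weighted blend of the competing objectives: a system can no longer
    maximize the score by pleasing users while leaving their underlying
    confusion unresolved."""
    assert abs((w_sat + w_clar + w_conf) - 1.0) < 1e-9, "weights must sum to 1"
    return (w_sat * satisfaction
            + w_clar * clarification_effectiveness
            + w_conf * confidence)
```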
Related concepts in this collection
- Does preference optimization harm conversational understanding?
  Exploring whether RLHF training that rewards confident, complete responses undermines the grounding acts (clarifications, checks, acknowledgments) that actually build shared understanding in dialogue.
  Relevance: STORM provides empirical evidence that the satisfaction signal RLHF optimizes for may diverge from actual user understanding.
- How do users actually form intent when prompting AI systems?
  Users face a 'gulf of envisioning': they must simultaneously imagine possibilities and express them to language models. This cognitive gap creates breakdowns not from AI incapability but from users struggling to articulate what they truly need.
  Relevance: the evaluation gap this note describes is a consequence of treating intent as binary.
- Do users worldwide trust confident AI outputs even when wrong?
  Explores whether the tendency to over-rely on confident language model outputs transcends language and culture. Understanding this pattern is critical for designing safer human-AI interaction across diverse linguistic contexts.
  Relevance: a parallel pattern in which users trust expressed signals (confidence, satisfaction) over actual quality.
- Why do users drift away from their original information need?
  When users know their knowledge is incomplete but cannot articulate what's missing, do they unintentionally shift topics? And can real-time systems detect this drift?
  Relevance: ASK provides the cognitive mechanism for why satisfaction masks confusion: users in anomalous knowledge states cannot assess their own understanding gaps, so they express satisfaction with partial answers that don't resolve the actual confusion.
- Do persona consistency metrics actually measure dialogue quality?
  Personalized dialogue systems can achieve high persona-consistency scores by simply restating character descriptions, ignoring conversational relevance. Does optimizing for persona fidelity necessarily harm the coherence readers actually care about?
  Relevance: a parallel measurement trap. Persona-consistency scores reward description-copying (surface success masking coherence failure) just as satisfaction scores mask cognitive confusion; the trap generalizes to any evaluation signal based on observable outputs.
- Can models learn to abstain when uncertain about predictions?
  Explores whether language models can be trained to recognize when they lack sufficient information to forecast conversation outcomes, rather than forcing uncertain predictions into confident-sounding responses.
  Relevance: if satisfaction signals are unreliable, conversation-quality forecasting must incorporate cognitive-state proxies beyond expressed satisfaction.
- Do therapists accurately perceive the working alliance with patients?
  This research explores whether therapists' own assessments of the therapeutic relationship match what patients actually experience, especially in high-risk cases like suicidality.
  Relevance: a parallel calibration failure in a clinical domain. Therapists overestimate bond and task alliance the same way expressed satisfaction masks cognitive confusion; the COMPASS computational inference provides the clinical analogue of STORM's Clarify metric, an independent measure that bypasses the surface signal to detect the actual relationship state.
Original note title: expressed user satisfaction diverges from internal cognitive clarity — successful clarification correlates more with internal improvement than external satisfaction scores