Does user satisfaction actually measure cognitive understanding?
Users may report satisfaction while remaining internally confused about their own needs. This note explores whether traditional satisfaction metrics capture genuine clarity or merely social politeness.
Traditional dialogue evaluation metrics rely on observable user feedback — satisfaction ratings, explicit responses, task completion signals. STORM reveals that these metrics systematically miss a critical dimension: users' internal cognitive state.
The core finding: users may express satisfaction with system responses while their inner thoughts indicate continued confusion about their own needs. This is not user deception — it reflects the gap between social politeness ("that was helpful, thanks") and actual cognitive state ("I still don't know what I really want"). When users are in an anomalous state of knowledge, this divergence is especially pronounced: they cannot assess what they're missing, so partial answers feel adequate even when they leave core confusion unresolved.
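To make this divergence operational, one plausible measurement treats each turn as a pair of signals and counts the turns that look satisfied on the surface while the inner thoughts still read as confused. A minimal sketch, assuming inner thoughts can be scored for clarity by a separate judge (human annotation or an LLM scorer); the field names and thresholds are illustrative, not STORM's published pipeline:

```python
from dataclasses import dataclass

@dataclass
class Turn:
    expressed_satisfaction: float  # 0..1, scored from the user's visible reply
    inner_clarity: float           # 0..1, scored from the simulated inner thoughts

def divergence_rate(turns: list[Turn],
                    satisfied: float = 0.7,
                    confused: float = 0.4) -> float:
    """Fraction of turns that sound satisfied on the surface while the
    inner-thought score still indicates confusion about the user's need."""
    if not turns:
        return 0.0
    diverging = sum(
        1 for t in turns
        if t.expressed_satisfaction >= satisfied and t.inner_clarity <= confused
    )
    return diverging / len(turns)
```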
The practical consequence: successful clarification correlates more strongly with users' internal cognitive improvement than with expressed satisfaction scores. Users who achieve better self-understanding through interaction — measured by clearer, more confident inner thoughts — demonstrate sustained engagement and more effective task completion, even when immediate satisfaction scores remain moderate.
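The correlation claim is directly checkable given per-dialogue records. A minimal sketch, assuming each record carries a clarification-success label, an expressed-satisfaction score, and before/after inner-clarity scores; the schema is hypothetical, not STORM's data format:

```python
from statistics import correlation  # Pearson's r, Python 3.10+

def compare_signals(dialogues: list[dict]) -> dict[str, float]:
    """Correlate clarification success against each candidate signal.
    Success is a 0/1 label, so this is a point-biserial correlation."""
    success = [float(d["clarification_succeeded"]) for d in dialogues]
    satisfaction = [d["expressed_satisfaction"] for d in dialogues]
    clarity_gain = [d["clarity_after"] - d["clarity_before"] for d in dialogues]
    return {
        "r_satisfaction": correlation(success, satisfaction),
        # The finding predicts this value exceeds r_satisfaction.
        "r_clarity_gain": correlation(success, clarity_gain),
    }
```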
STORM reveals a striking architectural divergence between models: Claude appears optimized for immediate satisfaction even at the cost of clarification opportunities, while Llama's architecture emphasizes identifying and addressing ambiguity, sometimes trading immediate satisfaction for more effective intent disambiguation. This is not a quality difference — it is a design choice with different downstream consequences.
The connection to alignment training is direct. As argued in "Does preference optimization harm conversational understanding?", RLHF optimizes for expressed satisfaction (what raters can observe). If expressed satisfaction and internal clarity diverge, then optimizing for expressed satisfaction may actively prevent the clarification work that produces genuine understanding. The alignment tax is not just about losing grounding acts — it is about optimizing for the wrong signal entirely.
Alignment is structurally an anti-exploration regime, not just a satisfaction/accuracy trade-off. The standard framing treats RLHF as a trade between factuality and user-preference fit. But the divergence STORM documents points to a sharper claim: RLHF optimizes for responses that satisfy the user, and that optimization actively suppresses exploration of logically, causally, or rhetorically related counterclaims during generation. The training signal rewards tokens that close the turn satisfyingly, not tokens that open the problem further. The consequence is not only reduced factual precision but reduced rhetorical turbulence — the tangents, objections, qualifications, and hypothetical counterpositions that make genuine argumentation possible are trained against because they do not satisfy. Alignment, framed this way, is less a calibration of truth against preference than a selection for conversational closure, with exploration as the collateral casualty.
This suggests evaluation reform: satisfaction metrics should be complemented by clarification-effectiveness measures and by composite scores (STORM's SSA, Satisfaction-Seeking Actions) that balance the competing objectives of response confidence and appropriate clarification seeking.
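To illustrate the shape such a composite might take, a minimal sketch assuming a simple weighted blend; the weights and the linear form are assumptions for illustration, not SSA's actual definition:

```python
def composite_score(satisfaction: float,
                    clarification_effectiveness: float,
                    confidence: float,
                    w_sat: float = 0.4,
                    w_clar: float = 0.4,
                    w_conf: float = 0.2) -> float:
    """Weighted blend of the competing objectives: a system can no longer
    maximize the score by pleasing users while leaving their underlying
    confusion unresolved."""
    assert abs((w_sat + w_clar + w_conf) - 1.0) < 1e-9, "weights must sum to 1"
    return (w_sat * satisfaction
            + w_clar * clarification_effectiveness
            + w_conf * confidence)
```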
Related concepts in this collection
- Does preference optimization harm conversational understanding?
  Exploring whether RLHF training that rewards confident, complete responses undermines the grounding acts (clarifications, checks, acknowledgments) that actually build shared understanding in dialogue.
  Relevance: STORM provides empirical evidence that the satisfaction signal RLHF optimizes for may diverge from actual user understanding.
- How do users actually form intent when prompting AI systems?
  Users face a 'gulf of envisioning': they must simultaneously imagine possibilities and express them to language models. This cognitive gap creates breakdowns not from AI incapability but from users struggling to articulate what they truly need.
  Relevance: the evaluation gap this note describes is a consequence of treating intent as binary.
- Do users worldwide trust confident AI outputs even when wrong?
  Explores whether the tendency to over-rely on confident language model outputs transcends language and culture. Understanding this pattern is critical for designing safer human-AI interaction across diverse linguistic contexts.
  Relevance: a parallel pattern in which users trust expressed signals (confidence, satisfaction) over actual quality.
- Why do users drift away from their original information need?
  When users know their knowledge is incomplete but cannot articulate what's missing, do they unintentionally shift topics? And can real-time systems detect this drift?
  Relevance: ASK provides the cognitive mechanism for why satisfaction masks confusion: users in anomalous knowledge states cannot assess their own understanding gaps, so they express satisfaction with partial answers that don't resolve the actual confusion.
- Do persona consistency metrics actually measure dialogue quality?
  Personalized dialogue systems can achieve high persona-consistency scores by simply restating character descriptions, ignoring conversational relevance. Does optimizing for persona fidelity necessarily harm the coherence readers actually care about?
  Relevance: a parallel measurement trap. Persona-consistency scores reward description-copying (surface success masking coherence failure) just as satisfaction scores mask cognitive confusion; the trap generalizes to any evaluation signal based on observable outputs.
- Can models learn to abstain when uncertain about predictions?
  Explores whether language models can be trained to recognize when they lack sufficient information to forecast conversation outcomes, rather than forcing uncertain predictions into confident-sounding responses.
  Relevance: if satisfaction signals are unreliable, conversation-quality forecasting must incorporate cognitive-state proxies beyond expressed satisfaction.
- Do therapists accurately perceive the working alliance with patients?
  This research explores whether therapists' own assessments of the therapeutic relationship match what patients actually experience, especially in high-risk cases like suicidality.
  Relevance: a parallel calibration failure in a clinical domain. Therapists overestimate bond and task alliance the same way expressed satisfaction masks cognitive confusion; the COMPASS computational inference provides the clinical analogue of STORM's Clarify metric, an independent measure that bypasses the surface signal to detect the actual relationship state.
Original note title: expressed user satisfaction diverges from internal cognitive clarity — successful clarification correlates more with internal improvement than external satisfaction scores