How do users signal satisfaction through implicit cues that training data misses?

This note explores the gap between the rich, implicit ways people show they're satisfied (hesitation, follow-up questions, emotional tone, the confidence behind a click) and the flattened signals that actually make it into training data. The corpus suggests the problem isn't that implicit cues are absent; it's that they carry more dimensions than our reward signals are built to capture, so the extra information gets collapsed and thrown away.

Start with what an implicit signal actually contains. When someone watches, clicks, or buys, that action encodes two separate things: *which* thing they prefer, and *how sure* we should be about that preference (see "Can implicit feedback reveal both preference and confidence?"). An explicit star rating squashes both into one number and loses the confidence dimension entirely. So even before training begins, the way we record feedback discards information that was right there in the behavior: the certainty signal evaporates.
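The split between preference and confidence can be made concrete with a small sketch. It assumes the binary-preference, linear-confidence form commonly associated with Hu, Koren, and Volinsky's implicit-feedback model; the names `ImplicitSignal` and `decompose` and the default `alpha` are illustrative choices, not something taken from the notes.

```python
from dataclasses import dataclass


@dataclass
class ImplicitSignal:
    preference: int    # 1 if any interaction was observed, else 0
    confidence: float  # how much weight to give that preference


def decompose(raw_count: float, alpha: float = 40.0) -> ImplicitSignal:
    """Split one raw interaction count into preference and confidence.

    Assumed form: preference is binarized, confidence grows linearly
    with the count. alpha is a tunable scale (value here is illustrative).
    """
    preference = 1 if raw_count > 0 else 0
    confidence = 1.0 + alpha * raw_count
    return ImplicitSignal(preference, confidence)


# Hypothetical users: one watched once, one watched ten times, one never did.
once = decompose(1)
often = decompose(10)
never = decompose(0)

# Same preference, very different certainty about it.
print(once, often, never)
```

A single star rating would have to represent `once` and `often` with one number, which is the collapse the paragraph describes: the two users agree on *which* item they like and differ only in *how sure* we can be.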

The more unsettling finding is that the signal we *do* collect can point the wrong way. Users routinely report being satisfied while remaining internally confused, especially when they don't know what they don't know. It is sustained engagement, not the satisfaction score, that actually tracks whether they understood anything (see "Does user satisfaction actually measure cognitive understanding?"). Train on the expressed score and you optimize for a feeling that diverges from the real outcome. This is the mechanism behind the "alignment tax": RLHF rewards confident, complete-sounding answers and penalizes the clarifying questions and understanding-checks that good communication depends on, leaving those grounding acts 77.5% below human levels (see "Does preference optimization harm conversational understanding?"). The model learns the cue that's easy to score (looks helpful) and misses the cue that matters (was actually understood).

The same blind spot shows up emotionally. When users disclose feelings, LLMs default to problem-solving, a hallmark of *low-quality* therapy, because the helpfulness bias baked in by training reads "give a solution" as the satisfying move, missing the implicit cue that the person wanted to be heard, not fixed (see "Do LLM therapists respond to emotions like low-quality human therapists?"). The interesting counter-moves in the corpus all try to recover a cue that ordinary training data misses. One uses a simulated user's *emotional trajectory* across a conversation as the reward, so the model optimizes for how the person feels over time rather than per-turn approval (see "Can emotion rewards make language models genuinely empathic?"). Another takes negative, indirect feedback ("this doesn't look right for a date") and translates it into the positive preference hiding inside it ("prefer something more romantic"), reading the satisfaction signal buried in a complaint (see "Can language models bridge the gap between critique and preference?").
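The difference between per-turn approval and a trajectory-based reward can be sketched in a few lines. This is a toy illustration under stated assumptions, not RLVER's actual reward definition: the function names are hypothetical, the reward is assumed to be the net change in a simulated user's emotion score, and every number below is made up for the example.

```python
from typing import Sequence


def per_turn_reward(approvals: Sequence[float]) -> float:
    """Baseline: average of per-turn approval scores."""
    return sum(approvals) / len(approvals)


def trajectory_reward(emotions: Sequence[float]) -> float:
    """Assumed trajectory reward: net change in the simulated user's
    emotion score from the first turn to the last."""
    return emotions[-1] - emotions[0]


# Made-up scores in [0, 1] for two hypothetical conversations.
# "fix": the model jumps to solutions; each turn looks helpful,
# but the simulated user's mood drifts down.
fix_approvals = [0.9, 0.8, 0.8]
fix_emotions = [0.4, 0.35, 0.3]

# "listen": the model reflects feelings first; turns look less
# impressive individually, but the mood improves over time.
listen_approvals = [0.6, 0.6, 0.7]
listen_emotions = [0.4, 0.55, 0.7]

print("per-turn  :", per_turn_reward(fix_approvals), per_turn_reward(listen_approvals))
print("trajectory:", trajectory_reward(fix_emotions), trajectory_reward(listen_emotions))
```

Under these invented numbers the two rewards rank the conversations in opposite order, which is the point of the counter-move: the trajectory signal recovers a cue that per-turn scoring hides.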

The through-line worth taking away: "satisfaction" isn't one thing a dataset can label. It's a bundle of preference, confidence, emotional state, and genuine comprehension — and standard training pipelines tend to capture only the loudest, most explicit slice while the quieter, more honest cues fall through. The frontier work here is less about collecting more feedback and more about *decompressing* the feedback we already have.

Sources (6 notes)

Can implicit feedback reveal both preference and confidence?

Hu, Koren, and Volinsky show that implicit signals (watches, purchases, clicks) encode preference and confidence as two distinct dimensions. Explicit ratings collapse these into one number, losing information about certainty in the preference estimate.

Does user satisfaction actually measure cognitive understanding?

STORM shows users express satisfaction despite internal confusion, especially when unaware of knowledge gaps. Sustained engagement correlates with actual self-understanding, not immediate satisfaction ratings.

Does preference optimization harm conversational understanding?

RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically leaves grounding acts 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.

Do LLM therapists respond to emotions like low-quality human therapists?

Using the BOLT framework, researchers found LLMs offer solution-focused advice during emotional disclosure—a hallmark of low-quality therapy—yet also reflect more on client needs and strengths than typical poor human therapy, creating an unusual hybrid profile likely driven by RLHF's helpfulness bias.

Can emotion rewards make language models genuinely empathic?

RLVER uses a simulated user's emotion trajectory as an RL reward signal, enabling GRPO to deliver stable empathy improvements while maintaining dialogue quality—countering the typical trade-off between preference optimization and conversational grounding.

Can language models bridge the gap between critique and preference?

Few-shot LLM prompting can convert natural negative feedback like "doesn't look good for a date" into positive preferences like "prefer more romantic," enabling retrieval systems to find better-matching recommendations without fine-tuning.
