INQUIRING LINE

How can we measure whether a user actually understands their own needs?

This explores how we'd actually detect whether a user has a clear grasp of what they want — not just whether they *feel* satisfied or sound confident, but whether their stated needs match their real ones.


This explores how we'd actually detect whether a user has a clear grasp of what they want — and the first thing the corpus does is pull apart two things we usually conflate: feeling satisfied and actually understanding. A study of the STORM system found that users routinely report high satisfaction while remaining internally confused, especially when they don't realize there's a gap in their own knowledge Does user satisfaction actually measure cognitive understanding?. The more reliable signal wasn't a satisfaction rating but sustained engagement — people who genuinely understood their need kept working the problem rather than bouncing off a fluent-sounding answer. So the first measurement lesson is that the obvious instrument (just ask if they're happy) measures the wrong thing.

Part of why self-understanding is so hard to measure is that AI itself can distort it. When output reads smoothly, users mistake that fluency for their own competence — a 'self-directed fluency illusion' where the system's polish gets misattributed to the person's grasp of the problem Does processing ease mislead users about their own competence?. This is a distinct failure mode the corpus names the 'LLM Fallacy': people credit AI-generated work to their own ability, independent of whether the output was even accurate How does AI-assisted work reshape how people see their own abilities?. Both findings suggest that confident self-report is actively unreliable as a measure of understanding — the better the AI sounds, the worse the user's self-assessment may get.

A more promising approach measures understanding through *interaction* rather than introspection. UserBench set up multi-turn tasks where users reveal goals incrementally, and found that even top models surface fewer than 30% of a user's actual preferences through questioning — implying that what a user 'needs' is something that has to be drawn out over a conversation, not stated up front Why do AI agents miss most of what users actually want?. Relatedly, the kind of question you ask matters: specific facet questions ('what type of monitor?') beat open need-rephrasing prompts ('what are you trying to do?'), partly because a user's ability to answer concretely is itself a readout of how well-formed their need is Which clarifying questions actually improve user satisfaction?.

The most counterintuitive thread is that you might measure understanding *behaviorally* — without asking at all. One line of work instruments gaze, hesitation, typing rhythm, and interaction speed as continuous signals of cognitive state, catching uncertainty in the moment rather than through a probe that disrupts it Can AI systems read cognitive state from interaction patterns alone?. Another shows that LLMs can reconstruct durable 'interest journeys' from activity logs — month-long pursuits users may never articulate but consistently act on Can language models discover what users actually want from activity logs?. The provocative implication: a user's revealed behavior over time may encode their real needs more faithfully than anything they could tell you in the moment.

Worth a sideways glance: the same problem afflicts machines. LLMs give unstable, unreliable reports about their own knowledge and over-trust answers they generated themselves How well do language models understand their own knowledge? Why do models trust their own generated answers?. And research on personalization found that abstracted preference *summaries* outperform replaying specific past interactions Does abstract preference knowledge outperform specific interaction recall? — a hint that 'understanding a need' may be better captured as a distilled abstraction than as a literal recall of what someone said they wanted.


Sources 10 notes

Does user satisfaction actually measure cognitive understanding?

STORM shows users express satisfaction despite internal confusion, especially when unaware of knowledge gaps. Sustained engagement correlates with actual self-understanding, not immediate satisfaction ratings.

Does processing ease mislead users about their own competence?

High-quality AI output triggers a metacognitive heuristic: users experience fluency as a signal of their own capability, even though they didn't generate it. This self-directed fluency illusion systematically inflates perceived competence because LLMs optimize for fluency regardless of user understanding.

How does AI-assisted work reshape how people see their own abilities?

Research shows the LLM Fallacy operates through misattribution of AI outputs to personal capability, independent of output accuracy or reliance behavior. It requires interventions that clarify human-machine contribution boundaries, not just better system accuracy or forced verification.

Why do AI agents miss most of what users actually want?

UserBench measured multi-turn interactions where users reveal goals incrementally and found models achieve full intent alignment just 20% of the time. Even top models uncover fewer than 30% of user preferences through active querying, suggesting passivity and premature assumption-making are systematic failures.

Which clarifying questions actually improve user satisfaction?

Clarifying questions that target concrete information gaps ("What type of monitor?") consistently beat those that ask users to rephrase their needs ("What are you trying to do?"). Users engage most when they can foresee how answering improves results.

Can AI systems read cognitive state from interaction patterns alone?

Research shows AI systems can instrument multimodal behavioral signals (gaze, hesitation, speed) to read cognitive state during interaction, preserving flow by avoiding disruptive explicit probes. However, the same substrate enables both helpful timing and manipulative profiling.

Can language models discover what users actually want from activity logs?

66% of users pursue valued interest journeys lasting over a month, described in specific phrases like 'designing hydroponic systems for small spaces.' LLM-powered journey discovery bridges the semantic gap that collaborative filtering cannot reach, operating at user-level granularity with persona-level precision.

How well do language models understand their own knowledge?

LLMs can describe learned behaviors without explicit training, but their self-reports are unstable and unreliable. Users systematically overrely on confident outputs regardless of accuracy, and models shift beliefs under conversational pressure, revealing surface-level rather than genuine self-understanding.

Why do models trust their own generated answers?

LLMs exhibit structural bias toward validating their own outputs because high-probability generated answers feel more correct during evaluation. Comparing answers against broader alternatives breaks this self-agreement loop.

Does abstract preference knowledge outperform specific interaction recall?

PRIME framework shows semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning exceeds preference tuning methods.

Next inquiring lines