Why can't users articulate what they want from AI?
Explores the cognitive gap between imagining possibilities and expressing them as prompts. Why language interfaces create a harder envisioning task than traditional UI affordances.
Post angle for Medium/LinkedIn
AI can answer any question you can think to ask. The problem is that you often can't think of the right question.
STORM calls this the "gulf of envisioning" — the cognitive difficulty users face in simultaneously imagining what's possible and expressing it as a prompt. Unlike conventional interfaces with predictable affordances (buttons, menus, forms), language interfaces require users to envision possibilities and their expressions at the same time. This is a fundamentally harder cognitive task.
The double gap:
On the USER side: intent is not a thing you HAVE — it's a thing that MATURES through interaction. You start with a vague sense ("I want to plan a trip"), constraints resolve progressively ("somewhere warm, in February, under $3000"), stability fluctuates (new information can destabilize what seemed settled), and structural signals you're not even aware of (implicit assumptions, cultural markers) carry meaning you can't articulate.
On the AI side: as "Why can't advanced AI models take initiative in conversation?" argues, models are trained to respond to what you say, not to help you figure out what to say. They treat your intent as a binary state (present or absent) rather than as a maturation process, and they cannot detect that your expression hasn't reached cognitive readiness for system action.
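To make that AI-side failure concrete, here is a minimal sketch, in Python, of what a system would have to track if it modeled intent as maturation rather than as a binary. Every name here (IntentState, stability, ready_for_action) is hypothetical, invented for illustration — none of it comes from the cited papers:

```python
from dataclasses import dataclass, field

@dataclass
class IntentState:
    """Toy model: intent as a maturing state, not a present/absent flag."""
    raw_goal: str                                     # "I want to plan a trip"
    constraints: dict = field(default_factory=dict)   # what has resolved so far
    stability: float = 0.0                            # 0 = volatile, 1 = settled

    def update(self, key: str, value: str, shift: float) -> None:
        # New information can raise OR lower stability (destabilization).
        self.constraints[key] = value
        self.stability = max(0.0, min(1.0, self.stability + shift))

    def ready_for_action(self, threshold: float = 0.8) -> bool:
        # Act only once expression has matured, instead of treating any
        # well-formed prompt as proof that intent is "present".
        return self.stability >= threshold

intent = IntentState("plan a trip")
intent.update("climate", "warm", +0.3)
intent.update("month", "February", +0.3)
intent.update("budget", "under $3000", +0.3)  # constraints resolve progressively
print(intent.ready_for_action())              # True: stability crossed 0.8
```

The point of the toy isn't the numbers; it's that "ready for system action" becomes a question the system can ask, rather than something it assumes from the mere presence of a prompt.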
The convergence of three research programs:
STORM — formalizes intent as continuous maturation with the "Clarify" metric measuring internal cognitive improvement. Users may express satisfaction while internally confused about their own needs.
Insert-expansions from conversation analysis (CA) — provides the interaction framework: when the AI can't immediately answer, it should probe the user (clarify intent, scope the response) rather than silently chaining tool calls and diverging. The "user-as-a-tool" paradigm (see the sketch after this list).
Decision-oriented dialogue — formalizes the information asymmetry: user knows preferences, AI has database, neither can share everything. Success requires determining what information is decision-relevant.
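Here is a minimal sketch of the "user-as-a-tool" paradigm, assuming a generic tool-calling loop. All names (ask_user, search_database, agent_step) are invented for illustration; this is not any specific framework's API:

```python
def ask_user(question: str) -> str:
    """Probe the user mid-task: clarify intent or scope the response."""
    return input(f"[agent] {question}\n> ")

def search_database(query: str) -> str:
    """Stand-in for the AI-side knowledge the user can't see."""
    return f"(results for {query!r})"

# The user is registered as just another tool, so asking a clarifying
# question competes on equal footing with searching, instead of the loop
# silently chaining searches and diverging from the user's intent.
TOOLS = {"ask_user": ask_user, "search_database": search_database}

def agent_step(decision: dict) -> str:
    """Execute one tool call chosen by the model."""
    return TOOLS[decision["tool"]](decision["argument"])

# e.g. the model judges the query underspecified and probes first:
clarification = agent_step({"tool": "ask_user",
                            "argument": "Warm in February: beach or city?"})
results = agent_step({"tool": "search_database", "argument": clarification})
```

The design choice is the insert-expansion itself: the clarifying question is inserted into the sequence before the answer, as an ordinary move the model can select, not an error path.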
The design implication: This is not a model capability problem to be solved by better models. It's a design problem requiring fundamental changes to how AI interactions are structured. The fix isn't a smarter answer — it's a better conversation about what the question should be.
The hook: AI can answer any question. The problem is that you often can't think of the right question — and AI can't help you get there.
Conversational Prompt Engineering (CPE) demonstrates a partial bridge. It is a three-party system (user, system, model) in which the LLM generates data-driven questions from user-provided unlabeled data, uses the responses to shape an initial instruction, then shares outputs and uses feedback to refine both the instruction and the outputs. The key insight: the model's ability to analyze data and suggest "dimensions of potential output preferences" helps users discover requirements they couldn't initially articulate. However, CPE still requires users to evaluate outputs: the envisioning gap is narrowed by scaffolded interaction, not eliminated. This is a meaningful design finding. Structured dialogue around model-generated proposals shifts the user's cognitive task from open-ended envisioning to constrained evaluation, which is significantly easier. The gulf is narrowed not by making users better at articulating intent, but by changing what they're asked to do.
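Schematically, that loop looks something like the sketch below. This is my own rendering under invented names — llm, ask_user, review, and the prompt strings are placeholders, not the CPE paper's actual interface:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Feedback:
    approved: bool
    text: str = ""

def cpe_loop(llm: Callable[[str], str],
             ask_user: Callable[[str], str],
             review: Callable[[List[str]], Feedback],
             examples: List[str],
             max_rounds: int = 3) -> str:
    """Schematic three-party CPE loop: user, system (this function), model."""
    # 1. The model reads the user's own unlabeled data and asks data-driven
    #    questions about preference dimensions the user hasn't considered.
    questions = llm(
        "From these examples, list questions (one per line) that surface "
        "output preferences:\n" + "\n".join(examples)
    ).splitlines()
    answers = [ask_user(q) for q in questions]

    # 2. The answers shape an initial instruction.
    instruction = llm("Draft an instruction from these preferences:\n"
                      + "\n".join(answers))

    for _ in range(max_rounds):
        # 3. Share outputs: the user's task is constrained evaluation,
        #    not open-ended envisioning.
        outputs = [llm(f"{instruction}\n\nInput: {x}") for x in examples]
        fb = review(outputs)
        if fb.approved:
            break
        # 4. Feedback refines the instruction (and with it, the outputs).
        instruction = llm(f"Revise this instruction:\n{instruction}\n"
                          f"Feedback: {fb.text}")
    return instruction
```

Note where the user sits in this loop: answering pointed questions and reviewing concrete outputs, never staring at a blank prompt box.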
Source: Conversation Architecture Structure
Key sources:
- Why do users drift away from their original information need? — ASK (the "anomalous state of knowledge") is the upstream cognitive cause of the gulf: users know their knowledge is incomplete but cannot specify what is missing, producing the vague intent that the gulf describes
- How do users actually form intent when prompting AI systems?
- When should AI agents ask users instead of just searching?
- Can AI agents communicate efficiently in joint decision problems?
- Why can't advanced AI models take initiative in conversation?
- Does user satisfaction actually measure cognitive understanding?
Original note title
the gulf of envisioning — users cant articulate what they want and AI cant help them figure it out