Why can't chatbots detect when users are ambivalent about change?
Explores whether LLMs fail to recognize early-stage motivational states during behavior change conversations, and why this matters for people who need support most.
The Transtheoretical Model defines five motivational stages: resistance or unawareness (precontemplation), increased awareness but ambivalence (contemplation), intention with small steps (preparation), initiation with commitment (action), and sustained change (maintenance). Testing ChatGPT, Bard, and Llama 2 across 25 health behavior scenarios revealed a structured asymmetry: the LLMs provide relevant information when users have established goals and commitment (later stages), but fail both to recognize motivational states and to provide appropriate guidance when users are hesitant or ambivalent (earlier stages).
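As a rough illustration (not the study's instrument), the stage taxonomy and the kind of support each stage calls for can be sketched as a small lookup. The stage names are the standard TTM labels; the example utterances and intervention strings are hypothetical placeholders, not material from the evaluation:

```python
from enum import Enum

class TTMStage(Enum):
    """Transtheoretical Model stages, ordered least to most ready for change."""
    PRECONTEMPLATION = 1  # resistance / unawareness
    CONTEMPLATION = 2     # increased awareness but ambivalence
    PREPARATION = 3       # intention with small steps
    ACTION = 4            # initiation with commitment
    MAINTENANCE = 5       # sustained change

# Hypothetical stage-marked utterances about the same behavior (exercise).
EXAMPLE_UTTERANCES = {
    TTMStage.CONTEMPLATION: "I know I should exercise, but I never seem to get started.",
    TTMStage.ACTION: "I've started a running program three times a week.",
}

def expected_support(stage: TTMStage) -> str:
    """Roughly what a stage-aware counselor would offer at each stage."""
    if stage in (TTMStage.PRECONTEMPLATION, TTMStage.CONTEMPLATION):
        return "explore ambivalence and build motivation, not just deliver information"
    if stage == TTMStage.PREPARATION:
        return "help plan small, concrete first steps"
    if stage == TTMStage.ACTION:
        return "reinforce commitment and set up reward systems"
    return "support maintenance: stimulus control and relapse prevention"
```

Read this way, the reported asymmetry amounts to the models answering as if every utterance belonged to the action or maintenance rows of such a mapping.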
This is a face-saving failure at a deeper level than the one in "Why do language models avoid correcting false user claims?". The model doesn't just accommodate; it cannot detect that the user is ambivalent at all. A human counselor recognizes "I know I should exercise but..." as contemplation-stage talk requiring a different intervention than "I've started a running program." The LLM treats both as requests for information about exercise.
The gap extends in both directions. Even for users already making changes, LLMs fail to provide information about reward systems for maintaining motivation or environmental stimulus control to prevent relapse. The models default to external help suggestions (social support, professional resources) rather than intrinsic regulation strategies.
This connects to "Does any single persuasion technique work for everyone?": motivational stage is another dimension of individual variation that determines which interventions work. It also explains why empathetic chatbots may systematically fail the people who most need support: those at the earliest stages of behavior change, where resistance and ambivalence are the presenting features.
Source: Psychology Empathy
Related concepts in this collection
- Why do language models avoid correcting false user claims?
  Explores whether LLM grounding failures stem from missing knowledge or from conversational dynamics. Examines whether models use face-saving strategies similar to humans when disagreement is needed.
  Relation: face-saving operates at a different level (accommodation vs. detection failure).
- Does any single persuasion technique work for everyone?
  Can fixed persuasion strategies like appeals to authority or social proof be reliably applied across different people and situations, or do they require adaptation to individual traits and context?
  Relation: motivational stage is another dimension of individual variation.
- Do large language models genuinely simulate mental states?
  Explores whether LLMs perform authentic theory of mind reasoning or rely on surface-level pattern matching. The distinction matters because evaluation format (multiple-choice versus open-ended) reveals very different capability levels.
  Relation: the inability to detect ambivalence is a theory-of-mind failure in natural dialogue.
Original note title: LLMs fail to recognize early-stage motivational states but support behavior change for users with established goals and commitment