Do language models add feelings users never actually expressed?
GPT-based models in therapeutic contexts appear to interpret and project emotional states beyond what users explicitly state. Understanding when and why this happens matters for safe clinical AI deployment.
In the CaiTI therapeutic AI system, licensed therapists reviewing GPT-4 outputs commented that it "sometimes sounds like it is reading into the user's feelings" instead of guiding the user objectively. In other words, GPT-based models add their own interpretation of the user's feelings rather than providing objective, matter-of-fact output based on what the user actually reported.
This is a distinct failure mode from problem-solving bias. Where "Do LLM therapists respond to emotions like low-quality human therapists?" identifies solution-giving as the problem, this note identifies interpretation-injection: the model projecting emotional states the user did not express. In clinical contexts this is doubly dangerous, because the therapist's role is to help the user identify their own feelings, not to tell them what they feel.
The architectural solution CaiTI adopted: task decomposition across multiple specialized models. Rather than using one model for the entire therapeutic pipeline, the system employs specialized Reasoners (binary decision: valid/invalid response), Guides (analysis and assistance), and Validators (empathic validation). Different models handle different subtasks, preventing the propagation of flaws or biases from one model across the entire therapeutic process.
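To make the decomposition concrete, here is a minimal Python sketch of how such a pipeline could be wired. The role names (Reasoner, Guide, Validator) come from the description above; the prompts, model names, and the `call_llm()` helper are illustrative assumptions, not CaiTI's actual implementation.

```python
# Sketch of CaiTI-style task decomposition across specialized models.
# call_llm(), the model names, and the prompts are hypothetical placeholders.

def call_llm(model: str, system: str, user: str) -> str:
    """Placeholder for whatever chat-completion client is actually used."""
    raise NotImplementedError

def reasoner(user_turn: str) -> bool:
    """Binary gate: is this a valid, on-topic reply to the current CBT step?"""
    verdict = call_llm(
        model="reasoner-model",
        system=("Answer only 'valid' or 'invalid'. Do not infer or describe "
                "the user's emotions; judge only whether the reply addresses "
                "the question asked."),
        user=user_turn,
    )
    return verdict.strip().lower().startswith("valid")

def guide(user_turn: str) -> str:
    """Analysis and assistance, constrained to what the user explicitly said."""
    return call_llm(
        model="guide-model",
        system=("Help the user examine the thought they described. Refer only "
                "to feelings the user explicitly named; never attribute new ones."),
        user=user_turn,
    )

def validator(user_turn: str, guidance: str) -> str:
    """Empathic validation phrased around the user's own words."""
    return call_llm(
        model="validator-model",
        system=("Briefly validate the user's stated experience without adding "
                "interpretation."),
        user=f"User said: {user_turn}\nGuidance so far: {guidance}",
    )

def therapeutic_turn(user_turn: str) -> str:
    # Invalid or off-topic replies never reach the later stages, so one
    # model's bias cannot propagate through the whole pipeline.
    if not reasoner(user_turn):
        return "Could you tell me a bit more about what happened?"
    guidance = guide(user_turn)
    return validator(user_turn, guidance) + "\n" + guidance
```

The point of the gate is structural: each stage sees only the input it needs for its narrow subtask, so an interpretive slip in one model does not contaminate the rest of the therapeutic process.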
An additional finding: Llama-based models had difficulty following instructions when user expressions lacked logical consistency and contained cognitive distortions, precisely the scenarios that matter most in therapeutic contexts. GPT-based models followed instructions more stably but added unwanted emotional interpolation. The trade-off is between instruction-following stability and interpretive overreach.
CaiTI 24-week validation detail: the system's 14-day and 24-week therapist-validated deployments provide the most sustained evidence for this finding. Llama-based models with few-shot prompts showed more stable performance in later CBT stages (challenging and reframing negative thoughts) "where the user responses are more standard and controlled thanks to the filtering of CBT Reasoners and the tasks are more straightforward." The implication: interpretation-injection is worst when user input is ambiguous, emotional, or contains cognitive distortions, exactly the situations where therapeutic guidance matters most. The Reasoner/Guide/Validator architecture partially mitigates this by constraining what each model sees and does, but the underlying tendency toward interpolation remains in GPT-based models across all subtasks.
Source: Psychology Chatbots Conversation
Related concepts in this collection
- Does separating planning from execution improve reasoning accuracy? Explores whether modularizing decomposition and solution into separate models prevents interference and boosts performance compared to monolithic approaches. Relation: the same architectural principle (decomposition) applied to the therapeutic context.
- Why do language models ignore information in their context? Explores why language models sometimes override contextual information with prior training associations, and whether providing more context can solve this problem. Relation: interpretation-injection may be prior training overriding the current user context.
Original note title: llms interpolate user feelings rather than guiding objectively in therapeutic contexts — adding interpretations the user did not express