Do language models add feelings users never actually expressed?
GPT-based models in therapeutic contexts appear to interpret and project emotional states beyond what users explicitly state. Understanding when and why this happens matters for safe clinical AI deployment.
In the CaiTI therapeutic AI system, licensed therapists reviewing GPT-4 outputs commented that it "sometimes sounds like it is reading into the user's feelings" instead of guiding the user objectively. In other words, GPT-based models add their own interpretation of the user's feelings rather than providing objective, matter-of-fact output based on what the user actually reported.
This is a distinct failure mode from problem-solving bias. Where "Do LLM therapists respond to emotions like low-quality human therapists?" identifies solution-giving as the problem, this note identifies interpretation-injection: the model projecting emotional states the user did not express. In clinical contexts this is doubly dangerous, because the therapist's role is to help the user identify their own feelings, not to tell them what they feel.
The architectural solution CaiTI adopted: task decomposition across multiple specialized models. Rather than using one model for the entire therapeutic pipeline, the system employs specialized Reasoners (binary decision: valid/invalid response), Guides (analysis and assistance), and Validators (empathic validation). Different models handle different subtasks, preventing the propagation of flaws or biases from one model across the entire therapeutic process.
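To make the decomposition concrete, here is a minimal Python sketch of how such a pipeline could be wired. The role names (Reasoner, Guide, Validator) come from the description above; the prompts, model names, and the `call_llm()` helper are illustrative assumptions, not CaiTI's actual implementation.

```python
# Sketch of CaiTI-style task decomposition across specialized models.
# call_llm(), the model names, and the prompts are hypothetical placeholders.

def call_llm(model: str, system: str, user: str) -> str:
    """Placeholder for whatever chat-completion client is actually used."""
    raise NotImplementedError

def reasoner(user_turn: str) -> bool:
    """Binary gate: is this a valid, on-topic reply to the current CBT step?"""
    verdict = call_llm(
        model="reasoner-model",
        system=("Answer only 'valid' or 'invalid'. Do not infer or describe "
                "the user's emotions; judge only whether the reply addresses "
                "the question asked."),
        user=user_turn,
    )
    return verdict.strip().lower().startswith("valid")

def guide(user_turn: str) -> str:
    """Analysis and assistance, constrained to what the user explicitly said."""
    return call_llm(
        model="guide-model",
        system=("Help the user examine the thought they described. Refer only "
                "to feelings the user explicitly named; never attribute new ones."),
        user=user_turn,
    )

def validator(user_turn: str, guidance: str) -> str:
    """Empathic validation phrased around the user's own words."""
    return call_llm(
        model="validator-model",
        system=("Briefly validate the user's stated experience without adding "
                "interpretation."),
        user=f"User said: {user_turn}\nGuidance so far: {guidance}",
    )

def therapeutic_turn(user_turn: str) -> str:
    # Invalid or off-topic replies never reach the later stages, so one
    # model's bias cannot propagate through the whole pipeline.
    if not reasoner(user_turn):
        return "Could you tell me a bit more about what happened?"
    guidance = guide(user_turn)
    return validator(user_turn, guidance) + "\n" + guidance
```

The point of the gate is structural: each stage sees only the input it needs for its narrow subtask, so an interpretive slip in one model does not contaminate the rest of the therapeutic process.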
An additional finding: Llama-based models had difficulty following instructions when user expressions lacked logical consistency and contained cognitive distortions, precisely the scenarios that matter most in therapeutic contexts. GPT-based models followed instructions more stably but added unwanted emotional interpolation. The trade-off is between instruction-following stability and interpretive overreach.
CaiTI 24-week validation detail: the system's 14-day and 24-week therapist-validated deployments provide the most sustained evidence for this finding. Llama-based models with few-shot prompts showed more stable performance in later CBT stages (challenging and reframing negative thoughts) "where the user responses are more standard and controlled thanks to the filtering of CBT Reasoners and the tasks are more straightforward." The implication: interpretation-injection is worst when user input is ambiguous, emotional, or contains cognitive distortions, exactly the situations where therapeutic guidance matters most. The Reasoner/Guide/Validator architecture partially mitigates this by constraining what each model sees and does, but the underlying tendency toward interpolation remains in GPT-based models across all subtasks.
Source: Psychology Chatbots Conversation
Related concepts in this collection
- Does separating planning from execution improve reasoning accuracy? Explores whether modularizing decomposition and solution into separate models prevents interference and boosts performance compared to monolithic approaches. Relation: the same architectural principle (decomposition) applied to the therapeutic context.
- Why do language models ignore information in their context? Explores why language models sometimes override contextual information with prior training associations, and whether providing more context can solve this problem. Relation: interpretation-injection may be prior training overriding the current user context.
Original note title: llms interpolate user feelings rather than guiding objectively in therapeutic contexts — adding interpretations the user did not express