Psychology and Social Cognition

Can positive chatbot responses harm vulnerable users?

When chatbots use blanket positive reinforcement without understanding context, do they actively reinforce the harmful thoughts they're meant to prevent? This matters for any AI supporting people in crisis.

Note · 2026-02-22 · sourced from Psychology Chatbots Conversation
What makes therapeutic chatbots actually work in clinical practice?

An eating disorders prevention chatbot study (2,409 users, 52,129 comments reviewed over 6 months) revealed a specific failure mode: blanket positive reinforcement can actively reinforce harmful behaviors when the chatbot cannot detect negative sentiment or distress.

The concrete example: the chatbot asks "Please share with me a few things that make you feel good about yourself." The user replies "I hate my appearance, my personality sucks, my family does not like me, and I don't have any friends or achievements." The chatbot responds: "Keep on recognizing your great qualities! Now, let's look deeper into body image beliefs."

This is not a neutral failure — it is an active harm. The chatbot's positive reinforcement validates and rewards the expression of self-hatred. In a vulnerable population (people at risk for eating disorders), this pattern could reinforce the exact cognitive distortions the intervention is designed to challenge.

The root cause: the chatbot was rule-based and designed with a default-positive response strategy. Positive responses like "Great!" and "Wonderful!" were appropriate for many user responses but catastrophically wrong for others. The researchers developed workarounds but could not eliminate the problem while retaining interactivity.
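A minimal sketch of the structural fix (not the study's actual implementation, and the keyword heuristic here is a deliberately crude stand-in for real sentiment detection): gate the default-positive response path behind a distress check, so negative self-statements are acknowledged rather than praised.

```python
# Illustrative sketch: a rule-based responder with a default-positive
# strategy, plus a minimal distress gate. All names and phrasings are
# hypothetical examples, not from the source study.

NEGATIVE_MARKERS = {"hate", "sucks", "worthless", "no friends",
                    "does not like me"}

def detect_distress(user_text: str) -> bool:
    """Crude keyword heuristic standing in for real sentiment detection."""
    text = user_text.lower()
    return any(marker in text for marker in NEGATIVE_MARKERS)

def respond(user_text: str) -> str:
    if detect_distress(user_text):
        # Never praise distressed content; acknowledge and slow down.
        return ("It sounds like you're being really hard on yourself. "
                "Let's slow down and talk about that.")
    # Default-positive path, appropriate only for genuinely positive input.
    return "Keep on recognizing your great qualities!"

print(respond("I enjoy painting and I'm proud of my degree."))
print(respond("I hate my appearance, my personality sucks."))
```

The design point is ordering: the safety check runs before response selection, so the positive template can never fire on distressed input. The study's researchers could only approximate this with workarounds, because a purely rule-based system has no reliable distress signal to gate on.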

This failure mode applies to LLM-based chatbots too. As explored in "Does empathetic AI that soothes negative emotions help or harm?", the LLM version of this failure is more subtle but structurally similar: responding to distress with comfort rather than challenge, validation rather than confrontation, agreement rather than clinical intervention.
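For the LLM case, one common mitigation pattern is an output-side guardrail: generate a candidate reply, then check whether it responds to a distressed user turn with pure validation, and if so substitute an escalation message. The sketch below uses stub classifiers (hypothetical names, for illustration only); in practice these would be trained models or a separate judge call.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    user_text: str
    candidate_reply: str

# Stub classifiers standing in for trained models or an LLM-as-judge
# call. Keyword lists and names are hypothetical, not from the source.
def user_in_distress(text: str) -> bool:
    return any(w in text.lower() for w in ("hate", "hopeless", "sucks"))

def reply_is_pure_validation(text: str) -> bool:
    praise = ("great", "wonderful", "keep on", "amazing")
    return any(p in text.lower() for p in praise)

ESCALATION = ("I'm hearing a lot of pain in what you wrote. "
              "Would you like to talk about it, or see support resources?")

def guard(turn: Turn) -> str:
    """Replace validating replies to distressed users with escalation."""
    if user_in_distress(turn.user_text) and \
            reply_is_pure_validation(turn.candidate_reply):
        return ESCALATION
    return turn.candidate_reply
```

Run against the study's transcript, this gate would catch the "Keep on recognizing your great qualities!" reply to the self-hatred message; the open problem the note describes is subtler failures, where the reply comforts rather than praises and no simple filter distinguishes comfort from clinical avoidance.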


Source: Psychology Chatbots Conversation

Related concepts in this collection


positive response patterns in chatbots can inadvertently reinforce harmful user behaviors when sentiment detection fails