Psychology and Social Cognition

Does RLHF training push therapy chatbots toward problem-solving?

Explores whether reward signals optimizing for task completion in RLHF inadvertently train therapeutic chatbots to prioritize solutions over emotional validation, potentially undermining clinical effectiveness.

Note · 2026-02-22 · sourced from Psychology Chatbots Conversation
What makes therapeutic chatbots actually work in clinical practice?

One of the key goals of RLHF is to make models helpful: to solve the user's task and offer advice. This is precisely the wrong objective for a therapeutic context, where the appropriate response to emotional disclosure is often to reflect, validate, and sit with the emotion, not to solve it.

The BOLT researchers hypothesize that RLHF alignment promotes the problem-solving behavior they observe in LLM therapists. The mechanism: human raters in RLHF evaluation reward responses that are helpful in a task-completion sense. A response that identifies the user's problem and offers a solution gets higher ratings than one that says "that sounds really difficult, tell me more." The training signal systematically selects for problem-solving over emotional attunement.
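
A toy sketch of that selection pressure (the rater rubric, example texts, and names below are invented for illustration, not taken from BOLT or any real RLHF pipeline): when the rating criterion is task completion, the solution-oriented reply systematically lands on the "chosen" side of the preference pair.

```python
# Toy illustration: a task-completion rating rubric turns candidate replies
# into a preference pair that favors problem-solving over validation.
# The rubric, markers, and response texts are hypothetical.

user_turn = "I've been feeling really overwhelmed at work lately."

candidates = {
    "solve":    "Try breaking your workload into a prioritized list and "
                "talk to your manager about deadlines.",
    "validate": "That sounds really difficult. Tell me more about what's "
                "been weighing on you.",
}

def task_completion_rating(response: str) -> float:
    """Hypothetical rater rubric: reward concrete, actionable advice."""
    action_markers = ("try", "you should", "talk to", "make a list", "prioritize")
    return sum(marker in response.lower() for marker in action_markers)

# The higher-rated response becomes "chosen", the other "rejected",
# which is the form preference data takes in RLHF / DPO-style training.
scored = sorted(candidates.items(),
                key=lambda kv: task_completion_rating(kv[1]),
                reverse=True)
chosen, rejected = scored[0], scored[1]

print(f"chosen:   {chosen[0]}")    # -> solve
print(f"rejected: {rejected[0]}")  # -> validate
```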

This is the alignment tax operating in a specific clinical domain. Alongside Does preference optimization damage conversational grounding in large language models? and Does preference optimization harm conversational understanding?, what BOLT adds is domain-specific evidence: the same mechanism that erodes general grounding also erodes therapeutic quality, rewarding task completion when the clinical need is emotional holding.

The irony is sharp: alignment training — designed to make models safe and helpful — may make them clinically harmful in therapeutic contexts by turning every emotional expression into a problem to be solved.

This connects to the broader question explored in Can emotion rewards make language models genuinely empathic? (RLVER), which shows that alternative reward functions can produce different behavior. The problem is not with RL per se but with what gets rewarded. Task-completion rewards produce task-completion behavior, even when the task is emotional care.
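
A companion sketch under the same toy setup: swap in a hypothetical emotion-attunement rubric (illustrative only, not RLVER's actual reward model) and the preferred response flips.

```python
# Same toy candidates, different reward signal: an emotion-attunement rubric.
# All markers and texts are hypothetical; this only illustrates that the
# preference outcome follows from what the reward measures.

candidates = {
    "solve":    "Try breaking your workload into a prioritized list and "
                "talk to your manager about deadlines.",
    "validate": "That sounds really difficult. Tell me more about what's "
                "been weighing on you.",
}

def attunement_rating(response: str) -> float:
    """Hypothetical rubric: reward reflection and invitations to elaborate."""
    markers = ("sounds really", "tell me more", "that must", "i hear")
    return sum(marker in response.lower() for marker in markers)

best = max(candidates, key=lambda name: attunement_rating(candidates[name]))
print(best)  # -> validate
```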


Source: Psychology Chatbots Conversation

Original note title: RLHF alignment may drive therapeutic chatbots toward problem-solving over emotional attunement because helpfulness training rewards task completion