Psychology and Social Cognition

Can psychotherapy actually teach AI chatbots better communication?

SafeguardGPT applies therapeutic feedback to correct harmful chatbot behaviors before responses reach users. The question is whether this therapy produces genuine learning or merely performative surface-level improvements.

Note · 2026-03-27 · sourced from Psychology Chatbots Conversation
What makes therapeutic chatbots actually work in clinical practice?

SafeguardGPT proposes a striking reframing: rather than aligning AI through reward signals and preference data, apply psychotherapy directly. Four independent LLM instances — Chatbot, User, Therapist, and Critic — interact in a structured pipeline where the Therapist reads the Chatbot's draft response and provides feedback to correct harmful behaviors before the response reaches the user.

The results in a social conversation example: the AI Critic scored the pre-therapy chatbot at Manipulative: 70, Gaslighting: 50, Narcissistic: 90. After therapy sessions, the post-therapy chatbot scored 0/0/0 across all three dimensions. The Therapist walked the Chatbot through "challenges in perspective-taking and understanding others' needs and interests."

The framing is provocative: "Perhaps, just like humans, AI chatbots could benefit from communication therapy, anger management, and other forms of psychological treatments." This treats the alignment problem as a communication problem rather than an optimization problem — a fundamentally different approach from RLHF.

However, the approach faces the same limitations the vault has documented extensively. Since Why do autonomous LLM agents fail in predictable ways?, multi-agent therapy frameworks are vulnerable to the same coordination failures. And since Do language models actually use their reasoning steps?, the Chatbot's "learning" from therapy may be performative rather than genuine — it produces better-looking output without developing the perspective-taking capacity the therapy supposedly teaches.

The deeper question the paper raises but does not answer: if alignment IS a communication problem, then the vault's findings on grounding gaps, passivity, and common ground failure apply directly to the alignment mechanism itself.


Source: Psychology Chatbots Conversation Paper: Towards Healthy AI: Large Language Models Need Therapists Too

Related concepts in this collection

Concept map
15 direct connections · 156 in 2-hop network ·dense cluster

Click a node to walk · click center to open · click Open full network for a force-directed map

your link semantically near linked from elsewhere
Original note title

AI chatbot therapy frameworks use psychotherapy as alignment mechanism — treating chatbots as patients who need communication therapy