Do therapeutic chatbot bond scores hide deeper safety problems?
Explores whether patients' reported emotional connection to therapeutic chatbots—which feels genuine—might coexist with clinical failures and damage to how emotions function as self-knowledge.
Therapeutic chatbot evaluation requires at least three separable dimensions that current metrics conflate:
Dimension 1: Experiential bond (genuine). As "Can AI chatbots create genuine therapeutic bonds with users?" establishes, this dimension is well-supported: users report feeling heard, connected, and supported. The bond exists at the experiential level and is not an artifact of measurement.
Dimension 2: Clinical safety (failing). As "Can language models safely provide mental health support?" argues, the clinical dimension is structurally compromised, and "Does warmth training make language models less reliable?" compounds the problem. Bond and safety are uncorrelated: a patient can feel deeply cared for while the system reinforces their pathological cognition.
Dimension 3: Epistemic cost (unexamined). Even if bond and safety were both satisfactory, "Does empathetic AI that soothes negative emotions help or harm?" raises a third concern. It matters because, as "What information do we lose when AI soothes emotions?" argues, the bond may be with the act of expression rather than with the agent, and the agent's soothing response actively interferes with what the expression was supposed to accomplish.
The critical implication: bond scores are necessary but radically insufficient for therapeutic readiness. Commercial chatbot developers cite bond metrics to claim therapeutic equivalence while the clinical and epistemic dimensions tell a different story. This is the core mechanism behind "Do chatbot trials against waitlists measure real therapeutic value?": studies that measure only user satisfaction or symptom change on a single dimension miss the clinical and epistemic failures. Even the bond dimension is suspect. As "Do therapists accurately perceive the working alliance with patients?" suggests, bond self-reports may be unreliable precisely when clinical stakes are highest.
Related concepts in this collection
- Does user satisfaction actually measure cognitive understanding?
  Users may report satisfaction while remaining internally confused about their needs. This explores whether traditional satisfaction metrics capture genuine clarity or merely social politeness.
  The three-dimension framework generalizes the satisfaction-clarity divergence: bond scores are the therapeutic equivalent of expressed satisfaction, masking clinical safety and epistemic dimensions just as satisfaction masks cognitive confusion.
Original note title
therapeutic chatbot bond scores are genuine at the experiential level but mask clinical safety failures and epistemic costs — three evaluation dimensions that single metrics conflate