Why do robots outperform chatbots in therapy despite identical language models?
This study tested whether better language generation explains therapeutic AI outcomes, or whether the delivery medium itself matters more. It reveals that physical embodiment and structured interaction—not model capability—drive therapeutic adherence and outcomes.
A 15-day study with 38 university students assigned to complete CBT exercises with a robot, a chatbot, or traditional worksheets — all using the same LLM — produced a striking result: psychological distress significantly decreased in the robot and worksheet conditions but not the chatbot condition.
This is not a capability finding. The LLM was identical. The difference was the delivery medium. The socially assistive robot (SAR) enabled significant single-session anxiety improvements for more sessions than the other two conditions combined. Meanwhile, the chatbot — despite identical language generation — failed to produce significant outcomes.
Two implications follow. First, physical embodiment (or at least structured delivery format, since worksheets also worked) provides something the chatbot interface does not. SARs build rapport and encourage adherence through social interaction in a way that text-based chatbots cannot replicate. Research shows adherence to SARs is unaffected by the robot's human-likeness — it's the social presence, not the appearance, that matters.
Second, because completion of CBT homework enhances therapy outcomes but adherence is chronically low, the SAR's advantage may be primarily an adherence mechanism rather than a therapeutic one. The robot may not deliver better therapy — it may just get people to actually do the exercises.
A complementary evaluation using a 30-question clinical skills scorecard — testing rapport-building, conversational balance, session flow, and appropriate use of techniques — reinforces this finding from a different angle. LLMs scored well on generating CBT-appropriate content but failed on skills that require interactive attunement: maintaining therapeutic presence, pacing the session, and implementing Socratic questioning (as opposed to lecturing about it). Given these deficits, the embodiment advantage documented here may partly operate by forcing the interaction into a structured, paced format that compensates for the LLM's weakness in interactive skill implementation.
The adherence data confirm the mechanism: SARs with delegated authority elicit non-trivial adherence, independent of the robot's human-likeness. The worksheet condition's success adds a further wrinkle: structured format alone, without embodiment or conversational AI, produces therapeutic outcomes. The chatbot, which has neither physical presence nor worksheet-style structured guidance, occupies a middle ground with neither advantage.
This challenges the assumption that better language models will produce better therapeutic outcomes. If embodiment and adherence are the active ingredients, the frontier of therapeutic AI is not in model capability but in interaction design.
Source: Psychology Chatbots Conversation; enriched from Psychology Therapy Practice
Related concepts in this collection
- Can AI systems learn social norms without embodied experience?
  Large language models exceed individual human accuracy at predicting collective social appropriateness judgments. Does this reveal that embodied experience is unnecessary for cultural competence, or do systematic AI failures point to limits of statistical learning?
  Connection: the embodiment debate from a different angle; here, embodiment is required for therapeutic efficacy.
- Can disembodied language models ever qualify as conscious?
  Explores whether current LLMs lack the conditions needed for consciousness discourse to even apply: not because they are definitely not conscious, but because they lack the shared embodied world that grounds consciousness language.
  Connection: philosophical grounding for the embodiment requirement.
- Do more social cues always make AI feel more present?
  Explores whether the quantity of social cues matters as much as their quality in triggering social responses to AI. Tests whether multiple weak cues can substitute for one strong one.
  Connection: the MASA framework explains the embodiment advantage: physical presence is a primary social cue, individually sufficient to evoke social-actor presence; chatbots rely on secondary cues (text, response timing) that are collectively insufficient for the therapeutic bond that drives adherence.
Original note title
embodied agents outperform chatbots for therapeutic cbt outcomes despite using identical llms