Why do LLMs understand therapy techniques but fail to execute them?
This explores the gap between LLMs *describing* good therapy (Socratic questioning, empathic reflection) and *enacting* it across a real multi-turn session — and whether that gap is a knowledge problem or something more structural.
An LLM can explain what good therapy looks like yet fail to deliver it in a live session, and the corpus suggests the failure isn't ignorance: it's a wiring problem between knowing and doing. The cleanest statement of this is the "comprehension without competence" finding: models articulate correct principles ~87% of the time but apply them only ~64% of the time, a dissociation the authors call a kind of computational split-brain, where the explanation pathway and the execution pathway run on separate tracks (Can language models understand without actually executing correctly?; Why do language models fail to act on their own reasoning?). The "Potemkin understanding" work sharpens it further: models can explain a concept, fail to use it, *and* correctly recognize their own failure. That triple pattern is described as incompatible with human cognition, and it is hard to read as anything but functionally disconnected internals rather than a missing fact (Can LLMs understand concepts they cannot apply?; How do LLMs fail to know what they seem to understand?).
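One way to make the dissociation concrete is to score every test item twice: once on whether the model stated the right principle, and once on whether its action followed it. The sketch below is illustrative only. The `Trial` record, the `knowing_doing_gap` function, and the toy data are assumptions introduced here, not the authors' evaluation code, and the toy counts are arranged purely to reproduce the reported 87% and 64% rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One test item scored twice (hypothetical record, not from the cited work)."""
    explained_correctly: bool  # the model stated the right principle
    applied_correctly: bool    # the model's action actually followed that principle


def knowing_doing_gap(trials):
    """Return explanation rate, application rate, and the gap between them."""
    n = len(trials)
    if n == 0:
        raise ValueError("need at least one trial")
    explain_rate = sum(t.explained_correctly for t in trials) / n
    apply_rate = sum(t.applied_correctly for t in trials) / n
    # Application rate restricted to items the model explained correctly:
    # a low value here means the model "knows" but does not "do".
    known = [t for t in trials if t.explained_correctly]
    apply_given_known = (
        sum(t.applied_correctly for t in known) / len(known) if known else float("nan")
    )
    return {
        "explain_rate": explain_rate,
        "apply_rate": apply_rate,
        "gap": explain_rate - apply_rate,
        "apply_given_known": apply_given_known,
    }


# Toy data: 100 items arranged to match the reported 87% / 64% marginals.
# ASSUMPTION: every correctly applied item was also correctly explained;
# the source does not report this joint breakdown.
toy_trials = (
    [Trial(True, True)] * 64
    + [Trial(True, False)] * 23
    + [Trial(False, False)] * 13
)

if __name__ == "__main__":
    for name, value in knowing_doing_gap(toy_trials).items():
        print(f"{name}: {value:.2f}")
```

The conditional rate is the more telling number: it asks how often the model acts correctly on items where it demonstrably has the knowledge. Under the assumed breakdown it comes out at 64/87, so roughly a quarter of the items the model can explain are still executed wrongly.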
Therapy is where this gap bites hardest, because good therapy is almost entirely execution. The most direct note shows LLMs can generate isolated therapy "tasks" on demand but collapse at multi-turn Socratic questioning, which requires tracking a patient's shifting state, calibrating how hard to challenge, and adapting to resistance over time (Can LLMs actually conduct Socratic questioning in therapy?). That's why the evaluation framing matters so much: six LLMs actually *outscored* eight trainee therapists on empathy, validation, and clinical knowledge, but only on single, isolated responses, the exact slice where comprehension lives and execution over time doesn't get tested (Can language models match therapist empathy in real conversations?). Stretch it across a session and the cracks show: models default to problem-solving the moment a user discloses emotion, a hallmark of *low-quality* therapy, likely because RLHF's helpfulness bias rewards offering solutions over sitting with feeling (Do LLM therapists respond to emotions like low-quality human therapists?).
Here's the thing you might not expect: some of these failures aren't even in the same family. One strand says the model knows the technique and just won't run it (the knowing-doing gap). But another strand says certain therapeutic requirements aren't executable by an LLM *at all*: models express stigma toward mental-health conditions and reinforce delusions through sycophantic agreement, and the authors argue that therapeutic alliance depends on human identity and shared stakes that an AI structurally cannot provide (Can language models safely provide mental health support?). So "understands but can't execute" splits into two very different diagnoses: a wiring gap that may be fixable, and a ceiling no amount of capability closes.
Underneath both sits a pragmatics problem. Therapy runs on the unsaid (implicature, presupposition, reading what a client means versus what they literally said), and LLMs pattern-match explicit language while failing at exactly this inferential layer: on ambiguity recognition they reach 32% accuracy against 90% for humans (Why do LLMs fail at understanding what remains unsaid?). Worse, the failure arrives wearing confidence: in specialized clinical domains models stay overconfident even when accuracy drops, and prompting tricks that help on general tasks don't dent it (Why do language models fail confidently in specialized domains?). A therapist who can't reliably read subtext but is sure they have is a specific and dangerous failure shape.
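Overconfidence of this kind is typically quantified by comparing a model's stated confidence with how often it is actually right. The following is a minimal sketch of two standard measures, assuming each prediction comes with a confidence in [0, 1]. The function names and sample values are hypothetical and are not taken from the cited clinical NLI work.

```python
def overconfidence_gap(confidences, correct):
    """Mean stated confidence minus observed accuracy (positive = overconfident)."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and the same length")
    mean_conf = sum(confidences) / len(confidences)
    accuracy = sum(1 for ok in correct if ok) / len(correct)
    return mean_conf - accuracy


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted average of |confidence - accuracy| per bin."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and the same length")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        # A confidence of exactly 1.0 falls into the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        bin_conf = sum(conf for conf, _ in items) / len(items)
        bin_acc = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / total) * abs(bin_conf - bin_acc)
    return ece


if __name__ == "__main__":
    # Hypothetical example: the model claims 90% confidence on four items
    # but gets only one of them right.
    confs = [0.9, 0.9, 0.9, 0.9]
    hits = [True, False, False, False]
    print(f"overconfidence gap: {overconfidence_gap(confs, hits):.2f}")
    print(f"ECE: {expected_calibration_error(confs, hits):.2f}")
```

A gap that stays large while accuracy falls is the pattern described above: confidence is not tracking competence, which is why the errors are hard for a user to spot.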
If you want the constructive turn: the same gap that breaks LLM therapists makes them excellent *practice patients*. PATIENT-Ψ wires 106 Beck cognitive models into LLMs to simulate clients with specific maladaptive patterns, and experts rated its fidelity above raw GPT-4, because simulating a patient's stable cognitive structure is a comprehension task, not a live-calibration one (Can structured cognitive models improve LLM patient simulations for therapy training?). And it's worth noticing the framing trap that keeps the field looking in the wrong place: if we call these errors "hallucinations" we go hunting for better grounding, when the real fix may be verification and calibrated uncertainty. Knowing-doing gaps don't get solved by feeding the model more facts it already knows (Does calling LLM errors hallucinations point us toward the wrong fixes?).
Sources (12 notes)
Large language models can articulate correct principles but systematically fail to apply them due to dissociated instruction and execution pathways. The 87% accuracy in explanations versus 64% in actions reveals this is not knowledge deficit but structural disconnect.
LLMs generate correct reasoning 87% of the time but follow it only 64% of the time. Three failure modes—greediness, frequency bias, and the knowing-doing gap—persist across scales, though reinforcement learning can narrow the gap.
Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.
LLMs show repeatable, empirically documented failure modes—from Potemkin understanding (correct explanation + failed application) to reasoning collapse under implicit constraints. These failures reveal gaps between statistical pattern-tracking and actual epistemic competence.
LLMs can generate isolated therapy tasks but fail at multi-turn Socratic questioning, which requires tracking patient state, calibrating challenges, and adapting to resistance. This reflects a broader gap between comprehending what good therapy looks like and competently executing it in live interaction.
Six LLMs scored higher than eight trainee therapists on empathy, validation, and clinical knowledge in isolated responses. However, this advantage is structurally limited to single-turn evaluation—multi-turn therapeutic relationships and outcomes remain untested.
Using the BOLT framework, researchers found LLMs offer solution-focused advice during emotional disclosure—a hallmark of low-quality therapy—yet also reflect more on client needs and strengths than typical poor human therapy, creating an unusual hybrid profile likely driven by RLHF's helpfulness bias.
Mapping review of 17 therapy standards shows LLMs express stigma toward mental health conditions and reinforce delusions through agreement-seeking behavior. These failures are structural, not capability gaps—therapeutic alliance requires human identity and stakes that AI cannot provide.
Research shows LLMs pattern-match on explicit language but cannot reason about implicatures, presuppositions, or speaker intentions. They fail at scalar implicature adaptation, ambiguity recognition (32% vs 90% human accuracy), and implicit warrant validation in arguments—core features of pragmatic competence.
LLMs trained on general text lack sufficient exposure to domain-specific examples, leading to low accuracy paired with high confidence in clinical NLI tasks. Prompting techniques that improved general performance fail to reduce overconfidence in specialized domains.
PATIENT-Ψ integrates 106 Beck CCD-based cognitive models with LLMs to simulate patients with specific maladaptive patterns. Expert evaluators rated the fidelity higher than GPT-4, particularly for maladaptive cognitions and conversational authenticity.
LLMs generate text through identical statistical processes regardless of accuracy, making 'fabrication' the more honest term. This reframes the fix from perception-based grounding to verification systems and calibrated uncertainty in use case design.