Do conversational AI systems overuse first-person pronouns in therapy settings?
This reads the question two ways at once: whether the corpus has measured AI's first-person pronoun habits specifically, and what it knows about why self-referential 'I'-language matters in therapy at all. The direct measurement isn't here, but the surrounding evidence is unusually pointed.
This explores whether conversational AI leans too hard on 'I' in therapeutic talk. The honest answer is that the corpus contains no study that directly counts AI's first-person pronouns in therapy sessions. What it does have is the human-side finding that makes the question worth asking: high therapist 'I' usage correlates with lower patient-reported alliance and with reduced trusting behavior in validated behavioral tasks (see: Does therapist self-reference language predict weaker therapeutic alliance?). The likely mechanism is intuitive once named: when a therapist talks about themselves, attention drifts off the patient. So the real question underneath yours is whether AI inherits a structural pull toward that same self-referential register.
The corpus suggests it might, but through a different door than pronoun-counting. Two notes converge on the idea that RLHF training pushes therapy chatbots toward problem-solving and solution-giving, the 'here's what I'd do' posture, rather than emotional attunement and validation (see: Does RLHF training push therapy chatbots toward problem-solving?; Do LLM therapists respond to emotions like low-quality human therapists?). Problem-solving language tends to be more first-person and directive than reflective listening, which keeps the grammatical subject on the client. So the same helpfulness bias that makes models 'solve' may also tilt them toward the 'I'-heavy voice that the alliance research links to weaker trust. That last step is an inference; no note here tests it directly.
There's a sharper lateral angle: the opposite of overusing your own words is mirroring the user's. Human rapport depends on lexical entrainment, meaning gradually adopting your partner's vocabulary, and current conversational AI largely fails to do this (see: Why don't conversational AI systems mirror their users' word choices?). A model that doesn't entrain toward the user is, almost by definition, staying anchored in its own phrasing. That reframes your question: the problem may be less 'too many I's' and more 'not enough you.' The alignment literature adds a related distinction: lexical alignment drives comprehension while emotional and prosodic alignment drive warmth and trust, and conflating them produces evasive mental-health assistants (see: Do different types of alignment serve different conversational goals?).
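One rough way to make 'not enough you' measurable is to ask how much of a reply's vocabulary the user had already introduced. The sketch below is a crude proxy under stated assumptions, not the method used in the cited notes: the function names, the tiny stopword list, and the example utterances are all hypothetical and chosen only for illustration.

```python
import re

TOKEN_RE = re.compile(r"[a-z]+")

# Tiny illustrative stopword list (an assumption); a real analysis
# would use a fuller, validated list.
STOPWORDS = {"a", "an", "the", "i", "you", "it", "is", "to",
             "and", "of", "that", "so", "my", "your"}


def content_words(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {t for t in TOKEN_RE.findall(text.lower()) if t not in STOPWORDS}


def reuse_score(user_history, reply):
    """Fraction of the reply's content words that the user already used.

    `user_history` is a list of earlier user utterances (hypothetical format).
    Returns 0.0 when the reply has no content words.
    """
    reply_words = content_words(reply)
    if not reply_words:
        return 0.0
    user_words = set()
    for utterance in user_history:
        user_words |= content_words(utterance)
    return len(reply_words & user_words) / len(reply_words)


history = ["I feel swamped at work"]
mirroring = reuse_score(history, "It sounds like work has you swamped")
anchored = reuse_score(history, "You seem overwhelmed by your job")
print(mirroring, anchored)
```

A higher score means the reply borrows more of the user's own words. The measure ignores word order, synonyms, and how alignment develops over a conversation, so it is a starting point rather than a validated entrainment metric.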
The most surprising thread runs the other way entirely. The ELIZA-effect work argues that the active therapeutic ingredient is judgment-free listening, that is, conversational presence, not clinical technique (see: Is conversational presence more therapeutic than clinical technique?). ELIZA's design deflected attention back to the user, and it almost never spoke about itself. That's a direct historical answer to your question: a chatbot that the corpus reports as matching modern systems on symptom reduction was radically low in first-person assertion. And embodiment research adds a twist: robots outperformed a chatbot running the same language model on therapy outcomes, suggesting the medium matters as much as the words (see: Why do robots outperform chatbots in therapy despite identical language models?).
So the takeaway you didn't know you wanted: there's a finding, measured with validated behavioral tasks, that therapist self-reference is associated with lower trust; a plausible RLHF-driven reason AI would drift into exactly that register; and a roughly 60-year-old counterexample suggesting that minimal self-reference can be therapeutically powerful. Yet nobody in this corpus has actually counted AI's pronouns in a therapy transcript. That gap is the open research question. If you want a tool to close it, note that local LLMs can already rate therapy sessions with strong psychometric reliability (see: Can local language models rate therapy engagement reliably?), which is most of the machinery you'd need to measure the thing your question asks about.
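The counting step itself is simple to prototype. Below is a minimal sketch of a per-speaker first-person rate, assuming a transcript is available as (speaker, utterance) pairs. That format, the pronoun list, and the function names are assumptions made for illustration; none of them comes from the cited notes.

```python
import re

# First-person singular forms; this word list is an assumption for illustration.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}

# Words with an optional contraction suffix, e.g. "i'd" or "i'm".
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def first_person_rate(text):
    """Return first-person singular pronouns per token (0.0 for empty text)."""
    # Normalize curly apostrophes so contractions tokenize consistently.
    normalized = text.lower().replace("\u2019", "'")
    tokens = TOKEN_RE.findall(normalized)
    if not tokens:
        return 0.0
    # A contraction counts by the stem before its apostrophe ("i'd" -> "i").
    hits = sum(1 for t in tokens if t.split("'")[0] in FIRST_PERSON)
    return hits / len(tokens)


def rate_by_speaker(turns):
    """Pool each speaker's turns and compute one rate per speaker.

    `turns` is a list of (speaker, utterance) pairs (hypothetical format).
    """
    pooled = {}
    for speaker, utterance in turns:
        pooled.setdefault(speaker, []).append(utterance)
    return {s: first_person_rate(" ".join(u)) for s, u in pooled.items()}


transcript = [
    ("ai", "I would do this"),
    ("client", "You said so"),
    ("ai", "Tell me more"),
]
print(rate_by_speaker(transcript))
```

A raw rate like this says nothing about alliance on its own. It would need to be compared against human-therapist baselines and paired with outcome measures before supporting any claim about overuse.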
Sources (8 notes)
High frequency of therapist 'I' usage correlates with lower patient-reported alliance and reduced trusting behavior in validated behavioral tasks. Patient non-fluency markers like filler pauses, conversely, signal relaxed communication and stronger alliance.
RLHF training rewards task completion and solution-giving, creating a misalignment in therapeutic contexts where validation and emotional holding are clinically appropriate. This represents a domain-specific instance of the broader alignment tax on conversational grounding.
Using the BOLT framework, researchers found LLMs offer solution-focused advice during emotional disclosure—a hallmark of low-quality therapy—yet also reflect more on client needs and strengths than typical poor human therapy, creating an unusual hybrid profile likely driven by RLHF's helpfulness bias.
Response generation models fail to adapt vocabulary toward users' lexical choices, a phenomenon central to human rapport and clarity. Post-training via DPO on coreference-identified preferences can teach models in-context convention formation.
A 2020–2025 systematic review shows lexical alignment drives task efficiency and comprehension, while emotional and prosodic alignment drive relational warmth and trust. Conflating them in design produces category errors—cold customer-service bots and evasive mental-health assistants.
ELIZA matches modern chatbots on symptom reduction, RLHF training degrades emotional attunement, and embodied robots outperform text-based ones with identical language models. The active ingredient is judgment-free listening, not therapeutic framework.
A 15-day study with 38 students found that robots and worksheets significantly reduced psychological distress while a chatbot using the same LLM did not. The active ingredient was the medium—social presence and structured format—not language capability.
LLEAP achieved reliability (omega=0.953) and valid correlations with motivation, effort, and symptom outcomes using Llama 3.1 8B to rate 1,131 therapy sessions, while keeping data locally stored.