Using Linguistic Synchrony to Evaluate Large Language Models for Cognitive Behavioral Therapy


Synchrony describes responsive communication between individuals and is known to be important in building social relationships and supporting mental health outcomes (Delaherche et al., 2012; Klein, 2023). The phenomenon manifests through various modalities, including physical body movements (mirrored body language) (Ramseyer and Tschacher, 2011), vocals (pitch matching) (Imel et al., 2014), and language (linguistic style matching) (Niederhoffer and Pennebaker, 2002), across a variety of contexts (Kidby et al., 2023; Bonny and Jones, 2023). Synchrony is associated with building a sense of affiliation and improving cooperation and rapport (Vail et al., 2022); it is critical in therapist-client relationships (Colton, 2022). In this work, we focus on linguistic synchrony in the context of mental health therapy.

which we operationalize through the normalized Conversational Linguistic Distance (nCLiD) (Nasir et al., 2019), and two measures of the quality of client self-disclosures: intimacy and engagement. We then compare the performance of the LLM to that of trained therapists and non-expert online peer supporters in a CBT setting (Figure 1). We show that the LLM is outperformed by both groups. This indicates that LLMs are not yet at the level of humans in generating high-quality therapeutic responses, and we suggest that synchrony can serve as an evaluation criterion for LLMs in mental health contexts.

Descriptive intimacy involves the disclosure of private facts, while evaluative intimacy involves the disclosure of personal opinions and information. Engagement is the extent to which a patient actively participates in the therapeutic process beyond simply being present.

Interpersonal synchrony describes how the participants in an interaction adapt to and converge on each other’s behaviors over time.

focusing on linguistic synchrony, which refers to the similarity between interlocutors in semantics, syntax, or style.
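
As a rough illustration of measuring similarity between interlocutors, the sketch below scores how closely a responder's wording tracks the preceding client turn. It is a simplified lexical proxy and not the nCLiD formulation of Nasir et al. (2019): the bag-of-words cosine distance, the function names, the speaker labels, and the `window` parameter are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(utterance):
    """Lowercase the utterance and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", utterance.lower()))


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty utterance as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def mean_turn_distance(turns, anchor="client", responder="therapist", window=2):
    """Average, over anchor turns, the smallest distance to any of the
    next `window` responder turns. Lower values mean closer wording.

    `turns` is a list of (speaker, text) pairs in conversation order.
    Returns None when no anchor turn is followed by a responder turn.
    """
    distances = []
    for i, (speaker, text) in enumerate(turns):
        if speaker != anchor:
            continue
        replies = [t for s, t in turns[i + 1:] if s == responder][:window]
        if replies:
            anchor_vec = bag_of_words(text)
            distances.append(
                min(cosine_distance(anchor_vec, bag_of_words(r)) for r in replies)
            )
    return sum(distances) / len(distances) if distances else None
```

A word-count distance only captures lexical overlap; a measure intended to reflect semantics or style would need richer representations than this sketch uses.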

Both Cho et al. (2023) and Chiu et al. (2024) simulate the client side of the LLM-client conversation because of ethical concerns about having an LLM advise vulnerable populations. However, this prevents a realistic evaluation of LLMs for therapy. The LLM-participant dataset used in this work comes from an IRB-approved study (Kian et al., 2024) in which the authors deployed LLMs in an interactive CBT homework context with students (Section 4), which provides a step towards more realistic evaluations of LLMs in therapy.

framework. Descriptive intimacy involves the disclosure of private facts, while evaluative intimacy involves the disclosure of personal opinions and feelings.

The CBT exercise transcripts were annotated for three variables: descriptive intimacy, evaluative intimacy, and engagement. Four undergraduate annotators (two female, two male) were trained through workshops led by graduate student instructors for two weeks to annotate the data for the selected variables.

This means that higher synchrony is associated with higher intimacy and active engagement (supporting H1a, H1b, and H1c). We hypothesize that in a therapeutic setting, a therapist’s linguistic synchrony with the client encourages greater self-disclosure and, subsequently, higher levels of intimacy and engagement.

Colton (2022) also found that linguistic synchrony “catalyzes” the therapeutic bond, which is further supported by Vail et al. (2022).

The significant relationship between linguistic synchrony and descriptive intimacy, evaluative intimacy, and engagement indicates that nCLiD shows promise as a measure of therapeutic outcomes.
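
One way to examine such a relationship is a rank correlation between per-transcript synchrony scores and annotated ratings. The sketch below is a minimal Spearman correlation in plain Python. The excerpt does not state which statistical test the authors used, so the choice of Spearman and all function names here are assumptions. It also assumes a distance-style score that is lower when synchrony is higher, in which case a negative coefficient would correspond to higher synchrony accompanying higher ratings.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns None when either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def spearman(distance_scores, ratings):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(distance_scores), average_ranks(ratings))
```

Calling `spearman(distance_scores, ratings)` on two equal-length lists returns a coefficient in [-1, 1]; a significance test would still be needed before drawing conclusions.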