COMPASS: Computational Mapping of Patient-Therapist Alliance Strategies with Language Modeling

Paper · arXiv 2402.14701 · Published February 22, 2024

The therapeutic working alliance is a critical factor in predicting the success of psychotherapy treatment. Traditionally, working alliance assessment relies on questionnaires completed by both therapists and patients. In this paper, we present COMPASS, a novel framework to directly infer the therapeutic working alliance from the natural language used in psychotherapy sessions. Our approach utilizes advanced large language models to analyze transcripts of psychotherapy sessions and compare them with distributed representations of statements in the working alliance inventory. Analyzing a dataset of over 950 sessions covering diverse psychiatric conditions, we demonstrate the effectiveness of our method in microscopically mapping patient-therapist alignment trajectories and providing interpretability for clinical psychiatry and in identifying emerging patterns related to the condition being treated. By employing various neural topic modeling techniques in combination with generative language prompting, we analyze the topical characteristics of different psychiatric conditions and incorporate temporal modeling to capture the evolution of topics at a turn-level resolution. This combined framework enhances the understanding of therapeutic interactions, enabling timely feedback for therapists regarding conversation quality and providing interpretable insights to improve the effectiveness of psychotherapy.

Following recent advancements in NLP [9, 10, 11, 12, 13], we propose here an approach for quantifying patient-therapist alliance by leveraging large language models (LLMs) to project each dialogue turn onto representations of established working alliance inventories [3, 14, 15]. Our approach enables us to not only estimate the overall degree of alliance but also identify fine-grained patterns and dynamics across shorter and longer time scales, e.g., turn-by-turn in a session and across sessions.

full records of individual patients with multiple clinical conditions or a cohort of patients with the same condition, which can be segmented based on timestamps or topic turns. The original data is presented in pairs of dialogues, and we extract features in three different ways: (1) using the full pairs of dialogues, (2) extracting only the patients’ responses, or (3) extracting only the therapist’s responses. Each feature set has its advantages and disadvantages. The dialogue features contain all the information but can mix together the intents within sentences from both individuals. The patient features provide a more coherent narrative but represent only part of the overall story. The therapist features, which can be seen as a type of semantic labeling of the patient’s feelings, can be informative for diagnosis but may oversimplify the complexity of the interaction.

The dialogue between the patient and therapist in a session is transcribed into pairs, each consisting of the patient’s turn followed by the therapist’s turn. The inventories of working alliance questionnaires are also provided in pairs for the patient and the therapist, each comprising 36 statements. We employ sentence or paragraph embeddings to encode both the dialogue turns and the inventories; the embeddings are vectorial representations of text [19] that we then use to compute the similarity between turns and inventory items. This approach yields a 36-dimensional inferred working alliance score for each patient and therapist turn.

We use SentenceBERT to generate 384-dimensional embeddings of the dialogue turns and inventories, which we use to obtain a 36-dimensional working alliance score for each turn.
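The scoring step can be illustrated with a minimal sketch. It assumes that the similarity is cosine similarity (the text only says "similarity") and uses toy 3-dimensional vectors in place of the 384-dimensional SentenceBERT embeddings; the function names `cosine` and `alliance_scores` are hypothetical and not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def alliance_scores(turn_embedding, inventory_embeddings):
    """Return one similarity score per inventory item (36 items for the WAI)."""
    return [cosine(turn_embedding, item) for item in inventory_embeddings]

# Toy stand-ins: three "inventory items" and one "dialogue turn".
# Real inputs would be 384-dimensional sentence embeddings (assumption: any
# sentence encoder producing fixed-length vectors could be substituted here).
toy_inventory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
toy_turn = [2.0, 0.0, 0.0]
toy_scores = alliance_scores(toy_turn, toy_inventory)
```

With 36 inventory embeddings, the same call would return the 36-dimensional score vector for a turn. Patient turns would be compared against the patient version of the inventory and therapist turns against the therapist version.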

The 36 items of the WAI are used to derive three alliance scales: the task scale, the bond scale, and the goal scale. These scales capture the collaborative nature of the patient-therapist relationship, the affective bond between therapist and patient, and the agreement on treatment-related tasks and long-term goals [23]. Each scale score is computed using a weighting matrix that assigns weights to the questionnaire responses based on a key table, resulting in a comprehensive assessment of the working alliance.
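As an illustration of how a weighting matrix turns the 36 item scores into three scale scores, consider the following sketch. The key used here is entirely hypothetical (items assigned to scales in rotation, with an arbitrary subset marked as reverse-scored); the actual WAI key table is not reproduced in this text, and the names `build_weight_matrix` and `scale_scores` are illustrative.

```python
SCALES = ("task", "bond", "goal")

def build_weight_matrix(key, n_items=36):
    """Build one weight row per scale from a key mapping item -> (scale, sign)."""
    weights = {scale: [0.0] * n_items for scale in SCALES}
    for item, (scale, sign) in key.items():
        weights[scale][item] = float(sign)  # +1 for regular, -1 for reverse-scored
    return weights

def scale_scores(item_scores, weights):
    """Weighted sum of the item scores for each scale."""
    return {
        scale: sum(w * s for w, s in zip(row, item_scores))
        for scale, row in weights.items()
    }

# HYPOTHETICAL key, for illustration only: items rotate through the three
# scales, and every item with index 5, 11, 17, ... is treated as reverse-scored.
hypothetical_key = {
    i: (SCALES[i % 3], -1 if i % 6 == 5 else 1) for i in range(36)
}
weights = build_weight_matrix(hypothetical_key)
scores = scale_scores([1.0] * 36, weights)
```

In practice, the 36 item scores would come from the turn-level inferred alliance vector, so each dialogue turn yields its own task, bond, and goal values.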

we concatenate the 36-dimensional working alliance scores estimated from the current turn, as described above, with the unbiased sentence embedding of the turn.

To this end, we implemented a Transformer-based neural architecture [24], and a Long Short-Term Memory Network (LSTM) model [25].

By applying the WAT or the WA-LSTM to the psychotherapy transcripts, we can classify the clinical condition of the sequence based on the working alliance scores and the content of the dialogue turns. This classification model can be applied to the entirety of a session or a segment of the session;

To analyze the temporal dynamics of topics, we compute topic scores at the turn level. We utilize the Embedded Topic Model (ETM) for this analysis, as it models each word with a categorical distribution based on the inner product between a word embedding and the embedding of its assigned topic [27]. We use the same Word2Vec word embeddings to embed both the topics and the dialogue turns, and then compute the cosine similarity between the embedded topic vector and the embedded turn vector. By applying this method, which we term Temporal Topic Modeling (TTM), we obtain turn-resolution topic scores that capture the temporal dynamics of the topics discussed during the therapy session. These turn-level topic scores allow us to track the changes in topic relevance over time, providing insights into the progression of the therapy, the emergence of specific topics, and shifts in the focus of the conversation.
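A minimal sketch of turn-level topic scoring follows. It assumes that a turn is embedded by averaging the word vectors of its known tokens (the pooling choice is not specified in the text) and uses a toy 2-dimensional vocabulary in place of trained Word2Vec embeddings; `embed_turn` and `topic_trajectory` are hypothetical names.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def embed_turn(tokens, word_vectors):
    """Average the vectors of known tokens (assumed pooling); None if no token is known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]

def topic_trajectory(turns, topic_vectors, word_vectors):
    """Return one row of topic scores per turn, in session order."""
    rows = []
    for tokens in turns:
        turn_vec = embed_turn(tokens, word_vectors)
        if turn_vec is None:
            rows.append([0.0] * len(topic_vectors))  # no known words: neutral score
        else:
            rows.append([cosine(turn_vec, topic) for topic in topic_vectors])
    return rows

# Toy vocabulary and two toy topic vectors living in the same embedding space.
toy_word_vectors = {"sleep": [1.0, 0.0], "tired": [1.0, 0.0], "family": [0.0, 1.0]}
toy_topics = [[1.0, 0.0], [0.0, 1.0]]
toy_turns = [["sleep", "tired"], ["family"], ["unknown"]]
trajectory = topic_trajectory(toy_turns, toy_topics, toy_word_vectors)
```

Reading a single column of `trajectory` from top to bottom gives the time course of one topic across the session.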

We begin by introducing the dataset used in our study. The Alexander Street Counseling and Psychotherapy Transcripts dataset [16] consists of transcribed recordings of over 950 therapy sessions between multiple anonymized therapists and patients. This comprehensive collection includes speech-translated transcripts of the recordings from real therapy sessions, 40,000 pages of client narratives, and 25,000 pages of reference works. The sessions cover four types of psychiatric conditions: anxiety, depression, schizophrenia, and suicidality. Each dialogue pair consists of a patient response turn $S^p_i$ followed by a therapist response turn $S^t_i$. In total, the dataset contains over 200,000 turns from both patients and therapists, providing a rich source for analyzing the therapeutic process in psychotherapy.

Comparing the estimates, we observe that therapists tend to overestimate the working alliance overall. Specifically, therapists tend to overestimate the task and bond scales, but underestimate the goal scale.

We observe that the trajectories of individuals with suicidality are more spread out in the bond and task scales, indicating significant discrepancy. This analysis provides a preliminary understanding of the temporal dynamics of the working alliance in different conditions, which can help therapists gain insights into the therapeutic process and guide further analysis.

The interpretations reveal the dominant themes in the dialogue for different topics and provide insights into patients’ emotional states, personal experiences, and self-reflection. In the context of performing topic modeling on the text corpus of the entire psychotherapy dataset, the goal is to identify the top 10 topics and extract more distinctive features for downstream tasks. We perform the topic modeling on the entire text corpus to maintain the coherence within the patient-therapist dialogue, but are interested in the strategies and themes of the therapists in their contributions within the contexts of these learned topics. To achieve this goal, we ranked the therapist’s dialogue turns by their topic scores (the higher the score is, the more likely it was related to a particular topic), and then picked out the top 10 sentences for each topic as exemplar ones.
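The exemplar selection described above amounts to a per-topic ranking. The following sketch shows one way to do it, assuming the topic scores are stored as one row per therapist turn; the function name `top_exemplars` and the toy data are hypothetical.

```python
def top_exemplars(turns, topic_scores, topic_index, k=10):
    """Return the k turns with the highest score for the given topic."""
    ranked = sorted(
        range(len(turns)),
        key=lambda i: topic_scores[i][topic_index],
        reverse=True,  # highest score first
    )
    return [turns[i] for i in ranked[:k]]

# Toy example: three therapist turns scored against two topics.
toy_turns = ["turn a", "turn b", "turn c"]
toy_topic_scores = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
exemplars_topic0 = top_exemplars(toy_turns, toy_topic_scores, topic_index=0, k=2)
exemplars_topic1 = top_exemplars(toy_turns, toy_topic_scores, topic_index=1, k=2)
```

Calling the function once per topic with `k=10` would produce the exemplar sentences for each of the 10 topics.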

we resorted to a generative Large Language Model (LLM), ChatGPT based on GPT-3.5, and prompted it for summaries of the discussed topics as follows:

“I have the following top sentences exemplifying ten topics. Can you summarize what the three interventions items attributed to each topic spaces the therapists are talking about, respectively? For instance, what therapeutic intervention the therapist is applying.”

result of this analysis is presented in Table 2,

5 Conclusions

In this study, we have introduced an approach that combines state-of-the-art language modeling with therapy-evaluation inventories to provide a detailed representation of the interaction between patients and therapists. Our method offers granular insights for post-session interpretations and has the potential to assist in diagnosing patients based on linguistic features. While our focus has been on the Working Alliance Inventory, our approach is generic and can be extended to other assessment instruments in the field of psychotherapy.

Additionally, we have made contributions in the area of deep learning-based topic modeling to further enhance our analysis. Our first objective was to compare various neural topic modeling methods in learning the topical propensities of different psychiatric conditions. We found that different coherence measures yield different rankings of the topic models, but there are a few models, such as Wasserstein Topic Models and Embedded Topic Models, that perform well in terms of coherence and diversity.

Furthermore, we have incorporated temporal modeling into topic modeling to parse topics in different segments of the therapy sessions. This temporal analysis adds another layer of interpretability and enables us to observe session trajectories and their separability between patients and therapists. We have noted that in anxiety and depression sessions, the trajectories of patients and therapists tend to be more separable, whereas in schizophrenia sessions, they are more entangled. This initial step toward turn-level resolution temporal analysis in topic modeling provides valuable insights that can help therapists improve the effectiveness of psychotherapy.

Note: these topics are generated by ChatGPT3.5 in answer to the question: “I have the following top sentences exemplifying ten topics. Can you summarize what the three interventions items attributed to each topic spaces the therapists are talking about, respectively? For instance, what therapeutic intervention the therapist is applying.”

To further extract more distinctive features from the 10 topics for downstream tasks, a principal component analysis is performed on the topic space. This analysis enables the identification of three principal topic spaces that encompass the patient turns and the corresponding therapeutic interventions taken by the therapists.

To expand interpretability possibilities, and diminish the effect of our biases, we resorted to a generative Large Language Model (LLM), ChatGPT based on GPT-3.5, and prompted it for summaries of the principal topics as follows:

“I have the following top sentences exemplifying three principal topic spaces. Can you summarize what the three topics the patients are talking about, respectively?”, and “Again, I have the following top sentences exemplifying the three principal topic spaces. Can you summarize what the three intervention items attributed to each principal topic spaces the therapists are talking about, respectively? For instance, what therapeutic intervention is the therapist applying.”

Insights for Clinicians. To explore the informative value of topics for therapeutic insights, we combine topic modeling with the inferred working alliance (Figure 5). By filtering the therapist turns with high topic scores, we plot the average working alliance scores for corresponding patient turns. We observe distinctions among the effects on patients’ working alliance across different topics and clinical conditions. For example, discussing tiredness and decision-making positively influences the bond and task scales in schizophrenia patients but has less impact on other patients. Additionally, discussing sickness, self-injuries, and coping mechanisms positively affects the task scale in depression patients and the goal scale in suicidal patients.
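The filtering-and-averaging step can be sketched as follows. Two assumptions are made here that the text does not specify: the "corresponding" patient turn is taken to be the patient turn in the same dialogue pair, and "high" topic scores are defined by a fixed threshold. The record layout and the name `mean_alliance_for_topic` are hypothetical.

```python
def mean_alliance_for_topic(pairs, topic_index, threshold):
    """Average patient alliance scales over pairs whose therapist turn scores
    at or above the threshold on the given topic; None if no pair qualifies."""
    selected = [
        pair["patient_alliance"]
        for pair in pairs
        if pair["therapist_topic_scores"][topic_index] >= threshold
    ]
    if not selected:
        return None
    return {
        scale: sum(s[scale] for s in selected) / len(selected)
        for scale in selected[0]
    }

# Toy dialogue pairs with made-up values, for illustration only.
toy_pairs = [
    {"therapist_topic_scores": [0.9], "patient_alliance": {"task": 1.0, "bond": 2.0}},
    {"therapist_topic_scores": [0.2], "patient_alliance": {"task": 5.0, "bond": 0.0}},
    {"therapist_topic_scores": [0.8], "patient_alliance": {"task": 3.0, "bond": 4.0}},
]
toy_means = mean_alliance_for_topic(toy_pairs, topic_index=0, threshold=0.5)
```

Repeating this per topic and per clinical condition would give the kind of comparison described above, under the stated assumptions.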

When clinicians discuss principal component topic 1, “Emotional States and Mental Health”, the TASK and BOND scales increase for depression patients but decrease for suicidal patients.

We observe systematic differences in the mean inferred alliance scores between patients and therapists, as well as variations across different psychiatric disorders. However, the analysis of the in-session evolution of the working alliance scores reveals more interesting dynamics.

In particular, we find that while all conditions show a systematic misalignment of scores between patients and therapists, this misalignment is significantly more pronounced for suicidality. This observation is evident in both the mean scores and the temporal trace of the full and sub-scales. In contrast, anxiety and depression display a clear trend for convergence in the full and bond scales as the therapy sessions progress, which is not observed in the task and goal scales, nor in schizophrenia or suicidality.

analysis of past therapy sessions, as well as real-time sessions, has the potential to help trained therapists identify key segments of therapy leading to breakthroughs

By mapping the topic scores of dialogue turns to the working alliance scores, we can identify topics and dialogue segments that are potentially indicative of therapeutic breakthroughs. For instance, in depression sessions, the topic related to self-esteem may be associated with improvements in the bond and task scales of the working alliance. Similarly, in schizophrenia sessions, the topic related to family dynamics may contribute to positive changes in the bond scale.