Working Alliance Transformer for Psychotherapy Dialogue Classification

Paper · arXiv 2210.15603 · Published October 27, 2022

As a predictive measure of the treatment outcome in psychotherapy, the working alliance measures the agreement of the patient and the therapist in terms of their bond, task and goal. Although it has long been a clinical quantity estimated from the patients’ and therapists’ self-evaluative reports, we believe that the working alliance can be better characterized by applying natural language processing techniques directly to the dialogue transcribed in each therapy session. In this work, we propose the Working Alliance Transformer (WAT), a Transformer-based classification model with a psychological state encoder that infers the working alliance scores by projecting the embeddings of the dialogue turns onto the embedding space of the clinical inventory for working alliance.
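To make the projection step concrete, the following is a minimal sketch in which each dialogue turn is scored against every inventory item by the similarity of their embeddings. The use of cosine similarity, the function names `cosine` and `alliance_scores`, and the toy 3-dimensional vectors are all assumptions made for illustration; the text above does not specify the similarity measure or the embedding model.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def alliance_scores(turn_embedding, item_embeddings):
    """Return one score per inventory item for a single dialogue turn.

    Assumption: the score is the cosine similarity between the turn embedding
    and each inventory item embedding (hypothetical choice of measure).
    """
    return [cosine(turn_embedding, item) for item in item_embeddings]

# Toy embeddings standing in for real sentence embeddings (illustrative only).
turn = [1.0, 0.0, 0.0]
items = [
    [1.0, 0.0, 0.0],  # identical direction to the turn
    [0.0, 1.0, 0.0],  # orthogonal to the turn
    [1.0, 1.0, 0.0],  # partially aligned with the turn
]
scores = alliance_scores(turn, items)
```

With the full inventory, `item_embeddings` would hold 36 vectors, so each turn would yield a 36-dimensional score vector.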

The alliance entails a number of cognitive and emotional aspects of the interaction between the patient and the therapist, such as their shared understanding of the objectives to be attained and the tasks to be completed, as well as the bond, trust, and respect that develop during the course of the therapy. While traditional methods of quantifying the alliance depend on self-evaluative reports in which patients and therapists rate whole sessions on point scales [3], the digital era of mental health can enable new research fronts that use real-time transcripts of the dialogues between patients and therapists. By analyzing psychotherapy dialogues, we study whether natural language processing techniques can extract turn-level features of the working alliance, and whether these features can better inform us of the clinical condition of the patient.

Patients’ turns are usually more narrative, as they are describing themselves, while the therapists’ turns are usually more declarative, as they are usually confirming what the patients say or leading the conversation to a certain topic.

Once the inferred working alliance scores are computed, the resulting feature vector carries information about the predicted clinical outcome and is biased towards what clinicians would care about in psychotherapy, given the metrics provided by the working alliance inventory. We would then be able to use this information to potentially inform us of the psychiatric condition of a given patient. As such, we propose the Working Alliance Transformer (WAT), a classification model with an inference module that informs the downstream classifier where the current state lies with respect to the therapeutic trajectory or landscape in the psychotherapy treatment of this patient. Is this patient approaching a breakthrough? Or is he or she susceptible to a rupture of trust?

The analytical features enabled by the working alliance inference are useful not only for the classification we investigate in this study but also for other downstream tasks, such as predictive modeling and real-time analytics. In our case, the turns in a dialogue or monologue are fed into the sentence embedding model sequentially as individual entries. We then feed each sentence embedding into the psychological state encoder, which infers the psychological or therapeutic state of the dialogue at this turn. The encoder generates a vector that characterizes the state, such as the 36-dimensional working alliance scores, corresponding to the 36 working alliance inventory items. The model then aggregates the sentence embedding feature vector and the psychological state vector. Since we feed our input sentence by sentence (or turn by turn), we obtain a sequence of combined feature vectors, which is then fed into a sequence classifier. We use the transformer [11] as our classifier for its effectiveness in various sequence-based learning tasks and for the potential interpretability of its attention weights. The output of this classification model is the predicted clinical condition of this sequence. The sequence of turns we feed to generate a label is typically either the entirety or a segment of a psychotherapy session transcript.
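The feature construction described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: aggregation is assumed to be simple concatenation, `toy_state_encoder` is a hypothetical placeholder for the psychological state encoder, and the 4-dimensional turn embeddings are toy values. The transformer classifier that consumes the resulting sequence is left out.

```python
def combine_turn_features(sentence_embedding, state_vector):
    """Aggregate the two per-turn vectors.

    Assumption: aggregation is plain concatenation (other choices are possible).
    """
    return list(sentence_embedding) + list(state_vector)

def build_sequence(turn_embeddings, state_encoder):
    """Map a session (one embedding per turn) to a sequence of combined vectors."""
    return [
        combine_turn_features(embedding, state_encoder(embedding))
        for embedding in turn_embeddings
    ]

def toy_state_encoder(embedding, n_items=36):
    """Hypothetical stand-in for the psychological state encoder.

    It returns a 36-dimensional vector (one value per inventory item) filled
    with the mean of the embedding, purely so the sketch runs end to end.
    """
    mean = sum(embedding) / len(embedding)
    return [mean] * n_items

# Three turns with toy 4-dimensional sentence embeddings (illustrative only).
session = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.5, 0.5, 0.5],
    [0.0, 1.0, 0.0, 1.0],
]
sequence = build_sequence(session, toy_state_encoder)
# Each element has 4 + 36 = 40 features; the sequence has one element per turn
# and would be the input to the sequence classifier.
```

Keeping the state encoder as a function argument means any encoder that returns a fixed-length vector per turn can be substituted without changing the sequence construction.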