Psychotherapy AI Companion with Reinforcement Learning Recommendations and Interpretable Policy Dynamics

Paper · arXiv 2303.09601 · Published March 16, 2023
Psychology Therapy Practice · Reinforcement Learning · Recommenders · LLMs

We introduce a Reinforcement Learning Psychotherapy AI Companion that generates topic recommendations for therapists based on patient responses. The system uses Deep Reinforcement Learning (DRL) to generate multi-objective policies for four different psychiatric conditions: anxiety, depression, schizophrenia, and suicidal cases. We present our experimental results on the accuracy of recommended topics using three different scales of working alliance ratings: task, bond, and goal. We show that the system is able to capture the real data (historical topics discussed by the therapists) relatively well, and that the best performing models vary by disorder and rating scale.

To address this gap, we propose a virtual psychotherapy AI companion that provides real-time feedback and recommends treatment strategies to therapists while they are conducting psychotherapy. Like a supervisor, our AI companion offers case-dependent feedback and guidance, having learned from thousands of historical therapy sessions and case studies.

The base of our recommendation system relies on a rating system that evaluates the effectiveness of a treatment strategy. As characterizing a patient’s mental state can be complicated, we focus our approach on well-defined clinical outcomes. One such outcome is the working alliance, a psychological concept that is highly predictive of the success of psychotherapy in a clinical setting [46]. It describes important cognitive and emotional components of the relationship between the therapist and patient, including the agreement on goals and tasks and the establishment of a bond, trust, and respect over the course of the dialogue [4]. In our previous work [26], we developed a natural language processing (NLP) approach to infer this quantity as real-time ratings of the therapist’s treatment progress within the patient’s entire program.
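As a rough illustration of how such turn-level ratings might be inferred, the sketch below scores a dialogue turn against inventory statements for the three working alliance scales by cosine similarity in a shared embedding space. The `embed` function is a toy hashed bag-of-words stand-in for a sentence encoder, and the inventory items shown are hypothetical paraphrases, not the actual instrument used in the paper.

```python
import math

def embed(text: str, dim: int = 64) -> list[float]:
    # Toy bag-of-words embedding; a real system would use a sentence encoder.
    v = [0.0] * dim
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# Hypothetical inventory items grouped by the three scales used in the paper.
WAI_ITEMS = {
    "task": ["we agree on what to work on in therapy"],
    "bond": ["my therapist and i trust and respect each other"],
    "goal": ["we share the same goals for my treatment"],
}

def alliance_ratings(turn: str) -> dict[str, float]:
    # Score one turn against each scale's items (mean cosine similarity).
    e = embed(turn)
    return {scale: sum(cosine(e, embed(it)) for it in items) / len(items)
            for scale, items in WAI_ITEMS.items()}

ratings = alliance_ratings("i trust my therapist and we agree on my goals")
```

The key property is that each turn yields one score per scale (task, bond, goal), which the recommendation policy can consume as a real-time signal.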

We also proposed the Reinforced Recommendation model for Dialogue topics in psychiatric Disorders (R2D2) [24, 27], the first recommendation system of dialogue topics proposed for the psychotherapy setting. R2D2 transcribes the session in real time, predicts the therapeutic outcome as a turn-level rating, and recommends the treatment strategy best suited to the current context and state of the psychotherapy. This framework is a critical step towards addressing the global issue of mental health by augmenting the treatment and education of clinical practitioners with a recommendation system of therapeutic strategies.

Given a large text corpus of many psychotherapy sessions, we can first perform topic modeling to extract the main concepts discussed in the psychotherapy [23], which can also be directly visualized for interpretable insights [31]. We use the Embedded Topic Model (ETM) [5], which was shown to create the most diverse concepts on a psychological corpus in a systematic analysis [23]. In this study, we annotate each turn with its most likely topic and identify seven unique topics: Topic 0 is about figuring out, self-discovery, and reminiscence; Topic 1 is about play; Topic 2 is about anger, fear, and sadness; Topic 3 is about counts; Topic 6 is about explicit ways to deal with stress, such as keeping busy and reaching out for help; Topic 7 is about numbers; and Topic 8 is about continuation.
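A minimal sketch of the annotation step follows. A real pipeline would label each turn with the fitted ETM's most probable topic; here an illustrative keyword score stands in for the model's per-turn topic proportions, with made-up keyword lists for a few of the topic IDs above.

```python
# Illustrative keyword lists (not the actual ETM topics) for a few topic IDs.
TOPIC_KEYWORDS = {
    0: {"remember", "figuring", "discover"},   # self-discovery, reminiscence
    2: {"angry", "scared", "sad"},             # anger, fear, sadness
    6: {"busy", "help", "cope"},               # coping with stress
}

def most_likely_topic(turn: str) -> int:
    # Stand-in for argmax over the topic model's per-turn topic proportions.
    scores = {t: sum(tok in kws for tok in turn.lower().split())
              for t, kws in TOPIC_KEYWORDS.items()}
    return max(scores, key=scores.get)

label = most_likely_topic("i felt angry and sad all week")  # -> topic 2
```

Once every turn carries a topic label, the sequence of labels becomes the action/ground-truth trace the reinforcement learning policy is trained against.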

We define the “items”, “users”, “contents” and “ratings” in our recommendation system. In our case, the “items” the system recommends are treatment strategies, which we represent as a topic that the therapist should initiate or continue for the next turn.

We pair these “items” with the “users” and “contents”, which, in our case, would be the patientID, their previous turns, their aggregated formats, and other metadata.
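One way to lay out this quadruple of "items", "users", "contents", and "ratings" is as a per-turn record; the field names below are illustrative, not taken from the paper's code.

```python
from dataclasses import dataclass

@dataclass
class TurnRecord:
    patient_id: str          # the "user"
    history: list[str]       # the "contents": previous turns and aggregates
    recommended_topic: int   # the "item": topic to initiate or continue next
    alliance_rating: float   # the "rating": turn-level working alliance score

rec = TurnRecord(
    patient_id="p001",
    history=["i have been anxious about work"],
    recommended_topic=6,
    alliance_rating=0.42,
)
```

Framed this way, the psychotherapy setting maps directly onto a standard recommendation-system schema, which is what lets the three levels described next reuse familiar machinery.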

As shown in Figure 2, our recommendation systems can be extended into three levels. The first level, our backbone, is reinforcement learning-based, which accounts for the stateful nature of dialogue data. The flexibility of the reward signal, i.e., using any rewards, pseudo-rewards, multiple rewards, hybrid rewards, or even inferred rewards, makes our policies adaptable to a versatile suite of clinical settings.
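The flexible-reward idea can be sketched with tabular Q-learning over topic actions, where the reward is a pluggable callable, here a hypothetical weighted sum of the three alliance scales. This is a simplified stand-in for the deep RL models in the paper, not their actual implementation.

```python
N_TOPICS = 7          # actions are the dialogue topics from the topic model
ALPHA, GAMMA = 0.1, 0.9

def q_update(Q: dict, state, action: int, reward: float, next_state) -> None:
    # Standard one-step Q-learning update on a dict-backed table.
    best_next = max(Q.get((next_state, a), 0.0) for a in range(N_TOPICS))
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

def hybrid_reward(task: float, bond: float, goal: float,
                  w=(1 / 3, 1 / 3, 1 / 3)) -> float:
    # Multi-objective reward: any single scale, or a weighted combination,
    # can be swapped in without touching the update rule.
    return w[0] * task + w[1] * bond + w[2] * goal

Q: dict = {}
q_update(Q, state="s0", action=2,
         reward=hybrid_reward(0.6, 0.3, 0.9), next_state="s1")
# Q[("s0", 2)] is now 0.1 * 0.6 = 0.06
```

Because the update rule never inspects where the reward came from, pseudo-rewards or inferred rewards slot in the same way, which is the adaptability claimed above.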

The second level is to use additional context, as in content-based recommendation systems. This involves treating the patient turns preceding the current one, or all previous turns so far, as a feature in our deep reinforcement learning models, by concatenating their sentence embeddings to our states. This provides more context for in-context learning of our generalized models, which could be a foundation model in future work.
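A sketch of this state construction, assuming a mean-pooled history as the aggregate (one plausible choice among several) and a toy hashed bag-of-words `embed` in place of a real sentence encoder:

```python
DIM = 8

def embed(text: str, dim: int = DIM) -> list[float]:
    # Toy stand-in for a sentence encoder.
    v = [0.0] * dim
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v

def mean_pool(vectors: list[list[float]], dim: int = DIM) -> list[float]:
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def build_state(current_turn: str, history: list[str]) -> list[float]:
    # Concatenate the current turn's embedding with the pooled history,
    # doubling the state dimensionality.
    context = mean_pool([embed(t) for t in history])
    return embed(current_turn) + context

state = build_state("i feel better today",
                    ["i was anxious", "work was hard"])  # len(state) == 16
```

The RL policy then conditions on this enlarged state rather than on the current turn alone, which is what makes the recommendation content-aware.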

In the third level, if we are given the patient ID and doctor ID, we can create personalized policies with collaborative filtering type recommendation systems, which can potentially improve the compositionality and generalizability of our models for a wide range of populations.
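One simple way to realize this level, sketched below under the assumption that patient and doctor IDs map to learned embedding vectors (randomly initialized here for illustration), is to append both ID embeddings to the state so that the policy can specialize per patient-therapist dyad:

```python
import random

random.seed(0)
DIM = 4

# Illustrative ID-embedding tables; in practice these would be learned
# jointly with the policy, collaborative-filtering style.
patient_emb = {"p001": [random.gauss(0, 0.1) for _ in range(DIM)]}
doctor_emb = {"d042": [random.gauss(0, 0.1) for _ in range(DIM)]}

def personalize(state: list[float], patient_id: str,
                doctor_id: str) -> list[float]:
    # Append both ID embeddings to the content-based state.
    return state + patient_emb[patient_id] + doctor_emb[doctor_id]

s = personalize([0.2, 0.5], "p001", "d042")  # len(s) == 2 + 2 * DIM
```

Because patients (or therapists) with similar embeddings receive similar policies, information can transfer across individuals, which is the collaborative-filtering intuition behind the claimed generalizability.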

For certain disorders, the goal and task scales appeared to best capture the human therapists' choices, while others favored the models trained with bond scores.

Moving forward, we plan to extend our work in several directions. First, we aim to explore the use of more advanced reinforcement learning algorithms, such as actor-critic methods or proximal policy optimization, and compare their performance to the methods used in this study. We also plan to investigate the use of more sophisticated embeddings, such as contextual embeddings or knowledge graph embeddings, to further improve the quality of the recommendations. In addition, we will explore the use of user feedback to refine the recommendations and personalize them for individual patients.