SupervisorBot: NLP-Annotated Real-Time Recommendations of Psychotherapy Treatment Strategies with Deep Reinforcement Learning

Paper · arXiv 2208.13077 · Published August 27, 2022

The basis of our recommendation system is a rating system that evaluates how good a treatment strategy is.

Here we propose the Reinforced Recommendation model for Dialogue topics in psychiatric Disorders (R2D2), the first recommendation system of dialogue topics proposed for the psychotherapy setting. It transcribes the session in real time, predicts the therapeutic outcome as a turn-level rating, and recommends the treatment strategy best suited to the current context and state of the psychotherapy.

In this system, we use the Working Alliance Inventory (WAI), a self-report questionnaire that quantifies the therapeutic bond, task agreement, and goal agreement [11, 12, 13]. Operationally, we derive three alliance scales from its 36 items: the task scale, the bond scale, and the goal scale. These scales measure three major themes of psychotherapy outcomes: (1) the collaborative nature of the dialogue participants’ relationship; (2) the affective bond between them; and (3) their ability to agree on treatment-related short-term tasks and long-term goals. The score for each scale is computed from a key table that specifies the sign weight (positive or negative) applied to each questionnaire answer before the answers are summed.
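The key-table scoring described above can be sketched as follows. This is a minimal illustration, not the official WAI key: the `KEY_TABLE` below is hypothetical (it simply assigns items to scales round-robin and marks three items as negatively weighted) and only mimics the shape of a real key, in which each of the 36 items belongs to one scale and carries a sign weight. The function name `wai_scale_scores` is likewise our own.

```python
from typing import Dict, List, Tuple

SCALES = ("task", "bond", "goal")

# Hypothetical key table: item index -> (scale, sign weight).
# NOT the real WAI key; it only reproduces the structure described in the text.
KEY_TABLE: Dict[int, Tuple[str, int]] = {
    i: (SCALES[i % 3], -1 if i in {3, 10, 17} else 1) for i in range(36)
}


def wai_scale_scores(
    answers: List[float], key: Dict[int, Tuple[str, int]] = KEY_TABLE
) -> Dict[str, float]:
    """Sum sign-weighted answers into task, bond and goal scale scores."""
    if len(answers) != len(key):
        raise ValueError(f"expected {len(key)} answers, got {len(answers)}")
    scores = {scale: 0.0 for scale in SCALES}
    for item, answer in enumerate(answers):
        scale, sign = key[item]
        # Each answer contributes to exactly one scale, with its sign weight.
        scores[scale] += sign * answer
    return scores
```

With all 36 answers set to 1.0, each scale under this toy key has 11 positive items and 1 negative item, so every scale scores 10.0. The same function applies unchanged if the "answers" are turn-level similarity scores rather than questionnaire responses.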

Following the approach proposed in [7, 15, 16, 17], we embed both the dialogue turns and the WAI items with deep sentence or paragraph embeddings (in this case, Doc2Vec [18]) and then compute the cosine similarity between the embedding vector of each turn and the embedding vectors of the 36 inventory items. As a result, for each turn (whether by the patient or the therapist), we obtain a 36-dimensional working alliance score, which can be stored in a relational database as in [19].
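A rough sketch of the turn-level scoring step, assuming the turn and item embeddings are already available as plain float vectors (for instance, from gensim's `Doc2Vec.infer_vector`, although any embedding model would fit). The functions `turn_alliance_vector` and `save_turn_scores` and the `turn_wai` table schema are hypothetical names chosen for illustration, and SQLite stands in for whatever relational database is used in [19].

```python
import math
import sqlite3
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is treated as unrelated
    return dot / (norm_u * norm_v)


def turn_alliance_vector(
    turn_vec: Sequence[float], item_vecs: Sequence[Sequence[float]]
) -> List[float]:
    """One similarity per inventory item (36 values for the full WAI)."""
    return [cosine_similarity(turn_vec, item_vec) for item_vec in item_vecs]


def save_turn_scores(
    conn: sqlite3.Connection,
    session_id: str,
    turn_id: int,
    speaker: str,
    scores: Sequence[float],
) -> None:
    """Store one row per (turn, item) in a hypothetical turn_wai table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS turn_wai ("
        "session_id TEXT, turn_id INTEGER, speaker TEXT, item INTEGER, score REAL)"
    )
    conn.executemany(
        "INSERT INTO turn_wai VALUES (?, ?, ?, ?, ?)",
        [(session_id, turn_id, speaker, item, s) for item, s in enumerate(scores)],
    )
    conn.commit()
```

Storing one row per item keeps the schema independent of the inventory size. A wide table with 36 score columns would work equally well if queries mostly read whole vectors.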

Topic modeling as recommendation items. First, we define the “items”, “users”, “contents” and “ratings” in our recommendation system. Here, the “items” the system recommends are treatment strategies. In this example, we represent each strategy as a topic that the therapist should initiate or continue in the next turn. Given a large text corpus of psychotherapy sessions, we can first perform topic modeling, as in [20], to extract the main concepts discussed in psychotherapy. We use the Embedded Topic Model (ETM) [21] in this work because it was shown to produce the most diverse concepts on a psychological corpus [20]. In this study, we annotate each turn with its most likely topic and identify seven unique topics: Topic 0 is about figuring out, self-discovery and reminiscence; Topic 1 is about play; Topic 2 is about anger, scare and sadness; Topic 3 is about counts; Topic 6 is about explicit ways to deal with stress, such as keeping busy and reaching out for help; Topic 7 is about numbers; and Topic 8 is about continuation and keeping going.
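The turn annotation step amounts to an argmax over each turn's topic proportions. The sketch below assumes that each turn is treated as a document and that the topic model has already produced a per-turn mapping from topic ID to probability. ETM outputs such document-topic proportions, but the `annotate_turns` helper and the short labels in `TOPIC_LABELS` are our own illustrative choices, keyed by the topic IDs reported in the text.

```python
from typing import Dict, List

# Short labels for the seven topics reported in the text (IDs as given there).
TOPIC_LABELS: Dict[int, str] = {
    0: "self-discovery and reminiscence",
    1: "play",
    2: "anger, scare and sadness",
    3: "counts",
    6: "ways to deal with stress",
    7: "numbers",
    8: "continuation",
}


def annotate_turns(turn_topic_probs: List[Dict[int, float]]) -> List[int]:
    """Label each turn with the topic ID that has the highest probability."""
    labels = []
    for probs in turn_topic_probs:
        if not probs:
            raise ValueError("each turn needs at least one topic probability")
        # max over keys, ranked by probability; ties go to the first key seen.
        labels.append(max(probs, key=probs.get))
    return labels
```

For example, a turn with proportions `{0: 0.1, 2: 0.7, 6: 0.2}` is annotated as Topic 2 ("anger, scare and sadness"). These per-turn labels are the "items" the recommender later chooses among.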

The reinforcement learning environment is formulated so that the recommendation agent takes an action by recommending a strategy (for example, a discussion topic). The therapist then interacts with the patient, taking that suggestion into account. The resulting dialogue interaction receives a quality evaluation (for example, the therapeutic working alliance score), which serves as the reward the recommendation agent uses to update its weights. Meanwhile, the state progresses to the next therapeutic state.
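The interaction loop described above (action, therapist response, reward, next state) can be illustrated with a deliberately simplified agent. Tabular Q-learning with epsilon-greedy exploration stands in here for the deep reinforcement learning agent. The `env_step` callable is hypothetical: it represents the therapist acting on the recommendation and the resulting turn being rated (for instance, by a working alliance score). Its return value is a `(reward, next_state)` pair. The class and function names are illustrative only.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
Action = int  # a topic ID to recommend


class QLearningRecommender:
    """Tabular Q-learning stand-in for the deep RL recommendation agent."""

    def __init__(self, actions: List[Action], alpha: float = 0.1,
                 gamma: float = 0.9, epsilon: float = 0.1, seed: int = 0) -> None:
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q: Dict[Tuple[State, Action], float] = defaultdict(float)
        self.rng = random.Random(seed)

    def recommend(self, state: State) -> Action:
        # Epsilon-greedy: usually pick the best-known topic, sometimes explore.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state: State, action: Action, reward: float,
               next_state: State) -> None:
        # One-step Q-learning target; no terminal-state handling in this sketch.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def run_episode(agent: QLearningRecommender,
                env_step: Callable[[State, Action], Tuple[float, State]],
                start_state: State, n_turns: int) -> float:
    """Run one session: recommend, observe the rated turn, learn, advance."""
    state, total = start_state, 0.0
    for _ in range(n_turns):
        action = agent.recommend(state)               # agent suggests a topic
        reward, next_state = env_step(state, action)  # therapist acts; turn is rated
        agent.update(state, action, reward, next_state)
        state, total = next_state, total + reward
    return total
```

A deep agent would replace the `q` table with a neural network over a learned state representation. The outer loop, however, keeps the same shape.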