Towards Understanding Counseling Conversations: Domain Knowledge and Large Language Models

Paper · arXiv 2402.14200 · Published February 22, 2024
Psychology Therapy Practice · Domain Specialization · Conversation Agents

This paper proposes a systematic approach to examining how well domain knowledge and large language models (LLMs) can represent conversations between a crisis counselor and a help seeker. We empirically show that state-of-the-art language models, such as Transformer-based models and GPT models, fail to predict the conversation outcome. To provide richer context to conversations, we incorporate human-annotated domain knowledge and LLM-generated features; a simple integration of domain knowledge and LLM features improves model performance by approximately 15%. We argue that both domain knowledge and LLM-generated features can be exploited to better characterize counseling conversations when they are used as additional context.

We hypothesize that current state-of-the-art language models encode insufficient knowledge of the counseling domain in their parameters. Motivated by existing work that uses external knowledge for tasks such as question answering (Ma et al., 2022), commonsense reasoning (Schick et al., 2023), and language generation (Peng et al., 2023), this paper studies whether additional knowledge helps characterize counseling conversations. We suggest two ways of obtaining this additional knowledge: human annotation and large language model (LLM) prompting.
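As a rough illustration of the LLM-prompting route, the sketch below queries an OpenAI-style chat API to annotate a conversation with counseling-domain features. The prompt wording, feature names, and model choice are our own assumptions, not the paper's.

```python
# Hypothetical sketch: prompting an LLM to generate counseling-domain
# annotations for a conversation. Assumes the `openai` package (v1+) and
# an OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

FEATURE_PROMPT = (
    "Read the crisis-counseling conversation below. For each counselor "
    "utterance, label the strategy used (e.g., reflection, validation, "
    "suggestion), and summarize the help seeker's main concern.\n\n{dialog}"
)

def llm_features(dialog: str, model: str = "gpt-3.5-turbo") -> str:
    """Return LLM-generated annotations to attach as extra context."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": FEATURE_PROMPT.format(dialog=dialog)}],
        temperature=0.0,  # keep the labels as stable as possible across runs
    )
    return response.choices[0].message.content
```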

In this paper, we measure how well models understand counseling conversations by predicting conversation outcomes, i.e., whether the help seeker feels more positive after the conversation. We empirically show that Transformer-based classifiers, as well as state-of-the-art LLMs, perform sub-optimally despite their strength on many downstream tasks. The paper then describes how domain knowledge is obtained to further emphasize the counselor's strategic utterances and the help seeker's perspectives. We show that this additional knowledge helps pre-trained language models better fit the dataset and predict conversation outcomes: a simple integration of the knowledge, combined with feature ensembling, improves model performance by approximately 15%.
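One plausible realization of "knowledge as additional context", sketched below under our own assumptions (the paper's exact input format and encoder may differ), is to prepend the annotations to the conversation text before fine-tuning a Transformer classifier on the binary outcome.

```python
# A minimal sketch, assuming the knowledge (human-annotated or LLM-generated)
# is supplied as text prepended to the conversation; not necessarily the
# paper's exact architecture.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2  # "more positive" vs. not
)

def encode(conversation: str, knowledge: str):
    """Attach the knowledge as extra context ahead of the conversation."""
    text = knowledge + tokenizer.sep_token + conversation
    return tokenizer(text, truncation=True, max_length=512, return_tensors="pt")

batch = encode(
    "Seeker: I feel hopeless... Counselor: ...",
    "Strategies: validation, reflection. Need: emotional support.",
)
logits = model(**batch).logits  # outcome scores, to be trained on labeled sessions
```

Feature ensembling could similarly be approximated by concatenating annotation-derived feature vectors with the encoder's pooled representation before the classification head.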

In this paper, we therefore use an easier-to-interpret signal to define the level of understanding. We take the help seeker's answer to a post-conversation survey question, “Do you feel more positive after this conversation?”, as the output label for each conversation instance. We train the model on a classification task: predicting whether the help seeker has become more positive after the conversation session.
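Concretely, the labeling could look like the toy sketch below; the field names and answer values are hypothetical, since the survey format is not specified here.

```python
# Hypothetical label construction: map the post-conversation survey answer
# ("Do you feel more positive after this conversation?") to a binary label.
def outcome_label(survey_answer: str) -> int:
    """Return 1 if the help seeker reports feeling more positive, else 0."""
    return 1 if survey_answer.strip().lower() == "yes" else 0

# Toy records; real ones would hold full conversation transcripts.
records = [
    {"dialog": "Seeker: ... Counselor: ...", "survey": "Yes"},
    {"dialog": "Seeker: ... Counselor: ...", "survey": "No"},
]
labels = [outcome_label(r["survey"]) for r in records]  # -> [1, 0]
```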

Despite the simple classification pipeline, this is a challenging NLP task: it requires models to understand the context of a conversation session and to read between the lines to assess the help seeker's feelings throughout the conversation. The help seeker's perspective on the counseling session can be affected by many factors, such as their situation and needs, the type of abuse, the counselor's tone, rapport-building strategies, and the solutions the counselor suggests. Moreover, help seekers rarely express negative feelings about how the counselor is doing during the conversation (e.g., “You are not helping.”). In most cases, help seekers instead thank the counselor as a courtesy (e.g., “Thanks for the help.”), yet report in the post-conversation survey that they do not feel more positive. Models therefore need to analyze not only the literal meaning of what help seekers say, but also assess aspects such as whether the help seekers' needs are met, whether the suggested solutions are specific to their situations, and whether the counselors express empathy.