Proactive Conversational Agents in the Post-ChatGPT World


Although astonished by the human-like performance of ChatGPT and similar large language model (LLM)-based agents, we find that they share a significant weakness with many other existing conversational agents: they all take a passive approach, responding only to user queries. This limits their capacity to understand the user and the task and to offer recommendations grounded in a broader context than the given conversation. Proactiveness is still missing in these agents, including the ability to initiate a conversation, shift topics, or offer recommendations that take a more extensive context into account. To address this limitation, this tutorial reviews methods for equipping conversational agents with proactive interaction abilities.

The development of conversational agents that can comprehend human language and provide appropriate responses has long been a desired goal of Artificial Intelligence (AI). These agents can be broadly classified into two categories: (1) chit-chat systems, which aim to entertain users and offer emotional support through open-ended discussion on various topics, and (2) task-oriented dialogue systems, which assist users in accomplishing specific tasks. Many commercial personal assistants, including Amazon Alexa, Apple Siri, Google Home, and the large language model (LLM)-enabled Microsoft Copilot, fall under the task-oriented category. These systems are primarily designed to comprehend natural-language commands, interpret them, and translate them into actions to be executed by underlying application systems.

Recently, ChatGPT [1] and similar LLM-based conversational agents have sent shock waves through the research community and the world. Astonished as we are by their human-level performance, we notice that they share a significant weakness with most other existing conversational agents: they all take a passive approach, responding only to user queries. Research effort remains focused on performing pre-defined actions or providing factual information in response to user commands or queries. This limits the agents' capacity to understand the user and the task and to offer recommendations grounded in a broader context than the given conversation. The missing proactiveness includes the abilities to initiate a conversation, shift topics, plan strategically with subgoals, or offer recommendations that take into account context beyond the scope of a specific conversation.

Moreover, despite being widely adopted and receiving tremendous attention, most current conversational agents, including LLM-enabled ones, rely heavily on pre-existing training conversations, datasets, and the knowledge associated with them in order to exchange information, provide recommendations, and complete tasks [1, 24, 26, 32, 53]. They typically generate responses to questions in a passive manner rather than leading the conversation or asking questions themselves [9, 45, 46]. This reactive, passive approach limits the range of conversations that can take place, particularly in situations that require active engagement from both sides, such as exploratory search or complex decision-making. In recent years, researchers from multiple fields, including natural language processing, dialogue systems, and machine learning [12, 16, 19, 33, 52, 54], have been working towards enabling conversational agents to engage in two-way, proactive conversations with users [28, 30, 47, 51, 55]. They have proposed various approaches to address this issue, such as:

• Learning to ask [5, 13, 22, 25, 36, 44, 50, 56, 57]

• Topic shifting [27, 39, 49]

• Strategy planning with reinforcement learning, counterfactual dialogue acts, and label generation [2, 8, 22, 29, 40, 42]
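
To make the last item concrete, the sketch below frames strategy planning as tabular Q-learning over a small set of dialogue acts, so the agent learns when a proactive act (such as asking a clarifying question) beats a passive answer. This is a minimal illustration, not any of the cited systems: the act names, states, and reward scheme are invented for this example.

```python
import random

# Hypothetical dialogue acts a proactive agent can plan over
# (illustrative names, not from any specific cited system).
ACTS = ["answer", "ask_clarifying_question", "shift_topic", "recommend"]

def choose_act(state, q_values, epsilon=0.1, rng=random):
    """Epsilon-greedy policy: usually exploit the learned values,
    occasionally explore a different (possibly proactive) act."""
    if rng.random() < epsilon:
        return rng.choice(ACTS)
    return max(ACTS, key=lambda a: q_values.get((state, a), 0.0))

def q_update(q_values, state, act, reward, next_state, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step for the dialogue policy."""
    best_next = max(q_values.get((next_state, a), 0.0) for a in ACTS)
    old = q_values.get((state, act), 0.0)
    q_values[(state, act)] = old + alpha * (reward + gamma * best_next - old)

def simulated_reward(state, act):
    """Toy user model: an ambiguous query is best met with a clarifying
    question; a clear query is best met with a direct answer."""
    if state == "ambiguous_query":
        return 1.0 if act == "ask_clarifying_question" else 0.0
    return 1.0 if act == "answer" else 0.0

# Train by sweeping every (state, act) pair; a real planner would
# instead sample trajectories from live or simulated conversations.
q = {}
for _ in range(20):
    for state in ("ambiguous_query", "clear_query"):
        for act in ACTS:
            q_update(q, state, act, simulated_reward(state, act), "end")

print(choose_act("ambiguous_query", q, epsilon=0.0))  # ask_clarifying_question
print(choose_act("clear_query", q, epsilon=0.0))      # answer
```

The learned policy asks a clarifying question when the query is ambiguous and answers directly otherwise; the surveyed work replaces this toy table and user model with neural policies and real conversational reward signals.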