A Survey on Proactive Dialogue Systems: Problems, Methods, and Prospects

Paper · arXiv 2305.02750 · Published May 4, 2023

Proactive dialogue systems, which relate to a wide range of real-world conversational applications, equip the conversational agent with the capability of leading the conversation towards predefined targets or system-side goals. Advanced techniques empower such systems to progress to more complicated tasks that require strategic and motivational interactions. In this survey, we provide a comprehensive overview of the prominent problems and advanced designs for conversational agents’ proactivity in different types of dialogues.

Despite extensive studies, most dialogue systems overlook the design of an essential property of intelligent conversations, i.e., proactivity. Derived from the definition of proactivity in organizational behavior [Grant and Ashford, 2008] as well as its dictionary definition, conversational agents’ proactivity can be defined as the capability to create or control the conversation by taking the initiative and anticipating the impacts on themselves or human users, rather than only passively responding to the users. Proactivity will not only greatly improve user engagement and service efficiency, but also empower the system to handle more complicated tasks that involve strategic and motivational interactions.

In this survey, we provide a comprehensive review of such efforts that span various task formulations and application scenarios.

three common types of dialogues, namely open-domain dialogues, task-oriented dialogues, and information-seeking dialogues.

As the examples illustrated in Figure 2, target-guided dialogues involve the agent leading discussions towards designated target topics (e.g., Music to K-Pop to Blackpink), while prosocial dialogues entrust the agent with constructively guiding conversations according to social norms in response to problematic user utterances (e.g., the cheating intention).

potential research prospects for future studies. (1) Proactivity in Hybrid Dialogues: Hybrid dialogues are the most realistic simulation of interactions between human users and systems, as they incorporate a variety of conversational objectives instead of focusing on a single type of dialogue. Despite the importance of the agent’s proactivity in hybrid dialogues, only a few recent studies investigate this critical design. (2) Evaluation Protocols for Proactivity: Compared with general evaluation protocols for dialogue systems, the evaluation of proactivity additionally relies on other disciplines, such as psychology or sociology. Despite this complexity, developing robust and effective evaluation metrics remains critical for advancing techniques in proactive dialogue systems. (3) Ethics of Conversational Agent’s Proactivity: The designs of proactivity in dialogue systems may walk a precarious line between the benefit to human-AI interactions and the potential harm to human users.

system needs to produce multiple turns of responses {u_n} to lead the conversation towards the target in the end. The produced responses should satisfy (i) transition smoothness, i.e., natural and appropriate content under the given dialogue history, and (ii) target achievement, i.e., driving the conversation to reach the designated target.

target can be a topical keyword [Tang et al., 2019], a knowledge entity [Wu et al., 2019], a conversational goal [Liu et al., 2020], etc. A candidate target set is maintained by the dialogue system.

There are three main subtasks in target-guided dialogue systems, including topic-shift detection, topic planning, and topic-aware response generation.


• Topic-shift Detection aims to promptly discover the topic drift in user utterances. Rachna et al. [2021] fine-tune XLNet-base to classify the utterances into major, minor and off topics. Xie et al. [2021] construct TIAGE for topic shift dialogue modeling by augmenting the PersonaChat dataset [Zhang et al., 2018a] with topic-shift annotations, and propose a T5-based topic-shift manager, namely TSMANAGER, to predict the occurrence of topic shifts.

• Topic Planning, which enables the conversation to follow an expected direction, is the core problem in target-guided dialogue systems. Several discourse-level target-guided strategies [Tang et al., 2019; Zhong et al., 2021] constrained by keyword transitions are proposed to proactively drive the conversation topic towards the target. Because the topical connectivity between keywords is loose, event knowledge graphs are constructed to enhance coherence in topic planning [Xu et al., 2020]. However, the knowledge provided in the dialogues is too limited for planning a robust and reasonable topic path towards the target. Therefore, the latest studies [Yang et al., 2022] leverage external knowledge graphs to improve the quality of topic transitions with graph reasoning techniques. Instead of corpus-based learning, Lei et al. [2022] propose to learn the topic transition from interactions with users.

• Topic-aware Response Generation aims to produce topic-related responses that lead the conversation towards the target. Kishinami et al. [2022] propose to generate a complete response plan that can lead a conversation to the given target. Gupta et al. [2022] leverage a bridging path of commonsense knowledge concepts between the current and target topics to generate transition responses.
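
The topic planning subtask above can be pictured as a path search over a keyword graph. The following is a minimal sketch under stated assumptions: the graph TOPIC_GRAPH is a hypothetical hand-written toy, the function name plan_topic_path is illustrative, and a plain breadth-first search stands in for the learned transition strategies and graph reasoning techniques used in the cited works.

```python
from collections import deque

# Hypothetical toy keyword graph (assumption, not taken from any cited dataset).
TOPIC_GRAPH = {
    "music": ["k-pop", "jazz"],
    "k-pop": ["blackpink", "bts"],
    "jazz": ["saxophone"],
}


def plan_topic_path(graph, start, target):
    """Return the shortest keyword path from start to target, or None."""
    if start == target:
        return [start]
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in parents:
                continue
            parents[neighbor] = node
            if neighbor == target:
                # Walk back through the parents to rebuild the path.
                path = [neighbor]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


if __name__ == "__main__":
    print(plan_topic_path(TOPIC_GRAPH, "music", "blackpink"))
```

In this toy setting, each keyword on the returned path would be handed to a response generator in turn, which is where the transition-smoothness requirement comes into play.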

The problem formulation of enriched TODs exactly follows that of general TODs; the difference is that the generated responses in enriched TODs should be not only functionally accurate but also socially engaging. For instance, Sun et al. [2021] construct the ACCENTOR dataset by adding topical chit-chat to the responses for TODs to make the interactions more engaging and interactive. An end-to-end TOD method, SimpleTOD [Hosseini-Asl et al., 2020], is extended to SimpleTOD+ for handling enriched TODs, which introduces a new dialogue action, i.e., chit-chat, and is further trained on chit-chat generation data. Similarly, Zhao et al. [2022] develop an end-to-end method,

The goal of conversational information-seeking (CIS) systems is to fulfill the user’s information needs. Typical applications include conversational search, conversational recommendation, and conversational question answering. Conventional CIS systems passively respond to user queries, which may fall short when the information need is complicated. Recent years have witnessed several advances in developing proactive CIS systems that further reduce uncertainty, and thus seek information more efficiently and precisely, by initiating a subdialogue. Such a subdialogue can either clarify the ambiguity of the query or question in conversational search [Aliannejadi et al., 2021] and conversational question answering [Guo et al., 2021], or elicit the user preference in conversational recommendation [Zhang et al., 2018b].

Asking clarification questions aims to clarify the potential ambiguity in the user query, since the user query is often succinct and brief in real-world conversational search and question answering. The problem is formulated as two subtasks [Aliannejadi et al., 2021]: clarification need prediction and clarification question generation. Clarification need prediction is typically viewed as a binary classification problem for predicting whether the user query is ambiguous. If clarification is needed, clarification questions can be either selected from a question bank [Aliannejadi et al., 2019] or generated on the fly [Zamani et al., 2020].

Specifically, Aliannejadi et al. [2019] propose a question retrieval-selection pipeline, namely NeuQS, which first retrieves the top-k questions from the question bank and then selects the most appropriate question by reranking with BERT-based models. Zamani et al. [2020] develop a reinforcement learning based method, namely QCM, to generate clarifying questions by maximizing a clarification utility function. However, these works only focus on the subtask of clarification question generation, while clarification need prediction is equally important in asking clarification questions. To this end, Aliannejadi et al. [2021] and Guo et al. [2021] present complete pipeline-based systems for asking clarification questions, which adopt a binary classification model to predict the clarification need label first and then perform clarification question generation. Furthermore, Deng et al. [2022a] propose an end-to-end framework, namely UniPCQA, which leverages a unified sequence-to-sequence formulation to tackle three tasks in one model: clarification need prediction, clarification question generation, and conversational question answering.
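
The two-stage pipeline described above (predict the clarification need, then retrieve and rerank questions from a bank) can be sketched as follows. This is a toy illustration under loud assumptions: the sense lexicon SENSES, the question bank BANK, and all function names are hypothetical; a lexicon lookup stands in for the binary classifier, and token-overlap scores stand in for the BERT-based reranker. It is not the NeuQS implementation.

```python
import re

# Hypothetical lexicon of ambiguous terms and question bank (assumptions).
SENSES = {"jaguar": ["animal", "car"]}
BANK = [
    "Do you mean the jaguar animal or the Jaguar car?",
    "Which city are you asking about?",
    "Are you looking for jaguar speed in the wild?",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def needs_clarification(query, senses):
    """Stage 1 stand-in: the query is ambiguous if any token has 2+ senses."""
    return any(len(senses.get(tok, ())) > 1 for tok in tokenize(query))


def retrieve(query, bank, k):
    """Stage 2a: keep the k bank questions sharing the most tokens with the query."""
    q = tokenize(query)
    return sorted(bank, key=lambda c: len(q & tokenize(c)), reverse=True)[:k]


def rerank(query, candidates):
    """Stage 2b stand-in: pick the candidate with the highest Jaccard similarity."""
    q = tokenize(query)

    def jaccard(candidate):
        c = tokenize(candidate)
        return len(q & c) / (len(q | c) or 1)

    return max(candidates, key=jaccard)


def clarify(query, senses, bank, k=2):
    """Return a clarification question, or None when the query seems unambiguous."""
    if not needs_clarification(query, senses):
        return None
    return rerank(query, retrieve(query, bank, k))


if __name__ == "__main__":
    print(clarify("jaguar top speed", SENSES, BANK))
    print(clarify("weather in paris", SENSES, BANK))
```

Keeping need prediction as a separate gate mirrors the point made in the text: a system that only generates or selects questions would ask for clarification even when the query is already clear.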

Instead of simply learning user preference from the dialogue context [Li et al., 2018], Zhang et al. [2018b] propose a proactive paradigm, namely “System Ask, User Respond”, to explicitly acquire user preference by asking questions in conversational recommendation. The problem is formulated as predicting the item attribute for eliciting user preferences at the next turn, e.g., “Which brand of laptop do you prefer?”. A personalized multi-memory network (PMMN) [Zhang et al., 2018b] is first designed to incorporate user embeddings into next-question prediction at the turn level. Due to the complexity of user preferences, multiple turns of question asking are required. Therefore, recent works tackle user preference elicitation at the dialogue level, i.e., “what questions to ask”, as a multi-step decision-making process solved by reinforcement learning (RL) [Deng et al., 2021; Zhang et al., 2022]. Deng et al. [2021] propose a graph-based RL framework for policy learning, namely UNICORN, which models real-time user preference during the conversation with a dynamic weighted graph structure. Motivated by the complex user interests in CRS, Zhang et al. [2022] propose the MCMIPL framework to efficiently obtain user preferences by asking multi-choice questions.
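
To make the "which attribute to ask next" formulation concrete, here is a small sketch. It deliberately swaps in a simple greedy heuristic (ask about the attribute whose values are most evenly spread over the remaining candidate items, measured by Shannon entropy) in place of the learned PMMN or RL policies cited above. The catalogue LAPTOPS and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical item catalogue (assumption for illustration only).
LAPTOPS = [
    {"name": "A", "brand": "X", "color": "black"},
    {"name": "B", "brand": "Y", "color": "black"},
    {"name": "C", "brand": "Z", "color": "black"},
    {"name": "D", "brand": "X", "color": "silver"},
]


def attribute_entropy(items, attribute):
    """Shannon entropy (in bits) of an attribute's values over the items."""
    counts = Counter(item[attribute] for item in items)
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def next_attribute(items, asked):
    """Greedy stand-in policy: ask about the not-yet-asked attribute with max entropy."""
    if not items:
        return None
    candidates = [a for a in items[0] if a != "name" and a not in asked]
    if not candidates:
        return None
    return max(candidates, key=lambda a: attribute_entropy(items, a))


def filter_items(items, attribute, value):
    """Keep only the items consistent with the user's answer."""
    return [item for item in items if item[attribute] == value]


if __name__ == "__main__":
    first = next_attribute(LAPTOPS, set())
    print("ask about:", first)
    remaining = filter_items(LAPTOPS, first, "X")  # simulated user answer
    print("then ask about:", next_attribute(remaining, {first}))
```

A one-step greedy rule like this ignores the long-term effect of each question, which is the gap that the dialogue-level RL formulations in the text are meant to address.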

All the aforementioned conversational systems assume that users always have a clear conversational goal and that the system solely aims at a single type of goal, such as chit-chat, question answering, or recommendation. A system with a higher level of proactivity should also be capable of handling conversations with multiple and varied goals. Recently, many efforts have been made to construct valuable data resources for hybrid dialogue systems with multiple conversational goals, such as DuRecDial [Liu et al., 2020], FusedChat [Young et al., 2022], SalesBot [Chiu et al., 2022], and OB-MultiWOZ [Li et al., 2022b]. Early studies simply tackle this problem in a way similar to topic-guided response generation with pre-defined goals [Zhang et al., 2021], while some of the latest works [Liu et al., 2022; Deng et al., 2022b] argue for the necessity of proactively discovering users’ interests and naturally leading user-engaged dialogues with changing conversational goals. In practice, hybrid dialogue systems are the closest form to real-world applications. More efforts should be made to ensure natural and smooth transitions among different types of dialogues as well as to improve the overall dialogue quality.