Target-Guided Open-Domain Conversation

Paper · arXiv 1905.11553 · Published May 28, 2019

Many real-world open-domain conversation applications have specific goals to achieve during open-ended chats, such as recommendation, psychotherapy, and education. We study the problem of imposing conversational goals on open-domain chat agents. In particular, we want a conversational system to chat naturally with humans and proactively guide the conversation to a designated target subject. The problem is challenging because no public data is available for learning such a target-guided strategy. We propose a structured approach that introduces coarse-grained keywords to control the intended content of system responses. We then attain smooth conversation transitions through turn-level supervised learning, and drive the conversation towards the target with discourse-level constraints.

This paper takes a step towards open-domain dialogue agents with conversational goals. In particular, we want the system to chat naturally with humans on open-domain topics and proactively guide the conversation to a designated target subject. For example, in Figure 1, given the target "e-books" and an arbitrary starting topic such as "tired", the agent drives the conversation in a natural way following a high-level logical backbone, and effectively reaches the target in the end. Such a target-guided conversation setup is general-purpose and can support a wide variety of practical applications, such as those mentioned above. The problem is difficult because the agent has to strike a balance between chatting naturally and achieving the target; moreover, to the best of our knowledge, no public dataset is available for learning target-guided dialogue.

This paper proposes a solution to the task. We decouple the whole system into separate modules and address the challenges at different granularities. Specifically, we explicitly model and control the intended content of each system response by introducing coarse-grained utterance keywords. We then impose a discourse-level rule that encourages the keywords to approach the end target over the course of the conversation, and we attain smooth conversation transitions at each dialogue turn through turn-level supervised learning.
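As a minimal sketch of how a discourse-level keyword rule could combine with turn-level keyword scores, the code below assumes that "approaching the target" means each new keyword must be strictly closer to the target than the previous turn's keyword, with closeness measured as cosine similarity between word vectors. The toy vectors in `EMBEDDINGS`, the helper names, and the `candidate_scores` dictionary (standing in for a hypothetical turn-level keyword predictor) are all illustrative assumptions, not the paper's implementation.

```python
import math

# Hypothetical toy word vectors (made-up values); a real system would
# use pretrained word embeddings instead.
EMBEDDINGS = {
    "tired": [0.9, 0.1, 0.0],
    "sleep": [0.7, 0.4, 0.1],
    "bed": [0.5, 0.5, 0.2],
    "read": [0.2, 0.8, 0.4],
    "books": [0.1, 0.7, 0.7],
    "e-books": [0.0, 0.6, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def closeness(word, target):
    """Assumed closeness measure: cosine similarity of the two word vectors."""
    return cosine(EMBEDDINGS[word], EMBEDDINGS[target])


def allowed_keywords(candidates, previous_keyword, target):
    """Discourse-level rule: keep candidates strictly closer to the target
    than the previous turn's keyword."""
    threshold = closeness(previous_keyword, target)
    return [w for w in candidates if closeness(w, target) > threshold]


def pick_next_keyword(candidate_scores, previous_keyword, target):
    """Among rule-compliant candidates, return the one with the highest
    turn-level score (scores come from a hypothetical predictor).
    Returns None when no candidate satisfies the rule."""
    allowed = allowed_keywords(candidate_scores, previous_keyword, target)
    if not allowed:
        return None
    return max(allowed, key=lambda w: candidate_scores[w])
```

For example, starting from "tired" with target "e-books", the scores `{"sleep": 0.6, "bed": 0.3, "read": 0.2, "tired": 0.9}` yield "sleep": "tired" has the highest raw score but is rejected because it does not move closer to the target. Separating the hard rule from the learned scores mirrors the decoupling described above: the rule guarantees progress, while the scores keep each step natural.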

The agent then produces an utterance $x_{n+1}$ as a response, aiming to satisfy (1) transition smoothness, by making the response natural and appropriate in the current conversation context, and (2) target achievement, by driving the conversation to reach the designated target. Specifically, we consider a target achieved when either the human or the agent mentions the target or a similar word in an utterance; this definition is simple and allows easy measurement of the success rate.
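To make the success criterion concrete, here is a small sketch of how target achievement and the success rate could be measured. It assumes "similar word" means a token whose embedding has cosine similarity of at least a hypothetical `SIMILARITY_THRESHOLD` (0.95) with the target; the toy vectors, the tokenizer regex, and all function names are illustrative assumptions rather than the paper's evaluation code.

```python
import math
import re

# Hypothetical toy word vectors (made-up values for illustration only).
EMBEDDINGS = {
    "cat": [0.9, 0.2, 0.1],
    "kitten": [0.85, 0.25, 0.15],
    "dog": [0.6, 0.5, 0.2],
    "movie": [0.1, 0.9, 0.3],
}

# Hypothetical cutoff for treating a word as "similar" to the target.
SIMILARITY_THRESHOLD = 0.95


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mentions_target(utterance, target, threshold=SIMILARITY_THRESHOLD):
    """True if a token equals the target or its vector is close enough to the target's."""
    for token in re.findall(r"[a-z-]+", utterance.lower()):
        if token == target:
            return True
        if token in EMBEDDINGS and target in EMBEDDINGS:
            if cosine(EMBEDDINGS[token], EMBEDDINGS[target]) >= threshold:
                return True
    return False


def target_achieved(dialogue, target):
    """dialogue is a list of (speaker, utterance); either speaker may mention the target."""
    return any(mentions_target(text, target) for _speaker, text in dialogue)


def success_rate(episodes):
    """episodes is a list of (dialogue, target) pairs; returns the fraction achieved."""
    if not episodes:
        return 0.0
    hits = sum(target_achieved(dialogue, target) for dialogue, target in episodes)
    return hits / len(episodes)
```

With target "cat", a dialogue in which the agent asks "Do you have a kitten?" counts as a success under these toy vectors, while one that only mentions a movie does not. Raising or lowering the threshold makes the criterion stricter or more lenient.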

Given the target "cat" and the conversation history {Human: I went to a movie.}, a response like "Do you like cat?" is typically not a smooth transition,