PolyResponse: A Rank-based Approach to Task-Oriented Dialogue with Application in Restaurant Search and Booking

Paper · arXiv 1909.01296 · Published September 3, 2019

We present PolyResponse, a conversational search engine that supports task-oriented dialogue. It is a retrieval-based approach that bypasses the complex multi-component design of traditional task-oriented dialogue systems and the use of explicit semantics in the form of task-specific ontologies. The PolyResponse engine is trained on hundreds of millions of examples extracted from real conversations: it learns what responses are appropriate in different conversational contexts. It then ranks a large index of text and visual responses according to their similarity to the given context, and narrows down the list of relevant entities during the multi-turn conversation. We introduce a restaurant search and booking system powered by the PolyResponse engine, currently available in 8 different languages.

These systems are typically constructed around rigid task-specific ontologies (Henderson et al., 2014; Mrkˇsi´c et al., 2015) which enumerate the constraints the users can express using a collection of slots (e.g., PRICE RANGE for restaurant search) and their slot values (e.g., CHEAP, EXPENSIVE for the aforementioned slots). Conversations are then modelled as a sequence of actions that constrain slots to particular values. This explicit semantic space is manually engineered by the system designer. It serves as the output of the natural language understanding component as well as the input to the language generation component both in traditional modular systems (Young, 2010; Eric et al., 2017)

Working with such explicit semantics for task oriented dialogue systems poses several critical challenges on top of the manual time-consuming domain ontology design. First, it is difficult to collect domain-specific data labelled with explicit semantic representations. As a consequence, despite recent data collection efforts to enable training of task-oriented systems across multiple domains (El Asri et al., 2017; Budzianowski et al., 2018), annotated datasets are still few and far between, as well as limited in size and the number of domains covered.1 Second, the current approach constrains the types of dialogue the system can support, resulting in artificial conversations, and breakdowns when the user does not understand what the system can and cannot support. In other words, training a task-based dialogue system for voice-controlled search in a new domain always implies the complex, expensive, and time-consuming process of collecting and annotating sufficient amounts of indomain dialogue data.

In this paper, we present a demo system based on an alternative approach to task-oriented dialogue. Relying on non-generative response retrieval we describe the PolyResponse conversational search engine and its application in the task of restaurant search and booking. The engine is trained on hundreds of millions of real conversations from a general domain (i.e., Reddit), using an implicit representation of semantics that directly optimizes the task at hand. It learns what responses are appropriate in different conversational contexts, and consequently ranks a large pool of responses according to their relevance to the current user utterance and previous dialogue history (i.e., dialogue context). The technical aspects of the underlying conversational

In simple words, it is a ranking model that learns to score conversational replies and images in a given conversational context.