Task-Oriented Dialogue as Dataflow Synthesis

Paper · arXiv 2009.11423 · Published September 24, 2020
Tasks: Planning, Conversation · Topics: Dialog

We describe an approach to task-oriented dialogue in which dialogue state is represented as a dataflow graph. A dialogue agent maps each user utterance to a program that extends this graph. Programs include metacomputation operators for reference and revision that reuse dataflow fragments from previous turns. Our graph-based state enables the expression and manipulation of complex user intents, and explicit metacomputation makes these intents easier for learned models to predict. We introduce a new dataset, SMCalFlow, featuring complex dialogues about events, weather, places, and people. Experiments show that dataflow graphs and metacomputation substantially improve representability and predictability in these natural dialogues. Additional experiments on the MultiWOZ dataset show that our dataflow representation enables an otherwise off-the-shelf sequence-to-sequence model to match the best existing task-specific state tracking model.

Two central design decisions in modern conversational AI systems are the choices of state and action representations, which determine the scope of possible user requests and agent behaviors. Dialogue systems with fixed symbolic state representations (like slot filling systems) are easy to train but hard to extend (Pieraccini et al., 1992). Deep continuous state representations are flexible enough to represent arbitrary properties of the dialogue history, but so unconstrained that training a neural dialogue policy “end-to-end” fails to learn appropriate latent states (Bordes et al., 2016).

This paper introduces a new framework for dialogue modeling that aims to combine the strengths of both approaches: structured enough to enable efficient learning, yet flexible enough to support open-ended, compositional user goals that involve multiple tasks and domains. The framework has two components: a new state representation in which dialogue states are represented as dataflow graphs; and a new agent architecture in which dialogue agents predict compositional programs that extend these graphs. Over the course of a dialogue, a growing dataflow graph serves as a record of common ground: an executable description of the entities that were mentioned and the actions and computations that produced them (Figure 1).
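As an illustrative sketch of this state representation (all names here are hypothetical, not from the paper's codebase), a dataflow graph can be modeled as an append-only list of nodes, each recording the operation that produced a value and the parent nodes it consumed, so earlier computations remain addressable in later turns:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One computation in the dataflow graph: an operation applied to parent nodes."""
    op: str                      # e.g. "findEvent", "now"
    parents: list = field(default_factory=list)
    value: object = None         # cached result once executed

class DataflowGraph:
    """Append-only record of the dialogue's shared computations (the common ground)."""
    def __init__(self):
        self.nodes = []

    def add(self, op, *parents):
        node = Node(op, list(parents))
        self.nodes.append(node)
        return node

# Turn 1: "When is the next retreat?" extends the graph with three nodes.
g = DataflowGraph()
now = g.add("now")
event = g.add("findEvent", g.add("after", now))
```

Because the graph only grows, a later turn can point back at `event` (or any intermediate node) instead of re-deriving it from scratch.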

While this paper mostly focuses on representational questions, learning is a central motivation for our approach. Learning to interpret natural language requests is simpler when they are understood to specify graph-building operations. Human speakers avoid repeating themselves in conversation by using anaphora, ellipsis, and bridging to build on shared context (Mitkov, 2014). Our framework treats these constructions by translating them into explicit metacomputation operators for reference and revision, which directly retrieve fragments of the dataflow graph that represents the shared dialogue state. This approach borrows from corresponding ideas in the literature on program transformation (Visser, 2001) and results in compact, predictable programs whose structure closely mirrors user utterances.
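A minimal sketch of the two metacomputation operators (the operator names come from the paper; these implementations are hypothetical simplifications): `refer` retrieves the most salient graph node matching a predicate, and `revise` nondestructively rebuilds an earlier computation with one subcomputation swapped out:

```python
def refer(graph, predicate):
    """Retrieve the most recent graph node satisfying a salience predicate."""
    for node in reversed(graph):
        if predicate(node):
            return node
    raise LookupError("no salient node matches")

def revise(graph, root, old, new):
    """Nondestructively rebuild the computation at `root` with `new` in place of `old`."""
    if root is old:
        return new
    clone = {"op": root["op"],
             "parents": [revise(graph, p, old, new) for p in root["parents"]]}
    graph.append(clone)
    return clone

graph = []
def add(op, *parents):
    node = {"op": op, "parents": list(parents)}
    graph.append(node)
    return node

# Turn 1: "When is the next retreat?"  ->  findEvent(after(now()))
query = add("findEvent", add("after", add("now")))

# Turn 2: "What about the one after that?"
# refer resolves "the one" to the prior event; revise swaps its time anchor.
prior = refer(graph, lambda n: n["op"] == "findEvent")
query2 = revise(graph, query, query["parents"][0], add("after", prior))
```

The second turn's program is short because it names fragments of the existing graph rather than repeating the whole query, which is exactly what makes it easier for a learned model to predict.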

Experiments show that our rich dialogue state representation makes it possible to build better dialogue agents for challenging tasks. First, we release a newly collected dataset of around 40K natural dialogues in English about calendars, locations, people, and weather—the largest goal-oriented dialogue dataset to date. Each dialogue turn is annotated with a program implementing the user request. Many turns involve more challenging predictions than traditional slot-filling, with compositional actions, cross-domain interaction, complex anaphora, and exception handling (Figure 2).

We model a dialogue between a (human) user and an (automated) agent as an interactive programming task where the human and computer communicate using natural language. Dialogue state is represented with a dataflow graph. At each turn, the agent’s goal is to translate the most recent user utterance into a program. Predicted programs nondestructively extend the dataflow graph, construct any newly requested values or real-world side-effects, and finally describe the results to the user. Our approach is significantly different from a conventional dialogue system pipeline, which has separate modules for language understanding, dialogue state tracking, and dialogue policy execution (Young et al., 2013).
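The turn-level loop just described can be sketched as follows (a toy illustration with hypothetical names; the stub functions stand in for the learned parser, the executor, and the response generator):

```python
def agent_turn(graph, utterance, parse, execute, describe):
    """One dialogue turn: utterance -> program -> graph extension -> reply.

    parse:    maps the utterance (in graph context) to new graph nodes
    execute:  evaluates a node, performing any real-world side effects
    describe: renders the resulting values as a natural-language reply
    """
    new_nodes = parse(utterance, graph)   # predicted program
    graph.extend(new_nodes)               # nondestructive: old nodes are kept
    results = [execute(node) for node in new_nodes]
    return describe(results)

# Toy stubs standing in for the learned components.
parse = lambda utt, g: [("findEvent", "retreat")]
execute = lambda node: "April 27 at 9 am"
describe = lambda results: f"It starts on {results[-1]}."

graph = []
reply = agent_turn(graph, "When is the next retreat?", parse, execute, describe)
```

The contrast with a pipeline system is visible in the signature: there is no separate state tracker or policy module, only program prediction against the shared graph.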

Programs, graphs, and evaluation

The simplest example of interactive program synthesis is question answering:

User: When is the next retreat?

    start(findEvent(EventSpec(
        name='retreat',
        start=after(now()))))

Agent: It starts on April 27 at 9 am.
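To make the program-to-execution correspondence concrete, a toy Python evaluator can run this exact expression against a stubbed calendar (the calendar contents and function bodies are hypothetical; the paper does not specify the executor):

```python
import datetime

# Hypothetical calendar backing the toy executor.
CALENDAR = [{"name": "retreat",
             "start": datetime.datetime(2020, 4, 27, 9, 0)}]

def now():
    return datetime.datetime(2020, 4, 1)

def after(t):
    # Constraint: the event starts later than time t.
    return lambda event: event["start"] > t

def EventSpec(name, start):
    # Combine name and time constraints into one predicate.
    return lambda e: e["name"] == name and start(e)

def findEvent(spec):
    # Earliest calendar entry satisfying the spec.
    return min((e for e in CALENDAR if spec(e)), key=lambda e: e["start"])

def start(event):
    return event["start"]

result = start(findEvent(EventSpec(name='retreat', start=after(now()))))
```

In the full framework each of these calls would become a node in the dataflow graph rather than evaluating eagerly, so later turns could refer back to the event or revise the spec.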

We have presented a representational framework for task-oriented dialogue modeling based on dataflow graphs, in which dialogue agents predict a sequence of compositional updates to a graphical state representation. This approach makes it possible to represent and learn from complex, natural dialogues. Future work might focus on improving prediction by introducing learned implementations of refer and revise that, along with the program predictor itself, could evaluate their hypotheses for syntactic, semantic, and pragmatic plausibility. The representational framework could itself be extended, e.g., by supporting declarative user goals and preferences that persist across utterances. We hope that the rich representations presented here—as well as our new dataset—will facilitate greater use of context and compositionality in learned models for task-oriented dialogue.