Conversational Semantic Parsing for Dialog State Tracking

Paper · arXiv 2010.12770 · Published October 24, 2020
Tasks: Planning, Linguistics, NLP, NLU, Natural Language Inference, Conversation · Topics: Dialog

We consider a new perspective on dialog state tracking (DST), the task of estimating a user’s goal through the course of a dialog. By formulating DST as a semantic parsing task over hierarchical representations, we can incorporate semantic compositionality, cross-domain knowledge sharing, and co-reference. We present TreeDST, a dataset of 27k conversations annotated with tree-structured dialog states and system acts. We describe an encoder-decoder framework for DST with hierarchical representations, which yields a 20% improvement over state-of-the-art DST approaches that operate on a flat meaning space of slot-value pairs.

Language understanding for task-based dialog is often termed “dialog state tracking” (DST) (Williams et al., 2016), the mental model being that the intent of the user is a partially-observed state that must be re-estimated at every turn given new information. The dialog state is typically modelled as a set of independent slots, and a standard DST system will maintain a distribution over values for each slot. In contrast, language understanding for other NLP applications is often formulated as semantic parsing, which is the task of converting a single-turn utterance to a graph structured meaning representation. Such meaning representations include logical forms, database queries and other programming languages.
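To make the flat formulation concrete, here is a minimal sketch of slot-value dialog state tracking, where the state is a set of independent slots updated turn by turn. The slot names and update rule are illustrative, not taken from any particular DST system in the paper.

```python
# Flat DST sketch: the dialog state is a dict of independent slot-value
# pairs; each turn's newly observed slots are merged in, with later
# turns overriding earlier values for the same slot.

def update_state(state, turn_slots):
    """Merge slot-value pairs observed in the current turn into the state."""
    new_state = dict(state)
    new_state.update(turn_slots)  # later observations win
    return new_state

state = {}
state = update_state(state, {"domain": "restaurant", "food": "thai"})
state = update_state(state, {"area": "centre"})
state = update_state(state, {"food": "italian"})  # user revises their goal

print(state)
# {'domain': 'restaurant', 'food': 'italian', 'area': 'centre'}
```

Note what this representation cannot do: each slot is a flat key, so there is no way to nest one intent inside another, which is exactly the limitation the hierarchical formulation addresses.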

These two perspectives on language understanding, DST and semantic parsing, have complementary strengths and weaknesses. DST targets a fuller range of conversational dynamics but typically uses a simple and limiting meaning representation. Semantic parsing embraces a compositional view of meaning. By basing meaning on a space of combinable, reusable parts, compositionality can make the NLU problem space more tractable (repeated concepts must only be learned once) and more expressive (it becomes possible to represent nested intents). At the same time, most semantic parsing research treats a sentence as an isolated observation, detached from conversational context.

This work unifies the two perspectives by reformulating DST as conversational semantic parsing. As in DST, the task is to track a user’s goal as it accumulates over the course of a conversation. The goal is represented using a structured formalism like those used in semantic parsing. Specifically, we adopt a hierarchical representation which captures domains, verbs, operators and slots within a rooted graph grounded to an ontology. The structured dialog state is capable of tracking nested intents and representing compositions in a single graph (Turn 5, Table 1).
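The hierarchical state described above can be sketched as a rooted tree whose nodes are domains, verbs, operators, or slots, and which a sequence decoder can target via a bracketed linearization. The node labels and the example intent below are illustrative assumptions; the actual TreeDST schema and ontology may differ.

```python
# Hedged sketch: a hierarchical dialog state as a rooted tree, so a nested
# intent (a calendar query inside a restaurant booking) lives in one graph.

from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                      # domain, verb, operator, or slot name
    children: list = field(default_factory=list)

    def add(self, *kids):
        self.children.extend(kids)
        return self

def linearize(node):
    """Serialize the tree to a bracketed string, e.g. as a seq2seq target."""
    if not node.children:
        return node.label
    inner = " ".join(linearize(c) for c in node.children)
    return f"({node.label} {inner})"

# "Book a restaurant near my next meeting": the booking intent nests a
# calendar lookup inside its location slot.
state = Node("book").add(
    Node("restaurant").add(
        Node("location").add(
            Node("near").add(
                Node("get_event").add(Node("calendar"))))))

print(linearize(state))
# (book (restaurant (location (near (get_event calendar)))))
```

A flat slot-value state cannot express this composition, since the value of `location` is itself a sub-intent rather than an atomic string; the tree makes that nesting explicit.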