Enhancing Performance on Seen and Unseen Dialogue Scenarios using Retrieval-Augmented End-to-End Task-Oriented System

Paper · arXiv 2308.08169 · Published August 16, 2023
RAG · Natural Language Inference

“Task-oriented dialogue (TOD) systems play an important role in various applications, such as restaurant booking, alarm setting, and recommendations (Gao et al., 2018; Xie et al., 2022). These systems can be broadly categorized into two groups: pipeline-based dialogue systems and end-to-end dialogue systems. Pipeline-based dialogue systems consist of four separate modules: a natural language understanding (NLU) module to detect user intents, a dialogue state tracking (DST) module to track user belief states across dialogue turns, a dialogue management (DM) module to decide system actions based on dialogue states, and a natural language generation (NLG) module to generate natural-language responses. However, the pipeline-based approach is annotation-intensive, prone to error propagation, and challenging to scale (Hosseini-Asl et al., 2020; Zhang et al., 2020; Feng et al., 2023).
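To make the four-module data flow concrete, here is a minimal toy sketch in Python. Everything in it is a hypothetical illustration, not the system from the paper or any cited work: the restaurant domain, the slot names, the keyword-matching NLU (which extracts slot values only and assumes a single booking intent), the rule-based DM, and the template NLG. Real modules are usually learned models. The sketch shows how each module consumes the previous module's output, which is also why an early error propagates downstream.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical toy domain: restaurant booking with two required slots.
REQUIRED_SLOTS = ("cuisine", "area")
KNOWN_VALUES = {
    "cuisine": {"italian", "chinese", "indian"},
    "area": {"north", "south", "centre"},
}


@dataclass
class DialogueState:
    # Belief state: slot name -> value, accumulated across turns.
    slots: dict[str, str] = field(default_factory=dict)


def nlu(utterance: str) -> dict[str, str]:
    """NLU (toy): find slot values in the utterance by exact keyword matching."""
    tokens = re.findall(r"[a-z]+", utterance.lower())
    return {slot: tok for slot, values in KNOWN_VALUES.items() for tok in tokens if tok in values}


def dst(state: DialogueState, nlu_output: dict[str, str]) -> DialogueState:
    """DST: merge this turn's slot values into the running belief state."""
    return DialogueState(slots={**state.slots, **nlu_output})


def dm(state: DialogueState) -> tuple[str, str | None]:
    """DM: request the first missing required slot, otherwise make an offer."""
    for slot in REQUIRED_SLOTS:
        if slot not in state.slots:
            return ("request", slot)
    return ("offer", None)


def nlg(action: tuple[str, str | None], state: DialogueState) -> str:
    """NLG: render the chosen system action with a fixed template."""
    act, slot = action
    if act == "request":
        return f"What {slot} would you like?"
    return f"Searching for {state.slots['cuisine']} restaurants in the {state.slots['area']}."


def pipeline_turn(state: DialogueState, utterance: str) -> tuple[DialogueState, str]:
    # Each module feeds the next, so anything NLU misses never reaches DST, DM, or NLG.
    new_state = dst(state, nlu(utterance))
    return new_state, nlg(dm(new_state), new_state)


state, reply = pipeline_turn(DialogueState(), "I want italian food")
print(reply)  # What area would you like?
state, reply = pipeline_turn(state, "somewhere in the north")
print(reply)  # Searching for italian restaurants in the north.

# Error propagation: "pasta place" is not a known cuisine keyword, so NLU misses it
# and DM asks for a cuisine the user arguably already implied.
_, reply = pipeline_turn(DialogueState(), "a pasta place in the south")
print(reply)  # What cuisine would you like?
```

Because every module depends on hand-defined labels (slot names, known values, action types), extending even this toy to a new domain means annotating and updating all four modules, which mirrors the scaling cost described above.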

In parallel, research on open-domain question answering and dialogue systems has explored the use of retrieval-augmented models. These models retrieve relevant information from passages, databases, APIs, and other sources, and incorporate it into the generation process, improving the quality of answers or dialogue responses (Karpukhin et al., 2020; Izacard and Grave, 2021; Dinan et al., 2018; Lewis et al., 2020b; Shuster et al., 2021). Inspired by these ideas, we combine the two lines of work and propose an end-to-end TOD framework with a retrieval system that addresses the challenge of handling both existing and zero-shot unseen dialogue scenarios.”
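The generic retrieve-then-generate pattern described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's retriever or input format: the sparse bag-of-words cosine retriever is a swapped-in simplification (retrievers in the cited work, such as Karpukhin et al., 2020, use learned dense encoders), and the memory contents, the `retrieve` and `build_generator_input` helpers, and the `retrieved: ... || user: ...` input string are all hypothetical. The generator itself is omitted; the sketch stops at the text a sequence-to-sequence model would consume.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter[str]:
    """Lowercased word counts: a deliberately simple stand-in for a learned encoder."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter[str], b: Counter[str]) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, memory: list[tuple[str, str]], k: int = 1) -> list[tuple[str, str]]:
    """Return the k (context, response) pairs whose context is most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(memory, key=lambda pair: cosine(q, bag_of_words(pair[0])), reverse=True)
    return ranked[:k]


def build_generator_input(query: str, retrieved: list[tuple[str, str]]) -> str:
    """Prepend retrieved responses as hints for a (hypothetical) seq2seq generator."""
    hints = " | ".join(response for _, response in retrieved)
    return f"retrieved: {hints} || user: {query}"


# Hypothetical memory of (user context, system response) pairs from training dialogues.
MEMORY = [
    ("set an alarm for 7 am", "Your alarm is set for 7 am."),
    ("book a table for two at an italian restaurant", "Which day would you like to book the table?"),
    ("recommend a movie for tonight", "How about a comedy playing at 8 pm?"),
]

query = "please book an italian restaurant table"
print(build_generator_input(query, retrieve(query, MEMORY)))
# retrieved: Which day would you like to book the table? || user: please book an italian restaurant table
```

Swapping in a dense encoder would change only `bag_of_words` and `cosine`; the retrieval and input-construction steps stay the same. Since the memory can be extended without retraining the generator, this kind of conditioning is one way a system can draw on examples when the incoming dialogue differs from what the generator saw in training.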