SOLOIST: Building Task Bots at Scale with Transfer Learning and Machine Teaching


We present a new method, SOLOIST, that uses transfer learning and machine teaching to build task bots at scale. We parameterize classical modular task-oriented dialog systems using a Transformer-based auto-regressive language model, which subsumes different dialog modules into a single neural model. We pre-train, on heterogeneous dialog corpora, a task-grounded response generation model, which can generate dialog responses grounded in user goals and real-world knowledge for task completion. The pre-trained model can be efficiently adapted to accomplish new tasks with a handful of task-specific dialogs via machine teaching, where training samples are generated by human teachers interacting with the system.

The increasing use of personal assistants and messaging applications has spurred interest in building task-oriented dialog systems (or task bots) that can communicate with users through natural language to accomplish a wide range of tasks, such as restaurant booking, weather query, flight booking, and IT helpdesk (e.g., Zhou et al., 2020; Adiwardana et al., 2020; Roller et al., 2020b; Gao et al., 2020; Peng et al., 2020a). The wide variety of tasks and domains has created the need for a flexible task-oriented dialog development platform that can support many different use cases while remaining straightforward for developers to use and maintain.

A typical task-oriented dialog system uses a modular pipeline of four modules that execute sequentially (Young et al., 2013; Gao et al., 2019a), as shown in Figure 1(a). A natural language understanding (NLU) module identifies user intents and extracts associated information, such as slots and their values, from users’ input. A dialog state tracker (DST) infers the belief state (or user goal) from the dialog history. The belief state is often used to query a task-specific database (DB) to obtain the DB state, such as the number of entities that match the user goal. The dialog state and DB state are then passed to a dialog policy (POL) to select the next system action. A natural language generation (NLG) module converts the action to a natural language response.
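
To make the data flow of this classical pipeline concrete, the following is a minimal, rule-based sketch of one dialog turn passing through NLU, DST, DB lookup, POL, and NLG. It illustrates the modular design only, not SOLOIST itself. Every name in it (the toy `RESTAURANTS` table, `SLOT_VALUES`, the action labels, and the response templates) is a hypothetical placeholder chosen for illustration, and keyword matching stands in for what would normally be learned components.

```python
# Toy restaurant database; all entries are invented for illustration.
RESTAURANTS = [
    {"name": "Golden Wok", "food": "chinese", "area": "centre"},
    {"name": "Pasta Bar", "food": "italian", "area": "north"},
    {"name": "Lotus House", "food": "chinese", "area": "north"},
]

# Hypothetical slot ontology: each slot and the values it may take.
SLOT_VALUES = {
    "food": ["chinese", "italian"],
    "area": ["centre", "north"],
}

# Hypothetical response templates, one per system action.
TEMPLATES = {
    "inform_no_match": "Sorry, no restaurant matches your request.",
    "request_more": "I found {count} matching restaurants. Could you narrow it down?",
    "recommend": "I found one matching restaurant. Shall I book it?",
}


def nlu(utterance: str) -> dict:
    """NLU: extract slot-value pairs from one utterance by keyword matching."""
    text = utterance.lower()
    return {
        slot: value
        for slot, values in SLOT_VALUES.items()
        for value in values
        if value in text
    }


def dst(belief_state: dict, slots: dict) -> dict:
    """DST: merge the newly extracted slots into the running belief state."""
    return {**belief_state, **slots}


def db_query(belief_state: dict) -> int:
    """DB: return the DB state, here the number of matching entities."""
    return sum(
        all(row.get(slot) == value for slot, value in belief_state.items())
        for row in RESTAURANTS
    )


def policy(db_state: int) -> str:
    """POL: select the next system action (this toy rule uses only the DB state)."""
    if db_state == 0:
        return "inform_no_match"
    if db_state > 1:
        return "request_more"
    return "recommend"


def nlg(action: str, db_state: int) -> str:
    """NLG: convert the selected action into a natural language response."""
    return TEMPLATES[action].format(count=db_state)


def respond(utterance: str, belief_state: dict) -> tuple:
    """Run one turn through the four modules in sequence."""
    belief_state = dst(belief_state, nlu(utterance))
    db_state = db_query(belief_state)
    action = policy(db_state)
    return nlg(action, db_state), belief_state


state = {}
reply_1, state = respond("I want Chinese food", state)
reply_2, state = respond("Somewhere in the north, please", state)
print(reply_1)
print(reply_2)
```

In this sketch each module only sees the output of the previous one, so an error made early (for example, a missed slot in `nlu`) propagates to every later stage. That sequential dependency is the property of the modular design that a single end-to-end model is intended to avoid.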