Intent-calibrated Self-training for Answer Selection in Open-domain Dialogues

Paper · arXiv 2307.06703 · Published July 13, 2023

We focus on answer selection, which aims to identify the correct answer from a pool of candidates given a dialogue context. Typically, there are two main branches of approaches to producing answers, i.e., generation-based methods and selection-based methods (Park et al., 2022). The former generate a response token by token, while the latter select a response from a pool of candidates.

Figure 1 illustrates our idea by comparing the answer selection paradigms of (a) context-aware methods, (b) intent-aware methods, and (c) intent-calibrated methods. Context-aware methods (see Figure 1(a)) capture the context of the ongoing dialogue to understand users' information needs and select the most relevant responses from the answer candidates (Jeong et al., 2021). Unlike in task-oriented dialogue systems, it is much more challenging for open-domain dialogue systems (ODSs) to infer users' information needs due to their open-ended goals (Huang et al., 2020).
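To make the context-aware paradigm concrete, the following is a minimal sketch of scoring answer candidates against the dialogue context; the cited methods use neural encoders, but here a simple bag-of-words cosine similarity stands in for the relevance model (all function names are illustrative, not from the paper):

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_answer(context: str, candidates: list[str]) -> str:
    """Rank candidates by similarity to the dialogue context; return the best."""
    ctx = Counter(context.lower().split())
    return max(candidates, key=lambda c: cosine(ctx, Counter(c.lower().split())))
```

In practice the scoring function is a learned matching model over contextualized representations; the ranking-and-argmax structure is the same.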

To this end, user intents, i.e., a taxonomy of utterances, are introduced to guide the information seeking process (Qu et al., 2018, 2019a; Yang et al., 2020). If the intent of the previous original question (OQ) is not satisfied by the potential answer (PA) provided by a system, then the user's next intent is more likely to be an information request (IR). For example, if the user asks "Can you send me a website, so I can read more information?", the user's intent is IR. If the system does not consider the intent label IR, it may provide an answer that does not satisfy the user's request.

Intent-aware methods (see Figure 1(b)) adopt intents as an extra input to better understand users' information needs in an utterance (Yang et al., 2020). However, they require sufficient human-annotated intent labels for training, the construction of which is time-consuming and labor-intensive.

The teacher-student self-training framework has been widely used in many recent works, where the teacher generates pseudo-labels and the student makes predictions (Xie et al., 2020; Ghiasi et al., 2021; Li et al., 2021; Karamanolakis et al., 2021).
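The teacher-student loop described above can be sketched in a few lines, assuming only an abstract `train_fn` that fits a model on (input, label) pairs; this is a generic self-training skeleton, not the specific models of the cited works:

```python
def self_train(train_fn, labeled, unlabeled, rounds=2):
    """Generic teacher-student self-training.

    The teacher pseudo-labels the unlabeled data; the student is retrained
    on labeled + pseudo-labeled data and becomes the next round's teacher.
    """
    teacher = train_fn(labeled)
    for _ in range(rounds):
        pseudo = [(x, teacher(x)) for x in unlabeled]
        teacher = train_fn(labeled + pseudo)  # student replaces teacher
    return teacher
```

The works cited differ mainly in how the pseudo-labels are filtered or weighted before retraining, which is exactly the step ICAST targets.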

The core procedure is as follows. First, we train a teacher model on the labeled data and predict pseudo intent labels for the unlabeled data. Second, we select high-quality intent labels by estimating intent confidence gain, and add the selected intents to the input of the answer selection model; the intent confidence gain measures how much information a candidate intent label brings to the model. Third, we re-train a student model on both the labeled and pseudo-labeled data. Intuitively, ICAST synthesizes pseudo intent and answer labels and integrates them into teacher-student self-training, assuring the quality of synthetic answer labels through high-quality intents.
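The second step, selecting intents by confidence gain, can be sketched as follows. Here the gain is approximated as the answer selection model's score with the intent appended minus its score without it; `score_fn`, the threshold, and the data layout are illustrative assumptions, not the paper's exact formulation:

```python
def intent_confidence_gain(score_fn, context, answer, intent):
    """Gain = model confidence with the intent appended minus without it
    (a simplified proxy for the paper's intent confidence gain)."""
    return score_fn(context + [intent], answer) - score_fn(context, answer)

def calibrate_intents(score_fn, examples, candidate_intents, threshold=0.0):
    """For each (context, answer) pair, keep the pseudo intent whose confidence
    gain is highest and above the threshold; otherwise keep no intent."""
    calibrated = []
    for context, answer in examples:
        gains = {i: intent_confidence_gain(score_fn, context, answer, i)
                 for i in candidate_intents}
        best = max(gains, key=gains.get)
        calibrated.append((context, answer, best if gains[best] > threshold else None))
    return calibrated
```

Only examples whose selected intent survives this filter contribute an intent feature when the student is re-trained, which is how low-quality pseudo intents are kept out of the answer selection input.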