Clarifying the Path to User Satisfaction: An Investigation into Clarification Usefulness

Paper · arXiv 2402.01934 · Published February 2, 2024

Several models proposed in the ConvAI3 challenge (Aliannejadi et al., 2020), most of them built on pre-trained language models, aim to incorporate clarifying questions (CQs) into the ranking process. Complementing this focus, some research integrates ranking and clarification features within learning objectives (Hashemi et al., 2020), while other work explores the inherent risks of asking CQs by estimating the prospective retrieval gains (Wang and Ai, 2021). In the information retrieval (IR) community, there is a long-standing observation that better system performance in terms of relevance does not necessarily translate into a better user experience or greater usefulness (Mao et al., 2016). This has motivated a distinct line of research on understanding how users experience CQs (Kiesel et al., 2018; Zou et al., 2023a,b; Siro et al., 2022; Zamani et al., 2020c; Tavakoli et al., 2022).

Note that in this study we define “useful clarifying questions” as those that lead to higher user satisfaction. Specifically, we argue that users’ overall satisfaction depends on various facets of a triad: the query, its CQs, and the corresponding candidate answers. This perspective is motivated by a recent study (Siro et al., 2022) on user satisfaction in task-oriented dialogues, which emphasizes the importance of utterance relevance and efficiency. Existing research, such as Tavakoli et al. (2022) and Zamani et al. (2020b), models user interaction and engagement with clarification panes, but these studies primarily offer observational insights, along with publicly available datasets such as MIMICS and MIMICS-Duo. In contrast, we focus on predicting the practical value of CQs (their usefulness and the resulting user satisfaction), based

We conduct a comprehensive evaluation along multiple dimensions: the template structure of CQs, the number of available candidate answers, the subjectivity and sentiment polarity of CQs, the lengths of both CQs and queries, query ambiguity, and the predicted relevance between CQs and queries. To complement this evaluation of CQ usefulness, we further conduct a user study on additional features, such as question naturalness.
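As a rough illustration of how the simpler dimensions above could be computed, the following Python sketch extracts length and count features from a query, CQ, and candidate-answer triple. The `ClarificationInstance` schema and the `surface_features` function are hypothetical names, and whitespace tokenization is an assumption rather than the paper's stated procedure. Subjectivity, sentiment polarity, ambiguity, and predicted relevance would each need a separate model and are not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ClarificationInstance:
    """Hypothetical container for one query / clarifying question / candidate answers triple."""
    query: str
    question: str
    candidate_answers: List[str] = field(default_factory=list)


def surface_features(inst: ClarificationInstance) -> Dict[str, int]:
    """Length and count features; tokens are whitespace-separated words (an assumption)."""
    return {
        "query_length": len(inst.query.split()),
        "question_length": len(inst.question.split()),
        "num_candidate_answers": len(inst.candidate_answers),
    }
```

For the query “monitor” with the CQ “What would you like to know about monitor?” and three candidate answers, this returns a query length of 1, a question length of 8, and 3 candidate answers.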

Indeed, for the example query “monitor”, both “(Which/What) [monitor] are you looking for?” and “What (would you like | do you want) to know about [monitor]?” can be used. Essentially, diverse formats or templates can be used to reveal the true intent behind a user’s query.
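The two templates quoted above can be matched with regular expressions, which is one simple way to assign a CQ to a template and recover the filled query slot. This is a minimal sketch: the template labels `looking_for` and `know_about` and the function `match_template` are hypothetical, and a realistic template inventory would contain many more patterns.

```python
import re
from typing import Optional, Tuple

# The two templates quoted in the text, as case-insensitive regular expressions.
# The dictionary keys are hypothetical labels, not names used by the paper.
TEMPLATES = {
    "looking_for": re.compile(
        r"^(which|what) (?P<query>.+?) are you looking for\??$", re.IGNORECASE
    ),
    "know_about": re.compile(
        r"^what (would you like|do you want) to know about (?P<query>.+?)\?$", re.IGNORECASE
    ),
}


def match_template(question: str) -> Optional[Tuple[str, str]]:
    """Return (template_label, filled_query) for the first matching template, else None."""
    text = question.strip()
    for label, pattern in TEMPLATES.items():
        m = pattern.match(text)
        if m:
            return label, m.group("query")
    return None
```

Here `match_template("Which monitor are you looking for?")` returns `("looking_for", "monitor")`, while a question outside this small inventory, such as “Who are you shopping for?”, returns `None`.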

Based on the table, question templates that seek detailed information consistently yield higher user satisfaction than those that simply ask users to restate their needs. For example, templates such as “What would you like to know about [QUERY]?” are found to be more useful than questions such as “What are you trying to do?” or “Who are you shopping for?”. A clarifying question that merely asks the user to rephrase their need can exhaust the user’s patience for continuing the search and lower user satisfaction. In contrast, clarifying questions that ask about specific facets of the user’s intent let the user enrich the initial query with additional information, improving the likelihood of retrieving relevant information. This finding aligns with observations in the literature that users are more satisfied with questions whose benefit of answering they can foresee (Zou et al., 2023a).

The research literature suggests that longer queries often make it harder to produce high-quality results (Zamani et al., 2020c; Aliannejadi et al., 2021a). One reason is that longer queries may contain more irrelevant or ambiguous information, making it harder to match the user’s intent with relevant results.

Interestingly, as query length increases, the rate of clarification usefulness declines noticeably. Overall, the results indicate that users are most satisfied when short queries are paired with long clarifying questions, suggesting that shorter queries are potentially more ambiguous, which creates room for the system to intervene. In addition, shorter queries leave more to explore, so well-chosen clarifying questions can further raise user satisfaction by helping users retrieve the target information.
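One way to examine the relation between query length and clarification usefulness is to bin queries by length and compute the fraction of CQs labelled useful in each bin. The sketch below assumes records of `(query, is_useful)` pairs with binary usefulness labels and whitespace tokenization; the bin edges and the function name `usefulness_rate_by_length` are hypothetical choices, not the paper's setup.

```python
from collections import defaultdict


def usefulness_rate_by_length(records, bin_edges=(1, 2, 3, 4)):
    """Return {bin_lower_bound: fraction of useful CQs} for (query, is_useful) records.

    Queries are binned by whitespace token count; each bin is keyed by the largest
    edge not exceeding the length, so the last edge acts as an open-ended "4+" bin.
    """
    counts = defaultdict(lambda: [0, 0])  # bin key -> [useful count, total count]
    for query, is_useful in records:
        n_tokens = len(query.split())
        key = max((edge for edge in bin_edges if edge <= n_tokens), default=None)
        if key is None:
            continue  # skip queries shorter than the smallest edge (e.g. empty strings)
        counts[key][0] += int(is_useful)
        counts[key][1] += 1
    return {key: useful / total for key, (useful, total) in sorted(counts.items())}
```

With the default edges, the last bin is open-ended, so a five-token query is counted under key 4.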

Ambiguous queries are those with multiple distinct interpretations, whereas facets address underspecified queries by covering their different aspects through subtopics.

Clarifying questions for faceted queries are found to be more useful than those for ambiguous queries. However, on MIMICS-Duo, although faceted queries have a higher usefulness rate, ambiguous queries also receive a notably high usefulness rate. This suggests that for ambiguous queries, one query intent is likely to dominate the user’s information needs, usually the most popular one (Provatorova et al., 2021).
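The text does not say which statistical test, if any, underlies the comparison between faceted and ambiguous queries. As an assumption, a pooled two-proportion z-test is one standard way to check whether two usefulness rates differ; the sketch below uses only the Python standard library, and the counts passed to the hypothetical `two_proportion_z_test` function would come from whatever labelled data is available.

```python
import math


def two_proportion_z_test(useful_a, total_a, useful_b, total_b):
    """Two-sided pooled z-test of H0: the usefulness rates of groups A and B are equal.

    Returns (z, p_value) using the normal approximation.
    """
    if total_a <= 0 or total_b <= 0:
        raise ValueError("group sizes must be positive")
    p_a = useful_a / total_a
    p_b = useful_b / total_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (useful_a + useful_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

The normal approximation is reasonable only when each group has enough useful and non-useful examples; for small counts, an exact test such as Fisher's is a safer choice.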