Query Understanding in the Age of Large Language Models

Paper · arXiv 2306.16004 · Published June 28, 2023
Tags: Question Answering · Search · Argumentation

The central problem of IR systems, often referred to as the “holy grail” of IR, is overcoming the vocabulary mismatch between the user and the system [75]. This leads to the challenge of matching the user’s information need with the relevant documents in the collection. We aim to bridge this gap by proposing an interactive query rewriting framework using LLMs (shown in Figure 1). We define research directions that address multiple challenges, such as minimizing query reformulations while allowing for low-cost user exploration. We believe that our vision of greatly improving interactive query understanding using LLMs has far-reaching implications for how users interact with search engines, how data is collected from user interactions, the quality of data and feedback, and its impact on the learning ecosystem.

In conventional search engines, the query understanding component comprises several subcomponents, including spell checkers, query classifiers, query expansion, and query suggestions [14, 76]. Presently, users experience limited transparency and minimal control during the query understanding process. In most cases, users interact only with query suggestions and never see the final rewritten query. This limits their ability to modify or reformulate the query to align with their own latent intent.
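To make the multi-component pipeline concrete, here is a minimal sketch of the stages described above. All function names and their toy behaviors are hypothetical illustrations, not the paper's implementation; a real engine would back each stage with learned models and large resources.

```python
# Hypothetical sketch of a conventional multi-stage query understanding
# pipeline: spell check -> intent classification -> query expansion.
# Each stage is a toy stand-in so the example runs standalone.

def spell_check(query: str) -> str:
    # Toy corrector: fix a single known misspelling.
    corrections = {"pyhton": "python"}
    return " ".join(corrections.get(tok, tok) for tok in query.split())

def classify(query: str) -> str:
    # Toy intent classifier: route by a keyword.
    return "navigational" if "homepage" in query else "informational"

def expand(query: str) -> str:
    # Toy expansion: append a related term for a known concept.
    synonyms = {"python": "programming"}
    extras = [synonyms[tok] for tok in query.split() if tok in synonyms]
    return query + (" " + " ".join(extras) if extras else "")

def rewrite(query: str) -> tuple[str, str]:
    # The final rewritten query produced here is never shown to the user,
    # which is exactly the transparency gap discussed above.
    corrected = spell_check(query)
    intent = classify(corrected)
    return expand(corrected), intent

final_query, intent = rewrite("pyhton tutorial")
print(final_query)  # -> "python tutorial programming"
print(intent)       # -> "informational"
```

Note how errors propagate: a miss in `spell_check` feeds a wrong token to every downstream stage, which is the failure mode the single-component design later in the paper aims to avoid.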

Moreover, current ranking systems offer only indirect and delayed feedback. Users primarily provide feedback by clicking on a ranked list of documents, but clicks are subject to biases such as position and trust bias [31, 74, 90]. Consequently, clicks are unreliable indicators of whether the user’s information need was met, often leading to query reformulations only after time-consuming document inspection.

This position paper introduces an LLM-based interactive query understanding framework that replaces traditional query rewriter components with a single LLM-based component. This approach minimizes the potential for error propagation arising from multiple components in the system. We investigate the significant implications of using such an LLM-based query rewriter for designing search systems and its impact on their performance.