Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home
While external information retrieved via RAG (Lewis et al., 2020b) can help to fill these gaps, it also raises the possibility of retrieving irrelevant passages, leading to error accumulation.
These methods rely on LLM self-knowledge, i.e., the model's capacity to recognize the limits of its own knowledge (Yin et al., 2023), to determine when it lacks critical information.
models. Furthermore, recent studies of complex pipelines neither assess self-knowledge abilities nor compare against well-established uncertainty estimation methods such as Mean Entropy (Fomicheva et al., 2020).
Retrieval-Augmented Generation methods are widely used to improve LLM performance on many tasks, such as questions requiring up-to-date information (Jiang et al., 2024) or questions about rare entities, where the LLM generates poorly due to a lack of internal knowledge (Allen-Zhu and Li, 2024). In the simplest case, the question itself is used as a query to a database or search engine. The retrieved information is then incorporated as additional context, which has proven effective for a variety of tasks (Khandelwal et al., 2020; Lewis et al., 2020a) and models (Borgeaud et al., 2022; Ram et al., 2023; Socher et al., 2013). All these methods perform retrieval once before generation, so they are often grouped under the name single-round retrieval augmentation.
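The single-round scheme described above can be sketched as follows; the `retrieve` and `generate` functions and the toy corpus are hypothetical stand-ins for a real search backend and LLM, and only the control flow (query once, prepend context, generate once) reflects the text.

```python
# Minimal sketch of single-round retrieval augmentation.
# `retrieve` and `generate` are illustrative placeholders, not a real API.

def retrieve(query: str, k: int = 3) -> list:
    # Hypothetical retriever: return up to k passages matching the query.
    corpus = {
        "capital of France": "Paris is the capital of France.",
        "capital of Japan": "Tokyo is the capital of Japan.",
    }
    return [text for key, text in corpus.items() if key in query][:k]

def generate(prompt: str) -> str:
    # Hypothetical LLM call; a real system would query a model here.
    return "Answer based on: " + prompt

def single_round_rag(question: str) -> str:
    # Simplest case: the question itself is the retrieval query,
    # and retrieved passages are prepended as additional context.
    passages = retrieve(question)
    context = "\n".join(passages)
    prompt = "Context:\n" + context + "\n\nQuestion: " + question + "\nAnswer:"
    return generate(prompt)

print(single_round_rag("What is the capital of France?"))
```

Note that retrieval happens exactly once, before generation, regardless of whether the model actually needed the extra context; this is the behavior that adaptive retrieval later tries to avoid.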
Uncertainty Estimation (UE) measures the confidence of LLM predictions and can be classified into white-box and black-box methods. White-box methods require access to internal model details, such as logits or layer outputs, and are divided into information-based (using token or sequence probabilities from a single model), ensemble-based (leveraging probabilities from different model versions), and density-based (constructing a probability density from latent representations). Black-box methods, in contrast, require access only to the model's output.
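As a concrete instance of an information-based white-box method, Mean Entropy (cf. Fomicheva et al., 2020) averages the entropy of the model's per-token distributions over the generated sequence. The sketch below uses hand-written probability vectors for illustration; in a real system they would come from a softmax over the model's logits at each generation step.

```python
import math

def token_entropy(probs):
    # Shannon entropy (in nats) of one next-token distribution.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_entropy(step_probs):
    # Average entropy across generation steps; higher = more uncertain.
    return sum(token_entropy(p) for p in step_probs) / len(step_probs)

# A confident generation (peaked distributions) scores lower than an
# uncertain one (flat distributions). Values here are illustrative.
confident = [[0.97, 0.01, 0.01, 0.01], [0.99, 0.005, 0.003, 0.002]]
uncertain = [[0.25, 0.25, 0.25, 0.25], [0.40, 0.30, 0.20, 0.10]]
assert mean_entropy(confident) < mean_entropy(uncertain)
```

Because it only needs per-token probabilities from a single forward pass, this score adds essentially no compute on top of generation itself, which is what makes UE-based adaptive retrieval cheap.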
The results in Table 1 show that uncertainty estimation methods outperform baseline methods on single-hop datasets and perform comparably on multi-hop datasets, while being significantly more compute-efficient, often several times cheaper.
While baseline methods may achieve slightly better performance on some datasets, they require multiple calls to both the language model and the retriever, leading to higher computational costs. In contrast, uncertainty estimation methods consistently require, on average, fewer than one retriever call and no more than two LM calls per question, significantly reducing inference costs.
Uncertainty estimation for adaptive retrieval consistently outperforms constant retrieval in terms of performance and efficiency.
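The adaptive scheme evaluated here can be sketched as a simple threshold rule: answer first, and retrieve only when the uncertainty score exceeds a threshold. All names (`adaptive_answer`, the threshold value, the toy components) are illustrative, not from the paper; the sketch only shows why the cost stays at most one retriever call and two LM calls per question.

```python
# Sketch of uncertainty-based adaptive retrieval, assuming some
# per-answer uncertainty score (e.g. Mean Entropy) is available.

def adaptive_answer(question, generate, retrieve, uncertainty, threshold=0.5):
    """Retrieve only when the model is uncertain about its own answer."""
    draft = generate(question)                      # first (maybe only) LM call
    if uncertainty(question, draft) <= threshold:   # confident: skip retrieval
        return draft, {"lm_calls": 1, "retriever_calls": 0}
    context = retrieve(question)                    # uncertain: fall back to RAG
    final = generate("Context: " + context + "\nQuestion: " + question)
    return final, {"lm_calls": 2, "retriever_calls": 1}

# Toy components to exercise the control flow.
gen = lambda prompt: "answer(" + prompt + ")"
ret = lambda q: "retrieved passage"
always_sure = lambda q, a: 0.1     # scorer that always reports confidence
always_unsure = lambda q, a: 0.9   # scorer that always reports uncertainty

_, cost = adaptive_answer("q1", gen, ret, always_sure)
assert cost == {"lm_calls": 1, "retriever_calls": 0}
_, cost = adaptive_answer("q2", gen, ret, always_unsure)
assert cost == {"lm_calls": 2, "retriever_calls": 1}
```

Whenever the uncertainty scorer judges the draft answer reliable, the retriever is never called at all, which is how the average cost drops below one retrieval per question while constant retrieval always pays for it.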