DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
In this paper, we propose DeepRAG, a framework that models retrieval-augmented reasoning as a Markov Decision Process (MDP), enabling strategic and adaptive retrieval. By iteratively decomposing queries, DeepRAG dynamically determines whether to retrieve external knowledge or rely on parametric reasoning at each step. Experiments show that DeepRAG improves retrieval efficiency while raising answer accuracy by 21.99%.
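The MDP view described above can be sketched as a simple loop: the state is the question plus the subqueries answered so far, and at each step the policy emits the next atomic subquery and decides between retrieval and parametric answering. The sketch below is illustrative only; the canned subqueries, the lookup-table "parametric knowledge", and the toy corpus are all hypothetical stand-ins for the LLM and retriever, not the paper's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    """MDP state: the original question plus (subquery, answer) steps so far."""
    question: str
    steps: list = field(default_factory=list)

def next_subquery(state: State):
    # Stand-in for an LLM decomposing the question into atomic subqueries;
    # here it simply emits one canned subquery per remaining step.
    canned = ["Who directed Inception?", "When was that director born?"]
    return canned[len(state.steps)] if len(state.steps) < len(canned) else None

def should_retrieve(subquery: str, parametric: dict) -> bool:
    # Toy binary action: retrieve only when "parametric knowledge"
    # (modeled as a lookup table) cannot answer the subquery.
    return subquery not in parametric

def retrieve(subquery: str) -> str:
    # Stand-in for an external knowledge base or search engine.
    corpus = {"When was that director born?": "Christopher Nolan was born in 1970."}
    return corpus.get(subquery, "")

def run_episode(question: str, parametric: dict) -> State:
    state = State(question)
    while (sq := next_subquery(state)) is not None:
        if should_retrieve(sq, parametric):
            answer = retrieve(sq)       # external knowledge
        else:
            answer = parametric[sq]     # parametric reasoning, no retrieval
        state.steps.append((sq, answer))
    return state

episode = run_episode(
    "When was the director of Inception born?",
    parametric={"Who directed Inception?": "Christopher Nolan"},
)
```

In this toy episode the first subquery is answered parametrically and only the second triggers retrieval, illustrating how unnecessary retrieval calls are avoided step by step.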
Retrieval-Augmented Generation (RAG) has been proposed as a promising paradigm for mitigating factual errors by integrating relevant information from knowledge bases or search engines, thereby improving the factuality of model responses (Zhao et al., 2024). However, combining reasoning with retrieval-augmented generation still presents several challenges. One major issue is that complex queries often require multi-step decomposition to establish a coherent reasoning process (Radhakrishnan et al., 2023). Iterative retrieval has been proposed as a solution that continuously updates retrieval results to address the dynamic information needs arising during generation (Yue et al., 2024). However, LLMs often struggle to generate atomic and precise subqueries, which are critical for effective retrieval (Wu et al., 2024). Ideally, iterative retrieval should generate the next atomic query adaptively, based on the current question and the information gathered so far. Moreover, retrieval is not always necessary: some queries require external knowledge, while others can be answered by reasoning within the LLM alone. Indeed, LLMs have demonstrated the capability to serve as knowledge bases themselves (Petroni et al., 2019). Unnecessary retrieval is not merely redundant; it can introduce noise, degrade generation quality, and increase inference latency.