RAG-R1: Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism
Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, yet they remain prone to generating hallucinated or outdated responses due to their static internal knowledge. Recent Retrieval-Augmented Generation (RAG) methods aim to enhance models’ search and reasoning capabilities through reinforcement learning (RL). Although these methods show promising results, they face challenges in training stability and suffer from substantial inference time and restricted capabilities owing to their reliance on single-query mode. In this paper, we propose RAG-R1, a novel training framework designed to enable LLMs to adaptively leverage internal and external knowledge during the reasoning process. We further expand the generation and retrieval processes within the framework from single-query mode to multi-query parallelism, aiming to reduce inference time and enhance the model’s capabilities. Extensive experiments on seven question-answering benchmarks demonstrate that our method outperforms the strongest baseline by up to 13.2% while decreasing inference time by 11.1%.
Furthermore, existing methods generate only a single search query whenever external retrieval is required, which presents two significant challenges: (1) Substantial Retrieval Iterations and Inference Time: In single-query mode, the model generally requires multi-turn interleaved reasoning and search, particularly for multi-hop reasoning problems. This increases inference time and hinders applicability in real-world scenarios. (2) Restricted External Knowledge: In single-query mode, the limited knowledge acquired per retrieval restricts how thoroughly the model can explore its reasoning capability during training, thereby impacting its overall performance. We conducted a straightforward experiment based on Qwen2.5-72B-Instruct (et al., 2024a), following Jin et al. (2025), to validate these challenges: we evaluated the model’s performance and average retrieval count when generating a single query versus multiple queries in situations requiring retrieval. As illustrated in Figure 1, the multi-query method improves both performance and inference efficiency over the single-query method, highlighting the limitations of single-query mode.
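To make the contrast concrete, the following toy sketch counts retrieval rounds for a two-hop question under each mode. The corpus, the `retrieve` function, and the pre-decomposed sub-queries are hypothetical stand-ins for illustration only, not the paper's implementation:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy corpus standing in for an external retriever (hypothetical data).
CORPUS = {
    "director of Inception": "Inception was directed by Christopher Nolan.",
    "birth year of Christopher Nolan": "Christopher Nolan was born in 1970.",
}

def retrieve(query: str) -> str:
    """One retrieval call against the external corpus."""
    return CORPUS.get(query, "no result")

def single_query_mode(sub_queries):
    """Interleaved reason-then-search: one query per round, so the number
    of rounds grows with the number of reasoning hops."""
    rounds, docs = 0, []
    for q in sub_queries:  # each hop must wait for the previous round
        docs.append(retrieve(q))
        rounds += 1
    return rounds, docs

def multi_query_mode(sub_queries):
    """All sub-queries issued together and retrieved in parallel,
    collapsing the hops into a single retrieval round."""
    with ThreadPoolExecutor() as pool:
        docs = list(pool.map(retrieve, sub_queries))
    return 1, docs

subs = ["director of Inception", "birth year of Christopher Nolan"]
print(single_query_mode(subs)[0])  # 2 rounds
print(multi_query_mode(subs)[0])   # 1 round
```

Both modes recover the same documents; the multi-query mode simply amortizes the per-round latency across queries, which is the efficiency gap Figure 1 measures.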
To address these challenges, we propose RAG-R1, a novel training framework that enables LLMs to adaptively leverage internal and external knowledge during the reasoning process and strengthens their ability to reason toward correct answers. We further expand the generation and retrieval processes within the framework from single-query mode to multi-query parallelism to reduce inference time and enhance the model’s capabilities. Specifically, the framework contains two stages, i.e., Format Learning Supervised Fine-Tuning and Retrieval-Augmented Reinforcement Learning. In the first stage, we carefully construct samples that interleave reasoning and search, equipping LLMs to adaptively leverage internal and external knowledge during reasoning and to respond in a think-then-search format. In the second stage, we employ outcome-based RL with a retrieval environment to improve the model’s ability to reason and dynamically retrieve external knowledge to answer questions correctly. Building upon this framework, we expand the generation and retrieval processes from single-query mode to multi-query parallelism, which reduces retrieval rounds and inference time while supplying the model with more comprehensive and diverse information, ultimately boosting performance.
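A minimal sketch of such an interleaved rollout loop follows. The XML-style `<think>`, `<search>`, `<information>`, and `<answer>` tags, the `;`-separated multi-query convention, and the `generate`/`retrieve` callables are illustrative assumptions about the think-then-search format, not the paper's exact interface:

```python
import re

def rollout(generate, retrieve, question, max_turns=4):
    """Interleave generation and retrieval until the model emits an answer.

    `generate(context) -> str` continues the trajectory from the context so
    far; `retrieve(query) -> str` queries the external corpus. In multi-query
    mode the model may place several ';'-separated queries inside a single
    <search> block, all answered within the same retrieval round.
    """
    context = question
    for _ in range(max_turns):
        step = generate(context)  # one think-then-search segment
        context += step
        answer = re.search(r"<answer>(.*?)</answer>", step, re.S)
        if answer:
            return answer.group(1).strip()
        search = re.search(r"<search>(.*?)</search>", step, re.S)
        if search:
            queries = [q.strip() for q in search.group(1).split(";")]
            # All queries belong to one round (retrieved sequentially here
            # for simplicity; a real system would issue them in parallel).
            docs = [retrieve(q) for q in queries]
            context += "<information>" + "\n".join(docs) + "</information>"
    return None  # no answer within the turn budget
```

Under this framing, the SFT stage teaches the policy to emit well-formed segments for this loop, and the RL stage optimizes an outcome-based reward computed on the returned answer.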