ZeroSearch: Incentivize the Search Capability of LLMs without Searching

Paper · arXiv 2505.04588 · Published May 7, 2025
Tags: Reasoning · o1 · o3 · Search · Reinforcement Learning

Effective information searching is essential for enhancing the reasoning and generation capabilities of large language models (LLMs). Recent research has explored using reinforcement learning (RL) to improve LLMs’ search capabilities by interacting with live search engines in real-world environments. While these approaches show promising results, they face two major challenges: (1) Uncontrolled Document Quality: The quality of documents returned by search engines is often unpredictable, introducing noise and instability into the training process. (2) Prohibitively High API Costs: RL training requires frequent rollouts, potentially involving hundreds of thousands of search requests, which incur substantial API expenses and severely constrain scalability. To address these challenges, we introduce ZEROSEARCH, a reinforcement learning framework that incentivizes the search capabilities of LLMs without interacting with real search engines. Our approach begins with lightweight supervised fine-tuning to transform the LLM into a retrieval module capable of generating both relevant and noisy documents in response to a query. During RL training, we employ a curriculum-based rollout strategy that incrementally degrades the quality of generated documents, progressively eliciting the model’s reasoning ability by exposing it to increasingly challenging retrieval scenarios. Extensive experiments demonstrate that ZEROSEARCH effectively incentivizes the search capabilities of LLMs using a 3B LLM as the retrieval module. Remarkably, a 7B retrieval module achieves comparable performance to the real search engine, while a 14B retrieval module even surpasses it.

We propose ZEROSEARCH, a reinforcement learning framework that enables LLMs to learn search strategies without interacting with real search engines. Our key insight is that LLMs acquire extensive world knowledge during large-scale pretraining and are capable of generating relevant documents given a search query [43]. The primary difference between a real search engine and a simulation LLM lies in the textual style of the returned content; with lightweight supervised fine-tuning, even relatively small LLMs can effectively simulate the behavior of real search engines. Beyond eliminating API costs, an important advantage of using an LLM for document generation is the ability to control document quality. Specifically, during supervised fine-tuning, documents that lead to correct or incorrect answers are distinguished through prompt design, so the simulation LLM learns to generate either relevant or noisy documents simply by adjusting a few words in the prompt. Building on this, we introduce a curriculum rollout mechanism during training, in which the quality of the generated documents is gradually degraded to simulate increasingly difficult retrieval scenarios. This allows the policy model to first learn basic output formats and task requirements before progressively adapting to noisier, more challenging retrieval results.
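To make the prompt-based quality switch concrete, here is a minimal Python sketch. The template wording, the `simulate_search` helper, and the `generate` interface are hypothetical illustrations; this excerpt does not show the paper's actual prompt.

```python
from typing import Callable

# Hypothetical prompt template: changing a few words flips the simulation
# LLM between producing relevant and noisy documents. The wording used in
# the paper may differ.
SIMULATION_TEMPLATE = (
    "You are a search engine. For the query below, generate five {style} "
    "documents.\n"
    "Query: {query}\n"
    "Documents:"
)

def simulate_search(generate: Callable[[str], str],
                    query: str, noisy: bool) -> str:
    """Query the fine-tuned simulation LLM instead of a real search engine.

    `generate` is any text-completion callable (an assumed interface, e.g.
    a thin wrapper around the fine-tuned model's inference endpoint).
    """
    style = "noisy and misleading" if noisy else "relevant and useful"
    return generate(SIMULATION_TEMPLATE.format(style=style, query=query))
```

Because only the `{style}` words change, a single fine-tuned model can serve both high- and low-quality retrieval results, which is what makes the curriculum below cheap to implement.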
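The curriculum rollout can likewise be sketched as a noise schedule that rises over training. The exponential interpolation and the hyperparameters `p_start`, `p_end`, and `base` below are illustrative assumptions; the only property carried over from the text is that the probability of serving noisy documents increases monotonically with the training step.

```python
import random
from typing import Callable

def noise_probability(step: int, total_steps: int,
                      p_start: float = 0.0, p_end: float = 0.5,
                      base: float = 4.0) -> float:
    """One plausible curriculum: exponentially interpolate the chance of
    serving a noisy document from p_start to p_end over training.
    All three hyperparameters are assumptions for illustration."""
    frac = step / max(total_steps, 1)
    return p_start + (base ** frac - 1.0) / (base - 1.0) * (p_end - p_start)

def curriculum_rollout(simulate: Callable[[str, bool], str],
                       query: str, step: int, total_steps: int) -> str:
    # Early steps mostly return clean documents, letting the policy learn
    # output formats first; later steps expose it to noisier retrieval.
    noisy = random.random() < noise_probability(step, total_steps)
    return simulate(query, noisy)
```

Here `simulate` is any query-to-documents function with a noise flag, for instance `lambda q, n: simulate_search(my_generate, q, n)` from the sketch above.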