Search-o1: Agentic Search-Enhanced Large Reasoning Models

Paper · arXiv 2501.05366 · Published January 9, 2025
Tags: Reasoning · o1 · o3 · Search · Domain Specialization

Large reasoning models (LRMs) like OpenAI-o1 have demonstrated impressive long stepwise reasoning capabilities through large-scale reinforcement learning. However, their extended reasoning processes often suffer from knowledge insufficiency, leading to frequent uncertainties and potential errors. To address this limitation, we introduce Search-o1, a framework that enhances LRMs with an agentic retrieval-augmented generation (RAG) mechanism and a Reason-in-Documents module for refining retrieved documents. Search-o1 integrates an agentic search workflow into the reasoning process, enabling dynamic retrieval of external knowledge when LRMs encounter uncertain knowledge points. Additionally, due to the verbose nature of retrieved documents, we design a separate Reason-in-Documents module to deeply analyze the retrieved information before injecting it into the reasoning chain, minimizing noise and preserving coherent reasoning flow.

This advancement has inspired a series of foundational efforts aimed at exploring and reproducing o1-like reasoning patterns, to broaden their application to a wider range of foundational models [49, 19, 77, 80, 71, 25, 45].

It is noteworthy that o1-like reasoning patterns guide LRMs to engage in a slower thinking process [6, 61] by implicitly breaking down complex problems, generating a long internal reasoning chain, and then discovering suitable solutions step by step. While this characteristic enhances the logical coherence and interpretability of reasoning, an extended chain of thought may cause overthinking [4] and increased risks of knowledge insufficiency [60, 51, 2], where any knowledge gap can propagate errors and disrupt the entire reasoning chain [79, 40, 44, 41].

To address this limitation, we conduct preliminary experiments to assess the frequency of uncertain words decoded by LRMs due to knowledge gaps. As shown in Figure 1, the extended thinking process leads LRMs to frequently decode uncertain terms on challenging reasoning problems, with “perhaps” averaging over 30 occurrences per reasoning process. Notably, the high specialization of these problems also complicates manual verification of the reasoning, often incurring significant costs [63]. Consequently, automatically supplementing the knowledge required for the o1-like reasoning process has become a significant challenge, limiting the progress of LRMs toward universally trustworthy reasoning.
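The frequency analysis described above can be reproduced with a simple word counter over a model's reasoning traces; the hedging-term list below is illustrative, not the paper's exact vocabulary.

```python
import re
from collections import Counter

# Hedging terms whose frequency serves as a rough proxy for knowledge
# uncertainty in a reasoning trace (word list is illustrative only).
UNCERTAIN_TERMS = {"perhaps", "maybe", "possibly", "alternatively"}

def count_uncertain_terms(trace: str) -> Counter:
    """Count case-insensitive whole-word occurrences of each hedging term."""
    words = re.findall(r"[a-z']+", trace.lower())
    return Counter(w for w in words if w in UNCERTAIN_TERMS)

trace = ("Perhaps the salt is NaCl. Maybe the anion is chloride; "
         "alternatively, perhaps it is bromide.")
print(count_uncertain_terms(trace))
```

Averaging such counts over many traces per benchmark yields per-term statistics like those in Figure 1.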

We propose Search-o1, which integrates the reasoning process of LRMs with two core components: an agentic retrieval-augmented generation (RAG) mechanism and a knowledge refinement module. This design aims to enable LRMs to incorporate the agentic search workflow into the reasoning process, retrieving external knowledge on demand to support step-wise reasoning while preserving coherence throughout.
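The interleaving of reasoning and retrieval can be sketched as the loop below, assuming the model emits a search query between special tokens whenever it detects a knowledge shortage. All callables (`generate_until`, `web_search`, `reason_in_documents`) and the token strings are hypothetical stand-ins, not the paper's actual API.

```python
# Minimal sketch of an agentic retrieval loop in the spirit of Search-o1.
SEARCH_OPEN, SEARCH_CLOSE = "<|begin_search|>", "<|end_search|>"

def search_o1(question, generate_until, web_search, reason_in_documents,
              max_searches=5):
    """Interleave step-wise generation with on-demand retrieval.

    generate_until(chain) -> (new_text, stopped_on_search_token)
    web_search(query)     -> list of raw documents
    reason_in_documents(question, query, docs, chain) -> refined knowledge
    """
    chain = question
    for _ in range(max_searches):
        text, wants_search = generate_until(chain)
        chain += text
        if not wants_search:
            return chain  # reasoning finished without needing retrieval
        # Extract the search query the model decoded between the tokens.
        query = chain.rsplit(SEARCH_OPEN, 1)[-1].split(SEARCH_CLOSE, 1)[0].strip()
        docs = web_search(query)
        # Refine verbose documents before injecting them into the chain,
        # preserving the coherence of the ongoing reasoning.
        refined = reason_in_documents(question, query, docs, chain)
        chain += f"\n[retrieved knowledge] {refined}\n"
    return chain
```

Because the loop re-enters generation after every injection, retrieval can be triggered multiple times within one reasoning session, once per knowledge gap.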

Specifically, our results in Figure 1 reveal that traditional problem-oriented RAG techniques do not effectively address the knowledge gaps compared to direct reasoning (Standard RAG vs. Direct Reasoning). This finding aligns with human intuition: standard RAG retrieves relevant knowledge only once, in a problem-oriented manner, whereas the knowledge required at each step of complex reasoning is often varied and diverse [83, 41, 11]. In contrast, Search-o1 employs an agentic RAG technique that guides the model to actively decode search queries when facing knowledge shortages, thereby triggering the retrieval mechanism to obtain relevant external knowledge. Owing to this design, the retrieval mechanism can be triggered and iterated multiple times within a single reasoning session to fulfill the knowledge needs of different reasoning steps.

(1) Redundant Information in Retrieved Documents. Retrieved documents are often lengthy and contain redundant information; feeding them directly into LRMs may disrupt the coherence of the original reasoning and even introduce noise [62, 72, 26]. (2) Limited Ability to Understand Long Documents. Most LRMs have been specifically aligned for complex reasoning tasks during the pre-training and fine-tuning stages. This focus has resulted in a degree of catastrophic forgetting of their general capabilities [39, 10], ultimately limiting their long-context understanding of retrieved documents.
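These two challenges motivate refining documents before injection. The paper's Reason-in-Documents module uses the LRM itself for this analysis; the sketch below substitutes a simple keyword-overlap heuristic purely to illustrate the interface, keeping only the sentences most relevant to the current search query.

```python
import re

# Illustrative stand-in for a Reason-in-Documents-style refinement step:
# rank sentences by lexical overlap with the search query and keep the
# top few, rather than injecting verbose documents wholesale. A real
# module would reason over the documents with the LRM instead.
def refine_documents(query: str, docs: list[str], top_k: int = 2) -> str:
    query_terms = set(re.findall(r"\w+", query.lower()))
    sentences = [s.strip()
                 for d in docs
                 for s in re.split(r"(?<=[.!?])\s+", d)
                 if s.strip()]
    def score(sent: str) -> int:
        return len(query_terms & set(re.findall(r"\w+", sent.lower())))
    ranked = sorted(sentences, key=score, reverse=True)
    return " ".join(ranked[:top_k])
```

Whatever the refinement method, its output is a short, query-focused passage that can be spliced into the reasoning chain without drowning it in noise.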

• We propose Search-o1, the first framework that integrates the agentic search workflow into the o1-like reasoning process of LRMs to achieve autonomous knowledge supplementation.

• To effectively integrate external knowledge during reasoning, Search-o1 combines the reasoning process with an agentic RAG mechanism and a knowledge refinement module. This design enables the LRM to retrieve external knowledge on demand, seamlessly incorporating it into the reasoning chain while maintaining the original logical flow.