From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents

Paper · arXiv 2506.18959 · Published June 23, 2025

Traditional keyword-based search engines are increasingly inadequate for handling complex, multi-step information needs. Our position is that Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm termed Agentic Deep Research. These systems transcend conventional information search techniques by tightly integrating autonomous reasoning, iterative retrieval, and information synthesis into a dynamic feedback loop. We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn. We also introduce a test-time scaling law to formalize the impact of computational depth on reasoning and search. Supported by benchmark results and the rise of open-source implementations, we demonstrate that Agentic Deep Research not only significantly outperforms existing approaches, but is also poised to become the dominant paradigm for future information seeking.

Equipped with test-time scaling (TTS) on both reasoning and search, LLMs are set to drive a new search paradigm termed Agentic Deep Research: systems capable of autonomous reasoning, on-demand searching, and iterative information synthesis. Demonstrations from deep research products launched by OpenAI and Google highlight several key advantages of this paradigm: (1) Comprehensive Understanding: the ability to dissect and address complex, multifaceted queries that overwhelm traditional methods (Wei et al., 2022); (2) Enhanced Synthesis: excellence at synthesizing information from diverse, potentially conflicting sources into coherent and insightful narratives (Cheng et al., 2025); (3) Reduced User Burden: a significant reduction in the cognitive load and manual effort required from users, achieved by automating laborious research steps (Sami et al., 2024).

We introduce the test-time scaling (TTS) law for Deep Research, a novel hypothesis formalizing the relationship between allocated inference-time computational resources and the resulting improvements in reasoning and searching tasks;
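One plausible way to express such a law, offered here only as an illustrative sketch (the functional form and every symbol below are our assumptions, not the paper's definitions), is a saturating curve in which task performance improves with the inference-time compute budget \(C\) but approaches a ceiling:

```latex
% Illustrative form only; symbols are assumptions, not the paper's notation.
% Perf(C): task performance at inference-time compute budget C
% P_0: baseline performance at minimal compute; P_max: performance ceiling
% lambda: efficiency with which extra compute converts into performance
\[
  \mathrm{Perf}(C) \;\approx\; P_{\max} - \left(P_{\max} - P_0\right) e^{-\lambda C},
  \qquad \lambda > 0.
\]
```

Under such a form, additional reasoning and search steps yield diminishing but monotone returns, which is consistent with the qualitative claim that deeper computation improves both reasoning and retrieval quality.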

By tightly integrating search and reasoning in a multi-step, interactive manner, these systems can progressively enhance the relevance and depth of retrieved knowledge while simultaneously refining the reasoning process underlying query interpretation, ultimately producing more accurate and contextually nuanced responses.

Here, reasoning actively influences search (e.g., refining search queries based on intermediate deductions), while retrieved information recursively refines reasoning in a dynamic feedback loop. Unlike the LLM-with-RAG framework of Section 2.3, where retrieval and reasoning occur in discrete, sequential stages, this approach treats them as interdependent and continuously co-evolving.
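The feedback loop described above can be sketched in Python. The `search` and `reason` functions and the tiny in-memory corpus are hypothetical stand-ins for a retrieval backend and an LLM, not any real API:

```python
# Hypothetical sketch of the reasoning-search feedback loop.
# `search`, `reason`, and the corpus are illustrative stand-ins.

def search(query):
    """Stand-in retrieval step: returns a list of text snippets."""
    corpus = {
        "agentic deep research": ["Agents interleave search and reasoning iteratively."],
    }
    return corpus.get(query, [])

def reason(question, evidence):
    """Stand-in reasoning step: answer if evidence suffices, else refine the query."""
    if evidence:
        return {"done": True, "answer": evidence[-1]}
    return {"done": False, "next_query": "agentic deep research"}

def deep_research(question, max_steps=4):
    evidence, query = [], question
    for _ in range(max_steps):             # dynamic feedback loop
        evidence += search(query)          # retrieval shaped by reasoning
        step = reason(question, evidence)  # reasoning shaped by retrieval
        if step["done"]:
            return step["answer"]
        query = step["next_query"]         # refine the search query
    return "No conclusive answer found."

print(deep_research("what is agentic deep research?"))
```

The point of the loop is that retrieval and reasoning are not sequential stages: each iteration's deductions rewrite the next query, in contrast to a one-shot RAG pipeline.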

This evolution in search methodologies gives rise to a transformative paradigm we define as Agentic Deep Research. In this paradigm, language models take on the role of active information-seeking agents. Rather than following a one-shot prompt-and-retrieve pattern, an "agentic" LLM plans a series of steps: it can issue search queries, consult documents, browse the web, or even collaborate with other agents, all while refining its query understanding and response through iterative retrieval and reasoning. Inspired by the way human experts research a question, we encapsulate this iterative synergy between reasoning and search in the term Deep Research, highlighting its dynamic and interactive essence. To substantiate our central position that LLM-driven Agentic Deep Research will inevitably become the predominant paradigm for future information seeking, we ground our argument in three interlinked technical dimensions: reasoning capabilities as the foundation, principled approaches to incentivizing search, and ecosystem-level momentum evidenced by benchmarks and implementations.

The evolution of reasoning capabilities in large language models represents a crucial stepping stone toward truly agentic systems, particularly in the context of deep research tasks. While Chain-of-Thought (CoT) prompting (Wei et al., 2022) initially demonstrated the possibility of explicit reasoning processes, the real breakthrough lies in how reasoning mechanisms enable autonomous decision-making and strategic planning, essential for conducting deep research. The transformation from simple CoT to more sophisticated reasoning frameworks marks a fundamental shift in how AI systems approach complex tasks. Rather than merely following predetermined patterns, modern reasoning frameworks enable systems to dynamically plan, execute, and adjust their approach based on intermediate outcomes. This capability is particularly evident in recent reinforcement learning-based optimization approaches (Jaech et al., 2024; Guo et al., 2025), which have demonstrated unprecedented abilities in managing complex search tasks. These systems can autonomously determine when to initiate searches, formulate appropriate queries, and synthesize findings into coherent understanding, forming the cornerstone of agentic behavior. The DeepSeek-R1 (Guo et al., 2025)