Deep Researcher with Test-Time Diffusion

Paper · arXiv 2507.16075 · Published July 21, 2025
Diffusion LLM · Deep Research · Evolution · Novel Architectures

Deep research agents, powered by Large Language Models (LLMs), are rapidly advancing; yet, their performance often plateaus when generating complex, long-form research reports using generic test-time scaling algorithms. Drawing inspiration from the iterative nature of human research, which involves cycles of searching, reasoning, and revision, we propose the Test-Time Diffusion Deep Researcher (TTD-DR). This novel framework conceptualizes research report generation as a diffusion process. TTD-DR initiates this process with a preliminary draft, an updatable skeleton that serves as an evolving foundation to guide the research direction. The draft is then iteratively refined through a "denoising" process, which is dynamically informed by a retrieval mechanism that incorporates external information at each step. The core process is further enhanced by a self-evolutionary algorithm applied to each component of the agentic workflow, ensuring the generation of high-quality context for the diffusion process. This draft-centric design makes the report writing process more timely and coherent while reducing information loss during the iterative search process. We demonstrate that our TTD-DR achieves state-of-the-art results on a wide array of benchmarks that require intensive search and multi-hop reasoning, significantly outperforming existing deep research agents.

Existing DR agents primarily leverage test-time scaling approaches such as Chain-of-Thought (CoT) (Wei et al., 2022), best-of-n sampling (Ichihara et al., 2025), Monte Carlo Tree Search (Świechowski et al., 2022), debate mechanisms (Liang et al., 2023), and self-refinement loops (Madaan et al., 2023). Despite this impressive progress, most popular public DR agents (Alzubi et al., 2025; Researcher, 2025; Roucher et al., 2025) compile these test-time algorithms and various tools without a deliberate design driven by human cognitive behavior in writing, and commonly lack the principled draft, search, and feedback mechanism that empowers human researchers.

Previous cognitive studies indicate that when humans write about complex topics, they do not follow a linear progression from the first word to the last. As Fig. 1 (Chitwood, 2022) illustrates, people typically first establish a high-level plan, then draft the research report based on the plan, and subsequently engage in multiple revision cycles (Flower and Hayes, 1981). Crucially, during the revision phase, writers often seek out literature or search tools to gather supplementary information that refines and strengthens their arguments (Catalano, 2013).

We observe a striking resemblance between this human writing pattern and the sampling process in a diffusion model augmented by retrieval (Zhang et al., 2023). In this analogy, a trained diffusion model initially generates a noisy draft, and the denoising module, aided by retrieval tools, revises this draft into higher-quality (or higher-resolution) outputs. Inspired by this diffusion sampling paradigm (Shen et al., 2025; Yang et al., 2022), we propose Test-Time Diffusion (TTD) for deep research agents. Our framework meticulously models the entire research report generation as an iterative diffusion process, mirroring human cognitive patterns. As vanilla diffusion sampling can be ineffective for generating high quality outputs for complex research tasks, we specifically design our TTD Deep Researcher consisting of two mechanisms as illustrated by Fig. 2 and detailed below.

(a) Denoising with Retrieval (Zhang et al., 2023): An initial research report, drafted primarily from the LLM’s internal knowledge, undergoes iterative refinement. The denoised draft, along with the research plan (Stage 1), guides the downstream research direction. Each denoising step is augmented by targeted retrieval of external information (Stage 2), significantly enhancing accuracy and comprehensiveness.

(b) Self-Evolution (Lee et al., 2025; Novikov et al., 2025): Beyond the report-level diffusion through a draft, each individual component within the agentic workflow (e.g., plan, question, answer, and report generation) undergoes its own optimization process. This encourages the exploration of diverse knowledge, mitigates the information loss for each unit agent throughout the long agentic trajectories, and thus provides more conducive context for report diffusion. The intricate interplay and synergistic combination of these two algorithms are crucial for achieving high-quality research outcomes.
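The denoising-with-retrieval mechanism can be sketched as a simple loop over the draft. This is a minimal illustration, not the paper's actual implementation: `llm` and `search` are hypothetical callables standing in for the unit agents and the retrieval tool.

```python
def denoise_with_retrieval(query, llm, search, max_steps=3):
    """Test-time diffusion sketch: start from a noisy draft produced from
    internal knowledge, then iteratively 'denoise' it with retrieved evidence."""
    plan = llm(f"Write a research plan for: {query}")                  # Stage 1
    draft = llm(f"Draft a report from internal knowledge: {query}")    # noisy init
    for _ in range(max_steps):
        # Stage 2a: the draft and plan jointly guide the next search question
        question = llm(f"Given plan '{plan}' and draft '{draft}', "
                       f"ask the most useful search question")
        answer = search(question)                                      # Stage 2b
        # Denoising step: revise the draft with the new external evidence
        draft = llm(f"Revise draft '{draft}' using evidence: {answer}")
    return llm(f"Polish into the final report: {draft}")               # Stage 3
```

The key design point mirrored here is that the draft itself is threaded through every step, so each retrieval is targeted by the current state of the report rather than by the plan alone.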

The TTD-DR framework is designed to address the limitations of existing DR agents. As illustrated in Figure 3, many public agents like Huggingface Open DR (Roucher et al., 2025), GPT Researcher (Researcher, 2025), and Open Deep Research (Alzubi et al., 2025) employ a linear or parallelized process of planning, searching, and generation. This can lead to a loss of global context and miss critical dependencies during the research process. Our draft-centric, iterative approach maintains coherence and provides a dynamic guide for the research direction, mitigating information loss. Proprietary DRs from OpenAI (2025), Perplexity (2025), and Grok (2025) remain largely black boxes.

Stage 1: Research Plan Generation is a dedicated unit LLM agent which generates a structured research plan upon receiving a user query. This plan outlines a list of key areas needed for the final report, serving as an initial scaffold to guide the subsequent information-gathering process. Once a research plan is generated, it is saved in the agent state and then transferred to its sub-agents.

Stage 2: Iterative Search and Synthesis is a loop workflow nested in its parent sequential workflow. It consists of two sub-agents: Search Question Generation (Stage 2a) formulates a search query based on the research plan, the user query, and the context from previous search iterations (i.e., past questions and answers). Answer Searching (Stage 2b) searches the available sources (such as Google Search) to find relevant documents and returns a summarized answer. This loop (Stage 2a → Stage 2b) continues until the research plan is adequately covered or a maximum number of iterations is reached.
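The Stage 2 loop and its two termination conditions can be sketched as follows. This is a schematic under assumed interfaces: `ask_llm` (a hypothetical Stage 2a agent that returns `None` once it judges the plan covered) and `search_tool` (a hypothetical Stage 2b wrapper around a search backend).

```python
def search_loop(plan, user_query, ask_llm, search_tool, max_iters=5):
    """Stage 2 sketch: alternate question generation (2a) and answer
    searching (2b) until the plan is covered or the budget runs out."""
    qa_history = []
    for _ in range(max_iters):
        # Stage 2a: next question conditioned on plan, query, and past Q/A
        question = ask_llm(plan, user_query, qa_history)
        if question is None:  # agent signals the plan is adequately covered
            break
        answer = search_tool(question)  # Stage 2b: retrieve + summarize
        qa_history.append((question, answer))
    return qa_history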

Stage 3: Final Report Generation is a unit LLM agent in its parent sequential workflow (Stage 2 → Stage 3), which generates a comprehensive and coherent final report by synthesizing all the structured information gathered – the plan from Stage 1 and the series of question-answer pairs from Stage 2.
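Stage 3 is essentially a synthesis over the structured state accumulated by the earlier stages. A minimal sketch, assuming a hypothetical `llm` callable and treating the plan and QA pairs as plain strings:

```python
def final_report(plan, qa_pairs, llm):
    """Stage 3 sketch: synthesize the Stage 1 plan and the Stage 2
    question-answer series into a single coherent report prompt."""
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in qa_pairs)
    prompt = (f"Research plan:\n{plan}\n\n"
              f"Gathered evidence:\n{evidence}\n\n"
              f"Write a comprehensive, coherent final report.")
    return llm(prompt)
```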

We enhance the performance of each stage’s agents in order to find and preserve high-quality context. To accomplish this goal, we leverage a self-evolutionary algorithm to improve each stage’s agents. Figure 5 illustrates our proposed algorithm, inspired by recent self-evolution work (Lee et al., 2025; Novikov et al., 2025). Here we use search answer generation as an example, but this algorithm can be applied to all stage agents, such as plan generation, search question generation, and even final report generation, to improve their output quality. This algorithm is implemented in a parallel workflow with the following sequential and loop workflows.
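The overall shape of such a self-evolution step can be sketched generically. This is an illustrative skeleton only, with hypothetical `generate`, `critique`, `revise`, and `merge` callables standing in for the unit agents; the paper's concrete fitness and merging procedures are not reproduced here.

```python
def self_evolve(task, generate, critique, revise, merge,
                n_variants=3, n_rounds=2):
    """Self-evolution sketch: several candidate outputs evolve independently
    through critique-and-revise loops, then the survivors are merged."""
    # Parallel branches: diverse initial candidates for the same task
    variants = [generate(task) for _ in range(n_variants)]
    for _ in range(n_rounds):
        # Each variant receives feedback (a fitness signal) ...
        scored = [(critique(v), v) for v in variants]
        # ... and is revised in light of that feedback
        variants = [revise(v, feedback) for feedback, v in scored]
    # Final step: cross-over / merge the evolved variants into one output
    return merge(variants)
```

In a real agentic workflow the branches would run concurrently (the "parallel workflow" above), but the sequential version shows the same generate → critique → revise → merge structure.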