
What makes deep research fundamentally different from RAG?

Explores whether systems currently using the label 'deep research' actually meet a rigorous three-component definition (multi-step gathering, cross-source synthesis, and iterative refinement) or whether they are performing something narrower.

Note · 2026-02-21 · sourced from Deep Research

"Deep research" is used loosely to describe anything from a single web search to a multi-hour autonomous investigation. The Characterizing Deep Research paper proposes a formal three-component definition that makes the boundary precise:

  1. Multi-step information gathering — not one retrieval round but a sequence of them, where each round can expand or contract the search space
  2. Cross-source synthesis — combining findings from multiple independent sources, not just summarizing one document
  3. Iterative query refinement — using partial findings to improve subsequent queries, not issuing all queries upfront

The definition excludes single-step RAG (fails component 1), document summarization (fails component 3), and simple web browsing (which may fail component 2). Only systems that exercise all three components within a single loop qualify.
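The three components can be read as a control loop. The sketch below is a minimal illustration, not any paper's actual system: the helper names (`search`, `synthesize`, `refine`) and the toy corpus are hypothetical placeholders marking where each component lives.

```python
# Hypothetical sketch of the three-component deep-research loop.
# search / synthesize / refine are placeholders, not a real API.

def deep_research(question, search, synthesize, refine, max_rounds=5):
    """Interleave all three components until the refiner stops issuing queries."""
    queries, findings = [question], []
    for _ in range(max_rounds):                  # 1: multi-step gathering
        docs = [d for q in queries for d in search(q)]
        findings = synthesize(findings, docs)    # 2: cross-source synthesis
        queries = refine(question, findings)     # 3: refinement from partials
        if not queries:                          # refiner signals convergence
            break
    return findings

# Toy stand-ins to show the control flow only:
toy_corpus = {"A": ["fact1"], "fact1 detail": ["fact2"]}
search = lambda q: toy_corpus.get(q, [])
synthesize = lambda acc, docs: acc + [d for d in docs if d not in acc]
refine = lambda q, f: ["fact1 detail"] if f == ["fact1"] else []

print(deep_research("A", search, synthesize, refine))  # -> ['fact1', 'fact2']
```

The degenerate cases of the definition fall out directly: setting `max_rounds=1` collapses the loop to single-step RAG (fails component 1), and a `refine` that always returns the original question never uses partial findings (fails component 3).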

The practical value of the definition is benchmarking clarity. Without it, systems that perform single-step retrieval with sophisticated synthesis can claim "deep research" capability when they lack the iterative refinement component that actually distinguishes DR from RAG++. PRELUDE (the benchmark that accompanies the paper) evaluates all three components, making it possible to locate exactly where a system falls short.
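One way to picture that "locate exactly where a system falls short" property is a per-component report card. This shape is illustrative only; the field names are mine, not PRELUDE's actual metrics.

```python
from dataclasses import dataclass

# Illustrative report-card shape for the three-component definition.
# Field names are hypothetical, not PRELUDE's scoring schema.

@dataclass
class ComponentReport:
    multi_step_gathering: bool
    cross_source_synthesis: bool
    iterative_refinement: bool

    def is_deep_research(self):
        """Deep research requires all three components, not any subset."""
        return all((self.multi_step_gathering,
                    self.cross_source_synthesis,
                    self.iterative_refinement))

# Single-step retrieval with sophisticated synthesis ("RAG++"):
rag_plus = ComponentReport(False, True, False)
print(rag_plus.is_deep_research())  # False
```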

This also clarifies what the TTS law applies to: the question "Does search budget scale like reasoning tokens for answer quality?" describes a scaling law specifically for systems that meet the full three-component definition. Partial systems that skip iterative query refinement likely show different scaling behavior.

Researchy Questions (2024) operationalizes the "unknown unknowns" concept for deep research. Unlike standard QA benchmarks, which study "known unknowns" with clear indications of what information is missing, Researchy Questions identifies non-factoid, multi-perspective, decompositional questions from real search-engine logs: questions where the questioner doesn't know what they don't know. Users spend significantly more effort (clicks, session length) on these queries, and "slow thinking" techniques like decomposition into sub-questions show benefit over direct answering.

An 8-dimension quality rubric (ambiguity, incompleteness, assumptions, multi-facetedness, knowledge-intensity, subjectivity, reasoning-intensity, harmfulness) provides granular characterization. This distinguishes "deep" questions from merely "hard" ones: a deep question has multiple perspectives allowing a dense manifold of answers, no single correct answer, and requires genuine synthesis rather than just retrieval. Source: Arxiv/Agentic Research.

