What makes deep research fundamentally different from RAG?
Explores whether current systems using the label 'deep research' actually meet a rigorous three-component definition involving multi-step gathering, cross-source synthesis, and iterative refinement, or if they're performing something narrower.
"Deep research" is used loosely to describe anything from a single web search to a multi-hour autonomous investigation. The Characterizing Deep Research paper proposes a formal three-component definition that makes the boundary precise:
- Multi-step information gathering — not one retrieval round but a sequence of them, where each round can expand or contract the search space
- Cross-source synthesis — combining findings from multiple independent sources, not just summarizing one document
- Iterative query refinement — using partial findings to improve subsequent queries, not issuing all queries upfront
The definition excludes single-step RAG (fails component 1), document summarization (fails component 3), and simple web browsing (which may fail component 2). Only systems that exercise all three components together, in a loop, qualify.
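The three components can be sketched as a single loop. This is a toy illustration, not the paper's algorithm: `search`, `refine`, and the pipe-joined "synthesis" are stand-ins we invented to show where each component sits in the control flow.

```python
def search(query, corpus):
    """Stub retrieval: return documents sharing at least one term with the query."""
    terms = set(query.lower().split())
    return [doc for doc in corpus if terms & set(doc.lower().split())]

def refine(query, findings):
    """Stub refinement: extend the query with a new term drawn from partial findings."""
    for doc in findings:
        for term in doc.lower().split():
            if term not in query.lower():
                return f"{query} {term}"
    return query

def deep_research(question, corpus, max_rounds=3):
    query, sources = question, []
    for _ in range(max_rounds):          # component 1: multi-step gathering
        hits = search(query, corpus)
        sources.extend(h for h in hits if h not in sources)
        query = refine(query, hits)      # component 3: iterative query refinement
    # component 2: cross-source synthesis (here just concatenation of sources)
    return " | ".join(sources)
```

The point of the sketch is the exclusion logic above: drop the loop and you have single-step RAG; drop `refine` and every round re-issues the same query; drop the final join and nothing combines the independent sources.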
The practical value of the definition is benchmarking clarity. Without it, systems that perform single-step retrieval with sophisticated synthesis can claim "deep research" capability when they lack the iterative refinement component that actually distinguishes DR from RAG++. PRELUDE (the benchmark that accompanies the paper) evaluates all three components, making it possible to locate exactly where a system falls short.
This also clarifies what the test-time scaling (TTS) law applies to: the question "Does search budget scale like reasoning tokens for answer quality?" concerns a scaling law specifically for systems that meet the full three-component definition. Partial systems that skip iterative query refinement likely exhibit different scaling behavior.
Researchy Questions (2024) operationalizes the "unknown unknowns" concept for deep research. Unlike standard QA benchmarks that study "known unknowns" with clear indications of what information is missing, Researchy Questions identifies non-factoid, multi-perspective, decompositional questions from real search engine logs — questions where the questioner doesn't know what they don't know. Users spend significantly more effort (clicks, session length) on these queries, and "slow thinking" techniques like decomposition into sub-questions show benefit over direct answering. An 8-dimension quality rubric (ambiguity, incompleteness, assumptions, multi-facetedness, knowledge-intensity, subjectivity, reasoning-intensity, harmfulness) provides granular characterization. This distinguishes "deep" questions from merely "hard" ones: a deep question has multiple perspectives allowing a dense manifold of answers, no single correct answer, and requires genuine synthesis rather than just retrieval. Source: Arxiv/Agentic Research.
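The 8-dimension rubric is easiest to see as a record type. The field names come from the source; the `is_deep` heuristic is our assumption, added only to make concrete the note's distinction that depth hinges on multi-facetedness and synthesis rather than difficulty alone — it is not the benchmark's scoring rule.

```python
from dataclasses import dataclass

@dataclass
class RubricScores:
    """The 8 quality dimensions from Researchy Questions, each scored 0..1."""
    ambiguity: float
    incompleteness: float
    assumptions: float
    multi_facetedness: float
    knowledge_intensity: float
    subjectivity: float
    reasoning_intensity: float
    harmfulness: float

def is_deep(scores: RubricScores, threshold: float = 0.5) -> bool:
    """Toy heuristic (ours): 'deep' = multi-perspective AND synthesis-heavy."""
    return (scores.multi_facetedness > threshold
            and scores.reasoning_intensity > threshold)
```

A question can score high on knowledge-intensity (hard) while scoring low on multi-facetedness (not deep), which is exactly the hard-versus-deep split the note draws.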
Source: Deep Research
Related concepts in this collection
- **Does search budget scale like reasoning tokens for answer quality?** Explores whether the test-time scaling law that applies to reasoning tokens also governs search-based retrieval in agentic systems. Understanding this relationship could reshape how we allocate inference compute between thinking and searching. (grounds: the TTS law applies specifically to systems meeting this formal definition; the three components define what search budget measures)
- **Do hierarchical retrieval architectures outperform flat ones on complex queries?** Explores whether separating query planning from answer synthesis into distinct architectural components improves performance on multi-hop retrieval tasks compared to unified single-pass approaches. (connects: hierarchical architecture is the structural implementation of the three-component definition)
Original note title
deep research requires a formal three-component definition: multi-step information gathering, cross-source synthesis, and iterative query refinement