Do hierarchical retrieval architectures outperform flat ones on complex queries?
Explores whether separating query planning from answer synthesis into distinct architectural components improves performance on multi-hop retrieval tasks compared to unified single-pass approaches.
HierSearch separates two functions that flat retrieval architectures conflate: deciding what to search for (query planning) and deciding what the answer is (answer synthesis). The finding is that these functions interfere with each other when combined, and separating them improves multi-hop query performance.
The interference mechanism: in a flat architecture, the model must simultaneously track what it is looking for, what it has found, and how the findings combine into an answer. Multi-hop queries require multiple retrieval rounds with intermediate synthesis steps — each round's findings must inform the next round's query while also contributing to the final answer. When one model component handles all of this, it loses coherence across the chain. The hierarchical architecture assigns query planning to one component and answer synthesis to another, letting each specialize.
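A toy sketch of the separation follows. It assumes a dictionary-backed corpus and a multi-hop question already expressed as a chain of relations. `Planner`, `Synthesizer`, `retrieve`, `CORPUS`, and `hierarchical_search` are hypothetical names chosen for illustration; they are not HierSearch's components or API. The planner decides only the next query, based on the previous hop's result. The synthesizer only accumulates evidence and produces the answer, so neither component has to track the other's state.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical toy corpus: (entity, relation) -> entity.
# Stands in for a real search index; not part of HierSearch.
CORPUS = {
    ("Inception", "directed_by"): "Christopher Nolan",
    ("Christopher Nolan", "born_in"): "London",
    ("London", "country"): "United Kingdom",
}


def retrieve(entity: str, relation: str) -> str | None:
    """Toy retriever: exact-match lookup in place of a search call."""
    return CORPUS.get((entity, relation))


@dataclass
class Planner:
    """Tracks only search intent: which hop comes next and what to query."""
    start: str
    relations: list[str]
    hop: int = 0

    def next_query(self, last_result: str | None) -> tuple[str, str] | None:
        if self.hop >= len(self.relations):
            return None  # every hop in the plan has been issued
        # The first hop starts from the question's entity; later hops
        # start from whatever the previous hop retrieved.
        entity = self.start if self.hop == 0 else last_result
        if entity is None:
            return None
        query = (entity, self.relations[self.hop])
        self.hop += 1
        return query


@dataclass
class Synthesizer:
    """Tracks only evidence; never decides what to search for."""
    evidence: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, entity: str, relation: str, value: str) -> None:
        self.evidence.append((entity, relation, value))

    def answer(self) -> str | None:
        # In this toy chain, the answer is the value found by the final hop.
        return self.evidence[-1][2] if self.evidence else None


def hierarchical_search(start: str, relations: list[str]):
    planner, synth = Planner(start, relations), Synthesizer()
    result = None
    while (query := planner.next_query(result)) is not None:
        result = retrieve(*query)
        if result is None:
            break  # a hop failed; the chain cannot be completed
        synth.add(*query, result)
    # Only report an answer if the final hop succeeded.
    return (synth.answer() if result is not None else None), synth.evidence
```

Calling `hierarchical_search("Inception", ["directed_by", "born_in", "country"])` returns `"United Kingdom"` together with three evidence triples. In a real system, both roles would typically be model calls rather than lookups. The sketch is only meant to show the division of state between the two components.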
This has implications beyond deep research. The same interference between planning and execution is well documented in agent design: models that plan and execute simultaneously produce worse plans and worse execution than models in which these roles are separated. HierSearch provides retrieval-specific evidence for this general architectural principle.
The structural finding also connects to the question "How do readers track segments, purposes, and salience together?" HierSearch solves at the system level the same cognitive architecture problem that this question raises. The discourse-level problem (tracking segments, purposes, and salient objects in parallel) mirrors the retrieval-level problem (tracking query intent, retrieved evidence, and synthesis state in parallel). Separating these functions architecturally reduces the tracking burden on each component.
LogicRAG extends the hierarchical principle by making the query planning step structurally explicit. At inference time, it decomposes the query into a directed acyclic graph (DAG) of subproblems and then resolves them in topological order. HierSearch separates planning from synthesis at the system level. LogicRAG, by contrast, implements the planning step as a structured dependency graph at the query level. The result is query-adaptive logic structures without the cost of pre-processing the corpus. See "Can query-time graph construction replace pre-built knowledge graphs?"
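The query-level planning step can be sketched with the standard-library `graphlib.TopologicalSorter`. This is a minimal illustration, not LogicRAG's implementation. The decomposition into subproblems is hard-coded here, whereas LogicRAG builds it at inference time, and a dictionary lookup stands in for retrieval. `SUBPROBLEMS`, `FACTS`, and `resolve` are hypothetical names. Each subproblem is a question template that can reference the answers of the subproblems it depends on. Resolving them in topological order guarantees that every dependency is answered before its answer is substituted.

```python
from graphlib import TopologicalSorter

# Hypothetical decomposition of "In what country was the director of
# Inception born, and when was the film released?" into subproblems.
# Each entry: id -> (question template, ids it depends on).
# Templates reference dependency answers with {id} placeholders.
SUBPROBLEMS = {
    "q1": ("Who directed Inception?", []),
    "q2": ("Where was {q1} born?", ["q1"]),
    "q3": ("In what country is {q2}?", ["q2"]),
    "q4": ("What year was Inception released?", []),
}

# Toy answer source standing in for per-subproblem retrieval.
FACTS = {
    "Who directed Inception?": "Christopher Nolan",
    "Where was Christopher Nolan born?": "London",
    "In what country is London?": "United Kingdom",
    "What year was Inception released?": "2010",
}


def resolve(subproblems, answer_fn):
    """Answer subproblems in topological order of their dependency DAG."""
    # TopologicalSorter expects node -> predecessors; it raises
    # graphlib.CycleError if the decomposition is not acyclic.
    graph = {qid: set(deps) for qid, (_, deps) in subproblems.items()}
    answers = {}
    for qid in TopologicalSorter(graph).static_order():
        template, _ = subproblems[qid]
        # All dependencies are already in `answers`, so the placeholders fill.
        question = template.format(**answers)
        answers[qid] = answer_fn(question)
    return answers
```

With `resolve(SUBPROBLEMS, FACTS.get)`, `q3` resolves to `"United Kingdom"` and `q4` resolves to `"2010"`. The independent branch (`q4`) needs no ordering constraint relative to the `q1 → q2 → q3` chain, and that per-query structure is what makes the plan adaptive.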
Source: Deep Research
Related concepts in this collection
- Does search budget scale like reasoning tokens for answer quality?
  Explores whether the test-time scaling law that applies to reasoning tokens also governs search-based retrieval in agentic systems. Understanding this relationship could reshape how we allocate inference compute between thinking and searching.
  extends: hierarchical architecture makes the search budget more efficient by reducing interference loss
- How do readers track segments, purposes, and salience together?
  Can discourse processing actually happen in parallel rather than sequentially? This matters because understanding how readers coordinate multiple layers of meaning at once reveals where AI systems break down in comprehension.
  connects: HierSearch solves at the system architecture level the same parallel-tracking problem that discourse processing requires at the cognitive level
Original note title
hierarchical research architectures that separate query planning from answer synthesis outperform flat architectures on multi-hop queries