Do hierarchical retrieval architectures outperform flat ones on complex queries?

Explores whether separating query planning from answer synthesis into distinct architectural components improves performance on multi-hop retrieval tasks compared to unified single-pass approaches.

Note · 2026-02-21 · sourced from Deep Research

HierSearch separates two functions that flat retrieval architectures conflate: deciding what to search for (query planning) and deciding what the answer is (answer synthesis). The finding is that these functions interfere with each other when combined, and separating them improves multi-hop query performance.

The interference mechanism: in a flat architecture, the model must simultaneously track what it is looking for, what it has found, and how the findings combine into an answer. Multi-hop queries require multiple retrieval rounds with intermediate synthesis steps — each round's findings must inform the next round's query while also contributing to the final answer. When one model component handles all of this, it loses coherence across the chain. The hierarchical architecture assigns query planning to one component and answer synthesis to another, letting each specialize.
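The separation described above can be sketched as a loop in which one component only decides what to search for next and a second component only composes the answer. This is a minimal illustration under stated assumptions, not HierSearch's actual implementation, which this note does not detail. Every name here (`hierarchical_answer`, `Evidence`, the toy planner, retriever, and synthesizer, and the toy corpus) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Evidence:
    query: str    # the sub-query that was issued
    passage: str  # what retrieval returned for it


# Hypothetical interfaces: the planner never writes answers,
# and the synthesizer never issues queries.
Planner = Callable[[str, List[Evidence]], Optional[str]]
Retriever = Callable[[str], str]
Synthesizer = Callable[[str, List[Evidence]], str]


def hierarchical_answer(
    question: str,
    plan_next_query: Planner,
    retrieve: Retriever,
    synthesize: Synthesizer,
    max_hops: int = 4,
) -> str:
    """Run retrieval hops chosen by the planner, then synthesize once."""
    evidence: List[Evidence] = []
    for _ in range(max_hops):
        query = plan_next_query(question, evidence)
        if query is None:  # the planner decides that enough has been found
            break
        evidence.append(Evidence(query, retrieve(query)))
    return synthesize(question, evidence)


# Toy stand-ins so the sketch runs without a model or a search index.
CORPUS = {
    "director of Film X": "Ada Example",
    "birthplace of Ada Example": "Lyon",
}


def toy_planner(question: str, evidence: List[Evidence]) -> Optional[str]:
    if not evidence:
        return "director of Film X"
    if len(evidence) == 1:
        # The second hop depends on what the first hop found.
        return f"birthplace of {evidence[0].passage}"
    return None


def toy_retriever(query: str) -> str:
    return CORPUS.get(query, "")


def toy_synthesizer(question: str, evidence: List[Evidence]) -> str:
    return evidence[-1].passage if evidence else "unknown"


result = hierarchical_answer(
    "Where was the director of Film X born?",
    toy_planner,
    toy_retriever,
    toy_synthesizer,
)
```

In a flat design, a single prompt or component would play all three roles at once. Here the only state shared between planner and synthesizer is the evidence list, which is one way to keep each role's tracking burden small.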

This has implications beyond deep research. The same interference between planning and execution is well-documented in agent design: models that plan and execute simultaneously produce worse plans and worse execution than models where these are separated. HierSearch is the retrieval-specific confirmation of a general architectural principle.

The structural finding also connects to the note How do readers track segments, purposes, and salience together? That note describes the cognitive architecture problem that HierSearch solves at the system level. The discourse-level problem (tracking segments, purposes, and salient objects in parallel) is structurally equivalent to the retrieval-level problem (tracking query intent, retrieved evidence, and synthesis state in parallel). Architecturally separating these concerns reduces the tracking burden.

LogicRAG extends the hierarchical principle by making the query planning step structurally explicit: it decomposes the query into a directed acyclic graph (DAG) of subproblems at inference time, then resolves them in topological order. Where HierSearch separates planning from synthesis at the system level, LogicRAG implements the planning step as a structured dependency graph at the query level. The result is a query-adaptive logic structure that incurs no corpus pre-processing cost. See Can query-time graph construction replace pre-built knowledge graphs?
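Resolving a subproblem DAG in topological order can be sketched with Python's standard-library `graphlib`. This is an illustration of the general idea only, not LogicRAG's code: the function `resolve_in_topological_order`, the `stub_answer` callback, and the example plan are all hypothetical, and the decomposition step itself (which would be done by a model) is assumed to have already produced the dependency mapping.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


def resolve_in_topological_order(
    dependencies: Dict[str, List[str]],
    answer_subproblem: Callable[[str, Dict[str, str]], str],
) -> Dict[str, str]:
    """Answer each subproblem only after all of its prerequisites.

    `dependencies` maps a subproblem to the subproblems it depends on.
    """
    answers: Dict[str, str] = {}
    # static_order() yields each node after its predecessors and raises
    # graphlib.CycleError if the mapping is not acyclic.
    for subproblem in TopologicalSorter(dependencies).static_order():
        context = {
            dep: answers[dep] for dep in dependencies.get(subproblem, [])
        }
        answers[subproblem] = answer_subproblem(subproblem, context)
    return answers


def stub_answer(subproblem: str, context: Dict[str, str]) -> str:
    # Hypothetical stand-in for "retrieve, then answer": it only records
    # which earlier answers were available when this node was resolved.
    used = ", ".join(sorted(context)) or "nothing"
    return f"answer({subproblem} | used: {used})"


# Hypothetical decomposition of a three-hop question.
plan = {
    "director of film X": [],
    "birthplace of director": ["director of film X"],
    "population of birthplace": ["birthplace of director"],
}

answers = resolve_in_topological_order(plan, stub_answer)
```

Because the graph is built per query, nothing in this sketch touches the corpus ahead of time, which matches the no pre-processing property described above.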


Original note title

hierarchical research architectures that separate query planning from answer synthesis outperform flat architectures on multi-hop queries