Knowledge Retrieval and RAG

Can building a document map first improve retrieval over long texts?

Does constructing a global summary before retrieval help RAG systems connect scattered evidence in long documents the way human readers do? This tests whether understanding document structure improves what gets retrieved.

Note · 2026-05-03 · sourced from 12 types of RAG

Standard RAG retrieves first and reasons second. That works for short factoid queries but fragments evidence in long documents, because the retriever has no model of what the document as a whole is about. MiA-RAG (Mindscape-Aware RAG) flips the order: it builds a high-level summary of the whole text first, then uses that "global view" to guide what gets retrieved and how the answer is composed. The mindscape acts as a conditioning prior: retrieval queries are reformulated against the document's topology, so scattered evidence that connects only when read in context becomes findable.

This matters because it names a previously implicit failure mode. The retriever's bag-of-chunks view of a long document destroys the discourse structure that makes evidence cohere. Human readers do not retrieve evidence cold; they retrieve it already knowing what the document is broadly arguing. MiA-RAG approximates that reading posture computationally. The mechanism, summary as retrieval conditioner, also generalizes beyond long documents: any retrieval task where local matching diverges from global relevance could benefit from a topology pass before chunk selection. The same hierarchical decomposition principle drives the note "Do hierarchical retrieval architectures outperform flat ones on complex queries?" and is the architectural cousin of "Can community detection enable RAG systems to answer global corpus questions?"

The architectural cost is one extra summarization pass before retrieval. The benefit is that downstream retrieval and reasoning operate over a compressed plan rather than a token soup, which means the system can connect distant passages by their role in the document rather than only by surface similarity.



Original note title

global-summary-first retrieval guides RAG over long documents — building a mindscape before retrieving connects scattered evidence the way a human reader does