Knowledge Retrieval and RAG

Can long-context models resolve retriever-reader imbalance?

Traditional RAG systems force retrievers to find precise passages because readers historically had small context windows. Do modern long-context LLMs change what architecture makes sense?

Note · 2026-02-22 · sourced from RAG

Standard RAG retrieves 100-word paragraphs. This forces the retriever to locate the precise passage containing the answer across a corpus of potentially 22 million units. The task is "find the needle." The reader then extracts the answer from the found passage — a relatively easy task. The retriever carries almost all the weight.
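For concreteness, here is a minimal sketch of this fine-grained setup, assuming a dense retriever over short passages. The encoder model and toy corpus are illustrative, not from the source:

```python
# A minimal sketch of the fine-grained RAG setup, assuming a dense retriever.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Toy stand-in for a ~22M-passage corpus of ~100-word units.
passages = [
    "Marie Curie won the Nobel Prize in Physics in 1903.",
    "The Eiffel Tower was completed in 1889.",
]
passage_vecs = encoder.encode(passages, normalize_embeddings=True)

def retrieve(query: str, k: int = 1) -> list[str]:
    """'Find the needle': rank every short passage against the query."""
    q = encoder.encode([query], normalize_embeddings=True)
    scores = (passage_vecs @ q.T).ravel()  # cosine similarity (vectors are normalized)
    return [passages[i] for i in np.argsort(-scores)[:k]]
```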

This design was rational in the era when language models had 512–2048 token context windows. Longer retrieval units were unusable because the reader could not process them. The retriever had to do the precision work because the reader could not.

LongRAG (Jiang et al., 2024) reassesses this design choice given long-context LLMs that handle 128K tokens. Instead of 100-word units, it uses ~4K-token units constructed by grouping related documents. The corpus shrinks from 22M to 600K units, so the retriever's job becomes "find the right section" rather than "find the exact needle." Answer recall@1 on Natural Questions (NQ) improves from 52% to 71%, and recall@2 on HotpotQA from 47% to 72%.
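The arithmetic is worth noting: it is the same corpus regrouped, but the retriever now ranks roughly 37x fewer candidates (22M / 600K). A simplified sketch of unit construction follows, assuming greedy packing under a token budget; the grouping input and the `count_tokens` proxy are stand-ins, not the paper's actual document-relation grouping or tokenizer:

```python
# A simplified sketch of LongRAG-style unit construction: pack related
# documents into retrieval units of at most ~4K tokens.
from dataclasses import dataclass

MAX_UNIT_TOKENS = 4096

def count_tokens(text: str) -> int:
    # Rough proxy: ~0.75 words per token for English text.
    return int(len(text.split()) / 0.75)

@dataclass
class Unit:
    doc_ids: list[str]
    text: str
    tokens: int

def build_units(docs: dict[str, str], related_groups: list[list[str]]) -> list[Unit]:
    """Greedily pack each group of related documents into <= 4K-token units.

    `related_groups` is assumed to come from some relatedness structure
    between documents; a single oversize document becomes its own unit.
    """
    units = []
    for group in related_groups:
        current = Unit(doc_ids=[], text="", tokens=0)
        for doc_id in group:
            t = count_tokens(docs[doc_id])
            # Flush the current unit if adding this document would overflow it.
            if current.tokens + t > MAX_UNIT_TOKENS and current.doc_ids:
                units.append(current)
                current = Unit(doc_ids=[], text="", tokens=0)
            current.doc_ids.append(doc_id)
            current.text += docs[doc_id] + "\n\n"
            current.tokens += t
        if current.doc_ids:
            units.append(current)
    return units
```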

The reader then receives the top-k long units concatenated (roughly 30K tokens) and performs zero-shot answer extraction. Each component now does what it is good at: the LLM understands language in rich context, and the retriever does coarse relevance ranking.
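On the reader side, a minimal sketch assuming an OpenAI-style chat client; the model name, prompt, and single-turn extraction are placeholders rather than the paper's exact recipe:

```python
# A minimal sketch of the long-context reader step. Model and prompt
# are illustrative; any ~128K-context model fits the role.
from openai import OpenAI

client = OpenAI()

def read_answer(question: str, top_units: list[str]) -> str:
    # Top-k long units of ~4K tokens each, ~30K tokens concatenated.
    context = "\n\n".join(top_units)
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder for a long-context reader
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```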

The broader principle: RAG architecture design assumptions were frozen at the constraints of their era. As those constraints lift (context windows, model capability, inference cost), the optimal design changes. "Best practices" based on 2020 constraints may be anti-patterns by 2025 standards.


Source: RAG


Original note: heavy retriever / light reader imbalance is a historical artifact; long-context LLMs resolve it by shifting the burden to the reader.