Knowledge Retrieval and RAG

Can document count be learned instead of fixed in RAG?

Standard RAG systems use a fixed number of documents regardless of query complexity. Can an RL agent learn to dynamically select both how many documents and their order based on what helps the generator produce correct answers?

Note · 2026-02-22 · sourced from RAG

Every standard RAG re-ranking system passes a fixed k documents to the generator. The k is set by the system designer and held constant across queries. This is wrong in both directions: too few documents omit critical information for complex queries; too many documents introduce noise that misleads the generator and reduces efficiency.

Pre-DynamicRAG re-ranking approaches leave the k-selection problem unsolved. Re-rankers have improved document ordering but assumed k was given: the number of documents to pass to the generator is treated as a hyperparameter, not a learned decision.

DynamicRAG models the reranker as an RL agent whose action space is a permutation and count selection over retrieved documents. The reward is LLM output quality — specifically, whether the generator produces a correct answer given the selected document set. The agent receives both explicit query signals and the generator's feedback.
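To make the action space concrete, here is a minimal sketch of an action as a (permutation, count) pair and a binary generator-quality reward. The class names, the exact-match reward, and the example documents are illustrative assumptions, not DynamicRAG's actual implementation (its reward may combine several quality signals).

```python
from dataclasses import dataclass

@dataclass
class RerankAction:
    """An action selects both an ordering and a cutoff count k."""
    order: list[int]  # permutation of retrieved-document indices
    k: int            # how many of the reordered documents to keep

def select_documents(docs: list[str], action: RerankAction) -> list[str]:
    """Apply the reranker's action: reorder, then truncate to k."""
    return [docs[i] for i in action.order][:action.k]

def reward(generated_answer: str, gold_answer: str) -> float:
    """Binary reward from generator output quality.

    Exact match is a simplifying assumption; the real reward is
    whatever answer-quality metric the system uses.
    """
    return 1.0 if generated_answer.strip().lower() == gold_answer.strip().lower() else 0.0

# Hypothetical example: 4 retrieved docs, agent reorders and keeps 2.
docs = ["doc A", "doc B", "doc C", "doc D"]
action = RerankAction(order=[2, 0, 3, 1], k=2)
print(select_documents(docs, action))  # ['doc C', 'doc A']
```

The key point is that k is part of the action, so the policy can emit a different document count per query.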

Training proceeds in two phases. First, behavior cloning on expert trajectories (SFT) gives the reranker a baseline policy and reduces action space complexity. Second, RL with generator feedback allows the reranker to explore and learn to calibrate both ordering and count to query needs.
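The two phases can be sketched with a toy policy over k alone (ordering omitted for brevity). Phase 1 does cross-entropy updates toward assumed expert k choices; phase 2 does REINFORCE updates against a stand-in generator reward. Everything here — the expert data, the reward function, the learning rates — is a hypothetical illustration of the training structure, not DynamicRAG's actual recipe.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

K_MAX = 5               # candidate document counts 1..K_MAX
logits = [0.0] * K_MAX  # toy policy over k only

# Phase 1: behavior cloning on expert trajectories (SFT analogue).
# Assumed expert data: (query_type, expert_k) pairs.
expert_data = [("simple", 1), ("simple", 2), ("complex", 4), ("complex", 5)]
for _ in range(50):
    for _, expert_k in expert_data:
        probs = softmax(logits)
        for a in range(K_MAX):
            # cross-entropy gradient step toward the expert's choice
            target = 1.0 if a == expert_k - 1 else 0.0
            logits[a] += 0.5 * (target - probs[a])

# Phase 2: REINFORCE with generator feedback.
def generator_reward(k):
    """Stand-in for 'did the LLM answer correctly given k docs'."""
    return 1.0 if k == 4 else 0.0  # hypothetical: this query needs 4 docs

random.seed(0)
for _ in range(200):
    probs = softmax(logits)
    k = random.choices(range(1, K_MAX + 1), weights=probs)[0]
    r = generator_reward(k)
    for a in range(K_MAX):
        # policy-gradient update: d log pi(k) / d logit_a
        grad = (1.0 if a == k - 1 else 0.0) - probs[a]
        logits[a] += 0.1 * r * grad

best_k = max(range(1, K_MAX + 1), key=lambda k: logits[k - 1])
# the learned policy concentrates on the rewarded count
```

The structure mirrors the note: behavior cloning narrows the policy to plausible actions, and RL then shifts it toward whatever k the generator actually rewards.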

The insight generalizes beyond re-ranking: any RAG system parameter that is currently a heuristic (chunk size, retrieval depth, context window allocation) is a candidate for learning via generator feedback. The generator's output quality is a reward signal that can backpropagate through any component of the pipeline that affects what the generator receives.
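As an illustration of that generalization, any discrete pipeline heuristic can be tuned with the generator's quality score as a bandit reward. The chunk-size candidates and the reward function below are hypothetical; in a real system the reward would come from running retrieval plus generation and scoring the answer.

```python
import random

# Hypothetical candidate values for one pipeline heuristic (chunk size).
chunk_sizes = [128, 256, 512, 1024]
counts = {c: 0 for c in chunk_sizes}
totals = {c: 0.0 for c in chunk_sizes}

def generator_quality(chunk_size):
    """Stand-in for answer quality given this chunk size."""
    return 1.0 if chunk_size == 512 else 0.2  # hypothetical reward landscape

random.seed(1)
for step in range(300):
    if step < len(chunk_sizes):        # try each candidate once
        c = chunk_sizes[step]
    elif random.random() < 0.1:        # explore occasionally
        c = random.choice(chunk_sizes)
    else:                              # exploit the best mean reward so far
        c = max(chunk_sizes, key=lambda s: totals[s] / counts[s])
    counts[c] += 1
    totals[c] += generator_quality(c)

best = max(chunk_sizes, key=lambda c: totals[c] / counts[c])
```

An epsilon-greedy bandit is the simplest instance of the idea; the note's stronger claim is that the same feedback signal supports full policy-gradient learning over richer action spaces.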


Related concepts in this collection


rl-trained reranker that adjusts document order and count solves the fixed top-k problem in rag