Knowledge Retrieval and RAG

When should retrieval actually help versus hurt reasoning?

Retrieval augmentation seems universally beneficial, but does it always improve reasoning? This note explores when a reasoning step is better served by the model's internal knowledge alone, and when external retrieval introduces harmful noise rather than useful information.

Note · 2026-02-22 · sourced from RAG
How should researchers navigate LLM reasoning research?

Retrieval augmentation is not always helpful. Some queries require external knowledge that the LLM does not have. Others require reasoning over knowledge the LLM already contains. For the second type, retrieval adds noise: potentially irrelevant retrieved documents compete with the model's correct internal representations, increasing latency without improving accuracy.

DeepRAG formalizes this as a Markov Decision Process. At each reasoning step, the model makes a binary decision: retrieve external knowledge or rely on parametric knowledge. The state is the current question and available information; the action is the decision; the reward is downstream answer accuracy. The model learns a policy for when to retrieve.
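The per-step decision can be sketched as a tiny state/action/policy loop. Everything here (`State`, `model_confidence`, the threshold policy) is a hypothetical illustration of the MDP framing, not DeepRAG's actual API or training procedure:

```python
# Sketch of per-step retrieve-vs-parametric decisions, framed as an MDP.
# All names and the confidence heuristic are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class State:
    question: str                       # current (sub)question
    evidence: list = field(default_factory=list)  # information gathered so far

def model_confidence(state: State) -> float:
    # Stand-in for the model's self-assessed parametric knowledge.
    # A real system would learn this signal; here it is a toy heuristic.
    return 0.9 if "capital" in state.question else 0.3

def policy(state: State, threshold: float = 0.5) -> str:
    # Binary action space: rely on parametric knowledge, or retrieve.
    return "parametric" if model_confidence(state) >= threshold else "retrieve"

def step(state: State) -> State:
    if policy(state) == "retrieve":
        # Placeholder for an external retriever call.
        state.evidence.append(f"retrieved docs for: {state.question}")
    return state

s = step(State(question="What is the capital of France?"))
print(policy(s), len(s.evidence))  # parametric path: no retrieval performed
```

In the full MDP, the threshold policy would be replaced by a learned policy optimized against downstream answer accuracy.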

The MDP framing makes explicit what standard RAG leaves implicit: retrieval is a resource with a cost, not a free improvement. Always-retrieve is a degenerate policy that ignores the cost. Never-retrieve is a degenerate policy that ignores the benefit. Optimal policy adapts to step-level information needs.
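The cost-versus-benefit argument can be made concrete with a toy expected-utility calculation. The numbers below (per-retrieval cost, fraction of steps needing external knowledge, accuracy penalties) are invented purely to illustrate why both degenerate policies lose to an adaptive one:

```python
# Toy comparison of always-retrieve, never-retrieve, and adaptive policies.
# All constants are invented for illustration, not measured values.
COST = 0.1               # latency/noise cost paid on every retrieval
P_NEEDS_EXTERNAL = 0.4   # fraction of steps that truly need external knowledge

def utility(retrieve: bool, needs_external: bool) -> float:
    # Accuracy is high unless external knowledge was needed but not fetched.
    accuracy = 1.0 if (retrieve or not needs_external) else 0.2
    return accuracy - (COST if retrieve else 0.0)

def expected(policy) -> float:
    # Average utility over the two step types, weighted by frequency.
    return (P_NEEDS_EXTERNAL * utility(policy(True), True)
            + (1 - P_NEEDS_EXTERNAL) * utility(policy(False), False))

always   = expected(lambda needs: True)   # pays COST even when unneeded
never    = expected(lambda needs: False)  # fails steps needing external facts
adaptive = expected(lambda needs: needs)  # retrieves only when needed

print(f"always={always:.2f} never={never:.2f} adaptive={adaptive:.2f}")
# → always=0.90 never=0.68 adaptive=0.96
```

The adaptive policy dominates because it collects the benefit of retrieval only where it exists and avoids the cost everywhere else; a real system must estimate `needs` rather than observe it.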

DeepRAG's reported 21.99% accuracy improvement comes from two sources: better answers when retrieval is used (because the model retrieves more targeted subqueries), and reduced noise when retrieval is not used (because the model stops disrupting correct parametric reasoning with irrelevant retrieved content).

The connection to "Does reasoning fine-tuning make models worse at declining to answer?": both findings highlight that LLMs trained with outcome rewards learn to always engage (always answer, always retrieve) rather than calibrating engagement to their actual knowledge state. The MDP explicitly rewires this — abstention from retrieval (using parametric knowledge) becomes an active and rewarded choice.


Source: RAG

Original note title: retrieval-augmented reasoning as Markov Decision Process enables per-step parametric versus external knowledge switching