Knowledge Retrieval and RAG

Should retrieval triggers use model confidence or data rarity?

FLARE and QuCo-RAG propose different signals for when to retrieve in RAG systems. Are these competing approaches, or do they each catch distinct failure modes that a combined strategy could address?

Note · 2026-05-03 · sourced from 12 types of RAG

There are two competing answers to the question "when should a RAG system trigger retrieval?" FLARE answers: when model confidence drops below a threshold during generation. QuCo-RAG answers: when the entities or claims in the query are rare in pretraining data. The papers frame these as alternative mechanisms. They are not alternatives: the two signals catch nearly orthogonal failure modes, and the right design combines them.

Internal uncertainty (FLARE-style) catches cases where the model recognizes its own ignorance: low log-probability over generated tokens, high entropy over candidate continuations, semantic drift mid-generation. The model knows it does not know, and the trigger fires. This works well when errors are correlated with uncertainty, as they tend to be in well-trodden territory such as paraphrasing common knowledge or summarizing seen content. It fails when the model is confidently wrong: pretraining bias produces high-confidence outputs about rare entities that the model has seen too seldom to be well calibrated on. Calibration error is precisely the regime where internal uncertainty is silent.
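The internal-uncertainty trigger described above can be sketched in a few lines. This is a minimal illustration, not FLARE's actual implementation: the function names and the `min_token_prob` and `max_entropy` thresholds are hypothetical, and the per-token log-probabilities are assumed to come from whatever generation API is in use.

```python
import math
from typing import Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def internal_uncertainty_fires(
    token_logprobs: Sequence[float],
    min_token_prob: float = 0.4,  # hypothetical threshold; tune per model
) -> bool:
    """Fire when any generated token's probability falls below the threshold."""
    return any(math.exp(lp) < min_token_prob for lp in token_logprobs)


def entropy_fires(
    step_distributions: Sequence[Sequence[float]],
    max_entropy: float = 1.0,  # hypothetical threshold, in nats
) -> bool:
    """Fire when any decoding step has a high-entropy next-token distribution."""
    return any(token_entropy(d) > max_entropy for d in step_distributions)


# A span generated with uniformly high probability never fires,
# whether or not its content is actually correct.
confident_span = [math.log(0.92), math.log(0.88), math.log(0.95)]
hesitant_span = [math.log(0.92), math.log(0.15), math.log(0.95)]

print(internal_uncertainty_fires(confident_span))  # False
print(internal_uncertainty_fires(hesitant_span))   # True
```

Both checks read only the model's own output distribution, which is why a confidently wrong span passes through untouched.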

External rarity (QuCo-style) catches cases where the model has no business being confident: query entities that co-occurred fewer than k times in pretraining, claims about specific quantities or dates that are easily fabricated, named entities outside the model's training distribution. The signal is computed from the corpus, not from the model's state, so it works precisely where calibration has failed. It fails when the model is uncertain about common knowledge: a stylistic ambiguity, an in-context contradiction, a multi-step inference that compounds error. In those cases pretraining frequency says "you should know this" while the model in fact does not, so the rarity signal stays silent.
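The "co-occurred fewer than k times" check can be illustrated with a toy counter. This is a sketch under loud assumptions: the in-memory `TOY_CORPUS` stands in for pretraining-corpus statistics, entity matching is plain case-insensitive substring search, and the function names and default `k` are hypothetical rather than taken from QuCo-RAG.

```python
from itertools import combinations
from typing import Sequence

# Hypothetical stand-in for pretraining data; a real system would query
# precomputed corpus statistics instead of scanning documents.
TOY_CORPUS = [
    "Marie Curie won the Nobel Prize in Physics in 1903.",
    "Marie Curie shared the Nobel Prize with Pierre Curie.",
    "Pierre Curie studied piezoelectricity.",
]


def cooccurrence_count(entities: Sequence[str], corpus: Sequence[str]) -> int:
    """Count documents that mention every entity (case-insensitive substring match)."""
    lowered = [e.lower() for e in entities]
    return sum(1 for doc in corpus if all(e in doc.lower() for e in lowered))


def external_rarity_fires(
    query_entities: Sequence[str],
    corpus: Sequence[str],
    k: int = 2,  # hypothetical rarity threshold
) -> bool:
    """Fire when the query's entities are rare in the corpus.

    With two or more entities, fire if any pair co-occurs in fewer than k
    documents; with one entity, fall back to its document frequency.
    """
    if not query_entities:
        return False
    if len(query_entities) == 1:
        return cooccurrence_count(query_entities, corpus) < k
    return any(
        cooccurrence_count(pair, corpus) < k
        for pair in combinations(query_entities, 2)
    )


print(external_rarity_fires(["Marie Curie", "Nobel Prize"], TOY_CORPUS))       # False
print(external_rarity_fires(["Marie Curie", "piezoelectricity"], TOY_CORPUS))  # True
```

Nothing here inspects the model, so the check fires on rare entity pairs regardless of how confident the generation looks.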

The two signals are nearly orthogonal. FLARE catches known unknowns; QuCo catches unknown unknowns. A retrieval policy using only one will systematically fail to fire on the failures the other catches. The composite policy is straightforward: trigger if either signal crosses its threshold, so that the union covers the calibration gap that single-signal policies leave open. The framing also explains why fixed-interval retrieval (e.g., retrieve every n tokens) underperforms both: fixed intervals waste retrieval budget on confident, correct generation and miss the points between scheduled retrievals where one of the signals would have fired.
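The either-signal policy reduces to a logical OR over two independent checks. The sketch below assumes each channel has already been summarized as one number; the `TriggerSignals` fields, the function name, and both default thresholds are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerSignals:
    min_token_prob: float  # internal channel: lowest generated-token probability
    cooccurrence: int      # external channel: corpus co-occurrence count of query entities


def should_retrieve(
    signals: TriggerSignals,
    prob_threshold: float = 0.4,  # hypothetical internal threshold
    k: int = 5,                   # hypothetical rarity threshold
) -> bool:
    """Trigger retrieval if either channel crosses its threshold."""
    internal_fires = signals.min_token_prob < prob_threshold
    external_fires = signals.cooccurrence < k
    return internal_fires or external_fires


cases = {
    "confident, common entity": TriggerSignals(min_token_prob=0.90, cooccurrence=1200),
    "uncertain, common entity": TriggerSignals(min_token_prob=0.20, cooccurrence=1200),
    "confident, rare entity": TriggerSignals(min_token_prob=0.90, cooccurrence=1),
    "uncertain, rare entity": TriggerSignals(min_token_prob=0.20, cooccurrence=1),
}
for name, signals in cases.items():
    print(f"{name}: retrieve={should_retrieve(signals)}")
```

Only the first case skips retrieval. The third case, confident generation about a rare entity, is the one a confidence-only policy would miss, and the second is the one a rarity-only policy would miss.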

The implication for RAG architecture: retrieval triggering is not a single-signal classification problem but a dual-channel calibration problem, and the channels measure different things. Building either channel without the other leaves a known failure surface uncovered.


Original note title

retrieval triggers should combine internal-uncertainty signals with external-rarity signals — model confidence misses pretraining-frequency hallucination risk and vice versa