Should retrieval triggers use model confidence or data rarity?
FLARE and QuCo-RAG propose different signals for when to retrieve in RAG systems. Are these competing approaches, or do they each catch distinct failure modes that a combined strategy could address?
Two competing answers exist for "when should a RAG system trigger retrieval?" FLARE answers: when model confidence drops below a threshold during generation. QuCo-RAG answers: when the entities or claims in the query are rare in pretraining data. The papers frame these as alternative mechanisms. They are not — the signals catch orthogonal failure modes, and the right design combines them.
Internal uncertainty (FLARE-style) catches cases where the model recognizes its own ignorance: low log-probability over generated tokens, high entropy over candidate continuations, semantic drift mid-generation. The model knows it does not know, and the trigger fires. This works well when the failure mode is uncertainty-correlated — paraphrasing common knowledge, summarizing seen content, generating in well-trodden territory. It fails when the model is confidently wrong: pretraining bias produces high-confidence outputs about rare entities the model has never seen enough of to be correctly calibrated about. Calibration error is precisely the regime where internal uncertainty is silent.
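A minimal sketch of this internal-uncertainty channel, assuming per-token probabilities are available from the decoder. The function name and the threshold value are illustrative choices here, not taken from the FLARE paper:

```python
def internal_uncertainty_trigger(token_probs, theta=0.4):
    """FLARE-style check: fire retrieval if any token in the tentative
    continuation was generated with probability below theta.

    token_probs: per-token probabilities of the draft continuation.
    theta: confidence threshold (illustrative value, tune per model).
    """
    return any(p < theta for p in token_probs)
```

One low-confidence token in the draft is enough to fire; a uniformly confident draft never triggers, which is exactly why this channel is silent in the confidently-wrong regime.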
External rarity (QuCo-style) catches cases where the model has no business being confident: query entities that co-occurred fewer than k times in pretraining, claims about specific quantities or dates that are easily fabricated, named entities outside the model's training distribution. The signal is computed from the corpus, not from the model's state, so it works precisely where calibration has failed. It fails when the model is uncertain about common knowledge — a stylistic ambiguity, an in-context contradiction, a multi-step inference that compounds error. Pretraining frequency says "you should know this" while the model in fact does not.
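The external-rarity channel can be sketched the same way, assuming a precomputed co-occurrence table over the pretraining corpus. The table format, the entity extraction step, and the cutoff k are all placeholder assumptions here:

```python
from itertools import combinations

def external_rarity_trigger(query_entities, cooccurrence_counts, k=10):
    """QuCo-style check: fire retrieval if any pair of query entities
    co-occurred fewer than k times in pretraining data.

    query_entities: entity strings extracted from the query (extraction
        method assumed, e.g. an NER pass).
    cooccurrence_counts: dict mapping frozenset({a, b}) -> corpus count,
        assumed precomputed over the pretraining corpus.
    """
    return any(
        cooccurrence_counts.get(frozenset(pair), 0) < k
        for pair in combinations(query_entities, 2)
    )
```

Note the signal never consults the model: an entity pair absent from the table counts as zero and fires the trigger regardless of how confident the generation looks.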
The two signals are nearly orthogonal. FLARE catches known unknowns; QuCo catches unknown unknowns. A retrieval policy using only one will systematically underfire on the failures the other catches. The composite policy is straightforward: trigger if either signal exceeds its threshold, so the union covers the calibration gap that single-signal policies leave open. The framing also explains why fixed-interval retrieval (e.g., retrieve every k tokens) underperforms both: fixed intervals waste retrieval budget on confident, correct generation and miss the points between intervals where either signal would actually fire.
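The union policy reduces to a disjunction over the two channels. This self-contained sketch composes both checks inline; all names and thresholds are illustrative assumptions, not from either paper:

```python
from itertools import combinations

def should_retrieve(token_probs, query_entities, cooccurrence_counts,
                    theta=0.4, k=10):
    """Dual-channel retrieval trigger: fire if EITHER the model's own
    confidence dips on the draft continuation (internal channel) OR the
    query touches entity pairs that are rare in pretraining data
    (external channel). Thresholds theta and k are illustrative."""
    internal = any(p < theta for p in token_probs)
    external = any(
        cooccurrence_counts.get(frozenset(pair), 0) < k
        for pair in combinations(query_entities, 2)
    )
    return internal or external
```

Because the external check is a cheap table lookup and the internal check requires a decoding pass, a tiered variant can evaluate rarity first and probe uncertainty only when the rarity channel is silent.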
The implication for RAG architecture: retrieval triggering is not a single-signal classification problem but a dual-channel calibration problem, and the channels measure different things. Building either channel without the other leaves a known failure surface uncovered.
Source: 12 types of RAG
Related concepts in this collection
- When should retrieval happen during model generation?
  Explores whether retrieval should occur continuously, at fixed intervals, or only when the model signals uncertainty. Standard RAG retrieves once; long-form generation requires dynamic triggering based on confidence signals.
  Relation: the FLARE-style internal-uncertainty channel; necessary but not sufficient.
- Can pretraining data statistics detect hallucinations better than model confidence?
  Explores whether tracking rare entity co-occurrences in training data provides a more reliable hallucination signal than measuring model confidence. It matters because confidence-based retrieval triggers miss the model's most dangerous mistakes.
  Relation: the QuCo-style external-rarity channel; necessary but not sufficient.
- Can RAG systems refuse to answer without reliable evidence?
  Explores whether retrieval-augmented generation can be designed to abstain from answering when sources are corrupted or insufficient, rather than filling gaps with plausible-sounding guesses. This matters for historical text, where OCR errors and language drift are common.
  Relation: downstream; once retrieval fires, generation must be evidence-conditional, so the trigger sits upstream of refusal behavior.
- Can smaller models handle RAG filtering while larger models focus on synthesis?
  Asks whether splitting RAG pipeline work between cheaper small models and expensive large models improves both cost and quality, i.e., whether different pipeline stages have different optimal model sizes.
  Relation: composes with the dual-trigger; tier the retrieval decision (cheap rarity check first, then expensive uncertainty probe).
Original note title
retrieval triggers should combine internal-uncertainty signals with external-rarity signals — model confidence misses pretraining-frequency hallucination risk and vice versa