Where do retrieval systems break and why?

How retrieval foundations work and where they systematically fail in RAG systems.

Topic Hub · 24 linked notes · 6 sections

Retrieval Mechanics

6 notes

When should retrieval happen during model generation?

Explores whether retrieval should occur continuously, at fixed intervals, or only when the model signals uncertainty. Standard RAG retrieves once; long-form generation requires dynamic triggering based on confidence signals.
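
A minimal sketch of the confidence-triggered pattern (FLARE-style), assuming hypothetical generate_sentence() and retrieve() callables and an illustrative threshold:

```python
# Sketch of confidence-triggered retrieval. generate_sentence() and
# retrieve() are hypothetical callables, not a real API; the threshold
# is illustrative and would need tuning per model.
CONF_THRESHOLD = 0.4

def generate_with_dynamic_retrieval(query, generate_sentence, retrieve,
                                    max_sentences=20):
    context, answer = [], []
    for _ in range(max_sentences):
        sentence, token_probs = generate_sentence(query, context, answer)
        if sentence is None:  # model signalled completion
            break
        if token_probs and min(token_probs) < CONF_THRESHOLD:
            # Low-confidence span: use the tentative sentence itself as the
            # retrieval query, then regenerate it with fresh evidence.
            context.extend(retrieve(sentence, k=3))
            sentence, _ = generate_sentence(query, context, answer)
        answer.append(sentence)
    return " ".join(answer)
```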

Can uncertainty estimation replace complex adaptive retrieval?

Is a simpler approach using model confidence signals sufficient to decide when retrieval is needed, or do complex multi-call adaptive pipelines deliver meaningful benefits?

Why do queries and documents occupy different embedding spaces?

Queries and documents express the same information in fundamentally different ways: short and interrogative versus long and declarative. This mismatch is central to understanding why direct embedding retrieval often fails.
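
One common mitigation is asymmetric encoding: E5-family models are trained with distinct "query:"/"passage:" prefixes so the two text types land in comparable regions of the space. A minimal sketch, assuming the intfloat/e5-base-v2 checkpoint and the sentence-transformers API:

```python
# Sketch of asymmetric encoding with E5-style prefixes. Checkpoint name is
# an assumption from the E5 family.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/e5-base-v2")

query = "query: when was retrieval-augmented generation introduced?"
passage = ("passage: Retrieval-augmented generation (RAG) was proposed in "
           "2020 as a way to ground language models in external documents.")

q_emb, p_emb = model.encode([query, passage], normalize_embeddings=True)
print(util.cos_sim(q_emb, p_emb))  # higher than the same pair without prefixes
```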

Can fine-tuning replace query augmentation for retrieval?

Query augmentation helps retrievers handle ambiguous queries but increases input cost. Does fine-tuning the retrieval model achieve comparable performance without this overhead?
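
The augmentation side is cheap to prototype: paraphrase the query, search each variant, and fuse with reciprocal rank fusion (RRF). A sketch with expand() and search() as placeholder callables:

```python
# Sketch: multi-query augmentation merged with reciprocal rank fusion (RRF).
# expand() would call an LLM for paraphrases; search() hits the index.
from collections import defaultdict

def rrf_merge(ranked_lists, k=60):
    # Standard RRF: score(d) = sum over lists of 1 / (k + rank)
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def augmented_search(query, expand, search, n_variants=3):
    variants = [query] + expand(query, n=n_variants)  # extra LLM calls
    return rrf_merge([search(v) for v in variants])
```

The overhead in question lives in expand(); a fine-tuned retriever would aim to make the raw query work unexpanded.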

Can long-context models resolve retriever-reader imbalance?

Traditional RAG systems forced retrievers to find precise passages because readers had small context windows. Do modern long-context LLMs change what architecture makes sense?
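
One way the trade-off shows up in code: with a long-context reader, the retriever can stay recall-oriented and let a token budget make the cut. A sketch with an illustrative budget:

```python
# Sketch: coarse, recall-first retrieval packed up to a token budget.
def pack_context(ranked_docs, count_tokens, budget=100_000):
    packed, used = [], 0
    for doc in ranked_docs:          # best-first, as ranked by the retriever
        cost = count_tokens(doc)
        if used + cost > budget:
            break
        packed.append(doc)
        used += cost
    return packed                    # precision pressure shifts to the reader
```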

Can a model's partial response guide what to retrieve next?

Can generation reveal implicit information needs that the original query cannot express? This explores whether using in-progress responses as retrieval signals outperforms upfront query formulation.

Failure Modes

5 notes

Do vector embeddings actually measure task relevance?

Vector embeddings rank semantic similarity, but RAG systems need topical relevance. When these diverge—as with king/queen versus king/ruler—does similarity-based retrieval fail in production?
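
The usual production mitigation is to recall by similarity, then rerank with a cross-encoder that scores query-passage relevance jointly. A sketch assuming a public MS MARCO reranker checkpoint:

```python
# Sketch: embedding similarity for recall, a cross-encoder for relevance.
# The checkpoint is an assumption; any pairwise reranker fits the pattern.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query, candidates, top_k=5):
    scores = reranker.predict([(query, passage) for passage in candidates])
    order = sorted(range(len(candidates)), key=lambda i: scores[i],
                   reverse=True)
    return [candidates[i] for i in order[:top_k]]
```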

Can long-context LLMs replace retrieval-augmented generation systems?

Explores whether loading entire corpora into LLM context windows can eliminate the need for separate retrieval systems, and what task types this approach handles well or poorly.

When do graph databases outperform vector embeddings for retrieval?

Vector similarity struggles with aggregate and relational queries that require traversing multiple entity connections. Can graph-oriented databases with deterministic queries solve this failure mode in enterprise domain applications?
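
The failure mode is easy to make concrete. A sketch of a multi-hop aggregate expressed as a deterministic Cypher traversal through the neo4j Python driver; the (:Company) schema, credentials, and data are hypothetical:

```python
# Sketch: an aggregate, multi-hop question ("which companies supply at least
# three suppliers of AcmeCorp?") that top-k similarity search rarely answers
# but a graph query resolves deterministically.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "secret"))

CYPHER = """
MATCH (c:Company)-[:SUPPLIES]->(s:Company)
      -[:SUPPLIES]->(t:Company {name: $target})
WITH c, count(DISTINCT s) AS shared_suppliers
WHERE shared_suppliers >= 3
RETURN c.name AS company, shared_suppliers
"""

with driver.session() as session:
    for record in session.run(CYPHER, target="AcmeCorp"):
        print(record["company"], record["shared_suppliers"])
```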

Does reasoning ability actually degrade with longer inputs?

Explores whether modern language models maintain reasoning performance when processing long contexts, and whether a large nominal context window translates into practical reasoning capability over extended text.

Do embedding dimensions fundamentally limit retrievable document combinations?

Can single-vector embeddings represent any top-k document subset a user might need? Research using communication complexity theory suggests there are hard geometric limits independent of training data or model architecture.
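
A back-of-envelope version of the counting argument (not the paper's exact theorem): fix document embeddings e_1, ..., e_n in R^d and score queries by dot product; each query's ranking is determined by which side of the C(n,2) difference hyperplanes it falls on.

```latex
% Back-of-envelope counting, not the cited result's exact bound.
% An arrangement of binom(n,2) hyperplanes q . (e_i - e_j) = 0 in R^d
% has O(n^{2d}) cells, and each cell yields one ranking, so:
\[
\#\{\text{realizable rankings}\} \;\le\; O\!\left(n^{2d}\right),
\qquad
\#\{\text{top-}k\text{ subsets}\} \;=\; \binom{n}{k},
\]
\[
\text{and covering every top-}k\text{ subset forces}\quad
n^{2d} \;\gtrsim\; \binom{n}{k}
\;\;\Longrightarrow\;\;
d \;\gtrsim\; \frac{\log\binom{n}{k}}{2\log n},
\]
% a floor independent of training data or model architecture.
```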

Encoder Architecture

1 note

Query Routing and Knowledge Structure Selection

1 note

Pass 3 Additions (2026-05-03)

7 notes

Can pretraining data statistics detect hallucinations better than model confidence?

This explores whether tracking rare entity co-occurrences in training data provides a more reliable hallucination signal than measuring model confidence. It matters because confidence-based retrieval triggers miss the model's most dangerous mistakes.
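
A minimal sketch of the rarity signal, with corpus_count() and extract_entities() as hypothetical stand-ins for an indexed corpus lookup and an NER step:

```python
# Sketch: flag generated spans whose entity pairs rarely co-occur in the
# pretraining corpus. The floor is illustrative.
RARE_FLOOR = 10

def rarity_trigger(span, corpus_count, extract_entities):
    entities = extract_entities(span)
    pairs = [(a, b) for i, a in enumerate(entities) for b in entities[i + 1:]]
    # Fluent claims linking entities the corpus almost never connects are
    # exactly where confidence-based triggers tend to stay silent.
    return any(corpus_count(a, b) < RARE_FLOOR for a, b in pairs)
```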

Should retrieval triggers use model confidence or data rarity?

FLARE and QuCo-RAG propose different signals for when to retrieve in RAG systems. Are these competing approaches, or do they each catch distinct failure modes that a combined strategy could address?
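
A sketch of the combined strategy, reusing the rarity detector sketched above; the confidence threshold is illustrative:

```python
# Sketch: treat the two signals as complementary detectors and retrieve
# when either fires. rarity_check is a callable like rarity_trigger above.
def should_retrieve(span, token_probs, rarity_check, conf_threshold=0.4):
    low_confidence = bool(token_probs) and min(token_probs) < conf_threshold
    rare_claim = rarity_check(span)  # catches confidently wrong spans
    return low_confidence or rare_claim
```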

How can video retrieval handle multiple modalities at different times?

Video RAG systems struggle because the same content appears across visual, audio, and subtitle tracks at offset timestamps. Can temporal awareness in text ranking and frame sampling solve cross-modal misalignment?
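
One illustrative approach, assumed rather than drawn from any particular system: cluster per-modality hits into time windows with a tolerance, so a subtitle at t=41s can corroborate a frame at t=38s.

```python
# Sketch: tolerance-based temporal clustering of cross-modal hits.
def merge_hits(hits, tolerance=5.0):
    # hits: (timestamp_seconds, modality, score) tuples in any order
    if not hits:
        return []
    hits = sorted(hits)
    clusters, current = [], [hits[0]]
    for h in hits[1:]:
        if h[0] - current[-1][0] <= tolerance:
            current.append(h)
        else:
            clusters.append(current)
            current = [h]
    clusters.append(current)
    # Prefer moments corroborated by several modalities, then by total score.
    return sorted(clusters,
                  key=lambda c: (len({m for _, m, _ in c}),
                                 sum(s for _, _, s in c)),
                  reverse=True)
```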

Can RAG systems refuse to answer without reliable evidence?

Explores whether retrieval-augmented generation can be designed to abstain from answering when sources are corrupted or insufficient, rather than filling gaps with plausible-sounding guesses. This matters for historical texts, where OCR errors and language drift are common.
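
A minimal sketch of the abstention gate; the score field, threshold, and minimum-document rule are placeholders for whatever calibration a deployment actually uses:

```python
# Sketch: abstain instead of guessing when evidential support is thin.
def answer_or_abstain(query, retrieve, generate,
                      min_score=0.55, min_docs=2):
    docs = retrieve(query, k=10)
    strong = [d for d in docs if d.score >= min_score]
    if len(strong) < min_docs:
        return "I cannot answer this reliably from the available sources."
    return generate(query, strong)
```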

Can we defend RAG systems from corpus poisoning without retraining?

Explores whether retrieval-time defenses can catch and block poisoned documents before they reach the generator, without expensive retraining cycles. This matters because corpus updates outpace model retraining in production RAG systems.
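
One illustrative retrieval-time filter (not a specific published defense): drop retrieved passages whose embeddings are outliers against the consensus of the rest of the top-k.

```python
# Sketch: poisoned passages are often crafted to win the ranking while
# sitting apart from the organic results, so flag consensus outliers.
import numpy as np

def filter_outliers(embeddings, z_cut=2.0):
    # embeddings: (k, d) array of unit-normalized retrieved-passage vectors
    centroid = embeddings.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    sims = embeddings @ centroid
    z = (sims - sims.mean()) / (sims.std() + 1e-8)
    return [i for i in range(len(embeddings)) if z[i] > -z_cut]  # keep non-outliers
```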

Why do queries and their causes seem semantically different?

Information retrieval systems find passages matching query language, but what if the segment that actually caused a user's question says something quite different? This explores when semantic similarity fails to find causal relevance.

How should LLM-based recommenders retrieve from massive item corpora?

When conversational recommenders need to search millions of items, the LLM cannot memorize the corpus. What retrieval strategies work best under different constraints, and how do they trade off latency, sample efficiency, and scalability?
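
The standard decomposition is two-stage: an ANN index narrows millions of items to a shortlist the LLM can actually read, then the LLM ranks only that shortlist. A sketch using faiss, with encode-side details and llm_rank() as placeholders:

```python
# Sketch: cheap, scalable recall stage followed by an expensive LLM stage.
import faiss
import numpy as np

def build_index(item_vectors):
    # item_vectors: (n_items, d) float32, one embedding per catalog item
    index = faiss.IndexFlatIP(item_vectors.shape[1])
    index.add(item_vectors)
    return index

def recommend(query_vec, index, items, llm_rank, shortlist=200, final=10):
    q = np.asarray(query_vec, dtype="float32").reshape(1, -1)
    _, ids = index.search(q, shortlist)       # ANN recall over millions
    candidates = [items[i] for i in ids[0]]
    return llm_rank(candidates)[:final]       # LLM reads only the shortlist
```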
