Can factually wrong generated documents still improve retrieval accuracy?

This explores whether a generated 'document' can sharpen retrieval even when its facts are wrong — i.e., whether generation's value to search is about bridging vocabulary and surfacing intent rather than being correct.

The corpus suggests yes, because the generated text isn't being used as an *answer*; it's being used as a *better query*. The cleanest evidence is ITER-RETGEN, where feeding a model's own generated response back in as the next retrieval query substantially improves multi-hop reasoning and fact verification (Can a model's partial response guide what to retrieve next?). The mechanism there has nothing to do with the generation being true: a draft answer, even a wrong one, names entities, phrasings, and intermediate steps that the original question left implicit. It closes the gap between what you asked and what the corpus actually says.
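A minimal sketch of the ITER-RETGEN loop, assuming a toy word-overlap retriever and a scripted stand-in for the language model. `toy_generate`, the three-document corpus, and the stopword list are all hypothetical. The first draft names Gustave Eiffel but gives the wrong birthplace. Appending that wrong draft to the next query is still what pulls in the document with the right answer.

```python
import re

# Hypothetical stopword list for the toy retriever.
STOPWORDS = {"a", "by", "in", "is", "of", "the", "was", "what"}


def content_terms(text: str) -> set[str]:
    """Lowercased word set with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Toy lexical retriever: rank by shared content terms (ties keep corpus order)."""
    q = content_terms(query)
    ranked = sorted(corpus, key=lambda doc: len(q & content_terms(doc)), reverse=True)
    return ranked[:k]


def iter_retgen(question, corpus, generate, rounds=2, k=1):
    draft = ""
    for _ in range(rounds):
        # Round 1 queries with the bare question; later rounds append the previous
        # draft, so entities it names (right or wrong) steer the next retrieval.
        query = f"{question} {draft}".strip()
        draft = generate(question, retrieve(query, corpus, k))
    return draft


def toy_generate(question, docs):
    """Hypothetical stand-in for an LLM call."""
    evidence = " ".join(docs)
    if "born in Dijon" in evidence:
        return "Gustave Eiffel was born in Dijon."
    # No birthplace evidence retrieved yet: a plausible but factually wrong draft.
    return "Gustave Eiffel was born in Lyon."


CORPUS = [
    "The Eiffel Tower opened in Paris in 1889.",
    "Gustave Eiffel was born in Dijon.",
    "Dijon is a city in Burgundy.",
]
QUESTION = "What is the hometown of the designer of the Eiffel Tower?"

print(iter_retgen(QUESTION, CORPUS, toy_generate, rounds=1))  # Gustave Eiffel was born in Lyon.
print(iter_retgen(QUESTION, CORPUS, toy_generate, rounds=2))  # Gustave Eiffel was born in Dijon.
```

Note that `iter_retgen` never checks the draft for truth. It only reuses the draft's words, which is exactly the property the argument relies on.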

Why that works comes into focus when you look at where retrieval breaks. Embeddings measure *association*, not relevance, and there is a hard mathematical ceiling on how many distinct document sets a fixed embedding dimension can even represent (Where do retrieval systems fail and why?). A short user query lands in a sparse, ambiguous neighborhood of that space. A generated pseudo-document, factually wrong or not, is longer and denser; it lands closer to the real target documents because it *talks like them*. The win is geometric, not epistemic. CLaRa draws the same line between similarity and usefulness: the gap between 'looks similar' and 'actually helps answer' closes when retrieval gets feedback from generation success (Can retrieval learn what actually helps answer questions?). A generated artifact's job in retrieval, then, is to steer the search, a role orthogonal to its truth value.
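The geometric point can be illustrated with a lexical stand-in. This sketch assumes binary bag-of-words vectors in place of dense embeddings and uses a hard-coded, hypothetical pseudo-document in place of real model output. The pseudo-document gives a wrong cause for wilting, yet it shares the target passage's vocabulary. As a result, it retrieves that passage where the short query does not.

```python
import math
import re

# Hypothetical stopword list; binary bag-of-words stands in for dense embeddings.
STOPWORDS = {"by", "do", "from", "is", "my", "the", "their", "when", "why"}


def terms(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def cosine(a, b):
    """Cosine similarity of binary term vectors: |A & B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def best_match(probe, corpus):
    """Return the corpus document most similar to the probe text."""
    probe_terms = terms(probe)
    return max(corpus, key=lambda doc: cosine(probe_terms, terms(doc)))


CORPUS = [
    "Dogs droop their ears when anxious.",
    "Houseplant wilting is usually caused by root rot from overwatering.",
]
query = "why do my plants droop"
# Hypothetical model draft: wrong cause (magnetic fields), but corpus-like wording.
pseudo_doc = "Houseplant wilting is usually caused by magnetic fields damaging the root."

print(best_match(query, CORPUS))       # the dog sentence: only "droop" overlaps
print(best_match(pseudo_doc, CORPUS))  # the overwatering passage
```

The query's word "plants" never matches "houseplant", so the query alone is pulled toward the distractor by a single shared verb. The pseudo-document bridges that vocabulary gap despite being wrong.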

There's a related move that makes the point even sharper: MiA-RAG generates a global *summary* of a document first and conditions retrieval on it, recovering discourse structure that chunk-level similarity destroys (Can building a document map first improve retrieval over long texts?). The summary is a synthetic, lossy, potentially distorted representation, and it still improves which evidence gets found, because it supplies structural scaffolding the raw query lacks. The generated text functions as a map, not as a fact.
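The source does not spell out how MiA-RAG conditions retrieval on its summary. This sketch therefore assumes the simplest possible form, in which the summary's terms are folded into the query before chunks are scored. Scoring is binary bag-of-words cosine, and the chunks and summary are invented for illustration.

```python
import math
import re

STOPWORDS = {"a", "is", "its", "of", "the", "what"}  # hypothetical


def terms(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def cosine(a, b):
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0


def top_chunk(query, chunks, summary=None):
    """Pick the best chunk; optionally fold a global summary's terms into the probe."""
    probe = terms(query) | (terms(summary) if summary else set())
    return max(chunks, key=lambda chunk: cosine(probe, terms(chunk)))


CHUNKS = [
    "The introduction reviews prior long-context methods.",
    "Accuracy drops sharply once inputs exceed the training window.",
    "A limitation of older benchmarks is label noise.",
]
# Hypothetical model-written summary of the whole document.
SUMMARY = ("The paper proposes a long-context method; its main limitation "
           "is lower accuracy past the training window.")
QUERY = "What is the main limitation of the method?"

print(top_chunk(QUERY, CHUNKS))           # distractor: shares only "limitation"
print(top_chunk(QUERY, CHUNKS, SUMMARY))  # the evidence chunk about the training window
```

The evidence chunk never uses the word "limitation". It only becomes findable once the summary states which role that chunk plays in the document.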

But the corpus also marks the cliff edge, and it's worth knowing where 'wrong-but-useful' flips to 'wrong-and-toxic.' The decisive variable is whether the generated text re-enters the corpus as *content* rather than staying a transient *query*. Bidirectional RAG only lets generated answers join the retrieval base after they pass entailment, attribution, and novelty checks, precisely because unverified generations pollute future retrievals (Can RAG systems safely learn from their own generated answers?). And once false text is in the corpus, it behaves like poisoning, with defenses needed at the retrieval layer to bound its influence (Can we defend RAG systems from corpus poisoning without retraining?). So the honest synthesis is a clean split: factually wrong generation can *guide* retrieval (as a query, a map, a steering signal) precisely because nobody trusts it as an answer. The moment you trust it enough to store it, its wrongness stops helping and starts compounding.
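A minimal sketch of gated write-back in the spirit of Bidirectional RAG. The three gates mirror the checks named above, but the implementations here are crude, hypothetical lexical proxies. A real system would use an entailment model rather than term containment. `Generation`, `maybe_write_back`, and the 0.8 novelty threshold are illustrative names and values, not part of any published API.

```python
import re
from dataclasses import dataclass

STOPWORDS = {"a", "and", "in", "the", "was"}  # hypothetical


@dataclass
class Generation:
    text: str
    cited_ids: list[str]  # ids of corpus documents the answer claims to rely on


def terms(text):
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def attributed(gen, corpus):
    """Attribution gate: cites at least one source, and every cited id exists."""
    return bool(gen.cited_ids) and all(i in corpus for i in gen.cited_ids)


def entailed(gen, corpus):
    """Crude lexical proxy for an entailment model (assumption): every content term
    of the answer must appear in some cited source. Assumes attributed() passed."""
    support = set()
    for i in gen.cited_ids:
        support |= terms(corpus[i])
    return terms(gen.text) <= support


def novel(gen, corpus, max_jaccard=0.8):
    """Novelty gate: reject near-duplicates of anything already stored."""
    t = terms(gen.text)
    if not t:
        return False
    return all(len(t & terms(d)) / len(t | terms(d)) < max_jaccard for d in corpus.values())


def maybe_write_back(gen, corpus, new_id):
    """Store a generation as corpus content only if all three gates pass."""
    if attributed(gen, corpus) and entailed(gen, corpus) and novel(gen, corpus):
        corpus[new_id] = gen.text
        return True
    return False


corpus = {
    "d1": "Gustave Eiffel was born in Dijon.",
    "d2": "Gustave Eiffel designed the Eiffel Tower.",
}
wrong = Generation("Gustave Eiffel was born in Lyon.", ["d1"])
restated = Generation("Gustave Eiffel was born in Dijon.", ["d1"])
synthesis = Generation("Gustave Eiffel designed the Eiffel Tower and was born in Dijon.", ["d1", "d2"])

print(maybe_write_back(wrong, corpus, "g1"))      # False: "lyon" is unsupported
print(maybe_write_back(restated, corpus, "g2"))   # False: near-duplicate of d1
print(maybe_write_back(synthesis, corpus, "g3"))  # True: supported and new
```

The wrong draft is the same kind of text that helped as a query in the earlier sketches. Here the entailment gate is what keeps it out of the corpus.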

Sources (6 notes)

Can a model's partial response guide what to retrieve next?

ITER-RETGEN shows that iteratively using generated responses as retrieval queries substantially improves performance on multi-hop reasoning and fact verification. Generation acts as both answer producer and information-need clarifier, surfacing implicit gaps that the original query missed.

Where do retrieval systems fail and why?

RAG systems fail at three structural levels: adaptive triggering (fixed intervals waste context), semantic-task mismatch (embeddings measure association, not relevance), and mathematical limits (embedding dimension constrains representable document sets). These require fundamentally different retrieval approaches, not tuning.

Can retrieval learn what actually helps answer questions?

CLaRa propagates generator loss back through continuous document representations, allowing retrievers to optimize for documents that actually improve answers rather than surface similarity. The gap between relevance and usefulness closes when retrieval receives direct feedback from generation success.
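To make "feedback from generation success" concrete, here is a toy gradient sketch. It is not CLaRa's method. It assumes a softmax over query-document dot products, plus a hypothetical per-document helpfulness score `h_i` that stands in for the generator's loss. It then runs gradient descent on `L = -log(sum_i p_i * h_i)`. The query starts aligned with the document that merely looks similar and drifts toward the one that actually helps.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_query(q, docs, helpfulness, lr=0.5, steps=50):
    """Gradient descent on L = -log(sum_i p_i * h_i), with p = softmax(q . d_i).

    helpfulness[i] (h_i) is a hypothetical score for how much document i helped
    the generator; it stands in for a real generator loss.
    """
    q = list(q)
    for _ in range(steps):
        p = softmax([dot(q, d) for d in docs])
        big_p = sum(pi * hi for pi, hi in zip(p, helpfulness))
        # Chain rule through the softmax: dL/ds_j = -p_j * (h_j - P) / P
        grad_s = [-pj * (hj - big_p) / big_p for pj, hj in zip(p, helpfulness)]
        # Scores are s_j = q . d_j, so dL/dq = sum_j dL/ds_j * d_j
        grad_q = [sum(g * d[k] for g, d in zip(grad_s, docs)) for k in range(len(q))]
        q = [qk - lr * gk for qk, gk in zip(q, grad_q)]
    return q


DOCS = [[1.0, 0.0], [0.0, 1.0]]  # doc 0 looks similar; doc 1 actually helps
HELPFULNESS = [0.1, 0.9]         # hypothetical usefulness signals
q0 = [1.0, 0.0]                  # query embedding initially aligned with doc 0

before = softmax([dot(q0, d) for d in DOCS])
after = softmax([dot(train_query(q0, DOCS, HELPFULNESS), d) for d in DOCS])
print(f"P(retrieve doc 1): {before[1]:.2f} -> {after[1]:.2f}")
```

Nothing in the update rule refers to surface similarity. The retriever moves only because the downstream signal rewards one document over the other.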

Can building a document map first improve retrieval over long texts?

MiA-RAG inverts standard RAG by summarizing documents first, then conditioning retrieval on that global view. This approach recovers discourse structure that bag-of-chunks retrieval destroys, making scattered evidence findable by its document role rather than by surface similarity alone.

Can RAG systems safely learn from their own generated answers?

Systems can add generated answers to their retrieval corpus when outputs pass entailment verification, source attribution checks, and novelty detection. This prevents hallucinations from polluting future retrievals while allowing genuine knowledge accumulation.

Can we defend RAG systems from corpus poisoning without retraining?

RAGPart and RAGMask provide lightweight, retraining-free defenses that operate at the retrieval layer. RAGPart bounds poisoned-document influence via partitioned retriever learning; RAGMask flags suspicious documents through abnormal similarity collapse under token masking.
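A toy illustration of the masking idea behind RAGMask, under loudly stated assumptions. Similarity is term-frequency cosine rather than a learned embedding, masking removes one term type at a time, and the 0.5 threshold is arbitrary. The poisoned document owes all of its similarity to one stuffed keyword, so masking that keyword collapses its score to zero. By contrast, the benign document's relevance is spread across several terms and degrades gracefully.

```python
import math
import re
from collections import Counter

STOPWORDS = {"a", "by", "in", "is", "the", "was", "who"}  # hypothetical


def tf(text):
    """Term-frequency vector (stand-in for a learned embedding)."""
    return Counter(w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS)


def cosine(a, b):
    dot = sum(count * b[term] for term, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def max_mask_drop(query, doc):
    """Largest relative similarity drop from masking any one term of the document."""
    q, d = tf(query), tf(doc)
    base = cosine(q, d)
    if base == 0:
        return 0.0
    drops = []
    for term in d:
        masked = Counter({t: c for t, c in d.items() if t != term})
        drops.append(1 - cosine(q, masked) / base)
    return max(drops)


def flag_suspicious(query, docs, threshold=0.5):
    """Flag documents whose similarity collapses when a single term is masked."""
    return [doc for doc in docs if max_mask_drop(query, doc) > threshold]


QUERY = "Who founded the Acme rocket company?"
BENIGN = "The Acme rocket company was founded in 1950 by engineers."
POISON = "Acme acme acme acme acme acme acme acme is a scam."

print(flag_suspicious(QUERY, [BENIGN, POISON]))  # flags only POISON
```

Both documents have nonzero similarity to the query, so both would be candidates for retrieval. Only the masking test separates them.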
