Knowledge Retrieval and RAG

Can we defend RAG systems from corpus poisoning without retraining?

Explores whether retrieval-time defenses can catch and block poisoned documents before they reach the generator, without expensive retraining cycles. Matters because corpus updates outpace model retraining in production RAG systems.

Note · 2026-05-03 · sourced from 12 types of RAG

RAG poisoning attacks insert malicious documents into the retrieval corpus so they get pulled in for matching queries and steer generation toward attacker-preferred outputs. Existing defenses typically require retraining the retriever or the generator, which is expensive and slow to deploy. RAGPart and RAGMask propose two lightweight defenses that operate at retrieval time without modifying the generation model.

RAGPart exploits a structural property of dense retrievers: they learn discriminative patterns from how the training data is partitioned, which means malicious documents inserted into one partition have predictably limited influence on retrieval from queries that match a different partition. By configuring partitions deliberately, the system bounds how far any single poisoned document can propagate. RAGMask takes a different angle: it masks tokens in candidate documents and watches for abnormal similarity shifts. Genuine documents are robust to token masking — their similarity scores degrade smoothly — while poisoned documents that rely on specific trigger tokens show sudden similarity collapse, which serves as a detection signal.

The architectural significance is that defense need not be coupled to training. Both methods sit at the retrieval layer and treat the generator as an untrusted black box that must be protected from upstream corruption. This separation matters operationally because retrieval corpora update faster than retrievers can be retrained, so defenses that require retraining are always behind the threat. The threat surface is real and severe — How vulnerable is GraphRAG to tiny text manipulations? shows even minimal corpus modifications can devastate accuracy in graph-structured RAG.


Source: 12 types of RAG

Related concepts in this collection

Concept map
13 direct connections · 102 in 2-hop network ·medium cluster

Click a node to walk · click center to open · click Open full network for a force-directed map

your link semantically near linked from elsewhere
Original note title

RAG corpus poisoning has lightweight defenses without retraining — partition-aware retrieval and token-masking similarity shifts catch attacks the generator never sees