How vulnerable is GraphRAG to tiny text manipulations?
GraphRAG converts raw text into knowledge graphs for question answering. This note explores whether adversaries can degrade accuracy with minimal edits to source documents, and what makes the system susceptible.
GraphRAG relies on LLMs to extract knowledge from raw text during graph construction — and this extraction process can be manipulated with minimal text changes. Two complementary attacks demonstrate the vulnerability:
Targeted KPA (TKPA): uses graph-theoretic analysis to locate vulnerable nodes in the generated graph, then rewrites the corresponding narratives with LLMs. It achieves a 93.1% success rate at steering specific QA outcomes while keeping the poisoned text fluent and natural. This is the precision attack: it makes specific queries return attacker-desired answers.
Universal KPA (UKPA): exploits linguistic cues (pronouns, dependency relations) to disrupt the structural integrity of the generated graph by altering globally influential words. With fewer than 0.05% of the text modified, QA accuracy collapses from 95% to 50%. This is the breadth attack: small modifications corrupt reasoning across many queries.
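The targeting step common to both attacks can be sketched in a few lines: score nodes of the extracted graph by a structural influence measure, then pick the top candidates whose source narratives get rewritten. Everything here is illustrative, not from the paper: the toy graph is invented, and degree centrality stands in for whatever graph-theoretic vulnerability criterion the attacks actually use.

```python
# Toy knowledge graph as adjacency sets: entity -> related entities.
# Entities and edges are invented for illustration.
kg = {
    "Marie Curie": {"Pierre Curie", "radium", "Sorbonne"},
    "Pierre Curie": {"Marie Curie", "radium"},
    "radium": {"Marie Curie", "Pierre Curie", "radioactivity"},
    "radioactivity": {"radium"},
    "Sorbonne": {"Marie Curie"},
}

def influence_scores(graph):
    """Degree centrality: a stand-in for the attack's vulnerability metric."""
    n = len(graph) - 1  # normalize by the number of possible neighbors
    return {node: len(neighbors) / n for node, neighbors in graph.items()}

# An attacker would rewrite the narratives mentioning the top-scoring
# entities, since extraction errors there touch the most reasoning paths.
scores = influence_scores(kg)
targets = sorted(scores, key=scores.get, reverse=True)[:2]
print(targets)  # ['Marie Curie', 'radium'] — the highest-degree entities
```

The same scoring idea serves both variants: TKPA picks nodes relevant to a chosen query, while UKPA goes after globally influential words whose corruption degrades many queries at once.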
The critical distinction from prior adversarial attacks on RAG: this is a manipulation-only attack surface. The adversary injects no new content; they make subtle edits to existing trusted sources (e.g., minor Wikipedia changes). The corrupted graph structure persists after construction and misleads every subsequent query built on it, and stealth follows implicitly from restricting the edits to tiny modifications of trusted sources.
The structural vulnerability: GraphRAG's strength (converting unstructured text into structured knowledge) becomes its weakness because the LLM extraction step is sensitive to small perturbations that propagate through the graph. Entity and relation extraction errors compound through graph topology — a single misattributed relationship can redirect entire reasoning paths.
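The propagation claim can be made concrete with a toy multi-hop query. The graphs and entities below are hypothetical (not from the paper); the point is that flipping a single extracted edge reroutes the whole traversal:

```python
def two_hop(graph, start):
    """Follow two relations from the query entity — a toy reasoning path."""
    mid = graph[start][0]
    return graph[mid][0]

# Clean extraction: "Which institution is linked to radium's discovery?"
clean = {"radium": ["Marie Curie"], "Marie Curie": ["Sorbonne"]}

# Poisoned extraction: one misattributed edge (radium -> Rival Lab)
# sends the same traversal down an entirely different path.
poisoned = {"radium": ["Rival Lab"], "Rival Lab": ["Other University"]}

print(two_hop(clean, "radium"))     # Sorbonne
print(two_hop(poisoned, "radium"))  # Other University — path redirected
```

No query-time component was attacked here: the retrieval logic is identical in both runs, which is why the corruption persists across every query that touches the poisoned region of the graph.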
This connects to:
- How much poisoned training data survives safety alignment? — knowledge poisoning operates at a different level (corpus text rather than training data) but shares the principle that small contamination has outsized downstream effects
- Can knowledge graphs enable multi-hop reasoning in one retrieval step? — HippoRAG's KG construction faces the same extraction vulnerability; graph-based retrieval amplifies poisoning because errors propagate through relational traversal
- How vulnerable are reasoning models to irrelevant text? — CatAttack operates on model inputs while KPA operates on knowledge sources, but both demonstrate that minimal perturbations have disproportionate effects on reasoning systems
Original note title: Knowledge poisoning attacks collapse GraphRAG accuracy from 95 to 50 percent by modifying fewer than 0.05 percent of source text words