A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models
Graph-based Retrieval-Augmented Generation (GraphRAG) has recently emerged as a promising paradigm for enhancing large language models (LLMs) by converting raw text into structured knowledge graphs, improving both accuracy and explainability. However, GraphRAG relies on LLMs to extract knowledge from raw text during graph construction, and this process can be maliciously manipulated to implant misleading information. Targeting this attack surface, we propose two knowledge poisoning attacks (KPAs) and demonstrate that modifying only a few words in the source text can significantly change the constructed graph, poison the GraphRAG system, and severely mislead downstream reasoning. The first attack, Targeted KPA (TKPA), uses graph-theoretic analysis to locate vulnerable nodes in the generated graphs and rewrites the corresponding narratives with LLMs, achieving precise control over specific question-answering (QA) outcomes with a success rate of 93.1% while keeping the poisoned text fluent and natural. The second attack, Universal KPA (UKPA), exploits linguistic cues such as pronouns and dependency relations to disrupt the structural integrity of the generated graph by altering globally influential words. With fewer than 0.05% of the full text modified, QA accuracy collapses from 95% to 50%.
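The abstract does not spell out how TKPA locates vulnerable nodes, but the underlying graph-theoretic idea can be illustrated with a minimal sketch: ranking entities in a toy knowledge graph by betweenness centrality, here used as an assumed proxy for structural influence. The graph, entity names, and choice of centrality measure below are all illustrative, not the authors' actual method.

```python
# Illustrative sketch (not the paper's TKPA algorithm): rank nodes of a toy
# entity-relation graph by betweenness centrality. High-centrality entities
# lie on many shortest paths, so corrupting their narratives plausibly
# affects many retrieval routes at once.
import networkx as nx

# Hypothetical knowledge graph extracted from a small corpus.
G = nx.Graph()
G.add_edges_from([
    ("Curie", "Physics"), ("Curie", "Radium"),
    ("Radium", "Chemistry"), ("Physics", "Nobel Prize"),
    ("Curie", "Nobel Prize"), ("Chemistry", "Nobel Prize"),
])

# Betweenness centrality: fraction of all shortest paths passing through a node.
centrality = nx.betweenness_centrality(G)
ranked = sorted(centrality, key=centrality.get, reverse=True)
print(ranked[:2])  # candidate nodes for targeted narrative rewriting
```

In this toy graph, "Curie" and "Nobel Prize" dominate the shortest-path structure, so an attacker optimizing for downstream impact would edit the sentences from which those nodes were extracted.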
Prior poisoning attacks show that crafted additions to the corpus can distort the resulting graph and mislead multiple queries once the graph is built. An unexplored question is whether GraphRAG remains vulnerable when the adversary cannot add new text and can only make small, subtle modifications to the existing corpus. In this work, we reveal a manipulation-only attack surface for GraphRAG: even without introducing additional content, changing a few words in the existing corpus can distort the entities and relations extracted during graph construction, and the corrupted structure then persists and misleads a broad range of queries. This threat corresponds to subtle edits to trusted sources (e.g., minor changes to a Wikipedia article) rather than the injection of obviously malicious content. Such manipulations raise two key questions: to what extent can a few edits change the behavior of a GraphRAG system, and do these changes take a targeted or a widespread form?
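To make the manipulation-only threat concrete, the following minimal sketch shows how a single-word edit can flip the relation recorded in a knowledge graph. The extractor here is a hand-written pattern matcher, an assumed stand-in for the LLM-based extraction pipeline that GraphRAG actually uses; the sentences and relation vocabulary are likewise illustrative.

```python
# Toy illustration (assumed, not the paper's pipeline): a rule-based triple
# extractor demonstrating that a one-word edit to trusted text changes the
# (subject, relation, object) triple written into the graph.
import re

# Tiny relation vocabulary for the sketch.
RELATION_PATTERN = r"([A-Z][\w ]+?) (discovered|refuted|invented) ([\w ]+?)\."

def extract_triples(text):
    """Return (subject, relation, object) triples matched in `text`."""
    return re.findall(RELATION_PATTERN, text)

original = "Marie Curie discovered radium."
poisoned = "Marie Curie refuted radium."  # single-word manipulation

print(extract_triples(original))
print(extract_triples(poisoned))
```

The edit leaves the sentence fluent and the corpus size unchanged, yet every downstream query routed through this triple now retrieves the inverted relation.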
We address these questions by focusing on two complementary objectives: precision, the ability to make specific queries return attacker-desired answers with only a few edits; and breadth, the ability to corrupt the graph broadly with small, subtle modifications, degrading reasoning across many queries. Although stealthiness is not an explicit objective, it is implicitly achieved by restricting the attack to very small edits on trusted sources.