Knowledge Retrieval and RAG

Can community detection enable RAG systems to answer global corpus questions?

Standard RAG struggles with corpus-wide questions that require understanding overall themes rather than retrieving specific passages. Can graph community detection overcome this limitation at scale?

Note · 2026-02-23 · sourced from Knowledge Graphs

Standard RAG fails on global questions directed at an entire text corpus (e.g., "What are the main themes in the dataset?") because these are query-focused summarization (QFS) tasks rather than explicit retrieval tasks. Prior QFS methods, in turn, do not scale to the quantities of text indexed by typical RAG systems. GraphRAG addresses both limitations.

The two-stage approach:

  1. Graph construction: An LLM extracts named entities and relationships from the source documents, building an entity knowledge graph whose edges are weighted by normalized counts of detected relationship instances. A secondary extraction captures claims linked to the detected entities (subject, object, type, description, source span, dates).
  2. Community-based summarization: The Leiden algorithm partitions the graph into hierarchical communities of closely related entities. An LLM then generates a report-like summary for each community at each hierarchy level. These summaries are pre-generated and are independently useful for understanding the global structure of the dataset.
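The two stages above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the GraphRAG implementation: the `relations` list stands in for LLM extraction output, dividing by the maximum count is only one possible reading of "normalized counts", and grouping by connected component is a crude stand-in for Leiden (which would also split densely connected components and produce a hierarchy).

```python
from collections import Counter, defaultdict

# Hypothetical output of the LLM extraction step: one (entity, entity) pair
# per detected relationship instance.
relations = [
    ("Alice", "Acme"), ("Alice", "Acme"), ("Acme", "Bob"),
    ("Carol", "Dave"), ("Dave", "Erin"), ("Carol", "Erin"),
]

def build_weighted_graph(pairs):
    """Map each undirected edge to its instance count divided by the largest count.

    Assumption: the note only says "normalized counts"; max-scaling is illustrative.
    """
    counts = Counter(frozenset(pair) for pair in pairs if pair[0] != pair[1])
    largest = max(counts.values())
    return {edge: count / largest for edge, count in counts.items()}

def connected_groups(edges):
    """Stand-in for Leiden: group entities by connected component."""
    neighbours = defaultdict(set)
    for edge in edges:
        u, v = tuple(edge)
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

graph = build_weighted_graph(relations)
communities = connected_groups(graph)
```

Each resulting group would then be passed to an LLM to produce its community summary.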

Given a question, the LLM generates a partial response from each community summary, and all partial responses are then summarized into a final global answer (a map-reduce pattern). This exploits a previously unexplored quality of graphs: their inherent modularity, and the ability of community detection algorithms to partition them into coherent groups.
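The map-reduce step can be sketched as follows. The function name `global_answer`, the prompt wording, and the `llm` callable are all hypothetical; any function that takes a prompt string and returns a completion string would fit. A stub model is used here so the control flow can be checked without a real LLM.

```python
from typing import Callable, List

def global_answer(question: str, summaries: List[str], llm: Callable[[str], str]) -> str:
    """Answer a corpus-wide question from pre-generated community summaries."""
    # Map: one partial answer per community summary.
    partials = [
        llm(f"Question: {question}\nCommunity summary: {summary}\nPartial answer:")
        for summary in summaries
    ]
    # Reduce: summarize all partial answers into one global answer.
    joined = "\n".join(f"- {partial}" for partial in partials)
    return llm(f"Question: {question}\nPartial answers:\n{joined}\nFinal answer:")

# Stub model (assumption for illustration): records prompts, returns numbered replies.
calls: List[str] = []

def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"reply {len(calls)}"

answer = global_answer(
    "What are the main themes?",
    ["Summary A", "Summary B", "Summary C"],
    fake_llm,
)
```

Because the map calls are independent of one another, they could be issued in parallel; only the reduce call needs all partial answers.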

The community summaries serve dual purposes: (1) answering questions via map-reduce, and (2) enabling sensemaking in the absence of a specific question. In the second case, users can scan the community summaries at one hierarchy level for themes, then follow links to lower-level reports for subtopic details.
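One way to picture the browsing use case is as a tree of reports, scanned one level at a time. The `CommunityReport` type, its fields, and the sample titles below are hypothetical and only illustrate the scan-then-drill-down idea.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CommunityReport:
    title: str
    summary: str
    level: int  # 0 = top of the hierarchy (assumed numbering)
    children: List["CommunityReport"] = field(default_factory=list)

def reports_at_level(roots: List[CommunityReport], level: int) -> List[CommunityReport]:
    """Collect every report at the requested hierarchy level, in document order."""
    found = []
    stack = list(reversed(roots))
    while stack:
        report = stack.pop()
        if report.level == level:
            found.append(report)
        else:
            # Not the requested level yet: descend into sub-communities.
            stack.extend(reversed(report.children))
    return found

# Hypothetical sample hierarchy.
energy = CommunityReport(
    "Energy", "Top-level theme.", 0,
    [
        CommunityReport("Solar", "Subtopic report.", 1),
        CommunityReport("Wind", "Subtopic report.", 1),
    ],
)

themes = [r.title for r in reports_at_level([energy], 0)]
subtopics = [r.title for r in reports_at_level([energy], 1)]
```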

This represents a fundamentally different use of graphs in RAG: not for structured retrieval and traversal (as in HippoRAG or LogicRAG), but for modular summarization that provides complete coverage of the underlying corpus.

Original note title

GraphRAG uses community detection to enable global query-focused summarization that neither pure RAG nor pure summarization can achieve at scale