Can graph-based retrieval with knowledge graphs scale to multi-hop reasoning?
This explores whether building a knowledge graph over your documents — and then walking its connections — can answer questions that require chaining several facts together (multi-hop), and whether that approach holds up as the graph and the reasoning get bigger.
This explores whether graph-based retrieval can handle multi-hop reasoning — questions where the answer lives across several connected facts, not in any single chunk — and whether that scales. The short version from the corpus: yes, and the more interesting story is *how* the field got past the obvious bottleneck. The naive worry is that traversing a graph hop-by-hop means many expensive retrieval rounds. HippoRAG's answer is to skip the round-trips entirely: convert the corpus into a knowledge graph, then run Personalized PageRank seeded from the query's concepts to gather a whole multi-hop path in a single retrieval step — matching iterative methods while running 10–20x cheaper and 6–13x faster Can knowledge graphs enable multi-hop reasoning in one retrieval step?. So scaling isn't about doing more hops; it's about collapsing the hops into one structural operation.
But a single PageRank sweep reads the whole graph, which itself doesn't scale to large graphs or tight context windows. Two opposing fixes show up in the corpus. One says *read less of the graph*: Graph-O1 replaces whole-graph ingestion with learned step-by-step navigation using Monte Carlo Tree Search and RL, fitting inside the context window by trading certainty about the full graph for smart decisions under uncertainty Can learned traversal policies beat exhaustive graph reading?. The other says *don't pre-build the graph at all*: LogicRAG constructs a query-specific directed graph at inference time, dodging the construction cost and the staleness that plagues a giant pre-built corpus graph Can query-time graph construction replace pre-built knowledge graphs?. These are two genuinely different bets on where the scaling cost should land — build-time vs. query-time.
There's also a quieter claim that flat graphs aren't expressive enough for real multi-hop reasoning, regardless of speed. Hypergraph memory binds three-or-more entities into a single relation instead of decomposing them into pairwise edges, preserving the joint constraints that multi-step reasoning actually depends on Can hypergraphs capture multi-hop reasoning better than graphs?. Hierarchical and multimodal graphs push the same way — separating query planning from answer synthesis improves hard queries Do hierarchical retrieval architectures outperform flat ones on complex queries?, and book-scale multimodal graphs answer cross-chapter "global" questions flat retrieval simply can't reach Can multimodal knowledge graphs answer questions that flat retrieval cannot?. And StructRAG reframes the whole question: maybe no single structure scales to everything, so route each query to the structure that fits it — table, graph, algorithm, or plain chunks Can routing queries to task-matched structures improve RAG reasoning?.
What you might not expect: the graph isn't only the retrieval substrate — it's increasingly the *reasoning* substrate. Symbolic rules pulled from graph topology give a model an explicit navigational plan that beats semantic-similarity search Can symbolic rules from knowledge graphs guide complex reasoning?; small models externalize their reasoning into KG triples to crack hard tasks they'd otherwise fail Can structuring reasoning as knowledge graphs help smaller models solve complex tasks?; and graphs even generate the multi-hop training data that teaches search agents to traverse in the first place Can knowledge graphs generate training data for search agents?. The most surprising note suggests this kind of reasoning may have no ceiling at all: agentic graph reasoning self-organizes into a critical state where ~12% of edges stay "semantically surprising" despite being structurally connected, continuously generating new discoveries rather than converging Why do reasoning systems keep discovering new connections?.
If you want one doorway: start with HippoRAG to see the scaling problem dissolved in a single step, then read LogicRAG and Graph-O1 as the two camps arguing over where the remaining cost should go.
Sources 11 notes
HippoRAG converts corpus into a knowledge graph, then uses Personalized PageRank seeded from query concepts to traverse multi-hop paths in one step. It matches iterative retrieval while being 10-20x cheaper and 6-13x faster, with 20% better accuracy on multi-hop QA.
Graph-O1 replaces whole-graph ingestion with step-by-step agentic navigation using Monte Carlo Tree Search and reinforcement learning. This approach fits within LLM context windows while learning domain-specific traversal policies, though it trades certainty about the full graph for decision-making under uncertainty.
LogicRAG constructs directed acyclic graphs from queries at inference time rather than pre-building corpus-wide graphs, eliminating construction overhead, avoiding staleness, and enabling query-specific retrieval logic without sacrificing multi-hop reasoning capability.
HGMem organizes retrieved evidence as hyperedges rather than flat lists or binary graphs, allowing three or more entities to bind into single relations without decomposition. This structure accumulates coherent knowledge across retrieval steps, trading representational complexity for constraint expressiveness.
Separating query planning from answer synthesis into distinct components reduces interference and improves multi-hop query performance. This architectural principle mirrors documented benefits of separating planning from execution in agent design.
MegaRAG builds hierarchical multimodal knowledge graphs from text and visuals to answer cross-chapter, global questions that flat chunk retrieval cannot reach. The hierarchy supports abstraction levels from high-level summaries to page-specific details while treating images as first-class graph nodes.
StructRAG demonstrates that selecting knowledge structure type based on query demands—via DPO-trained router choosing among tables, graphs, algorithms, catalogues, and chunks—improves knowledge-intensive reasoning over standard retrieval. The approach grounds this in cognitive load and cognitive fit theory from cognitive science.
SymAgent derives symbolic rules from KG structure using LLM reasoning to create navigational plans that align natural language with graph topology. This approach captures structural reasoning patterns explicitly, outperforming retrieval methods that rely on semantic similarity alone.
Knowledge Graph of Thoughts (KGoT) achieves 29% improvement on GAIA Level 3 tasks using GPT-4o mini by externalizing reasoning into iteratively constructed KG triples. The approach improves transparency, reduces bias, and enables quality control over reasoning steps.
KG-based random walks with selective entity obscuring create verifiable, multi-hop questions that train deep search agents effectively. DeepDive-32B trained on this data achieves 14.8% on BrowseComp, outperforming larger models through end-to-end multi-turn RL.
Analysis shows iterative graph reasoning evolves toward a stable phase where semantic entropy persistently dominates structural entropy, with ~12% of edges remaining semantically surprising despite structural connection, fueling ongoing discovery.