Knowledge Retrieval and RAG

Can learned traversal policies beat exhaustive graph reading?

As knowledge graphs grow, can agents learn which nodes to explore rather than ingesting entire subgraphs? This explores whether MCTS and reinforcement learning can solve the context-window constraint better than dumping whole graphs into the LLM.

Note · 2026-05-03 · sourced from 12 types of RAG

Naive GraphRAG dumps the relevant subgraph into the LLM's context, which works for small knowledge graphs but breaks at scale: even moderate-sized graphs blow past context limits, and most of what gets passed in is irrelevant to the query. Graph-O1 reframes graph reasoning as an agentic search problem. Instead of reading the whole graph, an agent uses Monte Carlo Tree Search to select promising nodes and edges to explore step by step, and reinforcement learning trains the policy that decides which expansions are worthwhile.
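The search loop described above can be sketched as plain UCT-style Monte Carlo Tree Search over a toy graph. This is a minimal sketch under stated assumptions, not Graph-O1's implementation. The graph, the per-node RELEVANCE scores (standing in for how useful a node is to the query), the reward (total relevance of the nodes read), and depth_budget (standing in for the context budget) are all hypothetical. No learned policy is involved yet: rollouts pick neighbors uniformly at random.

```python
import math
import random

# Hypothetical toy knowledge graph: adjacency list from the query anchor "q".
GRAPH = {
    "q": ["a", "b", "c"],
    "a": ["d", "e"], "b": ["f"], "c": ["g", "h"],
    "d": [], "e": [], "f": ["i"], "g": [], "h": [], "i": [],
}
# Hypothetical per-node relevance to the query (unknown to the agent until read).
RELEVANCE = {"q": 0.0, "a": 0.1, "b": 0.0, "c": 0.2, "d": 0.0,
             "e": 0.3, "f": 0.1, "g": 0.0, "h": 0.9, "i": 0.4}


class SearchNode:
    """One node of the search tree: a partial traversal path through GRAPH."""

    def __init__(self, path, parent=None):
        self.path = path        # graph nodes read so far, starting at the anchor
        self.parent = parent
        self.children = {}      # next graph node -> SearchNode
        self.visits = 0
        self.value = 0.0        # sum of returns backed up through this node

    def untried(self):
        # Neighbors of the last node read that have not been expanded yet.
        return [n for n in GRAPH[self.path[-1]] if n not in self.children]


def reward(path):
    # Assumed return of a traversal: total relevance of the nodes it read.
    return sum(RELEVANCE[n] for n in path)


def uct_child(node, c=1.4):
    # UCT: mean return plus an exploration bonus for rarely visited children.
    return max(node.children.values(),
               key=lambda ch: ch.value / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))


def rollout(path, depth_budget, rng):
    # Cheap random continuation that estimates whether a branch is worth reading.
    path = list(path)
    while len(path) < depth_budget and GRAPH[path[-1]]:
        path.append(rng.choice(GRAPH[path[-1]]))
    return reward(path)


def mcts(root_id, depth_budget=3, iterations=200, seed=0):
    rng = random.Random(seed)
    root = SearchNode([root_id])
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes within the budget.
        while not node.untried() and node.children and len(node.path) < depth_budget:
            node = uct_child(node)
        # 2. Expansion: read one new neighbor if the budget allows.
        if node.untried() and len(node.path) < depth_budget:
            nxt = rng.choice(node.untried())
            child = SearchNode(node.path + [nxt], parent=node)
            node.children[nxt] = child
            node = child
        # 3. Simulation: estimate the branch with a random rollout.
        ret = rollout(node.path, depth_budget, rng)
        # 4. Backpropagation: credit every node on the selected path.
        while node is not None:
            node.visits += 1
            node.value += ret
            node = node.parent
    # Commit to the most-visited first hop from the anchor.
    return max(root.children.values(), key=lambda ch: ch.visits).path[-1]
```

Calling `mcts("q")` returns the first hop the search recommends reading. On this toy graph it should favor `"c"`, whose subtree contains the most relevant node `"h"`, even though the agent never reads the whole graph up front.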

This trades one constraint for another. The LLM no longer has to ingest the whole graph, but it does have to make navigation decisions under uncertainty about what lies beyond each unexplored edge. MCTS suits this setting because it handles the explore-exploit trade-off natively: cheap rollouts estimate whether a branch is worth deeper traversal before the agent commits to it. RL then adapts the expansion policy to the specific graph topology and query distribution instead of relying on a generic heuristic.
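The RL half can be illustrated with a REINFORCE-style policy-gradient sketch that learns which edges to follow. The source does not specify Graph-O1's features or training objective, so everything here is an assumption: the two edge features (a hypothetical relation_matches_query flag and a scaled neighbor degree), the EPISODES data standing in for a query distribution, and the 0/1 reward for reaching the answer. Treat it only as an instance of the general idea of fitting an expansion policy to data.

```python
import math
import random

# Hypothetical training data. Each episode is one query: candidate edges out of
# the current node, each with features [relation_matches_query, neighbor_degree_scaled],
# and a hidden reward (1.0 if following that edge led to the answer).
EPISODES = [
    ([[1.0, 0.2], [0.0, 0.9], [0.0, 0.5]], [1.0, 0.0, 0.0]),
    ([[0.0, 0.8], [1.0, 0.1]], [0.0, 1.0]),
    ([[0.0, 0.3], [0.0, 0.7], [1.0, 0.6]], [0.0, 0.0, 1.0]),
]


def softmax_policy(w, feats):
    # Linear scores per candidate edge, turned into probabilities (numerically stable).
    logits = [sum(wi * xi for wi, xi in zip(w, f)) for f in feats]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes, steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0]
    baseline = 0.0
    for _ in range(steps):
        feats, rewards = rng.choice(episodes)          # sample a query
        probs = softmax_policy(w, feats)
        a = rng.choices(range(len(feats)), weights=probs)[0]  # sample an edge
        r = rewards[a]
        # REINFORCE for a linear softmax policy:
        # grad log pi(a) = phi(a) - E_pi[phi]
        expected = [sum(p * f[k] for p, f in zip(probs, feats)) for k in range(len(w))]
        adv = r - baseline
        for k in range(len(w)):
            w[k] += lr * adv * (feats[a][k] - expected[k])
        # Running-average baseline to reduce gradient variance.
        baseline += 0.05 * (r - baseline)
    return w
```

After `w = train(EPISODES)`, the weight on `relation_matches_query` should be clearly positive, so `softmax_policy(w, [[0.0, 0.9], [1.0, 0.4]])` puts most of its mass on the second edge. A learned policy like this could replace the uniformly random choice inside MCTS rollouts, which is one way the two components can be combined.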

The general lesson extends beyond graphs. As context windows become the binding constraint for retrieval-heavy reasoning, the architectural pressure shifts from "fit more in" to "decide what not to read." Agentic traversal with learned policies is one way to make that decision well, and the principle should generalize to any retrieval space where exhaustive exposure is infeasible. The related note "Does reasoning ability actually degrade with longer inputs?" gives an even stronger reason to read selectively: even when content fits, reasoning over it degrades when irrelevant material is present.



Original note title

MCTS plus RL replaces whole-graph reading with selective traversal in GraphRAG — context-window limits make exhaustive graph exposure infeasible at scale