How do knowledge graphs scale as training data for open-ended search tasks?
This explores whether knowledge graphs are a good source of synthetic training data for agents that do open-ended, multi-hop search — and how well that approach holds up as you scale it.
The most direct answer in the corpus is yes, and the trick is making the questions genuinely hard. Random walks across a knowledge graph naturally generate multi-hop questions with verifiable answers, but if entities are named plainly the questions are too easy to look up. Selectively blurring entities forces an agent to actually reason and search across hops, and this is what lets a 32B model trained on synthetic graph data (DeepDive-32B, 14.8% on BrowseComp) beat much larger models on hard browsing benchmarks (see: Can knowledge graphs generate training data for search agents?). The scaling story here isn't 'more data'; it's that graph structure lets you manufacture difficulty on demand, cheaply and with built-in answer checking.
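To make the random-walk idea concrete, here is a minimal sketch under stated assumptions. The toy triples, the entity names, the `BLURRED` descriptions, and the question template are all invented for illustration; none of them come from the cited work, and a real pipeline would presumably use an LLM to phrase the obscured question.

```python
import random

# Toy knowledge graph as (head, relation, tail) triples. All entities are invented.
TRIPLES = [
    ("Ada Voss", "founded", "Northwind Labs"),
    ("Northwind Labs", "headquartered_in", "Port Elara"),
    ("Port Elara", "located_in", "Meridia"),
    ("Ada Voss", "born_in", "Port Elara"),
]

# Hypothetical vague descriptions that replace plainly named entities.
BLURRED = {
    "Ada Voss": "an engineer who started a research company",
}


def build_adjacency(triples):
    """Map each head entity to its outgoing (relation, tail) edges."""
    adjacency = {}
    for head, relation, tail in triples:
        adjacency.setdefault(head, []).append((relation, tail))
    return adjacency


def random_walk(adjacency, start, hops, rng):
    """Follow up to `hops` outgoing edges from `start` without revisiting a node."""
    path, node, seen = [], start, {start}
    for _ in range(hops):
        options = [(r, t) for r, t in adjacency.get(node, []) if t not in seen]
        if not options:
            break  # dead end: the walk is shorter than requested
        relation, nxt = rng.choice(options)
        path.append((node, relation, nxt))
        seen.add(nxt)
        node = nxt
    return path


def make_question(path, blurred):
    """Build a question whose checkable answer is the last entity on the walk.

    The start entity is replaced by a vague description and the intermediate
    entities are omitted, so the answer cannot be found with a single lookup.
    """
    start = path[0][0]
    subject = blurred.get(start, start)
    relations = " -> ".join(relation for _, relation, _ in path)
    question = f"Start from {subject}. Follow: {relations}. What do you reach?"
    return question, path[-1][2]


walk = random_walk(build_adjacency(TRIPLES), "Ada Voss", hops=3, rng=random.Random(0))
question, answer = make_question(walk, BLURRED)
print(question)
print("gold answer:", answer)
```

The gold answer falls out of the walk itself, which is the built-in answer checking described above; the obscuring step only changes how hard the question is, not what counts as correct.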
That last point, verifiable answers, is why graphs pair so well with reinforcement learning: a reward can be computed by checking the agent's answer against the entity the graph already provides. The deeper pattern across the corpus is that structured knowledge consistently beats raw text volume. A curriculum of 24,000 reasoning tasks derived from medical knowledge-graph paths produces domain expertise that scale alone doesn't (see: Can knowledge graphs teach models deep domain expertise?), and organizing training chunks into a taxonomy reaches 50% of full-corpus performance using only 0.3% of the data (see: Can organizing knowledge structures beat raw training data volume?). The reason is that the model learns where a fact sits in a conceptual structure rather than memorizing surface patterns, which is closer to how a student learns from a textbook than from flashcards.
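As a rough illustration of why a verifiable answer is convenient for reinforcement learning, the sketch below scores a prediction against a gold entity. Exact match after light normalization is an assumption made here for simplicity; the notes do not say which reward the cited systems use, and the function names are hypothetical.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match_reward(prediction: str, gold: str) -> float:
    """Return 1.0 when the normalized prediction equals the gold entity, else 0.0."""
    return 1.0 if normalize(prediction) == normalize(gold) else 0.0


print(exact_match_reward("The Meridia.", "meridia"))
print(exact_match_reward("Port Elara", "Meridia"))
```

No human grader or learned judge is needed in this setup, because the graph that generated the question also supplies the entity to compare against.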
But 'scale' cuts two ways, and the corpus is interesting on the cost of the graphs themselves. Pre-building a corpus-wide knowledge graph is expensive and goes stale; one line of work builds small query-specific logic graphs (directed acyclic graphs) at inference time instead, keeping the multi-hop reasoning while dropping the construction overhead (see: Can query-time graph construction replace pre-built knowledge graphs?). And once a graph is large, you can't read all of it, so learned traversal policies using Monte Carlo Tree Search and RL let an agent walk the graph selectively within a context window, trading certainty about the whole graph for tractable navigation (see: Can learned traversal policies beat exhaustive graph reading?). Symbolic rules pulled from graph topology can serve as navigation plans that align plain-language questions with the graph's actual structure (see: Can symbolic rules from knowledge graphs guide complex reasoning?).
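The query-time idea can be sketched as a small dependency graph over sub-questions that is built per query and resolved in topological order. This is only an illustration under stated assumptions: the sub-question names and the hand-written decomposition are hypothetical, and the standard-library `graphlib` module stands in for whatever the cited system actually does.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical decomposition of one query. Each key maps to the set of
# sub-questions whose answers it needs first.
DEPENDENCIES = {
    "founder": set(),              # Who founded the company?
    "birthplace": {"founder"},     # Where was that founder born?
    "country": {"birthplace"},     # Which country is that city in?
}


def resolution_order(dependencies):
    """Return sub-questions in an order that respects their dependencies.

    Raises ValueError if the graph has a cycle, because a logic graph that is
    not acyclic cannot be resolved step by step.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        raise ValueError("sub-question graph must be acyclic") from exc


print(resolution_order(DEPENDENCIES))
```

Because the graph only covers one query, it is cheap to build and is discarded afterwards, so nothing has to be kept fresh as the corpus changes.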
The part you might not expect: graphs aren't always the right structure, and search itself behaves like a scaling axis. Routing each query to the knowledge structure that fits it (sometimes a graph, sometimes a table or a plain catalogue) beats forcing everything through graphs uniformly (see: Can routing queries to task-matched structures improve RAG reasoning?). And for the open-ended search task itself, the number of search iterations an agent spends shows the same diminishing-returns curve as reasoning tokens, meaning search budget is a tunable inference-compute dial, not just a fixed retrieval step (see: Does search budget scale like reasoning tokens for answer quality?). If you want the full loop of training search agents cheaply, the corpus also has work on simulating the search engine entirely from an LLM's internal knowledge to avoid API costs during RL (see: Can LLMs replace search engines during agent training?). That is a natural companion to graph-generated questions: the graph supplies questions with checkable answers, and the simulator supplies search results, without paying for either.
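One way to read "search budget as a dial" is as a stopping rule: keep searching while each extra iteration still buys a meaningful quality gain. The sketch below assumes a known quality-per-iteration curve; the numbers are invented purely to show the shape of a diminishing-returns curve and are not measurements from any cited work, and `searches_to_spend` is a hypothetical helper.

```python
def searches_to_spend(quality_by_iteration, min_gain, max_iterations):
    """Count search iterations worth running before the marginal gain drops below min_gain.

    The first iteration is always counted; later ones are counted only while
    they improve quality by at least `min_gain` over the previous iteration.
    """
    spent = 0
    previous = 0.0
    for quality in quality_by_iteration[:max_iterations]:
        if spent > 0 and quality - previous < min_gain:
            break
        previous = quality
        spent += 1
    return spent


# Invented, illustrative curve: answer quality after 1, 2, 3, 4, 5 searches.
toy_curve = [0.40, 0.55, 0.63, 0.66, 0.67]
print(searches_to_spend(toy_curve, min_gain=0.05, max_iterations=5))
```

Lowering `min_gain` spends more searches for smaller improvements, which is the same trade-off a reasoning-token budget exposes.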
Sources (9 notes)
KG-based random walks with selective entity obscuring create verifiable, multi-hop questions that train deep search agents effectively. DeepDive-32B trained on this data achieves 14.8% on BrowseComp, outperforming larger models through end-to-end multi-turn RL.
Fine-tuning a 32B model on 24,000 reasoning tasks derived from medical knowledge graph paths produces state-of-the-art performance across 15 medical domains, demonstrating that structured knowledge composition matters more than scale.
StructTuning achieves 50% of full-corpus performance using only 0.3% of training data by organizing chunks into auto-generated domain taxonomies. The model learns knowledge position within conceptual structures rather than raw text patterns, matching how students learn from textbooks.
LogicRAG constructs directed acyclic graphs from queries at inference time rather than pre-building corpus-wide graphs, eliminating construction overhead, avoiding staleness, and enabling query-specific retrieval logic without sacrificing multi-hop reasoning capability.
Graph-O1 replaces whole-graph ingestion with step-by-step agentic navigation using Monte Carlo Tree Search and reinforcement learning. This approach fits within LLM context windows while learning domain-specific traversal policies, though it trades certainty about the full graph for decision-making under uncertainty.
SymAgent derives symbolic rules from KG structure using LLM reasoning to create navigational plans that align natural language with graph topology. This approach captures structural reasoning patterns explicitly, outperforming retrieval methods that rely on semantic similarity alone.
StructRAG demonstrates that selecting knowledge structure type based on query demands—via DPO-trained router choosing among tables, graphs, algorithms, catalogues, and chunks—improves knowledge-intensive reasoning over standard retrieval. The approach grounds this in cognitive load and cognitive fit theory from cognitive science.
Agentic deep research shows monotonic-to-diminishing-returns curves for search iterations, matching reasoning token scaling. This creates a new inference-compute axis: models can trade off reasoning budget against search budget to optimize answer quality.
ZeroSearch and SSRL demonstrate that LLMs can generate relevant documents and search results from internal knowledge, with 14B simulators matching or exceeding real search engines. Curriculum degradation and test-time scaling optimize this approach for training without API costs.