Can knowledge graphs built at inference time outperform pre-built retrieval-augmented generation?

This explores whether building knowledge graphs on the fly — at the moment a question is asked — beats the standard approach of constructing them in advance, and what the corpus says about the tradeoff between the two.

The corpus has a direct answer and a more interesting set of surrounding ideas about *why* inference-time construction might win. The most on-the-nose finding is LogicRAG, which constructs a directed acyclic graph from the query itself at inference time rather than pre-building one across the whole corpus (Can query-time graph construction replace pre-built knowledge graphs?). The pitch is that pre-built graphs carry three taxes: the cost of constructing them, staleness as the underlying data changes, and inflexibility, since a graph built for the average query isn't shaped for *your* query. Building per-query sidesteps all three while keeping the multi-hop reasoning that graphs are good for.
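To make the per-query idea concrete, here is a minimal sketch, not LogicRAG's actual implementation. It assumes the query has already been decomposed into subproblems with dependencies (a real system would presumably use an LLM for that step; here the decomposition is hard-coded and every subproblem name is hypothetical). The graph exists only for this one query, and a topological sort yields an order in which each subproblem's prerequisites are resolved first.

```python
from graphlib import TopologicalSorter


def build_query_dag(subproblems):
    """Map each subproblem to the set of subproblems it depends on."""
    return {name: set(deps) for name, deps in subproblems.items()}


def retrieval_order(dag):
    """Return subproblems so that every dependency comes before its dependents."""
    # TopologicalSorter raises graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(dag).static_order())


# Hypothetical decomposition of a query such as:
# "What is the population of the city where the film's director was born?"
subproblems = {
    "find_director": [],
    "find_birthplace": ["find_director"],
    "find_population": ["find_birthplace"],
}

dag = build_query_dag(subproblems)
order = retrieval_order(dag)
print(order)
```

Nothing here is built ahead of time or shared across queries, which is the point: there is no corpus-wide graph to construct or keep fresh.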

But the corpus reframes the question in a useful way: the real choice may not be "build early vs. build late" so much as "match the structure to the task." StructRAG argues that no single knowledge structure is universally best. It trains a router to pick among tables, graphs, algorithms, catalogues, and plain chunks depending on what the query demands, grounding this in cognitive load and cognitive fit theory (Can routing queries to task-matched structures improve RAG reasoning?). Seen this way, inference-time construction wins precisely *because* it can be query-specific, not because graphs beat RAG in the abstract. That theme echoes across the collection: retrieval should adapt dynamically and stay tightly coupled to reasoning rather than follow a fixed pipeline (How should systems retrieve and reason with external knowledge?), and separating query planning from answer synthesis into distinct stages outperforms flat retrieval on hard multi-hop questions (Do hierarchical retrieval architectures outperform flat ones on complex queries?).
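The routing idea can be sketched as a function from a query to one of the five structure types. StructRAG's router is a trained model; the keyword heuristic below is only a stand-in, assumed for illustration, to show the interface. The cue phrases and the `route` function are hypothetical.

```python
# The five structure types named in the text.
STRUCTURES = ("table", "graph", "algorithm", "catalogue", "chunk")

# Hypothetical cue phrases; a trained router would replace this lookup.
CUES = {
    "table": ("compare", "versus", "highest", "lowest"),
    "graph": ("related to", "connected", "path between"),
    "algorithm": ("steps", "procedure", "how do i"),
    "catalogue": ("list all", "summarize", "overview"),
}


def route(query: str) -> str:
    """Pick a knowledge structure for the query; fall back to plain chunks."""
    q = query.lower()
    for structure, cues in CUES.items():
        if any(cue in q for cue in cues):
            return structure
    return "chunk"


print(route("Compare the revenue of company A and company B"))
print(route("How do I reset the device?"))
print(route("What is the capital of France?"))
```

The design point is the fallback: a query with no structural demand keeps using plain chunks, so routing only adds structure where the task calls for it.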

The catch is that pre-built graphs aren't just overhead to be eliminated: their explicit structure is doing real work. SymAgent derives symbolic navigational rules from a graph's topology, beating retrieval methods that lean only on semantic similarity, because the graph encodes relationships that embeddings blur together (Can symbolic rules from knowledge graphs guide complex reasoning?). And there's a hard limit on what you can skip: on the LOFT benchmark, long-context LLMs can absorb a corpus and match RAG on semantic lookup, but they collapse on structured relational queries that require joining across tables. Context length alone can't fake structure (Can long-context LLMs replace retrieval-augmented generation systems?). So a query-time graph still has to actually reconstruct the relational scaffolding; it just does so on demand instead of in advance.
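For readers unsure what a "relational query requiring a join" looks like, here is a small illustration using Python's built-in `sqlite3` module. The tables and rows are invented for the example and do not come from LOFT. The answer depends on matching rows across two tables by key and then filtering and counting, which is a different operation from finding the passage most similar to the question.

```python
import sqlite3

# In-memory database with two made-up tables linked by author_id.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE papers (id INTEGER PRIMARY KEY, author_id INTEGER, year INTEGER);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO papers VALUES (10, 1, 2024), (11, 1, 2025), (12, 2, 2025);
    """
)

# "How many papers did each author publish in 2025?"
# No single row answers this; it needs a join, a filter, and an aggregate.
rows = conn.execute(
    """
    SELECT a.name, COUNT(*)
    FROM authors AS a
    JOIN papers AS p ON p.author_id = a.id
    WHERE p.year = 2025
    GROUP BY a.name
    ORDER BY a.name
    """
).fetchall()
conn.close()

print(rows)
```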

The more surprising thread is that pre-built knowledge graphs may earn their keep not at retrieval time at all, but at *training* time. One line of work fine-tunes a 32B model on 24,000 reasoning tasks derived from paths through a medical knowledge graph and reaches state-of-the-art across 15 medical domains, concluding that structured composition matters more than raw scale (Can knowledge graphs teach models deep domain expertise?). Another uses random walks through a graph, with entities selectively obscured, to mint hard, verifiable multi-hop questions that train search agents (Can knowledge graphs generate training data for search agents?). So the deeper answer is that "inference-time vs. pre-built" may be a false binary: build the graph once to *teach* the model the shape of a domain, then build lightweight graphs per-query to *navigate* it.
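The random-walk recipe can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the pipeline from the cited work: the toy graph, the `random_walk` and `make_question` helpers, and the rule of hiding every intermediate entity are all hypothetical. The walk supplies a verifiable answer (the final node), and the question names only the start entity and the relations, so answering it requires resolving each hop.

```python
import random

# Toy knowledge graph: entity -> list of (relation, target entity).
KG = {
    "Marie Curie": [("spouse", "Pierre Curie"), ("birthplace", "Warsaw")],
    "Pierre Curie": [("birthplace", "Paris")],
    "Warsaw": [("country", "Poland")],
    "Paris": [("country", "France")],
}


def random_walk(kg, start, hops, rng):
    """Follow up to `hops` random edges from `start`; return [(relation, node), ...]."""
    path, node = [], start
    for _ in range(hops):
        edges = kg.get(node)
        if not edges:
            break  # dead end: stop early
        relation, node = rng.choice(edges)
        path.append((relation, node))
    return path


def make_question(start, path):
    """Hide intermediate entities by chaining relations from the start entity."""
    if not path:
        raise ValueError("walk produced no hops")
    phrase = start
    for relation, _ in path:
        phrase = f"the {relation} of {phrase}"
    return f"What is {phrase}?", path[-1][1]


rng = random.Random(0)
path = random_walk(KG, "Marie Curie", 2, rng)
question, answer = make_question("Marie Curie", path)
print(question, "->", answer)
```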

What you didn't know you wanted to know: the strongest argument for inference-time construction isn't speed or cost — it's that a fresh per-query graph can be shaped to the exact reasoning the question needs, which is the same insight (match the structure to the task) that the whole adaptive-retrieval literature keeps rediscovering from different angles.

Sources (8 notes)

Can query-time graph construction replace pre-built knowledge graphs?

LogicRAG constructs directed acyclic graphs from queries at inference time rather than pre-building corpus-wide graphs, eliminating construction overhead, avoiding staleness, and enabling query-specific retrieval logic without sacrificing multi-hop reasoning capability.

Can routing queries to task-matched structures improve RAG reasoning?

StructRAG demonstrates that selecting the knowledge structure type based on query demands, via a DPO-trained router that chooses among tables, graphs, algorithms, catalogues, and chunks, improves knowledge-intensive reasoning over standard retrieval. The approach grounds this in cognitive load and cognitive fit theory from cognitive science.

How should systems retrieve and reason with external knowledge?

Research shows retrieval should adapt dynamically rather than follow fixed patterns, reasoning and retrieval must integrate closely, and embedding-based retrieval has fundamental limits requiring architectural alternatives.

Do hierarchical retrieval architectures outperform flat ones on complex queries?

Separating query planning from answer synthesis into distinct components reduces interference and improves multi-hop query performance. This architectural principle mirrors documented benefits of separating planning from execution in agent design.

Can symbolic rules from knowledge graphs guide complex reasoning?

SymAgent derives symbolic rules from KG structure using LLM reasoning to create navigational plans that align natural language with graph topology. This approach captures structural reasoning patterns explicitly, outperforming retrieval methods that rely on semantic similarity alone.

Can long-context LLMs replace retrieval-augmented generation systems?

The LOFT benchmark shows LCLMs match RAG on semantic retrieval without explicit training, but cannot execute relational queries requiring joins across structured tables. Context length alone cannot bridge this gap.

Can knowledge graphs teach models deep domain expertise?

Fine-tuning a 32B model on 24,000 reasoning tasks derived from medical knowledge graph paths produces state-of-the-art performance across 15 medical domains, demonstrating that structured knowledge composition matters more than scale.

Can knowledge graphs generate training data for search agents?

KG-based random walks with selective entity obscuring create verifiable, multi-hop questions that train deep search agents effectively. DeepDive-32B trained on this data achieves 14.8% on BrowseComp, outperforming larger models through end-to-end multi-turn RL.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing the durability of findings on inference-time knowledge graphs vs. pre-built RAG (2024–2025 claims). The question remains open: does query-time graph construction outperform static retrieval-augmented generation?

What a curated library found — and when (dated claims, not current truth):
Findings span 2024–2025; treat as perishable.
• LogicRAG sidesteps pre-built-graph taxes (construction cost, staleness, inflexibility) by constructing directed graphs per query at inference time (2025-08).
• StructRAG shows no single knowledge structure is universally best; a DPO-trained router, grounded in cognitive load and cognitive fit theory, picks among tables, graphs, algorithms, catalogues, and chunks per query. The library's reading: inference-time construction wins because it is query-specific (2024-10).
• SymAgent derives symbolic navigational rules from graph topology, beating pure semantic-similarity retrieval; embeddings blur relational structure that explicit graphs preserve (2025-02).
• Long-context LLMs subsume semantic retrieval but collapse on structured relational queries requiring table joins; context length alone cannot replace relational scaffolding (2024-06).
• Pre-built graphs may earn their keep at *training* time: fine-tuning a 32B model on 24,000 reasoning tasks from a medical KG reaches SOTA across 15 medical domains; random walks with entity obscuring generate hard multi-hop training data for search agents (2025-07).

Anchor papers (verify; mind their dates):
• arXiv:2508.06105 — You Don't Need Pre-built Graphs for RAG (2025-08)
• arXiv:2410.08815 — StructRAG (2024-10)
• arXiv:2502.03283 — SymAgent (2025-02)
• arXiv:2507.13966 — Bottom-up Domain-specific Superintelligence (2025-07)

Your task:
(1) RE-TEST EACH CONSTRAINT. For LogicRAG's three taxes and SymAgent's relational-preservation claim, judge whether newer models, training methods (e.g., reasoning-focused fine-tuning, reinforcement learning on graph traversal), or evaluation harnesses (e.g., compositional-reasoning benchmarks, multi-hop datasets) have since tightened or relaxed these limits. Separate the durable question (does per-query structure beat static structure?) from perishable implementation details (cost, latency of graph construction).
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last 6 months — any paper showing pre-built graphs outperform or subsume adaptive query-time construction, or demonstrating that reasoning models obviate explicit graph routing.
(3) Propose 2 research questions that ASSUME the regime may have moved: e.g., does agentic reasoning (long rollouts, self-correction) reduce the need for structured knowledge layout? Can learned graph-routers (vs. fixed pipelines) now match hand-designed symbolic rules?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
