INQUIRING LINE

Why does GraphRAG prioritize corpus completeness while LogicRAG prioritizes query adaptivity?

This explores a design tension in retrieval systems — building one complete knowledge structure up front (the GraphRAG instinct) versus shaping retrieval around what each query actually needs (the query-adaptive instinct) — and the corpus speaks to that tension even though it doesn't use those exact two product names.


Some retrieval systems invest in mapping the whole corpus into a rich structure, while others spend their effort deciding, per query, how much and what kind of retrieval to do. The split isn't arbitrary. The corpus suggests these are two answers to a single question: where do you pay the cost of being right?

The completeness-first instinct comes from a real payoff. Building a graph over the entire corpus lets you answer questions that require connecting facts no single passage states: the joins and multi-hop links that flat retrieval misses. The corpus shows exactly where this matters. Long-context models can match retrieval on plain semantic lookup but cannot execute relational queries that need joins across structured tables, because context length alone can't reconstruct those relationships (see "Can long-context LLMs replace retrieval-augmented generation systems?"). A complete graph is how you buy those relationships in advance. But completeness has a hidden bill: the same structure that amplifies connection also amplifies corruption. Knowledge poisoning attacks that modify fewer than 0.05% of source words can drop GraphRAG QA accuracy from 95% to 50%, because the LLM extraction step the graph relies on amplifies small perturbations through its topology (see "How vulnerable is GraphRAG to tiny text manipulations?"). When you commit to one global structure, you inherit its fragility everywhere.
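To make the "connecting facts no single passage states" point concrete, here is a minimal sketch. The passages, the triples, and the helper names (`build_graph`, `follow`, `flat_lookup`) are all hypothetical illustrations, not GraphRAG's actual pipeline or API; the triples are written by hand where a real system would rely on an extraction step.

```python
from collections import defaultdict

# Hypothetical passages: neither one alone says where Ada's employer is based.
passages = [
    "Ada works at Initech.",
    "Initech is headquartered in Austin.",
]

# Hypothetical triples, hand-written here; a real system would extract them.
triples = [
    ("Ada", "works_at", "Initech"),
    ("Initech", "headquartered_in", "Austin"),
]


def build_graph(triples):
    """Index triples as subject -> relation -> set of objects."""
    graph = defaultdict(lambda: defaultdict(set))
    for subj, rel, obj in triples:
        graph[subj][rel].add(obj)
    return graph


def follow(graph, start, relations):
    """Follow a chain of relations from `start`; return all reachable endpoints."""
    frontier = {start}
    for rel in relations:
        frontier = {
            obj
            for node in frontier
            for obj in graph.get(node, {}).get(rel, set())
        }
    return frontier


def flat_lookup(passages, terms):
    """Stand-in for flat retrieval: passages that mention every query term."""
    return [p for p in passages if all(t.lower() in p.lower() for t in terms)]


graph = build_graph(triples)

# Two-hop join: Ada -> employer -> headquarters.
two_hop = follow(graph, "Ada", ["works_at", "headquartered_in"])

# No single passage mentions both endpoints of that join.
single_passage_hits = flat_lookup(passages, ["Ada", "Austin"])
```

In this toy setup the graph traversal reaches the answer while the single-passage lookup returns nothing. The same sketch hints at the fragility: changing one word in either passage would alter a triple, and every traversal passing through it.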

The adaptivity-first instinct rejects the idea that one structure fits every query. StructRAG makes this explicit: a DPO-trained router picks among tables, graphs, algorithms, catalogues, or plain chunks depending on what the query demands, and this improves knowledge-intensive reasoning over standard retrieval. The approach is grounded in cognitive fit theory, the idea that the right representation depends on the task (see "Can routing queries to task-matched structures improve RAG reasoning?"). The corpus pushes this further toward minimalism: calibrated uncertainty from the model's own token probabilities decides *when* to retrieve at all, beating multi-call adaptive schemes on single-hop tasks and matching them on multi-hop, with a fraction of the LM and retriever calls (see "Can simple uncertainty estimates beat complex adaptive retrieval?"). Adaptivity says the expensive thing isn't building structure; it's matching structure to the moment.
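The "decide when to retrieve" idea can be sketched as a simple gate. This is an illustration under stated assumptions, not the method from the cited note: it uses mean token entropy as the uncertainty signal, a fixed hypothetical threshold of 0.5 nats in place of a calibrated one, and made-up probability distributions instead of real model outputs.

```python
import math


def mean_token_entropy(token_distributions):
    """Average Shannon entropy (in nats) across per-token probability distributions."""
    if not token_distributions:
        raise ValueError("need at least one token distribution")
    entropies = [
        -sum(p * math.log(p) for p in dist if p > 0)
        for dist in token_distributions
    ]
    return sum(entropies) / len(entropies)


def should_retrieve(token_distributions, threshold=0.5):
    """Retrieve only when the draft answer's average uncertainty exceeds the threshold.

    The threshold is a hypothetical placeholder; in practice it would be
    calibrated on held-out data.
    """
    return mean_token_entropy(token_distributions) > threshold


# Made-up distributions over three candidate tokens at two positions.
confident_draft = [[0.97, 0.02, 0.01], [0.95, 0.03, 0.02]]
uncertain_draft = [[0.40, 0.35, 0.25], [0.34, 0.33, 0.33]]

retrieve_for_confident = should_retrieve(confident_draft)
retrieve_for_uncertain = should_retrieve(uncertain_draft)
```

The gate costs one pass over probabilities the model already produced, which is the sense in which this style of adaptivity is cheap compared with schemes that issue extra model or retriever calls to make the same decision.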

The deeper reason the two priorities diverge is that they're optimizing against different failure modes. A pattern that runs through the corpus is that partial, selective formalization often beats both extremes: enriching natural language with *some* symbolic structure outperforms both pure language and full formalization, because full formalization loses semantic information while raw language lacks structure (see "Why does partial formalization outperform full symbolic logic?"). Read that onto retrieval, and a completeness-first system sits at the 'full formalization' end, maximally structured and maximally brittle, while a query-adaptive system is the bet that you should formalize only as much as a given question requires. Neither is wrong; they price the same trade-off differently.

What ties it together is the view, summarized in the corpus, that retrieval and reasoning shouldn't be separate stages at all: retrieval should adapt dynamically rather than follow fixed patterns, and reasoning has to be coupled tightly to it (see "How should systems retrieve and reason with external knowledge?"). Under that lens, 'corpus completeness vs query adaptivity' stops being a rivalry and becomes a dial: how much do you precompute, and how much do you decide live? The takeaway is that the apparent winner depends on your threat model. If you fear missing connections, you build the graph; if you fear paying for structure you don't need (or inheriting structure you can't defend), you adapt per query.


Sources (6 notes)

Can long-context LLMs replace retrieval-augmented generation systems?

The LOFT benchmark shows LCLMs match RAG on semantic retrieval without explicit training, but cannot execute relational queries requiring joins across structured tables. Context length alone cannot bridge this gap.

How vulnerable is GraphRAG to tiny text manipulations?

Two knowledge poisoning attacks modify fewer than 0.05% of source words to reduce QA accuracy from 95% to 50%. The attacks exploit GraphRAG's reliance on LLM extraction, which amplifies small perturbations through graph topology.

Can routing queries to task-matched structures improve RAG reasoning?

StructRAG demonstrates that selecting the knowledge structure type based on query demands, via a DPO-trained router choosing among tables, graphs, algorithms, catalogues, and chunks, improves knowledge-intensive reasoning over standard retrieval. The approach grounds this in cognitive load and cognitive fit theory from cognitive science.

Can simple uncertainty estimates beat complex adaptive retrieval?

Calibrated token-probability uncertainty consistently beats multi-call adaptive retrieval on single-hop tasks and matches performance on multi-hop, using a fraction of the LM and retriever calls. The model's self-knowledge proves more reliable than external heuristics for deciding when to retrieve.

Why does partial formalization outperform full symbolic logic?

QuaSAR and Logic-of-Thought both achieve 4-8% accuracy gains by enriching natural language with selective symbolic elements rather than replacing it. Full formalization loses semantic information; pure language lacks structure. Augmentation preserves both.

How should systems retrieve and reason with external knowledge?

Research shows retrieval should adapt dynamically rather than follow fixed patterns, reasoning and retrieval must integrate closely, and embedding-based retrieval has fundamental limits requiring architectural alternatives.
