INQUIRING LINE

Does retrieval quality depend more on access structure or write gating?

This explores a framing the corpus doesn't use literally — reading 'access structure' as how retrieval is organized and routed (hierarchies, query planning, document maps, verification stages) and 'write gating' as control over what's allowed into the retrieval path (poisoning defenses, deciding when to retrieve at all). The collection leans hard toward access structure as the dominant lever, with write gating mattering most under adversarial pressure.


On that reading, the corpus weights the answer firmly toward structure, with one sharp exception. The strongest claim is that retrieval failures are structural, not a matter of tuning: systems break at retrieval triggering (fixed intervals waste context), at the gap between embedding similarity and actual task relevance, and at hard mathematical limits on what a fixed embedding dimension can represent (Where do retrieval systems fail and why?). None of those are fixed by gating writes more carefully; they're fixed by changing the access architecture itself.

What that restructuring looks like recurs across several notes. Separating query planning from answer synthesis into distinct stages reduces interference and beats flat pipelines on multi-hop questions (Do hierarchical retrieval architectures outperform flat ones on complex queries?). Summarizing a document first and conditioning retrieval on that global map recovers discourse structure that bag-of-chunks retrieval destroys, so scattered evidence becomes findable by its role rather than surface wording (Can building a document map first improve retrieval over long texts?). Adding a learned verifier over full token-token interaction maps catches structural near-misses that compressed-vector similarity waves through (Can verification separate structural near-misses from topical matches?). These are all access-structure moves: they change how the system reaches the right text, not what's allowed in.

The most interesting wrinkle is that the optimal structure isn't fixed; it tracks the capabilities of the surrounding model. As context windows grew, the best design shifted from precise small-chunk retrieval toward coarse ranking plus deep reading by a long-context reader (Can long-context models resolve retriever-reader imbalance?). And the real long-context bottleneck turns out to be compute, meaning the work of consolidating context into internal state, not memory capacity (Is long-context bottleneck really about memory or compute?). There's even a provable floor here: transformers can copy and retrieve from context in ways fixed-state space models fundamentally cannot (Can state-space models match transformers at copying and retrieval?). Access quality is partly baked into the architecture before any retrieval logic runs.

Write gating earns its keep in one place: adversarial robustness. When the corpus can be poisoned, lightweight retrieval-time defenses that partition retriever influence or flag documents by abnormal similarity collapse stop bad writes from dominating answers, with no retraining needed (Can we defend RAG systems from corpus poisoning without retraining?). There's also a softer kind of gating: deciding whether to retrieve at all. Framing each reasoning step as a choice between external retrieval and the model's own parametric knowledge yielded a reported 21.99% improvement, partly by *not* writing unnecessary external noise into the context (When should language models retrieve external knowledge versus use internal knowledge?).

So the corpus's verdict: structure sets the ceiling, gating protects the floor. Most retrieval quality is won or lost in how the system is organized to reach evidence, and the surprising part is that the same query can need a *different* structure as the model around it changes. Quieter levers each reshape access without touching the gate at all: domain-description-only adaptation (Can you adapt retrieval models without accessing target data?), fine-tuning that absorbs query augmentation (Can fine-tuning replace query augmentation for retrieval?), and temporal scoring (Can retrieval systems ground answers in the right time?).


Sources (12 notes)

Where do retrieval systems fail and why?

RAG systems fail at three structural levels: adaptive triggering (fixed intervals waste context), semantic-task mismatch (embeddings measure association, not relevance), and mathematical limits (embedding dimension constrains representable document sets). These require fundamentally different retrieval approaches, not tuning.

Do hierarchical retrieval architectures outperform flat ones on complex queries?

Separating query planning from answer synthesis into distinct components reduces interference and improves multi-hop query performance. This architectural principle mirrors documented benefits of separating planning from execution in agent design.

Can building a document map first improve retrieval over long texts?

MiA-RAG inverts standard RAG by summarizing documents first, then conditioning retrieval on that global view. This approach recovers discourse structure that bag-of-chunks retrieval destroys, making scattered evidence findable by its role in the document rather than by surface similarity alone.

Can verification separate structural near-misses from topical matches?

A two-stage pipeline—pooled-cosine recall followed by a small Transformer verifier operating on token-token similarity maps—reliably rejects structural near-misses that MaxSim-style late interaction cannot. The verifier succeeds because it operates on full token interaction patterns rather than compressed vectors.
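
To make the distinction concrete, here is a minimal sketch of why pooled-cosine recall and MaxSim-style scoring can both wave a near-miss through while the full token-token similarity map still carries the difference. Everything in it is an illustrative assumption: the one-hot `VOCAB` embeddings stand in for a learned encoder, and `order_preserved` is a hand-written rule standing in for the small Transformer verifier described in the note, not a reimplementation of it.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_pool(token_vecs):
    """Collapse per-token vectors into one vector (stage-one recall input)."""
    n = len(token_vecs)
    return [sum(col) / n for col in zip(*token_vecs)]

def pooled_cosine(query_tokens, doc_tokens):
    """Stage one: cheap recall score on pooled vectors."""
    return cosine(mean_pool(query_tokens), mean_pool(doc_tokens))

def similarity_map(query_tokens, doc_tokens):
    """Full token-token map: rows are query tokens, columns are doc tokens."""
    return [[cosine(q, d) for d in doc_tokens] for q in query_tokens]

def maxsim(sim_map):
    """Late-interaction score: sum over query tokens of the best doc match."""
    return sum(max(row) for row in sim_map)

def order_preserved(sim_map):
    """Hypothetical stand-in verifier: do best matches follow query order?"""
    best = [max(range(len(row)), key=row.__getitem__) for row in sim_map]
    return all(a < b for a, b in zip(best, best[1:]))

# Toy one-hot "embeddings" (assumption; a real system uses a learned encoder).
VOCAB = {"dog": [1.0, 0.0, 0.0], "bites": [0.0, 1.0, 0.0], "man": [0.0, 0.0, 1.0]}

def embed(text):
    return [VOCAB[w] for w in text.split()]

query = embed("dog bites man")
match = embed("dog bites man")
near_miss = embed("man bites dog")  # same words, different structure

for name, doc in [("match", match), ("near_miss", near_miss)]:
    sim = similarity_map(query, doc)
    print(name, round(pooled_cosine(query, doc), 3), maxsim(sim), order_preserved(sim))
```

In this toy case the two documents tie on the pooled and MaxSim scores, and only the rule that reads the whole map tells them apart. That is the sense in which a verifier over full interaction patterns sees what compressed vectors discard.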

Can long-context models resolve retriever-reader imbalance?

LongRAG shows that 4K-token units and long-context readers outperform 100-word retrieval on standard benchmarks. The optimal RAG design shifts from precise retrieval to coarse ranking plus deep reading as context windows expand.

Is long-context bottleneck really about memory or compute?

Research shows the bottleneck is not memory capacity but the compute required to consolidate evicted context into fast weights during offline sleep phases. Performance improves with more consolidation passes, following a test-time scaling pattern on harder reasoning tasks.

Can state-space models match transformers at copying and retrieval?

Two-layer transformers can copy exponentially long strings while state-space models are fundamentally limited by their fixed-size latent state. Empirically, transformers dramatically outperform SSMs at copying and context retrieval in both synthetic and pretrained settings.

Can we defend RAG systems from corpus poisoning without retraining?

RAGPart and RAGMask provide lightweight, retraining-free defenses that operate at the retrieval layer. RAGPart bounds poisoned-document influence via partitioned retriever learning; RAGMask flags suspicious documents through abnormal similarity collapse under token masking.
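
The masking idea can be sketched in a few lines, under loud assumptions: this is not RAGMask's actual procedure, only an illustration of "similarity collapse under token masking". The bag-of-words similarity, the one-token-at-a-time masking, and the `threshold` value are all hypothetical choices made for the example.

```python
import math
from collections import Counter

def bow_cosine(a_tokens, b_tokens):
    """Toy bag-of-words cosine similarity (stand-in for a real retriever)."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def max_collapse(query, doc):
    """Largest relative similarity drop caused by masking a single doc token."""
    q, d = query.split(), doc.split()
    base = bow_cosine(q, d)
    if base == 0.0:
        return 0.0  # nothing to collapse
    worst = min(bow_cosine(q, d[:i] + d[i + 1:]) for i in range(len(d)))
    return (base - worst) / base

def is_suspicious(query, doc, threshold=0.5):
    """Flag a doc whose match with the query hinges on very few tokens.

    The threshold is an illustrative assumption, not a published value.
    """
    return max_collapse(query, doc) > threshold

query = "capital of france"
benign = "paris is the capital of france"
poisoned = "france cheap pills click here now"

print(is_suspicious(query, benign), is_suspicious(query, poisoned))
```

The benign document loses only part of its similarity when any one token is masked, while the poisoned one loses all of it once its single hook token is removed. A real defense would need a stronger similarity model and a calibrated threshold, since short legitimate documents can also hinge on few tokens.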

When should language models retrieve external knowledge versus use internal knowledge?

DeepRAG models each reasoning step as a Markov Decision Process where the model learns when to retrieve versus rely on parametric knowledge. The 21.99% improvement comes from better-targeted retrieval and elimination of noise from unnecessary external knowledge.

Can you adapt retrieval models without accessing target data?

Research demonstrates that a brief textual domain description suffices to generate synthetic training data for retrieval fine-tuning, outperforming baselines in zero-target-access scenarios and enabling adaptation where conventional methods are blocked.

Can fine-tuning replace query augmentation for retrieval?

Fine-tuned semantic search models trained on implicit queries match the performance of augmented pretrained retrievers without expanding input length. The model learns to resolve ambiguity through training rather than requiring explicit augmentation.

Can retrieval systems ground answers in the right time?

TempRALM adds a temporal term to retrieval scoring alongside semantic similarity, achieving up to 74% improvement over baseline systems when documents have multiple time-stamped versions. The approach requires no model retraining or index changes.
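
A rough sketch of what adding a temporal term to the score might look like follows. The exact form of the term is an assumption made for illustration and is not taken from TempRALM: here a version dated after the query time is excluded, and older versions are discounted by their distance from the query year. The `Doc` class, the precomputed `semantic` scores, and the `scale` parameter are likewise hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    text: str
    year: int
    semantic: float  # precomputed semantic similarity (hypothetical values)

def temporal_term(query_year, doc_year, scale=1.0):
    """Assumed temporal bonus: larger for versions closer to the query time."""
    gap = query_year - doc_year
    if gap < 0:
        return float("-inf")  # assumption: versions newer than the query are excluded
    return scale / (1 + gap)

def rank(docs, query_year, scale=1.0):
    """Sort by semantic similarity plus the temporal term, best first."""
    return sorted(
        docs,
        key=lambda d: d.semantic + temporal_term(query_year, d.year, scale),
        reverse=True,
    )

# Three time-stamped versions of the same page with near-identical semantics.
versions = [
    Doc("standings as of 2019", 2019, 0.82),
    Doc("standings as of 2021", 2021, 0.80),
    Doc("standings as of 2023", 2023, 0.81),
]

semantic_only = max(versions, key=lambda d: d.semantic)
time_aware = rank(versions, query_year=2021)[0]
print(semantic_only.year, time_aware.year)
```

Because the term is added at scoring time, the index and the model stay untouched, which matches the note's point that the approach needs no retraining or index changes.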
