Can other RAG hyperparameters like chunk size be learned through generator feedback?

This explores whether the trick behind learnable RAG — using the generator's success as a training signal — extends from the parameters researchers have already made adaptive (document count, ordering, retrieval timing) to ones still usually hand-set, like chunk size.

This reads the question as asking whether chunk size belongs to the growing family of RAG settings that no longer have to be fixed by hand, but can instead be tuned by feedback from whether the final answer came out right. The corpus doesn't contain a paper that learns chunk size directly — but it contains the exact machinery that would make it possible, applied to almost every neighboring knob, which is the more useful thing to know.

The clearest precedent is DynamicRAG, which drops the fixed top-k assumption entirely and trains a reranker as a reinforcement-learning agent whose reward is the quality of the generator's output, letting it learn, per query, both how many documents to pass along and in what order (see "Can document count be learned instead of fixed in RAG?"). That's the template: take a setting normally frozen at config time, wire the generator's success back as a reward, and let the system calibrate it to query complexity. Chunk size is structurally the same kind of knob, so there's no in-principle reason the same loop couldn't tune it.
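The template above can be sketched in a few lines. This is a deliberately simplified illustration, not DynamicRAG's method: it swaps the per-query RL reranker for a context-free epsilon-greedy bandit over a handful of candidate chunk sizes. The class `ChunkSizeBandit`, the function `simulated_generator_reward`, the candidate sizes, and the success probabilities are all hypothetical; in a real system the reward would come from scoring the generator's actual answer.

```python
import random


class ChunkSizeBandit:
    """Epsilon-greedy bandit over candidate chunk sizes (hypothetical sketch)."""

    def __init__(self, chunk_sizes, epsilon=0.1, seed=0):
        self.chunk_sizes = list(chunk_sizes)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {c: 0 for c in self.chunk_sizes}
        self.values = {c: 0.0 for c in self.chunk_sizes}  # running mean reward

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.chunk_sizes)
        return self.best()

    def update(self, chunk_size, reward):
        # Incremental update of the mean reward observed for this chunk size.
        self.counts[chunk_size] += 1
        n = self.counts[chunk_size]
        self.values[chunk_size] += (reward - self.values[chunk_size]) / n

    def best(self):
        return max(self.chunk_sizes, key=lambda c: self.values[c])


def simulated_generator_reward(chunk_size, rng):
    """Stand-in for 'did the generator answer correctly?'.

    ASSUMPTION: these success probabilities are invented for the demo.
    """
    success_prob = {128: 0.2, 256: 0.5, 512: 0.9, 1024: 0.4}[chunk_size]
    return 1.0 if rng.random() < success_prob else 0.0


def train(steps=5000, seed=0):
    rng = random.Random(seed + 1)
    bandit = ChunkSizeBandit([128, 256, 512, 1024], epsilon=0.1, seed=seed)
    for _ in range(steps):
        size = bandit.select()
        bandit.update(size, simulated_generator_reward(size, rng))
    return bandit


if __name__ == "__main__":
    bandit = train()
    print("preferred chunk size:", bandit.best())
    print("estimated reward per size:", bandit.values)
```

A per-query version would condition the choice on query features (a contextual bandit or a policy network), which is closer in spirit to calibrating the setting to query complexity.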

The deeper version of this idea is CLaRa, which doesn't just adjust a discrete count but propagates the generator's loss back through continuous document representations, so retrieval learns to favor documents that *actually help answer the question* rather than ones that merely look similar (see "Can retrieval learn what actually helps answer questions?"). This matters for chunking specifically: chunk size is really a proxy for "how much context is the useful unit?", and CLaRa's whole point is that usefulness and surface relevance diverge. A system optimizing chunk boundaries on generator feedback would be closing that same gap from a different angle. StructRAG pushes the idea even further, learning to pick the *form* of knowledge (table, graph, algorithm, catalogue, or chunk) based on query demands via a trained router (see "Can routing queries to task-matched structures improve RAG reasoning?"). Once you can learn the structure type, learning the granularity within "chunk" is a smaller step.

The corpus surfaces two caveats. First, *what* feedback signal you use matters: process-level supervision on intermediate retrieval steps substantially beats rewarding only the final answer, because a single end-of-pipeline reward is a noisy teacher for a multi-part decision (see "Does supervising retrieval steps outperform final answer rewards?"). Chunk size affects retrieval early in the pipeline, so it might learn better from step-level feedback than from outcome-only feedback. Second, there's a counter-current worth respecting: simple calibrated uncertainty estimates often beat elaborate learned adaptive-retrieval schemes at a fraction of the cost (see "Can simple uncertainty estimates beat complex adaptive retrieval?"). The lesson isn't "don't learn chunk size"; it's that a learned knob has to clear the bar of a good cheap heuristic before it's worth the training.
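Both caveats can be made concrete with a small sketch. It assumes a scalar reward that blends per-step retrieval scores with the final-answer outcome, and a simple acceptance gate against a heuristic baseline. The function names, the `step_weight` blend, and the `min_gain` threshold are hypothetical choices for illustration; the step-supervision work cited above trains with DPO on contrasting retrieval chains rather than with a blended scalar like this one.

```python
from statistics import mean


def blended_reward(step_scores, final_correct, step_weight=0.5):
    """Mix step-level retrieval feedback with the final-answer outcome.

    step_scores: per-step scores in [0, 1], e.g. whether each retrieval
    step surfaced a useful chunk (how to score a step is an ASSUMPTION).
    final_correct: whether the generator's final answer was right.
    """
    if not 0.0 <= step_weight <= 1.0:
        raise ValueError("step_weight must be in [0, 1]")
    step_part = mean(step_scores) if step_scores else 0.0
    outcome_part = 1.0 if final_correct else 0.0
    return step_weight * step_part + (1.0 - step_weight) * outcome_part


def clears_heuristic_bar(learned_scores, heuristic_scores, min_gain=0.02):
    """Accept the learned knob only if it beats the cheap baseline by a margin.

    min_gain is an arbitrary illustrative threshold, not a recommended value.
    """
    return mean(learned_scores) - mean(heuristic_scores) >= min_gain
```

Setting `step_weight=0.0` recovers outcome-only feedback, which makes it easy to compare the two regimes on the same data before committing to the more expensive training setup.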

The thing you might not have known you wanted: across this corpus, "hyperparameter" is quietly becoming the wrong word. Document count, ordering, retrieval triggering, knowledge structure, and the retriever-generator boundary itself (see "How should systems retrieve and reason with external knowledge?") are all migrating from things-you-set to things-the-system-learns from how well it answered. Chunk size is simply the next obvious resident of that list, and the techniques to move it there already exist in pieces.

Sources (6 notes)

Can document count be learned instead of fixed in RAG?

DynamicRAG trains a reranker as an RL agent using LLM output quality as reward, learning to adjust both document ordering and count for each query. Two-phase training with behavior cloning followed by RL with generator feedback enables the agent to calibrate document selection to query complexity.

Can retrieval learn what actually helps answer questions?

CLaRa propagates generator loss back through continuous document representations, allowing retrievers to optimize for documents that actually improve answers rather than surface similarity. The gap between relevance and usefulness closes when retrieval receives direct feedback from generation success.

Can routing queries to task-matched structures improve RAG reasoning?

StructRAG demonstrates that selecting the knowledge structure type based on query demands, via a DPO-trained router that chooses among tables, graphs, algorithms, catalogues, and chunks, improves knowledge-intensive reasoning over standard retrieval. The approach grounds this in cognitive load theory and cognitive fit theory.

Does supervising retrieval steps outperform final answer rewards?

Fine-grained feedback on intermediate retrieval steps significantly boosts agentic RAG performance compared to final-answer-only rewards. DPO trained with both positive and negative step feedback outperforms PPO and single-direction training by directly contrasting good and bad retrieval chains.

Can simple uncertainty estimates beat complex adaptive retrieval?

Calibrated token-probability uncertainty consistently beats multi-call adaptive retrieval on single-hop tasks and matches it on multi-hop tasks, using a fraction of the LM and retriever calls. The model's self-knowledge proves more reliable than external heuristics for deciding when to retrieve.

How should systems retrieve and reason with external knowledge?

Research shows retrieval should adapt dynamically rather than follow fixed patterns, reasoning and retrieval must integrate closely, and embedding-based retrieval has fundamental limits requiring architectural alternatives.
