Knowledge Retrieval and RAG

Can fine-tuning replace query augmentation for retrieval?

Query augmentation helps retrievers handle ambiguous queries but increases input cost. Does fine-tuning the retrieval model achieve comparable performance without this overhead?

Note · 2026-02-22 · sourced from RAG

CoT query augmentation for RAG works by generating additional context before retrieval — a chain-of-thought that expands an ambiguous query into richer text that retrieval models can match against. This helps. For pretrained retrievers encountering underspecified queries, the additional context closes the gap between what was asked and what is actually needed.
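A minimal sketch of this flow, with a hard-coded stand-in for the LLM expansion step so it runs offline (the expansion table, the bag-of-words "embedding", and the two-document corpus are all illustrative assumptions, not any particular system's API):

```python
from collections import Counter
import math

def expand_query(query: str) -> str:
    # Hypothetical stand-in for an LLM chain-of-thought expansion.
    expansions = {
        "charger": "charger phone cable usb power adapter battery",
    }
    return expansions.get(query, query)

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real retriever would use a dense encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "usb power adapter for phones",
    "horse saddle and riding gear",
]

def retrieve(query: str) -> str:
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))

# The bare query "charger" shares no tokens with either document (score 0),
# but the expanded query overlaps the relevant one on usb/power/adapter.
print(retrieve(expand_query("charger")))  # → "usb power adapter for phones"
```

The ambiguity gap is visible in the scores: the implicit query matches nothing until the expansion supplies the context the retriever needs.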

The catch: CoT augmentation lengthens the input sequence. Longer inputs to the LLM cost more, and downstream answer quality is sensitive to where relevant information falls in the context window. The augmentation buys a performance gain at a recurring inference-time cost.
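The overhead is easy to make concrete. Assuming a naive whitespace tokenizer and a flat per-token price (both simplifications), the expansion multiplies the query's token count on every request:

```python
# Rough sketch of the input-cost overhead CoT augmentation adds.
def token_count(text: str) -> int:
    # Whitespace split as a crude tokenizer stand-in.
    return len(text.split())

query = "charger"
cot_expansion = ("The user likely wants a power accessory: a phone charger, "
                 "USB cable, or wall adapter, possibly for travel.")

price_per_token = 0.00001  # hypothetical flat rate
base_cost = token_count(query) * price_per_token
augmented_cost = token_count(query + " " + cot_expansion) * price_per_token
print(augmented_cost / base_cost)  # expansion multiplies input tokens
```

For a one-word query the multiplier is large; for long explicit queries it shrinks, which is consistent with augmentation mattering most exactly where queries are underspecified.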

Context Tuning for RAG demonstrates that fine-tuning the retrieval model removes this trade-off. A fine-tuned semantic search model trained on implicit queries achieves comparable retrieval performance without CoT augmentation. When fine-tuning is applied, adding CoT produces only marginal additional gain — the model has already learned to bridge the ambiguity gap from training.

The mechanism: pretrained retrievers struggle with ambiguous/implicit queries because they were trained on explicit query-document pairs. Fine-tuning on implicit queries with usage signals (frequency, history, geo-temporal correlation) teaches the model to resolve ambiguity from context rather than requiring it to be spelled out.
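One way to picture that training setup is as pair construction: implicit queries are labeled against candidate documents using the usage signals, and the labeled pairs feed a contrastive fine-tuning objective. The field names, weights, and threshold below are illustrative assumptions, not the paper's recipe:

```python
# Sketch: turning implicit queries + usage signals into fine-tuning pairs.
from dataclasses import dataclass

@dataclass
class UsageSignal:
    frequency: int        # how often this doc served similar queries
    in_history: bool      # doc appears in the user's recent history
    geo_temporal: float   # 0..1 correlation with current place/time

def signal_score(s: UsageSignal) -> float:
    # Illustrative weighting; a real pipeline would tune or learn these.
    return 0.5 * min(s.frequency / 10, 1.0) + 0.3 * s.in_history + 0.2 * s.geo_temporal

def build_pairs(query, candidates, threshold=0.5):
    """Label candidates as positives (1) / negatives (0) for contrastive training."""
    return [(query, doc, 1 if signal_score(sig) >= threshold else 0)
            for doc, sig in candidates]

pairs = build_pairs(
    "charger",  # implicit: device, connector, and location all unstated
    [
        ("usb-c travel adapter", UsageSignal(frequency=9, in_history=True, geo_temporal=0.8)),
        ("horse saddle catalog", UsageSignal(frequency=0, in_history=False, geo_temporal=0.1)),
    ],
)
print(pairs)
```

The point of the construction: the label encodes context the query never states, so the encoder is pushed to resolve "charger" toward the contextually right document without any expansion at inference time.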

This is an instance of a recurring pattern across LLM research: inference-time workarounds (chain-of-thought, query augmentation) represent the gap between what a model can do and what the task requires. Fine-tuning can close that gap and retire the workaround. The workaround's cost is then avoidable.

The practical corollary: query augmentation strategies should be evaluated against fine-tuned retrieval baselines, not just pretrained baselines. The augmentation is solving a training distribution problem, not an inherent query complexity problem.
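Concretely, that means scoring the full grid of (retriever, augmentation) combinations. In this sketch the stub retrievers, the two-item eval set, and the CoT expander are assumptions chosen to mirror the note's finding: the fine-tuned model without augmentation matches the pretrained model with it.

```python
# Evaluation grid: every retriever crossed with every augmentation strategy.
def evaluate(retriever, augment, eval_set):
    """Top-1 retrieval accuracy for one retriever/augmenter combination."""
    hits = 0
    for query, relevant in eval_set:
        q = augment(query) if augment else query
        hits += retriever(q) == relevant
    return hits / len(eval_set)

def grid(retrievers, augmenters, eval_set):
    return {(rn, an): evaluate(r, a, eval_set)
            for rn, r in retrievers.items()
            for an, a in augmenters.items()}

eval_set = [("charger", "usb adapter"), ("meeting", "calendar entry")]

# Pretrained stub: only resolves queries that spell out the context.
pretrained = lambda q: "usb adapter" if "power" in q else "calendar entry"
# Fine-tuned stub: resolves the bare implicit query directly.
finetuned = lambda q: {"charger": "usb adapter",
                       "meeting": "calendar entry"}.get(q.split()[0], "")
# CoT stub: per-query expansion standing in for an LLM call.
cot = lambda q: q + (" power accessory" if q.startswith("charger") else " schedule event")

results = grid({"pretrained": pretrained, "fine-tuned": finetuned},
               {"none": None, "cot": cot},
               eval_set)
print(results)
```

Reading the grid row by row makes the corollary operational: if the fine-tuned/no-augmentation cell matches the pretrained/augmented cell, the augmentation was compensating for the training distribution, and its per-query cost can be retired.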


Source: RAG

Original note title: fine-tuning the retrieval model eliminates the need for query augmentation