Knowledge Retrieval and RAG

Can you adapt retrieval models without accessing target data?

Explores whether dense retrieval systems can adapt to new domains using only a textual description, rather than actual target documents—especially relevant for privacy-restricted or competitive scenarios.

Note · 2026-02-22 · sourced from RAG

Dense retrieval models require labeled query-document pairs to adapt to new domains. In many enterprise contexts, the target collection is unavailable: it may not exist yet, it may be legally restricted (medical records, financial data), or sharing it with a model provider would compromise competitive advantage.

The standard assumption — that you need the target data to train for the domain — turns out to be false for retrieval. A brief textual description of the target domain is sufficient.

The pipeline:

1. Provide a textual domain description.
2. Use instruction-following LLMs to extract domain properties: document topics, linguistic attributes, source characteristics, terminology patterns.
3. Generate seed documents matching those properties.
4. Iteratively retrieve real-domain-like documents using each seed as a query anchor.
5. Generate synthetic queries for the constructed collection.
6. Fine-tune the retrieval model on the resulting pseudo-relevance labels.
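The steps above can be sketched end to end. This is a minimal toy illustration, not the paper's implementation: the function names are hypothetical, and the LLM calls and dense encoder are stubbed with trivial heuristics so the control flow is visible.

```python
# Hypothetical sketch of the description-to-training-pairs pipeline.
# Every function body is a stub standing in for an LLM call or a dense encoder.

def extract_domain_properties(description: str) -> dict:
    # Step 2: an instruction-following LLM would fill these fields from the
    # description; here a keyword heuristic stands in.
    return {
        "topics": [w for w in description.lower().split() if len(w) > 6],
        "style": "formal" if "clinical" in description.lower() else "general",
    }

def generate_seed_documents(props: dict, n: int = 3) -> list:
    # Step 3: an LLM would generate documents matching the extracted properties.
    return [f"Seed doc {i} about {', '.join(props['topics'])}" for i in range(n)]

def retrieve_similar(seed: str, corpus: list, k: int = 2) -> list:
    # Step 4: anchor retrieval; toy lexical overlap stands in for a dense encoder.
    def score(doc):
        return len(set(seed.lower().split()) & set(doc.lower().split()))
    return sorted(corpus, key=score, reverse=True)[:k]

def generate_queries(doc: str) -> list:
    # Step 5: an LLM would write queries answerable by the document.
    return [f"What does this source say about {doc.split()[-1]}?"]

def build_training_pairs(description: str, open_corpus: list) -> list:
    props = extract_domain_properties(description)
    pairs = []
    for seed in generate_seed_documents(props):
        for doc in retrieve_similar(seed, open_corpus):
            for query in generate_queries(doc):
                # Step 6: pseudo-relevance label — each query is paired with
                # the document it was generated from.
                pairs.append((query, doc))
    return pairs

corpus = ["clinical trial outcomes report", "weather forecast bulletin"]
pairs = build_training_pairs("clinical oncology research notes", corpus)
print(len(pairs))
```

The resulting (query, document) pairs would then feed a standard contrastive fine-tuning loop for the retriever; the key point is that no real target document enters the pipeline, only the description.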

The retrieval-augmented approach to domain understanding is key: at step (2), the domain description itself becomes a RAG query used to extract structured properties, which then parameterize generation at step (3). The pipeline bootstraps from a description, through synthesis, to training data.
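One way to picture step (2) is as a schema-constrained extraction prompt: the description is the query, and the schema fields mirror the properties named above. The schema keys and prompt wording here are assumptions for illustration, not the paper's actual prompt.

```python
# Hypothetical prompt assembly for structured property extraction (step 2).
PROPERTY_SCHEMA = {
    "document_topics": "list of subject areas the documents cover",
    "linguistic_attributes": "register, sentence complexity, jargon density",
    "source_characteristics": "who authors these documents and in what setting",
    "terminology_patterns": "recurring domain-specific terms or abbreviations",
}

def build_extraction_prompt(domain_description: str) -> str:
    # The description acts as the query; the schema constrains the LLM's
    # output so that step (3) generation can be parameterized field by field.
    return (
        "Extract the following properties of the target domain, "
        "returning JSON with exactly these keys:\n"
        + "\n".join(f"- {key}: {hint}" for key, hint in PROPERTY_SCHEMA.items())
        + f"\n\nDomain description: {domain_description}"
    )

prompt = build_extraction_prompt("Internal support tickets for a medical-device firm")
print(prompt)
```

Constraining the output to a fixed schema is what makes the downstream generation controllable: each field maps to a concrete knob (topic, style, vocabulary) for the seed-document generator.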

Evaluation on five diverse target domains shows that description-based adaptation outperforms existing dense retrieval baselines in the zero-target-access scenario. The approach enables adaptation in precisely the contexts where conventional adaptation is blocked: privacy-sensitive domains, legally restricted data, competitive scenarios.



domain adaptation for retrieval is possible without target collection via description-based synthetic data