Can the same description-then-retrieve pattern work for domain adaptation without target data?

This explores whether the trick of writing a text description and using it to retrieve — instead of collecting real examples from the new domain — generalizes as a way to adapt models to domains you have no data for.

The corpus suggests that description-then-retrieve, in which a written description substitutes for collected target data, is a reusable adaptation strategy rather than a one-off trick. The strongest evidence is that it shows up independently in two modalities. In text retrieval, a brief domain description alone is enough to synthesize training data and fine-tune a retriever that beats baselines when the target collection is off-limits (see "Can you adapt retrieval models without accessing target data?"). In vision, the same shape recurs: describe an unknown image with a vision-language model, then retrieve known references from a text-indexed database. No recognition model has to be trained, and the natural-language description bridges the visual-to-reference gap better than direct embedding similarity (see "Can describing images in text improve zero-shot recognition?"). The common thread is that language acts as a portable adapter: a description carries enough structure to stand in for examples you can't gather.
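
The vision variant of the pattern can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method from either note: the `describe` function is a hypothetical stand-in for a vision-language model call, the `REFERENCES` entries are invented examples, and a bag-of-words cosine similarity is swapped in for whatever text retriever a real system would use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical text-indexed reference database: each known class is
# stored as a written description, not as labelled images.
REFERENCES = {
    "stop_sign": "red octagon with white border and the word stop in white letters",
    "yield_sign": "white downward pointing triangle with a red border and the word yield",
    "speed_limit": "white rectangle with black border and black numbers showing a speed limit",
}


def describe(image):
    """Stand-in for a vision-language model; here it just reads a stored caption."""
    return image["caption"]


def retrieve(description, references, k=1):
    """Rank reference entries by similarity to the description; return the top k names."""
    query = Counter(tokenize(description))
    scored = [
        (cosine(query, Counter(tokenize(text))), name)
        for name, text in references.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# An "unknown image", represented only by the caption the describer would produce.
unknown = {"caption": "an octagon shaped red sign with white letters"}
best = retrieve(describe(unknown), REFERENCES)
print(best)
```

Nothing in this sketch is trained: adding a new class means adding one more description to `REFERENCES`, which is the sense in which the description acts as a portable adapter.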

What makes the pattern work is that description and retrieval do different jobs. The description supplies *where* to look (the shape of the target domain), and retrieval supplies *what's actually there* (concrete references). That division echoes a broader finding in the corpus: adaptation works best when you split the fast, textual channel from the slow, weight-level one, routing task-specific lessons into prompts instead of forcing everything into parameter updates (see "Can splitting adaptation into two channels reduce forgetting?"). Description-then-retrieve is essentially an extreme version of that split. In its retrieval-only form, as in the vision case, *all* the adaptation lives in text and retrieval, and the base model's weights never move. The text-retrieval variant does fine-tune a retriever, but only on data synthesized from the description.

The pattern has a hard ceiling, though, and it is set by what the model already knows. Description-based methods reorganize and activate existing capability; they cannot inject knowledge that was never in training. Prompt optimization research makes this explicit: no amount of clever prompting compensates for missing foundational knowledge, it only rearranges what's already there (see "Can prompt optimization teach models knowledge they lack?"). And when a model's prior associations are strong, supplied context gets overridden entirely unless you intervene below the prompt level (see "Why do language models ignore information in their context?"). So a description can point at a genuinely novel domain, but if the underlying model has no representation of that domain's vocabulary or concepts, the description has nothing to activate.

The corpus also offers a quieter alternative worth knowing about: instead of leaning on free-text descriptions, you can transfer through discrete codes. Mapping item text to quantized codes before embedding reduces text bias and transfers across domains more cleanly than encoding the raw text directly (see "Can discrete codes transfer better than text embeddings?"). That's a hint that the *description* in description-then-retrieve isn't sacred. What matters is producing a domain-portable intermediate representation, and natural language is one option, not the only one. Where the pattern reliably breaks down is structured, relational targets: long-context models can match retrieval on semantic similarity with no training, but cannot execute queries that require joining across structured tables (see "Can long-context LLMs replace retrieval-augmented generation systems?"). The description-then-retrieve family inherits that boundary: it adapts well to domains defined by meaning and similarity, and poorly to domains defined by exact relational structure.
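
The discrete-code alternative mentioned above relies on product quantization, which can be illustrated with a toy encoder. This is a sketch under stated assumptions rather than the VQ-Rec implementation: the `CODEBOOKS` centroids are hand-picked for illustration (a real system would learn them, typically with k-means over text encodings), and the input vector is an invented stand-in for an item's text embedding.

```python
def nearest(sub_vector, codebook):
    """Index of the centroid closest to sub_vector (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((s - c) ** 2 for s, c in zip(sub_vector, codebook[i])),
    )


def pq_encode(vector, codebooks):
    """Split the vector into equal-width chunks and map each chunk to a code index."""
    width = len(vector) // len(codebooks)
    return [
        nearest(vector[j * width:(j + 1) * width], codebook)
        for j, codebook in enumerate(codebooks)
    ]


# Hypothetical codebooks: two sub-spaces, two centroids each.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],  # centroids for the first half of the vector
    [[0.0, 1.0], [1.0, 0.0]],  # centroids for the second half
]

# Invented 4-dimensional "text embedding" for one item.
codes = pq_encode([0.9, 0.8, 0.1, 0.7], CODEBOOKS)
print(codes)
```

The item is now represented by a short tuple of integers instead of its raw text encoding. A downstream model would look up a learned embedding per code, and that lookup table is the part that can be re-fitted per domain.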

Sources (7 notes)

Can you adapt retrieval models without accessing target data?

Research demonstrates that a brief textual domain description suffices to generate synthetic training data for retrieval fine-tuning, outperforming baselines in zero-target-access scenarios and enabling adaptation where conventional methods are blocked.

Can describing images in text improve zero-shot recognition?

SignRAG demonstrates that describing an unknown image via vision-language model, then retrieving known designs from a text-indexed database, eliminates the need for recognition model training. Natural-language description bridges the visual-reference gap better than direct embedding similarity.

Can splitting adaptation into two channels reduce forgetting?

Fast-Slow Training routes task-specific lessons into optimized prompts while keeping parameter updates minimal, reaching equivalent performance 1.4–3x faster with substantially less catastrophic forgetting and plasticity loss, demonstrating that forgetting is a misallocation problem rather than an inherent cost.

Can prompt optimization teach models knowledge they lack?

Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Can discrete codes transfer better than text embeddings?

VQ-Rec demonstrates that mapping item text to discrete codes via product quantization, then to embeddings, improves cross-domain transfer compared to direct text encoding. The discrete intermediate reduces text bias and enables efficient per-domain fine-tuning.

Can long-context LLMs replace retrieval-augmented generation systems?

The LOFT benchmark shows LCLMs match RAG on semantic retrieval without explicit training, but cannot execute relational queries requiring joins across structured tables. Context length alone cannot bridge this gap.
