Does training for compositional sensitivity hurt dense retrieval?

Dense retrieval excels at topical recall but struggles with meaning-level distinctions. Adding structure-targeted negatives during training might improve compositional sensitivity—but at what cost to overall retrieval performance?

Note · 2026-05-18 · sourced from Training Fine Tuning

Dense retrieval — compress text into a single vector, rank by cosine similarity — is efficient for topical recall but brittle for identity-level matching. Minimal compositional edits (negation, role swaps, word reordering) flip the meaning of a sentence while retaining high cosine similarity to the original. The natural fix is to train with structure-targeted negatives: hard examples that look similar lexically but mean something different.
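The brittleness can be seen in an extreme toy case. The sketch below is not a real dense encoder: it assumes a hypothetical `word_vector` helper that hashes each word to a fixed vector, then mean-pools the word vectors. Because averaging discards word order, a role swap produces the same pooled vector as the original, so cosine similarity is 1.0 up to floating-point rounding. Trained encoders do carry positional information, so the effect there is a high similarity rather than an identical vector.

```python
import hashlib
import math

DIM = 16  # arbitrary toy dimensionality (assumption, not from the note)


def word_vector(word):
    """Deterministic toy vector for a word; a stand-in for learned token embeddings."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return [(byte - 127.5) / 127.5 for byte in digest[:DIM]]


def mean_pool(sentence):
    """Average the word vectors of a sentence into a single vector."""
    vectors = [word_vector(w) for w in sentence.lower().split()]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


original = mean_pool("the dog chased the cat")
role_swap = mean_pool("the cat chased the dog")        # meaning flipped
negated = mean_pool("the dog never chased the cat")    # meaning flipped

swap_similarity = cosine(original, role_swap)
negation_similarity = cosine(original, negated)

print(f"role swap: {swap_similarity:.4f}")
print(f"negation:  {negation_similarity:.4f}")
```

The negated sentence differs from the original by a single token in the average, which is why one-word edits of this kind move the pooled vector only slightly.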

The empirical finding from "Training for Compositional Sensitivity Reduces Dense Retrieval Generalization" is that this fix is zero-sum. Across four dual-encoder backbones, adding structure-targeted negatives consistently reduces zero-shot NanoBEIR retrieval performance (an 8-9% drop in mean nDCG@10 on small backbones, up to 40% on medium ones) while only partially improving the structural discrimination that motivated the change. The model learns to reject some permutations but loses ground on broad topical retrieval.
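To make "adding structure-targeted negatives" concrete, here is a minimal sketch under the assumption that training uses an InfoNCE-style contrastive loss; the note does not specify the paper's actual objective, and the similarity values and temperature below are invented for illustration. A near-miss negative that scores almost as high as the positive dominates the loss, so gradient updates are spent on separating it from the query.

```python
import math


def info_nce(sim_pos, sim_negs, temperature=0.05):
    """InfoNCE-style loss: negative log-softmax of the positive among all candidates."""
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_denominator = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in logits)
    )
    return log_denominator - logits[0]


# Hypothetical cosine similarities between a query and candidate passages.
positive = 0.90
topical_negatives = [0.20, 0.10, 0.15]  # off-topic passages, easy to reject
structural_negative = 0.85              # e.g. a role-swapped copy of the positive

loss_topical = info_nce(positive, topical_negatives)
loss_with_structural = info_nce(positive, topical_negatives + [structural_negative])

print(f"topical negatives only:   {loss_topical:.6f}")
print(f"plus structural negative: {loss_with_structural:.6f}")
```

With only easy negatives the loss is close to zero and the encoder is barely pushed; the structural negative supplies almost all of the training signal, which is the pressure the reported trade-off is attributed to.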

This is a geometric trade-off, not a training-recipe artifact. Pooled-cosine retrieval requires every meaningful distinction to live in a single vector that is compared with one similarity score. The representational margin spent on rejecting meaning-changing near-misses (structural sensitivity) competes with the margin available for coarse content grouping (topical sensitivity). The vector cannot fully serve both at once: capacity gained for one capability is surrendered by the other.

The implication for retrieval system design is that dense retrieval has a structural ceiling on what it can do on its own. Methods that try to add compositional sensitivity to the dense pipeline will pay for it elsewhere. This is not a hyperparameter to tune; it is a fundamental geometric constraint of unit-sphere cosine spaces.

The productive response is architectural rather than a matter of tuning the training recipe. Treat dense retrieval as a recall stage (broad topical filtering at scale) and add a separate verification stage for compositional sensitivity. The retrieval stage no longer needs to be compositionally sensitive; the verification stage handles structural discrimination on the filtered candidate set. This decomposition matches dense retrieval to what it does well and adds a downstream component where it fails.
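The recall-then-verify decomposition can be sketched as follows. Everything here is a toy under stated assumptions: `embed` is an order-insensitive token-count embedding standing in for a dense encoder, and `same_token_order` is a hypothetical placeholder for whatever order-sensitive verifier a real system would use (for example a cross-encoder or an entailment model). The function names are illustrative, not from the note.

```python
import math
from collections import Counter


def embed(text):
    """Toy order-insensitive embedding (token counts); stand-in for a dense encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recall_stage(query, corpus, k):
    """Stage 1: broad topical filtering, keep the k most similar documents."""
    query_vec = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)
    return ranked[:k]


def verify_stage(query, candidates, verifier):
    """Stage 2: keep only candidates that the order-sensitive verifier accepts."""
    return [doc for doc in candidates if verifier(query, doc)]


def same_token_order(query, doc):
    """Hypothetical verifier: placeholder for a cross-encoder or entailment check."""
    return query.lower().split() == doc.lower().split()


corpus = [
    "the cat chased the dog",
    "the dog chased the cat",
    "stock prices fell sharply today",
]
query = "the dog chased the cat"

candidates = recall_stage(query, corpus, k=2)
verified = verify_stage(query, candidates, same_token_order)

print(candidates)
print(verified)
```

The recall stage cannot tell the role-swapped sentence from the true match, and it does not need to: it only has to discard the off-topic document cheaply. The verifier runs on the small surviving set, where a more expensive order-sensitive comparison is affordable.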

Original note title

dense retrieval has a retrieval-composition tension — training for compositional sensitivity zero-sum trades against broad topical retrieval