Why can't cosine space retrievers distinguish word order?
Dense retrievers using unit-sphere cosine spaces struggle to capture non-commutative linguistic structures like negation and role reversal. Understanding this geometric constraint explains why training fixes have limited reach in compositional retrieval.
The retrieval-composition tension in dense retrieval is not just a training artifact; it has a geometric explanation. Kang et al. (2025) argue that unit-sphere cosine spaces force conceptual clusters into linear superposition. Linear superposition is commutative by construction: a + b = b + a, so the sum keeps no record of which component came first or which role it played. This makes the geometry hostile to non-commutative structures such as negation ("not X" vs "X"), role reversal ("the dog bit the man" vs "the man bit the dog"), and word order ("X happens before Y" vs "Y happens before X").
A pooled-cosine dual encoder compresses variable-length reasoning into a single vector, and the geometry of that vector space is structurally incompatible with the relational distinctions natural language constantly makes. Even the canonical example, distinguishing "the dog bit the man" from "the man bit the dog", is hard for pooled-cosine retrieval: the two sentences use the same words in a different order, and the geometric encoding cannot represent the structural difference robustly.
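The commutativity argument can be made concrete with a deliberately idealized sketch. It assumes context-free token vectors (a hypothetical one-hot vocabulary, not the output of any real encoder) that are mean-pooled and normalized onto the unit sphere. Real encoders contextualize tokens, so their pooled vectors are not exactly identical; this is the limiting case where the collapse is total.

```python
import math

# Hypothetical toy vocabulary: one orthogonal (one-hot) vector per word.
VOCAB = ["the", "dog", "bit", "man"]
WORD_VECS = {
    w: [1.0 if i == j else 0.0 for j in range(len(VOCAB))]
    for i, w in enumerate(VOCAB)
}

def mean_pool(sentence):
    """Average context-free token vectors, then L2-normalize onto the unit sphere."""
    tokens = sentence.split()
    dim = len(VOCAB)
    pooled = [sum(WORD_VECS[t][d] for t in tokens) / len(tokens) for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in pooled))
    return [x / norm for x in pooled]

def cosine(u, v):
    # Both inputs are unit-length, so the dot product equals the cosine.
    return sum(a * b for a, b in zip(u, v))

a = mean_pool("the dog bit the man")
b = mean_pool("the man bit the dog")
similarity = cosine(a, b)

print(a == b)                # the two pooled vectors coincide
print(round(similarity, 6))  # cosine similarity of the pair
```

Because addition ignores order, the two sentences land on the same point of the sphere, and no threshold on cosine similarity can separate them in this toy setting.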
This explains why training-recipe fixes have limited reach. Adding structure-targeted negatives during training can improve discrimination locally but cannot overcome the underlying geometric constraint: the unit-sphere cosine space remains hostile to non-commutativity regardless of how the training objective is shaped. The constraint lives in the space, not in the training procedure.
The implication is methodological. If the goal is faithful retrieval that respects compositional meaning, the architecture has to leave unit-sphere cosine space at some point. Token-level interaction (for example, ColBERT's MaxSim scoring), graph structures, or downstream verifiers operating on richer representations are routes around the constraint. Single-vector retrieval is the wrong tool for compositional sensitivity; pretending otherwise wastes training compute on a problem the architecture cannot solve.
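A minimal sketch of why token-level interaction can help, under stated assumptions: each token vector is a hypothetical stand-in for a contextualized embedding, built by concatenating a one-hot word identity with a one-hot position. The `maxsim` function follows the usual late-interaction recipe (best-matching document token per query token, summed); the vectors and helper names are illustrative and are not ColBERT's actual representations.

```python
import math

WORDS = ["dog", "bit", "man"]
MAX_LEN = 3

def token_vec(word, position):
    """Toy 'contextualized' token: one-hot word plus one-hot position, unit length."""
    w = [1.0 if x == word else 0.0 for x in WORDS]
    p = [1.0 if i == position else 0.0 for i in range(MAX_LEN)]
    scale = 1.0 / math.sqrt(2.0)
    return [scale * x for x in w + p]

def encode(sentence):
    return [token_vec(w, i) for i, w in enumerate(sentence.split())]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def maxsim(query_tokens, doc_tokens):
    """Late interaction: best doc-token match for each query token, summed."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

def pooled_cosine(query_tokens, doc_tokens):
    """Single-vector baseline: sum-pool the tokens, normalize, take the dot product."""
    def pool(tokens):
        dim = len(tokens[0])
        s = [sum(t[d] for t in tokens) for d in range(dim)]
        n = math.sqrt(sum(x * x for x in s))
        return [x / n for x in s]
    return dot(pool(query_tokens), pool(doc_tokens))

query = encode("dog bit man")
same_order = encode("dog bit man")
reversed_roles = encode("man bit dog")

pooled_same = pooled_cosine(query, same_order)
pooled_rev = pooled_cosine(query, reversed_roles)
maxsim_same = maxsim(query, same_order)
maxsim_rev = maxsim(query, reversed_roles)

print(pooled_same, pooled_rev)   # pooled scores tie
print(maxsim_same, maxsim_rev)   # token-level scores differ
```

In this toy, pooling erases the position information even though every token carries it, so the pooled scores tie, while MaxSim scores the matching order at about 3.0 and the reversed roles at about 2.0. The separation comes from the token vectors encoding context; with purely context-free token vectors, MaxSim would be just as order-blind as pooling.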
The deeper observation is that representation geometry constrains what the space can represent, independently of training. Designing for capability requires choosing a geometry that admits the relations you care about, and unit-sphere cosine, despite its efficiency advantages, admits a narrower set of relations than natural language uses.
Related concepts in this collection
- Does training for compositional sensitivity hurt dense retrieval?
  Dense retrieval excels at topical recall but struggles with meaning-level distinctions. Adding structure-targeted negatives during training might improve compositional sensitivity, but at what cost to overall retrieval performance?
  same paper, the empirical trade-off this geometry explains
- Can verification separate structural near-misses from topical matches?
  Should retrieval pipelines use a separate verification stage to detect structural errors that dense retrievers miss? This explores whether splitting retrieval and verification solves the compositional sensitivity problem.
  same paper, the architectural escape from the constraint
- Can language models learn meaning without engaging the world?
  Explores whether LLMs prove that meaning emerges from relational structure alone, independent of embodied experience or external reference. Tests structuralist theory empirically.
  adjacent: relational structure in language requires non-commutative representation; dense cosine fails this requirement
Original note title
unit-sphere cosine spaces are geometrically hostile to non-commutative structures like negation and word order — single-vector retrieval cannot distinguish dog-bit-man from man-bit-dog