
Why can't cosine space retrievers distinguish word order?

Dense retrievers using unit-sphere cosine spaces struggle to capture non-commutative linguistic structures like negation and role reversal. Understanding this geometric constraint explains why training fixes have limited reach in compositional retrieval.

Note · 2026-05-18 · sourced from Training Fine Tuning

The retrieval-composition tension in dense retrieval is not just a training artifact; it has a geometric explanation. Kang et al. (2025) argue that unit-sphere cosine spaces force conceptual clusters into linear superposition. Linear superposition is commutative by construction: a + b = b + a, so the combined vector keeps no record of which component came first or which role it played. This makes the geometry hostile to non-commutative structures such as negation ("not X" vs "X"), role reversal ("the dog bit the man" vs "the man bit the dog"), and word order ("X happens before Y" vs "Y happens before X").
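The commutativity point can be made concrete with a deliberately simplified sketch. It assumes a hypothetical toy encoder (`pooled`) that sums static, randomly initialized token vectors and projects the result onto the unit sphere. Real dense retrievers use contextual token representations, so their similarity for such pairs is typically high rather than exactly 1. The toy only isolates the part of the argument that follows from a + b = b + a.

```python
import math
import random

def unit(v):
    """Scale a vector to unit length (project it onto the unit sphere)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity of two vectors."""
    a, b = unit(a), unit(b)
    return sum(x * y for x, y in zip(a, b))

def pooled(tokens, table):
    """Hypothetical toy encoder: sum static token vectors, then normalize."""
    dim = len(next(iter(table.values())))
    total = [0.0] * dim
    for tok in tokens:
        for i, x in enumerate(table[tok]):
            total[i] += x
    return unit(total)

# Illustrative random embeddings; the vocabulary and dimension are assumptions.
rng = random.Random(0)
vocab = ["the", "dog", "bit", "man"]
table = {w: [rng.gauss(0.0, 1.0) for _ in range(16)] for w in vocab}

a = pooled("the dog bit the man".split(), table)
b = pooled("the man bit the dog".split(), table)

# Same multiset of tokens, so the sums agree up to floating-point rounding.
similarity = cosine(a, b)
print(round(similarity, 6))
```

Under these assumptions the two role-reversed sentences land on the same point of the sphere, so no cosine threshold can separate them.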

A pooled-cosine dual encoder compresses variable-length reasoning into a single vector. The geometry of that vector space is structurally incompatible with the relational distinctions natural language constantly makes. Even the canonical example, distinguishing "the dog bit the man" from "the man bit the dog", is hard for pooled-cosine retrieval. The two sentences contain exactly the same words and differ only in which noun fills which role, and the geometric encoding cannot represent that structural difference robustly.

This explains why training-recipe fixes have limited reach. Adding structure-targeted negatives during training can improve discrimination locally but cannot overcome the underlying geometric constraint: the unit-sphere cosine space remains hostile to non-commutativity regardless of how the training objective is shaped. The constraint lives in the space, not in the training procedure.
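As an illustration of what a structure-targeted negative does to the objective, here is a minimal sketch of an InfoNCE-style contrastive loss for one query. The function name `info_nce`, the temperature, and the similarity values are assumptions chosen for illustration, not the recipe from Kang et al. (2025).

```python
import math

def info_nce(sim_pos, sim_negs, temperature=0.05):
    """Contrastive loss for one query: negative log-softmax of the positive.

    sim_pos  -- cosine similarity between query and positive passage
    sim_negs -- cosine similarities between query and negative passages
    """
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]

# Role-reversed negative scored identically to the positive (illustrative values).
collapsed = info_nce(0.9, [0.9])

# The same pair if the encoder could pull the negative away from the query.
separated = info_nce(0.9, [0.5])

print(collapsed, separated)
```

When the positive and its role-reversed negative receive the same score, the loss sits at log 2 for that pair. It can fall only as far as the space lets the encoder move the two vectors apart, which is where the geometric constraint applies.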

The implication is methodological. If the goal is faithful retrieval that respects compositional meaning, the architecture has to leave unit-sphere cosine space at some point. Token-level late interaction (ColBERT-style MaxSim), graph structures, or downstream verifiers operating on richer representations are routes around the constraint. Single-vector retrieval is the wrong tool for compositional sensitivity; pretending otherwise wastes training compute on a problem the architecture cannot solve.
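For reference, the MaxSim scoring used in late-interaction retrieval can be sketched in a few lines. The token vectors below are hand-picked toy values, not the output of any real model, and the helper names are illustrative.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: each query token takes its best-matching
    document token, and the per-token maxima are summed."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy per-token embeddings (assumed values, not normalized).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
doc_b = [[0.5, 0.75], [0.25, 0.25]]

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
print(score_a, score_b)
```

The MaxSim operator is itself invariant to token order. Any sensitivity to structure has to come from contextualized token vectors, which keep per-token information that single-vector pooling averages away.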

The deeper observation is that representation geometry constrains what the space can represent, independently of training. Designing for capability requires choosing a geometry that admits the relations you care about, and unit-sphere cosine, despite its efficiency advantages, admits a narrower set of relations than natural language uses.

Original note title

unit-sphere cosine spaces are geometrically hostile to non-commutative structures like negation and word order — single-vector retrieval cannot distinguish dog-bit-man from man-bit-dog