Knowledge Retrieval and RAG

Do vector embeddings actually measure task relevance?

Vector embeddings rank documents by semantic similarity, but RAG systems need topical relevance. When the two diverge, as with king/queen versus king/ruler, does similarity-based retrieval fail in production?

Note · 2026-02-22 · sourced from RAG

The king/queen/ruler problem illustrates a fundamental misconception embedded in RAG architecture. Vector embeddings trained on language co-occurrence measure semantic association: how often concepts appear in related contexts. King and queen appear together frequently in discussions of royalty, so their embeddings are highly similar (0.92 in this example). King and ruler belong to the same conceptual category but co-occur less often, so their similarity is lower (0.83).

But the relevant criterion for a RAG system is not semantic association — it is whether a chunk answers the query. For a query about "king," chunks discussing "ruler" are more relevant than chunks discussing "queen" even though queen is more similar by embedding distance. Semantic similarity and task relevance diverge whenever concepts are closely associated but play different roles.
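A minimal sketch of the divergence, assuming the open-source sentence-transformers library and the all-MiniLM-L6-v2 checkpoint; neither is named in the note, and the query and chunk texts are likewise illustrative:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "What powers did the king hold?"
chunks = [
    # Strongly associated with "king", but answers the wrong question.
    "The queen presided over court ceremonies and royal patronage.",
    # Less associated, but on-topic for the query's actual intent.
    "A ruler in this period held absolute legislative and judicial authority.",
]

q_emb = model.encode(query, convert_to_tensor=True)
c_embs = model.encode(chunks, convert_to_tensor=True)

# Cosine similarity is the only signal the retriever has; it carries no
# information about which role the query is asking about.
scores = util.cos_sim(q_emb, c_embs)[0]
for chunk, score in sorted(zip(chunks, scores.tolist()), key=lambda p: -p[1]):
    print(f"{score:.3f}  {chunk}")
```

Depending on the embedding model, the queen chunk can outrank the ruler chunk even though it answers the wrong question: the retriever has a channel for proximity in embedding space, but none for the query's intent.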

This divergence is not a calibration problem or a model quality problem; it is structural. Embeddings cannot know what role a concept plays in a query without understanding the query's intent; they can only return what is semantically nearby. For many RAG use cases this is a sufficient approximation. For others (precision-critical domains, complex queries, queries where highly associated concepts would be wrong answers) it fails.

The production failure pattern: RAG demos work because demo queries are carefully chosen to favor semantic retrieval (simple, single-topic, clear information needs). Production queries are messy: underspecified, multi-intent, and centered on concepts whose nearest semantic neighbors are wrong answers. The semantic-association measure that works in demos becomes a noise source in production.
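One way to make this gap measurable is to score the same retriever on a demo-style and a production-style query set. A hypothetical harness sketch, where `retrieve(query, k)` and the labeled query sets are assumptions, not artifacts from the note:

```python
from typing import Callable

def precision_at_k(
    retrieve: Callable[[str, int], list[str]],
    labeled_queries: dict[str, set[str]],  # query -> ids of chunks that answer it
    k: int = 5,
) -> float:
    """Fraction of top-k retrieved chunk ids that are labeled relevant."""
    hits = total = 0
    for query, relevant_ids in labeled_queries.items():
        for chunk_id in retrieve(query, k):
            hits += chunk_id in relevant_ids  # bool counts as 0/1
            total += 1
    return hits / total if total else 0.0

# A large gap between precision_at_k(retrieve, demo_queries) and
# precision_at_k(retrieve, prod_queries) is the failure pattern above.
```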

Re-ranking, advanced chunking, and other "Advanced RAG" techniques address symptoms. They do not fix the fundamental mismatch between what embeddings measure and what retrieval needs to optimize.
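For concreteness, this is what the re-ranking patch typically looks like: a cross-encoder scores each (query, chunk) pair jointly, which recovers some intent sensitivity. A minimal sketch, assuming sentence-transformers and a public MS MARCO cross-encoder checkpoint (the note names neither):

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], top_n: int = 3) -> list[str]:
    # Joint scoring sees query and chunk together, so it can demote
    # associated-but-wrong chunks that embedding similarity promoted.
    scores = reranker.predict([(query, c) for c in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda p: -p[1])
    return [chunk for chunk, _ in ranked[:top_n]]
```

But the cross-encoder only reorders what the embedding retriever already surfaced. If the relevant chunk never entered the candidate list, no amount of re-ranking recovers it, which is why this treats the symptom rather than the mismatch.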

LLM attention on graph-structured data reveals a parallel mismatch. When LLMs are fine-tuned on graph data, their attention patterns shift toward node tokens — they learn to recognize graph entities. But shuffling node connectivity has no effect on performance, meaning the model attends to nodes without modeling the relationships between them. The same structural limitation appears in both embedding retrieval (association without relevance) and LLM graph processing (recognition without relational modeling). See Can language models actually use graph structure information?.
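A sketch of the connectivity-shuffle ablation described above, with a hypothetical `evaluate` standing in for whatever benchmark the fine-tuned model is scored on; the edge-list serialization format is also an assumption:

```python
import random

def shuffle_connectivity(edges: list[tuple[str, str]], seed: int = 0) -> list[tuple[str, str]]:
    """Keep the same nodes and edge count but randomize who connects to whom."""
    rng = random.Random(seed)
    targets = [v for _, v in edges]
    rng.shuffle(targets)
    return [(u, v) for (u, _), v in zip(edges, targets)]

def serialize(edges: list[tuple[str, str]]) -> str:
    """Render the graph as the text the model is prompted with."""
    return "; ".join(f"{u} -> {v}" for u, v in edges)

# If evaluate(model, serialize(edges)) matches
# evaluate(model, serialize(shuffle_connectivity(edges))),
# the model's answers never depended on the relational structure at all.
```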


Source: RAG; enriched from Knowledge Graphs

Original note title: vector embeddings measure semantic association not task relevance — causing production RAG failures