Why do queries and their causes seem semantically different?
Information retrieval systems find passages that match the query's language, but the segment that actually caused a user's question may say something quite different. This note explores when semantic similarity fails to capture causal relevance.
Standard information retrieval matches a query against a corpus by semantic similarity — the system finds the passages most similar to the query. The implicit assumption is that the user wants information about whatever the query mentions. Backtracing inverts the question: given a user's query, what segment of the source caused them to ask it? The cause is what content creators (lecturers, journalists, conversational partners) need to find to improve their material.
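The inversion can be sketched in a few lines. A similarity retriever ranks segments by embedding closeness to the query; a backtracer instead scores each segment by how probable it makes the query. This is a minimal sketch, not the paper's exact method: `log_likelihood` is a hypothetical stand-in for a language-model score of p(query | segment), and `embed` vectors are assumed to come from any sentence encoder.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def similarity_retrieve(query_vec, segment_vecs):
    # Standard IR: pick the segment whose embedding is closest to the query.
    return max(range(len(segment_vecs)),
               key=lambda i: cosine(query_vec, segment_vecs[i]))

def backtrace(query, segments, log_likelihood):
    # Backtracing: pick the segment s maximizing log p(query | s) --
    # the segment most likely to have *caused* the query.
    return max(range(len(segments)),
               key=lambda i: log_likelihood(query, segments[i]))
```

The two functions have the same argmax shape; what changes is the scoring signal, which is exactly why a similarity retriever cannot be patched into a backtracer by swapping embeddings alone.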
The empirical difference between these tasks is what the paper documents. In the LECTURE domain, a student asks "does projecting multiple times still lead to the same point?" The semantically similar passage discusses "projection matrices." But the causally relevant passage is the lecturer saying "projecting twice gets me the same answer as one projection" — which sounds like it should be the answer, except that's exactly what triggered the student's confusion (they didn't see why two projections collapse to one). Semantic relevance and causal relevance pull apart.
The phenomenon is domain-dependent. In NEWS ARTICLE backtracing, queries and causes are semantically close, because news articles introduce key information early to capture interest. In CONVERSATION and LECTURE, the gap between the most semantically similar passage and the ground-truth cause is large: many passages are semantically similar to the query, but most are not the cause. The distribution of cause locations also differs: in news, causes peak at the beginning; in conversation, at the end (cumulative buildup); in lectures, they are roughly uniform.
The practical bite for conversational recommender systems: when the user expresses dissatisfaction or asks a clarifying question, the segment of the conversation that caused the reaction is not necessarily the segment most similar to the reaction. Existing IR retrievers fail at this. The task requires new methods that model causal-relevance signals — not just embeddings of surface content.
Source: Conversational Recommenders
Related concepts in this collection
- Do vector embeddings actually measure task relevance?
  Vector embeddings rank semantic similarity, but RAG systems need topical relevance. When these diverge — as with king/queen versus king/ruler — does similarity-based retrieval fail in production?
  extends: the same gap between similarity and the relevance the user actually needs — backtracing names the causal-relevance variant of this general failure
- Why do users drift away from their original information need?
  When users know their knowledge is incomplete but cannot articulate what's missing, do they unintentionally shift topics? And can real-time systems detect this drift?
  complements: ASK explains why a query and its causing passage diverge — the user cannot articulate the gap they detected, so the query's semantics drift away from the cause
- Does including all conversation history actually help retrieval?
  Conversational search systems typically use all previous context to understand current queries. But do topic switches in multi-turn conversations inject noise that degrades performance rather than helps it?
  complements: both find that surface-content retrieval over conversation history is wrong — selective history strips noise, backtracing redirects the relevance signal
- Why do decoder-only models underperform as text encoders?
  Decoder-only LLMs use causal attention, which limits each token to seeing only prior context. This explores whether removing that constraint could make them competitive universal encoders without architectural redesign.
  complements: encoder design fixes the representation; backtracing reframes what counts as relevant — both are needed
- Does conversation order matter for recommending items in dialogue?
  Conversational recommendation systems typically ignore the sequence in which items are mentioned, treating dialogue as a bag of entities. But does the order itself carry predictive signal about what to recommend next?
  complements: sequence-aware retrieval is the architectural correlate of cause-aware retrieval — both move past static similarity
Original note: causal relevance differs from semantic relevance — backtracing retrieves the segment that caused a query, not the segment that matches it