Why do queries and their causes seem semantically different?
Information retrieval systems find passages that match the query's language, but the segment that actually caused a user's question may say something quite different. This note explores when semantic similarity fails to capture causal relevance.
Standard information retrieval matches a query against a corpus by semantic similarity — the system finds the passages most similar to the query. The implicit assumption is that the user wants information about whatever the query mentions. Backtracing inverts the question: given a user's query, what segment of the source caused them to ask it? The cause is what content creators (lecturers, journalists, conversational partners) need to find to improve their material.
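The inversion can be sketched in a few lines. A similarity retriever ranks segments by embedding closeness to the query; a backtracer instead scores each segment by how probable it makes the query. This is a minimal sketch, not the paper's exact method: `log_likelihood` is a hypothetical stand-in for a language-model score of p(query | segment), and `embed` vectors are assumed to come from any sentence encoder.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def similarity_retrieve(query_vec, segment_vecs):
    # Standard IR: pick the segment whose embedding is closest to the query.
    return max(range(len(segment_vecs)),
               key=lambda i: cosine(query_vec, segment_vecs[i]))

def backtrace(query, segments, log_likelihood):
    # Backtracing: pick the segment s maximizing log p(query | s) --
    # the segment most likely to have *caused* the query.
    return max(range(len(segments)),
               key=lambda i: log_likelihood(query, segments[i]))
```

The two functions have the same argmax shape; what changes is the scoring signal, which is exactly why a similarity retriever cannot be patched into a backtracer by swapping embeddings alone.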
The empirical difference between these tasks is what the paper documents. In the LECTURE domain, a student asks "does projecting multiple times still lead to the same point?" The semantically similar passage discusses "projection matrices." But the causally relevant passage is the lecturer saying "projecting twice gets me the same answer as one projection" — which sounds like it should be the answer, except that's exactly what triggered the student's confusion (they didn't see why two projections collapse to one). Semantic relevance and causal relevance pull apart.
The phenomenon is domain-dependent. In NEWS ARTICLE backtracing, queries and causes are semantically close, because news articles introduce key information early to capture interest. In CONVERSATION and LECTURE, the gap between the most semantically similar passage and the ground-truth cause is large: many passages are semantically similar to the query, but most are not the cause. The distribution of cause locations also differs: in news, causes peak at the beginning; in conversation, at the end (cumulative buildup); in lectures, they are roughly uniform.
The practical bite for conversational recommender systems: when the user expresses dissatisfaction or asks a clarifying question, the segment of the conversation that caused the reaction is not necessarily the segment most similar to the reaction. Existing IR retrievers fail at this. The task requires new methods that model causal-relevance signals — not just embeddings of surface content.
Source: Conversational Recommenders
Related concepts in this collection
- Do vector embeddings actually measure task relevance?
  Vector embeddings rank semantic similarity, but RAG systems need topical relevance. When these diverge — as with king/queen versus king/ruler — does similarity-based retrieval fail in production?
  extends: the same gap between similarity and the relevance the user actually needs — backtracing names the causal-relevance variant of this general failure
- Why do users drift away from their original information need?
  When users know their knowledge is incomplete but cannot articulate what's missing, do they unintentionally shift topics? And can real-time systems detect this drift?
  complements: ASK explains why a query and its causing passage diverge — the user cannot articulate the gap they detected, so the query's semantics drift away from the cause
- Does including all conversation history actually help retrieval?
  Conversational search systems typically use all previous context to understand current queries. But do topic switches in multi-turn conversations inject noise that degrades performance rather than helps it?
  complements: both find that surface-content retrieval over conversation history is wrong — selective history strips noise, backtracing redirects the relevance signal
- Why do decoder-only models underperform as text encoders?
  Decoder-only LLMs use causal attention, which limits each token to seeing only prior context. This explores whether removing that constraint could make them competitive universal encoders without architectural redesign.
  complements: encoder design fixes the representation; backtracing reframes what counts as relevant — both are needed
- Does conversation order matter for recommending items in dialogue?
  Conversational recommendation systems typically ignore the sequence in which items are mentioned, treating dialogue as a bag of entities. But does the order itself carry predictive signal about what to recommend next?
  complements: sequence-aware retrieval is the architectural correlate of cause-aware retrieval — both move past static similarity
Original note: causal relevance differs from semantic relevance — backtracing retrieves the segment that caused a query, not the segment that matches it