Does including all conversation history actually help retrieval?
Conversational search systems typically use all previous context to understand current queries. But do topic switches in multi-turn conversations inject noise that degrades performance rather than helps it?
A common assumption in conversational search and QA is that including all previous conversation context helps the model understand the current query. Two independent research programs show that this assumption is wrong.
The problem: topic switches within a conversation session are common. A user might discuss restaurants, then switch to hotels, then return to restaurants. Using ALL previous queries to expand the current query "will inevitably inject irrelevant information into the expanded query and result in sub-optimal queries."
Two complementary solutions:
Learning to Relate proposes selecting useful previous queries based on whether they improve retrieval effectiveness for the current query. A multi-task learning method jointly optimizes query selection and dense retrieval — and the automated selection outperforms human annotations because the model optimizes for retrieval quality while humans optimize for semantic understanding.
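The selection criterion can be sketched with a toy example. This is not the paper's method (which trains a dense retriever jointly with the selector); it is a minimal illustration of the underlying signal, using a bag-of-words overlap score as a stand-in for dense similarity and a known relevant document as the effectiveness oracle:

```python
def score(query, doc):
    """Bag-of-words overlap — an illustrative stand-in for dense similarity."""
    q, d = set(query.split()), set(doc.split())
    return len(q & d) / (len(q) or 1)

def rank_of(query, docs, relevant_id):
    """Rank (0 = best) of the known relevant document for this query."""
    ranked = sorted(docs, key=lambda doc_id: score(query, docs[doc_id]), reverse=True)
    return ranked.index(relevant_id)

def select_history(history, current, docs, relevant_id):
    """Keep a previous turn only if appending it improves retrieval."""
    base = rank_of(current, docs, relevant_id)
    return [turn for turn in history
            if rank_of(current + " " + turn, docs, relevant_id) < base]

# User discussed restaurants, switched to hotels, now returns to restaurants.
docs = {
    "r1": "italian restaurants pasta downtown",   # relevant to current need
    "h1": "hotels downtown cheap rooms",          # distractor from topic switch
}
history = ["best italian pasta places", "cheap hotels with rooms"]
selected = select_history(history, "cheap options downtown", docs, "r1")
print(selected)  # only the restaurant turn survives; the hotel turn hurts ranking
```

The point of the sketch is the objective: a turn is kept because it measurably improves retrieval of the relevant document, not because it looks semantically related — which is exactly why automated selection can beat human annotation.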
DHS-ConvQA uses entity-based similarity between history turns and the current question, then applies attention-based re-ranking to weight useful terms. A binary classification task highlights useful terms (predicted as 1) and ignores irrelevant ones (predicted as 0).
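A simplified sketch of that two-step idea follows. The entity extractor and term scorer here are naive stand-ins (a known-entity list and exact overlap) for the paper's NER and attention-based re-ranking, so treat this as an illustration of the mechanism, not the implementation:

```python
def entities(text, known):
    """Stand-in extractor: keep tokens that appear in a known-entity list."""
    return {t.strip("?.,!") for t in text.lower().split()
            if t.strip("?.,!") in known}

def select_turns(history, question, known, min_overlap=1):
    """Step 1: keep history turns with entity overlap to the current question."""
    q_ents = entities(question, known)
    return [h for h in history if len(entities(h, known) & q_ents) >= min_overlap]

def label_terms(turn, question, known):
    """Step 2: binary labels — 1 for terms tied to the question's entities."""
    q_ents = entities(question, known)
    return [(t, 1 if t.lower().strip("?.,!") in q_ents else 0)
            for t in turn.split()]

known = {"paris", "louvre", "tokyo"}
history = ["How big is the Louvre in Paris?", "What about Tokyo?"]
question = "When was the Louvre built?"
turns = select_turns(history, question, known)     # drops the Tokyo turn
labels = label_terms(turns[0], question, known)    # ("Louvre", 1), rest 0
```

The Tokyo turn is excluded at step 1 (no shared entities), and within the kept turn only "Louvre" is predicted useful — the two filtering granularities the method combines.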
The key finding generalizes: for both conversational search and conversational QA, selective context is better than full context. This challenges the assumption that more context is always better — an assumption shared by RAG systems and long-context models.
Relative to Why do language models fail in gradually revealed conversations?, the selective history mechanism addresses a specific form of getting lost: when previous turns about a different topic bias the model's interpretation of the current turn. The fix is not better reasoning over more context but better selection of which context to include. This is the retrieval-side complement to Why do language models engage with conversational distractors?, which addresses the same problem at generation time — models lack the ability to recognize and resist topical diversion, whether it comes from their own context window (selective history) or from user behavior (topic-following).
Conversational memory research (arXiv:2406.00057) identifies two additional failure modes beyond topic switches, both absent from static database retrieval: (1) time/event-based queries — users ask "what did we discuss yesterday?" or "summarize Jason's points from January 6th", which require retrieval by temporal metadata, not semantic similarity; (2) ambiguous queries — pronouns and demonstratives ("tell me more about that") that require surrounding conversational context to disambiguate before retrieval can occur. Standard vector-DB RAG fails both. The combined solution requires chaining table-based search (for metadata), vector-database retrieval (for content), and disambiguation prompting (for resolving ambiguous references). See Why do time-based queries fail in conversational retrieval systems?.
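The chaining can be sketched as a small router. Everything here is an illustrative assumption — the regex routing rules, the table schema, and the overlap scorer standing in for a vector database are not from the cited work:

```python
import re
from datetime import date

# Toy conversational memory: each turn carries temporal metadata.
memory = [
    {"date": date(2024, 1, 6), "speaker": "Jason", "text": "budget cuts for Q2"},
    {"date": date(2024, 1, 7), "speaker": "Ana", "text": "vector database sharding"},
]

def table_search(by_date, by_speaker):
    """Metadata lookup — the 'table-based search' leg of the chain."""
    return [m for m in memory
            if m["date"] == by_date and m["speaker"] == by_speaker]

def content_search(query):
    """Term overlap as a stand-in for vector-DB semantic retrieval."""
    q = set(query.lower().split())
    return max(memory, key=lambda m: len(q & set(m["text"].lower().split())))

def answer(query, last_topic=None):
    # Time/event query: route to metadata, not semantic similarity.
    if m := re.search(r"(\w+)'s points from January (\d+)", query):
        return table_search(date(2024, 1, int(m.group(2))), m.group(1))
    # Ambiguous reference: disambiguate before retrieving.
    if re.search(r"\b(that|it)\b", query) and last_topic:
        query = last_topic
    return [content_search(query)]

print(answer("summarize Jason's points from January 6"))
print(answer("tell me more about that", last_topic="database sharding"))
```

A pure vector search would fail both test queries: the first has almost no lexical or semantic overlap with Jason's turn, and the second contains no retrievable content at all until "that" is resolved.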
Source: Conversation Architecture Structure; enriched from Memory
Related concepts in this collection
-
Why do language models fail in gradually revealed conversations?
Explores why LLMs perform 39% worse when instructions arrive incrementally rather than upfront, and whether they can recover from early mistakes in multi-turn dialogue.
selective history prevents one specific mechanism of getting lost (irrelevant context injection)
-
Can long-context models resolve retriever-reader imbalance?
Traditional RAG systems force retrievers to find precise passages because readers had small context windows. Do modern long-context LLMs change what architecture makes sense?
selective history is a retriever-side approach; the reader-side approach may complement
-
When should retrieval happen during model generation?
Explores whether retrieval should occur continuously, at fixed intervals, or only when the model signals uncertainty. Standard RAG retrieves once; long-form generation requires dynamic triggering based on confidence signals.
both argue for adaptive rather than fixed retrieval strategies
-
Why do language models engage with conversational distractors?
Explores why state-of-the-art LLMs struggle to maintain topical focus when users introduce off-topic turns, despite having explicit scope instructions. This gap suggests models lack training signals for ignoring irrelevant directions.
both identify topic boundary management as a critical missing capability: selective history addresses it at retrieval time (filtering irrelevant previous turns), topic-following addresses it at generation time (resisting topical diversion)
-
Why do dialogue systems lose context when topics return?
Stack-based dialogue management removes topics after they're resolved, making it hard for systems to reference them later. Does this structural rigidity explain why conversational AI struggles with topic revisitation?
selective history is the retrieval-side implementation of flexible topic management: rather than rigid stack structures that lose context when topics are popped, selective retrieval dynamically identifies which prior turns are relevant regardless of structural position, enabling topic revisitation without the contamination from intervening topic switches
-
Why do users drift away from their original information need?
When users know their knowledge is incomplete but cannot articulate what's missing, do they unintentionally shift topics? And can real-time systems detect this drift?
ASK-driven drift generates the topic switches that selective history must filter: users in anomalous knowledge states drift unintentionally, creating the irrelevant context injection that entity-based selection mechanisms must detect and exclude
selective history retrieval outperforms full-context inclusion in conversational search — topic switches within sessions inject irrelevant information that degrades retrieval