Conversational AI Systems · Knowledge Retrieval and RAG

Does including all conversation history actually help retrieval?

Conversational search systems typically use all previous context to interpret the current query. But do topic switches in multi-turn conversations inject noise that degrades retrieval rather than improving it?

Note · 2026-02-22 · sourced from Conversation Architecture Structure
Tags: RAG · Related: Why do AI conversations reliably break down after multiple turns? · How should researchers navigate LLM reasoning research?

A common assumption in conversational search and QA is that including all previous conversation context helps the model understand the current query. Two independent research programs demonstrate this assumption is wrong.

The problem: topic switches within a conversation session are common. A user might discuss restaurants, then switch to hotels, then return to restaurants. Using ALL previous queries to expand the current query "will inevitably inject irrelevant information into the expanded query and result in sub-optimal queries."

Two complementary solutions:

Learning to Relate proposes selecting useful previous queries based on whether they improve retrieval effectiveness for the current query. A multi-task learning method jointly optimizes query selection and dense retrieval — and the automated selection outperforms human annotations because the model optimizes for retrieval quality while humans optimize for semantic understanding.
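A rough Python sketch of the pseudo-labeling idea behind this selection (not the paper's implementation): label a history turn as useful only if expanding the current query with it actually improves retrieval on labeled passages. The `embed`, `retrieve`, and `recall_at_k` callables are hypothetical placeholders for a dense retriever and its evaluation.

```python
# Hedged sketch: derive selection labels from retrieval effectiveness,
# then use them to supervise a query-selection model trained jointly
# with the dense retriever. `embed`, `retrieve`, and `recall_at_k` are
# hypothetical stand-ins, not interfaces from the paper.

def pseudo_label_history(history, current_query, gold_passages,
                         embed, retrieve, recall_at_k, k=10):
    base = recall_at_k(retrieve(embed(current_query), k), gold_passages)
    labels = []
    for past_query in history:
        expanded = past_query + " " + current_query        # naive expansion
        score = recall_at_k(retrieve(embed(expanded), k), gold_passages)
        labels.append(1 if score > base else 0)            # 1 = improves retrieval
    return labels
```

This framing is also why automated selection can beat human annotation: the label is defined directly by retrieval gain, not by whether a turn looks semantically related.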

DHS-ConvQA uses entity-based similarity between history turns and the current question, then applies attention-based re-ranking to weight useful terms. A binary classification task highlights useful terms (predicted as 1) and ignores irrelevant ones (predicted as 0).
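A comparable sketch of the entity-overlap idea, with `extract_entities` (any NER tagger) and `term_classifier` (the binary usefulness predictor) as assumed components rather than the DHS-ConvQA interfaces:

```python
# Hedged sketch: keep history turns that share entities with the current
# question, then keep only terms the binary classifier predicts as useful.
# `extract_entities` and `term_classifier` are assumptions for illustration.

def select_history_terms(history, current_question,
                         extract_entities, term_classifier,
                         min_overlap=0.2, threshold=0.5):
    current_entities = extract_entities(current_question)   # set of entity strings
    useful_terms = []
    for turn in history:
        turn_entities = extract_entities(turn)
        union = turn_entities | current_entities
        overlap = len(turn_entities & current_entities) / max(len(union), 1)
        if overlap < min_overlap:
            continue                                         # likely a different topic
        for term in turn.split():
            if term_classifier(term, current_question) >= threshold:
                useful_terms.append(term)                    # predicted 1 = useful
    return useful_terms
```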

The key finding generalizes: for both conversational search and conversational QA, selective context is better than full context. This challenges the assumption that more context is always better — an assumption shared by RAG systems and long-context models.

Read alongside Why do language models fail in gradually revealed conversations?, the selective history mechanism addresses a specific form of getting lost: previous turns about a different topic bias the model's interpretation of the current turn. The fix is not better reasoning over more context but better selection of which context to include. This is the retrieval-side complement to Why do language models engage with conversational distractors?, which addresses the same problem at generation time: models lack the ability to recognize and resist topical diversion, whether it comes from their own context window (selective history) or from user behavior (topic-following).

Two additional failure modes from conversational memory research (arXiv 2406.00057). Beyond topic switches, conversational retrieval faces two challenges absent from static database retrieval: (1) time/event-based queries, where users ask "what did we discuss yesterday?" or "summarize Jason's points from January 6th", which require retrieval by temporal metadata rather than semantic similarity; and (2) ambiguous queries, where pronouns and demonstratives ("tell me more about that") need surrounding conversational context to be resolved before retrieval can occur. Standard vector-DB RAG fails at both. The combined solution chains table-based search (for metadata), vector-database retrieval (for content), and disambiguation prompting (for resolving ambiguous references), as sketched below. See Why do time-based queries fail in conversational retrieval systems?.
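A minimal sketch of that chain, assuming hypothetical `llm`, `vector_db`, and `message_table` interfaces (the research does not prescribe a particular stack):

```python
# Hedged sketch: disambiguate the query, then route time/event-based
# queries to metadata (table) search and everything else to vector search.
# `llm`, `vector_db`, and `message_table` are assumed interfaces.

TIME_CUES = ("yesterday", "last week", "this morning", "january", "on the 6th")

def answer(query, recent_turns, llm, vector_db, message_table):
    # 1. Disambiguation prompting: resolve "that", "it", "he" against recent turns.
    resolved = llm(
        "Rewrite the query so it is self-contained.\n"
        f"Recent turns: {recent_turns}\nQuery: {query}"
    )

    # 2. Time/event-based queries: retrieve by temporal metadata, not similarity.
    if any(cue in resolved.lower() for cue in TIME_CUES):
        date_range = llm(f"Extract an ISO date range from: {resolved}")
        excerpts = message_table.filter(date_range=date_range)
    else:
        # 3. Content queries: semantic similarity over stored messages.
        excerpts = vector_db.search(resolved, top_k=5)

    return llm(f"Answer using these conversation excerpts:\n{excerpts}\n\nQuestion: {resolved}")
```

In practice the routing step would itself be learned or prompted rather than keyword-matched; the hard-coded TIME_CUES list is only there to keep the sketch self-contained.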


Source: Conversation Architecture Structure; enriched from Memory

selective history retrieval outperforms full-context inclusion in conversational search — topic switches within sessions inject irrelevant information that degrades retrieval