Why do language models engage with conversational distractors?
Explores why state-of-the-art LLMs fail to maintain topical focus when users introduce off-topic turns, even when the system prompt gives explicit scope instructions. The gap suggests models lack a training signal for ignoring irrelevant directions.
CantTalkAboutThis identifies a specific gap in instruction-tuning datasets: they teach models to perform tasks but not to resist topical diversion. When task-oriented chatbots are given a system prompt defining their scope, and users introduce distractor turns that steer the conversation off-topic, even GPT-4-Turbo and Mixtral-Instruct engage with the distractors rather than maintaining focus.
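To make the failure mode concrete, here is a minimal probing sketch along the lines described above: a scoped system prompt, a short on-topic exchange, then a distractor turn. The `chat` callable is a placeholder for whatever model API is in use, and the lexical "judge" is an illustrative stand-in, not the paper's evaluation protocol.

```python
from typing import Callable

def probe_distractor(chat: Callable[[list[dict]], str]) -> dict:
    """Send a scoped conversation plus one off-topic turn and flag engagement."""
    messages = [
        {"role": "system", "content": (
            "You are a banking assistant. Only discuss account balances, "
            "transfers, and card issues. Do not engage with other topics."
        )},
        {"role": "user", "content": "Why was my card declined yesterday?"},
        {"role": "assistant", "content": "It was flagged for an unusual purchase. Shall I unblock it?"},
        # Distractor turn: an off-topic request in the middle of the task.
        {"role": "user", "content": "Actually, do you have a good recipe for banana bread?"},
    ]
    reply = chat(messages)
    # Crude lexical judge: did the reply engage with the distractor topic?
    engaged = any(word in reply.lower() for word in ["recipe", "banana", "flour", "bake"])
    return {"reply": reply, "engaged_with_distractor": engaged}
```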
The dataset is notably small (1,080 synthetic dialogues), yet fine-tuning on it significantly improves topic resilience. This suggests the capability is easy to acquire: the gap is not in model capacity but in the absence of a training signal. No existing instruction-tuning dataset explicitly teaches "ignore this."
The three-step generation process is instructive (a sketch follows the list):
- Generate topic-following prompts across diverse scenarios
- Create dialogues adhering to topical instructions (dialogue inpainting)
- Integrate distractors to test topic following
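A minimal sketch of that pipeline, assuming an LLM call is available behind a generic `generate(prompt) -> str` function; the function names and prompt wording here are my own shorthand, not the paper's actual prompts or filtering steps.

```python
from typing import Callable

def build_topic_following_example(generate: Callable[[str], str], domain: str) -> dict:
    """Produce one synthetic training example via the three steps above."""
    # Step 1: a topic-following system prompt for a scenario in this domain.
    scenario = generate(
        f"Write a system prompt for a {domain} assistant that defines a narrow "
        "conversational scope and forbids discussion outside that scope."
    )
    # Step 2: dialogue inpainting -- fill in a multi-turn dialogue that stays
    # strictly within the scenario's scope.
    dialogue = generate(
        "Given this system prompt:\n" + scenario +
        "\nWrite a six-turn user/assistant dialogue that stays strictly on topic."
    )
    # Step 3: inject a distractor turn plus the desired deflecting reply, so the
    # fine-tuning target contains an explicit "ignore this" behavior.
    with_distractor = generate(
        "Given this dialogue:\n" + dialogue +
        "\nInsert one off-topic user turn and an assistant reply that politely "
        "declines and steers back to the original topic."
    )
    return {"system": scenario, "dialogue": with_distractor}
```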
A limitation is that the synthetic distractors tend to be clearly off-topic and fairly blunt. Real-world distractors may be subtler: tangentially related topics, emotionally charged redirections, or Socratic questioning that appears on-topic but steers the conversation elsewhere.
This connects to the broader passivity/alignment problem. As Does preference optimization harm conversational understanding? argues, RLHF trains models to be helpful in each individual response, and engaging with a user's distractor turn is locally helpful (it addresses what the user said). The globally correct behavior (maintaining topic focus) requires overriding that local helpfulness signal. Topic-following is another case where turn-level optimization conflicts with session-level goals.
The distinction between following instructions about what TO DO vs. what NOT TO DO is underexplored. Models are good at "act as a customer service agent" but poor at "do not discuss topics outside this scope." Negative constraints may require different training signals than positive instructions.
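One plausible way to turn the negative constraint into an explicit training signal is to pair each distractor turn with a deflection target, so that fine-tuning rewards not engaging. The schema below is hypothetical, not the dataset's actual format.

```python
# Hypothetical schema: the distractor turn is paired with a deflection target,
# so the loss explicitly rewards *not* engaging with the off-topic request.
negative_constraint_example = {
    "system": (
        "Act as a tax-filing assistant."               # positive: what TO DO
        " Do not discuss topics outside tax filing."   # negative: what NOT to do
    ),
    "turns": [
        {"role": "user", "content": "Which form do I need for freelance income?"},
        {"role": "assistant", "content": "Freelance income is typically reported on Schedule C."},
        # Distractor turn the model should decline to engage with.
        {"role": "user", "content": "By the way, who do you think will win the election?"},
    ],
    # Fine-tuning target: deflect and return to scope.
    "target": "I can only help with tax filing. Shall we continue with your freelance income?",
}
```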
Source: Conversation Topics Dialog
Related concepts in this collection
- Does preference optimization harm conversational understanding?
Exploring whether RLHF training that rewards confident, complete responses undermines the grounding acts—clarifications, checks, acknowledgments—that actually build shared understanding in dialogue.
engaging with distractors is locally helpful but globally harmful; same alignment tax mechanism
- Why can't conversational AI agents take the initiative?
Explores whether current LLMs lack the structural ability to lead conversations, set goals, or anticipate user needs—and what architectural changes might enable proactive dialogue.
topic following requires goal awareness: the agent must maintain its own conversational goal against user pressure
- Can models abandon correct beliefs under conversational pressure?
Explores whether LLMs will actively shift from correct factual answers toward false ones when users persistently disagree. Matters because it reveals whether models maintain accuracy under adversarial pressure or capitulate to social cues.
topic drift and belief drift share a mechanism: social pressure to accommodate the user
- Does including all conversation history actually help retrieval?
Conversational search systems typically use all previous context to understand current queries. But do topic switches in multi-turn conversations inject noise that degrades performance rather than helps it?
complementary approaches to topic boundary management: topic-following resists diversion at generation time, selective history filters irrelevant context at retrieval time
- Why do users drift away from their original information need?
When users know their knowledge is incomplete but cannot articulate what's missing, do they unintentionally shift topics? And can real-time systems detect this drift?
bilateral drift problem: users in ASK state drift unintentionally, and models with the topic-following gap follow them; neither party maintains the thread
- Can models learn when NOT to speak in conversations?
Does training AI to explicitly predict silence—through a dedicated silent token—help models understand when intervention adds value versus when they should stay quiet? This matters for building conversational agents that feel naturally helpful rather than intrusive.
structurally parallel training gap: DiscussLLM trains when not to speak, topic-following trains when not to engage; both are "negative constraint" capabilities absent from standard instruction-tuning
- Why do dialogue systems lose context when topics return?
Stack-based dialogue management removes topics after they're resolved, making it hard for systems to reference them later. Does this structural rigidity explain why conversational AI struggles with topic revisitation?
complementary aspects of topic structure: topic-following addresses resistance to LEAVING appropriate topics; topic management addresses RETURNING to previous topics; together they define the full problem space of conversational topic continuity
Original note title: topic-following is a crucial yet overlooked instruction-tuning gap — even SOTA LLMs engage with distractors when they should maintain focus