
Conversational AI Systems

Research on dialogue systems, conversational agents, and multi-turn interaction architectures. Covers topics including conversation structure, personalization, synthetic dialogue generation, and conversational recommenders, studied by NLP and human-computer interaction researchers.

57 notes (primary) · 271 papers · 8 sub-topics

Conversation Architecture and Structure

13 notes

Why do standard dialogue systems fail at tracking negotiation agreement?

Standard dialogue state tracking monitors one user's goals, but negotiation requires tracking both parties' evolving positions simultaneously. Why is this bilateral requirement fundamentally different, and what makes existing models insufficient?


Can models learn to abstain when uncertain about predictions?

Explores whether language models can be trained to recognize when they lack sufficient information to forecast conversation outcomes, rather than forcing uncertain predictions into confident-sounding responses.


Can conversation structure predict dialogue success better than content?

Does the geometric shape of how dialogue unfolds—timing, repetition, topic drift—matter as much as what people actually say? This explores whether interaction patterns carry signals that word choice alone misses.


Can AI agents communicate efficiently in joint decision problems?

When humans and AI must collaborate to solve optimization problems under asymmetric information, what communication patterns enable effective coordination? Current LLMs struggle with this—why?


Why do dialogue systems lose context when topics return?

Stack-based dialogue management removes topics after they're resolved, making it hard for systems to reference them later. Does this structural rigidity explain why conversational AI struggles with topic revisitation?


When should AI agents ask users instead of just searching?

Explores whether tool-enabled LLMs should probe users for clarification when uncertain, rather than silently chaining tool calls that drift from intent. Examines conversation analysis patterns as a formal alternative.


Can we teach LLMs to form linguistic conventions in context?

Humans naturally shorten references as conversations progress, but LLMs don't adapt their language for efficiency even when they understand their partners do. Can training on coreference patterns teach this convention-forming behavior?


Could proactive dialogue make conversations dramatically more efficient?

Explores whether AI systems that volunteer relevant unrequested information could significantly reduce the back-and-forth turns required in task-oriented conversations, and why this behavior is missing from training data.


Does including all conversation history actually help retrieval?

Conversational search systems typically use all previous context to understand current queries. But do topic switches in multi-turn conversations inject noise that degrades performance rather than helps it?


What six problems must every conversation solve?

Schegloff's Conversation Analysis identifies six universal organizational challenges that speakers navigate in all talk-in-interaction. Understanding these helps explain why current AI dialogue systems fall short of human fluency.


Why can't users articulate what they want from AI?

Explores the cognitive gap between imagining possibilities and expressing them as prompts, and why language interfaces create a harder envisioning task than traditional UI affordances do.


How do time gaps shape what people discuss across conversation sessions?

Do AI systems account for how elapsed time between conversations changes the way people reference and discuss past events? Current models mostly handle single sessions, but real interactions span days, weeks, and months.


Can conversation shape predict whether it will work?

Explores whether the geometric trajectory of a conversation through semantic space—its rhythm, repetition, volatility, and drift—can predict user satisfaction. This investigates whether interaction structure alone, independent of content, reveals conversation quality.


Dialog Topics and Modeling

8 notes

Which clarifying questions actually improve user satisfaction?

Not all clarification helps equally. This explores whether asking users to rephrase their needs works as well as asking targeted questions about specific information gaps.


Does linguistic alignment shape how users perceive AI relationships?

Can conversational AI build relational trust and partnership through real-time linguistic accommodation, or is warmth only surface-level styling? This explores whether alignment is foundational to how users categorize AI as tool versus partner.


Why do language models fail in gradually revealed conversations?

Explores why LLMs perform 39% worse when instructions arrive incrementally rather than upfront, and whether they can recover from early mistakes in multi-turn dialogue.


Why do language models lose performance in longer conversations?

Does multi-turn degradation stem from fundamental model limitations, or from misalignment between what users mean and what models assume? Understanding the root cause could guide better solutions.


Does segment-level optimization work better for multi-turn dialogue alignment?

How should preference optimization target multi-turn social dialogue—at individual turns, whole conversations, or key segments in between? This matters because granularity affects whether agents learn genuine social intelligence or just local fixes.


Why do AI assistants get worse at longer conversations?

Explores why LLM performance drops 25 points when instructions span multiple turns instead of one message, and whether models can recover from early wrong assumptions.


Why do language models engage with conversational distractors?

Explores why state-of-the-art LLMs struggle to maintain topical focus when users introduce off-topic turns, despite having explicit scope instructions. This gap suggests models lack training signals for ignoring irrelevant directions.


Can models learn to ask genuinely useful clarifying questions?

Explores whether question-asking quality is teachable through decomposing it into specific attributes like clarity and relevance, rather than treating it as a monolithic skill.


Conversational Recommenders

8 notes

Does conversation order matter for recommending items in dialogue?

Conversational recommendation systems typically ignore the sequence in which items are mentioned, treating dialogue as a bag of entities. But does the order itself carry predictive signal about what to recommend next?


Do simulated training interactions transfer to real conversations?

Most conversational recommender systems train on simulated entity-level exchanges, not natural dialogue. The question is whether models built this way actually work when deployed with real users who speak naturally and deviate from expected patterns.


Can unified policy learning improve conversational recommender systems?

This explores whether formulating attribute-asking, item-recommending, and timing decisions as a single reinforcement learning policy outperforms treating them as separate components. The question matters because joint optimization could improve conversation quality and system scalability.


Can controlled latent variables make LLM user simulators realistic?

Can session-level and turn-level latent variables steer LLM-based user simulators toward realistic dialogue while maintaining measurable diversity and ground truth labels for training conversational systems?


What makes conversational recommenders hard to build well?

Most assume the challenge is language fluency, but what if the real problem is managing mixed-initiative dialogue—where both users and systems take turns driving the conversation?


Can language models bridge the gap between critique and preference?

When users express what they dislike rather than what they want, can LLMs reliably transform those critiques into positive preferences that retrieval systems can actually use?


Do conversational recommender benchmarks actually measure recommendation skill?

Conversational recommender systems are evaluated against ground-truth items mentioned later in conversations. But does this metric distinguish between genuinely recommending new items versus simply repeating items users already discussed?


Do recommendation strategies beyond preference questions work better?

What role do sociable conversational moves—opinion sharing, encouragement, credibility signals—play in successful human recommendations, compared to simply asking what someone likes?


Personalized Assistants

3 notes

Why do LLM judges fail at predicting sparse user preferences?

When LLMs judge user preferences based on limited persona information, what causes their predictions to become unreliable? Understanding persona sparsity's role in judgment failure could improve personalization systems.


Can conversations themselves personalize without user profiles?

Can a conversational AI learn about user traits and adapt in real time by rewarding itself for asking insightful questions, rather than relying on pre-collected profiles or historical data?


How do personalization granularity levels trade precision against scalability?

LLM personalization operates at user, persona, and global levels, each with different tradeoffs. Understanding these tradeoffs helps determine when to invest in individual user data versus broader patterns.


Speech and Voice

3 notes

Why do dialogue systems need probabilistic reasoning?

Explores whether deterministic flowchart-based dialogue systems can handle realistic speech recognition error rates of 15-30 percent, and what alternative approaches might be necessary.


Can skipping transcription make voice assistants faster?

Voice assistants traditionally convert speech to text before responding. Does eliminating that middle step reduce latency enough to matter for real-time conversation?


Do speech models learn language-specific sounds or universal physics?

Explores whether self-supervised speech models encode phonetic categories tied to specific languages or instead capture the underlying vocal-tract physics common to all humans. This matters for understanding why these models transfer across languages without retraining.


Personalization (General)

3 notes

Can personas evolve in real time to match what users actually want?

Explores whether a persona that bridges memory and action can adapt during conversations by simulating interactions and optimizing against user feedback, without retraining the underlying model.


Do persona consistency metrics actually measure dialogue quality?

Personalized dialogue systems can achieve high persona consistency scores by simply restating character descriptions, ignoring conversational relevance. Does optimizing for persona fidelity necessarily harm the coherence users actually care about?


Does abstract preference knowledge outperform specific interaction recall?

Explores whether summarized user preferences are more effective for LLM personalization than retrieving individual past interactions. Tests a cognitive dual-memory model against real personalization performance across model scales.


Synthetic Dialogue Generation

1 note

Conversational Agents

1 note