Why can't conversational AI agents take the initiative?
Explores whether current LLMs lack the structural ability to lead conversations, set goals, or anticipate user needs—and what architectural changes might enable proactive dialogue.
Why advanced language models remain passive conversationalists despite their technical capability to lead dialogue.
Explores whether the reward signals used to train language models might actively discourage them from seeking clarification or taking initiative in conversations, and what alternative training approaches might enable more collaborative dialogue.
When students solve problems with AI chatbots instead of peers, do they sacrifice personal voice and subjective expression in exchange for more efficient knowledge exchange and higher task performance?
Does the absence of social goals in human-machine communication explain why people disclose sensitive information more readily to chatbots? Understanding this mechanism could reshape how we design conversational AI.
When generative AI produces different outputs from identical inputs, how do interaction design principles help users maintain control and develop effective mental models for stochastic systems?
Explores why LLMs perform 39% worse when instructions arrive incrementally rather than upfront, and whether they can recover from early mistakes in multi-turn dialogue.
Does multi-turn degradation stem from fundamental model limitations, or from misalignment between what users mean and what models assume? Understanding the root cause could guide better solutions.
Users face a 'gulf of envisioning'—they must simultaneously imagine possibilities and express them to language models. This cognitive gap creates breakdowns not from AI incapability but from users struggling to articulate what they truly need.
When users know their knowledge is incomplete but cannot articulate what's missing, do they unintentionally shift topics? And can real-time systems detect this drift?
What if AI proactivity came from modeling intrinsic motivation to participate rather than predicting who speaks next? This explores whether a framework based on human cognitive patterns (generating internal thoughts in parallel with the conversation) can make agents genuinely proactive rather than merely reactive.
Does training AI to explicitly predict silence—through a dedicated silent token—help models understand when intervention adds value versus when they should stay quiet? This matters for building conversational agents that feel naturally helpful rather than intrusive.
Can a conversational AI learn about user traits and adapt in real time by rewarding itself for asking insightful questions, rather than relying on pre-collected profiles or historical data?
Explores whether tool-enabled LLMs should probe users for clarification when uncertain, rather than silently chaining tool calls that drift from intent. Examines patterns from conversation analysis as a formal alternative to such silent chaining.
When humans and AI must collaborate to solve optimization problems under asymmetric information, what communication patterns enable effective coordination? Current LLMs struggle with this—why?
This explores whether prompt quality has measurable, learnable dimensions beyond intuition. The research asks if prompts can be evaluated by their communicative, cognitive, and instructional properties rather than by their results.
How should preference optimization target multi-turn social dialogue—at individual turns, whole conversations, or key segments in between? This matters because granularity affects whether agents learn genuine social intelligence or just local fixes.
Standard RLHF and DPO optimize the quality of individual responses but may structurally prevent agents from meaningfully incorporating partner input. This explores whether the training objective itself blocks collaborative reasoning.
Humans naturally shorten references as conversations progress, but LLMs don't adapt their own language for efficiency even when they understand their partners' shortened forms. Can training on coreference patterns teach this convention-forming behavior?
Humans naturally develop shorter, efficient language during conversations. Do multimodal LLMs exhibit this same spontaneous adaptation, or do they lack this communicative behavior?
Does the geometric shape of how dialogue unfolds (timing, repetition, topic drift) matter as much as what people actually say? This explores whether interaction patterns carry signals that word choice alone cannot reveal.
Explores whether structuring internal reasoning as multi-agent dialogue rather than monologue can improve strategy diversity and coherence across different problem types, using the Compound-QA benchmark.
How do conversational systems recognize when their previous response was based on a misunderstanding, and what mechanism allows them to correct it retroactively rather than restart?
Can language models learn better problem-solving by observing full exploration trajectories—including mistakes and backtracking—rather than only optimal solutions? This matters because current LMs rarely see the decision-making process itself.
Explores which techniques make AI most persuasive, and whether the usual suspects like personalization and model size are actually the main drivers. This matters because it reshapes where AI safety efforts should focus.
Does having an AI generate customized counterevidence based on someone's specific conspiracy claims reduce their belief durably? This tests whether conspiracy beliefs are truly resistant to correction or whether previous failures reflected poor tailoring.
When AI systems generate more informative push notifications, do users engage more? This explores whether informativeness and engagement always align in real product contexts.
Prior AI misuse focused on generating text at scale. But does AI now make strategic decisions about when and how social media accounts should engage? Understanding this shift matters because it suggests a qualitative change in machine agency and operational sophistication.
Explores whether non-linear prompting structures (tree-of-thought, debate prompting) can functionally replace multi-agent architectures, and whether a single LLM simulating multiple personas achieves the same cognitive benefits as multiple models collaborating.