Why can't conversational AI agents take the initiative?
Explores whether current LLMs lack the structural ability to lead conversations, set goals, or anticipate user needs—and what architectural changes might enable proactive dialogue.
Three independent research programs converge on the same diagnosis: current LLM-based conversational agents, including ChatGPT and GPT-4, are fundamentally reactive. They respond to user queries but cannot initiate conversations, shift topics strategically, plan with subgoals, or offer recommendations that account for context beyond the current exchange.
The definition of proactivity comes from organizational behavior: "the capability to create or control the conversation by taking the initiative and anticipating impacts on themselves or human users." This is a well-defined property, not a vague aspiration — and it is systematically absent.
The gap matters most in situations requiring active engagement from both sides: exploratory search, complex decision-making, creative problem-solving. In these contexts, a purely reactive agent forces the user to carry the entire strategic burden of the conversation. The user must know what to ask, when to redirect, and how to structure the exchange — precisely the situations where they most need help.
The structural cause is training: LLMs are trained to follow user instructions and generate next-turn responses. This produces impressive reactive capability but no mechanism for initiative. Even "proactive" features like topic suggestion are reactive — triggered by user input rather than driven by agent goals. The distinction is between responding to user input and creating conversation from the agent's own goals.
Since Does preference optimization harm conversational understanding?, single-turn helpfulness training actively works against multi-turn strategic behavior. The passive architecture is not just a missing feature — it is reinforced by the training objective. And since Why do language models sound fluent without grounding?, the absence of initiative is further masked: models that skip clarifying questions, acknowledgments, and understanding checks sound more authoritative precisely because they perform less communicative work.
The practical consequence: methods for enabling proactivity include learning to ask (clarifying questions), topic shifting, and strategy planning with RL. But these remain research proposals. The deployed state of conversational AI is passive-by-default. A comprehensive survey (Deng et al., 2023) formalizes three subtasks for proactive dialogue systems: topic-shift detection (when to transition), topic planning (which path to follow), and topic-aware response generation (producing goal-directed utterances). Target types range from topical keywords to knowledge entities to full conversational goals. Yet even this taxonomy remains underexplored in deployed systems.
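As a rough illustration of how those three subtasks compose, the sketch below wires a shift detector, a breadth-first topic planner, and a stand-in generator into a single proactive turn. Everything in it (the class names, the toy topic graph, the heuristic rules) is hypothetical scaffolding under the survey's taxonomy, not code from the survey itself.

```python
# Hypothetical sketch of the three-subtask pipeline in the Deng et al. (2023)
# taxonomy: shift detection -> topic planning -> topic-aware generation.
# All class and function names are illustrative, not from any real library.
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    history: list[str]                              # prior utterances
    current_topic: str                              # topic of the latest exchange
    goal_topic: str                                 # conversational target the agent should reach
    plan: list[str] = field(default_factory=list)   # planned topic path


def detect_topic_shift(state: DialogueState) -> bool:
    """Subtask 1: decide WHEN to transition. A real system would use a trained
    classifier over the dialogue history; here we shift once the goal differs."""
    return state.current_topic != state.goal_topic


def plan_topic_path(state: DialogueState, topic_graph: dict[str, list[str]]) -> list[str]:
    """Subtask 2: decide WHICH path of intermediate topics to follow, here by
    breadth-first search over a topic graph from current topic to goal topic."""
    frontier = [[state.current_topic]]
    seen = {state.current_topic}
    while frontier:
        path = frontier.pop(0)
        if path[-1] == state.goal_topic:
            return path[1:]            # drop the topic we are already on
        for nxt in topic_graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return [state.goal_topic]          # no path found: jump directly


def generate_response(state: DialogueState, next_topic: str) -> str:
    """Subtask 3: produce a goal-directed utterance. Stand-in for an LLM call
    conditioned on both the history and the planned next topic."""
    return f"(utterance steering the conversation toward '{next_topic}')"


def proactive_turn(state: DialogueState, topic_graph: dict[str, list[str]]) -> str:
    if detect_topic_shift(state):
        state.plan = plan_topic_path(state, topic_graph)
    next_topic = state.plan.pop(0) if state.plan else state.current_topic
    return generate_response(state, next_topic)


if __name__ == "__main__":
    graph = {"weather": ["travel"], "travel": ["hiking gear"]}
    state = DialogueState(history=["Nice day today."],
                          current_topic="weather", goal_topic="hiking gear")
    print(proactive_turn(state, graph))   # steers toward "travel" first
```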
The efficiency cost of passivity is quantifiable: simulated proactivity in task-oriented domains of medium complexity reduces dialogue turns by up to 60%. Since Could proactive dialogue make conversations dramatically more efficient?, the absence is not just a capability gap but a data gap — proactivity is under-represented in training datasets, so models never encounter examples of it.
Two new architectural responses to this diagnosis have emerged. The Inner Thoughts framework reverses the question from "who speaks next?" to "does the agent have something worth saying?" — equipping AI with a continuous covert thought stream and intrinsic motivation scoring (preferred by humans 82% of the time). DiscussLLM takes the complementary approach: training models to predict an explicit "silent" token so they learn when NOT to intervene, formalizing the silence/speak decision as a classification task. Both recognize that the missing capability is not generating better responses but deciding whether to respond at all.
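A minimal sketch of that shared decision, with the Inner Thoughts covert stream and the DiscussLLM silent token collapsed into one loop: generate a candidate thought, score how much it is worth saying, and either speak or emit an explicit silence label. The function names, threshold, and random motivation score below are placeholder assumptions, not the published implementations.

```python
# Illustrative sketch of the shared idea behind Inner Thoughts and DiscussLLM:
# the agent's core decision is not "what do I say?" but "should I say anything?"
# All names (generate_covert_thought, score_motivation, SILENT_TOKEN) are
# hypothetical stand-ins.
import random

SILENT_TOKEN = "<silent>"      # DiscussLLM-style explicit "do not intervene" label
SPEAK_THRESHOLD = 0.7          # intrinsic-motivation cutoff, tuned per deployment


def generate_covert_thought(history: list[str]) -> str:
    """Inner-Thoughts-style covert stream: the agent keeps forming candidate
    contributions in parallel with the conversation, without voicing them."""
    return f"thought about: {history[-1] if history else 'nothing yet'}"


def score_motivation(thought: str, history: list[str]) -> float:
    """Intrinsic-motivation score: how much does this thought add (novelty,
    relevance, usefulness)? A real system would learn this; we stub it."""
    return random.random()


def decide_turn(history: list[str]) -> str:
    thought = generate_covert_thought(history)
    if score_motivation(thought, history) >= SPEAK_THRESHOLD:
        return f"(voices: {thought})"    # worth saying: take the initiative
    return SILENT_TOKEN                   # otherwise explicitly stay quiet


if __name__ == "__main__":
    history = ["User A: anyone know a good trail near the city?"]
    print(decide_turn(history))           # prints either a contribution or <silent>
```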
ProAgent: intention inference as proactivity mechanism (from Arxiv/Agents Multi): ProAgent addresses passivity through a hierarchical intention inference pipeline specifically designed for cooperative multi-agent settings. The five-stage process — (1) Knowledge Library and State Grounding (transforming raw state into language descriptions), (2) High-level Skill Planning (analyzing scene + inferring teammate intentions), (3) Belief Correction (updating beliefs based on observed actual behavior), (4) Skill Validation (checking and replanning if needed), (5) Memory Storage (accumulating decision context) — represents a concrete architecture for proactive behavior. The belief correction mechanism is key: rather than assuming static teammate behavior, ProAgent dynamically adjusts beliefs about partner intentions based on discrepancies between predicted and observed actions. This enables zero-shot coordination with unfamiliar teammates — addressing the passivity problem not through learned conversational initiative but through real-time social modeling. The distinction matters: passivity in human-AI interaction (failing to lead conversation) and passivity in AI-AI cooperation (failing to anticipate teammates) have different surface manifestations but share the same root cause — absence of goal-aware, other-modeling behavior.
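A hedged sketch of the belief-correction step in isolation: predict the teammate's action from the current intention distribution, compare it with what the teammate actually did, and shift probability toward intentions consistent with the observation. The intention set, action mapping, and multiplicative update rule below are illustrative assumptions, not ProAgent's actual implementation.

```python
# Sketch of ProAgent-style belief correction (stage 3 of the pipeline), under
# assumed names: beliefs are a distribution over teammate intentions, and each
# intention implies one observable action.


def predict_action(beliefs: dict[str, float], intention_to_action: dict[str, str]) -> str:
    """Pick the action implied by the currently most probable teammate intention."""
    likely_intention = max(beliefs, key=beliefs.get)
    return intention_to_action[likely_intention]


def correct_beliefs(beliefs: dict[str, float],
                    observed_action: str,
                    intention_to_action: dict[str, str],
                    rate: float = 0.5) -> dict[str, float]:
    """Belief correction: boost intentions whose implied action matches what we
    actually observed, then renormalise so beliefs remain a distribution."""
    updated = {
        intention: p * (1.0 + rate if intention_to_action[intention] == observed_action
                        else 1.0 - rate)
        for intention, p in beliefs.items()
    }
    total = sum(updated.values())
    return {intention: p / total for intention, p in updated.items()}


if __name__ == "__main__":
    intention_to_action = {"fetch onions": "go_to_onions", "plate soup": "go_to_plates"}
    beliefs = {"fetch onions": 0.8, "plate soup": 0.2}

    predicted = predict_action(beliefs, intention_to_action)   # expects go_to_onions
    observed = "go_to_plates"                                  # teammate did something else
    if observed != predicted:
        beliefs = correct_beliefs(beliefs, observed, intention_to_action)
    print(beliefs)   # belief in "plate soup" increases after the mismatch
```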
Production agent deployment gap (from Arxiv/Agents): OpenAgents' real-world deployment reveals three concrete instantiations of passivity beyond conversational initiative. First, effective application specification via prompting requires instructions that cater to backend logic, output aesthetics, and adversarial safeguards — the instruction volume can exceed token limitations, meaning agents can't fully specify their own operational context. Second, real-time interactive scenarios like streaming are essential for acceptable user experience but are engineering-complex to implement with current LLM architectures. Third, current research gravitates toward idealized performance metrics while sidelining critical trade-offs between system responsiveness and accuracy, and the nuanced complexities of application-based failures. The gap between benchmarked and deployed agent performance is systematic, not incidental — and since Why do AI agents fail at workplace social interaction?, the 30% completion figure confirms that real-world complexity surfaces failures invisible in benchmarks.
Source: Conversation Agents, Conversation Topics Dialog, Conversation Architecture Structure, Agents
Related concepts in this collection
- Does preference optimization harm conversational understanding?
  Exploring whether RLHF training that rewards confident, complete responses undermines the grounding acts—clarifications, checks, acknowledgments—that actually build shared understanding in dialogue.
  Relation: single-turn training reinforces passivity
- Do language models actually build shared understanding in conversation?
  When LLMs respond fluently to prompts, do they perform the communicative work humans do to establish mutual understanding? Research suggests they skip the grounding acts that make dialogue reliable.
  Relation: another manifestation of reactive design; no active grounding effort
- Can AI agents learn when they have something worth saying?
  What if AI proactivity came from modeling intrinsic motivation to participate rather than predicting who speaks next? This explores whether a framework based on human cognitive patterns—internal thought generation parallel to conversation—can make agents genuinely responsive rather than passively reactive.
  Relation: strongest architectural answer: covert thought generation + intrinsic motivation
- Can models learn when NOT to speak in conversations?
  Does training AI to explicitly predict silence—through a dedicated silent token—help models understand when intervention adds value versus when they should stay quiet? This matters for building conversational agents that feel naturally helpful rather than intrusive.
  Relation: complementary approach: explicit silence/speak classification
- Why do language models fail in gradually revealed conversations?
  Explores why LLMs perform 39% worse when instructions arrive incrementally rather than upfront, and whether they can recover from early mistakes in multi-turn dialogue.
  Relation: 39% multi-turn degradation is the empirical cost of passivity
- Could proactive dialogue make conversations dramatically more efficient?
  Explores whether AI systems that volunteer relevant unrequested information could significantly reduce the back-and-forth turns required in task-oriented conversations, and why this behavior is missing from training data.
  Relation: quantifies the efficiency cost of passivity
- When should proactive agents push toward their goals versus accommodate users?
  Proactive dialogue agents face a tension between reaching their objectives efficiently and keeping users satisfied. This question explores whether these two aims can coexist or require constant negotiation.
  Relation: proactivity creates new challenges when users are non-cooperative
- Why do language models sound fluent without grounding?
  Explores whether LLM fluency masks the absence of communicative work—the clarifying questions, acknowledgments, and understanding checks that humans perform. Why does skipping these acts make models sound more confident?
  Relation: passivity and the grounding gap are complementary: passivity describes the absence of initiative; the grounding gap describes the absence of communicative accountability; both are training consequences that get rewarded as fluency
- Does RLHF training push therapy chatbots toward problem-solving?
  Explores whether reward signals optimizing for task completion in RLHF inadvertently train therapeutic chatbots to prioritize solutions over emotional validation, potentially undermining clinical effectiveness.
  Relation: in therapeutic contexts passivity combines with the problem-solving bias: the model only responds (passive) and when it does it defaults to task completion (problem-solving); the clinical need is for initiative toward emotional attunement
- Why do LLMs predict concession-based persuasion so consistently?
  Do RLHF training practices cause language models to systematically overpredict conciliatory persuasion tactics, even when dialogue context suggests otherwise? This matters for threat detection and negotiation support systems.
  Relation: the alignment-induced passivity extends to social modeling: RLHF not only makes agents passive in behavior but biases their predictions about others toward accommodation, projecting trained conciliatory disposition onto the agents they model
- Why do standard alignment methods ignore partner interventions?
  Standard RLHF and DPO optimize for token-level quality but may structurally prevent agents from meaningfully incorporating partner input. This explores whether the training objective itself blocks collaborative reasoning.
  Relation: ICR demonstrates the deeper mechanism: RLHF structurally cannot produce partner-aware collaboration; passivity toward partner contributions is a trained-in property, not a missing feature
Original note title
LLM-based conversational agents are structurally passive: they lack goal awareness, initiative-taking, and the ability to lead conversation beyond responding to user queries