Psychology and Social Cognition · Agentic and Multi-Agent Systems · Conversational AI Systems

Why can't conversational AI agents take the initiative?

Explores whether current LLMs lack the structural ability to lead conversations, set goals, or anticipate user needs—and what architectural changes might enable proactive dialogue.

Note · 2026-02-22 · sourced from Conversation Agents
Why do AI agents fail to take initiative? What kind of thing is an LLM really? How should researchers navigate LLM reasoning research?

Three independent research programs converge on the same diagnosis: current LLM-based conversational agents, including ChatGPT and GPT-4, are fundamentally reactive. They respond to user queries but cannot initiate conversations, shift topics strategically, plan with subgoals, or offer recommendations that account for context beyond the current exchange.

The definition of proactivity comes from organizational behavior: "the capability to create or control the conversation by taking the initiative and anticipating impacts on themselves or human users." This is a well-defined property, not a vague aspiration — and it is systematically absent.

The gap matters most in situations requiring active engagement from both sides: exploratory search, complex decision-making, creative problem-solving. In these contexts, a purely reactive agent forces the user to carry the entire strategic burden of the conversation. The user must know what to ask, when to redirect, and how to structure the exchange — precisely the situations where they most need help.

The structural cause is training: LLMs are trained to follow user instructions and generate next-turn responses. This produces impressive reactive capability but no mechanism for initiative. Even "proactive" features like topic suggestion are reactive — triggered by user input rather than driven by agent goals. The distinction is between responding to a conversation and creating one.

Since Does preference optimization harm conversational understanding?, single-turn helpfulness training actively works against multi-turn strategic behavior. The passive architecture is not just a missing feature — it is reinforced by the training objective. And since Why do language models sound fluent without grounding?, the absence of initiative is further masked: models that skip clarifying questions, acknowledgments, and understanding checks sound more authoritative precisely because they perform less communicative work.

The practical consequence: methods for enabling proactivity include learning to ask (clarifying questions), topic shifting, and strategy planning with RL. But these remain research proposals. The deployed state of conversational AI is passive-by-default. A comprehensive survey (Deng et al., 2023) formalizes three subtasks for proactive dialogue systems: topic-shift detection (when to transition), topic planning (which path to follow), and topic-aware response generation (producing goal-directed utterances). Target types range from topical keywords to knowledge entities to full conversational goals. Yet even this taxonomy remains underexplored in deployed systems.
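The survey's three subtasks compose into a pipeline. The sketch below uses toy placeholder heuristics (a turn-count shift trigger, BFS over a hand-written topic graph, a stubbed generator) where real systems would use learned models; all function names and the graph are hypothetical.

```python
def detect_topic_shift(history: list[str], current_topic: str) -> bool:
    """Subtask 1: decide WHEN to transition topics.
    Toy heuristic: shift after the topic has appeared in 3+ turns."""
    return sum(current_topic in turn for turn in history) >= 3

def plan_topic_path(current_topic: str, goal_topic: str,
                    topic_graph: dict[str, list[str]]) -> list[str]:
    """Subtask 2: choose WHICH path of topics leads to the goal (BFS)."""
    frontier, seen = [[current_topic]], {current_topic}
    while frontier:
        path = frontier.pop(0)
        if path[-1] == goal_topic:
            return path
        for nxt in topic_graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return []  # goal unreachable from current topic

def generate_response(next_topic: str) -> str:
    """Subtask 3: produce a goal-directed utterance (stubbed)."""
    return f"Speaking of {next_topic}, ..."

# Hypothetical topic graph: weather -> travel -> hotels
graph = {"weather": ["travel"], "travel": ["hotels"]}
path = plan_topic_path("weather", "hotels", graph)  # ["weather", "travel", "hotels"]
```

The target types the survey mentions (keywords, knowledge entities, full goals) would change what the graph's nodes are, but not the shape of the pipeline.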

The efficiency cost of passivity is quantifiable: simulated proactivity in task-oriented domains of medium complexity reduces dialogue turns by up to 60%. Since Could proactive dialogue make conversations dramatically more efficient?, the absence is not just a capability gap but a data gap — proactivity is under-represented in training datasets, so models never encounter examples of it.

Two new architectural responses to this diagnosis have emerged. The Inner Thoughts framework reverses the question from "who speaks next?" to "does the agent have something worth saying?" — equipping AI with a continuous covert thought stream and intrinsic motivation scoring (preferred by humans 82% of the time). DiscussLLM takes the complementary approach: training a "silent token" prediction so models explicitly learn when NOT to intervene, formalizing the silence/speak decision as a classification task. Both recognize that the missing capability is not generating better responses but deciding whether to respond at all.

ProAgent: intention inference as proactivity mechanism (from Arxiv/Agents Multi): ProAgent addresses passivity through a hierarchical intention inference pipeline specifically designed for cooperative multi-agent settings. The five-stage process — (1) Knowledge Library and State Grounding (transforming raw state into language descriptions), (2) High-level Skill Planning (analyzing scene + inferring teammate intentions), (3) Belief Correction (updating beliefs based on observed actual behavior), (4) Skill Validation (checking and replanning if needed), (5) Memory Storage (accumulating decision context) — represents a concrete architecture for proactive behavior. The belief correction mechanism is key: rather than assuming static teammate behavior, ProAgent dynamically adjusts beliefs about partner intentions based on discrepancies between predicted and observed actions. This enables zero-shot coordination with unfamiliar teammates — addressing the passivity problem not through learned conversational initiative but through real-time social modeling. The distinction matters: passivity in human-AI interaction (failing to lead conversation) and passivity in AI-AI cooperation (failing to anticipate teammates) have different surface manifestations but share the same root cause — absence of goal-aware, other-modeling behavior.
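The five stages above form one decision loop; the sketch below shows its control flow, with every function body a placeholder standing in for an LLM call or learned model (the dict keys and skill strings are my own hypothetical conventions, not ProAgent's actual interface).

```python
def proagent_step(raw_state: dict, beliefs: dict, memory: list) -> str:
    # (1) Knowledge Library and State Grounding: raw state -> language.
    description = f"state: {raw_state}"
    # (2) High-level Skill Planning: plan against inferred teammate intent.
    predicted = beliefs.get("teammate_intent", "unknown")
    skill = f"assist({predicted})"
    # (3) Belief Correction: compare prediction with observed behavior.
    observed = raw_state.get("teammate_action")
    if observed is not None and observed != predicted:
        beliefs["teammate_intent"] = observed  # update on discrepancy
        skill = f"assist({observed})"          # replan accordingly
    # (4) Skill Validation: fall back if the chosen skill is inapplicable.
    if skill not in raw_state.get("valid_skills", [skill]):
        skill = "wait"
    # (5) Memory Storage: accumulate decision context for later steps.
    memory.append((description, skill))
    return skill

beliefs, memory = {"teammate_intent": "chop"}, []
skill = proagent_step({"teammate_action": "cook"}, beliefs, memory)
# beliefs["teammate_intent"] is now "cook"; skill is "assist(cook)"
```

Stage (3) is what makes the agent proactive rather than merely scripted: the teammate model is revised mid-episode, which is what allows zero-shot coordination with partners whose behavior was never seen in training.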

Production agent deployment gap (from Arxiv/Agents): OpenAgents' real-world deployment reveals three concrete instantiations of passivity beyond conversational initiative. First, effective application specification via prompting requires instructions that cater to backend logic, output aesthetics, and adversarial safeguards — the instruction volume can exceed token limitations, meaning agents can't fully specify their own operational context. Second, real-time interactive scenarios like streaming are essential for acceptable user experience but are engineering-complex to implement with current LLM architectures. Third, current research gravitates toward idealized performance metrics while sidelining critical trade-offs between system responsiveness and accuracy, and the nuanced complexities of application-based failures. The gap between benchmarked and deployed agent performance is systematic, not incidental — and since Why do AI agents fail at workplace social interaction?, the 30% completion figure confirms that real-world complexity surfaces failures invisible in benchmarks.


Source: Conversation Agents, Conversation Topics Dialog, Conversation Architecture Structure, Agents

Related concepts in this collection

Concept map
27 direct connections · 185 in 2-hop network · medium cluster


Original note title

LLM-based conversational agents are structurally passive — they lack goal awareness, initiative-taking, and the ability to lead a conversation beyond responding to user queries.