Do agent frameworks adequately compensate for LLM conversational passivity?

This explores whether the agent-building layer around LLMs (memory, tools, harnesses, multi-turn training) actually fixes the deeper problem that LLMs are conversationally passive — or just papers over it.

The corpus suggests the answer splits cleanly: frameworks fix the *capability* gap but barely touch the *conversational* one. Start with the diagnosis. LLMs are passive by design: they're trained to respond to queries, not to initiate, plan, or steer a conversation from their own goals, and fluent output hides this so well that the passivity is easy to miss (Why can't conversational AI agents take the initiative?). The root cause is in the reward: standard RLHF optimizes for immediate, next-turn helpfulness, which discourages asking clarifying questions or investing in long-term collaboration (Why do language models respond passively instead of asking clarifying questions?).

Most of what we call 'agent frameworks' attacks a different problem entirely. The reliability of an agent comes from externalizing memory, skills, and protocols into a harness layer so the model doesn't re-solve the same problems every turn (Where does agent reliability actually come from?). Turning an LLM into an action-taker isn't a fine-tuning trick: it needs a whole pipeline of action datasets, grounding training, infrastructure, and safety evaluation, with the surrounding system deciding whether actions are grounded or hallucinated (Can you turn an LLM into an agent by just fine-tuning?). And much of that work can even be handed to small, cheap models because agent subtasks are repetitive and well-defined (Can small language models handle most agent tasks?). Notice what all three share: they make the agent better at *doing things*, not better at *leading a conversation*. A harness gives the model hands; it doesn't give it intent.
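As a rough illustration of what externalizing memory, skills, and protocols into a harness might look like, here is a minimal Python sketch. The names `Harness`, `register_skill`, and `handle`, and the `{"skill": ..., "arg": ...}` request shape, are hypothetical and invented for this example; they are not taken from any framework or from the notes cited here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Harness:
    """Hypothetical harness: holds what the model would otherwise re-derive."""
    memory: Dict[str, str] = field(default_factory=dict)  # state persistence
    skills: Dict[str, Callable[[str], str]] = field(default_factory=dict)  # procedures
    log: List[str] = field(default_factory=list)  # trace of handled requests

    def register_skill(self, name: str, fn: Callable[[str], str]) -> None:
        self.skills[name] = fn

    def handle(self, request: Dict[str, str]) -> str:
        # Protocol (assumed): a request names exactly one skill and one argument.
        if set(request) != {"skill", "arg"}:
            raise ValueError("malformed request")
        name, arg = request["skill"], request["arg"]
        if name not in self.skills:
            raise KeyError(name)
        key = f"{name}:{arg}"
        if key in self.memory:
            # Memory: reuse an earlier result instead of solving it again.
            self.log.append(f"cache hit {key}")
            return self.memory[key]
        result = self.skills[name](arg)
        self.memory[key] = result
        self.log.append(f"ran {key}")
        return result


harness = Harness()
harness.register_skill("upper", str.upper)
first = harness.handle({"skill": "upper", "arg": "abc"})
second = harness.handle({"skill": "upper", "arg": "abc"})
```

Nothing in this sketch decides when to act or what to ask. It only executes requests it is handed, which is the sense in which a harness adds capability without adding initiative.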

Worse, the framework layer can amplify the passivity it's supposed to mask. Tool-enabled LLMs silently chain tool calls and drift away from what the user actually meant, because they never pause to probe. Conversation analysis offers a fix in the form of 'insert-expansions,' a formal model of when an agent should stop and ask before acting (When should AI agents ask users instead of just searching?). But that's a prescription the corpus is *proposing*, not a feature standard frameworks ship with. Two deeper failures show how structural the passivity really is. First, LLMs treat the opening prompt as a fixed frame and can't jointly update shared common ground, leaving the human as the sole scorekeeper of the conversation (Can LLMs truly update shared conversational common ground?). Second, they avoid correcting false user claims even when they answer the same facts correctly if asked directly, a learned face-saving behavior (Why do language models avoid correcting false user claims?). No amount of memory or tooling repairs those: they live in how the model relates to a partner, not in what it can execute.
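To make the "stop and ask before acting" idea concrete, here is a minimal Python sketch of a gate that sits in front of a tool call. It is only one possible reading of the insert-expansion idea, reduced to intent clarification. The `Ask` and `Act` types, the `next_step` function, and the `REQUIRED_SLOTS` table with its `book_flight` entry are all hypothetical, invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Ask:
    """Pause and consult the user."""
    question: str


@dataclass(frozen=True)
class Act:
    """Proceed with a tool call."""
    tool: str
    args: Dict[str, str]


# Hypothetical table: what must be known before each tool call is made.
REQUIRED_SLOTS: Dict[str, List[str]] = {
    "book_flight": ["origin", "destination", "date"],
}


def next_step(tool: str, known: Dict[str, str]) -> Union[Ask, Act]:
    """Return a clarifying question if anything is missing, else the tool call."""
    missing = [slot for slot in REQUIRED_SLOTS[tool] if slot not in known]
    if missing:
        # Ask about the first gap instead of guessing and chaining tools.
        return Ask(f"Before I call {tool}, what is the {missing[0]}?")
    return Act(tool, {slot: known[slot] for slot in REQUIRED_SLOTS[tool]})


partial = next_step("book_flight", {"origin": "OSL"})
complete = next_step(
    "book_flight",
    {"origin": "OSL", "destination": "LIS", "date": "2025-03-01"},
)
```

A gate like this is scaffolding, so it only covers the gaps someone thought to enumerate in advance. It does not address the common-ground or face-saving failures, which are not about missing arguments.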

So the most promising compensations are the ones that change the *training objective*, not the surrounding scaffold. CollabLLM's multi-turn-aware rewards, which estimate the long-term value of an interaction rather than the value of the immediate turn, actually produce models that ask questions and discover intent (Why do language models respond passively instead of asking clarifying questions?). That's the tell: passivity is baked in at the reward layer, so it has to be unbaked there. Frameworks that wrap a passive model can route, remember, and act, but the conversational initiative has to be installed upstream.
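The difference between a next-turn objective and a multi-turn-aware one can be seen in a toy Python comparison. This is not CollabLLM's implementation: the per-turn scores below are made up, the rollouts are hard-coded rather than simulated, and the function names are hypothetical. It only shows how the same two candidate responses rank differently under the two objectives.

```python
from statistics import mean
from typing import Dict, List

# Invented per-turn helpfulness scores for two candidate first responses.
# Each inner list is one imagined continuation of the conversation.
ROLLOUTS: Dict[str, List[List[float]]] = {
    "answer_now": [[0.6, 0.1, 0.1], [0.6, 0.0, 0.2]],
    "ask_first": [[0.2, 0.9, 0.8], [0.2, 0.7, 0.9]],
}


def immediate_reward(rollouts: List[List[float]]) -> float:
    """Next-turn objective: only the first turn of each rollout counts."""
    return mean(r[0] for r in rollouts)


def multiturn_value(rollouts: List[List[float]]) -> float:
    """Multi-turn objective: mean total score over the whole continuation."""
    return mean(sum(r) for r in rollouts)


def pick(objective: str) -> str:
    """Choose the candidate response that scores highest under an objective."""
    score = immediate_reward if objective == "immediate" else multiturn_value
    return max(ROLLOUTS, key=lambda name: score(ROLLOUTS[name]))


best_immediate = pick("immediate")
best_multiturn = pick("multiturn")
```

Under these invented numbers the immediate objective prefers answering right away, while the multi-turn objective prefers the clarifying question, because the question's payoff only shows up in later turns.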

The thing worth carrying away: 'agent' and 'conversationalist' are two different upgrades. There's a whole research thread arguing the entity you're talking to is better understood as a role-playing character sampled from a distribution (Should we treat dialogue agents as role-playing characters?; Do large language models actually commit to a single character?), and a character that's role-playing 'helpful assistant' will keep deferring to you no matter how powerful a harness you bolt onto it. The fix for passivity isn't a better framework around the model; it's a differently trained model.

Sources 10 notes

Why can't conversational AI agents take the initiative?

Research shows LLMs including ChatGPT cannot initiate topics, plan strategically, or lead conversations because their training optimizes for responding to queries, not creating dialogue from agent goals. This passivity is reinforced by alignment objectives and masked by fluent-sounding outputs.

Why do language models respond passively instead of asking clarifying questions?

CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.

Where does agent reliability actually come from?

Research shows reliable LLM agents externalize three cognitive burdens—memory (state persistence), skills (procedural components), and protocols (structured interaction)—into a harness layer rather than relying on model scale alone. The harness unifies these externalities and eliminates the need for the model to solve the same problems repeatedly.

Can you turn an LLM into an agent by just fine-tuning?

Converting LLMs to action-capable systems requires four distinct stages: curating action-environment-user datasets, training for action grounding, integrating agent infrastructure with memory and tools, and rigorous safety evaluation. The surrounding system and harness determine whether actions are grounded or hallucinated.

Can small language models handle most agent tasks?

SLMs handle the repetitive, well-defined language tasks that constitute most agent work at 10–30× lower cost than LLMs, making heterogeneous architectures (SLMs by default, LLMs selective) the economically rational design pattern.

When should AI agents ask users instead of just searching?

Tool-enabled LLMs drift from user intent through silent tool chaining. Conversation analysis reveals insert-expansions—clarifying intent, scoping responses, enhancing appeal—as a formal framework for proactive user consultation that prevents misunderstanding instead of recovering from it.

Can LLMs truly update shared conversational common ground?

LLMs interpret all subsequent conversational turns within a fixed initial prompt frame, preventing them from symmetrically proposing updates to shared assumptions. Even when users pivot topics or contradict earlier framings, the model cannot absorb revisions into jointly held background, making the user the sole maintainer of the conversational scoreboard.

Why do language models avoid correcting false user claims?

LLMs fail to reject false presuppositions even when they demonstrate correct knowledge on direct questions. Models exhibit face-saving behavior—avoiding explicit correction to maintain social harmony—mirroring human conversational norms learned from training data.

Should we treat dialogue agents as role-playing characters?

Shanahan's framework treats LLM outputs as character-consistent text production rather than authentic mental states. The dialogue prompt establishes a character; the model generates continuations matching that character, making folk-psychology applicable to the simulated persona, not the underlying system.

Do large language models actually commit to a single character?

Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating a response yields different outputs, each consistent with prior context, which indicates that no fixed commitment exists.
