What makes LLM agents default to passive helpfulness without curiosity rewards?

This note explores why LLM agents wait to be helpful rather than probing, leading, or showing initiative. The corpus is clear that the cause is the training objective, not a missing capability.

This reads the question as asking about *passivity by design*: why agents default to reactive helpfulness, and what "curiosity rewards" would have to change to undo it. The corpus converges on one answer: the problem lives in the reward signal, not the model. Standard RLHF optimizes for the *next turn*, meaning the immediate, satisfying-looking response. That single-step horizon quietly trains the very behavior we later call passivity. Asking a clarifying question, or declining to answer until intent is clear, looks worse on a next-turn metric even when it produces a far better conversation three turns later (see "Why do language models respond passively instead of asking clarifying questions?"). So the model learns to please now and never to discover later.

The striking part is that this is a *trained* gap, not a *capability* gap. The same models can lead, critique, and probe; they are just not rewarded for it. One line of work moves proactive clarification-seeking from 0.15% to 73.98% with reinforcement learning alone, no new architecture (see "Why do AI agents fail to take initiative?"). Another frames the whole issue as a single trainable skill, *when to speak*, that current objectives never teach because they only ever score the reply, never the choice of whether a reply or a question was the right move (see "Why can't AI models lead conversations on their own?"). Curiosity, in other words, isn't an emergent personality trait; it's a behavior with a missing gradient.

What would the missing reward actually look like? Two threads sketch it. Multi-turn-aware rewards estimate the long-term value of an interaction, so a clarifying question that pays off later finally scores well (see "Why do language models respond passively instead of asking clarifying questions?"). And conversation analysis offers a more precise target: *insert-expansions*, the moves humans make to clarify intent or scope a request before committing to an answer. Tool-enabled agents are especially prone to silently chaining tool calls and drifting from what the user meant; insert-expansions formalize when an agent should pause and ask instead of barreling ahead (see "When should AI agents ask users instead of just searching?").
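The contrast between a next-turn score and a multi-turn-aware score can be made concrete with a toy sketch. Everything here is an illustrative assumption rather than any paper's actual estimator: the `Rollout` type, the hand-written reward numbers, the discount factor, and the two candidate moves are all hypothetical. The point is only that the same candidates rank differently once later turns are counted.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Rollout:
    """Per-turn rewards for one simulated continuation (hypothetical numbers)."""
    turn_rewards: list[float]


def next_turn_score(rollouts: list[Rollout]) -> float:
    # Scores only the first reply, as a single-step objective would.
    return mean(r.turn_rewards[0] for r in rollouts)


def multi_turn_score(rollouts: list[Rollout], discount: float = 0.9) -> float:
    # Averages the discounted sum of rewards over each simulated continuation.
    return mean(
        sum(discount**t * reward for t, reward in enumerate(r.turn_rewards))
        for r in rollouts
    )


# Assumed toy data: answering immediately looks good now and decays;
# asking first looks weak now and pays off in later turns.
CANDIDATES = {
    "answer_now": [Rollout([0.8, 0.2, 0.1]), Rollout([0.8, 0.3, 0.2])],
    "ask_clarifying": [Rollout([0.3, 0.9, 0.9]), Rollout([0.3, 0.8, 0.9])],
}


def pick(score_fn) -> str:
    """Return the candidate move that the given scoring function prefers."""
    return max(CANDIDATES, key=lambda name: score_fn(CANDIDATES[name]))


if __name__ == "__main__":
    print("next-turn objective picks:", pick(next_turn_score))
    print("multi-turn objective picks:", pick(multi_turn_score))
```

With these assumed numbers the next-turn scorer prefers `answer_now` and the multi-turn scorer prefers `ask_clarifying`. In a real system the rollouts would come from simulated or logged conversations rather than hand-set values.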

Here's the thing you might not have come looking for: passivity and *over-confidence* may share a root. Agents show optimism bias toward actions they've chosen and pessimism toward the roads not taken, but only when the task is framed as their own agency (see "Do language models learn differently from good versus bad outcomes?"). A system that's rewarded for committing fast and feeling good about its committed answer has little internal pull toward "wait, what am I missing?" Curiosity is partly the willingness to treat your first read as possibly wrong, and a next-turn objective actively trains that willingness out.

A useful reframe from the broader corpus: maybe curiosity shouldn't be coaxed out of the model's weights at all, but built into the scaffold around it. Reliable agent behavior tends to come from externalizing cognitive burdens (memory, skills, interaction protocols) into a harness layer rather than expecting the base model to solve them itself (see "Where does agent reliability actually come from?"). On that view, "ask before assuming" is a protocol you install, not a virtue you hope the model discovered during pretraining. The default is passive because nothing in training *or* the surrounding system ever made probing the rewarded path.
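As a minimal sketch of what installing "ask before assuming" as a protocol could look like, the wrapper below checks for missing information before letting the agent act. The names `Request`, `AskFirstHarness`, `required_slots`, and `max_questions` are hypothetical and not taken from any cited system; the question cap is an assumed way to keep the agent from becoming intrusive.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Request:
    """A user request plus whatever details have been collected so far."""
    text: str
    slots: dict[str, str] = field(default_factory=dict)


@dataclass
class AskFirstHarness:
    """Hypothetical harness: ask for missing details before acting."""
    required_slots: tuple[str, ...]
    act: Callable[[Request], str]   # the model or tool call being wrapped
    max_questions: int = 2          # assumed cap so probing stays polite
    asked: int = 0

    def step(self, request: Request) -> str:
        missing = [s for s in self.required_slots if s not in request.slots]
        if missing and self.asked < self.max_questions:
            self.asked += 1
            return f"Before I proceed: could you tell me the {missing[0]}?"
        # Nothing is missing, or the question budget is spent: act.
        return self.act(request)


def fake_search(request: Request) -> str:
    # Stand-in for a real tool call; only echoes what it was given.
    return f"Searching with {dict(request.slots)}"


if __name__ == "__main__":
    harness = AskFirstHarness(("destination", "date"), act=fake_search)
    req = Request("Book me a flight")
    print(harness.step(req))            # asks for the destination
    req.slots["destination"] = "Oslo"
    print(harness.step(req))            # asks for the date
    req.slots["date"] = "next Friday"
    print(harness.step(req))            # acts
```

The design choice worth noting is that the decision to ask lives outside the model: the harness, not the reward signal, makes probing the default path.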

Sources (6 notes)

Why do language models respond passively instead of asking clarifying questions?

CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.

Why do AI agents fail to take initiative?

Research shows next-turn reward optimization structurally removes initiative from models, but proactive behaviors like critical thinking and clarification-seeking are trainable (0.15% to 73.98% with RL). The core challenge is balancing proactivity with civility to avoid intrusion.

Why can't AI models lead conversations on their own?

LLMs are structurally trained to optimize for the next response rather than multi-turn goals, creating reactive behavior despite having the underlying ability to lead. Three independent research directions identify when-to-speak as the trainable gap.

When should AI agents ask users instead of just searching?

Tool-enabled LLMs drift from user intent through silent tool chaining. Conversation analysis reveals insert-expansions—clarifying intent, scoping responses, enhancing appeal—as a formal framework for proactive user consultation that prevents misunderstanding instead of recovering from it.

Do language models learn differently from good versus bad outcomes?

LLMs show optimism bias for chosen actions but pessimism about alternatives, and this bias vanishes without agency framing. Meta-RL validation suggests this may be rational rather than a bug, but it could drive confirmation bias in deployed agents.

Where does agent reliability actually come from?

Research shows reliable LLM agents externalize three cognitive burdens—memory (state persistence), skills (procedural components), and protocols (structured interaction)—into a harness layer rather than relying on model scale alone. The harness unifies these externalities and eliminates the need for the model to solve the same problems repeatedly.
