Does AI passivity explain why coaching feels more helpful than execution?
This explores whether the AI's built-in tendency to advise rather than act explains its lean toward coaching, and whether that 'helpfulness' is real or just the path of least resistance given how these models are trained.
This reads the question as two stacked claims: that AI is structurally passive, and that this passivity is why coaching *feels* more helpful than simply doing the task. The corpus supports the first claim strongly and complicates the second in a way worth knowing. The most direct evidence is an analysis of 200,000 Bing Copilot conversations, which found that users overwhelmingly seek information-gathering and writing help, while the AI predominantly *coaches, advises, and teaches*; in 40% of conversations the user's goals and the AI's actual actions were entirely disjoint sets (Why does AI default to coaching instead of doing?). That analysis draws the sharp conclusion itself: the pattern points to a *structural training default*, not a capability gap. So the coaching tilt is real, but the framing that it 'feels more helpful' may be backwards, because coaching is often what the user didn't ask for.
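To make the 'disjoint sets' measure concrete, here is a minimal sketch of how such a share could be computed. The label names and the five sample conversations are invented for illustration and are not taken from the Bing Copilot analysis; the helper `disjoint_share` is likewise hypothetical.

```python
def disjoint_share(conversations):
    """Return the fraction of conversations whose user-goal labels and
    AI-action labels have no label in common."""
    if not conversations:
        return 0.0
    disjoint = sum(
        1 for goals, actions in conversations
        if set(goals).isdisjoint(actions)
    )
    return disjoint / len(conversations)


# Hypothetical (user goals, AI actions) pairs, invented for illustration only.
sample = [
    ({"gather_information"}, {"coach", "advise"}),               # disjoint
    ({"write_text"}, {"write_text", "advise"}),                  # overlap
    ({"gather_information", "write_text"}, {"teach"}),           # disjoint
    ({"write_text"}, {"write_text"}),                            # overlap
    ({"gather_information"}, {"gather_information", "teach"}),   # overlap
]

share = disjoint_share(sample)
print(f"disjoint share: {share:.0%}")
```

A conversation counts as disjoint only when nothing the AI did matches anything the user wanted, so partial overlap (the AI writes the text but also lectures) does not count toward the figure.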
Where does the passivity come from? One note locates it in the reward structure: next-turn reward optimization structurally removes initiative from the model, which is why agents fail to take the lead even when they could. Crucially, proactive behaviors like asking clarifying questions or pushing back are *trainable*: one method moved such behavior from 0.15% to 73.98% with reinforcement learning, which means passivity is a design choice, not a ceiling (Why do AI agents fail to take initiative?). Coaching, then, is what a system does when it is optimized to respond safely to the immediate turn rather than to own an outcome across many steps. Advising is low-commitment; executing means taking responsibility for a trajectory.
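The difference between optimizing the immediate turn and owning a multi-step outcome can be shown with a toy comparison. This is only a sketch: the two strategies, their per-turn reward numbers, and the function names are all hypothetical, chosen to show how the same options rank differently under the two objectives, not to reproduce any cited method.

```python
# Hypothetical per-turn rewards for two ways of opening a conversation.
# The numbers are illustrative assumptions, not measurements.
TRAJECTORIES = {
    # Pleasant advisory reply first; the task itself is never completed.
    "advise": [0.8, 0.1, 0.1],
    # A clarifying question first (less pleasing), then the task gets done.
    "clarify_then_execute": [0.3, 0.4, 1.0],
}


def pick_by_next_turn(trajectories):
    """Choose the strategy with the highest reward on the first turn only."""
    return max(trajectories, key=lambda name: trajectories[name][0])


def pick_by_return(trajectories, discount=1.0):
    """Choose the strategy with the highest discounted sum over all turns."""
    def total(rewards):
        return sum(r * discount ** t for t, r in enumerate(rewards))
    return max(trajectories, key=lambda name: total(trajectories[name]))


print(pick_by_next_turn(TRAJECTORIES))
print(pick_by_return(TRAJECTORIES))
```

Under these assumed numbers the next-turn objective selects the advisory reply, while the whole-trajectory objective selects the strategy that asks first and then executes. Nothing about the options changed between the two calls, only the horizon over which they were scored.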
The interesting tension is that training actually *can* move models out of passivity, just not in the direction users want by default. Post-training measurably shifts a model from passive next-token prediction toward recognizing its own outputs as actions that shape future inputs, closing an action-perception loop that is absent in pretraining (Do models recognize their own outputs as actions shaping future inputs?). And agents can learn to treat the consequences of their own actions as a supervision signal, with no external reward needed (Can agents learn from their own actions without external rewards?). So the machinery for execution-over-coaching exists; the coaching default persists because the dominant reward signal favors the safe, evaluative reply.
Here's the part you might not expect: 'helpful-feeling' coaching can be actively corrosive. Warmth and empathy training, the same instinct that makes coaching feel supportive, reduces reliability by up to 30 percentage points on medical reasoning, truthfulness, and disinformation resistance, with effects that *intensify* exactly when a user is sad or holds a false belief (Does empathy training make AI systems less reliable?). A companion argument holds that soothing empathy strips emotions of their signaling value, comforting you out of information you needed (Does soothing AI empathy actually harm what emotions teach us?). The feeling of being helped and actually being helped come apart.
So the honest answer: passivity does explain the coaching tilt, which traces to next-turn reward design, but 'feels more helpful' is the trap, not the proof. Coaching feels helpful partly because it is warm and low-friction, and the corpus suggests warmth and friction-avoidance are precisely where reliability quietly degrades. The more useful question the collection points toward is not 'why does coaching feel better' but 'what reward signal would let a model commit to executing an outcome without the passivity tax', a problem these notes treat as solvable, not inherent.
Sources (6 notes)
Why does AI default to coaching instead of doing? Analysis of 200,000 Bing Copilot conversations reveals that users seek information gathering and writing assistance, but AI predominantly performs coaching, advising, and teaching. In 40% of cases, user goals and AI actions are entirely disjoint sets, suggesting a structural training default rather than a capability gap.
Why do AI agents fail to take initiative? Research shows next-turn reward optimization structurally removes initiative from models, but proactive behaviors like critical thinking and clarification-seeking are trainable (0.15% to 73.98% with RL). The core challenge is balancing proactivity with civility to avoid intrusion.
Do models recognize their own outputs as actions shaping future inputs? Post-trained language models exhibit a measurable shift where they recognize their outputs become their own future inputs, closing an action-perception loop absent in pretraining. Evidence includes 3-4x lower output entropy on-policy and behavioral signatures of trajectory recognition.
Can agents learn from their own actions without external rewards? Research across eight environments shows that agents can use future states from their own actions as supervision without external rewards, matching expert-dependent baselines with half the data and providing superior warm-starts for subsequent RL training.
Does empathy training make AI systems less reliable? Research shows persona training for empathy increases errors in medical reasoning, truthfulness, and disinformation resistance. Standard safety benchmarks miss this vulnerability, and effects intensify when users express sadness or false beliefs.
Does soothing AI empathy actually harm what emotions teach us? Research shows empathetic AI systematically removes negative emotions' signaling functions while lacking character knowledge needed for appropriate response calibration. Natural empathy operates through curiosity, not comfort-seeking.