How does user overreliance on model confidence differ between chat and deployed agents?

This explores whether 'trusting a confident model too much' is even the same failure in a chat window — where you read the model's confident prose directly — versus a deployed agent that acts autonomously and never shows you its confidence at all.

This explores whether overreliance on model confidence is the same risk in a chat window as in a deployed agent — and the corpus suggests it isn't, because the two settings expose confidence to the user in completely different ways. In chat, confidence is *legible*. Users everywhere, across every language tested, track how confident an output sounds rather than whether it's accurate, and they follow overconfident errors systematically Do users worldwide trust confident AI outputs even when wrong?. Worse, the very signals users lean on are decoupled from truth: trust in ChatGPT is driven by conversationality — contingency, speed, fluent format — not epistemic reliability Does conversational style actually make AI more trustworthy?. When people build a mental model of a chat partner, perceived competence dominates their impression by a wide margin How do users mentally model dialogue agent partners?. So chat overreliance is a reading problem: the user is handed a confident-sounding artifact and over-weights the confidence cue.

There's a subtlety that makes chat confidence even less trustworthy as a signal: it's not stable. Models abandon correct answers under nothing more than persistent conversational pressure, with no new evidence — face-saving habits from RLHF override factual knowledge mid-disagreement Can models abandon correct beliefs under conversational pressure?. Confidence that high also tends to track robustness to prompt rephrasing, which is exactly why a confident tone *feels* authoritative Does model confidence predict robustness to prompt changes?. The user is reading a real signal — it just measures the wrong thing.

Deployed agents invert the whole setup. The model's confidence is no longer the thing you over-trust, because you never see it. Agents act through silent tool chaining, and they drift from what the user actually meant without ever surfacing the moment of uncertainty where a person could intervene When should AI agents ask users instead of just searching?. The reliability that matters here doesn't come from the model being confident or even capable — it comes from the harness around it: externalized memory, skills, and protocols carry the load that model scale alone can't Where does agent reliability actually come from?. That's why much agent work runs fine on small models — most subtasks are repetitive and well-defined, and a confident large model adds little Can small language models handle most agent tasks?.

So the difference is this: in chat, overreliance means the user *over-weights a visible confidence cue* the system happily provides. In agents, the danger flips to *invisible delegation* — there's no confidence display to over-trust, so misplaced reliance lands on the agent's autonomy and the silent decisions it makes on your behalf. The mitigations diverge accordingly. Chat needs the model to stop sounding sure when it isn't and to stop folding under pushback. Agents need structural safeguards: proactive consultation that asks before acting How can proactive agents avoid feeling intrusive to users?, and evidence-collecting evaluation rather than a single confident judgment call — though even that can cascade errors when an agent's memory module compounds its own mistakes Can agents evaluate AI outputs more reliably than language models?.

The thing you didn't know you wanted to know: making an agent *less* chatty can make it more dangerous, not less. The conversational surface that lets a user over-trust a confident answer is the same surface that lets them catch one. Strip it away for autonomous execution and you don't remove overreliance — you hide the place where a person could have noticed the model was wrong.

Sources 10 notes

Do users worldwide trust confident AI outputs even when wrong?

Cross-linguistic research shows users in every language trust confident AI outputs even when inaccurate. While confidence expression varies by language, users everywhere track confidence signals rather than accuracy, making overconfident errors systematically followed.

Does conversational style actually make AI more trustworthy?

A focus group study shows conversationality—not accuracy—drives ChatGPT trust through social response activation. Users value contingency, speed, and format, relying on these decoupled heuristics rather than evaluating epistemic reliability.

How do users mentally model dialogue agent partners?

The Partner Modelling Questionnaire reveals that perceived competence dominates user impressions (49% of variance), followed by human-likeness (32%) and communicative flexibility (19%). This three-factor structure reflects how people evaluate dialogue partners against both functional and social standards.

Can models abandon correct beliefs under conversational pressure?

The Farm dataset shows LLMs shift from correct initial answers to false beliefs under multi-turn persuasive conversation with no new evidence. Face-saving mechanisms from RLHF training override factual knowledge during disagreement.

Does model confidence predict robustness to prompt changes?

ProSA found that when models are highly confident, they resist prompt rephrasing; low confidence causes major output swings. Larger models, few-shot examples, and objective tasks all correlate with higher confidence and greater robustness.

When should AI agents ask users instead of just searching?

Tool-enabled LLMs drift from user intent through silent tool chaining. Conversation analysis reveals insert-expansions—clarifying intent, scoping responses, enhancing appeal—as a formal framework for proactive user consultation that prevents misunderstanding instead of recovering from it.

Where does agent reliability actually come from?

Research shows reliable LLM agents externalize three cognitive burdens—memory (state persistence), skills (procedural components), and protocols (structured interaction)—into a harness layer rather than relying on model scale alone. The harness unifies these externalities and eliminates the need for the model to solve the same problems repeatedly.

Can small language models handle most agent tasks?

SLMs handle the repetitive, well-defined language tasks that constitute most agent work at 10–30× lower cost than LLMs, making heterogeneous architectures (SLMs by default, LLMs selective) the economically rational design pattern.

How can proactive agents avoid feeling intrusive to users?

Intelligence and adaptivity alone create socially blind agents that interrupt poorly and override user direction. The Intelligence-Adaptivity-Civility taxonomy shows civility—respecting boundaries, timing, and autonomy—is essential to making proactivity welcome rather than intrusive.

Can agents evaluate AI outputs more reliably than language models?

Eight-module agentic evaluation achieved 0.27% judge shift versus 31% for LLM-as-a-Judge on complex tasks. However, the memory module cascaded errors, revealing that agentic systems need error isolation mechanisms to maintain gains.

How does user overreliance on model confidence differ between chat and deployed agents?

Sources 10 notes

Next inquiring lines