Language Understanding and Reasoning AI Social Psychology

Do explanations actually help users spot AI mistakes?

Most AI explanations are designed to justify the system's answer, but do they help users distinguish correct from incorrect outputs? This research tests whether standard explanation formats genuinely improve error detection or just increase trust regardless of accuracy.

Note · 2026-05-28 · sourced from Flaws

Users of LLMs must decide whether to trust an answer, often aided by reasoning traces, summaries of those traces, or post-hoc explanations. The implicit assumption is that more explanation helps users judge correctness. A between-subjects user study, simulating settings where users cannot independently verify the solution, tests this assumption and finds it largely false. Reasoning traces and post-hoc explanations are persuasive but not informative: relative to a no-explanation baseline, they increase user acceptance of the model's prediction regardless of whether that prediction is correct. In other words, they engender false trust.

The one condition that breaks the pattern is contrastive dual explanation, where the user is shown arguments both for and against the AI's answer. Dual explanation has the lowest rate of engendering false trust and is the only condition that genuinely improves users' ability to distinguish correct from incorrect outputs. The contrast with reasoning traces is instructive: traces produce high accuracy on correct answers but poor detection of incorrect ones, because they raise confidence uniformly, whereas dual explanations produce a balanced effect in which users stay accurate on both correct and incorrect cases.
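
The distinction between "persuasive" and "informative" can be made concrete by splitting acceptance rates by whether the AI was actually correct. The sketch below is illustrative only: the `Trial` record, the metric names (`accept_correct`, `false_trust`, `discrimination`), and the toy trials are all assumptions made for this note, not the paper's definitions or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    condition: str       # hypothetical condition label, e.g. "trace" or "dual"
    ai_correct: bool     # was the AI's answer actually correct?
    user_accepted: bool  # did the participant accept the AI's answer?


def trust_metrics(trials, condition):
    """Acceptance rates for one condition, split by AI correctness."""
    rows = [t for t in trials if t.condition == condition]
    correct = [t for t in rows if t.ai_correct]
    incorrect = [t for t in rows if not t.ai_correct]
    if not correct or not incorrect:
        raise ValueError("need both correct and incorrect AI answers")
    accept_correct = sum(t.user_accepted for t in correct) / len(correct)
    # False trust (assumed definition): accepting an answer that is wrong.
    false_trust = sum(t.user_accepted for t in incorrect) / len(incorrect)
    return {
        "accept_correct": accept_correct,
        "false_trust": false_trust,
        # Near 0 means acceptance does not track correctness at all.
        "discrimination": accept_correct - false_trust,
    }


def _make(condition, ai_correct, accepted_flags):
    return [Trial(condition, ai_correct, a) for a in accepted_flags]


# Made-up toy trials, NOT results from the study.
TOY_TRIALS = (
    _make("trace", True, [True, True, True, True])
    + _make("trace", False, [True, True, True, False])
    + _make("dual", True, [True, True, True, False])
    + _make("dual", False, [True, False, False, False])
)

if __name__ == "__main__":
    for name in ("trace", "dual"):
        print(name, trust_metrics(TOY_TRIALS, name))
```

In this toy data the "trace" condition has higher acceptance of correct answers but also much higher false trust, so its discrimination score is lower than that of "dual". That is the qualitative pattern described above; the specific numbers carry no empirical weight.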

Why it matters: the standard explanation formats deployed in production are optimized to be one-sided advocates for the answer, which is exactly what makes them persuasive without being diagnostic. Surfacing the case against the answer is what restores the user's discriminating capacity. The design lesson is that "explainability" and "appropriate trust" can be at odds: adding a confident rationale can make a wrong answer more believable, so the intervention that helps is the one that deliberately includes the case against the system's own output.


— "Evaluating the False Trust Engendered by LLM Explanations", https://arxiv.org/abs/2605.10930

Original note title

only contrastive dual explanations arguing both sides genuinely improve users ability to detect ai errors