Should explanation quality be measured by user satisfaction or behavior prediction?

This explores a forced choice: do we judge an AI explanation good because the person liked it, or because it lets them predict what the model will do next? The corpus argues that both are flawed proxies for the thing that actually matters.

Read literally, the question is a contest between two yardsticks: user satisfaction versus behavior prediction. The corpus's sharpest move is to show that both can fail, and fail in opposite directions, so picking one over the other is the wrong frame. Satisfaction is the easier metric to game. Work on STORM finds that people report being satisfied while remaining internally confused, especially when they don't know what they don't know. It also finds that actual understanding tracks sustained engagement, not the satisfaction score collected right afterward (Does user satisfaction actually measure cognitive understanding?). Worse, explanations that feel good can actively mislead. Reasoning traces and post-hoc justifications make users accept AI answers whether or not those answers are correct, manufacturing false trust (Do explanations actually help users spot AI mistakes?).

But behavior prediction has its own trap. The surprise here comes from counterfactual-simulatability research. Explanations that humans rate as correct and coherent routinely fail to predict how the model behaves on slightly altered inputs, and plausibility turns out to be uncorrelated with predictive accuracy (Can LLM explanations actually help humans predict model behavior?). Crucially, RLHF makes explanations more convincing without making them more predictive. Optimizing for satisfaction therefore widens the gap between how useful an explanation seems and how useful it is, leaving users confident and wrong. So the two metrics aren't just different lenses on one quality: optimizing the first does nothing to secure the second and can hide its absence.

The more interesting answer the corpus offers is to stop measuring the explanation in isolation. One line of work reframes explainability as a communication problem. Quality lives in the triad of who presents the explanation, how it's framed, and what the recipient is trying to do; it is not an intrinsic property you can score once (What if XAI is fundamentally a communication problem?). A study of 399 everyday explanations reinforces this. It shows that understanding is co-constructed through dialogue moves rather than delivered as a monologue, which is exactly what current one-shot LLM explanations get wrong (What makes explanations work in real conversation?).

If you want a usable target instead of a binary choice, the corpus keeps pointing at the same one: does the explanation help the user do the right thing? Dual, contrastive explanations fit that target. They argue both for and against the answer, and they are the only kind shown to actually improve a person's ability to catch AI mistakes (Do explanations actually help users spot AI mistakes?). The RecExplainer line suggests you may not have to choose at all. It trains an LLM surrogate with separate behavior alignment (matching the recommender's outputs) and intention alignment (incorporating its internal embeddings). It then hybridizes the two so the explanation is both faithful to the model and intelligible to the person (Can LLMs explain recommenders by mimicking their internal states?).

The thing you might not have expected to learn: satisfaction and prediction aren't endpoints of one scale. Satisfaction sits closer to intelligibility and prediction closer to faithfulness, two requirements that can trade off against each other. The explanations worth building hold both at once. The cleanest way to detect a failure is a behavioral test with an adversarial twist (can the user spot the model's error?) rather than either a smile or a raw prediction score.
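One way to operationalize "can the user spot the model's error" is to compare how often users accept the AI's answer when it is right versus when it is wrong. The gap between those two rates measures discrimination. The sketch below is a hypothetical illustration under that assumption, not the protocol of any cited study. `Trial`, `reliance_summary`, `make_trials` and all the acceptance counts are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    ai_correct: bool     # was the AI's answer actually right?
    user_accepted: bool  # did the user go along with it?


def reliance_summary(trials):
    """Acceptance rate on correct vs. incorrect AI answers, and the gap between them.

    A discrimination near 0 means the user accepts (or rejects) answers
    regardless of correctness: the explanation did not help them spot errors.
    """
    right = [t.user_accepted for t in trials if t.ai_correct]
    wrong = [t.user_accepted for t in trials if not t.ai_correct]
    if not right or not wrong:
        raise ValueError("need trials with both correct and incorrect AI answers")
    accept_right = sum(right) / len(right)
    accept_wrong = sum(wrong) / len(wrong)
    return {
        "accept_when_right": accept_right,
        "accept_when_wrong": accept_wrong,
        "discrimination": accept_right - accept_wrong,
    }


def make_trials(accepted_right, n_right, accepted_wrong, n_wrong):
    """Hypothetical helper: build toy trials with the given acceptance counts."""
    return (
        [Trial(True, i < accepted_right) for i in range(n_right)]
        + [Trial(False, i < accepted_wrong) for i in range(n_wrong)]
    )


# Hypothetical conditions with made-up counts (4 correct + 4 incorrect AI answers each).
one_sided = reliance_summary(make_trials(4, 4, 4, 4))  # persuasive one-sided trace
dual = reliance_summary(make_trials(3, 4, 1, 4))       # arguments for and against

print(one_sided["discrimination"])  # 0.0
print(dual["discrimination"])       # 0.5
```

In this toy data the one-sided condition wins on raw acceptance (8 of 8 answers accepted versus 4 of 8), the pattern a satisfaction score would reward. Yet its discrimination is zero, because users accept wrong answers just as readily as right ones. The dual condition accepts less overall but separates right answers from wrong ones.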

Sources (6 notes)

Does user satisfaction actually measure cognitive understanding?

STORM shows users express satisfaction despite internal confusion, especially when unaware of knowledge gaps. Sustained engagement correlates with actual self-understanding, not immediate satisfaction ratings.

Do explanations actually help users spot AI mistakes?

Reasoning traces and post-hoc explanations increase user acceptance of AI answers regardless of correctness, engendering false trust. Only dual explanations presenting arguments for and against the answer genuinely help users distinguish correct from incorrect outputs.

Can LLM explanations actually help humans predict model behavior?

Explanations that humans judge as correct and coherent fail to predict model behavior on counterfactuals. RLHF optimization improves how convincing explanations seem without improving their actual predictive accuracy, leaving users confident but wrong.

What if XAI is fundamentally a communication problem?

Explanation quality is not intrinsic to the explanation itself but depends on the rhetorical situation: who presents it, how it is framed, and what role the recipient plays. Evaluations that ignore this triad measure only a narrow slice of real-world effectiveness.

What makes explanations work in real conversation?

Analysis of 399 daily-life explanations shows that topic relation, dialogue act, and explanation move jointly predict understanding success. Explanations are co-constructed through interaction patterns, not monological delivery, a finding that challenges how LLMs currently generate explanations.

Can LLMs explain recommenders by mimicking their internal states?

RecExplainer trains LLMs via three alignment methods: behavior (mimicking outputs), intention (incorporating neural embeddings), and hybrid (combining both). The hybrid approach produces explanations that are simultaneously faithful to the target model and intelligible to users by balancing internal-state inspection with human-readable reasoning.
