Can AI Explanations Make You Change Your Mind?
In the context of AI-based decision support systems (DSS), explanations can help users to judge when to trust the AI’s suggestion, and when to question it. In this way, human oversight can prevent AI errors and biased decision-making. However, this rests on the assumption that users will consider explanations in enough detail to be able to catch such errors. We conducted an online study on trust in explainable DSS, and were surprised to find that in many cases, participants spent little time on the explanations and did not always consider them in detail. We present an exploratory analysis of this data, investigating which factors impact how carefully study participants consider AI explanations, and how this in turn impacts whether they are open to changing their mind based on what the AI suggests.
Which factors impacted how carefully participants read the AI explanations? Which factors impacted whether or not they changed their mind based on the AI suggestions? Did the explanations still help them to understand when the AI was right, and when it erred? Based on this analysis, we discuss open questions for future research and study design for explainable AI in the context of DSS.
One of the goals of explainable AI (XAI) is to increase trust in AI [Abdul et al., 2018], and increase the trustworthiness of AI both for users and for people affected by its decisions [Barredo Arrieta et al., 2020]. Thus, AI should be able to justify its decisions with user-comprehensible explanations to prevent mistakes [Adadi and Berrada, 2018; Barredo Arrieta et al., 2020]. Trust in an AI system is considered warranted only when the AI is actually trustworthy [Jacovi et al., 2021], and users should be able to judge when they should or should not follow the AI suggestion instead of blindly trusting its recommendations.
With DSS, explanations for individual decisions are often not very interactive: the ‘correct’ decision as predicted by the AI is presented together with a local explanation of this prediction [Liao et al., 2020; Lai and Tan, 2019]. Users then have to make the final decision, providing human oversight and hopefully catching cases of AI error or bias [Chen et al., 2023]. Many have argued that AI explanations could better aid decision-making if they were more interactive or more conversational [Miller et al., 2017; Norkute, 2020; Feldhus et al., 2022; Wiegreffe et al., 2022; Bertrand et al., 2023; Battefeld et al., 2024; Sovrano and Vitali, 2024]. However, as explainable AI systems are being deployed, it is still important to understand how static explanations impact decision making, and how their presentation can be fine-tuned to achieve the best possible outcome [Hoffman et al., 2018; Chromik and Butz, 2021]. This means reducing both undertrust (users being too skeptical of the AI) and overtrust (users trusting the AI when it misguides them towards a wrong decision) [Jacovi et al., 2021].
In addition, there have been a number of studies on other methodological aspects of XAI research, such as the use of proxy tasks and of questionnaires to measure user trust. Buçinca et al. found that the outcomes of XAI studies using proxy tasks could not be used to predict the results of studies using the actual decision-making tasks. Moreover, they found that subjective measures could not be used as a predictor for performance on decision-making tasks [Buçinca et al., 2020]. Byrne also cautions against the use of self-reported measures of trust and satisfaction to judge an explanation’s quality in XAI studies, due to the illusion of explanatory depth: the tendency to overestimate one’s understanding of a process or device [Byrne, 2023]. A two-step workflow provides an alternative way to gauge users’ trust in the AI system: if the user’s first choice (before seeing the AI recommendation) is known, one can measure how often they change their mind when the AI disagrees with that first choice; switching indicates that they trust the AI system enough to follow its recommendation over their own initial opinion.
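As an illustration, the following is a minimal sketch of how such a switch rate could be computed from per-trial records. The record fields and the simple switch-rate definition are assumptions made for illustration, not the exact analysis code used in the study.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    first_choice: str    # participant's decision before seeing the AI suggestion
    ai_suggestion: str   # option recommended by the AI
    final_choice: str    # participant's decision after seeing the AI suggestion

def switch_rate(trials: list[Trial]) -> float:
    """Share of disagreement trials in which the participant switched
    to the AI suggestion after initially choosing something else."""
    disagreements = [t for t in trials if t.ai_suggestion != t.first_choice]
    if not disagreements:
        return 0.0
    switched = sum(t.final_choice == t.ai_suggestion for t in disagreements)
    return switched / len(disagreements)

# Hypothetical example: the AI disagrees in two trials; the participant follows it once.
trials = [
    Trial("A", "A", "A"),
    Trial("A", "B", "B"),  # changed their mind in favour of the AI
    Trial("B", "A", "B"),  # kept their initial choice
]
print(switch_rate(trials))  # 0.5
```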
Across all trials, the participants picked the correct option immediately (before seeing the AI suggestion) 65.5% of the time, and, after seeing the AI suggestion, solved 72.3% of the cases correctly.
Based on these results, we concluded that the attention test we added was not well designed to measure whether participants engaged with the tasks seriously. It did, however, reveal something else: the vast majority of participants did not read every explanation in detail. Of the participants who saw the attention test, 87% failed it. In 100% of the attention test trials, the AI suggestion was the same as the participant’s first choice (in this task, every participant’s first choice was the correct decision; it was indeed the easiest task of the set). Do participants only engage with explanations in detail when they hope to gain new information from them, such as when the AI disagrees with their initial choice?
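For concreteness, a minimal sketch of this part of the analysis, assuming one record per attention-test trial with hypothetical field names:

```python
def attention_summary(records: list[dict]) -> dict:
    """records: one dict per attention-test trial, with keys
    'failed' (bool), 'first_choice', and 'ai_suggestion'."""
    n = len(records)
    failed = sum(r["failed"] for r in records)
    agreed = sum(r["first_choice"] == r["ai_suggestion"] for r in records)
    return {
        "failure_rate": failed / n,        # 0.87 in our data
        "ai_agreement_rate": agreed / n,   # 1.0 in our data
    }
```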