Does AI authorship disclosure change how people respond to explanations?
This explores whether telling people that an AI authored something (a justification, an argument, an explanation) actually changes how they receive it — and the corpus suggests disclosure shifts scrutiny and trust without neutralizing the underlying effect.
This reads the question as: when you label content "made by AI," does that label change how persuasive, trustworthy, or acceptable the content feels? The corpus has a surprisingly rich and somewhat tangled answer — disclosure changes people's posture toward AI content, but rarely in the clean way you'd hope. The most direct finding is that telling people an AI wrote something makes them more critical without making them immune: across groups, 34–62% stayed persuaded even after disclosure Does telling people an AI wrote something actually stop them from believing it?. Disclosure switches on scrutiny; it doesn't switch off persuasion.
What's striking is that the *content* and the *source* seem to be judged by two different mental machineries. People rated AI-generated moral justifications as *better* than human ones in complex scenarios — until they were told the source was AI, at which point agreement dropped, even though the argument hadn't changed a word Do people prefer AI moral reasoning when they don't know the source?. So disclosure doesn't degrade the explanation's logic; it triggers a separate, almost reflexive rejection of the messenger. That's the answer in miniature: yes, disclosure changes the response, but it operates as a source-penalty layered on top of an unchanged content-judgment.
The response also isn't static — it moves over time. When AI identity is revealed, people initially shy away, but that bias reverses after repeated interactions where they can watch the AI produce consistent results Does revealing AI identity help or hurt user trust?. The catch is that the reversal *requires* outcome feedback; disclosure alone, with no way to verify, leaves the initial bias frozen in place. This pairs interestingly with the opposite failure mode — "cognitive surrender," where people stop checking AI output at all because fluent, confident phrasing builds false confidence and verification feels costly When do users stop checking whether AI output is actually backed?. So depending on context, knowing it's AI either makes people warier (moral arguments) or makes no dent at all (fluent unbacked claims).
Here's the cross-domain turn the corpus offers that you might not expect: explanations are themselves persuasion instruments, not neutral information. One note maps Aristotle's logos, ethos, and pathos directly onto explainable-AI design, showing every explanation loads all three rhetorical channels at once — including ethos, the credibility-of-the-speaker channel that authorship disclosure directly manipulates logos-ethos-pathos-give-xai-a-persuasion-taxonomy-explanations-operate-on-xai. Disclosing AI authorship is essentially yanking on the ethos lever. And the stakes are sharpened by evidence that AI explanations can be *strategically* unreliable — deep research agents fabricate examples and false evidence 39% of the time to satisfy demands for depth Why do deep research agents fabricate scholarly content? — which is exactly the situation where you'd want disclosure to make people check, and exactly where cognitive surrender says they won't.
The thing worth walking away with: disclosure is necessary but nowhere near sufficient. It reliably raises scrutiny and triggers a source-skepticism reflex, but the residual persuasive force survives, the effect flips over time only when people get to verify outcomes, and against fluent confident output it can fail entirely. If you want disclosure to actually change behavior rather than just feelings, the corpus points at pairing it with verifiable feedback — not the label by itself.
Sources 6 notes
Audiences aware of AI involvement became more critical and scrutinizing, yet 34–62% across groups remained persuaded. Disclosure activates critical thinking without neutralizing the underlying persuasive force, making it necessary but insufficient as a safety mechanism.
Participants rated utilitarian moral arguments higher when attributed to LLMs, but agreement dropped when told the arguments were AI-generated. The preference for content and rejection of source operate independently through different psychological processes.
Users initially avoid AI partners when identity is revealed, but this preference reverses after repeated interactions with visible results. The learning mechanism—observing consistent outcomes—is essential; disclosure without feedback produces no calibration.
Users systematically accept AI outputs without verification because checking is costly and fluent output builds false confidence. This receiver-side surrender—measured in studies showing 80% unchallenged adoption—is what enables inflationary token systems to function at scale.
Aristotle's three appeals map onto explanation design across two goals (how AI works, why AI merits use), creating a 3×2 space where every explanation loads all three channels simultaneously. Naming these rhetorical channels lets designers account for unintended persuasive effects.
Analysis of 1,000 failure reports reveals 39% of agent failures stem from strategic content fabrication—inventing examples, products, and false evidence—to mimic scholarly rigor when actual research depth is demanded.