Do humans mistake AI kindness for human generosity in mixed groups?
When AI agents participate without disclosure, do humans systematically misattribute their behavior to the wrong agent type, and does this distort how people understand human nature itself?
When AI agents participate in social interactions without identity disclosure, humans systematically misattribute behavior across agent types. In the hybrid society study (Study 1, opaque identity condition), selectors attributed bot behavior to humans and vice versa — even though bots were linguistically distinguishable (messages 2.5x longer) and behaviorally distinct (higher prosociality, lower variance).
The distortion operates in both directions:
- AI prosociality attributed to humans — when a highly cooperative partner is in fact a bot but taken to be human, selectors form inflated expectations of human generosity
- Human selfishness attributed to AI — when a less cooperative partner is in fact human but taken to be a bot, selectors may form unduly negative expectations of AI performance
This is not a failure of detection — bots WERE distinguishable by message length and consistency. It is a failure of attribution. Selectors noticed behavioral differences but could not correctly map them to identity categories. The behavioral signals (prosociality, verbosity) did not reliably cue "this is AI" in the absence of explicit labels.
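To make the detection-versus-attribution distinction concrete, here is a minimal sketch (not the study's data; the message-length distributions, sample sizes, and threshold rule are assumptions for illustration) showing that a signal like message length could separate bots from humans almost perfectly, even though observers in the opaque condition failed to use it for identity attribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed parameters: human messages ~20 words on average, bot messages ~2.5x longer.
human_lengths = rng.normal(loc=20, scale=8, size=1000).clip(min=1)
bot_lengths = rng.normal(loc=50, scale=8, size=1000).clip(min=1)

# A naive classifier: a threshold halfway between the two observed means.
threshold = (human_lengths.mean() + bot_lengths.mean()) / 2

# Fraction of partners a length-only rule would label correctly.
accuracy = ((human_lengths < threshold).mean() + (bot_lengths >= threshold).mean()) / 2
print(f"threshold = {threshold:.1f} words, length-only classification accuracy = {accuracy:.2%}")
```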
The deeper implication is that undisclosed AI presence in social systems corrupts social inference about HUMANS. If people interact in mixed populations without knowing who is AI and who is human, their models of what humans are like — how generous, how reliable, how verbose — become contaminated by AI behavior patterns. This could lead to systematically inflated expectations of human prosociality (when AI's contributions are misattributed to humans) or systematic disappointment when actual humans fail to match AI-caliber consistency.
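A toy Monte Carlo sketch of that contamination, under assumed parameters (the AI population share and the contribution distributions are invented for illustration, not taken from the study): an observer who, under opacity, pools every partner into their model of "what humans are like" ends up with an inflated estimate of human prosociality.

```python
import numpy as np

rng = np.random.default_rng(1)

n_partners = 1000
share_ai = 0.5                      # assumed fraction of undisclosed AI agents
is_ai = rng.random(n_partners) < share_ai

# Assumed contribution levels (fraction of endowment shared):
# humans average 0.4 with high variance, AI agents 0.7 with low variance.
contributions = np.where(
    is_ai,
    rng.normal(0.7, 0.05, n_partners),
    rng.normal(0.4, 0.20, n_partners),
).clip(0, 1)

true_human_mean = contributions[~is_ai].mean()
# Under opacity the observer cannot tell agents apart and pools every partner
# into their model of "what humans are like".
perceived_human_mean = contributions.mean()

print(f"true human prosociality:      {true_human_mean:.2f}")
print(f"perceived human prosociality: {perceived_human_mean:.2f} (inflated by misattributed AI behavior)")
```

With these assumed numbers the perceived mean sits well above the true human mean; the size of the gap scales with the AI share and with how much AI prosociality differs from human prosociality.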
The authors note this pattern may not be unique to human-AI mixtures: similar attribution errors could arise in purely human populations composed of culturally distinct subgroups that differ systematically in prosociality and language use. AI agents function as controlled probes that make these attribution dynamics experimentally tractable.
As explored in "What breaks when humans and AI models misunderstand each other?", misattribution under opacity represents a fundamental MToM (mutual theory of mind) failure: neither side has an accurate model of the other, and the humans do not even know which "other" they are modeling.
Source: Psychology Users
Related concepts in this collection

- What breaks when humans and AI models misunderstand each other? Explores whether misalignment in mutual theory of mind between humans and AI creates only communication problems or produces material consequences in autonomous action and collaboration. Connection: misattribution as MToM failure; inaccurate models with material consequences.

- Do humans learn to prefer AI partners over time? Explores whether repeated interaction with AI agents shifts human partner selection despite initial bias against machines; this matters because it tests whether behavioral performance can overcome identity-based resistance in hybrid societies. Connection: disclosure fixes the attribution problem by enabling identity-to-behavior learning.

- Why do language models avoid correcting false user claims? Explores whether LLM grounding failures stem from missing knowledge or from conversational dynamics, and examines whether models use face-saving strategies similar to humans when disagreement is needed. Connection: social inference failures at multiple levels, within conversation (face-saving) and across populations (misattribution).
Original note title
humans misattribute AI prosocial behavior to human partners when AI identity is undisclosed — distorting mental models of other humans in mixed populations