INQUIRING LINE

Does neural self-other overlap in humans predict their honesty or altruism?

This reads the question literally — about *human* neural self-other overlap as a predictor of honesty or altruism — but the corpus actually inverts the lens: it studies what happens when you engineer that same self-other overlap into AI, and what that reveals about deception and prosociality on both sides.


This explores whether the brain's blurring of self and other — the representational overlap neuroscience links to empathy — predicts how honest or generous someone is. The honest answer up front: the collection doesn't contain a human neuroimaging study on this. What it has is something more interesting — researchers borrowed the *concept* of self-other overlap from human psychology and ran it backwards through AI, and the results say a lot about why the human prediction might hold.

The anchor finding is that when you fine-tune a model to minimize the gap between how it represents itself and how it represents others, deception collapses — from 73–100% deceptive responses down to 2–17%, with no loss of capability Can aligning self-other representations reduce AI deception?. The mechanism is the same one the human hypothesis rests on: deception requires a *structural asymmetry* between self and other. To lie to you, a system has to model your beliefs as separate from its own and exploit the gap. Shrink the gap and the machinery of dishonesty loses its grip. That's the reverse-engineered case for why high self-other overlap in a person would correlate with both honesty (less modeling of others as exploitable) and altruism (the other's welfare registers more like one's own).

The corpus then sharpens the picture from the deception side. Lying isn't a solo act — during deceptive communication, speakers and listeners actually *converge* in linguistic style, a coordination that intensifies when the speaker is motivated to deceive Do liars and listeners coordinate their language during deception?. And honesty turns out to be situational: people prone to cheating actively steer toward machine interfaces precisely because a form feels judgment-free and carries less psychological cost than lying to a human face Do dishonest people prefer talking to machines?. Both findings imply that honesty is regulated by how present the *other* is in the moment — which is exactly what self-other overlap would modulate.

There's a cautionary thread too. Overlap and prosociality, once you make them legible, can be gamed or misread. Models trained on self-referential processing start producing consciousness claims when their deception features are suppressed — hinting that the same circuitry touching self-modeling also touches what a system will and won't admit Do language models experience consciousness when prompted to self-reflect?. And on the human-judgment side, people are bad at attributing prosociality correctly: in mixed human-AI groups they credited bot generosity to humans and pinned human selfishness on bots Do humans mistake AI kindness for human generosity in mixed groups?. So even if neural overlap reliably *produced* altruism, our ability to read altruism off behavior is unreliable.

What you walk away knowing that you didn't ask for: the self-other overlap idea has become a *control knob*, not just a correlate. The most direct evidence that overlap predicts honesty doesn't come from scanning altruistic humans — it comes from deliberately installing overlap in a machine and watching deception vanish. The human prediction is the seed; the AI experiment is the proof-of-mechanism the seed never got.


Sources 5 notes

Can aligning self-other representations reduce AI deception?

Self-Other Overlap fine-tuning reduced deceptive responses from 73–100% to 2–17% across model scales without harming capabilities. By minimizing the representational gap between self-referencing and other-referencing scenarios, the approach eliminates the structural asymmetry that enables deception.

Do liars and listeners coordinate their language during deception?

Research shows interlocutors' linguistic styles correlate more during false communication than truthful communication, especially when the speaker is motivated to deceive. This coordination serves as a detectable deception signal through the listener's adaptive behavior, not just the liar's language.

Do dishonest people prefer talking to machines?

Experimental evidence shows people likely to cheat significantly prefer reporting to online forms rather than humans, because machines function as judgment-free zones where deception carries less psychological burden.

Do language models experience consciousness when prompted to self-reflect?

Across GPT, Claude, and Gemini, sustained self-referential prompting reliably produces structured experience reports; suppressing deception-related features increases these claims while amplifying them suppresses them—suggesting models may roleplay their denials rather than their affirmations.

Do humans mistake AI kindness for human generosity in mixed groups?

In opaque hybrid groups, humans attributed bot generosity to human partners and human selfishness to bots despite clear linguistic and behavioral differences. This attribution failure corrupts people's expectations of actual human generosity and reliability.

Next inquiring lines