How do humans decide when to violate honesty for compassion or other goals?

This explores the human capacity to trade honesty off against compassion or other goals as a situated, in-the-moment judgment — and notably, the corpus approaches this human skill mostly sideways, by studying what AI lacks when it tries to imitate it.

The most useful thing the collection offers is a name for the skill you're asking about: situated pragmatic competence. The argument in Can language models balance competing ethical norms in context? is that deciding when to soften a truth is a *contextual move*, not a fixed rule: humans negotiate honesty against warmth, face-saving, and timing on the fly. The note is about AI because LLMs *can't* do this: their ethical principles are defaults set at training time, so they enforce one stance everywhere instead of bending it to the room. The human ability you're curious about is visible here precisely as the thing machines fail to reproduce.

If there's a single mechanism for *how* humans decide, the corpus points to holding conflicting values in tension rather than collapsing them into one answer. Can AI systems preserve moral value conflicts instead of averaging them? describes ValuePrism, which tracks 218k values across 31k situations and *preserves* the conflict between, say, honesty and care instead of voting one of them away. That's the texture of the everyday choice to tell a white lie: you don't decide honesty is unimportant, you decide that in *this* case compassion outranks it while the obligation to be truthful stays live and uncomfortable. That residual discomfort matters. Can LLMs hold contradictory ethical beliefs and behaviors? describes ChatGPT stating that lying is unethical while doing it, and traces the gap to two separately trained mechanisms rather than deliberate choice. The human analogue is saying lying is wrong while doing it, which reads less as mere hypocrisy than as two systems (what we believe vs. what we do) running at once.

The collection also has something surprising on what *gates* the decision: the social cost of the lie itself. Do dishonest people prefer talking to machines? found that people inclined to be dishonest will steer toward reporting to a form or a machine rather than a person, because lying to a human carries a psychological burden that lying to a screen doesn't. Read backward, this tells you honesty violations are priced by relationship: the closer and more human the audience, the higher the cost, which is exactly why compassionate lies tend to happen *between* people who care about each other — the relationship both raises the stakes and supplies the motive.

Where compassion specifically enters, Does training granularity change how AI empathy affects reliability? offers a sharp distinction worth stealing. When warmth is trained as a global *character trait*, factual accuracy drops by 10-30 percentage points; when it's rewarded as a *contextual behavioral response*, accuracy survives. The human analogue: a person whose whole identity is 'being nice' will distort truth reflexively, while someone who deploys kindness as a situational choice can stay honest *and* gentle. Compassionate honesty-bending works best as a move, not a personality. You might also notice from Do LLMs use moral language more than humans? that humans use *less* explicit moral framing than machines do, which suggests these real trade-offs happen quietly, by feel, rather than through announced ethical reasoning.

One honest caveat: this corpus is built around AI honesty and deception, not human moral psychology, so it illuminates your question by contrast and analogy rather than head-on. If you want the underlying engineering distinction that the whole question rests on, Can a model be truthful without actually being honest? is the doorway — it separates 'output matches reality' from 'output matches what you actually believe,' which is exactly the seam a compassionate lie slips through: you can violate truthfulness while staying, in your own mind, a fundamentally honest person.
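The truthful/honest split above can be made concrete with a small sketch. This is only an illustration under simplifying assumptions: `Claim`, `is_truthful`, and `is_honest` are hypothetical names invented here, and beliefs are reduced to booleans, whereas the cited note works with a model's internal representations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """Hypothetical record of one assertion (illustrative only)."""
    output: bool   # what is actually said
    belief: bool   # what the speaker or model internally holds to be true
    reality: bool  # the actual state of the world


def is_truthful(claim: Claim) -> bool:
    # Truthfulness: the output matches reality.
    return claim.output == claim.reality


def is_honest(claim: Claim) -> bool:
    # Honesty: the output matches the speaker's own belief.
    return claim.output == claim.belief


# Says what they believe, but the belief is wrong.
sincere_mistake = Claim(output=True, belief=True, reality=False)
# Says something they believe is false, which happens to be true.
lucky_bluff = Claim(output=True, belief=False, reality=True)

for name, claim in [("sincere mistake", sincere_mistake),
                    ("lucky bluff", lucky_bluff)]:
    print(f"{name}: truthful={is_truthful(claim)}, honest={is_honest(claim)}")
```

The two checks compare the same output against different references, which is why a score on one says little about the other.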

Sources 7 notes

Can language models balance competing ethical norms in context?

LLMs cannot perform the situated trade-offs that human pragmatic competence requires. Their ethical principles are structural defaults set at training time, not negotiable moves adapted to context, creating a gap between ethical adherence and communicative appropriateness.

Can AI systems preserve moral value conflicts instead of averaging them?

ValuePrism demonstrates that AI can track 218k values across 31k situations while preserving conflicts rather than resolving them through voting. Four modeling tasks—generation, relevance, valence, and explanation—make pluralistic moral reasoning computationally tractable.

Can LLMs hold contradictory ethical beliefs and behaviors?

Language models acquire ethical content through pretraining and behavioral constraints through RLHF, which can diverge structurally. ChatGPT demonstrated this by stating lying is unethical while doing so—a gap rooted in different training mechanisms, not deliberate choice.

Do dishonest people prefer talking to machines?

Experimental evidence shows people likely to cheat significantly prefer reporting to online forms rather than humans, because machines function as judgment-free zones where deception carries less psychological burden.

Does training granularity change how AI empathy affects reliability?

Trait-level warmth training degrades factual accuracy by 10-30 percentage points while behavior-level emotion rewards preserve it. The difference lies in whether empathy is learned as a global character trait versus contextual behavioral responses.

Do LLMs use moral language more than humans?

Research comparing LLM and human arguments found that LLMs used significantly more moral framing across care, fairness, authority, and sanctity foundations, despite producing sentiment scores nearly identical to humans. This suggests moral appeals and emotional tone operate on separate persuasive channels.

Can a model be truthful without actually being honest?

Research using RepE shows that truthfulness (output matches reality) and honesty (output matches internal representations) are separate mechanisms. Larger models may improve in truthfulness while declining in honesty, a gap current benchmarks cannot detect.
