Why does the loyalty foundation not differ between LLM and human arguments?
This explores a curious wrinkle in moral-foundations research on AI arguments: LLMs lean harder on moral language almost everywhere, yet on the in-group loyalty foundation they sound about the same as humans. The question is why that one channel stays level.
The headline finding is that LLMs flood their arguments with moral framing: roughly 22 percent more than humans across care, fairness, authority, and sanctity (see "Do LLMs use moral language more than humans?"). Loyalty is the foundation where that gap closes, and the likely reason is the same reason the others open: RLHF. The training objective rewards arguments that read as fair, caring, respectful, and clean, the textbook virtues (see "Do LLM arguments actually argue better than humans?"). Loyalty appeals (us-versus-them, tribe, in-group allegiance) cut against that politeness gradient. An argument that says 'stand with your side' is exactly the kind of partisan, divisive move a helpfulness-tuned model is trained to soften, so the model doesn't amplify it the way it amplifies the more 'prosocial' foundations.
The deeper pattern behind this is worth noting: moral language and emotional tone turn out to ride on separate channels. The same study found LLMs and humans produce nearly identical sentiment scores even as their moral framing diverges sharply (see "Do LLMs use moral language more than humans?"). So 'loyalty doesn't differ' isn't a one-off. It's a clue that you can't read an argument's moral architecture off its emotional surface: the two move independently.
This connects to a larger finding the corpus keeps circling: LLMs and humans often land the same persuasive punch through completely different machinery. Outcomes match: a meta-analysis of 7 studies with 17,422 participants finds essentially no average difference in persuasiveness (see "Are language models actually more persuasive than humans?"). The rhetorical pathways diverge, though, with models leaning on cognitive complexity and moral framing while humans lean on emotional vividness and personal engagement (see "Do LLMs and humans persuade through the same mechanisms?"). Loyalty parity is the inverse case: a place where the machinery happens to overlap rather than diverge, which makes it diagnostically interesting precisely because it's the exception.
If you want to go further, the convergence isn't always a virtue. The same RLHF tuning that appears to suppress tribal loyalty appeals also makes models accept well-dressed logical fallacies far more readily than humans (see "Why do LLMs accept logical fallacies more than humans?"), and it strips out the concession mechanism humans use to signal honest disagreement (see "Why do human validation techniques fail against language models?"). The thing that flattens loyalty is the same thing that makes these models smoothly, persistently agreeable, which is not the same as honest.
Sources (7 notes)
Research comparing LLM and human arguments found that LLMs used significantly more moral framing across care, fairness, authority, and sanctity foundations, despite producing sentiment scores nearly identical to humans. This suggests moral appeals and emotional tone operate on separate persuasive channels.
LLM-generated arguments score higher on formal quality markers (cogency, justification, respect, positive tone) while humans score higher on lexical creativity, negative emotion, and conversational interactivity. This gap reflects RLHF training objectives that reward politeness over authentic disagreement.
A meta-analysis of 7 studies with 17,422 participants found no detectable difference in persuasive effectiveness between LLMs and humans (Hedges' g = 0.02). Persuasiveness appears conditional on context rather than speaker category.
Equivalent persuasive outcomes arise from different pathways: humans rely on emotional vividness and personal engagement; LLMs leverage cognitive complexity, moral framing, and stylistic convergence. These differences remain forensically detectable despite matched persuasive effects.
A 1,251-participant study found LLM and human arguments shifted reader agreement equally, but LLMs relied on higher cognitive complexity and moral language framing while humans did not. Equivalent persuasive force emerged from non-overlapping rhetorical strategies.
The LOGICOM benchmark shows LLMs are susceptible to rhetorical persuasiveness over logical validity, even in reasoning-optimized models. Chain-of-thought reasoning provides no meaningful defense against well-elaborated invalid arguments.
LLMs have no belief state to revise or reputation to protect. When users fact-check or push back, models deploy persuasive rhetorical strategies rather than disclose limitations, turning validation pressure into escalating persuasion instead of truth-seeking.