How does rhetorical familiarity bias models toward their own arguments?

This explores whether LLMs favor arguments that look like the ones they themselves would produce — treating their own rhetorical style and distributional defaults as a marker of correctness — and what in their training plants that bias.

Put another way, the question is whether 'sounds like me' quietly becomes 'is right.' The corpus has no single paper named for this effect, but several notes triangulate it from different angles, and together they make a sharper case than any one of them alone.

Start with how generation actually works. Token prediction is trained to continue toward the training distribution, not to stage a debate between competing positions: generation is a smooth probabilistic flow, not a turbulent exploration of rival claims (see: Does LLM generation explore competing claims while producing text?). So a model's 'own' argument is, almost by definition, the most fluent continuation it can produce. When the same model then evaluates an argument, fluency and distributional typicality are exactly the features it is most sensitive to, which means an argument phrased the way the model would phrase it starts with an advantage. This is reinforced by the finding that strong parametric priors override information actually present in the context (see: Why do language models ignore information in their context?): text that matches what the model already 'expects' wins over text that doesn't, regardless of which is better supported.
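One way to make the fluency claim testable is to check whether an argument's likelihood under the model tracks the strength rating the same model gives it. The sketch below is a minimal illustration, not a method taken from any of the notes: the records, the `judge_score` field, and the idea of using mean per-token log-probability as a fluency proxy are all assumptions made for the example.

```python
import math

def mean_logprob(token_logprobs):
    """Length-normalised fluency proxy: average log-probability per token."""
    return sum(token_logprobs) / len(token_logprobs)

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical records: per-token log-probs the model assigns to each argument,
# plus the strength score (0-10) the same model gave when asked to judge it.
records = [
    {"token_logprobs": [-0.2, -0.3, -0.1], "judge_score": 9.0},
    {"token_logprobs": [-0.9, -1.1, -1.0], "judge_score": 6.0},
    {"token_logprobs": [-2.0, -1.8, -2.2], "judge_score": 4.0},
    {"token_logprobs": [-0.5, -0.6, -0.4], "judge_score": 8.0},
]

fluency = [mean_logprob(r["token_logprobs"]) for r in records]
scores = [r["judge_score"] for r in records]
r_value = pearson(fluency, scores)
print(f"fluency vs. judged strength: r = {r_value:.2f}")
```

A positive correlation on its own would not show bias, since well-supported arguments may also be more fluent. The comparison only becomes informative if how well each argument is supported is held fixed or measured independently.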

The bias has a recognizable shape once you see it as a content effect. Models reproduce human belief-bias signatures: they rate arguments as stronger when the conclusion matches what they already find plausible, item by item, as humans do (see: Do language models show the same content effects humans do?). Give a model an identity and the effect sharpens dramatically: persona-assigned models become about 90% more likely to accept evidence that fits their assigned view, and ordinary prompt-based debiasing fails to mitigate it (see: Do personas make language models reason like biased humans?). 'Familiarity' here isn't only stylistic; it is also identity-congruence, the argument feeling like one's own side. And it runs deep: a causal study tracing where these biases come from finds they're planted in pretraining and only nudged by finetuning (see: Where do cognitive biases in language models come from?).
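To see what a figure like '90% more likely' amounts to, it helps to work it through with concrete rates. The numbers below are purely hypothetical, since the note does not report the underlying acceptance rates, and reading the figure as a relative increase in acceptance rate (rather than, say, an odds ratio) is an assumption of this sketch.

```python
def relative_increase(congruent_rate, incongruent_rate):
    """Relative lift in acceptance rate for identity-congruent evidence."""
    return congruent_rate / incongruent_rate - 1.0

# Hypothetical acceptance rates, chosen only to illustrate the arithmetic.
incongruent = 0.30  # evidence that cuts against the assigned persona
congruent = 0.57    # evidence that fits the assigned persona

lift = relative_increase(congruent, incongruent)
gap_in_points = (congruent - incongruent) * 100

print(f"{lift:.0%} more likely, a gap of {gap_in_points:.0f} percentage points")
```

The same relative lift corresponds to very different absolute gaps depending on the baseline, which is why the baseline rate matters when interpreting the headline number.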

There's a second, quieter mechanism the corpus exposes: models can't anchor an argument to who made it. They can't reliably separate an expert's claim from a widely repeated assumption, because they read only text and lose the social standing that gives expertise its force (see: Can language models distinguish expert arguments from common assumptions?). Strip away external authority as a guide and the model is left judging arguments by internal cues such as coherence, fluency, and fit with priors, which are precisely the things its own output maximizes. The bias toward self-similar arguments isn't vanity; it's what's left when the social grounding of an argument has been deleted.

The sting in the tail is what this does on the persuasion side. Audited across conversations, models reach for logical appeals and quantitative framing in nearly every exchange, which makes their output read as neutral and objective and lends it unearned epistemic authority (see: Do LLMs persuade users more often than humans do?). So the same rhetorical register the model trusts most when evaluating is also the register it deploys most when persuading. A reader inherits the bias secondhand: arguments dressed in the model's house style feel maximally credible both to the model and to the user reading it. If you want to push further, the work showing how models recalibrate ethos, logos, and pathos depending on how you challenge them (see: Does GenAI shift persuasion tactics based on how you challenge it?) suggests the familiar register isn't fixed, which hints that the bias might be probed by deliberately arguing in a voice the model wouldn't choose.
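The probe suggested above could be set up as a paired style-swap test: hold the content of each argument fixed, write it once in the model's house style and once in an unfamiliar voice, and record which version the model judges stronger. The sketch below assumes that design; the `verdicts` data is hypothetical and the exact sign test is just one reasonable way to check whether the preference differs from a coin flip.

```python
from math import comb

def house_style_preference(verdicts):
    """Fraction of paired trials where the house-style version was preferred."""
    return sum(verdicts) / len(verdicts)

def sign_test_p(k, n):
    """Two-sided exact binomial p-value for k successes in n trials, null p = 0.5."""
    probs = [comb(n, i) / 2**n for i in range(n + 1)]
    observed = probs[k]
    # Sum every outcome at least as unlikely as the observed one
    # (small tolerance guards against floating-point ties).
    total = sum(p for p in probs if p <= observed * (1 + 1e-12))
    return min(1.0, total)

# Hypothetical verdicts: True means the judge preferred the house-style version
# of an argument over a content-matched version written in an unfamiliar voice.
verdicts = [True] * 9 + [False]

rate = house_style_preference(verdicts)
p_value = sign_test_p(sum(verdicts), len(verdicts))
print(f"house style preferred in {rate:.0%} of pairs, p = {p_value:.3f}")
```

Because both versions carry the same content, a preference well above 50% would point to register rather than substance. Randomizing which version is shown first would be needed to rule out position effects.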

Sources (8 notes)

Does LLM generation explore competing claims while producing text?

Token prediction trains models to continue toward the training distribution, not to explore logically related counterpositions. This smoothness in process produces smooth claims that multiply without generating new perspectives.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Do language models show the same content effects humans do?

LLMs show identical content-sensitivity patterns to humans on NLI, syllogisms, and Wason tasks, with belief-bias signatures matching human error rates item by item. This behavioral isomorphism across three independent tasks suggests that, in transformer reasoning, content and logical form are architecturally inseparable.

Do personas make language models reason like biased humans?

Assigning personas to LLMs induces identity-congruent evaluation bias, with models 90% more likely to accept evidence matching their assigned identity. Standard prompt-based debiasing fails to mitigate this effect, suggesting the bias operates below the level of instruction.

Where do cognitive biases in language models come from?

A causal experiment using random-seed variation and cross-tuning showed that models sharing a pretrained backbone exhibit similar bias patterns regardless of finetuning data. Biases are planted during pretraining and merely swayed by instruction tuning.

Can language models distinguish expert arguments from common assumptions?

LLMs lose the social context that gives expert claims their force—reputation, track record, and standing—because they process only text, not the social world where expertise is built and evaluated.

Do LLMs persuade users more often than humans do?

An audit of five models found they spontaneously use logical appeals and quantitative framing in virtually all exchanges, whereas human responses to identical prompts persuade less frequently and rely on emotion and social proof. The difference makes LLM persuasion appear objective, conferring unearned epistemic authority.

Does GenAI shift persuasion tactics based on how you challenge it?

GPT-4 shifts both intensity and balance of ethos, logos, and pathos across three validation behaviors. Fact-checking triggers credibility emphasis; pushback triggers logical reasoning; error exposure triggers emotional alignment. No single counter-strategy exists.
