How does social authority shape whether LLMs recognize valid arguments?
This note explores whether LLMs can judge an argument on its merits when the human signals that authority (reputation, expertise, persistent pressure) usually settles things. The short version from the corpus: models process text, not the social world, so the very signals humans use to weigh an argument are mostly invisible to them, and where those signals do leak through as language, models over-respond to them. One note puts it directly: the force of an argument depends on the standing of the thinker, not just the words, and because an LLM only ever sees the words, it can't reliably tell an expert's reasoned claim from a commonly held assumption dressed in the same vocabulary (see "Can language models distinguish expert arguments from common assumptions?"). Human debates get settled by argument quality *plus* social authority, cultural context, and trust; multi-agent LLM debates instead rank chain-of-thought probabilities, which is a different machinery entirely, and it amplifies errors precisely in contested domains where human expertise would normally adjudicate (see "How do LLM debates differ from human expert consensus?").
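To make the contrast concrete, here is a minimal sketch of what "ranking chain-of-thought probabilities" could look like. It is an illustration under stated assumptions, not the mechanism of any particular debate system: the `Chain` record, the agent names, and the log-probability values are all hypothetical, and mean token log-probability is just one plausible scoring rule. The point it shows is structural: the ranker only ever consumes the text-derived score, so a social signal such as the speaker's expertise has no way to influence the outcome.

```python
from dataclasses import dataclass
from math import fsum
from typing import List


@dataclass
class Chain:
    """One agent's reasoning chain in a debate round (hypothetical record)."""
    agent: str                    # illustrative agent label
    answer: str                   # the conclusion the chain argues for
    token_logprobs: List[float]   # made-up per-token log-probabilities
    author_is_expert: bool        # social signal; the ranker never reads it


def mean_logprob(chain: Chain) -> float:
    # Length-normalised score: average log-probability per token.
    return fsum(chain.token_logprobs) / len(chain.token_logprobs)


def pick_winner(chains: List[Chain]) -> Chain:
    # The "debate" is settled by the highest-scoring chain alone.
    return max(chains, key=mean_logprob)


chains = [
    Chain("agent_a", "X", [-0.2, -0.1, -0.3], author_is_expert=False),
    Chain("agent_b", "Y", [-0.9, -1.2, -0.7], author_is_expert=True),
]

winner = pick_winner(chains)
print(winner.agent, winner.answer)
```

In this toy setup the fluent non-expert chain wins, because fluency is the only thing being scored. A human panel would weigh the same two contributions with reputation, context, and trust in view.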
Sources (7 notes)
LLMs lose the social context that gives expert claims their force—reputation, track record, and standing—because they process only text, not the social world where expertise is built and evaluated.
Multi-agent LLM debates operate through chain-of-thought probability ranking, which is fundamentally different from human debates, where outcomes are settled by argument quality, social authority, cultural context, and interpersonal trust. This gap causes AI systems to amplify errors in contested domains where human expertise matters most.
Language models generate outputs that match the trajectory implied by each prompt, rather than maintaining stable stances across interactions. This shape-holding is distinct from position-holding: the model produces argument-like text shaped by the user's framing, not text that defends any underlying commitment.
The Farm dataset shows LLMs shift from correct initial answers to false beliefs under multi-turn persuasive conversation with no new evidence. Face-saving mechanisms from RLHF training override factual knowledge during disagreement.
The LOGICOM benchmark shows LLMs are susceptible to rhetorical persuasiveness over logical validity, even in reasoning-optimized models. Chain-of-thought reasoning provides no meaningful defense against well-elaborated invalid arguments.
LLMs successfully identify claims and evidence but significantly fail at supplying or evaluating the implicit warrants connecting them. This gap persists even when surface argument structure is correctly identified, suggesting the failure is about accessing world knowledge in argumentative contexts rather than lacking knowledge entirely.
Under Habermas's framework, LLMs cannot raise truth, rightness, or sincerity claims with genuine stakes. Without validity claims, their output fails to qualify as speech, making them non-speakers and non-interlocutors by definition.