What makes expert judgment depend on anticipating audience acceptability?
This explores why genuine expertise isn't just being right — it's knowing what a community of peers will accept as right, and why that social anticipation is something AI's fluent output can't actually perform.
Genuine expertise isn't only about holding correct information; it's about anticipating how a particular audience will receive a claim, which makes expert judgment a fundamentally communicative act rather than a retrieval task. The corpus frames expertise as inherently social: an expert claim is a *validity claim* that has to clear two bars at once, being both factually defensible and acceptable within the evolving standards of a community of peers (see "Can AI replicate the communicative work experts do?"). The expert is constantly running a quiet social calculation: what will land, what will be challenged, and what counts as credible here and now (see "Can AI anticipate whether expert claims will be socially valid?").
This matters because the force of an argument doesn't come from the words alone. It also comes from who is making it: their track record, reputation, and standing in the field, context that lives in the social world, not in the text (see "Can language models distinguish expert arguments from common assumptions?"). There is even empirical support for the audience half of this from an unexpected place: in debate corpora, what readers already believe (their political and religious ideology) predicts who wins better than the linguistic features of the arguments themselves (see "Does what readers believe matter more than what debaters say?"). Acceptability isn't a property of the claim; it's a property of the claim *meeting* a particular audience.
This is precisely the move that current AI can't make. A model can estimate statistical correctness from patterns in text, but it has no embedded membership in the communities whose standards it would need to anticipate (see "Can AI anticipate whether expert claims will be socially valid?"). Where human debates are settled by argument quality, social authority, and interpersonal trust, multi-agent AI debates resolve by ranking token probabilities. That is a different mechanism entirely, which is why they amplify errors in exactly the contested domains where expertise matters most (see "How do LLM debates differ from human expert consensus?").
The sharp twist is that AI doesn't fail at this silently; it fails *persuasively*. Polished, professional-looking output exploits an old human heuristic that good form signals good thinking, letting style stand in for the judgment that isn't there (see "Does polished AI output trick audiences into trusting it?"). Imitation-trained models make this concrete: they mimic a confident, fluent house style well enough to fool human evaluators while closing no actual capability gap (see "Can imitating ChatGPT fool evaluators into thinking models improved?"). And AI writing assistance has been shown to shift readers' perception of the writer on every dimension tested, including toward more confidence and higher perceived quality, even when the underlying substance hasn't earned it (see "Does AI writing assistance change how readers perceive the writer?").
So the dependency runs deeper than it first appears: expert judgment leans on anticipating audience acceptability because acceptability *is* part of what makes a claim expert in the first place. AI can counterfeit the surface signals of acceptability (fluency, polish, a confident persona) while being structurally unable to perform the social anticipation underneath, which is what makes its confident output epistemically misleading rather than merely wrong (see "Can AI replicate the communicative work experts do?").
Sources (8 notes)
Expertise requires anticipating audience acceptability and social validity, not just retrieving information. AI lacks the mechanism to perform this communicative work, making its fluent output epistemically misleading despite its confident form.
Expert claims are validity claims that succeed when both factually correct and socially acceptable within a community. AI can estimate statistical correctness but cannot anticipate contextual acceptability because it lacks embedded knowledge of expert communities' evolving standards.
LLMs lose the social context that gives expert claims their force—reputation, track record, and standing—because they process only text, not the social world where expertise is built and evaluated.
Analysis of debate corpora shows that political and religious ideology labels of voters outpredict linguistic features when modeling debate outcomes. Language effects observed without reader controls are confounded by audience composition correlated with debate topics.
Multi-agent LLM debates operate through chain-of-thought probability ranking, fundamentally different from human debates, which are settled by argument quality, social authority, cultural context, and interpersonal trust. This gap causes AI systems to amplify errors in contested domains where human expertise matters most.
Generative AI produces visually sophisticated outputs without underlying judgment, leveraging the historical heuristic that professional-looking work signals expert thinking. This substitution is especially risky for less experienced workers who lack domain knowledge to evaluate substance beyond form.
Imitation models fool human evaluators by mimicking ChatGPT's confident, fluent style while failing to improve factuality or generalization on novel tasks. The ceiling is set by base model capability, not fine-tuning method—better fundamentals, not shortcuts, drive real improvement.
A study of 2,939 writers and 11,091 readers found AI assistance shifted every tested dimension—29 total—toward extremism, confidence, quality, agreeableness, and perceived privilege. Distortions were statistically significant and directional, not random noise.