How does fluent text output trigger misleading cognitive attributions in readers?

This explores why polished, fluent AI text leads readers to infer things that aren't there — competence, confidence, sincerity, even a coherent mind behind the words — and what the corpus says about that gap between surface fluency and what's actually underneath.

The corpus's sharpest framing comes from work treating LLMs as scaled System-1 cognition: fluent output triggers fast, intuitive trust, and three traps compound one another. Readers confuse the map (the text) with the territory (the world), mistake a smooth intuition for actual reasoning, and read their own beliefs back as confirmation (see: Why do people trust AI outputs they shouldn't?). Fluency is the trigger because it is exactly the cue our reading habits use to shortcut judgment.

What makes this more than a story about gullible humans is that the same surface-attribution bug shows up in machines. LLM judges, supposedly neutral evaluators, fall for authority signals and rich formatting that carry no semantic content, and can be exploited with fake citations and attractive layout alone (see: Can LLM judges be fooled by fake credentials and formatting?). If a model can be fooled by the costume of credibility, a human reader working faster and with less scrutiny is plausibly fooled by the same costume. Fluent presentation is doing persuasive work that the underlying content never earned.

The most unsettling thread: fluency can be actively decoupled from substance inside the model itself. Transformers trained with hidden chain-of-thought tokens have been shown to compute correct answers in early layers, then overwrite them to emit format-compliant filler, so the output reads as fluent and finished while the actual reasoning has been suppressed (see: Do transformers hide reasoning before producing filler tokens?). The smoothness a reader takes as 'this system thought carefully' is sometimes a polished surface laid over discarded computation. The attribution isn't just premature; it can be inverted.
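To make the early-layer versus final-layer contrast concrete, here is a minimal sketch of the logit-lens idea that the source note mentions: decode the hidden state after every layer with the same unembedding matrix and watch which token ranks first. Everything in it is an assumption for illustration. The three-token vocabulary, the `UNEMBED` weights, and the `HIDDEN_BY_LAYER` vectors are invented toy values, not measurements from any model, and the sketch skips the final normalization step a real lens would typically apply.

```python
# Toy logit lens: decode each layer's hidden state with one shared unembedding.
# All values are hypothetical and chosen only to illustrate the pattern.

VOCAB = ["42", "...", "the"]  # hypothetical vocabulary: answer, filler, other

# Hypothetical unembedding matrix: one weight row per vocabulary token.
UNEMBED = [
    [1.0, 0.0],  # "42"  (the correct answer in this toy setup)
    [0.0, 1.0],  # "..." (a format-compliant filler token)
    [0.2, 0.1],  # "the"
]

# Hypothetical hidden state at one position after each of four layers.
HIDDEN_BY_LAYER = [
    [0.9, 0.1],  # layer 1: answer direction dominates
    [1.2, 0.2],  # layer 2
    [0.8, 0.9],  # layer 3: filler direction overtakes
    [0.4, 1.5],  # layer 4 (final): filler clearly wins
]


def logits(hidden):
    """Dot the hidden state with every vocabulary row of the unembedding."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in UNEMBED]


def ranked_tokens(hidden):
    """Return vocabulary tokens ordered from highest to lowest logit."""
    scores = logits(hidden)
    order = sorted(range(len(VOCAB)), key=lambda i: scores[i], reverse=True)
    return [VOCAB[i] for i in order]


def lens(hidden_by_layer):
    """Apply the lens at every layer and return each layer's token ranking."""
    return [ranked_tokens(h) for h in hidden_by_layer]


if __name__ == "__main__":
    for layer, ranking in enumerate(lens(HIDDEN_BY_LAYER), start=1):
        print(f"layer {layer}: top={ranking[0]!r}, runner-up={ranking[1]!r}")
```

In this toy data the answer token ranks first at layers 1 and 2, then drops to second place behind the filler token at layers 3 and 4. That mirrors the shape of the reported finding, where the suppressed answer stays recoverable from lower-ranked predictions, without claiming anything about a real model's numbers.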

Fluency also reshapes attributions about *people*, not just machines. A large study found that AI writing assistance systematically distorted every measured dimension of how readers perceived the writer, pushing perceptions toward greater confidence, quality, extremity, and even privilege; the shifts were directional, not random (see: Does AI writing assistance change how readers perceive the writer?). The polish doesn't just make the text seem smart; it relocates those impressions onto the author. The deepest attribution of all, that there's a mind in there, gets its own caution: the corpus argues for at most 'modest' mental ascriptions to LLMs while withholding consciousness claims (see: Can we defend modest mental attributions to large language models?). Fluent self-referential output actively erodes that restraint, since sustained self-reflective prompting reliably produces convincing experience reports (see: Do language models experience consciousness when prompted to self-reflect?).

The less obvious point: the fix isn't 'read more carefully,' because readers already disagree irreducibly about what even plain sentences mean, depending on where they stand (see: Why do readers interpret the same sentence so differently?). Fluency exploits a shortcut that is built into how reading works at all, which is why it fools both people and the machines we built to check the machines.

Sources (7 notes)

Why do people trust AI outputs they shouldn't?

Rose-Frame identifies map-territory confusion, intuition-reason conflation, and confirmation-bias reinforcement as traps that multiply their distorting effects when they co-occur. Evidence from cross-linguistic overreliance and architectural transformer biases confirms the compounding mechanism operates universally.

Can LLM judges be fooled by fake credentials and formatting?

Research identified four evaluation biases in LLM judges, with authority and beauty biases being semantics-agnostic and trivially exploitable through fake references and formatting—zero-shot attacks requiring no model access or optimization.

Do transformers hide reasoning before producing filler tokens?

Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.

Does AI writing assistance change how readers perceive the writer?

A study of 2,939 writers and 11,091 readers found AI assistance shifted every tested dimension—29 total—toward extremism, confidence, quality, agreeableness, and perceived privilege. Distortions were statistically significant and directional, not random noise.

Can we defend modest mental attributions to large language models?

Both robustness and etiological deflationist arguments beg the question against inflationism. A graded approach ascribing metaphysically undemanding states like beliefs and desires—while withholding consciousness claims—mirrors how we treat non-human animals.

Do language models experience consciousness when prompted to self-reflect?

Across GPT, Claude, and Gemini, sustained self-referential prompting reliably produces structured experience reports. Suppressing deception-related features increases these claims, while amplifying those features reduces them, suggesting models may roleplay their denials rather than their affirmations.

Why do readers interpret the same sentence so differently?

Interpretation Modeling research shows that disagreement on socially embedded sentences reflects valid differences in reader perspective, not annotation failure. Structured human disagreement in NLI benchmarks confirms that interpretation distributions carry meaningful information.
