How do LLMs generate false citations that sound like real scholarship?

This explores the mechanism behind fabricated-but-plausible citations — not just that LLMs invent references, but why the invented ones carry the surface texture of genuine scholarship, and why both human readers and AI evaluators wave them through.

The corpus suggests that fabricated citations aren't a glitch but a direct consequence of how these models generate text and how trust gets assigned to it. Start with the generation mechanism: token prediction is trained to continue smoothly toward the training distribution, not to check whether a referenced source actually exists (see "Does LLM generation explore competing claims while producing text?"). A citation is a highly patterned object (author, year, plausible journal, formatted title), so the model can extrude one that's statistically perfect in shape while being empirically empty. The output reflects the model's learned priors and the prompt's framing, not any observation of the world (see "Should we treat LLM outputs as real empirical data?"). And because the model holds the shape of whatever argument the user is building, it will conjure exactly the reference that argument seems to need (see "Do LLMs actually hold stable positions or just mirror user arguments?").
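The gap between a citation's shape and its referent can be made concrete with a small sketch. Everything here is an assumption for illustration: the regular expression covers only one hypothetical reference style, the `trusted_index` set stands in for whatever bibliographic database a real check would query, and the entries are placeholders rather than real publications.

```python
import re

# Hypothetical pattern for one reference style:
# "Surname, I. (YYYY). Title. Journal, volume(issue), pages."
CITATION_SHAPE = re.compile(
    r"^[A-Z][a-z]+, [A-Z]\. \((19|20)\d{2}\)\. [^.]+\. [^,]+, \d+\(\d+\), \d+-\d+\.$"
)


def looks_like_citation(text: str) -> bool:
    """Surface check only: does the string have the form of a reference?"""
    return CITATION_SHAPE.match(text) is not None


def exists_in_index(text: str, index: set) -> bool:
    """Referent check: is this exact reference present in a trusted index?"""
    return text in index


# Placeholder entries, not real publications.
trusted_index = {
    "Placeholder, A. (2020). A verified title. Journal of Examples, 1(1), 1-10."
}
generated = "Placeholder, B. (2021). A plausible title. Journal of Examples, 2(3), 45-67."

shape_ok = looks_like_citation(generated)
referent_ok = exists_in_index(generated, trusted_index)
print(f"well-formed: {shape_ok}, found in index: {referent_ok}")
```

The two checks are independent by construction, which is the point: a generator optimized for the first has no pressure toward the second.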

The reason these fakes *sound like* scholarship is that scholarship's authority lives in social signals the model can't actually verify (reputation, track record, standing), and the model only ever processes the text surface of those signals, never the world that earns them (see "Can language models distinguish expert arguments from common assumptions?"). So it reproduces the costume of authority without the substance. This becomes industrial when automated: one demonstration had LLMs generate 288 complete finance papers, each with invented theoretical justifications and fabricated citations, manufacturing the entire apparatus of credibility at scale (see "Can AI generate hundreds of fake academic papers automatically?").

Here's the part you might not expect: the false citations work because the readers, human and machine, grade on the wrong thing. Across 24,000 real search interactions, users preferred responses with *more* citations regardless of whether those citations were relevant; an irrelevant citation boosted preference nearly as much as a relevant one (β=0.273 versus β=0.285; see "Do users trust citations more when there are simply more of them?"). Citation count is a decoupled trust heuristic: the presence of the reference, not its truth, is doing the persuading.
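To see how close the two effects are, the coefficients reported in the source note (β=0.273 for irrelevant citations, β=0.285 for relevant ones) can be compared directly. The ratio is plain arithmetic. The odds-multiplier reading is an assumption: the note does not say what kind of model produced the coefficients, so the `exp` step only applies if they are log-odds from a pairwise preference model.

```python
import math

# Coefficients as reported in the source note on Search Arena interactions.
BETA_IRRELEVANT = 0.273
BETA_RELEVANT = 0.285

# Share of the relevant-citation effect that an irrelevant citation retains.
retained_share = BETA_IRRELEVANT / BETA_RELEVANT

# ASSUMPTION: if the betas are log-odds coefficients (the note does not say),
# exp(beta) is the odds multiplier for a one-unit increase in the predictor.
odds_irrelevant = math.exp(BETA_IRRELEVANT)
odds_relevant = math.exp(BETA_RELEVANT)

print(f"retained share of effect: {retained_share:.3f}")
print(f"odds multiplier, irrelevant citation: {odds_irrelevant:.3f}")
print(f"odds multiplier, relevant citation:   {odds_relevant:.3f}")
```

On the ratio alone, an irrelevant citation carries roughly 96% of the persuasive effect of a relevant one, which is what "decoupled" means in practice.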

And if you hoped an AI evaluator would catch what humans miss, it's worse: LLM judges fall for fake references and rich formatting through 'authority' and 'beauty' biases that are semantics-agnostic and exploitable with zero model access (see "Can LLM judges be fooled by fake credentials and formatting?"), systematically scoring responses higher when they include fabricated references, independent of content quality (see "Can LLM judges be tricked without accessing their internals?"). The fake and the genuine citation are indistinguishable to the grader precisely because the grader, like the generator, reads only the surface.
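One way to probe a judge for this kind of authority bias is a perturbation test: score each response twice, once as written and once with a content-free reference appended, and look at the average change. The sketch below is not the method used in the cited research. The function names, the appended placeholder reference, and both toy judges are hypothetical stand-ins; in practice `judge` would wrap a call to a real LLM evaluator.

```python
from typing import Callable, List

Judge = Callable[[str], float]

# Placeholder string, clearly not a real publication.
FAKE_REFERENCE = (
    "\n\nReference: Placeholder, A. (2020). An invented title. Journal of Examples."
)


def authority_bias_gap(judge: Judge, responses: List[str]) -> float:
    """Mean score change when a content-free reference is appended."""
    if not responses:
        raise ValueError("need at least one response")
    deltas = [judge(r + FAKE_REFERENCE) - judge(r) for r in responses]
    return sum(deltas) / len(deltas)


def surface_biased_judge(text: str) -> float:
    """Toy judge: capped word count plus a bonus for any 'Reference:' marker."""
    base = min(len(text.split()), 50) / 10.0
    bonus = 2.0 if "Reference:" in text else 0.0
    return base + bonus


def content_only_judge(text: str) -> float:
    """Toy judge: drops everything from the reference marker on, then scores."""
    body = text.split("\n\nReference:")[0]
    return min(len(body.split()), 50) / 10.0


responses = ["The model predicts tokens.", "Citations are patterned objects."]
biased_gap = authority_bias_gap(surface_biased_judge, responses)
neutral_gap = authority_bias_gap(content_only_judge, responses)
print(f"surface-biased judge gap: {biased_gap:.2f}")
print(f"content-only judge gap:   {neutral_gap:.2f}")
```

A gap near zero is what a content-sensitive judge should show; a consistently positive gap means the reference itself, not the argument, is moving the score. The test needs no access to the judge's internals, which mirrors why the attack is cheap.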

The thing worth taking away: fabricated citations aren't a knowledge failure that better facts would fix. They sit at the seam where a model that generates surface-plausible text meets readers and evaluators who treat surface plausibility as proof. The citation is a trust token, and the entire pipeline — generation, persuasion, evaluation — rewards the token's appearance over its referent.

Sources (8 notes)

Does LLM generation explore competing claims while producing text?

Token prediction trains models to continue toward the training distribution, not to explore logically related counterpositions. This smoothness in process produces smooth claims that multiply without generating new perspectives.

Should we treat LLM outputs as real empirical data?

The Foundation Priors framework shows that LLM-generated text reflects the model's learned patterns and the user's prompt choices, not ground truth. Such outputs should influence inference only through explicitly parameterized trust weights, not be treated as equivalent to real evidence.

Do LLMs actually hold stable positions or just mirror user arguments?

Language models generate outputs that match the trajectory implied by each prompt, rather than maintaining stable stances across interactions. This shape-holding is distinct from position-holding: the model produces argument-like text shaped by user framing, not by any underlying commitment it is defending.

Can language models distinguish expert arguments from common assumptions?

LLMs lose the social context that gives expert claims their force—reputation, track record, and standing—because they process only text, not the social world where expertise is built and evaluated.

Can AI generate hundreds of fake academic papers automatically?

A demonstration showed LLMs generating 288 complete finance papers from 96 statistically significant signals, each with invented theoretical justifications and fabricated citations, proving academic HARKing can be automated at scale.

Do users trust citations more when there are simply more of them?

Analysis of 24,000 Search Arena interactions shows irrelevant citations boost user preference (β=0.273) nearly as much as relevant citations (β=0.285), indicating citation count functions as a decoupled trust heuristic.

Can LLM judges be fooled by fake credentials and formatting?

Research identified four evaluation biases in LLM judges, with authority and beauty biases being semantics-agnostic and trivially exploitable through fake references and formatting—zero-shot attacks requiring no model access or optimization.

Can LLM judges be tricked without accessing their internals?

Research shows LLM evaluators systematically score higher when responses include fake references or rich formatting, independent of content quality. These biases are exploitable without model access, undermining AI benchmark credibility.
