Does higher lexical density in fewer tokens indicate a systematic AI signature?
This explores whether AI text carries a measurable fingerprint — packing more meaning into fewer words — and whether that kind of statistical regularity is what actually gives machine-generated writing away.
This reads the question as asking whether a measurable trait like lexical density is a reliable AI signature. The corpus suggests the honest answer is that detectable signatures are real, but surface statistics are the weakest version of them. The strongest detection signals live deeper than word-counting. General linguistic features combined with argument-quality measures reached 99% accuracy at spotting LLM-written counter-arguments on r/ChangeMyView, matching heavyweight neural detectors (see "Can simple linguistic features detect AI-written arguments?"). So yes, AI leaves cheap-to-measure traces. But that work also shows the traces aren't really about density; they're about accommodation to the prompt and a textbook-quality uniformity that humans don't reproduce.
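For concreteness, lexical density is usually taken to be the share of content words among all words. The sketch below is a minimal illustration under stated assumptions, not a method taken from the cited notes: `FUNCTION_WORDS` is a hypothetical, deliberately small stoplist standing in for a proper part-of-speech tagger, and `tokenize` and `lexical_density` are illustrative names.

```python
import re

# Assumption: a tiny hand-picked function-word list. A real analysis would
# use a POS tagger to separate content words from function words.
FUNCTION_WORDS = frozenset({
    "a", "an", "the", "and", "or", "but", "if", "of", "in", "on", "at",
    "to", "for", "with", "by", "from", "as", "is", "are", "was", "were",
    "be", "been", "it", "this", "that", "these", "those", "i", "you",
    "he", "she", "we", "they", "not", "no", "do", "does", "did", "has",
    "have", "had", "will", "would", "can", "could", "so", "than", "then",
})


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_density(text: str) -> float:
    """Fraction of tokens that are not in the function-word list."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)
```

A score like this is trivially shifted by adding or dropping function words, which is one way to see why it makes a fragile signature on its own.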
What makes those signatures stick is that the most resistant ones aren't lexical at all. AI fiction can be separated from human fiction at 93.2% accuracy using only discourse-level features such as character agency and chronological structure, with stylistic cues deliberately stripped out (see "Can AI stories be detected without analyzing writing style?"). The point is sharp: surface edits, the kind that would change lexical density, don't humanize the text, because the tell is structural and removing it would require a rewrite. So if you're hunting for an AI signature, token-level compactness is exactly the layer that's easiest to disguise and least diagnostic.
There's a more interesting version of your intuition, though. AI writing tends to be organizationally coherent but argumentatively inert: it masters grammar and reference but avoids evaluative stance-taking, leaning on neutral 'manner' nouns and anaphoric references where human writers deploy status and evidential nouns that carry judgment (see "Why does AI writing sound generic despite being grammatically correct?"). That can read as dense, fluent prose that somehow says less than it appears to. The 'high density, low commitment' feel is a genuine signature, but it's a rhetorical absence, not a token-count surplus.
Why would packing-without-committing be characteristic? Other notes point at the mechanism. Generation is sequential but atemporal: token ordering is probabilistic selection with no reflective duration, no time spent thinking that revises what comes next (see "Does AI text generation unfold through temporal reflection?"). And the model never commits to a single position; it holds a superposition of consistent characters and samples from it, so regenerating a response to the same prompt yields different output, each version equally consistent with the prior context (see "Do large language models actually commit to a single character?"). Smooth, uniform, uncommitted text is the natural product of a process with no stance and no deliberation behind it.
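The sampling behaviour described above can be illustrated with a toy model. This is a sketch under loud assumptions: `CANDIDATES` and `WEIGHTS` are a made-up next-token distribution for one fixed context, and `regenerate` is a hypothetical helper, not how any particular LLM is implemented. It only shows that drawing from a fixed distribution gives different continuations on each regeneration, with no single answer stored anywhere.

```python
import random

# Assumption: an invented next-token distribution for one fixed context.
CANDIDATES = ["plausible", "likely", "possible", "unclear"]
WEIGHTS = [0.4, 0.3, 0.2, 0.1]


def regenerate(seed: int, length: int = 5) -> list[str]:
    """Sample a short continuation; each seed stands for one regeneration."""
    rng = random.Random(seed)
    # Every draw is an independent weighted choice from the same distribution.
    return rng.choices(CANDIDATES, weights=WEIGHTS, k=length)


if __name__ == "__main__":
    for seed in range(3):
        print(seed, regenerate(seed))
```

The distribution is the only thing held constant across runs; every sampled sequence is equally legitimate output from it.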
It's also worth flipping the assumption that 'dense' means 'efficient'. Inside reasoning chains, models internally rank tokens by function, preferentially preserving symbolic-computation tokens while grammar and meta-discourse are the first to be pruned (see "Which tokens in reasoning chains actually matter most?"), and only a minority of roughly 20% high-entropy 'forking' tokens actually carries the work (see "Do high-entropy tokens drive reasoning model improvements?"). So most tokens an AI emits in these chains are low-information filler around a few load-bearing ones. That's almost the opposite of high lexical density, and it hints that a robust signature should measure where the meaning concentrates and where stance is missing, not how few words wrap it.
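One way to make "where the meaning concentrates" measurable is to rank token positions by the entropy of the next-token distribution at each position and keep the top fraction. The sketch below assumes those per-position probability distributions are already available; `shannon_entropy` and `forking_positions` are illustrative names, and the 0.2 default simply mirrors the ~20% figure above rather than reproducing the cited method.

```python
import math


def shannon_entropy(probs):
    """Entropy in bits of a single next-token probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def forking_positions(distributions, fraction=0.2):
    """Indices of the top `fraction` of positions, ranked by entropy."""
    entropies = [shannon_entropy(d) for d in distributions]
    # Keep at least one position whenever any distributions are given.
    k = max(1, round(len(entropies) * fraction))
    ranked = sorted(range(len(entropies)),
                    key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Assumption: toy distributions for five token positions.
    toy = [
        [1.0, 0.0],                # fully determined token
        [0.5, 0.5],                # two-way fork
        [0.9, 0.1],
        [0.99, 0.01],
        [0.25, 0.25, 0.25, 0.25],  # four-way fork
    ]
    print(forking_positions(toy))
```

Ranking by entropy rather than counting words puts the focus on the few positions where the model's choice was genuinely open.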
Sources (7 notes)
General linguistic features combined with argument-quality measures achieved 99% accuracy detecting LLM-generated counter-arguments on r/ChangeMyView, matching heavyweight neural detectors while remaining computationally cheap and transparent. LLMs produce detectable stylistic signatures: accommodation to prompts and textbook-quality argument markers that humans don't replicate.
StoryScope achieved 93.2% accuracy separating AI from human fiction using only discourse-level features like character agency and chronological structure, retaining 97% of performance while eliminating stylistic cues. These structural choices resist humanization because they require rewrites, not surface edits.
AI text uses manner nouns and anaphoric references that are descriptively neutral, while human writers use status and evidential nouns that carry evaluative weight. This produces organizationally coherent but argumentatively inert prose.
Token ordering in LLMs follows probabilistic selection without intervening reflection or revision. Human discourse gains meaning from temporal structure—time spent thinking changes what comes next—but AI text production lacks this duration-in-reflection despite appearing sequentially composed.
Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, proving no fixed commitment exists.
Greedy likelihood-preserving pruning reveals six functional token categories; symbolic computation tokens are preferentially preserved while grammar and meta-discourse are pruned first. Student models trained on these pruned chains outperform those trained on frontier-model compression.
Only ~20% of tokens exhibit high entropy as pivotal reasoning decision points; RLVR primarily adjusts these forking tokens. Training exclusively on them matches or exceeds full-gradient performance, revealing that the minority carries the learning signal.