Language Understanding and Pragmatics

Can human judges detect AI writing through lexical patterns?

While AI text shows measurable differences from human writing across six lexical dimensions, judges (including experts) fail to identify AI authorship reliably. Why do measurable differences remain imperceptible to human judges?

Note · 2026-02-21 · sourced from Discourses
Where exactly does language competence break down in LLMs? How should researchers navigate LLM reasoning research?

The lexical diversity study compared ChatGPT-generated text with human writing across six dimensions:

  1. Volume — total word count
  2. Abundance — richness of vocabulary
  3. Variety-repetition — ratio of unique to total words
  4. Evenness — distribution evenness across vocabulary
  5. Disparity — semantic distance between words used
  6. Dispersion — spread of vocabulary across text length
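The study's exact formulas aren't given here, but several of these dimensions have natural operationalizations. A minimal sketch, using illustrative proxies of my own choosing (type-token ratio for variety-repetition, normalized Shannon entropy for evenness); disparity and dispersion are omitted because they would require word embeddings and positional windowing:

```python
from collections import Counter
import math

def lexical_profile(text: str) -> dict:
    """Illustrative proxies for four lexical-diversity dimensions.
    The formulas are assumptions, not the study's definitions."""
    tokens = text.lower().split()
    n = len(tokens)                 # Volume: total word count
    counts = Counter(tokens)
    types = len(counts)             # Abundance: distinct word types

    # Variety-repetition: type-token ratio (unique / total words).
    ttr = types / n if n else 0.0

    # Evenness: Shannon entropy of the word-frequency distribution,
    # normalized by its maximum (log of the number of types).
    if types > 1:
        probs = [c / n for c in counts.values()]
        entropy = -sum(p * math.log(p) for p in probs)
        evenness = entropy / math.log(types)
    else:
        evenness = 0.0

    return {
        "volume": n,
        "abundance": types,
        "variety_repetition": round(ttr, 3),
        "evenness": round(evenness, 3),
    }

profile = lexical_profile("the cat sat on the mat and the dog sat too")
```

Comparing such profiles between corpora (e.g. with a one-way MANOVA, as the study does) is what surfaces the group-level differences that individual human readers evidently do not track.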

One-way MANOVAs confirm: LLM text differs significantly from human text on ALL six dimensions. The differences are statistically robust.

And yet: human judges in multiple studies, including applied linguists and NLP researchers, cannot reliably distinguish AI-generated from human-written text. The detection failure itself is not a new finding; what is new is pairing it with specific lexical diversity measurements: the differences are real and measurable, but they are the wrong kind for human perception. Human judges are apparently not attending to lexical diversity patterns when making authorship judgments.

This paradox has implications in multiple directions: measurable lexical differences could feed automated detection, while human authorship judgments evidently rest on other cues.


