Why do newer AI models diverge further from human writing patterns?
As language models improve, they seem to generate text that is measurably less human-like in lexical patterns, yet humans struggle to detect this difference. What drives this divergence, and what does it reveal about how models optimize for quality?
The lexical diversity study compared ChatGPT-3.5, 4, o4-mini, and 4.5. The key finding: the newest models, o4-mini and 4.5, differ most from human-written text on lexical diversity measures. By measurable metrics, they are the least human-like.
At the same time, human judges consistently fail to detect AI-generated text regardless of model version. More capable models don't become easier to detect; the failure of human judgment is stable across model generations.
ChatGPT-4.5 produces higher lexical diversity than older models despite generating fewer tokens — it is more lexically dense, but the density pattern is still non-human. The implication: newer models aren't converging on human-like writing by becoming better at mimicking human lexical patterns; they are becoming better at generating high-quality text that is nonetheless systematically different from human text.
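The "more diversity from fewer tokens" claim is easiest to see with the simplest lexical diversity measure, the type-token ratio (the study's exact metrics aren't named here, so TTR serves as an illustrative proxy; the two text snippets below are hypothetical):

```python
def type_token_ratio(text: str) -> float:
    """Ratio of unique words (types) to total words (tokens)."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens)

# Hypothetical snippets: the longer one repeats itself, so despite
# producing more tokens it has LOWER lexical diversity.
longer = "the model writes and writes and the model repeats the same words again and again"
shorter = "concise output avoids repeating vocabulary entirely"

print(type_token_ratio(longer))   # 8 unique / 15 total ≈ 0.53
print(type_token_ratio(shorter))  # 6 unique / 6 total = 1.0
```

A shorter text with little repetition scores higher than a longer, repetitive one, which is the sense in which 4.5 can be "more lexically dense" while emitting fewer tokens.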
This suggests that the training objective (RLHF and human quality preferences) pushes models toward an optimum different from "human-like lexical diversity": one that human raters score as higher quality, yet is measurably more distinct from how humans naturally write.
The widening gap between what is measurable and what is perceptible has an important practical consequence: as models improve, naive human-based detection becomes less viable, not more. Reliable detection requires statistical and computational analysis that humans do not perform spontaneously.
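A minimal sketch of what such computational analysis might look like, using the moving-average type-token ratio (MATTR), a standard length-robust lexical diversity measure. The baseline and tolerance values are illustrative assumptions, not figures from the study:

```python
def mattr(tokens: list[str], window: int = 10) -> float:
    """Moving-average type-token ratio: mean TTR over sliding windows,
    which is less sensitive to overall text length than raw TTR."""
    if len(tokens) < window:
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)

def flag_if_divergent(sample: str, human_baseline: float, tol: float = 0.08) -> bool:
    """Flag a sample whose windowed lexical diversity departs from a
    human baseline by more than `tol` (both values are hypothetical)."""
    return abs(mattr(sample.lower().split()) - human_baseline) > tol
```

The point is not this particular threshold test but the shift it represents: detection moves from holistic human judgment to explicit measurement against a reference distribution of human writing.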
Source: Discourses