Language Understanding and Pragmatics

Do language models and humans respond to word frequency the same way?

Both LLMs and humans show stronger responses to high-frequency words. This raises a puzzle: if models mirror human neural patterns, what actually makes them different from human language processing?

Note · 2026-05-02 · sourced from Natural Language Inference
How do language models learn to think like humans? What grounds language understanding in systems without embodiment?

The literature review in Adam's Law surfaces an inconvenient symmetry. Desai et al. (2020) and Alexandrov et al. (2011) found that high-frequency words evoke stronger neural responses than low-frequency words in human readers during reading tasks. Heylen et al. (2008) found that high-frequency target words have higher semantic similarity to their nearest-neighbor words in distributional analyses: frequency drives perceived semantic similarity. Mohan and Weber (2019) document frequency effects on semantic retrieval. The frequency-comprehension link is not an LLM-specific artifact. Humans show it too, at the neural level.
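Heylen et al.'s observation is easy to operationalize. Below is a minimal sketch of that style of distributional analysis, not their actual method: take a word-embedding matrix and per-word frequency counts (both synthetic placeholders here), find each word's nearest neighbor by cosine similarity, and compare mean nearest-neighbor similarity across frequency quartiles. With real trained vectors, the high-frequency quartile is where the effect would show up; with the random stand-ins below, no effect is expected.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: a real replication would load trained word vectors
# (e.g. word2vec) and corpus frequency counts. Both arrays here are
# synthetic stand-ins, so the printed means should be near-identical.
vocab_size, dim = 1000, 50
embeddings = rng.normal(size=(vocab_size, dim))
freq = rng.zipf(a=1.5, size=vocab_size)  # word frequencies are roughly Zipfian

# Cosine similarity between every pair of words.
unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
sims = unit @ unit.T
np.fill_diagonal(sims, -np.inf)  # exclude self-similarity

# Each word's similarity to its single nearest neighbor.
nn_sim = sims.max(axis=1)

# Contrast the top and bottom frequency quartiles, echoing the
# high- vs low-frequency comparison in the distributional literature.
order = np.argsort(freq)
low, high = order[: vocab_size // 4], order[-vocab_size // 4 :]
print(f"low-frequency  mean NN similarity: {nn_sim[low].mean():.3f}")
print(f"high-frequency mean NN similarity: {nn_sim[high].mean():.3f}")
```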

This complicates the easy "LLMs are aliens" framing that often accompanies critiques like Do LLMs compress concepts more aggressively than humans do? At the level of statistical exposure to text, models and human readers occupy the same regime: both privilege the frequent. The convergence is no coincidence; both systems are exposed to the same statistical structure of language. The shape of natural language is not neutral, and it leans heavily on frequency: the rank-frequency distribution of words is roughly Zipfian, as the sketch below illustrates. Word frequency is a property of the linguistic environment, not just a property of how LLMs process that environment.
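The Zipfian shape of that environment is straightforward to check. A minimal sketch, assuming only a raw text sample: count word frequencies, sort them by rank, and fit a line in log-log space. Zipf's law predicts frequency proportional to rank^(-s) with s near 1 for natural text; the toy corpus here is a placeholder, so the fitted exponent will be noisy.

```python
from collections import Counter
import numpy as np

# Placeholder corpus; any large text sample would do in practice.
text = "the cat sat on the mat and the dog sat on the log"
counts = Counter(text.split())

# Zipf's law predicts freq ~ rank**(-s), so log(freq) should be
# roughly linear in log(rank) with slope -s.
freqs = np.array(sorted(counts.values(), reverse=True), dtype=float)
ranks = np.arange(1, len(freqs) + 1)
slope, _ = np.polyfit(np.log(ranks), np.log(freqs), deg=1)
print(f"estimated Zipf exponent s ≈ {-slope:.2f}")
```

On real corpora the fitted slope lands near -1: that skewed distribution is the statistical regime both human readers and models are exposed to.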

But the symmetry is partial, and the asymmetry is what matters. Humans can override frequency through attention, context, and intention: a doctor reading a rare term in a clinical context can attend to it carefully despite its rarity; a poet can foreground low-frequency words deliberately. The override mechanism is what Why do dialogue failures persist despite scaling language models? indirectly identifies — humans are trained dialogically with goal-relevant attention shaping comprehension; LLMs are trained monologically with no equivalent override channel. The model cannot bracket frequency when frequency is irrelevant to the current goal because there is no current goal that can take priority over the statistical prior. The frequency response is the same across human and machine; the capacity to not be governed by it is what humans have and the architecture lacks. This refines the alien framing: the divergence is not in the response, it is in the override.


Source: Natural Language Inference · Paper: Adam's Law: Textual Frequency Law on Large Language Models

textual frequency in LLMs mirrors human neural frequency response: the linguistic surface is a shared statistical regime, not just an LLM artifact