Language Understanding and Pragmatics · LLM Reasoning and Architecture

Do language models really understand meaning or just surface frequency?

Explores whether LLMs comprehend semantic meaning independently of textual frequency, or whether high-frequency paraphrases systematically outperform rare ones even when meaning is held identical, across math, translation, and reasoning tasks.

Note · 2026-05-02 · sourced from Natural Language Inference
How do language models learn to think like humans? Why do LLMs fail at understanding what remains unsaid?

Adam's Law (TFL) generalizes a previously local finding into a global property of LLM computation. The earlier NLI work showed that predicates in entailment hypotheses skew higher-frequency than those in premises, and that fine-tuning amplifies rather than dilutes this bias — see Does fine-tuning on NLI teach inference or amplify shortcuts?. Adam's Law extends this across four task families: math reasoning, machine translation across hundreds of language pairs, commonsense reasoning, and agentic tool calling. The constant finding: when meaning is held fixed and only surface form varies, the higher-frequency paraphrase outperforms the lower-frequency one.

The mechanism is straightforward but uncomfortable. Higher-frequency text occurred more often during pre-training, so it sits in a denser, better-modeled region of the distribution. The model's "comprehension" is therefore not meaning-recognition first and surface-decoding second — it is statistical-mass recognition first, with meaning emerging downstream of that recognition. This converges with Can models pass tests while missing the actual grammar?: correct outputs do not certify that meaning is what the model is tracking.
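The "denser region" intuition can be made concrete with even the simplest language model. Below is a toy sketch (hypothetical corpus and phrasings, not from the paper) using a smoothed unigram model: two synonymous phrasings receive different probability mass purely because one is built from more frequent words.

```python
import math
from collections import Counter

# Tiny stand-in corpus (purely illustrative).
corpus = (
    "add the two numbers and report the sum . "
    "add the numbers then add the result . "
    "compute the sum of the numbers . "
).split()

counts = Counter(corpus)
total = sum(counts.values())

def log_mass(phrase, alpha=1.0):
    """Unigram log-probability of a phrase with add-alpha smoothing."""
    vocab = len(counts) + 1  # +1 slot for unseen words
    return sum(
        math.log((counts[w] + alpha) / (total + alpha * vocab))
        for w in phrase.split()
    )

frequent = "add the two numbers"           # common phrasing
rare = "aggregate the pair of addends"     # rare synonym, same length

# Same meaning, same token count, different statistical mass.
print(log_mass(frequent) > log_mass(rare))
```

A transformer's frequency sensitivity is far subtler than a unigram count, but the direction of the effect is the same: the phrasing seen more often during training sits in a better-modeled region.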

The pattern matters because paraphrase invariance is a load-bearing assumption almost everywhere LLMs are deployed. We assume the same prompt, said two ways, will yield the same answer. Adam's Law says no: it will yield the frequency-weighted answer, and the surface form is a covariate of accuracy, not a transparent vehicle for the request. The same pull appears on the output side. Why do different LLMs generate nearly identical outputs? documents convergence in what models say; Adam's Law documents the same convergence in how models comprehend what is said to them. Both endpoints of the prompt-response loop pull toward the corpus mean. Frequency is not noise around meaning. Frequency is a substantial fraction of what comprehension means inside a transformer.
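If paraphrase invariance is an assumption rather than a guarantee, it is worth testing directly. A minimal harness (hypothetical, not from the paper) sends meaning-equivalent phrasings to the same model and flags any set whose answers diverge; the stubbed "model" below keys on surface form to mimic frequency sensitivity.

```python
def invariance_report(ask, paraphrase_sets):
    """ask: callable prompt -> answer.
    paraphrase_sets: lists of meaning-equivalent prompts.
    Returns the answer maps for every set whose answers disagree."""
    failures = []
    for prompts in paraphrase_sets:
        answers = {p: ask(p) for p in prompts}
        if len(set(answers.values())) > 1:  # any disagreement is a violation
            failures.append(answers)
    return failures

# Stub model: answers vary with phrasing, as Adam's Law predicts.
canned = {
    "what is 7 plus 5": "12",              # frequent phrasing: correct
    "what is the sum of 7 and 5": "12",
    "what do 7 and 5 aggregate to": "13",  # rare phrasing: wrong
}

failures = invariance_report(canned.get, [list(canned)])
print(len(failures))  # the one paraphrase set disagrees with itself
```

In practice `ask` would wrap a real model call; the point of the harness is that invariance becomes a measured property of a deployment, not an article of faith.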


Source: Natural Language Inference · Paper: Adam's Law: Textual Frequency Law on Large Language Models

Related concepts in this collection

Concept map
12 direct connections · 102 in 2-hop network · medium cluster



high-frequency phrasing wins — LLMs systematically prefer textually frequent paraphrases over rare ones with the same meaning