Language Understanding and Pragmatics · LLM Reasoning and Architecture

Does word frequency correlate with semantic abstraction?

Explores whether LLMs' preference for high-frequency language also pulls them toward more abstract, general meanings—and whether this shapes how they handle expert knowledge.

Note · 2026-05-02 · sourced from Natural Language Inference
Why do LLMs fail at understanding what remains unsaid? How do language models learn to think like humans?

The companion paper "LLMs are Frequency Pattern Learners in NLI" measured WordNet hyponym-hypernym pairs (e.g., "whisper" → "talk") and found that hypernyms — the more general concepts — systematically occur more frequently than their hyponyms. Combined with the finding in "Adam's Law" that LLMs prefer high-frequency phrasing across tasks, this yields a non-obvious correlation: when an LLM prefers a higher-frequency paraphrase, it is also preferring a more abstract paraphrase. Frequency is not just a register property; it is also a generalization-gradient property.
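The measurement itself is simple to sketch. Below is a minimal, self-contained illustration using a tiny hand-written sample text and hand-picked hyponym-hypernym pairs (both hypothetical, standing in for WordNet and a real corpus): count token frequencies, then check the direction of the frequency gap for each pair.

```python
from collections import Counter

# Tiny illustrative sample text (hypothetical; a real study would use a
# large corpus and WordNet's actual hyponym-hypernym pairs).
corpus = (
    "she began to talk and then to whisper people talk all day "
    "the dog barked a dog is an animal the animal ran "
    "an animal sleeps the poodle ran talk to me"
).split()

freq = Counter(corpus)

# Hyponym -> hypernym pairs in the style of WordNet (illustrative).
pairs = [("whisper", "talk"), ("poodle", "dog"), ("dog", "animal")]

# For each pair, check whether the more general term is also more frequent.
for hypo, hyper in pairs:
    direction = ">" if freq[hyper] > freq[hypo] else "<="
    print(f"{hyper!r} ({freq[hyper]}) {direction} {hypo!r} ({freq[hypo]})")
```

In this toy sample every hypernym outnumbers its hyponym, which is the pattern the paper reports at corpus scale.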

This sharpens "Does fine-tuning on NLI teach inference or amplify shortcuts?". Fine-tuning on NLI does not just amplify a frequency preference — it amplifies a preference for inferences that move from specific to general (the upward semantic-entailment direction WordNet calls generalization). The model is not learning entailment; it is learning the surface signal of generalization, which happens to correlate with entailment in the kinds of sentences NLI corpora contain.
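That "surface signal" can be made concrete as a deliberately semantics-free classifier: call a pair "entailment" whenever the hypothesis is phrased in higher-frequency words than the premise. A minimal sketch, using a hypothetical unigram frequency table (the numbers are invented for illustration, not measured):

```python
from collections import Counter

# Hypothetical unigram frequencies (invented for illustration).
freq = Counter({"talk": 900, "whisper": 40, "animal": 700,
                "poodle": 15, "ran": 300, "dog": 250})

STOP = {"the", "a", "an", "to", "is"}

def mean_freq(sentence: str) -> float:
    """Mean corpus frequency of the content words in a sentence."""
    words = [w for w in sentence.lower().split() if w not in STOP]
    return sum(freq[w] for w in words) / max(len(words), 1)

def frequency_shortcut(premise: str, hypothesis: str) -> str:
    # The shortcut: no semantics at all -- just check whether the
    # hypothesis uses more frequent (hence more general) words.
    if mean_freq(hypothesis) > mean_freq(premise):
        return "entailment"
    return "non-entailment"

print(frequency_shortcut("the poodle ran", "the animal ran"))
# -> "entailment": the hypothesis moved up the gradient
print(frequency_shortcut("the animal ran", "the poodle ran"))
# -> "non-entailment": the hypothesis moved down it
```

Because hypernym frequency tracks generalization, this heuristic agrees with true entailment on exactly the specific-to-general sentence pairs NLI corpora are full of, which is why fine-tuning can reward it.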

The implication for the Knowledge Custodian frame is uncomfortable. Expert knowledge lives in the hyponyms — the specific cases, the qualifying conditions, the rare technical terms. When LLMs prefer high-frequency paraphrases at parse time, they drift up the generalization gradient: away from the specific cases that distinguish an expert from a competent generalist, and toward the abstract concepts that any reasonably literate reader could state. This is the same direction "Do LLMs compress concepts more aggressively than humans do?" identifies in concept representations. The compression is not random — it has a direction, and the direction is from specific toward abstract, from rare toward common, from distinctive toward median. An expert who prompts in their own register is asking the model to comprehend in a region where the model is weak; the model's "help" is to gently flatten the request back toward the register where it performs well — which is exactly the register that erases what the expert was trying to say.
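The directionality of this drift can be pictured as walking up a hypernym chain: each substitution of a more frequent paraphrase is one rung up, and information only ever flows one way. A minimal sketch with a toy taxonomy (the chain below is invented for illustration, not taken from WordNet):

```python
# Toy hyponym -> hypernym taxonomy (illustrative).
HYPERNYM = {
    "whisper": "talk", "talk": "communicate",
    "poodle": "dog", "dog": "animal", "animal": "organism",
}

def flatten(term: str, steps: int = 1) -> str:
    """Walk `steps` rungs up the generalization gradient."""
    for _ in range(steps):
        term = HYPERNYM.get(term, term)  # stop at the taxonomy root
    return term

print(flatten("poodle"))       # one rung up: 'dog'
print(flatten("whisper", 2))   # two rungs up: 'communicate'
```

The walk is lossy and irreversible: "dog" recovers nothing about poodles, which is the formal shape of the expert-register erasure described above.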


Source: Natural Language Inference · Paper: "Adam's Law: Textual Frequency Law on Large Language Models"

frequency tracks the generalization gradient — hypernyms outnumber hyponyms so frequent phrasing is also more abstract phrasing