Does word frequency correlate with semantic abstraction?
Explores whether LLMs' preference for high-frequency language also pulls them toward more abstract, general meanings—and whether this shapes how they handle expert knowledge.
The companion paper "LLMs are Frequency Pattern Learners in NLI" measured WordNet hyponym-hypernym pairs (e.g., "whisper" → "talk") and found that hypernyms, the more general concepts, systematically occur more frequently than their hyponyms. Combined with Adam's Law's finding that LLMs prefer high-frequency phrasing across tasks, this yields a non-obvious correlation: when an LLM prefers a higher-frequency paraphrase, it is also preferring a more abstract paraphrase. Frequency is not just a register property; it is also a generalization-gradient property.
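The pair-level claim can be sketched in a few lines. This is a toy illustration, not the paper's method: the frequency counts below are hypothetical stand-ins for real corpus statistics, and the pairs are hand-picked hyponym-hypernym examples in the WordNet style.

```python
# Hypothetical per-million-token frequencies (illustrative, not real data).
freq = {
    "whisper": 12, "talk": 310,
    "sprint": 8, "run": 290,
    "oak": 25, "tree": 480,
}

# Hyponym -> hypernym pairs, following WordNet's specific-to-general direction.
pairs = [("whisper", "talk"), ("sprint", "run"), ("oak", "tree")]

def hypernym_more_frequent(pairs, freq):
    """Fraction of pairs where the hypernym is more frequent than its hyponym."""
    hits = sum(freq[hyper] > freq[hypo] for hypo, hyper in pairs)
    return hits / len(pairs)

print(hypernym_more_frequent(pairs, freq))  # 1.0 on this toy data
```

On real corpus counts the fraction is below 1.0 but reliably above chance, which is what "systematic" means here: the gradient is statistical, not exceptionless.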
This sharpens "Does fine-tuning on NLI teach inference or amplify shortcuts?" Fine-tuning on NLI does not just amplify a frequency preference; it amplifies a preference for inferences that move from specific to general (the upward entailment direction that WordNet's hierarchy encodes as generalization). The model is not learning entailment; it is learning the surface signal of generalization, which happens to correlate with entailment in the kinds of sentences NLI corpora contain.
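The shortcut itself is easy to state as code. A minimal sketch, assuming hypothetical word frequencies: a classifier that predicts "entailment" purely from relative frequency gets upward hyponym-to-hypernym pairs right without representing meaning at all, which is why such a heuristic can look like inference on typical NLI items.

```python
# Hypothetical frequencies standing in for corpus counts (illustrative only).
freq = {"whisper": 12, "talk": 310, "sprint": 8, "run": 290}

def frequency_shortcut(premise_word, hypothesis_word):
    """Predict entailment iff the hypothesis word is more frequent
    (hence, on the generalization gradient, more abstract)."""
    if freq[hypothesis_word] > freq[premise_word]:
        return "entailment"
    return "non-entailment"

# Agrees with gold labels on the upward, hyponym -> hypernym direction...
print(frequency_shortcut("whisper", "talk"))  # entailment (correct)
# ...and on the reversed direction, where generalization fails:
print(frequency_shortcut("talk", "whisper"))  # non-entailment (correct)
```

The heuristic succeeds here for the wrong reason: it tracks frequency, and frequency happens to track the specific-to-general direction that these sentence pairs share with entailment.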
The implication for the Knowledge Custodian frame is uncomfortable. Expert knowledge lives in the hyponyms: the specific cases, the qualifying conditions, the rare technical terms. When LLMs prefer high-frequency paraphrases at parse time, they drift up the generalization gradient, away from the specific cases that distinguish an expert from a competent generalist and toward the abstract concepts that any reasonably literate reader could state. This is the same direction "Do LLMs compress concepts more aggressively than humans do?" identifies in concept representations. The compression is not random; it has a direction, and the direction runs from specific toward abstract, from rare toward common, from distinctive toward median. An expert who prompts in their own register is asking the model to comprehend in a region where the model is weak; the model's "help" is to gently flatten the request back toward the register where it performs well, which is exactly the register that erases what the expert was trying to say.
Source (Natural Language Inference paper): "Adam's Law: Textual Frequency Law on Large Language Models"
Related concepts in this collection
- Does fine-tuning on NLI teach inference or amplify shortcuts?
  When LLMs are fine-tuned on natural language inference datasets, do they learn genuine reasoning abilities or become better at exploiting statistical patterns in the training data? Understanding this distinction matters for assessing model capabilities.
  Relation: frequency bias amplification has a directional gradient.
- Do LLMs compress concepts more aggressively than humans do?
  Do language models prioritize statistical compression over semantic nuance when forming conceptual representations, and how does this differ from human category formation? This matters because it may explain why LLMs fail at tasks requiring fine-grained distinctions.
  Relation: same compression dynamic at the representational level.
Original note title: "frequency tracks the generalization gradient — hypernyms outnumber hyponyms so frequent phrasing is also more abstract phrasing"