
Does fine-tuning on NLI teach inference or amplify shortcuts?

When LLMs are fine-tuned on natural language inference datasets, do they learn genuine reasoning abilities or become better at exploiting statistical patterns in the training data? Understanding this distinction matters for assessing model capabilities.

Note · 2026-02-21 · sourced from Natural Language Inference

"LLMs are Frequency Pattern Learners in NLI" identifies a consistent frequency bias in NLI datasets: predicates in hypotheses are more frequent in training data than predicates in premises, for positive (entailment) instances. LLMs exploit this pattern. The disturbing finding: fine-tuning on NLI corpora increases reliance on frequency bias rather than decreasing it.

The mechanism connects to a real property of language. Hypernyms (more general terms such as "animal") are more frequent in natural text than hyponyms (more specific terms such as "dog"). Since upward entailment runs from specific to general (SPRINT entails RUN), frequency can serve as a proxy for entailment direction. Fine-tuning teaches models to exploit this proxy more aggressively.
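The proxy can be made concrete as a trivial classifier. This is a sketch, not the paper's method, and the corpus counts are invented for illustration:

```python
# Hypothetical corpus frequencies: general terms outrank specific ones.
TOY_FREQ = {
    "sprint": 1_200, "run": 45_000, "move": 90_000,
    "dog": 30_000, "animal": 55_000,
}

def frequency_heuristic(premise_pred: str, hypothesis_pred: str) -> str:
    """Label a pair by relative predicate frequency alone."""
    if TOY_FREQ[hypothesis_pred] > TOY_FREQ[premise_pred]:
        return "entailment"
    return "non-entailment"

# Upward entailment (specific -> general) lines up with the proxy:
print(frequency_heuristic("sprint", "run"))  # entailment (correct)
print(frequency_heuristic("run", "sprint"))  # non-entailment (correct)
```

On pairs like these, the heuristic is indistinguishable from real inference, which is exactly why it survives standard evaluation.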

The problem: frequency is a statistical artifact, not a semantic relationship. It works often enough to look like learning on standard benchmarks, but it fails on adversarial cases where the frequency pattern disagrees with the actual entailment label. After fine-tuning, LLMs perform significantly worse on adversarial instances than base models: they have learned the shortcut more deeply.
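A constructed counterexample shows where the proxy breaks. The sentences and counts below are hypothetical, not taken from the paper; they only illustrate a pair where frequency and the gold label point in opposite directions:

```python
# Toy counts: "eat" is far more frequent than "bake" in any plausible corpus.
TOY_FREQ = {"bake": 4_000, "eat": 60_000}

def frequency_heuristic(premise_pred: str, hypothesis_pred: str) -> str:
    """Predict entailment iff the hypothesis predicate is more frequent."""
    if TOY_FREQ[hypothesis_pred] > TOY_FREQ[premise_pred]:
        return "entailment"
    return "non-entailment"

# "The chef baked bread" does not entail "The chef ate bread", yet the
# hypothesis predicate is the more frequent one, so the heuristic
# confidently predicts entailment and gets the pair wrong:
print(frequency_heuristic("bake", "eat"))  # entailment (gold: non-entailment)
```

Adversarial NLI sets built from such pairs are what expose the shortcut: a model leaning on frequency scores well on natural pairs and collapses here.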

This is a general pattern in the vault: Can models pass tests while missing the actual grammar? shows that surface heuristics enable correct behavior on easy cases while degrading robustness on unusual ones. Fine-tuning amplifies this problem by rewarding the heuristic through the training signal. The model that appears to "learn inference" has learned to use training data statistics more efficiently.

What distinguishes this from attestation bias (memorization of specific sentences): frequency bias operates at the corpus level. It is a statistical regularity learned from the distribution of natural text, not from specific memorized statements. Both are shortcuts that substitute for inference, but they originate at different levels of the training data.
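The two levels can be contrasted in a sketch. Everything here (the corpus, the counts, the function names) is hypothetical scaffolding, meant only to show that the shortcuts key on different properties of the same pair:

```python
# Attestation bias: the exact hypothesis sentence was seen during training.
toy_corpus = {"the dog barked", "animals need water"}  # memorized sentences

# Frequency bias: a corpus-wide statistic favors the hypothesis predicate.
toy_freq = {"dog": 30_000, "animal": 55_000}           # corpus-level counts

def attestation_shortcut(hypothesis: str) -> bool:
    """Instance level: is this exact sentence attested in training data?"""
    return hypothesis.lower() in toy_corpus

def frequency_shortcut(premise_pred: str, hypothesis_pred: str) -> bool:
    """Corpus level: is the hypothesis predicate the more frequent one?"""
    return toy_freq[hypothesis_pred] > toy_freq[premise_pred]

# The same pair can trigger one shortcut but not the other:
print(attestation_shortcut("Animals need water"))  # True  (memorized string)
print(frequency_shortcut("animal", "dog"))         # False (hyponym less frequent)
```

Because the triggers differ, debiasing one shortcut (e.g. filtering memorized hypotheses) leaves the other fully intact.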



