Adam's Law: Textual Frequency Law on Large Language Models

Paper · arXiv 2604.02176 · Published April 2, 2026
Natural Language Inference, Linguistics, NLP, NLU

While textual frequency has been validated as relevant to human cognition in reading speed, its relevance to Large Language Models (LLMs) is seldom studied. We propose a novel research direction centred on textual data frequency, which, to the best of our knowledge, is an understudied topic. Our framework is composed of three units. First, we propose the Textual Frequency Law (TFL), which states that more frequent textual data should be preferred for LLMs in both prompting and fine-tuning. Since the training data of many LLMs is closed source, we propose using online resources to estimate sentence-level frequency. We then utilize an input paraphraser to rewrite the input into a more frequent textual expression. Next, we propose Textual Frequency Distillation (TFD), which queries LLMs to conduct story completion by further extending the sentences in the datasets; the resulting corpora are used to adjust the initial estimation. Finally, we propose Curriculum Textual Frequency Training (CTFT), which fine-tunes LLMs in increasing order of sentence-level frequency. Experiments are conducted on our curated Textual Frequency Paired Dataset (TFPD) covering math reasoning, machine translation, commonsense reasoning, and agentic tool calling. Results show the effectiveness of our framework.

Large language models (LLMs) have demonstrated many exciting abilities and applications, such as chain-of-thought reasoning (Wang et al., 2023; Wei et al., 2024), machine translation (Lu et al., 2023; Zhu et al., 2024a), and spatial reasoning (Hu et al., 2024). More recently, increasing the length of the reasoning process has become another popular research direction (DeepSeek-AI et al., 2025; Muennighoff et al., 2025). Another important factor is the order of training, which can be arranged from easy to hard in terms of data difficulty (Lu and Lam, 2023) or from short to long in terms of data length (Zhu et al., 2025). Yet, what kind of data should be favoured during training remains an overlooked topic. Previous works have concluded that data quality is usually important (Iskander et al., 2024; Jin and Wang, 2024), as is the amount of data (Grattafiori et al., 2024).

Oh et al. (2024) found that larger models predict rare words better. In the era of LLMs, scaling usually means that larger models are stronger, which suggests that predicting rare (less frequent) words may be a harder task than predicting frequent ones. Cao et al. (2024) demonstrated that, when prompting LLMs, different prompts with the same meaning can yield very different output quality.

This motivates us to investigate data that are paraphrases of each other, with the same meaning but different surface expressions. Paraphrases have been explored in NLP research for many purposes, such as mitigating data contamination (Zhu et al., 2024b), evaluating generation tasks (Tang et al., 2024), and data augmentation (DA, Abaskohi et al. (2023)). As a DA method, paraphrases are useful for training LLMs (Lu and Lam, 2023), which suggests including all paraphrases in training when affordable. However, training resources are usually limited, so we investigate whether frequency matters when the meaning is kept the same and the computational budget for fine-tuning is constrained.

Such an investigation of paraphrased inputs to LLMs is also important: Cao et al. (2024) found that paraphrased prompts usually yield different performance, but there is not yet a clear conclusion about which factors drive this phenomenon.

In contrast, this paper proposes the novel Textual Frequency Law (TFL), which suggests that, when meanings are kept the same, data with higher sentence-level frequency should be preferred over data with lower frequency, for both prompting and fine-tuning. The underlying motivation is the postulate that higher-frequency data also occur more often in the pre-training stage, and are therefore easier for LLMs to understand. Based on this law, we propose to estimate frequency from online open-source corpora, since many LLMs are closed-source and we usually do not have direct access to their training data. To further enhance the estimation, we propose a novel method called Textual Frequency Distillation (TFD). TFD prompts the target LLMs to perform story completion on a text dataset, and the generated completions are used to refine the original frequency estimation.
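As a concrete illustration of how such an estimate might be computed, the sketch below (our illustration, not the authors' released code) scores a sentence by its mean smoothed log unigram probability under an open reference corpus, and shows how a TFD-style adjustment could blend in counts from model-generated story completions. The helper names, the corpus source, and the mixing weight `alpha` are assumptions for illustration only.

```python
import math
from collections import Counter
from typing import Iterable


def build_counts(corpus_sentences: Iterable[str]) -> Counter:
    """Count word occurrences in an open reference corpus (any plain-text source)."""
    counts = Counter()
    for sent in corpus_sentences:
        counts.update(sent.lower().split())
    return counts


def sentence_frequency(sentence: str, counts: Counter, total: float) -> float:
    """Sentence-level frequency proxy: mean add-one-smoothed log unigram probability."""
    words = sentence.lower().split()
    if not words:
        return float("-inf")
    vocab = len(counts)
    return sum(math.log((counts[w] + 1) / (total + vocab)) for w in words) / len(words)


def tfd_adjust(base_counts: Counter, distilled_counts: Counter, alpha: float = 0.3) -> Counter:
    """TFD-style adjustment: blend corpus counts with counts gathered from the target
    LLM's story completions. `alpha` is a hypothetical mixing weight, not a paper value."""
    adjusted = Counter()
    for w in set(base_counts) | set(distilled_counts):
        adjusted[w] = (1 - alpha) * base_counts[w] + alpha * distilled_counts[w]
    return adjusted
```

Under TFL, given two paraphrases of the same prompt, one would score both with `sentence_frequency` and send the higher-scoring one to the model.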

Finally, we propose Curriculum Textual Frequency Training (CTFT), which fine-tunes LLMs on the training data in increasing order of sentence-level frequency and yields better results. Our textual frequency framework is composed of three units, and our contributions are three-fold:

• We propose the Textual Frequency Law, which suggests that high-frequency textual data should be preferred for LLMs in both prompting and fine-tuning when the meaning of the data is kept the same, i.e., when the data are paraphrases.

• We propose a novel method called Textual Frequency Distillation that further enhances the frequency estimation (collected from online resources) by conducting story completion to gather generations from LLMs whose training data we cannot access directly.

• We propose a novel method called Curriculum Textual Frequency Training that fine-tunes LLMs on the training data in increasing order of sentence-level frequency; a minimal ordering sketch follows this list. Figure 1 demonstrates a use case of our proposed framework, where prompts are rephrased to achieve higher accuracy.
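To make the CTFT ordering concrete, here is a minimal sketch (again our illustration, not the authors' code) that sorts fine-tuning examples by estimated sentence-level frequency and visits them in that increasing order. `freq_fn` stands in for an estimator such as the one sketched above, the `"input"` field is a placeholder key, and `train_step` is a placeholder for one update of whatever fine-tuning stack is used.

```python
from typing import Callable, Dict, List


def ctft_order(examples: List[Dict[str, str]],
               freq_fn: Callable[[str], float]) -> List[Dict[str, str]]:
    """Order examples by increasing estimated sentence-level frequency of their inputs."""
    return sorted(examples, key=lambda ex: freq_fn(ex["input"]))


def ctft_finetune(examples: List[Dict[str, str]],
                  freq_fn: Callable[[str], float],
                  train_step: Callable[[Dict[str, str]], None],
                  epochs: int = 1) -> None:
    """Curriculum fine-tuning loop: each epoch walks the data from the lowest to the
    highest estimated frequency, following CTFT's increasing-frequency schedule."""
    ordered = ctft_order(examples, freq_fn)
    for _ in range(epochs):
        for example in ordered:
            train_step(example)
```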

2.1 Textual Frequency

Textual frequency is related even to human neural activation. Desai et al. (2020) and Alexandrov et al. (2011) explored the differences in neural activation between low-frequency and high-frequency words in reading tasks, finding that high-frequency words generally evoke stronger neural responses. Mohan and Weber (2019) also noted the impact of word frequency on semantic retrieval.

Textual frequency also plays an important role in artificial intelligence. Heylen et al. (2008) investigated semantic similarity between words of different frequencies and found that high-frequency target words have higher semantic similarity with their nearest-neighbour words, indicating the impact of word frequency on semantic relationship retrieval. Oh et al. (2024) found that larger models predict rare words better; since larger models are usually stronger, this suggests that predicting rare (less frequent) words may be harder than predicting frequent ones.

2.2 Paraphrasing on Language Models

Paraphrasing is an important language task that language models handle well (Witteveen and Andrews, 2019; Goyal and Durrett, 2020). In turn, paraphrasing can be a useful method for improving language models in various ways. Tang et al. (2024) uses paraphrases to generate diverse references for evaluating language models. Zhu et al. (2024b) uses paraphrasing to cleanly evaluate possibly contaminated large language models. Gao et al. (2020) uses paraphrases as data augmentation to improve goal-oriented dialogue systems, and more recently Guo et al. (2023) also uses generative data augmentation, reflecting the usefulness of paraphrasing for enhancing model performance. One setting in this paper compares the performance of LLMs on paraphrases with the same meaning but different frequencies, a setting previous work has overlooked. It is crucial because the computational budgets for training and prompting (Cao et al., 2024) are usually limited, which raises the questions: which paraphrases are more useful, and should we use all of them? Our results suggest that high-frequency paraphrases should be preferred in both prompting and fine-tuning scenarios.

This paper proposed a framework for textual frequency on LLMs, composed of three units, namely TFL, TFD, and CTFT. Our framework favours high-frequency inputs for both prompting and fine-tuning of LLMs, and it can be combined with curriculum learning to improve final performance. We conduct experiments on Math Reasoning, Machine Translation over hundreds of language pairs, Commonsense Reasoning, and Agentic Tool Calling. Experimental results and extensive analysis support the effectiveness of our textual frequency framework. The analysis further indicates that even when inputs differ, the quality of LLM outputs is positively related to textual frequency, which further supports the soundness of our proposed framework.