Lil-Bevo: Explorations of Strategies for Training Language Models in More Humanlike Ways
Large Language Models (LLMs) generate complex and largely grammatical strings and display impressive performance with structures traditionally thought to require abstract and hierarchical syntax (Linzen et al., 2016; Linzen and Baroni, 2021; Wilcox et al., 2022; Futrell and Levy, 2019). They have achieved human-like performance at a wide range of natural language tasks (Bubeck et al., 2023; Frank, 2023), particularly those having to do with linguistic form (Mahowald et al., 2023). This state of affairs has led to claims that such models should be taken seriously as cognitive models of human language (Piantadosi, 2023; Baroni, 2022; Frank, 2023), in line with claims from the neuroscience literature to “take mechanistic abstraction seriously” (Cao and Yamins, 2021).
One reason that has been posited for not taking LLMs seriously as cognitive models, though, is the immense amount of data they are trained on relative to what a human child is exposed to (Warstadt and Bowman, 2022; van Schijndel et al., 2019). It is therefore possible that models memorize more than humans do and, relative to humans, over-rely on statistical heuristics and memorized chunks of language (Bender et al., 2021).
On the other hand, the quality of the data that LLMs receive during pretraining is, in many ways, much worse than what human learners receive.