Dissociating language and thought in large language models

Paper · arXiv 2301.06627 · Published January 16, 2023
Philosophy · Subjectivity · Natural Language Inference

Here, we evaluate LLMs using a distinction between formal linguistic competence—knowledge of linguistic rules and patterns—and functional linguistic competence—understanding and using language in the world. We ground this distinction in human neuroscience, which has shown that formal and functional competence rely on different neural mechanisms. Although LLMs are surprisingly good at formal competence, their performance on functional competence tasks remains spotty and often requires specialized fine-tuning and/or coupling with external modules. We posit that models that use language in humanlike ways would need to master both of these competence types, which, in turn, could require the emergence of mechanisms specialized for formal linguistic competence, distinct from functional competence.

We argue that LLMs have turned out to be surprisingly successful at mastering formal competence—qualitatively different in their formal linguistic capacities from pre-2018 models, in a way that few practitioners in the field predicted and that was unexpected given longstanding claims that grammatically competent systems would require strong language-specific priors.

In this paper, we have advanced the thesis that formal and functional linguistic competence are distinct capabilities, with formal competence relying on distinct language machinery and functional competence requiring the integration of diverse brain networks. We have shown that formal competence emerges in contemporary LLMs as a result of the word-in-context prediction objective; however, this objective alone appears insufficient for equipping LLMs with functional linguistic competence skills. Based on the neuroscience evidence, we suggest that models that succeed at real-life language use will need to be modular, mimicking the division of labor between formal and functional competence in the human brain.

We see at least two ways to separate LLM circuits responsible for formal and functional competence: explicitly building modularity into the architecture of the system (we call this Architectural Modularity), or letting modularity emerge naturally during training (Emergent Modularity).