Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency

Paper · arXiv 2407.08790 · Published July 11, 2024
Linguistics, NLP, NLU, Philosophy, Subjectivity

Languaging is not the kind of thing that can admit of a complete or comprehensive modelling. From an enactive perspective we identify three key characteristics of enacted language: embodiment, participation, and precariousness. These are absent in LLMs, and likely incompatible in principle with current architectures. We argue that these absences imply that LLMs are not now, and cannot in their present form become, linguistic agents in the way humans are. We illustrate the point in particular through the phenomenon of “algospeak”, a recently described pattern of high-stakes human language activity in heavily controlled online environments. On the basis of these points, we conclude that sensational and misleading claims about LLM agency and capabilities emerge from a deep misconception of both what human language is and what LLMs are.

We place side by side, on the one hand, what it is that LLMs do and, on the other, what human beings do when engaged in linguistic interaction.

The statistical and computational sciences behind the development of LLMs, on the one hand, and enactive cognitive science, on the other, involve sharply distinct conceptions of what language is. In fact, the former rarely engages in rigorous conceptual understanding and analysis of language; its business is engineering tools that imitate linguistic activity. This is a key point that underscores differences in values and goals between these research communities Chemero (2023). As an analogy, artificial flight does not involve the kinds of mechanisms that animals use to achieve flight in non-human ecosystems. The goals of aeronautical engineers are not those of zoologists, and their methods and aims diverge accordingly.

In 1948, Claude Shannon wrote on the relation between language and entropy that: “Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem.” (Shannon, 1948, p.3). Current large language models such as Gemini, Bard, Llama, Megatron-Turing, BLOOM, and the GPT variants are an engineering endeavour in these terms, albeit one far more sophisticated and enormous in scale than Shannon’s original concept. Shannon’s original idea of entropy, that what is being measured, represented, and manipulated is form and not meaning, remains just as relevant to LLMs now.
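Shannon's point can be made concrete with a minimal sketch of our own (character-level rather than the token-level sources Shannon analysed; the function name is ours): two sentences that say opposite things, but are permutations of the same symbols, have identical entropy, because entropy is computed over symbol frequencies alone.

    from collections import Counter
    from math import log2

    def entropy(text: str) -> float:
        """Empirical Shannon entropy of a string, in bits per character."""
        counts = Counter(text)
        n = len(text)
        return -sum((c / n) * log2(c / n) for c in counts.values())

    # Opposite meanings, identical symbol statistics, identical entropy:
    print(entropy("man bites dog"))  # ~3.55 bits per character
    print(entropy("dog bites man"))  # same value: form is measured, not meaning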

From this description we can see that claims regarding the linguistic capabilities of LLMs depend on two implicit assumptions about language. The first is what we call the assumption of language completeness: that there exists a “thing”, called a “language”, that is complete, stable, quantifiable, and available for extraction from traces in the environment. The engineering problem then becomes how that “thing” can be reproduced artificially. The second is the assumption of data completeness: that all of the essential characteristics of language can be represented in the datasets that are used to initialise and “train” the model in question. In other words, all of the essential characteristics of language use are assumed to be present within the relationships between tokens, which would presumably allow LLMs to effectively and comprehensively reproduce the “thing” that is being modelled.
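For concreteness, the data-completeness assumption can be stated in the standard autoregressive formulation of current LLMs (a generic textbook description, not one drawn from this paper): the model factorises text into conditional token probabilities and is trained by minimising their negative log-likelihood,

    p(w_1, \ldots, w_n) = \prod_{t=1}^{n} p_\theta(w_t \mid w_{<t}),
    \qquad
    \mathcal{L}(\theta) = -\sum_{t=1}^{n} \log p_\theta(w_t \mid w_{<t}).

Anything about languaging that does not show up in those conditional inter-token statistics is, by construction, invisible to the fit.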

Both of these assumptions are rejected by an enactive view of language, which sees it not as a “thing” to be captured in text data, but as a practice in which to participate, whether that participation is through speech, writing, sign, or another modality. In contrast with computational approaches’ emphasis on form, the enactive approach recognises that what truly matters for language is its meaning (in this there are strong resonances with Bender and Koller (2020)’s critical examination of the relationship between form and meaning in language model output). As such, the enactive approach starts not with tokens of verbal activity, but with the fundamental issues of agency, embodiment, and precarity, and with how meaning arises within situations where things matter to those involved.

Linguistic acts are those which manage an inter-subjective tension – a precariousness in the coordination between two or more people engaged together in a shared activity. The enactive analysis places great emphasis on the variety of ways in which such shared activities occur at multiple temporal and spatial scales, and on the way the resolution of a tension at one scale tends to introduce tensions at another.

The first implication is that we are always doing more than one thing. Linguistic actions are made within a nested set of contexts. When we encounter other people we are always already in some broad form of coordination with them in which we are participating. For instance, if we meet someone for the first time at a job interview, we are both already participating in the behaviour setting of a job interview Barker (1968); Schoggen (1989). Our actions are thus already coordinated at a coarse grain of analysis, but also constrained: there are pressures and processes which will guide and drive behaviour appropriate to the setting. Thus, coordination produces new tensions that must be managed through our linguistic skills. We spend pretty much all of our lives within behaviour settings Heft (2001), and our actions are organised accordingly, as classically captured in the slogan, “people in church behave church, people in school behave school”.

This flow of tensions brings the second implication of the enactive account of linguistic agency to the fore. From an enactive perspective, any linguistic act is necessarily partial or incomplete in two different ways. First, an individual’s utterances are partial in that they are always made in response to (or in anticipation of) the contributions of another person, and in coordination with them as part of a shared ongoing activity. Second, while an utterance or other linguistic act manages the tension arising at one level of the interaction, it cannot resolve all such potential conflicts, and therefore introduces new tensions in the nested contexts that characterise the situation as a whole. These new unresolved tensions become the animating force that drives the interaction forward as it continues to unfold, precariously, over time.

This necessarily brief sketch of the enactive theory of linguistic agency illustrates how essential embodiment, participation, and precarity are to human language practice. It also shows how strikingly different the conceptions of language maintained by computational approaches and by enactive cognitive science are.

There is no more detail to see. Such ungrounded production of grammatically accurate but contentfully empty or vague text has been described as “confabulation” or, sometimes, “hallucination” OpenAI (2023), but both terms are inaccurate.

“Hallucination” is a failure of perception, the experience of something as present in the world that is not actually present. LLMs do not perceive – they are statistical models of a corpus of data. Nothing about their operation tracks or engages with the physical environment around them.

“Confabulation” is a similarly psychological term and perhaps less obviously misplaced. Human confabulation is the production of quasi-sensible narratives or explanations, in response to queries or prompts, that bear little to no relationship to the state of the world.

This is precisely because such text has no grounding in a shared context or experience, only in statistical relationships between words. LLM (mal)functionality is not confabulation; it is fabrication. Rather than an invented story that helps keep a flow of dialogue active and continuing, it is the generation of sensible-seeming yet nonsensical text output on the basis of a processed corpus. Crucially, because there is no difference in the processes used to produce the different outputs, LLM text is fabrication even when the resulting output happens to be appropriate and accurate to the reader’s needs and reality.

It is precisely these active, collaborative, and dynamic aspects of languaging which are not, and cannot be, captured in static representations and included in a corpus of training data. Languaging — including the casual chit-chat as we enter an elevator with others, gestures, body language, tones, pauses, and hesitations — is not something that can be entirely captured in text; it is an often fleeting phenomenon without clear formalizable rules. These embodied linguistic participations can be peculiar and unrepeatable, and can take on a “life” of their own in a way that is not predictable Di Paolo et al. (2018).

When a person has knowledge of a domain, they typically have a strong sense of how to ask questions and what details to seek or avoid in order to support an ongoing dialogue. People are aware of what they know, what they don’t know, and how well the conversation is going overall. The seeking of clarification is a kind of activity that is grounded in a shared direction for the conversation, in which the discussion is continually being sculpted and steered as a collaboration. To be capable of clarification and repair, the participants have to be sensitive to divergence and breakdown. Indeed, the lack of question-asking, and of metacognition regarding the tentativeness of much of our understanding, is part of what has resulted in LLMs being experienced as fluent in the ‘mansplaining’ idiom Harrison (2023).

To understand language is not to be able to produce grammatical strings of words, but rather to participate in this process of negotiated, participatory meaning making. As we have noted above, it is this active, participatory character of language that has led enactive researchers to adopt the verb languaging in preference to the nominal ‘language’ in the research literature. It is an inherently collaborative, dynamic negotiation of meaning, the textual aspects of which are only part of the story.

This emphasis on participation and coordination over sentence construction means that much of the research comparing human and LLM production is simply not germane to the question of human linguistic activity. There is a wealth of such research now. Analyses find some parallels between the two (e.g. in variation of word use based on recent semantic context from both its own output and prompt input, Cai et al. (2023)), and some differences (e.g. in appropriate coordination of output with scalar and general conversational implicature of recent output and prompt text; Qiu et al. (2023)).

Given that an LLM is a curve fitted to a dataset, with sophisticated mechanisms for sampling, such analyses have a potentially important engineering role in evaluating the extent to which there is appropriate correspondence between the map (the LLM) and the territory (human word production in text-based linguistic activity). They cannot, however, provide any argument for the validity of conflating map and territory. Evidence for that lies entirely outside of word production, in the field of embodied, participatory, and value-laden interaction between agents. It is possible that artificial linguistic agents might be developed and engineered in the future, but evidence of such success cannot rest on patterns of fluent token-sequence production.
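To unpack the “curve fitted to a dataset” characterisation, here is a deliberately minimal sketch of our own (a bigram count model standing in for a neural network, a drastic simplification rather than the authors’ construction): fitting is counting co-occurrences, and generation is nothing more than repeated sampling from the fitted conditional distribution, with temperature as one simple example of the “sophisticated mechanisms for sampling”.

    import random
    from collections import Counter, defaultdict

    def fit_bigrams(tokens):
        """'Fit the curve': estimate p(next | current) by counting pairs."""
        counts = defaultdict(Counter)
        for cur, nxt in zip(tokens, tokens[1:]):
            counts[cur][nxt] += 1
        return counts

    def sample_next(counts, current, temperature=1.0):
        """Sample a continuation; temperature reshapes the fitted distribution."""
        words, freqs = zip(*counts[current].items())
        weights = [f ** (1.0 / temperature) for f in freqs]
        return random.choices(words, weights=weights)[0]

    corpus = "the cat sat on the mat and the dog sat on the rug".split()
    model = fit_bigrams(corpus)
    token = "the"
    for _ in range(5):
        if not model[token]:  # dead end: the fitted data runs out
            break
        token = sample_next(model, token)
        print(token, end=" ")

Nothing in this loop tracks a shared situation or has anything at stake; it only re-emits the statistical shape of its corpus, which is the map–territory point in miniature.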

The enactive conception of language, because it involves dynamism and sociality, is one which recognises every linguistic act to be radically incomplete. According to this perspective, language is a partial act that can only be completed when it is taken up and extended, embellished, or steered and redirected, by other agents. This uptake can come from other people engaged in a complementary move or counter-move, which is itself also incomplete, dependent in turn on that gesture or utterance being taken up. Language always and inevitably overspills the kinds of information that can be made to ‘freeze in time’ within specific computational data structures and used to engineer LLMs.

Importantly, social norms and asymmetrical power structures permeate and shape our linguistic agency and the world around us. This means that factors such as our class, gender, ethnicity, sexuality, (dis)ability, place of birth, the language we speak (including our accents), skin colour, and other similarly subtle factors shape how a person’s capabilities are perceived, presenting opportunities for some and creating obstacles for others.

4.3 Precarity

Linguistic agency, as described by Di Paolo et al. (2018) (see also Cuffari et al. (2015); Di Paolo (2021)), is a matter of continuous concernful management of conflicts, frictions, and tensions. These tensions emerge within intersubjective interactions, and while they can be addressed, every action taken to address them will unavoidably set up conditions for new tensions and mis-coordinations either immediately at a finer grain of action, or at some point in the future. Agency, within the enactive conception, whether of its basic biological kind, at the level of skilful action in the world, or in the intersubjective domain in which we find language, is seething with frictions, and the possibility of failure and the unravelling of the ongoing process in question (the interaction, the skilled action, the living body).

LLMs do not participate in social interaction, and having no basis for shared experience, they also have nothing at stake. There is no set of processes of self-production that are at risk, and which their behaviour continually stabilises, or at least moves them away from instability and dissolution. A model does not experience a sense of satisfaction, pleasure, guilt, responsibility or accountability for what it produces.

An enactive cognitive science perspective makes salient the extent to which language is not just verbal or textual but depends on the mutual engagement of those involved in the interaction. The dynamism and agency of human languaging mean that language itself is always partial and incomplete. It is best considered not as a large and growing heap, but as a flowing river. Once you have removed water from the river, no matter how large a sample you have taken, it is no longer the river. The same thing happens when taking records of utterances and actions out of the flows of engagement in which they arise. The data on which the engineering of LLMs depends can never be complete, partly because some of languaging leaves no traces in text or utterances, and partly because language itself is never complete.

Large language models represent an extraordinary engineering achievement and a technological revolution of a kind we have not seen before. However, they are tools – developed, used, and controlled by humans – that aid human linguistic interaction. These tools will increasingly aid human linguistic activities, but they are not themselves linguistic agents; they do not demonstrate linguistic agency. To assume so is, as we have explained, to mistake the map for the territory.