What makes linguistic agency impossible for language models?
From an enactive perspective, does linguistic agency require embodied participation and real stakes that LLMs fundamentally lack? This matters because it challenges whether LLMs can truly engage in language or only generate text.
The enactive approach to language (Di Paolo et al., from "Large Models of What?" 2024) identifies language not as a thing to be captured in data but as a practice to participate in. From this view, three properties are constitutively essential to linguistic agency — and all three are absent from LLMs:
Embodiment: Language depends on the mutual engagement of those involved in interaction. Languaging (casual chit-chat, gestures, body language, tone, pauses, hesitations) is not fully capturable in text. It is an often fleeting phenomenon without formalizable rules, arising in embodied participatory interaction. Text-based training data can therefore never be complete: much of language leaves no textual trace.
Participation: Language is an inherently collaborative, dynamic negotiation of meaning. It is always partial: an utterance only becomes complete when it is taken up and extended by other agents, so each utterance is both a response to prior acts and an anticipation of future uptake. The key insight is that linguistic acts are made within a nested set of contexts (behaviour settings) that are already coordinated at a coarse grain while introducing new tensions requiring linguistic management. This participation cannot be precomputed from corpus data.
Precariousness: Linguistic agency, in the enactive view, involves the continuous management of intersubjective tensions; something must be at stake. Agency is "seething with frictions, and the possibility of failure and the unravelling of the ongoing process." LLMs have no self-production processes at risk, no sense of satisfaction, guilt, responsibility, or accountability. Without precariousness there is no genuine linguistic agency, only generation.
The enactive view makes a categorical claim, not a degree claim: the missing properties are "likely incompatible in principle with current architectures." This distinguishes it from the graded account in Can LLMs acquire social grounding through linguistic integration?
A convergent argument arrives from a different philosophical tradition: Can disembodied language models ever qualify as conscious? (Shanahan's Wittgensteinian analysis). Where the enactive view identifies embodiment, participation, and precariousness as necessary for linguistic agency, Shanahan's argument identifies shared-world co-presence as necessary for consciousness candidacy. Both are categorical arguments, and both name embodiment as the missing condition, but they reach it through different routes: enactive cognitive science and Wittgenstein's language games. The convergence strengthens the claim that embodiment is not just one feature among many that LLMs lack but the enabling condition for the deeper properties both frameworks require.
Source: Linguistics, NLP, NLU
Related concepts in this collection
- Can LLMs acquire social grounding through linguistic integration?
  Explores whether LLMs gradually develop social grounding as they become embedded in human language practices, analogous to child language acquisition. Tests whether grounding is a fixed property or an outcome of participatory use.
  The counterargument: grounding is gradual, where the enactive view says it is categorical.
- Can language models learn meaning from text patterns alone?
  Explores whether training on form alone (predicting the next word from prior words) could ever give language models access to communicative intent and genuine semantic understanding.
  A complementary argument from a different theoretical framework.
- Do LLMs develop the same kind of mind as humans?
  Explores whether LLMs and humans share the intersubjective linguistic training that shapes cognition, and whether that shared training produces equivalent forms of agency and reflexivity.
  Habermas framing: the absent participatory dimension is the same gap.
- Can AI systems learn social norms without embodied experience?
  Large language models exceed individual human accuracy at predicting collective social-appropriateness judgments. Does this reveal that embodied experience is unnecessary for cultural competence, or do systematic AI failures point to the limits of statistical learning?
  Directly challenges the strong embodiment requirement: GPT-4.5 scores at the 100th percentile on social-norm prediction without any embodied experience, though systematic correlated errors may preserve space for weaker embodiment claims.
- Can AI agents learn people better from interviews than surveys?
  Can rich interview transcripts seed more accurate generative agents than demographic data or survey responses? This matters because it challenges how we build digital simulations of real people.
  Another empirical challenge to strong embodiment: text-based interview transcripts provide enough content richness for 85% response fidelity without any embodied participation, though the enactive view would note that the interview itself was an embodied interaction whose residue the text merely captures.
Original note title
linguistic agency from an enactive perspective requires embodiment participation and precariousness all absent in llms