Language Understanding and Pragmatics · Psychology and Social Cognition

Where is the speaker when AI produces speech?

Prior forms of orality—from face-to-face speech to broadcast media—always had an embodied speaker anchoring the utterance. Does AI speech without a speaker represent a fundamentally new media condition, and what happens to our frameworks for evaluating it?

Note · 2026-04-14
What do language models actually know? What happens to social order when AI removes ritual constraints?

Primary orality (Ong) is speech in face-to-face cultures — embodied speakers performing knowledge in real time. Secondary orality is speech mediated by electronic media (radio, television) — embodied speakers whose presence is technologically extended but still anchored in actual speaking persons. Both forms preserve the speaker as the carrier of the speech. The voice is the voice of someone.

AI orality breaks this. The output exhibits the oral form — performative, additive, situational, conversational — but no speaker is producing it. There is no body whose throat shapes the words, no mind selecting the next phrase, no person whose history of past speech anchors the present utterance. The output sounds like speech in the sense that it has the rhythmic and pragmatic surface of speech, but it comes from nowhere.

This is structurally novel in media history. Prior media theory categorized media by their relation to embodied speakers — orality (direct embodiment), writing (deferred from embodiment but anchored to a prior writer), print (mass-distributed but author-anchored), broadcast (technologically extended but speaker-anchored). AI is the first form where the speech-shape persists without any speaker-anchor. There is no prior conceptual category for it.

The consequences run through the rest of the framework. The note "Does AI-generated content mirror oral culture's knowledge patterns?" picks up the form side; this note picks up the carrier side. The oral form returns; the carrier the form depended on does not. The note "Why doesn't AI output carry the spirit of a giver?" makes the same point about gift-flow: the flow returns, the carrier-anchor does not.

The diagnostic implication is that the frameworks for evaluating speech (rhetoric, persuasion theory, ethos/pathos/logos) all presuppose a speaker. They calibrate audience trust to speaker properties: credibility, prior commitments, demonstrated expertise. With no speaker to bear these properties, the frameworks misfire. Audiences either project a phantom speaker (treating the AI as if it were a person) or accept the speech without the speaker-evaluation step (see "When do users stop checking whether AI output is actually backed?"). Neither response is a competent reading of disembodied orality, because no such reading has yet been developed.


Source: Tokenization of Intelligence - Theoretical Extensions

Original note title

AI orality is disembodied — sounds like speech but comes from no speaker