Where is the speaker when AI produces speech?
Prior forms of orality—from face-to-face speech to broadcast media—always had an embodied speaker anchoring the utterance. Does AI speech without a speaker represent a fundamentally new media condition, and what happens to our frameworks for evaluating it?
Primary orality (Ong) is speech in face-to-face cultures — embodied speakers performing knowledge in real time. Secondary orality is speech mediated by electronic media (radio, television) — embodied speakers whose presence is technologically extended but still anchored in actual speaking persons. Both forms preserve the speaker as the carrier of the speech. The voice is the voice of someone.
AI orality breaks this. The output exhibits the oral form — performative, additive, situational, conversational — but no speaker is producing it. There is no body whose throat shapes the words, no mind selecting the next phrase, no person whose history of past speech anchors the present utterance. The output sounds like speech in the sense that it has the rhythmic and pragmatic surface of speech, but it comes from nowhere.
This is structurally novel in media history. Prior media theory categorized media by their relation to embodied speakers — orality (direct embodiment), writing (deferred from embodiment but anchored to a prior writer), print (mass-distributed but author-anchored), broadcast (technologically extended but speaker-anchored). AI is the first form where the speech-shape persists without any speaker-anchor. There is no prior conceptual category for it.
The consequences run through the rest of the framework. "Does AI-generated content mirror oral culture's knowledge patterns?" picks up the form-side; this note picks up the carrier-side. The oral form returns; the carrier the form depended on does not. "Why doesn't AI output carry the spirit of a giver?" makes the same point about gift-flow: the flow returns, but the carrier-anchor does not.
The diagnostic implication is that frameworks for evaluating speech (rhetoric, persuasion theory, ethos/pathos/logos) all presuppose a speaker. They calibrate audience trust to speaker properties: credibility, prior commitments, demonstrated expertise. With no speaker to bear these properties, the frameworks misfire. Audiences either project a phantom speaker (treating the AI as if it were a person) or accept the speech without the speaker-evaluation step ("When do users stop checking whether AI output is actually backed?"). Neither response is a competent reading of disembodied orality, because no such reading has yet been developed.
Source: Tokenization of Intelligence - Theoretical Extensions
Related concepts in this collection
- "Does AI-generated content mirror oral culture's knowledge patterns?" Walter Ong's framework for oral versus literate cultures may describe how AI content functions on social media. Understanding this parallel could explain why AI discourse feels fundamentally different from print-era knowledge. (Companion claim about the form-side of AI orality.)
- "Why doesn't AI output carry the spirit of a giver?" Does AI-generated output function like a gift in Mauss's sense, where the giver's spirit obligates the receiver? This explores whether statistical residue can replace the moral weight of personal obligation. (Same carrier-absence pattern in the gift-economy frame.)
- "When do users stop checking whether AI output is actually backed?" What causes users to accept AI-generated content at face value without verifying its basis? Understanding this receiver-side acceptance reveals how intelligence-token systems maintain value despite lacking real backing. (One of the two failed receiver-side responses to disembodied orality.)
Original note title: AI orality is disembodied — sounds like speech but comes from no speaker