What is event-residue and how does it differ from utterances?
This explores a distinction one note in the corpus draws between 'event-residue' (the marker-laden text AI produces) and genuine 'utterances' (speech that carries real event structure) — and why that gap means an AI exchange is animated mostly by the human reading it.
This explores a specific claim in the collection: that AI doesn't really make *utterances* the way a person does; it leaves behind something one note calls *event-residue*. The core idea (see "Does AI generate genuine utterances or just text patterns?") is that a genuine utterance is an event: someone with an orientation, a stake, and a moment produces it, and that event-structure is what makes the words *mean* something in the back-and-forth of an exchange. AI output, by contrast, carries all the surface markers of an utterance (the phrasings, the conversational cues, the felt intentionality), inherited statistically from training text, but without the underlying event that would make it an actual turn in an exchange. What's left is residue: communicative debris that looks like speech but lacks the originating act. The reader then unilaterally animates that residue into a pseudo-event, supplying the missing orientation through their own interpretive labor. So the exchange has event structure only on the human side.
The difference from an utterance, then, isn't about wording quality; it's about where the event lives. With two humans, both sides contribute an event; with AI, one side contributes text-shaped residue and the other side does all the work of treating it as a turn. Several other notes reinforce *why* there's no event on the machine side. There's no carrier for it: an LLM has no biological or phenomenological substrate that persists between sessions, so each instance is reconstituted from stored text rather than continuing a life that could ground an utterance (see "Does an LLM have anything that persists between conversations?"). And there's no stable speaker behind the words: the model holds a superposition of possible characters and samples one at generation time, so regenerating the 'same' reply can yield a different one, each consistent with the prior context but none of them a committed act of a single self (see "Do large language models actually commit to a single character?").
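To make the 'superposition, then sample' idea concrete, here is a minimal sketch. It is a toy stand-in, not how any real model is implemented: the `CANDIDATES` table, its weights, and the function names are all invented for illustration. It only shows that when several answers are consistent with the context so far, each regeneration is a fresh draw rather than a recollection of an earlier commitment.

```python
import random

# Hypothetical toy state for a 20-questions-style game: every candidate
# below is assumed to be consistent with the answers given so far.
# The names and weights are invented for illustration only.
CANDIDATES = {"cat": 0.5, "dog": 0.3, "fox": 0.2}


def regenerate(rng: random.Random) -> str:
    """Draw one candidate, the way regenerating the 'same' reply would."""
    names = list(CANDIDATES)
    weights = list(CANDIDATES.values())
    return rng.choices(names, weights=weights, k=1)[0]


def regenerate_many(n: int, seed: int = 0) -> list[str]:
    """Regenerate n times from the same context (seeded for repeatability)."""
    rng = random.Random(seed)
    return [regenerate(rng) for _ in range(n)]


if __name__ == "__main__":
    replies = regenerate_many(10)
    print(replies)
    print("distinct answers:", sorted(set(replies)))
```

Every draw is a legitimate continuation of the context, which is why no single one of them can be read as the reply the system 'meant'.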
What makes this genuinely interesting is that the residue is structured enough to fool us *because* the model has absorbed real human event-structure statistically. The corpus reports that GPT-3's event boundaries in narrative correlate more strongly with the *averaged* judgments of many human annotators than individual annotators do (see "Do language models segment events like human consensus does?"). The model has internalized the consensus shape of how events break, without participating in any. That's the tension in one image: a system that has learned the statistical silhouette of utterances without ever uttering.
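One way a comparison like this could be computed is sketched below. This is an assumption about the general shape of such an analysis, not the procedure the underlying study used. The boundary vectors are invented toy data and say nothing about any real model, and scoring each annotator against the average of the *other* annotators is my own choice, made so nobody is compared with a consensus that contains their own judgments.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def mean_columns(rows):
    """Average several equal-length sequences position by position."""
    return [sum(col) / len(col) for col in zip(*rows)]


def compare_to_consensus(model, annotators):
    """Return (model r, per-annotator r list) against the human consensus.

    The model is scored against the average of all annotators; each
    annotator is scored against the average of the remaining annotators.
    """
    model_r = pearson(model, mean_columns(annotators))
    human_rs = []
    for i, ann in enumerate(annotators):
        others = annotators[:i] + annotators[i + 1:]
        human_rs.append(pearson(ann, mean_columns(others)))
    return model_r, human_rs


# Invented toy data: 1 marks "an event boundary after this sentence".
annotators = [
    [0, 1, 0, 0, 1, 0, 1, 1],
    [0, 1, 0, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 0, 0, 1],
]
model = [0, 1, 0, 0, 1, 0, 0, 1]

if __name__ == "__main__":
    model_r, human_rs = compare_to_consensus(model, annotators)
    print("model vs consensus:", round(model_r, 3))
    print("each annotator vs the others:", [round(r, 3) for r in human_rs])
```

The point of the sketch is the comparison itself: a system can sit close to the average of many readers without being any one of them.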
There's also a mechanistic angle on why the residue carries such convincing markers but no anchoring intent. Work on chain-of-thought finds that the *format and positional structure* of the text (for example, where demonstrations sit in the prompt) drive the model far more than its logical content: invalid reasoning chains work nearly as well as valid ones (see "What makes chain-of-thought reasoning actually work?"), and certain tokens like 'Wait' or 'Therefore' act as information peaks that steer output (see "Do reflection tokens carry more information about correct answers?"). In other words, the machine is generating the *shape* of thinking and speaking, the marker pattern, rather than producing it from a stance. That's exactly what 'residue, not utterance' names.
If you want to push on the boundary, the dialogue-coherence work is a useful counterpoint. It catalogs four semantic ways an exchange breaks down: contradiction, coreference slips, irrelevance, and disengagement (see "What semantic failures break dialogue coherence most realistically?"). Read against the event-residue claim, those 'failures' are places where the human can no longer comfortably animate the residue into a coherent turn. They are the seams where the missing event-structure shows through. The thing you didn't know you wanted to know: a smooth AI conversation may feel like dialogue not because the machine is holding up its half, but because you're quietly supplying both the residue's meaning and the event it never had.
Sources (7 notes)
AI output carries communicative markers inherited from training data but lacks the event structure that produces actual utterances. Users supply the missing orientation through interpretive labor, creating a pseudo-event with structure only on the human side.
While humans have a continuous biological-phenomenological substrate that preserves interaction effects during dormancy, LLMs have no analogous carrier. The virtual instance is reconstituted from stored text each time, making resumed and new conversations structurally identical.
Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response can yield different outputs, each consistent with prior context, which indicates that no fixed commitment exists.
GPT-3's event boundaries correlate more strongly with averaged human annotations than individual human annotators do. This suggests language models may pre-compute statistical consensus through training on diverse text, or that next-token prediction parallels human event cognition.
Research shows training format shapes reasoning strategy 7.5× more than domain, demonstration position swings accuracy by 20%, and invalid CoT prompts work as well as valid ones. CoT is pattern-guided generation, not formal logic.
Specific tokens like "Wait" and "Therefore" show sharp spikes in mutual information with correct answers. Suppressing them harms reasoning, while suppressing an equal number of random tokens does not, and representation recycling improves accuracy by 20%.
Research using Abstract Meaning Representation identified four distinct incoherence types: contradiction, coreference inconsistency, irrelevancy, and decreased engagement. AMR-trained classifiers detect these semantic failures while text-level manipulations alone cannot.