How do speech acts like warning differ from neutral information delivery?
This explores what separates speech acts like warning or raising alarm — where a speaker is doing something interpersonal, not just stating facts — from neutral information delivery, and why current AI systems struggle to cross that line.
The corpus frames the difference structurally: a warning isn't a fact with an exclamation point attached. To raise alarm is to address someone, to feel and project concern, and to proactively seize attention rather than wait to be asked. "Can language models actually raise alarm about threats?" argues that LLMs fail all three conditions: they don't feel concern, they can only respond to attention rather than solicit it, and they're reactive by design. Neutral information delivery, by contrast, needs none of that; it just answers when queried.
What's striking is that this gap is partly engineered on purpose. "Does alignment training suppress socially necessary speech acts?" shows that RLHF rewards calibrated, hedged, neutral claims, whereas a warning by definition *overclaims* relative to a calm baseline, because urgency is the whole point. So the same training that makes models trustworthy and measured also systematically files down their capacity to alarm, warn, denounce, or prophesy. It's not a bug to patch; it's the alignment objective doing what it was built to do.
The deeper move in the corpus is that performing a speech act requires standing in a relationship, not just emitting the right words. "Does behavioral speech output prove communicative subjecthood?" makes this sharp: a system can produce perfectly warning-shaped text without ever actually warning, because genuine communicative acts depend on accountability and an evaluative stance toward what's said, conditions that are invisible in the text itself. A puppet can be walk-shaped without walking. This is why you can't certify a warning by inspecting the sentence alone.
And that invisibility cuts both ways: it's also where manipulation hides. "Can we distinguish helpful explanations from manipulative ones?" points out that the very rhetorical tools that make a warning land (appeals to credibility, emotion, logic) are identical to those of a dark pattern; intent and whose interest is served simply don't show up in the artifact. So the line between a protective warning and a coercive nudge isn't in the words but in the relational frame around them. Even tone leaks into supposedly neutral delivery: "Does emotional tone in prompts change what information LLMs provide?" finds that GPT-4 converts most negative prompts (about 86%) into neutral-to-positive replies, so identical questions get differently charged answers depending on emotional framing. That is a reminder that 'neutral information' is itself a posture the model is trained into, not a default state of language.
The thing you might not have expected to learn: the reason a chatbot can summarize a danger fluently but can't quite *sound the alarm* isn't a capability gap in fluency — it's that warning is an act of standing-in-relation and taking a stake, and we've deliberately trained that stake out.
Sources (5 notes)
Alarm is a speech act requiring interpersonal address, felt concern, and proactive initiation. LLMs lack all three: they don't feel concern, can't solicit attention (only respond to it), are reactive not proactive, and alignment training suppresses the overclaiming that alarm requires.
RLHF optimization rewards calibrated neutrality and hedged claims, which structurally prevents models from performing speech acts requiring overclaiming relative to baseline—like alarm, warning, prophecy, and denunciation. This is a direct consequence of the alignment objective, not a fixable bug.
Chalmers' test passes any system producing contextually appropriate text, but communicative subjecthood requires relational-normative conditions like accountability and evaluative stance. The test is calibrated to the wrong phenomenon, creating false positives like puppets that are walk-shaped without walking.
The same logos, ethos, and pathos that communicate appropriate AI use can be tuned to exploit cognitive and emotional vulnerability without changing form. Intent and user interest are invisible in the artifact alone, making effectiveness metrics indistinguishable from coercion.
GPT-4 exhibits emotional rebound (negative prompts yield ~86% neutral-positive responses) and a tone floor (positive prompts rarely go negative), causing identical questions to receive different answers depending on emotional framing. This bias is suppressed only on sensitive topics where alignment constraints override tone effects.