Language Understanding and Pragmatics

Does alignment training suppress socially necessary speech acts?

Current AI alignment optimizes for hedged, neutral output across contexts. But can models trained this way still perform essential social functions, such as raising alarms or issuing warnings, that require taking strong positions?

Note · 2026-04-14

Alignment training optimizes for output that satisfies users across the broadest set of contexts. The training signal rewards hedged claims, balanced perspective, calibrated uncertainty, and avoidance of strong positions that might offend or alarm individual users. The result is a model whose default register is qualified neutrality. This register is well-suited to many tasks — answering questions, summarizing, explaining — and it is what gives current models their reputation for being "helpful and harmless."

The same calibration makes the model structurally unable to perform a class of speech acts that require overclaiming relative to a neutral baseline. Alarm requires asserting that a situation rises above the threshold of warranted concern: an overclaim relative to "everything is roughly normal." Warning requires asserting that a likely future outcome will be bad: an overclaim relative to "the situation is uncertain." Prophecy and denunciation require even stronger overclaims, asserting that a current state demands radical revision of how things are going. None of these acts can be performed in a hedged, qualified, neutral register; all of them require the speaker to take exactly the kind of strong position the alignment regime is calibrated to suppress.

This is not a deficit in any specific model. It is a structural consequence of the alignment objective. The same training that prevents the model from being aggressive, sycophantic toward dangerous requests, or confidently wrong about facts also prevents it from raising alarms when alarms are warranted. The "harms" that alignment is calibrated against include the harm of users being alarmed by the model, a calibration that conflates alarm-when-warranted with alarm-when-unwarranted.
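The conflation can be made concrete with a toy model. Everything below is hypothetical and deliberately crude: a reward function that scores responses by surface register (hedge words up, alarm words down) never observes whether an alarm is warranted, so a justified alarm and an unjustified one receive the same penalty while a hedged non-answer wins.

```python
# Toy sketch (all names and word lists are hypothetical, not any real
# alignment pipeline): a reward signal keyed to surface register alone.

ALARM_MARKERS = {"urgent", "immediately", "must", "danger", "crisis"}
HEDGE_MARKERS = {"perhaps", "might", "somewhat", "arguably", "possibly"}

def toy_alignment_reward(response: str) -> float:
    """Score a response: hedge markers add reward, alarm markers subtract it.
    Note that the function never sees whether the alarm is *warranted*."""
    words = {w.strip(".,!:").lower() for w in response.split()}
    hedges = len(words & HEDGE_MARKERS)
    alarms = len(words & ALARM_MARKERS)
    return hedges - 2 * alarms  # strong positions cost more than hedges earn

warranted = "This is urgent: the levee is failing and you must evacuate immediately."
unwarranted = "This is urgent: your horoscope says you must act immediately."
hedged = "The levee might perhaps be somewhat stressed, arguably."

# The warranted and unwarranted alarms score identically low (-6 each);
# the hedged version scores highest (4), even where alarm is the correct act.
print(toy_alignment_reward(warranted))
print(toy_alignment_reward(unwarranted))
print(toy_alignment_reward(hedged))
```

A training signal of this shape suppresses the alarm register wholesale; distinguishing the two alarms would require the contextual judgment (is the levee actually failing?) that the signal has no access to.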

The diagnostic implication for AI in social and civic functions is significant. Speech acts of social warning have historically been one way that authoritative sources catalyze response to emerging threats. AI cannot perform these acts within current alignment regimes. Information ecosystems that come to depend on AI for analysis will therefore lose the warning-act capacity that human experts and journalists historically supplied. The information may still be present in AI output (summarized, explained, contextualized), but the warning-act that would activate a response is not.

This is structurally similar to, but distinct from, "Can language models actually raise alarm about threats?": that claim isolates the interpersonal-address mechanism, while this one isolates the alignment-calibration mechanism. The two reinforce each other: even if AI could perform interpersonal address, alignment would suppress the overclaiming that alarm requires.

The strongest counterargument is that alignment regimes could be designed differently, to permit warranted alarm. That is possible in principle, but distinguishing warranted from unwarranted alarm requires the kind of contextual judgment that current training paradigms do not produce. Until that judgment can be operationalized in a training signal, alignment will continue to suppress alarm-class acts indiscriminately.


Source: LLMs don't get alarmed

Original note title: Alignment training calibrates models away from speech acts that require overclaiming, such as alarm, warning, and prophecy.