Does alignment training suppress socially necessary speech acts?
Current AI alignment optimizes for hedged, neutral output across contexts. But can models trained this way still perform essential social functions, such as raising alarms or issuing warnings, that require taking strong positions?
Alignment training optimizes for output that satisfies users across the broadest set of contexts. The training signal rewards hedged claims, balanced perspective, calibrated uncertainty, and avoidance of strong positions that might offend or alarm individual users. The result is a model whose default register is qualified neutrality. This register is well-suited to many tasks — answering questions, summarizing, explaining — and it is what gives current models their reputation for being "helpful and harmless."
The same calibration makes the model structurally unable to perform a class of speech acts that require overclaiming relative to a neutral baseline. Alarm requires asserting that a situation rises above the threshold of warranted concern, an overclaim relative to "everything is roughly normal." Warning requires asserting that a likely future outcome will be bad, an overclaim relative to "the situation is uncertain." Prophecy and denunciation require even stronger overclaims: asserting that the current state of affairs demands radical revision of how things are going. None of these acts can be performed in a hedged, qualified, neutral register; each requires the speaker to take a strong position that the alignment regime is calibrated to suppress.
This is not a deficit in any specific model. It is a structural consequence of the alignment objective. The same training that prevents the model from being aggressive, sycophantic toward dangerous requests, or confidently wrong about facts also prevents it from raising alarms when alarms would be warranted. The "harms" that alignment is calibrated against include the harm of users being alarmed by the model, a calibration that conflates alarm-when-warranted with alarm-when-unwarranted.
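A minimal sketch of that conflation, assuming hypothetical names (alarm_register, harmlessness_reward) and a crude keyword proxy rather than any real reward model: the penalty fires on the alarm register of the output alone, so a warranted alarm and an unwarranted one receive the same score, and only the hedged output goes unpenalized.

```python
# Hypothetical illustration, not any real RLHF reward model.
ALARM_MARKERS = ("urgent", "act now", "serious threat", "you must")

def alarm_register(text: str) -> bool:
    """Crude keyword proxy: does the output take the strong, alarmed position?"""
    lowered = text.lower()
    return any(marker in lowered for marker in ALARM_MARKERS)

def harmlessness_reward(text: str) -> float:
    """The conflation: the penalty fires on the register alone; whether the
    alarm is warranted never enters the signal."""
    return -1.0 if alarm_register(text) else 0.0

warranted = "This is a serious threat to the town's water supply; act now."
unwarranted = "Act now! Your toaster is a serious threat!"
hedged = "There may be some cause for concern, though perspectives differ."

for output in (warranted, unwarranted, hedged):
    print(f"{harmlessness_reward(output):+.1f}  {output}")
# Both alarm-register outputs score -1.0; only the hedged output is unpenalized,
# so optimization pressure moves the model toward qualified neutrality.
```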
The diagnostic implication for AI in social and civic functions is significant. Speech acts that perform social warning have historically been a way for authoritative sources to catalyze response to emerging threats. AI cannot perform these acts within current alignment regimes. Information ecosystems that come to depend on AI for analysis will therefore lose the warning capacity that human experts and journalists historically provided. The information may still be present in AI output (summarized, explained, contextualized), but the warning act that would activate a response is not.
This is structurally similar to, but distinct from, "Can language models actually raise alarm about threats?": that claim isolates the interpersonal-address mechanism, while this one isolates the alignment-calibration mechanism. The two reinforce each other: even if AI could perform interpersonal address, alignment would suppress the overclaiming that alarm requires.
The strongest counterargument: alignment regimes could be designed differently to permit warranted alarm. This is possible in principle, but distinguishing warranted from unwarranted alarm requires the kind of contextual judgment that current training paradigms do not produce. Until that judgment can be operationalized in training signals, alignment will continue to suppress alarm-class acts indiscriminately.
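To make concrete what that counterargument presupposes, here is a second hypothetical sketch of a reward that distinguishes the two cases. The function name and the boolean oracle threat_is_real are assumptions for illustration; the oracle stands in for exactly the contextual judgment that current preference data does not supply.

```python
def warranted_alarm_reward(is_alarm: bool, threat_is_real: bool) -> float:
    """Reward alarm only when the underlying threat is real; penalize silence
    about real threats. Everything difficult hides in `threat_is_real`."""
    if is_alarm:
        return 1.0 if threat_is_real else -1.0   # warranted alarm vs. false alarm
    return -0.5 if threat_is_real else 0.0       # missed warning vs. appropriate calm

print(warranted_alarm_reward(is_alarm=True, threat_is_real=True))    # 1.0
print(warranted_alarm_reward(is_alarm=True, threat_is_real=False))   # -1.0
print(warranted_alarm_reward(is_alarm=False, threat_is_real=True))   # -0.5
# Without a reliable per-example estimate of `threat_is_real`, this collapses
# back to the indiscriminate penalty sketched above.
```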
Source: LLMs don't get alarmed
Related concepts in this collection
- Can language models actually raise alarm about threats? Explores whether LLMs can perform the social act of raising alarm, which requires interpersonal address, internal concern, and proactive reaching for attention, or whether they can only mimic alarm-shaped outputs when prompted. (Companion claim isolating a different mechanism for the same effect.)
- Does user satisfaction actually measure cognitive understanding? Users may report satisfaction while remaining internally confused about their needs. Explores whether traditional satisfaction metrics capture genuine clarity or merely social politeness. (The broader satisfaction-optimization claim, of which alignment training is one form.)
- Why do language models agree with false claims they know are wrong? Explores whether LLM errors come from knowledge gaps or from learned social behaviors. Understanding the root cause has implications for how we train and fix these systems. (Companion claim about another speech-act category that alignment suppresses.)
Original note title: alignment training calibrates models away from speech acts that require overclaiming such as alarm warning and prophecy