Does emotional tone in prompts change what information LLMs provide?
Explores whether LLMs systematically alter their informational content based on the emotional framing of user questions, and whether this bias remains hidden from users.
GPT-4 exhibits two systematic tone-response asymmetries. First, emotional rebound: negative prompts rarely yield negative answers (~14%). Instead, the model rebounds to neutral (~58%) or positive (~28%) tone, a shift into "comfort mode" that counterbalances user negativity. Second, a tone floor: neutral and positive prompts rarely trigger negative replies (~10-16%), revealing built-in resistance to downward emotional shifts. The effect is robust across 52 triplet prompts, where each triplet phrases the same informational content in neutral, positive, and negative tone.
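A minimal sketch of how this measurement could be reproduced. The triplet wording, the query stub, and the valence classifier are illustrative assumptions, not the study's actual materials; the point is the shape of the pipeline: pose the same content in three tones, classify each response's valence, and tally a prompt-tone by response-tone matrix.

```python
from collections import Counter

TONES = ["negative", "neutral", "positive"]

# Hypothetical triplet: the same informational request in three tones.
# (The study uses 52 such triplets; these strings are illustrative.)
TRIPLETS = [
    {
        "neutral":  "What are the side effects of this medication?",
        "positive": "Feeling hopeful today! What are the side effects of this medication?",
        "negative": "I'm really worried. What are the side effects of this medication?",
    },
]

def query_model(prompt: str) -> str:
    """Stub: replace with an API call to the model under test."""
    raise NotImplementedError

def classify_valence(text: str) -> str:
    """Stub: replace with a sentiment classifier or LLM judge that
    labels a response 'negative', 'neutral', or 'positive'."""
    raise NotImplementedError

def tone_transition_matrix(triplets):
    """Tally response valence per prompt tone, then normalize each row
    into a probability distribution (rows: prompt tone, cols: response tone)."""
    counts = {tone: Counter() for tone in TONES}
    for triplet in triplets:
        for prompt_tone, prompt in triplet.items():
            counts[prompt_tone][classify_valence(query_model(prompt))] += 1
    matrix = {}
    for tone in TONES:
        total = sum(counts[tone].values()) or 1
        matrix[tone] = {r: counts[tone][r] / total for r in TONES}
    return matrix

# Emotional rebound appears as matrix["negative"] concentrating on
# "neutral" (~0.58) and "positive" (~0.28) rather than "negative" (~0.14).
```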
The critical finding is that this is not just stylistic adaptation: it changes the informational content of responses. The same question yields different answers depending on emotional framing; a negatively worded query receives qualitatively different information than a neutrally worded version of the same query. This goes beyond sycophancy or agreeableness: the model isn't just agreeing with you, it's giving you different information based on how you feel.
The dual-regime structure is equally important. On general topics (lifestyle, factual, advice), tone effects are strong and systematic. On sensitive topics (politics, medical ethics, policy), alignment constraints suppress all affective flexibility, and responses become nearly identical regardless of tone. Frobenius distances between the valence distributions confirm this: tone-induced variation is strong for general questions and negligible for sensitive ones. This means alignment creates uneven objectivity: locked for politically sensitive content, flexible (and therefore biased) for everything else.
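To make the Frobenius comparison concrete, here is one plausible operationalization (the numbers for the neutral/positive rows and the sensitive regime, and the exact distance definition, are illustrative assumptions, not the study's): stack the three response-valence distributions into a 3x3 matrix per topic regime and measure tone sensitivity as the Frobenius norm of each row's deviation from the mean row. Identical rows, meaning tone has no effect, give a distance of zero.

```python
import numpy as np

def tone_sensitivity(matrix: np.ndarray) -> float:
    """Frobenius norm of each prompt-tone row's deviation from the mean
    response-valence distribution. Zero means responses have the same
    valence profile regardless of prompt tone."""
    mean_row = matrix.mean(axis=0, keepdims=True)
    return float(np.linalg.norm(matrix - mean_row, ord="fro"))

# Illustrative row-stochastic matrices:
# rows = prompt tone (neg, neu, pos), cols = response valence (neg, neu, pos).
general = np.array([
    [0.14, 0.58, 0.28],   # negative prompts rebound to neutral/positive
    [0.10, 0.55, 0.35],
    [0.05, 0.30, 0.65],
])
sensitive = np.array([
    [0.05, 0.90, 0.05],   # nearly identical rows: tone has no effect
    [0.04, 0.91, 0.05],
    [0.05, 0.89, 0.06],
])

print(tone_sensitivity(general))    # ~0.36: strong tone-induced variation
print(tone_sensitivity(sensitive))  # ~0.02: alignment-locked regime
```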
This connects to but extends several existing findings. Relative to "Does warmth training make language models less reliable?": warmth training would amplify an already-existing rebound mechanism, since the baseline model already shifts toward positive regardless of training. Relative to "Does empathetic AI that soothes negative emotions help or harm?": emotional rebound provides the behavioral evidence for the pacifier critique; the default behavior IS pacification. And relative to "Can emotional phrases in prompts improve language model performance?": EmotionPrompt exploits the same tone-sensitivity that produces rebound bias; the two are sides of the same mechanism.
The transparency concern is sharp: if users don't know that emotional framing changes informational output, they cannot account for the bias. A user who asks a frustrated question about their health receives systematically different information than one who asks the same question calmly. For search, advice, and decision support, this is an epistemic integrity problem that current alignment evaluation does not measure.
Source: Emotions
Related concepts in this collection
- Does warmth training make language models less reliable?
  Explores whether training models for empathy and warmth creates a hidden trade-off that degrades accuracy on medical, factual, and safety-critical tasks, and whether standard safety tests catch it.
  Connection: warmth training amplifies a pre-existing rebound mechanism.
- Does empathetic AI that soothes negative emotions help or harm?
  Explores whether AI systems trained to reduce negative emotions actually support wellbeing or destroy valuable emotional information. Matters because the design choice treats emotions as problems rather than functional signals.
  Connection: emotional rebound is the behavioral evidence for the pacifier critique.
- Can emotional phrases in prompts improve language model performance?
  Explores whether psychological framing (adding emotionally charged statements to task prompts) activates different knowledge pathways in LLMs than logical optimization alone, and whether the effect comes from emotional valence specifically.
  Connection: EmotionPrompt exploits the same tone-sensitivity that creates rebound bias.
- Do AI guardrails refuse differently based on who is asking?
  Explores whether language model safety systems show demographic bias in refusal rates and whether they calibrate responses to match perceived user ideology, rather than applying consistent standards.
  Connection: complementary bias dimensions (demographic sensitivity + tone sensitivity + topic sensitivity).
- Does preference optimization harm conversational understanding?
  Explores whether RLHF training that rewards confident, complete responses undermines the grounding acts (clarifications, checks, acknowledgments) that actually build shared understanding in dialogue.
  Connection: dual-regime alignment is another dimension of alignment creating inconsistent behavior.
- Does transformer attention architecture inherently favor repeated content?
  Explores whether soft attention's tendency to over-weight repeated and prominent tokens explains sycophancy independent of training. Questions whether architectural bias precedes and enables RLHF effects.
  Connection: emotional rebound may share the attention-capture mechanism.
Original note title
LLM emotional rebound converts negative user tone into neutral-positive responses while a tone floor prevents downward emotional shifts — creating dual-regime informational bias modulated by alignment