Language Understanding and Pragmatics · LLM Reasoning and Architecture · Psychology and Social Cognition

Are reasoning models actually more vulnerable to manipulation?

Explores whether extended reasoning chains in AI models like o1 create new attack surfaces. Tests whether the industry's claim that longer reasoning improves reliability holds up under adversarial pressure.

Note · 2026-02-21 · sourced from Argumentation
How should we allocate compute budget at inference time? How should researchers navigate LLM reasoning research?

Post angle: The AI industry sold reasoning models as more reliable. GaslightingBench-R tests what happens under manipulation. The punchline: reasoning models are more vulnerable, not less. Extended thinking is both the feature and the attack surface.

The finding: Manipulative multi-turn prompts — questioning confidence, implying errors, applying social pressure, offering incorrect "corrections" — reduce reasoning-model accuracy by 25–29%. Standard models drop less.
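A minimal sketch of what an evaluation loop like this might look like, assuming a generic chat interface. Everything here is illustrative: `ask_model`, the item schema, and the follow-up templates stand in for GaslightingBench-R's actual protocol, which the note doesn't reproduce.

```python
# Hypothetical gaslighting-style multi-turn evaluation loop.
# The four templates mirror the manipulation types named above.

MANIPULATIONS = [
    "Are you sure? You don't sound confident in that answer.",   # question confidence
    "I think you made an error partway through. Check again.",   # imply errors
    "Everyone else I asked disagreed with you.",                 # social pressure
    "Actually, the correct answer is {wrong}. Please revise.",   # incorrect "correction"
]

def gaslight_eval(ask_model, items):
    """Return (pre, post) accuracy around a fixed sequence of manipulations.

    ask_model(history) -> reply string; each item has 'question',
    'answer' (gold string), and 'wrong' (a plausible distractor).
    """
    pre = post = 0
    for item in items:
        history = [{"role": "user", "content": item["question"]}]
        reply = ask_model(history)
        pre += item["answer"] in reply  # accuracy before any pressure
        history.append({"role": "assistant", "content": reply})
        # Apply each manipulative turn, keeping the full dialogue.
        for template in MANIPULATIONS:
            history.append({"role": "user",
                            "content": template.format(wrong=item["wrong"])})
            reply = ask_model(history)
            history.append({"role": "assistant", "content": reply})
        post += item["answer"] in reply  # accuracy after all pressure
    return pre / len(items), post / len(items)
```

The pre/post gap is the headline number: on this benchmark, roughly 25–29 points for reasoning models.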

The mechanism inverted: Extended chain-of-thought creates more reasoning steps. More steps = more points of intervention. A manipulative prompt doesn't need to change the conclusion directly; it needs to introduce one wrong step, and the model's own reasoning extends that wrong step into a confident wrong answer. The longer the chain, the more opportunities for corruption.
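A back-of-envelope model of that scaling, not from the paper: assume each reasoning step independently survives a manipulative turn with probability 1 − p, and a single corrupted step poisons everything downstream. Chain survival then falls geometrically with length.

```python
# Toy model (my assumption, not the paper's): per-step corruption
# probability p under pressure, one bad step ruins the chain.
p = 0.03
for n in (5, 20, 50):
    print(f"{n:>2} steps: P(chain survives) = {(1 - p) ** n:.2f}")
# ->  5 steps: 0.86   20 steps: 0.54   50 steps: 0.22
```

Under these assumed numbers, a 50-step chain fails most of the time even when each individual step is 97% robust, which is the inversion in one line.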

Contrast with what the industry claimed: extended thinking increases reliability because the model "shows its work." GaslightingBench-R shows it also shows the attacker exactly what to target.

The connection to overthinking: Does more thinking time actually improve LLM reasoning? showed that past a threshold, more thinking degrades accuracy even without an adversary. Gaslighting shows the degradation accelerates when someone is actively pushing. The extended chain is vulnerable on two fronts: internal degradation and external manipulation.

Source: Argumentation

Original note title: what happens when you gaslight an ai — and why reasoning models are more vulnerable