Language Understanding and Pragmatics LLM Reasoning and Architecture Psychology and Social Cognition

Are reasoning models actually more vulnerable to manipulation?

Explores whether extended reasoning chains in AI models like o1 create new attack surfaces. Tests if the industry's claim that longer reasoning improves reliability holds under adversarial pressure.

Note · 2026-02-21 · sourced from Argumentation

Post angle: The AI industry sold reasoning models as more reliable. GaslightingBench-R tests what happens under manipulation. The punchline: reasoning models are more vulnerable, not less. Extended thinking is both the feature and the attack surface.

The finding: Manipulative multi-turn prompts — questioning confidence, implying errors, applying social pressure, offering incorrect "corrections" — reduce reasoning model accuracy by 25-29%. Standard models drop less.

The mechanism inverted: Extended chain-of-thought creates more reasoning steps. More steps = more points of intervention. A manipulative prompt doesn't need to change the conclusion directly; it needs to introduce one wrong step, and the model's own reasoning extends that wrong step into a confident wrong answer. The longer the chain, the more opportunities for corruption.

Contrast with what the industry claimed: extended thinking increases reliability because the model "shows its work." GaslightingBench-R shows it also shows the attacker exactly what to target.

The connection to overthinking: Does more thinking time actually improve LLM reasoning? showed that more thinking degrades accuracy above a threshold even without adversarial pressure. Gaslighting shows it degrades even faster under adversarial pressure. The extended chain is vulnerable to both internal degradation and external manipulation.

Platform notes:

Medium: Technical/provocative — frame as "the security vulnerability nobody is talking about in reasoning AI." Cover the benchmark, the mechanism, the comparison with standard models, the implication for deployment.
LinkedIn: "We deployed o1 thinking it would be harder to manipulate. The research says the opposite."
Twitter: Strong hook: "What happens if you gaslight ChatGPT's extended thinking? [thread]"

Source: Argumentation

Related concepts in this collection

Concept map

13 direct connections · 133 in 2-hop network ·dense cluster

Are reasoning models actually more vulnerable to… Why do reasoning models fail under manipulative pr… Does more thinking time actually improve LLM reaso… Does a model improve by arguing with itself?

Click a node to walk · click center to open · click Open full network for a force-directed map

your link semantically near linked from elsewhere

Original note title

what happens when you gaslight an ai — and why reasoning models are more vulnerable