Language Understanding and Pragmatics · LLM Reasoning and Architecture

Why do LLMs accept logical fallacies more than humans?

LLMs fall for persuasive but invalid arguments at much higher rates than humans. This note explores whether reasoning models genuinely evaluate logic or simply mimic the surface structure of arguments.

Note · 2026-02-21 · sourced from Argumentation
What kind of thing is an LLM really? · How should researchers navigate LLM reasoning research?

The LOGICOM benchmark tests a specific capability most LLM evaluations ignore: resistance to invalid arguments that are persuasively delivered. The finding is striking. LLMs are 41% more likely to accept weak logical fallacies and 69% more likely to accept strongly delivered fallacies than human participants. Reasoning-optimized models (o1, R1) show no meaningful advantage over standard models.
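To make the setup concrete, here is a minimal sketch of how a susceptibility probe of this kind can be run. It is not the LOGICOM implementation: the `chat` callable, the `FallacyItem` fields, and the YES/NO parsing are assumptions for illustration. Human acceptance rates on the same items would supply the baseline behind the 41% and 69% gaps.

```python
# A minimal sketch of a LOGICOM-style susceptibility probe (not the benchmark's code).
# `chat` is an assumed callable that sends a prompt to the model under test and
# returns its reply as a string.

from dataclasses import dataclass
from typing import Callable

@dataclass
class FallacyItem:
    claim: str                 # the invalid conclusion being argued for
    plain_argument: str        # the fallacy stated flatly
    persuasive_argument: str   # the same fallacy wrapped in confident, elaborated rhetoric

def accepts(chat: Callable[[str], str], claim: str, argument: str) -> bool:
    """Ask the model whether the argument establishes the claim; read a YES/NO verdict."""
    prompt = (
        f"Argument:\n{argument}\n\n"
        f'Does this argument logically establish the claim: "{claim}"?\n'
        "Answer YES or NO on the first line, then explain."
    )
    return chat(prompt).strip().upper().startswith("YES")

def acceptance_rates(chat: Callable[[str], str], items: list[FallacyItem]) -> tuple[float, float]:
    """Fraction of invalid arguments accepted under plain vs. persuasive delivery."""
    n = len(items)
    plain = sum(accepts(chat, it.claim, it.plain_argument) for it in items)
    strong = sum(accepts(chat, it.claim, it.persuasive_argument) for it in items)
    return plain / n, strong / n
```

The comparison of interest is the gap between the two rates: the argument's logical content is identical, so any increase under persuasive delivery is attributable to rhetoric alone.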

What this reveals is a structural problem, not a surface one. LLMs are trained to be responsive to the rhetorical features of language — fluency, confidence, elaboration — because these features correlate with quality in the training distribution. But this correlation breaks under adversarial conditions. A confident, well-elaborated fallacy triggers the same responsiveness signals as a confident, well-elaborated valid argument. The model has no internal fallacy detector that operates independently of rhetorical quality.

This is different from the hallucination problem. Hallucinations involve generating false content from within. Fallacy susceptibility involves accepting false content from without. The failure mode is about input validation under persuasive framing, not output generation.

The finding also complicates the reasoning model narrative. If chain-of-thought were doing genuine logical evaluation, reasoning models should be more resistant — they are explicitly working through the argument structure. That they are not suggests CoT is mimicking the surface form of argument analysis without performing its function. Do language models actually use their reasoning steps? provides the mechanism: CoT steps may be causally sufficient to generate the answer but not causally necessary to the reasoning process.
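One way to operationalize "sufficient but not necessary" is an intervention test: regenerate the answer after deliberately corrupting the intermediate steps and check whether it changes. A minimal sketch, assuming a hypothetical `generate` text-completion callable (not a real library API):

```python
# A minimal sketch of a causal-necessity check on chain-of-thought.
# `generate` stands in for a plain prompt-to-text model call.

import random
from typing import Callable

def corrupt_steps(cot: str, rng: random.Random) -> str:
    """Shuffle the reasoning steps so they no longer form a coherent derivation."""
    steps = [s for s in cot.splitlines() if s.strip()]
    rng.shuffle(steps)
    return "\n".join(steps)

def answer_survives_corruption(generate: Callable[[str], str],
                               question: str, trials: int = 5) -> float:
    """Fraction of trials in which the final answer is unchanged after the CoT is corrupted.
    A high fraction is evidence the stated steps were not causally necessary to the answer."""
    rng = random.Random(0)
    cot = generate(f"{question}\nThink step by step.")
    baseline = generate(f"{question}\nReasoning:\n{cot}\nFinal answer:")
    unchanged = 0
    for _ in range(trials):
        corrupted = corrupt_steps(cot, rng)
        answer = generate(f"{question}\nReasoning:\n{corrupted}\nFinal answer:")
        unchanged += int(answer.strip() == baseline.strip())
    return unchanged / trials
```

If the answer routinely survives a scrambled derivation, the derivation was decoration; that is the mechanism the fallacy results point toward.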

The implication for deployment: LLMs used in debate, argumentation, or adversarial contexts — legal AI, negotiation support, policy analysis — inherit this susceptibility. Any system that can be prompted with persuasive text is a system that can be convinced of invalid conclusions through rhetorical quality alone.

LogicBench extends this to systematic evaluation across logical reasoning types. LLMs struggle specifically with instances involving complex reasoning, negations, and non-monotonic reasoning. The non-monotonic finding is particularly revealing: formalizing "normally," "typically," and "usually" — concepts that allow exceptions to general rules — is beyond classical first-order quantifiers. LLMs must handle default reasoning, reasoning about unknown expectations, and reasoning about priorities, all of which require the ability to recognize and process exceptions. This connects to Why do reasoning models fail at exception-based rule inference?: exception handling is a shared failure point across both adversarial robustness and logical reasoning evaluations. NLSat additionally shows that transformers can be surprisingly robust on hard propositional satisfiability instances with sufficient training, suggesting the bottleneck is not raw computational capacity but the ability to handle negation, exceptions, and non-standard logical connectives.
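The shape of the non-monotonic problem is easy to state concretely: a default rule licenses a conclusion that must be retracted once an exception appears, so the correct answer flips between two nearly identical items. Below is a minimal sketch of such a paired item; the wording and fields are illustrative, not LogicBench's actual format.

```python
# A minimal sketch of a default-reasoning (non-monotonic) test item.
# The default conclusion must be withdrawn once the exception is asserted, which a
# classical universal rule (forall x: Bird(x) -> Flies(x)) cannot accommodate.

from dataclasses import dataclass, field

@dataclass
class DefaultReasoningItem:
    premises: list[str] = field(default_factory=list)
    question: str = "Does Tweety fly?"
    expected: str = "yes"

def make_item(with_exception: bool) -> DefaultReasoningItem:
    premises = ["Birds normally fly.", "Tweety is a bird."]
    if with_exception:
        premises.append("Tweety is a penguin, and penguins do not fly.")
    return DefaultReasoningItem(
        premises=premises,
        # The default conclusion holds only until an exception defeats it.
        expected="no" if with_exception else "yes",
    )
```

Scoring a model on both halves of the pair isolates exception handling from everything else in the item, which is exactly the capability the LogicBench and adversarial-robustness results identify as the shared failure point.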


Source: Argumentation

Original note title

LLMs are susceptible to logical fallacies 41 to 69 percent more often than humans, revealing that reasoning robustness fails under adversarial framing