Do language models fail reasoning tests that humans pass?
Standard critiques claim LLMs lack real reasoning ability, but do humans actually perform better on content-independent reasoning tasks? Examining whether the cognitive bar differs for artificial versus human intelligence.
Lampinen et al. relitigate a fifty-year cognitive-science debate using LLM behavior as the new evidence. The classical symbolist line (Marcus, Fodor) defines abstract reasoning as content-independent: "X is bigger than Y" implies "Y is smaller than X" regardless of what X and Y are, and a system whose reasoning depends on the values of X and Y is not really reasoning. By that criterion, current LLMs fail. But the inconvenient parallel evidence Lampinen et al. marshal is that humans fail it too: across the Wason selection task, syllogisms, and natural language inference (NLI), human reasoning is heavily content-sensitive in exactly the patterns LMs show.
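To make the criterion concrete, here is the symbolist schema in first-order notation, followed by a belief-bias syllogism pair in the style of the stimuli this literature uses (illustrative wording, not the paper's exact items):

$$\forall x\,\forall y\;\bigl(\mathrm{Bigger}(x,y)\rightarrow\mathrm{Smaller}(y,x)\bigr)$$

A purely formal reasoner applies the schema uniformly: it accepts any valid argument and rejects any invalid one, whatever the terms denote. Content-sensitive reasoners, human and LM alike, tend to do the reverse on mismatched pairs:

- Valid but unbelievable: all mammals can walk; whales are mammals; therefore whales can walk. (Frequently rejected.)
- Invalid but believable: all roses need water; this plant needs water; therefore this plant is a rose. (Frequently accepted.)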
The conclusion forks: either the criterion is wrong, or human cognition isn't doing what the symbolist account claims it does. Lampinen et al. lean toward the former: if humans and LMs both succeed and fail along the same content-form axis, the connectionist account, in which inferences are grounded in learned semantics, may describe both better than the symbolist account describes either. This converges with *llm semantic grounding is tri-partite (functional grounding is strong, social grounding is weak, causal grounding is indirect)*: the grounding picture is more nuanced than "absent or present," and the same nuance applies to human reasoning, just with different mixtures.
For Language as Event, this insight is load-bearing. The standard critique ("LLMs don't really reason, they just match patterns") collapses into a parallel claim about humans: we also don't reason in pure logical form; we reason in patterns weighted by semantic content, and we reach correct logical conclusions partly because the content happens to support them. In Saussurean terms, there is no actual reasoner that operates over pure langue. Reasoning always happens in parole, in particular utterances with particular content. The content-effects literature is the empirical evidence that the langue/parole separation breaks down at the cognitive level too, not just at the linguistic level.
The symmetry claim does not absolve LLMs of their distinctive failure modes. It does block one specific framing: "LLMs fail where humans succeed" is not what the data show. The data show both systems succeeding and failing along the same content-form axis. Where they diverge is elsewhere: in the capacity to override content effects on demand, in the handling of novel structure, in the relation to grounded experience. Content-sensitivity itself is shared, and any criterion that uses it to distinguish real reasoning from fake reasoning disqualifies humans too.
Source: Linguistics, NLP, NLU. Paper: "Language models show human-like content effects on reasoning tasks" (Lampinen et al.).
Related concepts in this collection
- Do large language models reason symbolically or semantically? (same property described from the LLM side)
  Can LLMs follow explicit logical rules when those rules contradict their training knowledge? Testing whether reasoning operates independently of semantic associations reveals what computational mechanisms actually drive LLM multi-step inference.
Original note title: content-independence is the wrong target — the symbolic-versus-connectionist debate dissolves once content effects are recognized as ubiquitous in both humans and LMs