Can LLMs identify the hidden assumptions that make arguments work?
LLMs recognize what arguments claim and what evidence they offer, but struggle to identify implicit warrants—the unstated principles that connect evidence to conclusion. This matters because valid reasoning requires understanding these hidden logical bridges.
Toulmin's argument model distinguishes claim, data, and warrant. The claim is what is being argued. The data is the evidence. The warrant is the often-unstated principle connecting data to claim — the implicit assumption that makes the inference valid.
In natural language, warrants are almost never stated. When someone argues "This policy failed in Europe, so it will fail here," the unstated warrant is something like "contexts similar to Europe will produce similar outcomes." Evaluating the argument requires identifying this warrant and assessing its validity in context.
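To make the structure concrete, here is a minimal sketch in Python of the Toulmin triple applied to the policy argument above. The ToulminArgument class and field names are illustrative choices of mine, not anything from the source; the point is that the warrant slot is the part that is usually empty in the argument as actually voiced.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ToulminArgument:
    """Toulmin's triple: claim, data, and the (usually implicit) warrant."""
    claim: str               # what is being argued
    data: str                # the evidence offered in support
    warrant: Optional[str]   # the connecting principle; None when left unstated

# The policy argument as it is usually voiced: warrant omitted.
as_stated = ToulminArgument(
    claim="This policy will fail here.",
    data="This policy failed in Europe.",
    warrant=None,
)

# The same argument with its warrant made explicit -- the part an evaluator
# must reconstruct before judging whether the data supports the claim.
reconstructed = ToulminArgument(
    claim=as_stated.claim,
    data=as_stated.data,
    warrant="Contexts similar to Europe will produce similar outcomes.",
)
```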
The Argument Reasoning Comprehension task tests this capability directly. LLMs perform well on identifying the explicit claim-data structure — recognizing what is being argued and what evidence is offered. They fail significantly at supplying or evaluating the implicit warrant. The gap between structural recognition and warrant identification is large.
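Below is a hedged sketch of how a warrant-selection probe in this style might be set up; it is not the actual ARCT dataset or its scoring script. ARCTItem, ask_model, and the example item are hypothetical names introduced for illustration. The format asks the model to pick which of two candidate warrants actually licenses the step from reason to claim.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ARCTItem:
    """A warrant-selection item: pick the assumption that licenses
    the step from reason to claim."""
    claim: str
    reason: str
    warrant_options: List[str]   # two candidates; only one supports the claim
    correct_index: int

def score_warrant_selection(items: List[ARCTItem],
                            ask_model: Callable[[str], int]) -> float:
    """Fraction of items where the model picks the correct warrant.

    `ask_model` is a hypothetical wrapper around an LLM call that returns
    the index of the warrant the model chose.
    """
    correct = 0
    for item in items:
        prompt = (
            f"Claim: {item.claim}\n"
            f"Reason: {item.reason}\n"
            "Which unstated assumption is required for the reason "
            "to support the claim?\n"
            + "\n".join(f"{i}: {w}" for i, w in enumerate(item.warrant_options))
        )
        if ask_model(prompt) == item.correct_index:
            correct += 1
    return correct / len(items)

# Illustrative item (not from the dataset): the distractor is topical and
# plausible-sounding, but it does not license the inference -- which is why
# recognizing the claim-data structure alone is not enough to solve it.
example = ARCTItem(
    claim="This policy will fail here.",
    reason="This policy failed in Europe.",
    warrant_options=[
        "Contexts similar to Europe will produce similar outcomes.",
        "The policy was expensive to implement in Europe.",
    ],
    correct_index=0,
)
```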
This is a different failure from the one described in "Why does ChatGPT fail at implicit discourse relations?". That finding concerns discourse relations (because, therefore, although). This one concerns argumentative inference: the background knowledge required to evaluate whether the data actually supports the claim. Both are implicit-structure failures, but at different levels.
The failure is not simply an absence of world knowledge. "Do language models actually use their encoded knowledge?" suggests the relevant knowledge may be encoded but not accessed when needed. Warrant identification requires activating world knowledge in response to argumentative context, a different retrieval trigger than direct factual recall.
Practically: LLMs can generate the surface form of argumentation (claim, evidence, conclusion) without the inferential work that makes the argumentation valid. They can look like they are reasoning about arguments without engaging with the warrants that determine whether the arguments hold.
Source: Argumentation
Related concepts in this collection
- Why does ChatGPT fail at implicit discourse relations?
  ChatGPT excels when discourse connectives are present but drops to 24% accuracy without them. What does this gap reveal about how LLMs actually process meaning and logical relationships?
  Same implicit-structure failure at the discourse level; this note covers the argumentative-inference level.
- Do language models actually use their encoded knowledge?
  Probes can detect that LMs encode facts internally, but do those encoded facts causally influence what the model generates? This explores the gap between knowing and doing.
  Knowledge encoded but not causally active in warrant retrieval.
- Can large language models translate natural language to logic faithfully?
  This explores whether LLMs can convert natural language statements into formal logical representations without losing meaning. It matters because faithful translation is essential for any AI system that reasons formally or verifies specifications.
  Related failure: surface form of logic without semantic content.
- Can critical questions improve how language models reason?
  Does structuring prompts around argumentation theory's warrant-checking questions force language models to perform deeper reasoning rather than surface pattern matching? This matters because models might produce correct answers without actually reasoning correctly.
  The intervention that targets this gap.
Original note title: implicit warrants in argumentation require world knowledge that llms cannot reliably supply even when surface argument structure is correctly identified