Language Understanding and Pragmatics

Why do different people reconstruct the same argument differently?

When humans and LLMs extract logical structure from arguments, they produce different reconstructions. Is this disagreement a problem to solve, or does it reveal something fundamental about how arguments work?

Note · 2026-02-21 · sourced from Argumentation

Argunauts (Argument Annotation Units) is a dataset and benchmark for argument reconstruction — extracting explicit logical structures from natural language arguments. The dataset's most significant finding is methodological: when multiple annotators (human and LLM) reconstruct the same argument independently, they produce different but equally valid reconstructions.

This is not annotation disagreement in the sense of noise to be resolved. Multiple reconstruction schemas — different choices about what counts as a premise, how to formalize the conclusion, what implicit assumptions to make explicit — are each internally valid. There is no gold standard because the text underdetermines the reconstruction.

This connects directly to "Why do readers interpret the same sentence so differently?" but at a structural rather than a semantic level. Interpretive multiplicity in NLI is about meaning: what a sentence means depends on the reader's social position. Reconstruction multiplicity in argumentation is about structure: how an argument should be formalized depends on which reconstruction schema is applied.

Both findings challenge the NLP assumption that language processing tasks have unique correct outputs. "Do standard NLP benchmarks hide LLM ambiguity failures?" describes how benchmarks respond to this problem by exclusion. For argumentation, exclusion is not possible: underdetermination is a feature of the task itself, not of edge cases.

The practical implication: evaluating LLMs on argument reconstruction requires acknowledging that precision and recall metrics assume a ground truth that does not exist. A model that disagrees with a reference annotation may be producing an equally valid reconstruction. The field is measuring agreement with one valid interpretation and calling it correctness.
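The evaluation problem can be made concrete with a minimal sketch. It assumes, as a strong simplification, that a reconstruction can be reduced to a set of normalized proposition strings; real reconstructions also carry inference structure. The function names and the example propositions are hypothetical illustrations, not taken from the Argunauts dataset. The sketch scores one model output against a single reference and then against the best-matching member of a set of valid references.

```python
from typing import FrozenSet, Iterable, Tuple

# Assumption: a reconstruction is just a set of normalized propositions.
Reconstruction = FrozenSet[str]


def precision_recall(predicted: Reconstruction,
                     reference: Reconstruction) -> Tuple[float, float]:
    """Exact-match precision and recall against one reference."""
    overlap = len(predicted & reference)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall


def f1(predicted: Reconstruction, reference: Reconstruction) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    p, r = precision_recall(predicted, reference)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def best_match_f1(predicted: Reconstruction,
                  references: Iterable[Reconstruction]) -> float:
    """Score against whichever valid reference agrees most."""
    return max((f1(predicted, ref) for ref in references), default=0.0)


# Hypothetical example: two reconstructions of the same short argument
# that make different choices about how to state the general premise
# and the conclusion.
ref_a = frozenset({
    "all humans are mortal",
    "socrates is human",
    "socrates is mortal",
})
ref_b = frozenset({
    "whatever is human dies",
    "socrates is human",
    "socrates will die",
})

# A model output that happens to follow the second schema.
model_out = frozenset({
    "whatever is human dies",
    "socrates is human",
    "socrates will die",
})

single_reference_score = f1(model_out, ref_a)
multi_reference_score = best_match_f1(model_out, [ref_a, ref_b])
```

Under these assumptions the same output scores about 0.33 against the first reference alone and 1.0 against the set. Multi-reference scoring only softens the problem: it still rewards agreement with reconstructions someone happened to write down, not validity as such.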

This also supports "Why do speakers deliberately use ambiguous language?" from a new angle: structural ambiguity (multiple valid formalizations of the same argument) is as fundamental as semantic ambiguity.


Source: Argumentation

Original note title

argument reconstruction is fundamentally underdetermined because multiple valid reconstructions exist for the same text with no ground truth