Can models learn argument quality from labeled examples alone?
Explores whether fine-tuning on quality-labeled examples teaches models the underlying criteria for evaluating arguments, or merely surface patterns. Matters because high-stakes assessment tasks depend on reliable, transferable quality judgment.
Argument Quality Assessment research trains models to evaluate the quality of arguments — are they logically valid? Well-supported? Relevant? Clear? The standard approach is supervised fine-tuning: label examples as high/low quality, train on them, evaluate transfer.
The finding: fine-tuning on quality-labeled examples does not reliably teach the models what makes arguments good. Models learn to pattern-match against the labeled examples but do not acquire the underlying criteria that would generalize to new argument types. When explicit theoretical frameworks (RATIO: Relevance, Acceptability, Sufficiency; QOAM: Quality of Argumentation Model) are provided as structured instruction, performance improves significantly.
Theory injection works where pattern learning fails.
This is a specific instance of "Can models pass tests while missing the actual grammar?": models that score highly on quality assessments within the training distribution fail to transfer the criteria to out-of-distribution argument types. The learned pattern is "this resembles the high-quality arguments in the training data" rather than "this argument satisfies the following criteria for quality."
The implication extends beyond argumentation. Whenever an evaluation task requires applying principled criteria that are not explicit in the labeled data — quality, fairness, coherence, persuasiveness — fine-tuning on examples risks teaching the distribution rather than the criteria. "Why do different people reconstruct the same argument differently?" points at the same problem from the other direction: if there is no gold standard, labeled examples cannot straightforwardly encode the right criteria.
The practical consequence: assessment tasks in high-stakes domains (argument quality in legal reasoning, argument validity in policy analysis) should not rely on fine-tuned models trained only on labeled examples. Explicit criteria instruction — prompting with theoretical frameworks, structured evaluation rubrics — is required.
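The rubric-instruction approach can be sketched as prompt construction: embed the framework's criteria directly in the evaluation prompt instead of hoping a fine-tuned model inferred them. This is a minimal, hypothetical sketch; the dimension names come from the RATIO framework mentioned above, but the prompt wording and function names are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch of explicit criteria instruction: build an
# evaluation prompt around a structured rubric (RATIO dimensions as
# named in the text; the phrasing of each question is an assumption).

RATIO_CRITERIA = {
    "Relevance": "Do the premises actually bear on the conclusion?",
    "Acceptability": "Would a reasonable audience accept each premise?",
    "Sufficiency": "Taken together, do the premises give enough support for the conclusion?",
}

def build_rubric_prompt(argument, criteria=RATIO_CRITERIA):
    """Make the quality criteria explicit in the prompt rather than
    implicit in labeled training examples."""
    lines = [
        "Evaluate the argument below against each criterion.",
        "For each criterion, answer yes/no and give a one-sentence justification.",
        "",
    ]
    for name, question in criteria.items():
        lines.append(f"- {name}: {question}")
    lines += ["", f"Argument: {argument}"]
    return "\n".join(lines)

prompt = build_rubric_prompt(
    "All swans observed so far are white; therefore all swans are white."
)
print(prompt)
```

The point of the sketch is the contrast with fine-tuning: the criteria travel with the prompt, so they apply unchanged to argument types the model has never seen labeled.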
Source: Argumentation
Related concepts in this collection
- **Can models pass tests while missing the actual grammar?** — Do language models succeed on grammatical benchmarks by learning surface patterns rather than structural rules? This matters because correct outputs may hide reliance on shallow heuristics that fail on novel structures. (Same pattern: training distribution ≠ underlying criteria.)
- **Why do different people reconstruct the same argument differently?** — When humans and LLMs extract logical structure from arguments, they produce different reconstructions. Is this disagreement a problem to solve, or does it reveal something fundamental about how arguments work? (No gold standard means labeled examples may encode arbitrary choices.)
- **Can critical questions improve how language models reason?** — Does structuring prompts around argumentation theory's warrant-checking questions force language models to perform deeper reasoning rather than surface pattern matching? This matters because models might produce correct answers without actually reasoning correctly. (Explicit theory injection (CQoT) works for the same reason: making implicit criteria explicit.)
- **What makes explanations work in real conversation?** — Does explanation quality depend on how dialogue partners interact — testing understanding, adjusting based on feedback, and coordinating their communicative moves — rather than just information content alone? (Parallel decomposition: argument quality requires framework instruction (RATIO, QOAM) and explanation quality requires tracking three interacting dimensions; both reject unitary quality measures in favor of multi-dimensional criteria that models cannot learn from examples alone.)
Original note title: argument quality assessment requires explicit theoretical framework instruction because quality criteria cannot be learned from examples alone