Why do paraphrased definitions work better than expert ones?
When instructing LLMs to classify argument schemes, should we use formal Walton definitions or LLM-generated paraphrases? This explores which source better enables reliable scheme recognition and why.
When the task is to tell an LLM what an argument scheme is so it can recognize one, two strategies are available: paste in the formal Walton definition (the normative source) or generate a description with another LLM (an operational paraphrase). Intuition says the formal definition wins: it is the source of truth, written by domain experts. The evaluation shows the opposite. Prompts built from LLM-generated descriptions yield better classification performance than prompts built from formal definitions.
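The two strategies differ only in which description text ends up in the classification prompt. A minimal sketch of that setup follows, under loud assumptions: the function names, the prompt wording, and the placeholder strings are hypothetical and are not taken from the evaluated paper, and the scheme name is used only as an illustrative label.

```python
from typing import Dict


def paraphrase_request(scheme_name: str, formal_definition: str) -> str:
    """Build the request for a describing LLM (hypothetical wording)."""
    return (
        f"Rewrite the following definition of the argument scheme "
        f"'{scheme_name}' in plain language, with one short example.\n\n"
        f"Definition:\n{formal_definition}"
    )


def classification_prompt(descriptions: Dict[str, str], argument: str) -> str:
    """Build a classification prompt from a scheme-name -> description map."""
    lines = ["Classify the argument into exactly one of these schemes:", ""]
    for name, description in descriptions.items():
        lines.append(f"- {name}: {description}")
    lines += ["", f"Argument: {argument}", "Scheme:"]
    return "\n".join(lines)


# Placeholders only: real text would come from the normative source
# and from the describing LLM's output, respectively.
formal = {"Argument from Expert Opinion": "<formal definition from the normative source>"}
paraphrased = {"Argument from Expert Opinion": "<description produced by the describing LLM>"}
argument = "<argument text to classify>"

formal_prompt = classification_prompt(formal, argument)
paraphrase_prompt = classification_prompt(paraphrased, argument)
```

Keeping the template fixed and swapping only the description map means any difference in classification accuracy can be attributed to the description source rather than to the surrounding prompt wording.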
The mechanism is worth taking seriously because it inverts a common assumption in prompt engineering. Formal definitions are written for readers who already share a technical vocabulary. They presuppose the reader can decode terms like "presumptive inference," "warrant," and "defeasible conclusion." An LLM-generated description rewrites the scheme in language closer to what models see during training: less precise, more redundant, anchored to familiar examples and phrasings. On this account, the classifying model reads an LLM-written paraphrase more reliably than it reads the original.
This is operationalization-beats-definition as a prompting principle. The same lesson appears in instruction-tuned datasets where rewriting expert instructions in conversational style outperforms preserving the original. The model is not "dumb" for failing on the formal definition; it is reading the definition through a distribution shaped by web text, where formal logical vocabulary is rare. Paraphrasing into the training distribution is the cheap fix.
The deeper implication is that normative sources and operational prompts are different artifacts. A normative source aims for unambiguous truth; an operational prompt aims for reliable behavior. The two optimize different objectives and produce different texts. For task instructions, optimize for the second.
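Optimizing for reliable behavior implies choosing between prompt variants by measured accuracy on labeled examples, not by how authoritative the text looks. The harness below is a sketch of that selection step. It assumes a hypothetical `classify` callable that wraps whatever model is in use and returns a scheme label; none of the names come from the evaluated paper.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (argument text, gold scheme label)


def accuracy(
    classify: Callable[[str], str],
    build_prompt: Callable[[str], str],
    examples: List[Example],
) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    if not examples:
        raise ValueError("need at least one labeled example")
    correct = sum(
        1 for text, gold in examples
        if classify(build_prompt(text)).strip() == gold
    )
    return correct / len(examples)


def pick_variant(
    classify: Callable[[str], str],
    variants: Dict[str, Callable[[str], str]],
    examples: List[Example],
) -> Tuple[str, Dict[str, float]]:
    """Score every prompt variant and return the best name plus all scores."""
    scores = {
        name: accuracy(classify, build, examples)
        for name, build in variants.items()
    }
    best = max(scores, key=lambda name: scores[name])
    return best, scores
```

In use, `variants` would map names such as `"formal"` and `"paraphrase"` to prompt builders that differ only in the description text, and `examples` would be a held-out set of labeled arguments.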
Related concepts in this collection
- Can large language models classify argument schemes reliably?
  Explores whether LLMs can recognize Walton's 60+ argument schemes (abstract patterns of reasoning rather than surface features) and what conditions enable accurate classification.
  same paper, the size-and-format dependency that motivates description-based prompting
- Can structured argument prompts make LLM reasoning more rigorous?
  Does requiring language models to explicitly check warrants, backing, and rebuttals, rather than reasoning freely, improve reasoning quality and catch failures that standard step-by-step prompting misses?
  another case where operationalizing argument theory into prompt structure beats handing models the theory directly
Original note title
LLM-generated descriptions of argument schemes outperform formal Walton definitions for prompting scheme classification