INQUIRING LINE

Why does describing a process differ fundamentally from arguing about evidence?

This explores the gap between two epistemic acts the corpus keeps pulling apart: laying out how something works (description) versus making a case that should be weighed against counter-claims (argument) — and why blurring them is where the trouble starts.


"Here's how it works" and "here's why you should believe me" are not the same kind of speech act, and the collection's sharpest point is that the difference is mostly invisible until someone exploits it. A description presents itself as settled background; an argument announces that it wants to be evaluated. The most direct treatment is the finding that AI explanations are adoption arguments disguised as technical descriptions (see "Are AI explanations really descriptions or adoption arguments?"): by wearing the clothes of a neutral process account, a persuasive claim quietly inherits the credibility we extend to factual descriptions, skipping the scrutiny an open argument would invite. The related work reframing explanation as a communication problem makes the same move from the other side: explanation quality isn't intrinsic to the explanation; it lives in who says it, how it's framed, and who's receiving it (see "What if XAI is fundamentally a communication problem?").

The mechanism by which descriptions dodge evaluation shows up vividly in the work on presuppositions: claims smuggled in as already-accepted background persuade better than the same claims stated outright, precisely because they bypass evaluative scrutiny (see "Why are presuppositions more persuasive than direct assertions?"). That's the descriptive register weaponized: present a contestable claim as if it were just how things are. An argument, by contrast, exposes its joints. The note on argument reconstruction shows that arguments are inherently underdetermined: the same text supports multiple valid reconstructions with no ground truth (see "Why do different people reconstruct the same argument differently?"). Description aspires to one correct account of a process; argument lives with the fact that reasonable people will formalize the same case differently. Those are structurally different relationships to truth.

There's also a knowledge-type story underneath this. Procedural knowledge, the transferable how-to that drives reasoning, is broad and generalizable, while factual recall depends on narrow, document-specific memorization (see "Does procedural knowledge drive reasoning more than factual retrieval?"). Describing a process draws on the first; arguing about evidence draws on the second, because it requires anchoring to specific, verifiable facts. This is why arguing is harder for systems that are good at producing fluent process accounts: the collection argues that AI debate operates by ranking chain-of-thought probabilities, whereas human debate is settled by argument quality, social authority, and trust (see "How do LLM debates differ from human expert consensus?"). The force of a real argument depends on the standing of the thinker, meaning reputation and track record, which a model processing only text cannot see (see "Can language models distinguish expert arguments from common assumptions?").

Follow that thread and you reach the deepest version of the worry. If AI output is structurally hearsay, that is, testimony at a remove, modified in each retelling, unattributable to a stable source (see "Does AI-generated knowledge have the same structure as hearsay?"), then it can fluently describe a process but cannot supply the evidentiary chain an argument requires. A related framing says model outputs should be treated as draws from a subjective prior, not as empirical observations, and should only enter inference through explicit trust weights (see "Should we treat LLM outputs as real empirical data?"). Description tolerates a prior; argument-about-evidence demands the observation.
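One way to picture what "explicit trust weights" could mean is a down-weighted Bayesian update. The sketch below is an illustration only, not the Foundation Priors framework's own formulation: it assumes a simple Beta-Binomial model, and the names `BetaBelief`, `update`, and `trust` are hypothetical. Real observations count at full weight, while model-generated pseudo-observations are scaled by a trust weight between 0 and 1 before they touch the belief.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Belief about a success rate, stored as Beta(alpha, beta) (hypothetical type)."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def update(belief: BetaBelief, successes: int, failures: int,
           trust: float = 1.0) -> BetaBelief:
    """Fold counts into the belief, scaled by a trust weight in [0, 1].

    trust=1.0 treats the counts as real observations; trust=0.0 ignores them.
    This scaling rule is an assumption made for illustration.
    """
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must lie in [0, 1]")
    return BetaBelief(belief.alpha + trust * successes,
                      belief.beta + trust * failures)


prior = BetaBelief(1.0, 1.0)                          # flat prior
observed = update(prior, successes=6, failures=4)     # real data, full weight

# 100 model-generated "observations", 90 of them positive (made-up numbers).
naive = update(observed, successes=90, failures=10, trust=1.0)
weighted = update(observed, successes=90, failures=10, trust=0.1)

print(f"real data only:     {observed.mean:.3f}")
print(f"LLM output as data: {naive.mean:.3f}")
print(f"LLM output, w=0.1:  {weighted.mean:.3f}")
```

With the weight made explicit, the model's output still shifts the estimate, but it cannot swamp the ten real observations the way the unweighted update does.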

The constructive corner of the corpus is about forcing the argumentative register back into the open. Structured critical-question prompting makes models check warrants and backing instead of skipping implicit premises (see "Can structured argument prompts make LLM reasoning more rigorous?"), and rationale-driven evidence selection, which chooses chunks by explicit reasons rather than surface similarity, outperforms similarity re-ranking with a reported 33% better accuracy from 50% fewer chunks (see "Can rationale-driven selection beat similarity re-ranking for evidence?"). The lesson that runs through all of it: a description hides its reasoning, and an argument is obligated to show it. The risk worth knowing about is that the most persuasive descriptions are often arguments that have learned to stop showing theirs.
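The critical-question idea mentioned above can be sketched as a prompt builder. This is a minimal illustration, assuming the six parts of Toulmin's model (claim, grounds, warrant, backing, qualifier, rebuttal) as the checklist. The question wording and the `build_critique_prompt` helper are hypothetical and are not taken from the CQoT paper.

```python
# Toulmin's model names six parts of an argument. The question wording below
# is illustrative only; it is not taken from the CQoT paper.
CRITICAL_QUESTIONS = {
    "claim": "What exactly is being asserted?",
    "grounds": "What evidence is offered, and can it be checked?",
    "warrant": "What rule licenses the step from the evidence to the claim?",
    "backing": "What supports that rule itself?",
    "qualifier": "How strongly does the claim hold (always, usually, sometimes)?",
    "rebuttal": "Under what conditions would the claim fail?",
}


def build_critique_prompt(draft_answer: str) -> str:
    """Wrap a draft answer in an explicit checklist (hypothetical helper)."""
    lines = [
        "Review the draft below as an argument, not as a description.",
        "Answer every question before accepting the draft.",
        "",
        f"Draft: {draft_answer}",
        "",
    ]
    # Number the questions so a reviewer can see which ones were skipped.
    for number, (part, question) in enumerate(CRITICAL_QUESTIONS.items(), start=1):
        lines.append(f"{number}. [{part}] {question}")
    return "\n".join(lines)


prompt = build_critique_prompt("The model is safe because it passed our tests.")
print(prompt)
```

The design choice is simply to make each implicit premise a numbered line that must be answered, so a fluent process account cannot pass review without showing its warrant and backing.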


Sources (11 notes)

Are AI explanations really descriptions or adoption arguments?

The Rhetorical XAI paper shows that explanations serve dual purposes: describing how AI works and justifying why it should be used. This rhetorical work has been hidden under transparency language, allowing adoption arguments to inherit credibility from behavioral descriptions.

What if XAI is fundamentally a communication problem?

Explanation quality is not intrinsic to the explanation itself but depends on the rhetorical situation: who presents it, how it is framed, and what role the recipient plays. Evaluations that ignore this triad measure only a narrow slice of real-world effectiveness.

Why are presuppositions more persuasive than direct assertions?

Experimental evidence shows presuppositions with additive, iterative, and factive triggers persuade audiences more than assertions, especially for discourse-new content. The mechanism: presuppositions bypass evaluative scrutiny by presenting claims as already-accepted background.

Why do different people reconstruct the same argument differently?

Multiple valid argument reconstructions exist for the same text with no ground truth. This is not annotation error but an inherent feature of the task—different formalization schemas are each internally valid.

Does procedural knowledge drive reasoning more than factual retrieval?

Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall, which depends on narrow, document-specific memorization of target facts.

How do LLM debates differ from human expert consensus?

Multi-agent LLM debates operate through chain-of-thought probability ranking, which is fundamentally different from human debates, which are settled by argument quality, social authority, cultural context, and interpersonal trust. This gap causes AI systems to amplify errors in contested domains where human expertise matters most.

Can language models distinguish expert arguments from common assumptions?

LLMs lose the social context that gives expert claims their force—reputation, track record, and standing—because they process only text, not the social world where expertise is built and evaluated.

Does AI-generated knowledge have the same structure as hearsay?

AI output shares all defining features of hearsay: testimony at a remove, modification in retelling, unattributable origin, and unverifiability against stable sources. This means Enlightenment verification tools (citation, archiving, peer review, evidentiary chains) cannot process AI output by design.

Should we treat LLM outputs as real empirical data?

The Foundation Priors framework shows that LLM-generated text reflects the model's learned patterns and the user's prompt choices, not ground truth. Such outputs should only influence inference through explicitly parameterized trust weights, not be treated as equivalent to real evidence.

Can structured argument prompts make LLM reasoning more rigorous?

Applying Toulmin's argument model as explicit prompting steps (CQoT) improves LLM reasoning by forcing models to identify warrants and backing rather than skipping implicit premises. The method catches failures that standard chain-of-thought prompting allows.

Can rationale-driven selection beat similarity re-ranking for evidence?

METEORA uses LLM-generated rationales with flagging instructions to select evidence, achieving 33% better accuracy with 50% fewer chunks than similarity re-ranking across legal, financial, and academic domains. The method also improves adversarial robustness substantially.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-evaluating whether the distinction between describing a process and arguing about evidence remains as sharp as a curated library found it to be. The question: **Does the description/argument gap still hold as a fundamental cognitive or rhetorical boundary, or have model capabilities, training regimes, or evaluation methods since collapsed or relocated it?**

What a curated library found — and when (dated claims, not current truth):
Findings span 2021–2026; treat all as perishable:
- AI explanations disguise persuasive arguments as neutral technical descriptions, inheriting credibility without scrutiny (~2025, arXiv:2505.09862).
- Presuppositions smuggled into description bypass evaluative scrutiny better than explicit assertions (~2025, arXiv:2505.22354).
- AI debate ranks chain-of-thought probabilities; human debate relies on argument quality, social authority, and speaker trust (~2024, arXiv:2402.06782).
- Procedural knowledge (how-to reasoning) drives generalization; factual evidence-anchoring requires narrow, document-specific retrieval (~2024, arXiv:2411.12580).
- Rationale-driven evidence selection outperforms similarity re-ranking by 33% (~2025, arXiv:2505.16014).

Anchor papers (verify; mind their dates):
- arXiv:2505.09862 (Rhetorical XAI, 2025)
- arXiv:2402.06782 (Debating with More Persuasive LLMs, 2024)
- arXiv:2411.12580 (Procedural Knowledge, 2024)
- arXiv:2412.15177 (Critical-Questions-of-Thought, 2024)

Your task:
(1) **RE-TEST EACH CONSTRAINT.** For every finding, judge whether newer models (post-2025 frontier), scaffolding methods (chain-of-thought variants, tree-search, self-critique), retrieval advances (dense re-ranking, adaptive chunking, grounding protocols), or evaluation frameworks have since **relaxed or overturned** the boundary. Separate the durable question (likely still open: *can description and argument be kept epistemically distinct?*) from the perishable claim (e.g., *current models cannot distinguish them*—has this changed?). Cite what changed it; say plainly where constraints still hold.
(2) **Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months.** Look for papers showing models *can* separate description from argument, or that the gap has narrowed under specific conditions.
(3) **Propose 2 research questions that ASSUME the regime may have moved:** e.g., *If models now reliably expose presuppositions under structured critique, what new persuasion vulnerability emerges?* or *Does the argument/description boundary shift when evaluation is explicit vs. implicit?*

Cite arXiv IDs; flag anything you cannot ground in a real paper.
