Can critical questions improve how language models reason?
Does structuring prompts around argumentation theory's warrant-checking questions force language models to perform deeper reasoning rather than surface pattern matching? This matters because models might produce correct answers without actually reasoning correctly.
CQoT (Critical-Questions-of-Thought) adapts Toulmin's argument model into a prompting framework. Standard chain-of-thought prompting asks the model to reason step by step. CQoT additionally requires the model to answer specific critical questions about its own reasoning: What is the warrant connecting evidence to claim? What backing supports the warrant? What potential rebuttals exist? Does the claim need qualification?
These questions are not open-ended reflection requests. They are the specific interrogation targets from argumentation theory — the structural requirements that valid arguments must satisfy. By instantiating them as required prompting steps, CQoT converts implicit argumentative requirements into explicit reasoning constraints.
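A minimal sketch of how these questions might be instantiated as a prompt template. The question wording follows Toulmin's categories as described above, and the two-stage structure is an illustrative assumption, not the exact prompts from the CQoT paper:

```python
# Sketch of a CQoT-style prompt. The wording and the required-answer
# structure are assumptions for illustration, not the paper's prompts.

CRITICAL_QUESTIONS = [
    "What is the warrant connecting the evidence to the claim?",
    "What backing supports that warrant?",
    "What potential rebuttals exist?",
    "Does the claim need qualification (e.g. 'usually', 'unless ...')?",
]

def cqot_prompt(task: str) -> str:
    questions = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(CRITICAL_QUESTIONS))
    return (
        f"Task: {task}\n\n"
        "First, reason step by step toward an answer.\n"
        "Then, before committing to it, answer each critical question "
        "about your own reasoning:\n"
        f"{questions}\n\n"
        "Finally, state the answer only if every question is satisfied; "
        "otherwise revise the reasoning first."
    )
```

The point of the template is that the warrant, backing, rebuttals, and qualifiers become required fields of the output rather than optional reflections the model may skip.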
The improvement over standard CoT is consistent. Forcing warrant-checking catches the specific failure that "Can LLMs identify the hidden assumptions that make arguments work?" documents: models that correctly identify the claim-data structure still fail at the implicit premise (in Toulmin's classic example, "Harry was born in Bermuda, so he is a British subject" rests on the unstated warrant that those born in Bermuda are generally British subjects). CQoT makes the implicit premise an explicit, required output.
The mechanism generalizes beyond argumentation tasks. "Can models pass tests while missing the actual grammar?" describes the broader problem: correct outputs do not prove structural learning. CQoT forces the structural reasoning into the surface output, where it can be evaluated and, critically, where the model must perform it rather than skip it.
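One way to operationalize "surfaced and evaluable": score the critical-question answers with a second model call and regenerate reasoning that fails any check. This builds on the `cqot_prompt` sketch above; the judge wording, the YES-counting pass criterion, and the `llm` callable are illustrative assumptions:

```python
from typing import Callable

def cqot_loop(task: str, llm: Callable[[str], str], max_tries: int = 3) -> str:
    """Generate reasoning, surface the Toulmin checks, retry on failure.

    `llm` is any text-in/text-out completion function; the judge prompt
    and the simple per-line YES check are assumptions for illustration.
    """
    for _ in range(max_tries):
        reasoning = llm(cqot_prompt(task))  # template from the sketch above
        verdicts = llm(
            "For each numbered critical question below, answer YES if the "
            "reasoning satisfies it, NO otherwise, one verdict per line.\n\n"
            f"Reasoning:\n{reasoning}\n\nQuestions:\n"
            + "\n".join(CRITICAL_QUESTIONS)
        )
        # Accept only if every critical question passes; the surfaced
        # verdicts are what makes the structural reasoning evaluable.
        if all(line.strip().upper().startswith("YES")
               for line in verdicts.splitlines() if line.strip()):
            return reasoning
    return reasoning  # best effort after max_tries
```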
This is an instance of the broader principle that structured decomposition of implicit reasoning requirements improves LLM performance on tasks where those requirements would otherwise be skipped. The cognitive science parallel: experts who have internalized decision criteria can execute them fluently; forcing novices to answer structured questions makes explicit what experts do implicitly. CQoT structures the novice reasoning process.
The limitation: CQoT assumes the model can correctly identify what the warrant should be once asked. For domains where the warranting relationship is itself contested, the structured prompt provides the form of warrant-checking without guaranteeing the content.
Source: Argumentation
Related concepts in this collection
- Can LLMs identify the hidden assumptions that make arguments work? LLMs recognize what arguments claim and what evidence they offer, but struggle to identify implicit warrants, the unstated principles that connect evidence to conclusion. This matters because valid reasoning requires understanding these hidden logical bridges. Relation: the failure CQoT targets; the critical questions force warrant identification.
- Can models pass tests while missing the actual grammar? Do language models succeed on grammatical benchmarks by learning surface patterns rather than structural rules? This matters because correct outputs may hide reliance on shallow heuristics that fail on novel structures. Relation: surface vs. structural learning; CQoT makes structural requirements surface.
- Do language models actually use their reasoning steps? Chain-of-thought reasoning looks valid on the surface, but does each step genuinely influence the model's final answer, or are the reasoning chains decorative? This matters for trusting AI explanations. Relation: CQoT can improve step necessity by making each step serve an explicit argumentative function.
- Can modular cognitive tools boost LLM reasoning without training? Does structuring reasoning as discrete, sandboxed tool calls elicit stronger problem-solving in language models than monolithic prompting, and can this approach match specialized reasoning models? Relation: generalizes the CQoT principle from argumentation-specific warrant checking to domain-general cognitive operations; both use structured decomposition of reasoning requirements, but cognitive tools enforce modular isolation via sandboxed tool calls rather than a single monolithic prompt.
Original note title: applying argumentation scheme critical questions as structured prompts improves llm reasoning by forcing warrant checking