Does logical validity actually drive chain-of-thought gains?
What if invalid reasoning in CoT exemplars still improves performance? Testing whether logical correctness or structural format is the real driver of CoT's effectiveness.
"Invalid Logic, Equivalent Gains" runs a clean experiment: replace valid reasoning in CoT exemplar prompts with completely illogical reasoning, then measure performance on BIG-Bench Hard tasks. The result: logically invalid CoT prompts perform close behind valid CoT and outperform answer-only prompting. The reasoning content of CoT exemplars is not what drives the performance gain.
This is a sharp test because it isolates the contribution of logical validity from everything else CoT provides: output format, step decomposition, intermediate token generation, attention pattern scaffolding. If invalid reasoning still helps, then the benefit comes from these structural properties, not from the reasoning itself.
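To make the manipulation concrete, here is a minimal sketch in Python. The task and exemplar wording are hypothetical, not drawn from the paper; the point is that both exemplars share the same scaffold (question, numbered steps, final answer line) while only the logical content of the steps differs.

```python
# Two few-shot CoT exemplars for a hypothetical word-problem task.
# Both share the same structural scaffold: the same question, three
# numbered steps, and a final "The answer is ..." line. Only the
# logical content of the steps differs.

VALID_EXEMPLAR = """\
Q: A shelf holds 4 boxes with 6 books each. How many books are there?
A: Step 1: Each box holds 6 books.
Step 2: There are 4 boxes, so multiply 4 by 6.
Step 3: 4 * 6 = 24.
The answer is 24."""

INVALID_EXEMPLAR = """\
Q: A shelf holds 4 boxes with 6 books each. How many books are there?
A: Step 1: Boxes are usually made of cardboard.
Step 2: Cardboard comes from trees, and trees have 4 seasons plus 6 holidays.
Step 3: Adding the shelf gives 24.
The answer is 24."""

def build_prompt(exemplar: str, test_question: str) -> str:
    """Assemble a one-shot CoT prompt: exemplar first, then the new question."""
    return f"{exemplar}\n\nQ: {test_question}\nA:"

# The paper's finding, paraphrased: prompts built from INVALID_EXEMPLAR
# score close to prompts built from VALID_EXEMPLAR, and both beat
# answer-only prompting (a final answer line with no steps at all).
```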
The finding directly supports "Does chain-of-thought reasoning reveal genuine inference or pattern matching?". If the model were learning to reason from exemplars, invalid exemplars would degrade performance substantially. Instead, the model is learning the FORM of step-by-step output — the structure activates latent capabilities without the exemplar content needing to be logically sound.
This also deepens "Do language models actually use their reasoning steps?". If the exemplar reasoning doesn't need to be valid for CoT to work, then the model's own generated reasoning may similarly be decorative rather than causal. The exemplar finding makes the faithfulness concern bidirectional: neither the input reasoning (exemplars) nor the output reasoning (generated CoT) need be logically valid for the performance gain to occur.
The practical implication: CoT prompt engineering should focus on structural properties (step count, decomposition format, answer scaffolding) rather than on the logical correctness of the exemplar reasoning. As "Why do chain-of-thought examples fail across different conditions?" also suggests, the dimensions that matter are structural (complexity, order, style), not logical.
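One way to act on this, sketched below with hypothetical helper names and thresholds of my own choosing: lint exemplars for the structural properties that do seem to matter (a question, step decomposition, an explicit answer scaffold) instead of trying to verify their logic.

```python
import re

def check_exemplar_structure(exemplar: str,
                             min_steps: int = 2,
                             answer_prefix: str = "The answer is") -> dict:
    """Check only the structural properties of a CoT exemplar.

    Deliberately ignores whether the reasoning is logically valid,
    since validity is not what appears to drive the CoT gain.
    """
    steps = re.findall(r"^Step \d+:", exemplar, flags=re.MULTILINE)
    return {
        "has_question": exemplar.lstrip().startswith("Q:"),
        "enough_steps": len(steps) >= min_steps,
        "has_answer_scaffold": answer_prefix in exemplar,
    }

# Both exemplars from the sketch above pass the same checks, because
# they share the same structure despite differing in logical content.
```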
Source: Reasoning Logic Internal Rules
Related concepts in this collection
- Does chain-of-thought reasoning reveal genuine inference or pattern matching?
  Explores whether CoT instructions unlock real reasoning capabilities or simply constrain models to mimic familiar reasoning patterns from training data. This matters for understanding whether language models can actually reason abstractly.
  Invalid exemplars still working confirms the form-over-content thesis.
- Do language models actually use their reasoning steps?
  Chain-of-thought reasoning looks valid on the surface, but does each step genuinely influence the model's final answer, or are the reasoning chains decorative? This matters for trusting AI explanations.
  Bidirectional unfaithfulness: exemplar validity and output validity are both decorative.
- Why do chain-of-thought examples fail across different conditions?
  Chain-of-thought exemplars show surprising sensitivity to order, complexity level, diversity, and annotator style. Understanding these brittleness dimensions could reveal what makes reasoning prompts robust or fragile.
  The dimensions that matter are structural, not logical.
- Do large language models reason symbolically or semantically?
  Can LLMs follow explicit logical rules when those rules contradict their training knowledge? Testing whether reasoning operates independently of semantic associations reveals what computational mechanisms actually drive LLM multi-step inference.
  Same source batch: if reasoning is semantic rather than symbolic, the logical validity of exemplars is irrelevant.
- Do reasoning traces need to be semantically correct?
  Can models learn to solve problems from deliberately corrupted or irrelevant reasoning traces? This challenges assumptions about what makes intermediate tokens useful for learning.
  A convergent finding from training rather than prompting: invalid exemplars (this note) and corrupted training traces (that note) both preserve performance, confirming that logical content is dispensable and structure/scaffolding is the active ingredient.
- What do models actually learn from chain-of-thought training?
  When models train on reasoning demonstrations, do they memorize content details or absorb reasoning structure? Testing with corrupted data reveals which aspects of CoT samples actually drive learning.
  The structural explanation for why invalid logic still works: CoT gains come from structural coherence (step decomposition, scaffolding), not content correctness, so logically invalid exemplars provide the same structural benefits.
Original note title
logically invalid cot prompts perform nearly as well as valid ones — valid reasoning is not the chief driver of cot gains