LLM Reasoning and Architecture · Language Understanding and Pragmatics · Agentic and Multi-Agent Systems

Does chain-of-thought reasoning actually explain model decisions?

When language models show their reasoning steps in agentic pipelines, does the quality of those steps predict or explain the quality of final outputs? This matters for trusting and debugging AI systems.

Note · 2026-02-22 · sourced from Reasoning Architectures

The explainability promise of CoT is that showing intermediate reasoning steps makes the model's decision-making process transparent and understandable. The "Thoughts without Thinking" paper tests this promise in an agentic pipeline implementing a perceptive task guidance system and finds that it fails in practice.

The empirical result: reviewer scores for CoT thoughts are only weakly correlated with reviewer scores for responses. Incorrect responses can be preceded by plausible-looking chains, and flawed chains don't reliably predict or explain incorrect responses. The chain is not doing the causal work we assume it is.
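To make the reported pattern concrete, here is a minimal sketch of the underlying analysis: pair per-example reviewer ratings for the chain with ratings for the response and measure rank correlation. The ratings below are hypothetical, not the paper's data, and `spearmanr` is just one reasonable choice of correlation measure.

```python
# Minimal sketch: does thought quality predict response quality?
# The 1-5 reviewer ratings below are hypothetical, not data from the paper.
from scipy.stats import spearmanr

# Each pair: (reviewer score for the CoT chain, reviewer score for the response)
ratings = [
    (5, 2),  # plausible-looking chain, incorrect response
    (4, 5),
    (2, 4),  # weak chain, correct response
    (3, 3),
    (5, 1),
    (1, 4),
]

thought_scores, response_scores = zip(*ratings)
rho, p_value = spearmanr(thought_scores, response_scores)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.2f})")
# A rho near zero is the "weak correlation" pattern the paper reports:
# chain quality carries little information about response quality.
```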

Two failure modes identified through qualitative content analysis:

The Einstellung effect: CoT rapidly gravitates toward tokens most commonly associated with a concept in training data, even when those tokens contradict the task requirements. In the dump truck assembly example: the chain starts reasoning about the toy but quickly pivots to "clutch," "transmission," "gears" — language far more common for real dump trucks than for toy assembly instructions. The chain explains what went wrong only in retrospect and only with considerable analytical effort.
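A crude way to operationalize this drift, sketched below: score how much of the chain's domain vocabulary comes from the concept's training-frequent lexicon rather than the task's. Both lexicons here are invented for illustration; the paper identified the effect through qualitative content analysis, not this heuristic.

```python
# Heuristic sketch of an Einstellung-effect check: does the chain drift
# toward vocabulary frequent for the concept in general (real dump trucks)
# instead of vocabulary the task requires (toy assembly)?
# Both lexicons are invented for illustration.
TASK_LEXICON = {"snap", "sticker", "piece", "slot", "assemble", "instructions"}
PRIOR_LEXICON = {"clutch", "transmission", "gears", "hydraulics", "diesel", "axle"}

def drift_score(chain: str) -> float:
    """Fraction of lexicon hits that come from the training-prior lexicon."""
    tokens = {t.strip(".,").lower() for t in chain.split()}
    prior_hits = len(tokens & PRIOR_LEXICON)
    total_hits = prior_hits + len(tokens & TASK_LEXICON)
    return prior_hits / total_hits if total_hits else 0.0

chain = ("To assemble the dump truck, first engage the clutch, "
         "then check the transmission and align the gears.")
print(f"prior-vocabulary drift: {drift_score(chain):.0%}")  # high = Einstellung drift
```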

Context window pressure: When context fills, the foundation model's parametric knowledge overrides RAG-retrieved context. The chain reflects this substitution but doesn't flag it as a failure.
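A toy grounding check for this failure mode, sketched below: flag response sentences with little token overlap against the retrieved context, a rough signal that parametric knowledge substituted for retrieval. The tokenizer, overlap threshold, and example strings are all arbitrary choices, not the paper's method.

```python
# Sketch of a grounding check: flag response sentences weakly supported
# by the RAG-retrieved context, a rough proxy for parametric substitution.
def tokens(text: str) -> set[str]:
    return {t.strip(".,;").lower() for t in text.split() if len(t) > 3}

def ungrounded_sentences(response: str, retrieved: list[str], min_overlap=0.3):
    context_vocab = set().union(*(tokens(doc) for doc in retrieved))
    flagged = []
    for sentence in response.split(". "):
        sent_tokens = tokens(sentence)
        if not sent_tokens:
            continue
        overlap = len(sent_tokens & context_vocab) / len(sent_tokens)
        if overlap < min_overlap:
            flagged.append((overlap, sentence))
    return flagged

retrieved = ["Slide the sticker sheet under the truck bed and press each corner."]
response = "Press each sticker corner. Engage the hydraulic lift lever to tilt the bed."
for overlap, sentence in ungrounded_sentences(response, retrieved):
    print(f"possibly parametric ({overlap:.0%} overlap): {sentence}")
```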

The deeper problem: CoT produces explanations without explainability. There is more material to analyze (the chain), but that material requires considerably more interpretive effort than a single output, and may actively mislead by appearing coherent. "Generating more material" ≠ "making the system more understandable."

This extends "Do language models actually use their reasoning steps?" from single-model settings to agentic pipelines, where the weak correlation has direct consequences for users trying to debug or trust systems. It also connects to "Do reasoning traces actually cause correct answers?": the human-like appearance of chains generates misplaced trust.


Source: Reasoning Architectures
