LLM Reasoning and Architecture · Reinforcement Learning for LLMs · Design & LLM Interaction

Do chain of thought traces actually help humans understand reasoning?

When models show their work through chain of thought traces, do humans find them interpretable? Research tested whether the traces that improve model performance also improve human understanding.

Note · 2026-02-22 · sourced from Reasoning Critiques

A common assumption behind CoT traces: they serve as explanations. The model shows its work, users can follow the reasoning, trust is established. This assumption turns out to be wrong in a specific and quantifiable way.

The central empirical finding, from a 100-participant human-subject study:

The traces that are most useful for the model to generate correct answers are least useful for humans trying to understand those answers. The two objectives pull in opposite directions.

The mechanism: CoT traces used for SFT are optimized to be a training signal — to push the model toward correct token sequences through backpropagation. The properties that make a trace useful for training (complex recursive structure, non-linear exploration, self-doubt and revision cycles) are exactly the properties that make it cognitively opaque to humans.
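
A minimal sketch of what "optimized to be a training signal" means in practice, in PyTorch with hypothetical names (`model`, `trace_mask`; assumes a causal LM that returns `.logits`): the SFT loss is plain next-token cross-entropy over the trace and answer tokens, and nothing in the objective scores human readability.

```python
import torch
import torch.nn.functional as F

def sft_loss(model, input_ids, trace_mask):
    """Next-token cross-entropy over CoT trace + answer tokens.

    input_ids:  (batch, seq) prompt + CoT trace + final answer
    trace_mask: (batch, seq) 1 on trace/answer positions, 0 on the prompt
    """
    logits = model(input_ids).logits        # (batch, seq, vocab)
    # Shift so that position t predicts token t+1.
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:]
    shift_mask = trace_mask[:, 1:].float()

    per_token = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        reduction="none",
    ).reshape(shift_labels.shape)

    # Every trace token contributes equally; no term in this objective
    # measures whether a human could follow the trace.
    return (per_token * shift_mask).sum() / shift_mask.sum()
```

Under this objective, a linear, followable derivation and a tangle of backtracking and self-revision incur the same loss if they end in the same correct tokens.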

This has a design implication that some systems are already acting on: GPT-OSS models generate a CoT trace (for model performance), a summary (for human communication), and a final answer. The trace is not shown to users. This separation acknowledges the decoupling.
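
One way to picture the separation, as a hedged sketch with hypothetical field names rather than the actual GPT-OSS output format:

```python
from dataclasses import dataclass

@dataclass
class ModelOutput:
    trace: str    # raw CoT: optimized for model performance, kept internal
    summary: str  # written for human communication
    answer: str   # final answer shown alongside the summary

def render_for_user(out: ModelOutput) -> str:
    # Only the human-facing artifacts are surfaced; the trace stays
    # available for logging and debugging but is never presented to the
    # user as an explanation of the answer.
    return f"{out.answer}\n\nWhy: {out.summary}"
```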

The implication for AI transparency: showing users CoT traces is not showing them how the model reasons. It is showing them the model's training scaffold. What users need is a summary; what models need is the trace. Conflating the two in the name of "explainability" produces outputs that feel transparent without providing genuine interpretability.

This is a distinct claim from Do reasoning traces actually cause correct answers? That note warns against inferring intentional reasoning from traces. This note adds: even if you don't anthropomorphize, the traces are still the wrong artifact for human interpretability. Treating traces as explanations fails on both counts, for different reasons.


Source: Reasoning Critiques
