LLM Reasoning and Architecture · Reinforcement Learning for LLMs

Can minimal reasoning chains match full explanations?

Does removing all explanatory text from chain-of-thought reasoning preserve accuracy? This tests whether verbose intermediate steps are necessary for solving problems or just artifacts of how language models are trained.

Note · 2026-02-22 · sourced from Reasoning Methods CoT ToT

Chain of Draft (CoD) is a prompting strategy with a simple constraint: each intermediate reasoning step must be minimal — only the essential mathematical operation or logical transformation, with no explanation of what was done or why. The contrast with standard CoT is stark. Where CoT might produce six sentences to solve a word problem that reduces to 20 − 12, CoD produces "20 - x = 12; x = 8."
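The contrast is easiest to see in the prompts themselves. A minimal sketch, assuming an OpenAI-style chat message format; the instruction wording paraphrases the CoD idea described above and is not a verbatim quote:

```python
# Hypothetical system prompts contrasting standard CoT with CoD.
# The CoD instruction is the only change: cap each step at a minimal draft.
COT_SYSTEM = (
    "Think step by step to answer the question. "
    "Return the final answer after '####'."
)

COD_SYSTEM = (
    "Think step by step, but keep each thinking step to a minimal draft "
    "of at most five words. Return the final answer after '####'."
)

QUESTION = (
    "Jason had 20 lollipops. He gave Denny some. "
    "Now Jason has 12. How many did he give Denny?"
)

def build_messages(system: str, question: str) -> list[dict]:
    """Assemble a chat-completion payload for either prompting style."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

cod_messages = build_messages(COD_SYSTEM, QUESTION)
```

Swapping `COT_SYSTEM` for `COD_SYSTEM` is the entire intervention — no retraining, no decoding changes.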

The result: CoD matches or surpasses CoT accuracy across arithmetic reasoning, symbolic tasks, and commonsense tasks while using 7.6% of CoT's token count. The verbosity that CoT was assumed to require turns out to be unnecessary for the reasoning itself.

This challenges the implicit model underlying much test-time scaling work: that more tokens spent on reasoning generally produces better reasoning. The CoD finding suggests verbosity in CoT is a training artifact — LLMs are trained on human-written explanatory text, and CoT prompting induces that explanatory style even when the reasoning task only requires the critical operations. When you explicitly instruct minimal drafts, accuracy is preserved because the essential computation was never in the verbal explanation.

The mechanistic alignment with human note-taking behavior is telling: when humans do mental math, they jot down intermediate equations, not narrations of their own reasoning process. Standard CoT is asking LLMs to narrate their scratch work rather than write it.

This interacts with the Do reasoning traces actually cause correct answers? finding: if accuracy is preserved with 7.6% of the tokens, the other 92.4% was serving functions other than reasoning — explanatory style, human-readable documentation, or training-induced verbosity. The critical computation is localized in the minimal draft.

The practical implication for inference system design: token budget optimization should target verbose intermediate steps, not just final answer length. For tasks where CoD applies, you can run 13x more parallel chains under the same budget — combining the CoD efficiency advantage with Why does parallel reasoning outperform single chain thinking?.
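The 13x figure falls directly out of the 7.6% token ratio. A back-of-envelope sketch of the budget math, paired with simple majority voting over the parallel chains (the usual self-consistency aggregation; the function names here are illustrative):

```python
from collections import Counter

COD_TOKEN_FRACTION = 0.076  # CoD token usage relative to CoT (from the note)

def parallel_chains(cot_budget_tokens: int, cot_tokens_per_chain: int) -> int:
    """How many CoD chains fit in the token budget of one CoT chain batch."""
    cod_tokens_per_chain = cot_tokens_per_chain * COD_TOKEN_FRACTION
    return int(cot_budget_tokens // cod_tokens_per_chain)

def majority_vote(answers: list[str]) -> str:
    """Self-consistency: return the most common final answer across chains."""
    return Counter(answers).most_common(1)[0][0]

# A budget sized for one ~200-token CoT chain funds ~13 CoD chains.
n = parallel_chains(cot_budget_tokens=200, cot_tokens_per_chain=200)
```

Here `n` comes out to 13, matching the 1 / 0.076 ≈ 13x headroom claimed above.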

Activation steering provides a mechanistic explanation for why CoD works. Can we steer reasoning toward brevity without retraining? shows that verbose and concise reasoning modes are geometrically separated in the residual stream. ASC (Activation-Steered Compression) extracts a steering vector from 50 paired examples and achieves 67% length reduction without retraining. This means CoD's prompting instruction ("keep each draft minimal") is a noisy way of pushing the model into the same activation region that the steering vector targets directly. The two methods are orthogonal and potentially combinable: CoD selects the concise region approximately through prompting, while ASC navigates to it precisely through activation intervention.
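The extraction step behind a mean-difference steering vector can be sketched in a few lines. This is a toy illustration with random arrays standing in for real residual-stream activations — actual usage would capture activations at a chosen layer with forward hooks, which is omitted here — but the paired-difference construction is the core of the technique:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_pairs = 64, 50  # 50 paired examples, as in the note

# Stand-ins for residual-stream activations on verbose vs. concise
# completions of the same prompts (toy data, not a real model).
verbose_acts = rng.normal(0.0, 1.0, size=(n_pairs, d_model))
concise_acts = verbose_acts + rng.normal(0.5, 0.1, size=(n_pairs, d_model))

# Steering vector: mean activation difference, concise minus verbose.
steer = (concise_acts - verbose_acts).mean(axis=0)

def apply_steering(resid: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Add the steering vector to a residual-stream activation,
    nudging generation toward the concise region."""
    return resid + alpha * steer
```

The scalar `alpha` trades off brevity pressure against fidelity; the prompting route (CoD) and this intervention route target the same concise region, which is why the note calls them orthogonal and combinable.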


Source: Reasoning Methods CoT ToT; enriched from Context Engineering

Original note title: concise intermediate reasoning chains match verbose CoT accuracy with 7.6 percent of the tokens