Can minimal reasoning chains match full explanations?
Does removing all explanatory text from chain-of-thought reasoning preserve accuracy? This tests whether verbose intermediate steps are necessary for solving problems or just artifacts of how language models are trained.
Chain of Draft (CoD) is a prompting strategy with a simple constraint: each intermediate reasoning step must be minimal, containing only the essential mathematical operation or logical transformation, with no explanation of what was done or why. The contrast with standard CoT is stark: where CoT might produce six sentences of narrated reasoning for a word problem that reduces to a single subtraction, CoD produces "20 - x = 12; x = 8."
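For concreteness, here is a minimal sketch of how the two styles differ as prompts, written against the OpenAI chat-completions Python client; the instruction wording is a paraphrase of the constraint described above, and the model name and example question are illustrative placeholders rather than the paper's exact setup.

```python
# Sketch: standard CoT prompt vs. a CoD-style prompt (wording paraphrased, not the paper's exact text).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

COT_INSTRUCTION = "Think step by step and explain your reasoning before giving the final answer."
COD_INSTRUCTION = (
    "Think step by step, but keep only a minimal draft for each step: "
    "just the essential equation or operation, with no explanation. "
    "Give the final answer after '####'."
)

def solve(question: str, instruction: str, model: str = "gpt-4o-mini") -> str:
    """Ask the question under the given reasoning-style instruction."""
    response = client.chat.completions.create(
        model=model,  # placeholder model name
        messages=[
            {"role": "system", "content": instruction},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# For a word problem that reduces to 20 - 12, CoT typically returns several explanatory
# sentences; CoD aims for something like "20 - x = 12; x = 8 #### 8".
```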
The result: CoD matches or surpasses CoT accuracy across arithmetic, symbolic, and commonsense reasoning tasks while using only 7.6% of CoT's token count. The verbosity that CoT was assumed to require turns out to be unnecessary for the reasoning itself.
This challenges the implicit model underlying much test-time scaling work: that spending more tokens on reasoning generally produces better reasoning. The CoD finding suggests verbosity in CoT is a training artifact: LLMs are trained on human-written explanatory text, and CoT prompting induces that explanatory style even when the reasoning task only requires the critical operations. When you explicitly instruct minimal drafts, accuracy is preserved because the essential computation was never in the verbal explanation.
The mechanistic alignment with human note-taking behavior is telling: when humans do mental math, they jot down intermediate equations, not narrations of their own reasoning process. Standard CoT is asking LLMs to narrate their scratch work rather than write it.
This interacts with the finding from "Do reasoning traces actually cause correct answers?": if accuracy is preserved with 7.6% of the tokens, the other 92.4% was serving functions other than reasoning (explanatory style, human-readable documentation, or training-induced verbosity). The critical computation is localized in the minimal draft.
The practical implication for inference system design: token budget optimization should target verbose intermediate steps, not just final answer length. For tasks where CoD applies, you can run roughly 13x more parallel chains under the same budget, combining the CoD efficiency advantage with the parallel-sampling benefit explored in "Why does parallel reasoning outperform single chain thinking?"
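The arithmetic behind that claim is worth making explicit. In the sketch below, only the 7.6% ratio comes from the CoD result; the absolute per-chain token counts, the budget, and the helper function are illustrative assumptions.

```python
# Back-of-envelope budget math: how many independent chains fit in a fixed token budget.
# The absolute token counts are assumptions; only the 7.6% ratio comes from the CoD result.

def chains_per_budget(total_budget_tokens: int, tokens_per_chain: int) -> int:
    """Number of independent reasoning chains that fit under a fixed token budget."""
    return total_budget_tokens // tokens_per_chain

COT_TOKENS_PER_CHAIN = 200                                  # assumed verbose CoT trace length
COD_TOKENS_PER_CHAIN = round(COT_TOKENS_PER_CHAIN * 0.076)  # 7.6% of CoT -> ~15 tokens
BUDGET = 2000                                               # arbitrary fixed budget

print(chains_per_budget(BUDGET, COT_TOKENS_PER_CHAIN))  # 10 verbose CoT chains
print(chains_per_budget(BUDGET, COD_TOKENS_PER_CHAIN))  # 133 CoD chains, roughly 13x more
```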
Activation steering provides a mechanistic explanation for why CoD works. "Can we steer reasoning toward brevity without retraining?" shows that verbose and concise reasoning modes are geometrically separated in the residual stream. ASC (Activation-Steered Compression) extracts a steering vector from 50 paired examples and achieves 67% length reduction without retraining. This means CoD's prompting instruction ("keep each draft minimal") is a noisy way of pushing the model into the same activation region that the steering vector targets directly. The two methods are orthogonal and potentially combinable: CoD selects the concise region approximately through prompting, while ASC navigates to it precisely through activation intervention.
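For intuition about how such a steering vector is built, here is a minimal sketch of the difference-of-means steering idea, written against the Hugging Face transformers hidden-state API. This illustrates the general technique rather than ASC's implementation; the layer index, scaling coefficient, and paired examples are placeholder assumptions.

```python
import torch

# Sketch of difference-of-means activation steering toward concise reasoning.
# Not the ASC implementation: layer index, alpha, and the paired examples are assumptions.
# `model` and `tokenizer` are assumed to be a Hugging Face causal LM and its tokenizer.

def mean_residual(model, tokenizer, prompts, layer_idx):
    """Average residual-stream activation at the last token position of each prompt."""
    acts = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs, output_hidden_states=True)
        acts.append(out.hidden_states[layer_idx][0, -1])
    return torch.stack(acts).mean(dim=0)

def build_steering_vector(model, tokenizer, concise_examples, verbose_examples, layer_idx=20):
    """Steering vector = mean(concise activations) - mean(verbose activations)."""
    return (mean_residual(model, tokenizer, concise_examples, layer_idx)
            - mean_residual(model, tokenizer, verbose_examples, layer_idx))

def add_steering_hook(layer_module, vector, alpha=4.0):
    """Add alpha * vector to the residual-stream output of one transformer layer."""
    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * vector.to(hidden)
        return (steered,) + output[1:] if isinstance(output, tuple) else steered
    return layer_module.register_forward_hook(hook)
```

Once registered on a chosen decoder layer, the hook nudges every forward pass toward the concise region during generation, which the prompt-level CoD instruction approximates only statistically.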
Source: Reasoning Methods CoT ToT; enriched from Context Engineering
Related concepts in this collection
- Do reasoning traces actually cause correct answers? Explores whether the intermediate 'thinking' tokens in R1-style models genuinely drive reasoning or merely mimic its appearance; this matters because false confidence in invalid traces could mask errors. Relation: CoD isolates what trace content is computationally necessary; the 92.4% of tokens removed are the stylistic layer.
- Why does parallel reasoning outperform single chain thinking? Asks whether dividing a fixed token budget across multiple independent reasoning paths beats spending it all on one long chain, comparing breadth and diversity against depth. Relation: CoD multiplies the benefit; same budget, more parallel chains, each chain minimal.
- Does more thinking time always improve reasoning accuracy? Explores whether extending a model's thinking tokens linearly improves performance, or whether there is a point beyond which additional reasoning becomes counterproductive. Relation: CoD inverts the overthinking frame; instead of adding tokens until degradation, start minimal and add only when accuracy demands it.
- Does extended thinking actually improve reasoning or just increase variance? When models think longer, do they reason better, or do they simply sample from a wider distribution of outputs that happens to cover correct answers more often? This determines whether test-time compute genuinely scales reasoning capability. Relation: verbose CoT extends into the variance-inflating range; minimal CoD stays in the efficient range.
- Can we steer reasoning toward brevity without retraining? Explores whether model reasoning style occupies learnable geometric directions in activation space, and whether we can shift toward concise thinking by steering through that space without expensive retraining. Relation: provides the mechanistic explanation; CoD prompting pushes toward the same activation region that ASC steering vectors target directly; the two methods are orthogonal and combinable.
- Can we allocate inference compute based on prompt difficulty? Asks whether adjusting how much compute each prompt receives, rather than using a fixed budget, improves model performance, and whether smarter allocation lets smaller models compete with larger ones. Relation: CoD amplifies adaptive allocation; when each chain uses 7.6% of standard CoT tokens, the same compute budget supports 13x more parallel chains or can be redistributed to harder prompts that genuinely need more reasoning depth.
- Why does chain of thought accuracy eventually decline with length? Explores why longer reasoning chains don't always improve answers, and how the optimal length shifts with task difficulty and model capability. Relation: CoD operationalizes the inverted-U finding; capable models prefer shorter chains because the reasoning signal is concentrated in minimal critical operations rather than distributed across verbose explanation, and CoD's 7.6% token count matches the prediction that the optimal length for capable models is far shorter than standard CoT.
- Do reasoning models switch between ideas too frequently? Explores whether o1-like models abandon promising reasoning paths prematurely by switching to different approaches without sufficient depth, and whether penalizing such transitions could improve accuracy. Relation: CoD addresses underthinking from the format side; minimal per-step drafts enforce depth within each step by eliminating the verbal runway for thought-switching, and where TIP penalizes switching tokens at decoding time, CoD prevents the verbose intermediate context that enables switching in the first place.
- Does gradually tightening token budgets beat fixed budget training? Asks whether models learn reasoning more efficiently by starting with generous token allowances and progressively constraining them, rather than training with fixed budgets from the start; this addresses how to teach models to think effectively while remaining concise. Relation: CoD validates the compression phase; curriculum training discovers strategies with generous budgets then compresses, and CoD demonstrates that the compressed endpoint (7.6% of tokens) retains full accuracy, confirming that the generous-to-tight curriculum removes filler rather than substance.
Original note title: Concise intermediate reasoning chains match verbose CoT accuracy with 7.6 percent of the tokens