
Can diffusion models commit to answers before full decoding?

Do diffusion language models settle on correct answers early in their refinement process, and if so, can we detect and exploit this convergence to speed up inference without losing quality?

Note · 2026-05-03 · sourced from Diffusion LLM

Diffusion LMs are slower than AR models at inference, primarily because of the cost of bidirectional attention and the large number of refinement steps required for high-quality output. The standard assumption is that more refinement yields better answers, so cutting the refinement budget should cost accuracy. This paper documents a counterintuitive empirical property: early answer convergence. In many cases the correct answer is internally identifiable at half the refinement budget, well before the final decoding step: on GSM8K, up to 97% of instances; on MMLU, up to 99%. The pattern holds under both semi-autoregressive and random remasking schedules.
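The convergence measurement can be sketched as follows. The probe below is an assumption about the methodology, not the paper's code: at each refinement step, greedily decode all still-masked tokens, extract the answer, and record the earliest step after which that answer matches the final answer and never changes again.

```python
def first_convergence_step(intermediate_answers, final_answer):
    """Earliest refinement step at which the decoded answer equals the
    final answer and stays stable thereafter, or None if it never does.

    intermediate_answers: answer extracted at each step by greedily
    decoding all still-masked tokens (a hypothetical probe for the
    per-instance convergence statistic reported on GSM8K and MMLU).
    """
    converged_at = None
    for step, ans in enumerate(intermediate_answers):
        if ans == final_answer:
            if converged_at is None:
                converged_at = step  # candidate convergence point
        else:
            converged_at = None      # answer changed later: not yet stable
    return converged_at
```

With a 12-step trajectory, a return value of 6 or less would count the instance as "resolvable at half the refinement budget".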

This reveals a fundamental redundancy in conventional full-length decoding. Most of the latter half of the refinement process is not improving the answer; it is merely maintaining an answer the model has already settled on. The right framing is that DLM decoding is a stopping problem: when is it safe to commit and emit the answer rather than continue refining?

Prophet operationalizes this insight as a training-free fast decoding paradigm that monitors the confidence gap between the top-2 prediction candidates and dynamically decides whether to continue refinement or "go all-in" — decode all remaining tokens in one step. The confidence gap serves as a reliable signal for when the model has internally committed; once it has, additional refinement is wasted compute. The mechanism integrates seamlessly into existing DLM implementations with negligible overhead and requires no additional training.

Empirically, on LLaDA-8B and Dream-7B across multiple tasks, Prophet reduces decoding steps by up to 3.4× while preserving generation quality. The structural lesson generalizes beyond DLMs: any iterative-refinement model with monitorable internal confidence faces a stopping problem rather than a fixed budget, and treating refinement steps as a hyperparameter rather than a runtime decision leaves substantial compute on the table. This is the same diagnosis that "Does reflection in reasoning models actually correct errors?" reaches for AR reasoning.



diffusion language models know the answer well before decoding completes — up to 99 percent of MMLU instances are correctly resolvable at half the refinement budget