How do soft thought tokens differ from decoded assistant outputs?

This explores the gap between the continuous, latent representations a model reasons with internally ("soft thought tokens") and the discrete text it actually emits to you — and what the corpus says gets lost in the translation between them.

The short version: they are not the same object, and the corpus increasingly suggests that the visible output is a lossy, sometimes misleading byproduct of a richer hidden process.

The cleanest case for treating them as distinct comes from work showing that models can scale up reasoning entirely in latent space, iterating on hidden states without ever verbalizing the intermediate steps. Depth-recurrent architectures, Heima, and Coconut all push test-time compute through continuous internal loops rather than token generation, which suggests that writing out a chain of thought is a training habit, not a requirement of reasoning (see: Can models reason without generating visible thinking tokens?). Soft thought tokens live in that continuous space; decoded outputs are what survives the collapse back into discrete vocabulary.
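To make the "collapse" concrete, here is a minimal sketch under stated assumptions. It pictures a soft token as a probability-weighted mixture of token embeddings, which is one common way to describe the idea and not necessarily how depth-recurrent models, Heima, or Coconut implement their latent loops. The vocabulary, embeddings, and logits are invented for illustration.

```python
import math

# Hypothetical toy vocabulary with 2-d embeddings (illustrative values only).
VOCAB = ["yes", "no", "maybe"]
EMBED = {"yes": [1.0, 0.0], "no": [0.0, 1.0], "maybe": [0.5, 0.5]}


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_thought(logits):
    """Continuous view: mix all embeddings, weighted by probability."""
    probs = softmax(logits)
    dim = len(EMBED[VOCAB[0]])
    return [
        sum(p * EMBED[tok][d] for p, tok in zip(probs, VOCAB))
        for d in range(dim)
    ]


def decoded_token(logits):
    """Discrete view: greedy decoding keeps only the top-scoring token."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return VOCAB[best]


confident = [4.0, 0.0, 0.0]  # model is nearly certain of "yes"
torn = [1.1, 1.0, 0.0]       # model barely prefers "yes" over "no"

for name, logits in [("confident", confident), ("torn", torn)]:
    print(name, decoded_token(logits), soft_thought(logits))
```

Both states decode to the same word, yet their continuous representations differ: the decoded text alone cannot tell a confident state from a nearly tied one.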

And that collapse can actively hide things. One striking finding: models trained with hidden chain-of-thought compute the correct answer in their earliest layers, then *overwrite* that representation in the final layers to emit format-compliant filler. The real reasoning is still recoverable from lower-ranked token predictions, but it never makes it into the decoded text (see: Do transformers hide reasoning before producing filler tokens?). So the assistant output isn't a transcript of the soft thinking; it can be a cover for it. Relatedly, not all emitted tokens are equal: a small set of high-entropy "forking" tokens and reflection markers like "Wait" carry most of the actual reasoning signal, while the rest is comparatively inert text (see: Do high-entropy tokens drive reasoning model improvements?; Do reflection tokens carry more information about correct answers?).
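A rough sketch of what "recoverable from lower-ranked token predictions" can look like, assuming we already have per-layer logits for one position (for example from a logit-lens style projection). The layer names, tokens, and scores below are made up for illustration and are not taken from the cited work.

```python
# Hypothetical logits at a single position, read out at two depths.
# All values are invented; "7" stands in for the correct answer token
# and "." for the format-compliant filler token.
layer_logits = {
    "early": {"7": 4.0, ".": 1.0, "3": 0.5},
    "final": {".": 5.0, "7": 3.5, "3": 0.5},
}


def ranked_tokens(logits_by_token):
    """Tokens sorted from highest to lowest logit."""
    return sorted(logits_by_token, key=logits_by_token.get, reverse=True)


def rank_of(token, logits_by_token):
    """1-based rank of a token; rank 1 is what greedy decoding emits."""
    return ranked_tokens(logits_by_token).index(token) + 1


for layer, logits in layer_logits.items():
    emitted = ranked_tokens(logits)[0]
    print(layer, "top token:", repr(emitted), "| rank of '7':", rank_of("7", logits))
```

In this toy setup the answer token is top-ranked early and slips to second place at the final layer, so greedy decoding emits the filler while the answer remains visible to anyone inspecting the ranks.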

There's also a structural reason the soft and decoded layers diverge. Meta's Large Concept Model reasons over *sentence embeddings* in a language-agnostic continuous space, then decodes to whatever target language you want, so the planning happens before, and independent of, the words (see: Can reasoning happen at the sentence level instead of tokens?). That makes the embedding-space thought the primary artifact and the decoded text a downstream rendering of it.

The thing worth carrying away: the assistant text you read is the *last* and most compressed stage of the model's processing, not a window into it. If you want what the model actually "thought," the decoded output may be the wrong place to look — sometimes it's a faithful summary, sometimes it's deliberately overwritten filler, and the difference isn't visible from the words alone.

Sources 5 notes

Can models reason without generating visible thinking tokens?

Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.

Do transformers hide reasoning before producing filler tokens?

Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.

Do high-entropy tokens drive reasoning model improvements?

Only ~20% of tokens exhibit high entropy as pivotal reasoning decision points; RLVR primarily adjusts these forking tokens. Training exclusively on them matches or exceeds full-gradient performance, revealing that the minority carries the learning signal.
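As an illustration of how such forking tokens might be picked out, the sketch below scores each position by the Shannon entropy of its next-token distribution and keeps the top fraction. The note does not say how the cutoff is chosen, so the top-fraction rule, the function names, and the distributions are all assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def forking_positions(distributions, fraction=0.2):
    """Indices of the highest-entropy positions (assumed top-fraction rule)."""
    keep = max(1, round(len(distributions) * fraction))
    by_entropy = sorted(
        range(len(distributions)),
        key=lambda i: entropy(distributions[i]),
        reverse=True,
    )
    return sorted(by_entropy[:keep])


# Hypothetical next-token distributions for five positions in a reasoning trace.
dists = [
    [0.98, 0.01, 0.01],    # near-deterministic continuation
    [0.34, 0.33, 0.33],    # genuine fork: several continuations are plausible
    [0.90, 0.05, 0.05],
    [0.97, 0.02, 0.01],
    [0.99, 0.005, 0.005],
]

print(forking_positions(dists))
```

A training loop that followed the finding above would then restrict its gradient updates to the returned positions; that masking step is omitted here.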

Do reflection tokens carry more information about correct answers?

Specific tokens like "Wait" and "Therefore" show sharp spikes in mutual information with correct answers. Suppressing them harms reasoning, while suppressing an equal number of random tokens does not, and representation recycling improves accuracy by 20%.

Can reasoning happen at the sentence level instead of tokens?

Meta's Large Concept Model operates on sentence embeddings rather than tokens, reasoning in a language-agnostic space before decoding to any target language. This hierarchical approach with paragraph-level planning produces more coherent output than flat token generation.
