Do models cache intentions about response topics before generating the first token?

This explores whether a model has already 'decided' where its answer is heading — its topic or intent — inside its hidden activations before the first visible token appears, rather than figuring it out word-by-word as it writes.

The corpus doesn't have a paper that literally measures 'cached intent,' but several notes circle the same territory from different angles. Together they suggest a qualified yes, with an important caveat about what kind of 'decision' it really is.

The strongest evidence that something is computed ahead of output comes from work showing that models do real reasoning in their early layers and only later convert it to surface tokens. Logit-lens analysis finds that models trained with hidden chain-of-thought compute the correct answer in layers 1–3, then actively suppress it in the final layers to emit format-compliant filler. The reasoning is present internally before any meaningful token is produced (see "Do transformers hide reasoning before producing filler tokens?"). In the same spirit, latent-reasoning architectures scale 'thinking' entirely through hidden-state iteration without ever verbalizing it, implying that verbalization is a training artifact layered on top of computation that has already happened (see "Can models reason without generating visible thinking tokens?"). Diffusion LLMs make the timing directly visible: answer confidence converges early while the surrounding reasoning is still being refined, which is close to a literal demonstration of 'destination locked before the work is shown' (see "Can reasoning and answers be generated separately in language models?").
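To make the logit-lens idea concrete, here is a minimal toy sketch rather than the method used in the cited note. The vocabulary, the unembedding matrix `W_U`, and the per-layer hidden states are all made-up values chosen so that early layers point at an answer token and later layers at a filler token. A real logit lens would read hidden states from an actual model and usually applies the final layer norm before unembedding, which is omitted here.

```python
import math

# Hypothetical toy setup: a 3-token vocabulary and 2-dimensional hidden states.
VOCAB = ["42", "...", "the"]  # "42" plays the answer, "..." plays the filler

# Assumed unembedding matrix: one row of weights per vocabulary token.
W_U = [
    [1.0, 0.0],  # "42"
    [0.0, 1.0],  # "..."
    [0.5, 0.5],  # "the"
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_lens(hidden_states, unembed, vocab):
    """Project each layer's hidden state through the unembedding matrix
    and report the top token and its probability at that layer."""
    readout = []
    for layer, h in enumerate(hidden_states, start=1):
        logits = [sum(w * x for w, x in zip(row, h)) for row in unembed]
        probs = softmax(logits)
        best = max(range(len(vocab)), key=lambda i: probs[i])
        readout.append((layer, vocab[best], probs[best]))
    return readout


# Made-up hidden states at the last prompt position, one per layer.
# Early layers lean toward the answer direction, later layers toward filler.
hidden_states = [
    [2.0, 0.0],
    [3.0, 0.5],
    [3.0, 1.0],
    [1.0, 3.0],
    [0.0, 4.0],
]

readout = logit_lens(hidden_states, W_U, VOCAB)
for layer, token, prob in readout:
    print(f"layer {layer}: top token {token!r} (p={prob:.2f})")
```

In this toy, the answer token is the top readout at layers 1 to 3 and the filler token takes over afterwards, mirroring the pattern the note describes. The numbers carry no empirical weight; they only illustrate what 'present internally, suppressed at the output' would look like through this lens.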

But here's the twist that reframes the whole question. A 'cached intention' implies a single committed plan, and another line of work says that's not what's sitting in the hidden state. Shanahan's 20-questions regeneration test shows that models hold a *superposition* of possible characters or answers and sample from that distribution at generation time. Regenerate the same prompt and you get different outputs, each internally consistent, which indicates that no fixed commitment exists (see "Do large language models actually commit to a single character?"). So what's pre-loaded may be less a chosen topic than a probability landscape over topics, collapsed into one path only as tokens are sampled.
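The regeneration argument can be sketched with a toy sampler. This is an illustration under stated assumptions, not Shanahan's actual protocol: the `candidates` distribution and the `regenerate` helper are hypothetical, standing in for whatever distribution over answers a model holds before the first token.

```python
import random

# Hypothetical distribution over candidate answers held before token one.
candidates = {"cat": 0.5, "dog": 0.3, "owl": 0.2}


def regenerate(dist, seed):
    """Draw one answer from the same fixed distribution; only the seed changes."""
    rng = random.Random(seed)
    names = list(dist)
    weights = [dist[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# "Regenerate" the same prompt 200 times with different sampling seeds.
outputs = [regenerate(candidates, seed) for seed in range(200)]
counts = {name: outputs.count(name) for name in candidates}
print(counts)
```

The state before sampling is identical on every run, yet the outputs differ. What is fixed in advance is the distribution, not the answer, which is the sense in which 'cached intention' overstates the commitment.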

This fits the deeper picture of what a transformer's hidden state actually is. The residual stream transmits knowledge as continuous *flow*, not retrievable *storage*: knowledge exists in the performance, not in an archive you could call up and inspect (see "Do transformer models store knowledge or generate it continuously?"). Generation itself is a smooth probabilistic drift toward the training distribution rather than an exploration of alternatives (see "Does LLM generation explore competing claims while producing text?"). That framing makes 'caching' the wrong metaphor: there isn't a stored intent so much as a directional momentum that the first tokens reveal rather than create.

Worth knowing if you want to go further: the pivot points where that momentum actually gets set appear to be sparse. A small minority of high-entropy 'forking' tokens carry most of the steering signal (see "Do high-entropy tokens drive reasoning model improvements?"), and specific reflection tokens like 'Wait' and 'Therefore' spike in mutual information with the correct answer (see "Do reflection tokens carry more information about correct answers?"). So the model's 'intent' may be less a thing fixed before token one and more a thing repeatedly re-committed at a handful of decisive moments along the way.
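As a rough sketch of how 'forking' tokens could be located, the snippet below computes the Shannon entropy of each position's next-token distribution and keeps the highest-entropy fraction. The distributions are made up, and the helper `forking_positions` and its default `fraction=0.2` (echoing the roughly 20% figure in the note) are assumptions for illustration, not the cited work's procedure.

```python
import math


def entropy_bits(probs):
    """Shannon entropy of one next-token distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def forking_positions(distributions, fraction=0.2):
    """Return the indices of the highest-entropy positions (top `fraction`)."""
    k = max(1, round(len(distributions) * fraction))
    ranked = sorted(
        range(len(distributions)),
        key=lambda i: entropy_bits(distributions[i]),
        reverse=True,
    )
    return sorted(ranked[:k])


# Made-up next-token distributions for a 5-token response.
distributions = [
    [0.97, 0.01, 0.01, 0.01],
    [0.25, 0.25, 0.25, 0.25],  # uniform: the model is maximally undecided
    [0.90, 0.05, 0.03, 0.02],
    [0.99, 0.01, 0.00, 0.00],
    [0.85, 0.05, 0.05, 0.05],
]

forks = forking_positions(distributions)
print(forks)
```

Here only the uniform position is selected. At most positions the continuation is nearly determined, and the direction of the response is effectively chosen at the few positions where probability mass is spread across alternatives.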

Sources (8 notes)

Do transformers hide reasoning before producing filler tokens?

Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.

Can models reason without generating visible thinking tokens?

Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.

Can reasoning and answers be generated separately in language models?

ICE shows that bidirectional attention in diffusion LLMs enables in-place prompting—embedding reasoning directly in masked positions refined alongside answers. Answer confidence converges early while reasoning continues refining, allowing early-exit mechanisms to cut compute by 50% while maintaining accuracy.

Do large language models actually commit to a single character?

Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, proving no fixed commitment exists.

Do transformer models store knowledge or generate it continuously?

Transformers organize knowledge as flowing activations rather than retrievable archives, mirroring oral cultures where knowledge exists only in performance. This explains why model knowledge is contextual, difficult to edit, and inseparable from generation.

Does LLM generation explore competing claims while producing text?

Token prediction trains models to continue toward the training distribution, not to explore logically related counterpositions. This smoothness in process produces smooth claims that multiply without generating new perspectives.

Do high-entropy tokens drive reasoning model improvements?

Only ~20% of tokens exhibit high entropy as pivotal reasoning decision points; RLVR primarily adjusts these forking tokens. Training exclusively on them matches or exceeds full-gradient performance, revealing that the minority carries the learning signal.

Do reflection tokens carry more information about correct answers?

Specific tokens like "Wait" and "Therefore" show sharp spikes in mutual information with correct answers. Suppressing them harms reasoning, while suppressing an equal number of random tokens does not, and representation recycling improves accuracy by 20%.
