Can language models generate plausible latent thoughts without human annotation?

This explores whether LLMs can produce useful internal 'thinking' — latent reasoning steps that aren't spelled out in words — without humans hand-labeling those thoughts. The corpus says yes, and from two different directions: one shows latent thoughts can be *learned* as hidden variables, the other shows the supervision can be *manufactured* without annotation.

On the architecture side, Latent-Thought Language Models treat the 'thought' as a latent vector inferred during training, learned through a fast local loop while the decoder learns slowly. The thoughts are never written down by a human; they are fit to make prediction work (Can latent thought vectors scale language models beyond parameters?). Relatedly, several architectures (depth-recurrent networks, Heima, Coconut) scale reasoning entirely in continuous hidden space, iterating on internal states instead of emitting chain-of-thought tokens, which suggests that verbalizing thoughts is a training artifact, not a requirement for reasoning (Can models reason without generating visible thinking tokens?). Diffusion LLMs go further, refining reasoning 'in place' in masked positions alongside the answer rather than as a written prefix (Can reasoning and answers be generated separately in language models?).
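The dual-rate idea can be illustrated on a scalar toy problem. This is a minimal sketch under stated assumptions, not the Latent-Thought Language Model implementation: the decoder is a hypothetical affine map `w * z + b`, each example's latent `z` is fitted by many small gradient steps on squared error plus a prior penalty (a MAP point estimate standing in for variational inference), and the decoder takes one small step per example. All names and hyperparameters are illustrative.

```python
def infer_latent(x, w, b, prior_weight=0.1, steps=30):
    """Fast local loop: fit one example's latent z by gradient descent on
    (x - (w*z + b))**2 + prior_weight * z**2 (a MAP stand-in for
    variational inference, which is an assumption of this sketch)."""
    curvature = 2 * (w * w + prior_weight)
    lr = 0.5 / curvature  # curvature-scaled step keeps this loop stable for any w
    z = 0.0
    for _ in range(steps):
        grad = -2 * w * (x - (w * z + b)) + 2 * prior_weight * z
        z -= lr * grad
    return z


def reconstruction_error(data, w, b):
    # Mean squared error of the decoder when each latent is re-inferred.
    errors = [(x - (w * infer_latent(x, w, b) + b)) ** 2 for x in data]
    return sum(errors) / len(errors)


def train(data, epochs=50, decoder_lr=0.01):
    w, b = 0.5, 0.0  # shared decoder parameters: the "slow" weights
    for _ in range(epochs):
        for x in data:
            z = infer_latent(x, w, b)           # fast: 30 inner steps per example
            residual = x - (w * z + b)
            w += decoder_lr * 2 * residual * z  # slow: one small gradient step
            b += decoder_lr * 2 * residual
    return w, b


data = [2.5, 3.0, 3.5, 4.0]
before = reconstruction_error(data, 0.5, 0.0)
w, b = train(data)
after = reconstruction_error(data, w, b)
print(f"error before: {before:.3f}, after: {after:.3f}")
```

No latent value is ever supplied as a label: each `z` exists only because it makes the decoder's prediction better, which is the sense in which the thoughts are "fit to make prediction work."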

The 'without human annotation' half is answered by work that generates its own feedback. In Ctx2Skill's three-role self-play loop, skills co-evolve with no human supervision: a Challenger escalates difficulty as a curriculum, a Judge's binary verdicts serve as the reward, and skills evolve through the model's own natural-language edits (Can language models learn skills without human supervision?). Post-Completion Learning is even more pointed: it uses the normally discarded space after the end-of-output token to train the model to evaluate itself, internalizing a reward function rather than borrowing one from human labels (Can models learn to evaluate their own work during training?). Both show the supervision signal can come from the system's own structure.
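The control flow of such a self-play loop can be sketched with toy roles. Everything here is hypothetical: tasks are small additions, the Solver's competence is hard-coded to grow with its number of skill notes, only the Solver edits its skills (the Challenger's own edits are omitted), and capping each escalation at `max_jump` only loosely echoes the need to balance adversarial pressure.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Solver:
    # Natural-language skill notes accumulated through self-edits (toy stand-in).
    skills: list = field(default_factory=list)

    def solve(self, numbers):
        # Toy competence: can only add up to len(skills) + 2 terms correctly.
        limit = len(self.skills) + 2
        return sum(numbers[:limit])  # silently drops terms beyond its limit

    def edit_skills(self, task):
        self.skills.append(f"add {len(task)} terms one at a time, keeping a running total")


def challenger(difficulty, rng):
    # Proposes a task whose size is the current curriculum difficulty.
    return [rng.randint(1, 9) for _ in range(difficulty)]


def judge(task, answer):
    # Binary verdict, used directly as the reward signal.
    return answer == sum(task)


def self_play(rounds=20, seed=0, max_jump=1):
    rng = random.Random(seed)
    solver = Solver()
    difficulty = 2
    rewards = []
    for _ in range(rounds):
        task = challenger(difficulty, rng)
        passed = judge(task, solver.solve(task))
        rewards.append(passed)
        if passed:
            difficulty += max_jump    # Challenger escalates the curriculum
        else:
            solver.edit_skills(task)  # Solver edits its skills after a failure
    return solver, difficulty, rewards


solver, difficulty, rewards = self_play()
print(difficulty, len(solver.skills), sum(rewards))
```

No human label appears anywhere: the only training signal is the Judge's pass/fail verdict, and the curriculum comes from the Challenger reacting to it.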

Here's the catch worth knowing, and it's where the corpus turns sharp: 'plausible' and 'faithful' are not the same thing. Reasoning traces read as persuasive explanations, yet invalid logical steps perform nearly as well as valid ones and corrupted traces generalize comparably, meaning the visible trace behaves more like stylistic mimicry than a window into the actual computation (Do reasoning traces show how models actually think?). That evidence concerns written traces, but the same gap applies to latent ones: a model can generate *plausible* latent thoughts without supervision, while whether those thoughts correspond to how it actually arrives at answers remains a separate, unresolved question. This connects to a deeper limit some argue is structural: models trained on form alone may never reconstruct genuine meaning or intent (Can language models learn meaning from text patterns alone?).

One last thread reframes the whole question: if latent thoughts live in hidden states, they can be *extracted*. Sparse autoencoders can recover individual, shared, and private latent thoughts from a model's activations, even letting agents share thoughts directly without language (Can agents share thoughts directly without using language?). So latent thoughts are not only generatable without annotation; they may also be readable after the fact, which is a quietly large idea for anyone interested in interpretability or AI-to-AI coordination.
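Given an already trained sparse autoencoder, splitting two agents' latent thoughts into shared and private parts reduces to comparing which dictionary features fire. A minimal sketch, assuming hand-set toy encoder weights (a real SAE learns them and commonly subtracts a decoder bias before encoding), made-up hidden states, and a simple activation threshold; the function names are hypothetical.

```python
def relu(v):
    return v if v > 0.0 else 0.0


def encode(hidden, enc_weights, enc_bias):
    """SAE encoder: one ReLU feature per dictionary row, f = ReLU(W_enc h + b_enc)."""
    return [relu(sum(w * h for w, h in zip(row, hidden)) + bias)
            for row, bias in zip(enc_weights, enc_bias)]


def active_features(code, threshold=1e-6):
    return {i for i, a in enumerate(code) if a > threshold}


def split_thoughts(code_a, code_b):
    """Features firing for both agents are 'shared'; the rest are private."""
    a, b = active_features(code_a), active_features(code_b)
    return {"shared": a & b, "private_a": a - b, "private_b": b - a}


# Toy, hand-set encoder: 4 features over a 3-dimensional hidden state.
ENC_WEIGHTS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
ENC_BIAS = [-0.1, -0.1, -0.1, -1.5]  # negative biases keep the codes sparse

hidden_a = [1.0, 0.0, 0.5]  # agent A's hidden state (made-up numbers)
hidden_b = [1.0, 0.8, 0.0]  # agent B's hidden state (made-up numbers)
parts = split_thoughts(encode(hidden_a, ENC_WEIGHTS, ENC_BIAS),
                       encode(hidden_b, ENC_WEIGHTS, ENC_BIAS))
print(parts)  # {'shared': {0}, 'private_a': {2}, 'private_b': {1, 3}}
```

Because the comparison happens on feature activations rather than on generated text, a mismatch between agents shows up before either one says anything.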

Sources (8 notes)

Can latent thought vectors scale language models beyond parameters?

Latent-Thought Language Models achieve superior sample and parameter efficiency by coupling fast local variational learning with slow global decoder learning. This dual-rate scheme scales few-shot reasoning across both model and latent size, creating independent scaling dimensions beyond traditional parameter scaling.

Can models reason without generating visible thinking tokens?

Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.
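To make "scaling through hidden state iteration" concrete, here is a toy sketch: a fixed recurrent block is applied repeatedly to a continuous state, re-injecting the input at every step, and only the final state is read out. Newton's square-root update is a hypothetical stand-in for a learned block; depth-recurrent models, Heima, and Coconut learn their updates from data.

```python
def recurrent_block(state, x):
    # One latent step: a fixed update that re-injects the input x.
    # Newton's square-root update stands in for a learned block.
    return 0.5 * (state + x / state)


def think(x, iterations, state=1.0):
    for _ in range(iterations):
        state = recurrent_block(state, x)
    return state  # only the final state is read out; nothing is verbalized


for k in (1, 2, 4, 8):
    print(k, abs(think(2.0, k) - 2.0 ** 0.5))
```

More iterations mean more test-time compute and a smaller error, yet no intermediate token is ever produced.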

Can reasoning and answers be generated separately in language models?

ICE shows that bidirectional attention in diffusion LLMs enables in-place prompting—embedding reasoning directly in masked positions refined alongside answers. Answer confidence converges early while reasoning continues refining, allowing early-exit mechanisms to cut compute by 50% while maintaining accuracy.
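The early-exit idea can be sketched as a stopping rule over refinement steps. Assumptions: `step_fn` is a hypothetical callback returning the current decoded answer and its confidence after each refinement step, the confidence schedule is synthetic, and the threshold and patience values are illustrative. ICE's actual exit criterion and its reported compute savings are not reproduced here.

```python
def decode_with_early_exit(step_fn, max_steps, threshold=0.85, patience=2):
    """Stop refining once answer confidence has stayed at or above `threshold`
    for `patience` consecutive steps; otherwise run all `max_steps`."""
    answer, streak = None, 0
    for step in range(1, max_steps + 1):
        answer, confidence = step_fn(step)
        streak = streak + 1 if confidence >= threshold else 0
        if streak >= patience:
            return answer, step
    return answer, max_steps


def toy_step(step):
    # Synthetic schedule: the answer is fixed early and confidence rises linearly.
    return "42", min(1.0, step / 10)


answer, steps_used = decode_with_early_exit(toy_step, max_steps=32)
print(answer, steps_used)  # 42 10
```

Requiring a short streak rather than a single crossing keeps one noisy spike in confidence from ending decoding prematurely.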

Can language models learn skills without human supervision?

Ctx2Skill's three-role self-play loop manufactures missing feedback through internal signals: the Challenger escalates difficulty as curriculum, the Judge gives binary verdicts as reward, and both sides evolve via natural-language skill edits. Success requires balancing adversarial pressure against a generalization safeguard to prevent collapse.

Can models learn to evaluate their own work during training?

Post-Completion Learning exploits unused sequence space after model output to train self-assessment capabilities during training while maintaining zero inference cost. The model learns to compute its own reward functions, internalizing evaluation rather than relying on external reward models.
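One way to picture the post-completion layout, as a minimal sketch: the token names `<eos>` and `<end>`, the `score:` format, and the exact-match reward are hypothetical choices, not details from the Post-Completion Learning work. The self-assessment sits after the end-of-output token and is included in the training loss; `truncate_at_eos` mimics a decoder that halts at EOS, which is why the self-assessment adds no inference cost.

```python
EOS, END = "<eos>", "<end>"  # hypothetical special-token names


def reward(answer, reference):
    # Hypothetical reward: exact match gives 1, anything else gives 0.
    return 1 if answer == reference else 0


def build_training_example(prompt, answer, reference):
    """Place a self-assessment after EOS. Its target is the reward the model
    should learn to predict, so evaluation is internalized during training."""
    self_eval = ["score:", str(reward(answer, reference))]
    tokens = prompt + answer + [EOS] + self_eval + [END]
    # In this sketch the loss covers the answer, EOS, and the self-assessment.
    loss_mask = [0] * len(prompt) + [1] * (len(tokens) - len(prompt))
    return tokens, loss_mask


def truncate_at_eos(generated):
    """At inference, decoding stops at EOS, so the self-assessment costs nothing."""
    if EOS in generated:
        return generated[: generated.index(EOS) + 1]
    return generated


tokens, mask = build_training_example(["2", "+", "2", "="], ["4"], ["4"])
print(tokens)
print(mask)
```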

Do reasoning traces show how models actually think?

LLM reasoning traces function as persuasive appearances rather than reliable explanations of the underlying computation. Invalid logical steps perform nearly as well as valid ones, and corrupted traces generalize comparably, showing that semantic correctness is not what produces the performance gains.

Can language models learn meaning from text patterns alone?

Bender & Koller take meaning to be the relation between linguistic expressions and the communicative intents they are used to evoke. Since LLMs are trained only on form-to-form prediction with no access to shared attention or intent, they cannot reconstruct the meaning that grounds language.

Can agents share thoughts directly without using language?

Research formalizes inter-agent thought sharing via sparse autoencoders that recover individual, shared, and private latent thoughts from hidden states. This approach detects alignment conflicts at the representational level before they manifest in language.
