Do latent sequence vectors outperform per-token latent iterative computation for reasoning?

This explores two rival ways of doing reasoning inside an LLM's latent space — operating on whole-sequence or concept-level vectors versus grinding through step-by-step iterative computation token by token — and which the corpus thinks actually works.

The corpus tilts fairly clearly toward the sequence-vector camp, but mostly because per-token latent iteration turns out to be something LLMs imitate rather than perform.

The sharpest evidence against per-token latent iteration comes from "Do large language models actually perform iterative optimization?". When asked to run an optimization or numerical procedure internally, a model doesn't actually loop: it recognizes the problem as template-similar to something seen in training and emits a plausible-looking but incorrect answer. The failure persists across model scale and training approaches. That fits a broader pattern in the collection: "Does chain-of-thought reasoning reveal genuine inference or pattern matching?" argues that chain-of-thought reproduces familiar reasoning shapes rather than executing novel inference, and "Do large language models reason symbolically or semantically?" shows that performance collapses when the semantic cues are stripped and only the rules remain. The token-level machinery is good at pattern completion, not at faithfully iterating.
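To make "actually looping" concrete, here is a minimal sketch of a genuinely iterative procedure: plain gradient descent on a one-dimensional quadratic. Each iterate is computed from the previous one, so the endpoint cannot be reached without carrying state through every step. The objective, step size, and function names are illustrative assumptions, not taken from the cited note.

```python
def gradient_descent(grad, x0, lr=0.1, steps=50):
    """Run plain gradient descent and return every iterate, not just the last."""
    trajectory = [x0]
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)  # each iterate depends on the previous one
        trajectory.append(x)
    return trajectory


def grad_f(x):
    """Gradient of the toy objective f(x) = (x - 3)^2, which is minimized at x = 3."""
    return 2.0 * (x - 3.0)


trajectory = gradient_descent(grad_f, x0=0.0)
final_x = trajectory[-1]
```

A template-matched answer would supply a plausible endpoint without the intermediate iterates, so one way to probe for real iteration is to check the trajectory rather than only the final value.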

The sequence-vector approaches look healthier. Meta's Large Concept Model reasons over sentence embeddings in a language-agnostic space before decoding, and "Can reasoning happen at the sentence level instead of tokens?" reports that this hierarchical, sentence-level planning yields more coherent output than flat token-by-token generation. Complementing it, "Can latent thought vectors scale language models beyond parameters?" shows that learning explicit latent "thought" vectors opens a scaling axis independent of parameter count, improving sample efficiency and few-shot reasoning. Both treat reasoning as something done over compressed representations, not as a token-serial march.

But the more interesting twist is that the contest may be a false binary. "Can reasoning systems scale wider instead of only deeper?" argues that the real lever isn't depth of iteration at all: sampling many parallel latent trajectories matches the benefits of serial reasoning without paying its latency, suggesting that width can stand in for depth in latent space. And "Do transformers hide reasoning before producing filler tokens?" shows that models trained with hidden chain-of-thought tokens compute answers in early layers and then overwrite them with format-compliant filler, meaning the "per-token" surface trace can be decorative relative to where the computation actually lives.
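The width-over-depth argument can be illustrated with a toy best-of-N selection over independently sampled stochastic trajectories. This is a sketch under loud assumptions: the scalar "latent", the Gaussian transition, and the `score` verifier are all hypothetical stand-ins, and nothing here reproduces GRAM's actual architecture or training.

```python
import random


def rollout(rng, start, steps, noise=1.0):
    """One stochastic latent trajectory: a noisy random walk over a scalar 'latent'."""
    z = start
    for _ in range(steps):
        z += rng.gauss(0.0, noise)  # stochastic transition (toy assumption)
    return z


def score(z, target=5.0):
    """Hypothetical verifier: higher is better, with the peak at the target value."""
    return -abs(z - target)


def best_of_width(seed, width, steps):
    """Sample `width` independent trajectories and keep the best-scoring endpoint."""
    rng = random.Random(seed)
    # The rollouts are independent, so a real system could run them in parallel;
    # they run sequentially here only to keep the sketch simple.
    finals = [rollout(rng, 0.0, steps) for _ in range(width)]
    return max(finals, key=score)


single = best_of_width(seed=0, width=1, steps=8)
wide = best_of_width(seed=0, width=64, steps=8)
```

Because both calls share a seed, the single trajectory is also the first of the 64, so the wide result can never score worse. If the rollouts ran in parallel, the serial depth would stay at 8 steps in both cases.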

So the takeaway you might not have gone looking for: the question of latent vectors vs. per-token iteration is partly answered by the discovery that LLMs don't genuinely iterate in latent space to begin with. They imitate the form of iteration. That reframes the design choice — the win comes from giving the model a representation worth reasoning over (concepts, latent thoughts, parallel paths), not from coaxing it to loop one token at a time.

Sources (7 notes)

Do large language models actually perform iterative optimization?

Research shows LLMs cannot perform iterative procedures in latent space. They recognize optimization problems as template-similar and emit plausible-looking but incorrect values, a failure mode that persists across model scale and training approaches.

Does chain-of-thought reasoning reveal genuine inference or pattern matching?

CoT works by constraining models to reproduce familiar reasoning patterns from training, not by enabling novel symbolic reasoning. Performance degrades predictably under distribution shifts—the signature of imitation rather than capability emergence.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Can reasoning happen at the sentence level instead of tokens?

Meta's Large Concept Model operates on sentence embeddings rather than tokens, reasoning in a language-agnostic space before decoding to any target language. This hierarchical approach with paragraph-level planning produces more coherent output than flat token generation.

Can latent thought vectors scale language models beyond parameters?

Latent-Thought Language Models achieve superior sample and parameter efficiency by coupling fast local variational learning with slow global decoder learning. This dual-rate scheme scales few-shot reasoning across both model and latent size, creating independent scaling dimensions beyond traditional parameter scaling.

Can reasoning systems scale wider instead of only deeper?

GRAM shows that stochastic latent transitions enabling parallel trajectory sampling sidestep the serial latency cost of depth-only scaling. Width matches token-level parallelism benefits: independent paths sample the solution space without variance inflation.

Do transformers hide reasoning before producing filler tokens?

Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.
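As a rough illustration of the logit lens readout mentioned in the last note, the sketch below projects each layer's hidden state through an unembedding matrix and ranks the vocabulary per layer. The three-token vocabulary, the identity unembedding, and the hidden-state values are made-up toy numbers chosen to mimic the described pattern, not measurements from any model. Real implementations usually apply the final layer norm before unembedding, which is omitted here.

```python
def logit_lens(hidden_states, unembed, vocab):
    """For each layer, project the hidden state to logits and rank tokens best-first."""
    ranked_per_layer = []
    for h in hidden_states:
        # One logit per vocabulary entry: dot product of its unembedding row with h.
        logits = [sum(w * x for w, x in zip(row, h)) for row in unembed]
        order = sorted(range(len(vocab)), key=lambda i: logits[i], reverse=True)
        ranked_per_layer.append([vocab[i] for i in order])
    return ranked_per_layer


vocab = ["7", ".", "the"]  # "7" plays the answer token, "." the filler token
unembed = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
hidden_states = [          # toy values, one hidden state per layer
    [0.9, 0.1, 0.2],       # layer 1
    [1.2, 0.3, 0.1],       # layer 2
    [0.8, 0.6, 0.1],       # layer 3
    [0.3, 1.5, 0.0],       # final layer
]

ranked = logit_lens(hidden_states, unembed, vocab)
top_per_layer = [r[0] for r in ranked]
```

With these toy values the answer token leads in the first three layers, while the final layer puts the filler token on top and leaves the answer at rank two, which is the "recoverable from lower-ranked predictions" shape the note describes.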
