Do language models need words to think or just latent structure?

This explores whether the visible words an LLM generates while 'thinking' are actually doing the reasoning, or whether the real work happens in hidden internal states — and what that says about how much language a model needs to think at all.

The corpus leans, surprisingly hard, toward hidden internal states: much of what looks like 'thinking out loud' may be ceremony layered on top of computation that has already happened silently. The most direct evidence is that models can reason without producing any visible thinking tokens at all. Depth-recurrent architectures, Coconut, and Heima scale test-time reasoning by iterating on hidden states rather than emitting words, which suggests that verbalization is a training artifact rather than a requirement (see "Can models reason without generating visible thinking tokens?"). A related line of work treats latent 'thought vectors' as a scaling dimension of their own: a model can reason better by growing its latent space, independent of its parameter count (see "Can latent thought vectors scale language models beyond parameters?").
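To make "iterating on hidden states rather than emitting words" concrete, here is a minimal toy sketch in Python. It is not the architecture of Coconut, Heima, or any depth-recurrent model; the update rule, the weight `w`, and the names `recurrent_step`, `latent_reason`, and `decode` are all assumptions made up for illustration. The only point it shows is that extra compute can be spent by looping over a hidden state, with a single token decoded at the very end.

```python
import math

def recurrent_step(h, x, w=0.5):
    """One latent update: blend the current hidden state with the input embedding.
    The tanh update and w=0.5 are illustrative assumptions, not a published rule."""
    return [math.tanh(w * hi + xi) for hi, xi in zip(h, x)]

def latent_reason(x, num_steps):
    """Apply the same block num_steps times. No token is emitted inside the loop."""
    h = [0.0] * len(x)
    for _ in range(num_steps):
        h = recurrent_step(h, x)
    return h

def decode(h):
    """Emit one output token id (the largest coordinate) after iteration ends."""
    return max(range(len(h)), key=lambda i: h[i])

x = [0.1, 0.4, -0.2]                      # hypothetical input embedding
shallow = latent_reason(x, num_steps=1)   # little test-time compute
deep = latent_reason(x, num_steps=16)     # more test-time compute, still zero tokens
token = decode(deep)                      # the only "visible" output
```

In this toy, `num_steps` plays the role of test-time compute: raising it refines the hidden state without lengthening any visible output.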

Sources (7 notes)

Can models reason without generating visible thinking tokens?

Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.

Can latent thought vectors scale language models beyond parameters?

Latent-Thought Language Models achieve superior sample and parameter efficiency by coupling fast local variational learning with slow global decoder learning. This dual-rate scheme scales few-shot reasoning across both model and latent size, creating independent scaling dimensions beyond traditional parameter scaling.

Do transformers hide reasoning before producing filler tokens?

Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Can language models actually analyze language structure?

OpenAI's o1 model successfully constructs syntactic trees and phonological generalizations through explicit step-by-step reasoning, revealing that LLM linguistic capability extends far beyond behavioral language tasks to genuine language analysis.

Can language models learn meaning without engaging the world?

Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.

Can language models learn meaning from text patterns alone?

Bender & Koller argue that meaning requires the relation between expressions and communicative intents. Since LLMs are trained only on form-to-form prediction with no access to shared attention or intent, they cannot reconstruct the meaning that grounds language.
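The logit-lens reading mentioned in the note "Do transformers hide reasoning before producing filler tokens?" can be sketched with a toy example. The idea is to project each layer's hidden state through the output (unembedding) matrix and see which token that layer would predict. Everything below is hypothetical: the three-token vocabulary, the matrix `UNEMBED`, and the per-layer vectors in `HIDDEN` are hand-picked numbers, not measurements from any model or paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit_lens(hidden_states, unembed):
    """Project every layer's hidden state onto the vocabulary.
    Returns one probability distribution per layer."""
    per_layer = []
    for h in hidden_states:
        logits = [sum(w * hi for w, hi in zip(row, h)) for row in unembed]
        per_layer.append(softmax(logits))
    return per_layer

def rank_of(probs, token_id):
    """Rank of token_id in a distribution (0 means top prediction)."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    return order.index(token_id)

# Hypothetical toy setup: one unembedding row per vocabulary token.
VOCAB = ["<filler>", "7", "3"]
UNEMBED = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
HIDDEN = [
    [0.2, 2.0],  # early layer: the answer direction dominates
    [1.5, 1.8],  # middle layer
    [3.0, 1.0],  # final layer: the filler direction dominates
]

layers = logit_lens(HIDDEN, UNEMBED)
top_tokens = [VOCAB[max(range(len(p)), key=lambda i: p[i])] for p in layers]
answer_rank_final = rank_of(layers[-1], VOCAB.index("7"))
```

With these made-up numbers, the answer token is the top prediction in the early layers, while the final layer puts the filler token first and leaves the answer at a lower rank, which is the pattern the note describes.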
