LLM Reasoning and Architecture · Reinforcement Learning for LLMs

Can models reason without generating visible thinking steps?

Do machine reasoning systems actually require verbalized chains of thought, or can they solve complex problems through hidden computation? If they can, that challenges how we measure and understand reasoning.

Note · 2026-02-22 · sourced from Reasoning Architectures

Post angle — Medium

The current test-time scaling paradigm assumes reasoning = generating tokens. Thinking more means producing more intermediate reasoning tokens. This assumption is embedded in every benchmark that measures reasoning quality by counting or reading the chain.

Two architectures challenge this from different angles:

Depth-recurrent models iterate a recurrent block in latent space. More recurrence = more thinking, but zero additional output tokens. The model updates its hidden state as many times as it needs, then produces an answer. Performance scales with recurrence depth. No specialized training data required.
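
A minimal PyTorch sketch of the idea, not any particular published model: the layer sizes, the single encoder-layer core, and the way the prompt is re-injected at each step are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

class DepthRecurrentCore(nn.Module):
    """Minimal sketch of depth recurrence: one block, iterated in latent space."""
    def __init__(self, d_model: int = 512):
        super().__init__()
        self.inject = nn.Linear(d_model, d_model)  # re-injects the prompt embedding at every step
        self.block = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)

    def forward(self, x_embed: torch.Tensor, num_iters: int) -> torch.Tensor:
        state = torch.zeros_like(x_embed)          # latent scratchpad, never emitted as tokens
        for _ in range(num_iters):                 # recurrence depth = test-time compute budget
            state = self.block(state + self.inject(x_embed))
        return state                               # a separate head decodes this into the answer

core = DepthRecurrentCore()
x = torch.randn(1, 16, 512)                        # embeddings for a 16-token prompt
shallow = core(x, num_iters=4)
deep = core(x, num_iters=32)                       # "thinks harder" with zero extra output tokens
```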

Heima compresses entire CoT steps into single "thinking tokens" — compact high-dimensional representations that are decoded back to text only when needed. The thinking happens in the compressed latent space; verbalization is a display choice, not a computation requirement.
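
A rough sketch of the compression step, assuming a simple learned-query attention pooler; the module names, dimensions, and pooling scheme are illustrative, not Heima's actual implementation.

```python
import torch
import torch.nn as nn

class ThinkingTokenCompressor(nn.Module):
    """Sketch: pool a whole verbalized CoT step into one latent 'thinking token'."""
    def __init__(self, d_model: int = 512):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, d_model))  # learned query for the thinking token
        self.pool = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)

    def compress(self, step_embeddings: torch.Tensor) -> torch.Tensor:
        # step_embeddings: (batch, step_len, d_model) -- one verbalized reasoning step
        q = self.query.expand(step_embeddings.size(0), -1, -1)
        pooled, _ = self.pool(q, step_embeddings, step_embeddings)
        return pooled                                           # (batch, 1, d_model): one vector per step

compressor = ThinkingTokenCompressor()
step = torch.randn(1, 40, 512)              # a 40-token chain-of-thought step
thinking_token = compressor.compress(step)  # downstream reasoning attends to this single vector;
                                            # a decoder verbalizes it only if a readable chain is requested
```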

Both converge on the same uncomfortable implication: verbalized reasoning may be a historical artifact of training on human text and evaluation protocols that require readable chains — not a necessary property of machine reasoning.

This matters for at least three reasons:

  1. Efficiency: If reasoning doesn't require tokens, the quadratic attention cost of long CoT chains is avoidable (see the sketch after this list)
  2. Capability: Latent states can hold several candidate directions at once, free of the one-token-at-a-time constraint of generation, potentially capturing reasoning facets (spatial reasoning, physical intuition) that tokenized text represents poorly
  3. Evaluation: Every reasoning benchmark that reads chains to judge quality is measuring a proxy. If the reasoning is latent, the chain is a summary, not a record
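
For the efficiency point, a back-of-the-envelope comparison. The token counts and the pair-counting cost model are illustrative assumptions, not measurements from the source.

```python
# Causal self-attention over a generated sequence of length n touches
# about n*(n+1)/2 query/key pairs.

def attention_pairs(n_tokens: int) -> int:
    return n_tokens * (n_tokens + 1) // 2

prompt = 200
verbal = prompt + 10_000    # answer preceded by a 10k-token verbalized chain
latent = prompt + 200       # short answer; extra "thinking" happens as latent recurrence instead

print(attention_pairs(verbal))                              # ~52 million pairs
print(attention_pairs(latent))                              # ~80 thousand pairs
print(attention_pairs(verbal) // attention_pairs(latent))   # ~648x, before counting the recurrence's own cost
```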

The deepest version: we may be evaluating "the ability to write good-looking reasoning chains" rather than "the ability to reason."

The strongest empirical evidence comes from HRM (Hierarchical Reasoning Model): with only 27M parameters, 1000 training samples, no pretraining, and no CoT data, it achieves near-perfect accuracy on Sudoku-Extreme and optimal pathfinding in 30×30 mazes, tasks where state-of-the-art CoT methods score 0%. This is not a marginal improvement but a categorical capability gap: latent reasoning solves problems that verbalized reasoning cannot.
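
HRM pairs a slow, high-level recurrent module with a fast, low-level one. The sketch below mimics that nested-loop structure in miniature; the GRU cells, dimensions, and step counts are assumptions for illustration, not the paper's modules.

```python
import torch
import torch.nn as nn

class TwoTimescaleReasoner(nn.Module):
    """Illustrative nested recurrence in the spirit of HRM; cells and sizes are assumptions."""
    def __init__(self, d: int = 256):
        super().__init__()
        self.low = nn.GRUCell(2 * d, d)   # fast module: sees the input plus the current high-level plan
        self.high = nn.GRUCell(d, d)      # slow module: updated from the fast module's result

    def forward(self, x: torch.Tensor, high_steps: int = 8, low_steps: int = 8) -> torch.Tensor:
        z_high = x.new_zeros(x.size(0), self.high.hidden_size)
        z_low = x.new_zeros(x.size(0), self.low.hidden_size)
        for _ in range(high_steps):                              # slow planning loop
            for _ in range(low_steps):                           # fast computation loop, all in latent space
                z_low = self.low(torch.cat([x, z_high], dim=-1), z_low)
            z_high = self.high(z_low, z_high)                    # commit the fast loop's result
        return z_high                                            # read out by a task head; no CoT emitted

reasoner = TwoTimescaleReasoner()
puzzle = torch.randn(4, 256)           # e.g. a batch of 4 encoded puzzle grids
solution_state = reasoner(puzzle)      # 8x8 latent iterations, zero reasoning tokens generated
```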

Connections: Can models reason without generating visible thinking tokens? · Does more thinking time actually improve LLM reasoning? · Do chain of thought traces actually help humans understand reasoning? · Can recurrent hierarchies achieve reasoning that transformers cannot?


Source: Reasoning Architectures, Novel Architectures

Original note title

reasoning without words — latent recurrent models challenge whether verbalized thinking is necessary