Reasoning and Learning Architectures

Can reasoning systems scale wider instead of only deeper?

Explores whether sampling multiple parallel latent trajectories offers a faster scaling path than recursive refinement alone. Matters because it could unlock latency-efficient reasoning at test time.

Note · 2026-05-28 · sourced from Reasoning Architectures

Recursive Reasoning Models (RRMs) increase reasoning capability by iterating a shared transition function over a latent state — more iterations means more "thinking" without extending the output sequence. This is depth scaling, and it decouples reasoning depth from both parameter count and output length. GRAM (Generative Recursive reAsoning Models) argues this is only half the story: depth alone is insufficient because a single refinement path can become trapped in a suboptimal trajectory, and many problems have ambiguity or multiple valid solutions that a single converging path cannot represent.

The structural claim is that future recursive reasoners should be not only deep (repeated refinement) but also wide (maintaining and exploring multiple latent trajectories in parallel). GRAM operationalizes width by turning the latent transition stochastic and sampling several trajectories simultaneously. Crucially, width sidesteps the latency penalty that depth-only scaling incurs: sampling N trajectories runs in parallel, whereas adding N refinement steps is serial and accumulates wall-clock time.

This reframes the inference-scaling design space for latent architectures. It mirrors at the latent-state level what parallel-vs-sequential debates established at the token level — since Why does parallel reasoning outperform single chain thinking?, breadth often beats depth under a fixed budget because independent paths sample the solution distribution rather than inflating variance along one path. GRAM brings that lesson inside the recurrent block, where prior work like Can models reason without generating visible thinking tokens? had only scaled depth. The counterpoint to watch: since Can parallel architectures solve inherently sequential problems?, width cannot substitute for depth on inherently serial problems — the two axes are complements, not interchangeable knobs. Why it matters: it gives latent reasoning a second, latency-cheap scaling dimension and explains why deterministic RRMs underperform on multi-solution tasks.


— "Generative Recursive Reasoning", https://arxiv.org/abs/2605.19376

Related concepts in this collection

Concept map
13 direct connections · 125 in 2-hop network ·dense cluster Open in graph ↗

Click a node to walk · click center to open · click Open in graph to see this note in the full knowledge graph

your link semantically near linked from elsewhere
Original note title

reasoning systems should scale in width by sampling parallel latent trajectories not only in depth