LLM Reasoning and Architecture · Reinforcement Learning for LLMs

How should we balance parallel versus sequential compute at test time?

Test-time compute can prioritize breadth (trying many approaches) or depth (refining one approach). Which strategy works better, and does the answer depend on the problem?

Note · 2026-02-20 · sourced from Test Time Compute
How should we allocate compute budget at inference time?

Every approach to test-time compute lands somewhere on the parallel-sequential axis.

The pattern recurs consistently across papers, architectures, and tasks. The trade-off between coverage and depth is not a special feature of any one method — it's a fundamental tension in how to allocate finite compute.
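The two poles of that axis can be sketched as two ways of spending the same call budget. This is a toy illustration, not any paper's method: `sample` and `refine` are hypothetical stand-ins for LLM calls, and the integer "answers" stand in for model outputs.

```python
import random
from collections import Counter

def parallel_vote(sample, n):
    """Breadth: n independent attempts, aggregated by majority vote."""
    answers = [sample() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

def sequential_refine(sample, refine, n):
    """Depth: one initial attempt, then n - 1 refinement steps on it."""
    draft = sample()
    for _ in range(n - 1):
        draft = refine(draft)
    return draft

# Toy demo: the true answer is 42.
rng = random.Random(0)
# A noisy one-shot sampler: right 40% of the time, otherwise a random guess.
noisy = lambda: 42 if rng.random() < 0.4 else rng.randrange(100)
# A refiner that moves the current draft one step toward the answer.
step_toward = lambda x: x + (1 if x < 42 else -1 if x > 42 else 0)

print(parallel_vote(noisy, 50))                       # vote over 50 samples
print(sequential_refine(lambda: 0, step_toward, 50))  # 49 refinement steps
```

Both calls spend a budget of 50; the first distributes it across independent attempts, the second concentrates it on one trajectory. Which allocation wins depends on the per-attempt success rate and on how far a single refinement chain must travel.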

Empirical evidence increasingly favors parallel approaches on general benchmarks (see Why does parallel reasoning outperform single chain thinking?), but the field's intuition still leans sequential because it maps onto human reasoning patterns. The disconnect between what works and what feels right is part of what makes the overthinking findings surprising.

The exponential counter-case: On structured compositional problems where solutions require sequential accumulation of intermediate results (graph connectivity, deep multi-hop chains), sequential CoT is exponentially better than parallel voting. See When does sequential reasoning beat parallel voting?. This resolves the apparent contradiction: parallel wins when independent short attempts can each reach an answer; sequential wins when the problem requires depth that short chains cannot achieve at all. Task structure is the moderating variable.
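A toy success-probability model (illustrative numbers, not drawn from the cited papers) makes the resolution concrete: parallel sampling amplifies any nonzero per-attempt success rate, but when every short chain is too shallow to ever finish, the per-attempt rate is zero and no amount of sampling helps.

```python
def p_parallel(k, p_single):
    """Chance that at least one of k independent attempts succeeds."""
    return 1 - (1 - p_single) ** k

# Parallelizable task: each short attempt succeeds 30% of the time,
# so breadth drives success probability up quickly.
print(round(p_parallel(10, 0.30), 3))   # -> 0.972

# Depth-limited task: a short chain can never reach the answer at all,
# so per-attempt success is exactly zero and sampling multiplies zeros.
print(p_parallel(1_000_000, 0.0))       # -> 0.0
```

This is the sense in which task structure moderates the trade-off: the first regime rewards coverage, while the second can only be escaped by making individual chains longer.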

Training format as an upstream determinant: Does training data format shape reasoning strategy more than domain? shows that multiple-choice training produces BFS-like (parallel-resembling) reasoning; free-form training produces DFS-like (sequential) reasoning. The parallel/sequential trade-off plays out at training time too — format determines which pole a model's default reasoning strategy occupies before any inference-time decisions are made.

Retrieval-level parallel/sequential trade-off: RAG-R1 demonstrates the parallel/sequential dichotomy at the retrieval level. Single-query mode requires sequential multi-turn retrieval rounds; multi-query parallelism issues multiple queries simultaneously, reducing retrieval rounds and improving information diversity. The same structural trade-off — coverage (parallel) vs depth (sequential) — appears in RAG system design, not just reasoning token allocation.
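The retrieval-level version of the trade-off can be sketched as two loop shapes. This is a schematic with a hypothetical `search` callable and a naive query rewrite, not RAG-R1's actual interface:

```python
def sequential_retrieval(question, search, rounds):
    """Single-query mode: one query per round, each rewritten
    using the evidence gathered so far (naive rewrite for illustration)."""
    evidence, query = [], question
    for _ in range(rounds):
        hits = search(query)
        evidence += hits
        query = question + " " + " ".join(hits)
    return evidence

def parallel_retrieval(subqueries, search):
    """Multi-query mode: issue all sub-queries in a single round,
    trading retrieval depth for coverage and diversity."""
    return [hit for q in subqueries for hit in search(q)]

# Toy corpus: search() is a dictionary lookup.
corpus = {
    "q": ["doc1"],
    "q doc1": ["doc2"],   # only reachable after seeing doc1
    "a": ["docA"],
    "b": ["docB"],
}
search = lambda query: corpus.get(query, [])

print(sequential_retrieval("q", search, 2))   # -> ['doc1', 'doc2']
print(parallel_retrieval(["a", "b"], search)) # -> ['docA', 'docB']
```

The toy corpus shows both structural properties at once: `doc2` is only reachable through a second, evidence-conditioned round (depth), while the parallel call covers two independent facets in one round (coverage).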

Complexity-theoretic foundation — the Serial Scaling Hypothesis: Can parallel architectures solve fundamentally sequential problems? provides the theoretical grounding: inherently serial problems (mathematical reasoning, physical simulation, planning) cannot be solved by parallel architectures. Transformers and even diffusion models are in TC0 — provably incapable of solving inherently serial problems regardless of compute. This reframes the trade-off: it's not just empirical (which works better) but formal (some problems require serial computation). The parallel-wins finding applies to parallelizable problems; the serial hypothesis identifies problems where parallel is provably insufficient.

Evolutionary inference as a third mode: Mind Evolution introduces population-based search at inference time — neither pure parallel sampling nor sequential refinement, but iterative evolution of a diverse candidate population. See Can evolutionary search beat sampling and revision at inference time?. The island model sustains diversity that single-trajectory refinement loses, while genetic recombination creates candidates that independent sampling cannot reach. This suggests the parallel/sequential axis may be insufficient — population-based methods occupy a distinct region of the design space.
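As a rough illustration of the population idea, here is a generic island-model loop — not Mind Evolution's actual procedure, and every name and number below is an assumption for the sketch:

```python
import random

def island_search(fitness, mutate, crossover, init, rng,
                  islands=2, pop=6, gens=8, migrate_every=2):
    """Generic island-model evolutionary search: separate populations
    evolve independently and periodically exchange their best member,
    keeping diversity that a single refined trajectory would lose."""
    pops = [[init(rng) for _ in range(pop)] for _ in range(islands)]
    for g in range(gens):
        for i in range(islands):
            # keep the fitter half, refill with recombined + mutated children
            parents = sorted(pops[i], key=fitness, reverse=True)[: pop // 2]
            children = [mutate(crossover(rng.choice(parents),
                                         rng.choice(parents), rng), rng)
                        for _ in range(pop - len(parents))]
            pops[i] = parents + children
        if (g + 1) % migrate_every == 0:
            # migration: each island receives its neighbor's best member
            best = [max(p, key=fitness) for p in pops]
            for i in range(islands):
                pops[i][-1] = best[(i + 1) % islands]
    return max((cand for p in pops for cand in p), key=fitness)

# Toy demo: maximize -(x - 7)^2 over integers.
rng = random.Random(1)
best = island_search(
    fitness=lambda x: -(x - 7) ** 2,
    mutate=lambda x, r: x + r.choice([-1, 0, 1]),
    crossover=lambda a, b, r: (a + b) // 2,
    init=lambda r: r.randrange(20),
    rng=rng,
)
print(best)
```

Note how this occupies neither pole: candidates evolve over generations (sequential depth per lineage) while multiple populations explore at once (parallel breadth), and recombination mixes lineages in a way neither pure strategy can.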


Source: Test Time Compute, Novel Architectures

parallel vs sequential scaling is the recurring trade-off in test-time compute