INQUIRING LINE

What decomposition level minimizes both error rate and computational cost in practice?

This explores whether there's a 'sweet spot' granularity for breaking a task into pieces — fine enough to keep errors low, coarse enough to not blow up compute — and what the corpus says that level actually is.


This reads the question as a search for the optimal grain size when you split work into steps: too coarse and errors compound, too fine and you pay for orchestration overhead. The corpus is interesting here because it doesn't agree on a single answer — it splits into two camps, and the disagreement is the insight.
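The coarse-versus-fine trade-off can be made concrete with a deliberately crude toy model. Everything in it is a hypothetical assumption for illustration, not a result from the corpus. A task of fixed total work is split into `n_steps` equal pieces. A piece's failure probability grows faster than linearly with its size. Every attempt pays a fixed orchestration overhead. Failed pieces are detected and retried. The function name and all constants are made up.

```python
from math import exp


def expected_cost(n_steps: int, total_work: float = 100.0, hazard: float = 0.01,
                  overhead: float = 1.0, cost_per_unit: float = 0.1) -> float:
    """Toy expected cost of finishing a task split into n_steps equal pieces.

    Hypothetical assumptions (not from the corpus):
    - a piece of size s succeeds with probability exp(-hazard * s**2), so
      failure risk grows faster than linearly with piece size;
    - every attempt pays a fixed orchestration overhead plus work proportional to s;
    - failures are detected and the piece is retried until it succeeds,
      so the expected number of attempts per piece is 1 / p_success.
    """
    size = total_work / n_steps
    p_success = exp(-hazard * size ** 2)
    cost_per_attempt = overhead + cost_per_unit * size
    return n_steps * cost_per_attempt / p_success


# Scan every split from one giant step to 100 tiny ones and keep the cheapest.
best = min(range(1, 101), key=expected_cost)
print(f"coarsest (1 step): {expected_cost(1):.3g}")
print(f"finest (100 steps): {expected_cost(100):.3g}")
print(f"cheapest split: {best} steps, cost {expected_cost(best):.3g}")
```

With these made-up constants the cheapest split lands in the interior of the range (in the high teens), not at either end. Changing the hazard or the overhead moves the minimum, which is the point: where the bottom of the U sits depends on how fast errors grow with step size relative to per-step overhead.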

The first camp says go to the extreme. MAKER decomposes million-step tasks into the *smallest possible* subtasks, votes on each one, and flags correlated errors, and it reaches zero-error execution at a scale that would be hopeless for a monolithic model (Can extreme task decomposition enable reliable execution at million-step scale?). The surprising part is that the cost story inverts: when decomposition is fine enough, small non-reasoning models suffice, so you're not paying for a giant model at every step. A related finding is that the failures on hard steps aren't random noise that more scale will average away. LLMs can't actually execute iterative procedures in latent space; they pattern-match and emit plausible-but-wrong values (Do large language models actually perform iterative optimization?), and they plateau around 55–60% constraint satisfaction regardless of scale (Do larger language models solve constrained optimization better?). That ceiling is the real argument *for* extreme decomposition: if a model can't reliably do a hard step, the fix isn't a bigger model, it's a smaller step.
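Back-of-the-envelope arithmetic shows why per-step voting pays off at million-step scale. This sketch assumes each vote is wrong independently with probability `p`, and pessimistically that wrong votes agree with each other. It uses plain majority voting as a simpler stand-in, not necessarily the voting rule MAKER itself uses. Correlated errors break the independence assumption, which is exactly why flagging them matters. The function names are hypothetical.

```python
from math import comb, exp, log1p


def majority_error(p: float, n_votes: int) -> float:
    """Probability that a strict majority of n_votes independent votes is wrong.

    Assumes each vote is wrong independently with probability p and that all
    wrong votes agree (the worst case for majority voting).
    """
    if n_votes % 2 == 0:
        raise ValueError("use an odd number of votes to avoid ties")
    need = n_votes // 2 + 1
    return sum(comb(n_votes, i) * p ** i * (1 - p) ** (n_votes - i)
               for i in range(need, n_votes + 1))


def task_success(p: float, n_votes: int, n_steps: int) -> float:
    """Probability that all n_steps steps are correct, computed in log space."""
    step_error = majority_error(p, n_votes)
    return exp(n_steps * log1p(-step_error))


for votes in (1, 7, 9):
    print(votes, "votes per step:", task_success(p=0.01, n_votes=votes, n_steps=1_000_000))
```

With a 1% per-vote error rate and a million steps, a single vote per step almost surely fails somewhere (the success probability underflows to zero), while nine votes per step push whole-task success above 98% under the independence assumption.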

The second camp warns that 'more decomposition is always better' stops being true once you measure compute. The Engram work finds a *U-shaped* scaling law: balancing cheap O(1) lookup memory against expensive computation beats maxing out either one (Can lookup memory and computation work together better than either alone?). By analogy, that's the shape your question is really asking about. Engram allocates model capacity between two mechanisms rather than splitting a task, but its minimum sits in the middle, not at an extreme. The same flavor shows up in compute-aware work elsewhere. Step-level confidence filtering matches majority-voting accuracy with far fewer generated traces by spending effort only where reasoning is breaking down (Does step-level confidence outperform global averaging for trace filtering?), and calibrated uncertainty matches or beats elaborate adaptive-retrieval pipelines with a fraction of the model and retriever calls (Can simple uncertainty estimates beat complex adaptive retrieval?). The lesson: the cost-minimizing 'level' isn't a uniform granularity but an *adaptive* one, fine where the model is unsure and coarse where it's confident.
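A minimal sketch of step-level confidence filtering, under stated assumptions: each trace arrives with a per-token confidence score (how that score is derived from token probabilities is left open), a sliding window of tokens stands in for a reasoning step, and the window size and threshold are hypothetical. A trace is discarded, or could be stopped early while streaming, the first time its windowed mean drops below the threshold; the survivors are then majority-voted. This illustrates the idea in the cited note, not its exact method.

```python
from collections import Counter, deque
from typing import Optional, Sequence


def first_breakdown(confidences: Sequence[float], window: int, threshold: float) -> Optional[int]:
    """Return the token index where the sliding-window mean confidence first
    falls below `threshold`, or None if it never does.

    While streaming, a caller could stop generating the trace at that index.
    """
    recent: deque = deque(maxlen=window)  # oldest value drops out automatically
    for i, c in enumerate(confidences):
        recent.append(c)
        if len(recent) == window and sum(recent) / window < threshold:
            return i
    return None


def filtered_vote(traces: Sequence[tuple[str, Sequence[float]]],
                  window: int, threshold: float) -> Optional[str]:
    """Majority vote over the final answers of traces with no local breakdown."""
    kept = [answer for answer, conf in traces
            if first_breakdown(conf, window, threshold) is None]
    if not kept:
        return None
    return Counter(kept).most_common(1)[0][0]


# Hypothetical example: three shaky traces answer "17", two steady ones answer "42".
dip = [0.9] * 4 + [0.2] * 3 + [0.9] * 3
traces = [
    ("42", [0.9] * 10),
    ("42", [0.8] * 10),
    ("17", dip),
    ("17", dip),
    ("17", [0.9, 0.9, 0.1, 0.1] + [0.9] * 6),
]
print(filtered_vote(traces, window=3, threshold=0.5))  # prints 42
```

Naive majority voting over all five traces would return "17". The `dip` trace still has a whole-trace mean confidence of 0.69, so a global average with the same 0.5 cutoff would keep it; the windowed check catches the short collapse in the middle.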

Reconciling the two: the most useful frame in the corpus is that choosing the decomposition boundary is itself a separable skill. Splitting the decomposer from the solver outperforms one monolithic model, and notably the *decomposition ability transfers across domains while solving ability doesn't* (Does separating planning from execution improve reasoning accuracy?). So 'what level' may be the wrong question: the practical answer is to let a dedicated planner choose the grain per step rather than fix one level globally.
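Letting a planner pick the grain per step can be sketched as recursive splitting: keep halving a span of work while the planner's uncertainty estimate for it exceeds a budget, then hand each surviving span to the solver. The `uncertainty` callable, the halving rule, the budget, and the difficulty values are all hypothetical stand-ins, not the architecture from the cited note. The planner never calls the solver, which mirrors the decomposer/solver separation described above.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # half-open range [lo, hi) of work items


def plan(lo: int, hi: int, uncertainty: Callable[[int, int], float], budget: float) -> List[Span]:
    """Split [lo, hi) until each span's estimated uncertainty fits the budget.

    Single-item spans cannot be split further and are returned as-is, even if
    they exceed the budget (a real system might flag them instead).
    """
    if hi - lo <= 1 or uncertainty(lo, hi) <= budget:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return plan(lo, mid, uncertainty, budget) + plan(mid, hi, uncertainty, budget)


# Hypothetical per-item difficulty: eight easy items followed by four hard ones.
difficulty = [0.1] * 8 + [1.0] * 4
spans = plan(0, len(difficulty), lambda a, b: sum(difficulty[a:b]), budget=1.0)
print(spans)
```

The easy stretch stays one coarse span, `(0, 6)`, while the hard stretch is cut down to single items: fine where the estimate says the work is risky, coarse where it says the work is safe.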

What you didn't know you wanted to know: the cost curve flips depending on whether you're memory-bound or compute-bound. On mobile hardware, running the same transformer block twice can be cheaper than loading a second block's weights from memory, because the bottleneck is memory traffic, not FLOPs (Does recomputing weights cost less than moving them on mobile?). So the decomposition level that minimizes cost on a datacenter GPU and the one that minimizes it on a phone can sit at opposite ends, because 'computational cost' isn't one quantity.
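A roofline-style estimate shows why the same change can help on one device and do nothing on another: a kernel's time is roughly the larger of its compute time and its memory-transfer time. The device figures, block size, and batch sizes below are hypothetical round numbers, not measurements. The "shared" case assumes the block's weights stay resident in on-chip cache between its two passes, which is an assumption about the hardware rather than something the note states.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Device:
    name: str
    peak_flops: float  # FLOP/s (hypothetical)
    bandwidth: float   # bytes/s from main memory (hypothetical)


def roofline_time(flops: float, bytes_moved: float, dev: Device) -> Tuple[float, str]:
    """Estimated time and binding resource: max(compute time, memory time)."""
    compute_t = flops / dev.peak_flops
    memory_t = bytes_moved / dev.bandwidth
    return (memory_t, "memory") if memory_t > compute_t else (compute_t, "compute")


def two_passes(params: float, bytes_per_param: float, batch: int, dev: Device):
    """Compare two distinct blocks against one block run twice with shared weights."""
    flops = 2 * (2 * params * batch)  # two passes, ~2 FLOPs per weight per token
    weight_bytes = params * bytes_per_param
    separate = roofline_time(flops, 2 * weight_bytes, dev)  # fetch both blocks' weights
    shared = roofline_time(flops, weight_bytes, dev)        # fetch once, assume cached for pass two
    return separate, shared


phone = Device("phone-like", peak_flops=1e12, bandwidth=5e10)
server = Device("server-like", peak_flops=1e14, bandwidth=2e12)
for dev, batch in [(phone, 1), (server, 256)]:
    sep, sh = two_passes(params=1e7, bytes_per_param=2, batch=batch, dev=dev)
    print(dev.name, "separate:", sep, "shared:", sh)
```

At batch size 1 on the phone-like profile the work is memory-bound, so sharing halves the estimated time. On the server-like profile at batch 256 the work is compute-bound and sharing changes nothing, because both variants do the same FLOPs.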


Sources (8 notes)

Can extreme task decomposition enable reliable execution at million-step scale?

MAKER solves million-step tasks with zero errors by decomposing into minimal subtasks, applying voting at each step, and flagging correlated errors. Surprisingly, small non-reasoning models suffice when decomposition is extreme enough, inverting the standard approach to hard problems.

Do large language models actually perform iterative optimization?

Research shows LLMs cannot perform iterative procedures in latent space. They treat optimization problems as variations on familiar templates and emit plausible-looking but incorrect values, a failure mode that persists across model scale and training approaches.

Do larger language models solve constrained optimization better?

Across constrained-optimization tasks, LLMs converge to ~55–60% constraint satisfaction independent of architecture, parameter count, or training regime. Reasoning models do not systematically outperform standard models, suggesting a fundamental ceiling rather than a scaling gap.

Can lookup memory and computation work together better than either alone?

Engram combines O(1) N-gram lookup with Mixture-of-Experts routing, revealing a U-shaped scaling law where balanced allocation to both mechanisms outperforms either alone. Gains appear largest in reasoning and code rather than pure retrieval.

Does step-level confidence outperform global averaging for trace filtering?

Local step-level confidence catches reasoning breakdowns that global averaging masks and enables early stopping before traces complete. This approach achieves accuracy gains comparable to naive majority voting with far fewer generated traces, suggesting that trace quality matters more than quantity.

Can simple uncertainty estimates beat complex adaptive retrieval?

Calibrated token-probability uncertainty consistently beats multi-call adaptive retrieval on single-hop tasks and matches performance on multi-hop, using a fraction of the LM and retriever calls. The model's self-knowledge proves more reliable than external heuristics for deciding when to retrieve.

Does separating planning from execution improve reasoning accuracy?

Modular architectures with separate decomposer and solver models outperform monolithic LLMs, with decomposition ability transferring across domains while solving ability does not. The separation prevents planning-execution interference and produces more generalizable skills.

Does recomputing weights cost less than moving them on mobile?

MobileLLM shows that on memory-bound mobile hardware, sharing weights between adjacent transformer blocks by running one block twice has lower latency than fetching separate weights, gaining accuracy with no increase in parameter count.

Next inquiring lines