Agentic and Multi-Agent Systems

Can extreme task decomposition enable reliable execution at million-step scale?

Can breaking tasks into maximally atomic subtasks with voting-based error correction solve the fundamental reliability problem in long-horizon tasks? This puts to the test whether better models or better decomposition is the path to high-reliability AI systems.

Note · 2026-02-23 · sourced from Novel Architectures

A system with a 1% per-step error rate is expected to fail within roughly 100 steps, so a million-step task is effectively hopeless. This makes traditional approaches to long-horizon tasks fundamentally infeasible: even improving model accuracy from 99% to 99.99% still leaves failure likely once a task requires tens of thousands of dependent steps. MAKER (Massively Decomposed Agentic Processes) takes a different approach: instead of improving per-step accuracy, decompose until each step is trivially reliable, then apply error correction.
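A quick back-of-the-envelope check of that arithmetic (a minimal sketch under the standard assumption of independent step failures, not taken from the paper):

```python
# Chained reliability: if each step succeeds independently with probability p,
# n dependent steps all succeed with probability p**n, and the expected number
# of steps before the first failure is 1 / (1 - p).
def chained_success(p: float, n: int) -> float:
    """Probability that n dependent steps all succeed."""
    return p ** n

for p in (0.99, 0.9999):
    print(f"p={p}: expected steps to first failure ~ {1 / (1 - p):,.0f}")
    for n in (100, 10_000, 1_000_000):
        print(f"  P(success over {n:>9,} steps) = {chained_success(p, n):.3g}")
```

Even at 99.99% per-step accuracy, the chained success probability over a million steps is on the order of 10^-44, which is why the note treats better models alone as a dead end.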

Three core components:

  1. Decomposition into minimal subtasks: Each agent handles a single, tiny "micro-role" rather than an anthropomorphized human-level role. Avoiding complex role assignments and instead exploiting the machine-like nature of LLMs makes each subtask solvable with high reliability.
  2. Error correction via subtask-level voting: Multiple agents independently solve the same subtask; voting identifies the correct answer. This is error correction at the finest possible granularity.
  3. Red-flagging to reduce correlated errors: Detects situations where voting might fail because errors are correlated across agents, and applies additional verification. A sketch of voting with red-flagging follows this list.
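To make the voting-plus-red-flagging idea concrete, here is a minimal hypothetical sketch; the function names, the red-flag check, and the "first to lead by k" stopping rule are illustrative assumptions, not the paper's exact algorithm:

```python
from collections import Counter
from typing import Callable, Optional

def vote_on_subtask(
    solve_subtask: Callable[[str], str],    # one independent agent attempt (assumed interface)
    is_red_flagged: Callable[[str], bool],  # e.g. malformed output or format violations (assumed)
    subtask: str,
    lead: int = 3,           # stop once one answer leads the runner-up by this margin
    max_attempts: int = 20,  # budget cap before escalating
) -> Optional[str]:
    """Collect independent attempts on one micro-subtask, discard red-flagged
    attempts, and return an answer once it is clearly ahead of all others."""
    counts: Counter[str] = Counter()
    for _ in range(max_attempts):
        answer = solve_subtask(subtask)
        if is_red_flagged(answer):
            continue  # suspicious attempts do not get a vote
        counts[answer] += 1
        ranked = counts.most_common(2)
        best, best_votes = ranked[0]
        runner_up_votes = ranked[1][1] if len(ranked) > 1 else 0
        if best_votes - runner_up_votes >= lead:
            return best
    return None  # no clear winner within budget: escalate or re-decompose
```

The key property is that the stopping rule operates per subtask, so an error has to survive several independent attempts on the same tiny step before it can propagate.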

The scaling laws are formalized: the probability of completing the whole task and the expected cost both change predictably with the total number of steps and the degree of decomposition. The analysis shows that scaling to very long tasks is feasible under extreme decomposition and infeasible without it.
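The rough shape of those laws can be illustrated with a standard gambler's-ruin argument (a sketch under strong independence assumptions; it mirrors, but is not identical to, the paper's formal analysis): with per-attempt error rate eps and a first-to-lead-by-k vote, the per-step failure rate falls roughly like (eps / (1 - eps))^k, so the margin k needed for a fixed overall reliability grows only logarithmically with the number of steps.

```python
# Illustrative scaling under strong assumptions: independent attempts and a
# single wrong answer competing with the correct one in a first-to-lead-by-k vote.
def per_step_failure(eps: float, k: int) -> float:
    return (eps / (1 - eps)) ** k  # gambler's-ruin style bound on a step being decided wrongly

def overall_success(eps: float, k: int, n: int) -> float:
    return (1 - per_step_failure(eps, k)) ** n

eps, n = 0.01, 1_000_000  # 1% per-attempt error rate, million-step task (assumed numbers)
for k in (1, 2, 3, 4, 5):
    print(f"k={k}: P(all {n:,} steps correct) ~ {overall_success(eps, k, n):.4f}")
```

With these assumed numbers, k = 1 or 2 still fails almost surely over a million steps, while k = 4 or 5 pushes overall success above 98%, and the extra cost per step is only a handful of attempts.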

The most counterintuitive finding: state-of-the-art reasoning models are not required. Relatively small non-reasoning models suffice when the decomposition is extreme enough. This inverts the standard approach to hard problems — instead of smarter models, use dumber models on smaller problems.

This extends "Does separating planning from execution improve reasoning accuracy?" to an extreme: not just separating two functions, but decomposing the entire task into maximally atomic units. It also extends "Why does majority voting outperform more complex inference methods?" from answer-level voting to subtask-level voting with formalized scaling properties.

The implication for AI deployment: for tasks requiring very high reliability over many steps (organizational processes, scientific experiments, production pipelines), the path may run through decomposition and redundancy rather than through better models.


Source: Novel Architectures
