LLM Reasoning and Architecture · Agentic and Multi-Agent Systems

Does separating planning from execution improve reasoning accuracy?

Explores whether modularizing decomposition and solution into separate models prevents interference and boosts performance compared to monolithic approaches.

Note · 2026-02-22 · sourced from Reasoning Architectures

When a single monolithic LLM is asked to decompose a problem and solve it, the decomposer doesn't track the solver's capabilities — it generates subproblems without knowing whether the solver can handle them. LM2 addresses this coordination failure by modularizing decomposition, solution, and verification into three separate language models.

The architecture: a decomposer LM generates subproblems conditioned on what has been solved so far, a solver LM answers each subproblem, and a verifier LM checks each answer before it enters the running context.

The key finding: fine-tuning a separate decomposer LM to coordinate with a larger solver LM outperforms simply prompting a single monolithic LM to decompose and solve. Distilling decomposition abilities from a larger LM to a smaller specialized LM is more generalizable than prompting the monolithic system. The solver is freed to focus on execution; the decomposer is freed to focus on planning.

The generalizability advantage: Monolithic prompting approaches depend heavily on the capability of the specific proprietary LLM being prompted, and break down entirely when used with less powerful models. Fine-tuned modular approaches, besides being more cost-effective, retain generalizability because the decomposition module learns a more abstract planning skill not tied to a specific domain.

The Divide-or-Conquer distillation paper provides direct evidence for this asymmetry: when decomposition and solution abilities are distilled from GPT-4 into smaller models, decomposition ability transfers across domains while solving ability does not. This confirms that planning/decomposition is a more generalizable skill than execution — distilling the ability to break problems down is more portable than distilling the ability to solve specific sub-problems. The decomposer-solver separation isn't just an architectural convenience; it reflects a genuine difference in the transferability of the two cognitive operations.

This is the single-query reasoning instantiation of the same principle that Do hierarchical retrieval architectures outperform flat ones on complex queries? documents at the multi-hop research level. The separation of concerns produces accuracy gains regardless of whether the task is a single complex question or a multi-step research task.

The connection to Can reasoning and tool execution run in parallel? is also structural: both ReWOO and LM2 achieve gains by preventing one cognitive operation from contaminating another. ReWOO decouples planning from tool execution; LM2 decouples planning from solution execution.

Planner-Caller-Summarizer decomposition for tool use (from Arxiv/Agents Multi): The "Small LLMs Are Weak Tool Learners" paper extends the decomposer-solver principle to tool-use tasks, demonstrating that modular decomposition into planner, caller, and summarizer enables smaller LLMs to match larger monolithic models. The key insight: each component draws on a different LLM facet — planning requires reasoning ability, tool invocation demands accurate request writing, and result summarization requires conclusion-drawing skills. A two-stage training paradigm first fine-tunes a backbone on the entire dataset for comprehensive understanding, then instantiates each specialized module from that backbone and continually fine-tunes it on its respective sub-task. This confirms the generalizability finding: decomposition ability is more transferable than execution ability, and the modular framework facilitates individual component updates — the planner can be upgraded independently of the caller.


Source: Reasoning Architectures; enriched from Training Fine Tuning, Agents Multi

Original note title: separating decomposer from solver in multi-step reasoning prevents planning-execution interference and improves accuracy