Can reasoning and tool execution run in parallel?
Standard LLM tool use halts for each response, creating redundant prompts and sequential delays. Do alternative architectures that separate reasoning from tool observation actually eliminate these costs?
Standard tool-augmented LLM architectures interleave reasoning and tool calls: the model halts for each tool response, then resumes with the full prior context re-fed into the prompt (because black-box LLM APIs are stateless). This creates two compounding costs — prompt redundancy that grows quadratically with reasoning steps, and sequential inference latency that accumulates tool response delays.
Two architectures converge on the same solution from different angles:
ReWOO (Planner/Worker/Solver): The Planner produces a complete reasoning blueprint — all planned tool calls — before any tool is executed. The Worker executes the plan in batch. The Solver synthesizes plan + evidence into an answer. No tool-response-dependent re-feeding occurs between steps. Token usage drops dramatically because prior context is not re-fed on each API call.
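The three-role split can be sketched in a few lines. The plan format, tool names, and evidence-slot labels (`#E1`, `#E2`) below are illustrative assumptions, not ReWOO's exact serialization:

```python
# Minimal ReWOO-style sketch (hypothetical plan format and stand-in tools).
# Planner emits the whole plan up front; Worker executes every tool call
# without re-invoking the LLM; Solver sees plan + evidence in one final call.

def search(query: str) -> str:      # stand-in for a retrieval tool
    return f"<results for {query!r}>"

def calculator(expr: str) -> str:   # stand-in for a math tool
    return str(eval(expr))          # demo only; don't eval untrusted input

TOOLS = {"Search": search, "Calculator": calculator}

# Planner output: each step names a tool, its input, and an evidence slot.
plan = [
    ("#E1", "Search", "GDP of France 2023"),
    ("#E2", "Calculator", "2 + 2"),
]

# Worker: run all tool calls in batch -- no LLM call between steps.
evidence = {slot: TOOLS[tool](arg) for slot, tool, arg in plan}

# Solver: a single final LLM call would receive plan + evidence together.
solver_prompt = "\n".join(f"{slot}: {out}" for slot, out in evidence.items())
print(solver_prompt)
```

The key structural point is that `evidence` is built without any intermediate LLM round trips: the only two model calls are the Planner (before the plan) and the Solver (after all evidence is in).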
Chain-of-Abstraction (CoA): The LLM generates reasoning chains with abstract placeholders (y1, y2, y3) rather than concrete values. Tools fill in the placeholders in parallel. Crucially: the LLM can start generating the next abstract reasoning chain while the tool fills the current one. Sequential waiting is replaced by pipeline parallelism.
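A toy version of the placeholder-filling step, with an assumed `[yN]` bracket syntax and a stand-in `fill` tool; the thread pool stands in for the pipeline parallelism, since the placeholders resolve concurrently while the model could already be decoding the next abstract chain:

```python
# Chain-of-Abstraction sketch (hypothetical chain format): the model emits
# abstract chains containing [yN] placeholders, and tools fill them in
# parallel rather than blocking generation on each tool response.
from concurrent.futures import ThreadPoolExecutor
import re

def fill(placeholder: str) -> str:
    # Stand-in for a real tool call (search, calculator, ...).
    return {"y1": "42", "y2": "7"}[placeholder]

chain = "result is [y1] plus [y2]"

with ThreadPoolExecutor() as pool:
    slots = re.findall(r"\[(y\d+)\]", chain)
    # All placeholders are resolved concurrently; meanwhile the LLM could
    # already be generating the next abstract reasoning chain.
    values = dict(zip(slots, pool.map(fill, slots)))

# Reify the abstract chain into a concrete one.
concrete = re.sub(r"\[(y\d+)\]", lambda m: values[m.group(1)], chain)
print(concrete)  # "result is 42 plus 7"
```

Because the chain refers to `y1` and `y2` symbolically, its text never depends on when the tools return, which is exactly the decoupling the note describes.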
The synthesis: both architectures achieve the same goal — removing the dependency between reasoning steps and tool responses — but through different mechanisms. ReWOO separates by planning horizon; CoA separates by abstracting over content.
This is distinct from the framing in "How should we balance parallel versus sequential compute at test time?", which concerns token-budget allocation. Architectural decoupling reduces both prompt redundancy (cost) and execution latency (speed) regardless of the total token budget.
The implication for agentic system design: sequential tool-call loops are an architectural default, not a necessity. Planning-before-execution and abstract-placeholder approaches each demonstrate that reasoning and retrieval/computation can be parallelized, dramatically reducing inference costs in production.
Source: Reasoning Architectures
Related concepts in this collection
- How should we balance parallel versus sequential compute at test time? Test-time compute can prioritize breadth (trying many approaches) or depth (refining one approach). Which strategy works better, and does the answer depend on the problem? (Architectural decoupling is a third option that changes the terms of the trade-off.)
- Can retrieval be scaled like reasoning at test time? Standard RAG retrieves once, but multi-hop tasks need adaptive retrieval. Can we train models to plan retrieval chains and vary their length at test time to improve accuracy, the way test-time scaling works for reasoning? (CoRAG interleaves retrieval and generation iteratively; contrast with CoA, which separates them.)
- When should retrieval actually help versus hurt reasoning? Retrieval augmentation seems universally beneficial, but does it always improve reasoning? This explores whether some reasoning steps benefit from internal knowledge alone, and when external retrieval introduces harmful noise rather than useful information. (DeepRAG makes sequential decisions per step; contrast with CoA's parallel approach.)
- Can reasoning stay grounded without external feedback loops? Explores whether language models can maintain accurate reasoning through their own internal chains of thought, or whether they need real-world feedback to avoid hallucination and error propagation. (ReAct is the sequential baseline these architectures improve upon.)
Original note title: decoupling reasoning from tool observations eliminates prompt redundancy and enables parallel tool execution