LLM Reasoning and Architecture · Agentic and Multi-Agent Systems · Reinforcement Learning for LLMs

Can reasoning and tool execution run in parallel?

Standard LLM tool use halts for each tool response, creating redundant prompts and sequential delays. Do alternative architectures that separate reasoning from tool observation actually eliminate these costs?

Note · 2026-02-22 · sourced from Reasoning Architectures

Standard tool-augmented LLM architectures interleave reasoning and tool calls: the model halts for each tool response, then resumes with the full prior context re-fed into the prompt (because black-box LLM APIs are stateless). This creates two compounding costs — prompt redundancy that grows quadratically with reasoning steps, and sequential inference latency that accumulates tool response delays.
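The quadratic-versus-linear cost claim can be made concrete with a toy cost model. The token counts and step size below are illustrative assumptions, not measurements from the source:

```python
# Toy cost model (hypothetical numbers): each reasoning step emits
# `step_tokens` new tokens, and a stateless API re-feeds the entire
# prior context on every subsequent tool call.

def interleaved_prompt_tokens(steps: int, step_tokens: int = 200) -> int:
    # Step i re-feeds all tokens from steps 1..i-1, so the total
    # grows with the square of the number of steps.
    return sum(i * step_tokens for i in range(1, steps + 1))

def decoupled_prompt_tokens(steps: int, step_tokens: int = 200) -> int:
    # Plan once, execute tools in batch, synthesize once:
    # each step's context is sent roughly one time.
    return steps * step_tokens

for n in (4, 8, 16):
    print(n, interleaved_prompt_tokens(n), decoupled_prompt_tokens(n))
```

Doubling the number of steps roughly quadruples the interleaved prompt cost while only doubling the decoupled one, which is the redundancy ReWOO and CoA both attack.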

Two architectures converge on the same solution from different angles:

ReWOO (Planner/Worker/Solver): The Planner produces a complete reasoning blueprint — all planned tool calls — before any tool is executed. The Worker executes the plan in batch. The Solver synthesizes plan + evidence into an answer. No tool-response-dependent re-feeding occurs between steps. Token usage drops dramatically because prior context is not re-fed on each API call.
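A minimal sketch of the Worker stage, assuming a hypothetical plan format in which each line names a tool call and labels its evidence (`#E1 = search[...]`). The canned `PLAN` string stands in for Planner output and the `TOOLS` dict for real tool backends; neither is from the source:

```python
import re

# Hypothetical Planner output: the full blueprint is emitted before
# any tool runs. "#En" labels let later calls reference earlier evidence.
PLAN = """#E1 = search[population of France]
#E2 = search[population of Germany]
#E3 = calculator[#E1 + #E2]"""

# Stub tool registry standing in for real search/calculator backends.
TOOLS = {
    "search": lambda q: {"population of France": "68e6",
                         "population of Germany": "84e6"}.get(q, "?"),
    "calculator": lambda expr: str(eval(expr)),
}

def run_worker(plan: str) -> dict:
    """Execute every planned call, substituting earlier evidence into
    later arguments. No LLM call happens between tool executions."""
    evidence = {}
    for line in plan.splitlines():
        label, call = line.split(" = ", 1)
        tool, arg = re.match(r"(\w+)\[(.*)\]", call).groups()
        for ref, val in evidence.items():   # resolve #E references
            arg = arg.replace(ref, val)
        evidence[label] = TOOLS[tool](arg)
    return evidence

evidence = run_worker(PLAN)
# A Solver LLM would now synthesize PLAN + evidence into the final answer.
```

Note that #E1 and #E2 have no mutual dependency, so a real Worker could batch them; only #E3 must wait for its referenced evidence.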

Chain-of-Abstraction (CoA): The LLM generates reasoning chains with abstract placeholders (y1, y2, y3) rather than concrete values. Tools fill in the placeholders in parallel. Crucially: the LLM can start generating the next abstract reasoning chain while the tool fills the current one. Sequential waiting is replaced by pipeline parallelism.
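The placeholder-filling step can be sketched as follows. The chain text, placeholder names, and tool bindings are invented for illustration; the point is that grounding the placeholders is concurrent tool work, not one LLM turn per value:

```python
from concurrent.futures import ThreadPoolExecutor

# An abstract reasoning chain as a CoA-style LLM might emit it:
# placeholders (y1, y2) instead of concrete values.
chain = "The distance is y1 km, so at y2 km/h the trip takes y1 / y2 hours."

# Hypothetical tool calls bound to each placeholder. In practice these
# would be retrieval or calculator calls with real latency.
tool_calls = {"y1": lambda: 120.0, "y2": lambda: 60.0}

# Fill all placeholders in parallel; meanwhile the LLM could already be
# generating the next abstract chain (pipeline parallelism).
with ThreadPoolExecutor() as pool:
    futures = {name: pool.submit(fn) for name, fn in tool_calls.items()}
    values = {name: f.result() for name, f in futures.items()}

# Substitute concrete values back into the abstract chain.
filled = chain
for name, val in values.items():
    filled = filled.replace(name, str(val))
print(filled)
```

Because the chain is fixed once generated, filling it is embarrassingly parallel; the sequential bottleneck moves from tool latency to LLM generation alone.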

The synthesis: both architectures achieve the same goal — removing the dependency between reasoning steps and tool responses — but through different mechanisms. ReWOO separates by planning horizon; CoA separates by abstracting over content.

This is distinct from the "How should we balance parallel versus sequential compute at test time?" framing, which concerns token-budget allocation. Architectural decoupling reduces both prompt redundancy (cost) and execution latency (speed) regardless of total token budget.

The implication for agentic system design: sequential tool-call loops are an architectural default, not a necessity. Planning-before-execution and abstract-placeholder approaches each demonstrate that reasoning and retrieval/computation can be parallelized, dramatically reducing inference costs in production.


Source: Reasoning Architectures

Decoupling reasoning from tool observations eliminates prompt redundancy and enables parallel tool execution.