ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models
There is a trending paradigm [1; 2; 3; 4; 5; 6; 7; 8] of coupling large language models (LLMs) with external plugins or tools, enabling LLMs to interact with the environment [9; 10] and retrieve up-to-date knowledge. These tool-augmented LLMs, often referred to as augmented language models (ALMs), have fueled prevailing applications such as Auto-GPT [11] for autonomous task execution. Existing efforts on ALMs have been widely grounded in prompting paradigms similar to ReAct [1], which interleaves verbal reasoning and tool calls consecutively.
Such a paradigm, however, requires frequent execution and suspension of the LLM, at a potentially huge cost in token consumption. LLMs generate tokens conditioned on all preceding context, and when interacting with an external tool, the LLM must be halted until the tool responds. Moreover, the APIs of black-box LLMs, such as ChatGPT, are stateless: to resume token generation, all historical tokens (including the context prompt, exemplars, and all previous reasoning traces and observations) must be fed back into the LLM, leading to significant prompt redundancy. Since the commercial LLM service provided by OpenAI charges by token consumption, this prompt redundancy imposes substantial expense on average users. To the best of our knowledge, however, no prior work has explored reducing the token consumption of ALMs.
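To make the cost concrete, consider a back-of-the-envelope sketch (ours, not the paper's): with a stateless chat API, every tool call forces the full accumulated history to be re-sent, so total prompt-token consumption grows roughly quadratically in the number of reasoning steps.

```python
def interleaved_prompt_tokens(context_tokens: int,
                              step_tokens: int,
                              num_steps: int) -> int:
    """Total prompt tokens consumed over a ReAct-style episode in which the
    full history is re-sent to a stateless API before every step."""
    total = 0
    history = context_tokens      # system prompt + exemplars
    for _ in range(num_steps):
        total += history          # resend everything accumulated so far
        history += step_tokens    # new thought + action + observation
    return total

# e.g. a 1,000-token context, 100 tokens per step, 10 steps:
print(interleaved_prompt_tokens(1000, 100, 10))  # 14500 prompt tokens
```

Under these illustrative numbers, only 1,000 tokens of the 14,500 billed prompt tokens are the task context itself; the rest is repeated history.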
This paper proposes ReWOO, a novel prompting paradigm for ALMs. As illustrated in Figure 1, ReWOO compartmentalizes the key components of an ALM (step-wise reasoning, tool calls, and summarization) into three separate modules: Planner, Worker, and Solver. Planner decomposes a task and formulates a blueprint of interdependent plans, each of which is allocated to Worker. Worker retrieves external knowledge from tools to provide evidence. Solver synthesizes all plans and evidence to produce the final answer to the initial task. As shown in Figure 2, ReWOO separates the reasoning process of the LLM from external observations, avoiding the redundancy of interleaved prompts in observation-dependent reasoning and thereby significantly reducing token usage and enhancing prompting efficiency.
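The following is a minimal sketch of this Planner-Worker-Solver pipeline, assuming a plan format in which each line binds an evidence variable to a tool call (e.g. `#E1 = Wikipedia[query]`, with later plans referencing `#E1`); the function names and parsing here are illustrative assumptions, not the paper's reference implementation.

```python
from typing import Callable

def rewoo(task: str,
          llm: Callable[[str], str],
          tools: dict[str, Callable[[str], str]]) -> str:
    # Planner: a single LLM call formulates all interdependent plans up front;
    # each line binds an evidence variable to a tool call, e.g.
    # "#E1 = Wikipedia[Nobel Prize 2022]".
    plan_text = llm(f"Decompose the task into tool-call plans:\n{task}")

    # Worker: execute each plan in order, substituting earlier evidence
    # (#E1, #E2, ...) into the arguments of later tool calls.
    evidence: dict[str, str] = {}
    for line in plan_text.splitlines():
        if " = " not in line:
            continue                                  # skip non-plan lines
        label, call = line.split(" = ", 1)            # "#E1", "Wikipedia[...]"
        tool, arg = call.split("[", 1)
        arg = arg.rstrip("]")
        for ref, value in evidence.items():           # resolve #E references
            arg = arg.replace(ref, value)
        evidence[label.strip()] = tools[tool.strip()](arg)

    # Solver: one final LLM call synthesizes all plans and evidence.
    return llm(f"Task: {task}\nPlans:\n{plan_text}\n"
               f"Evidence: {evidence}\nAnswer:")
```

Note that the Worker loop never re-invokes the LLM: the context prompt and exemplars are paid for only twice, once by Planner and once by Solver, regardless of how many tools are called.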