FlowReasoner: Reinforcing Query-Level Meta-Agents
This paper proposes FLOWREASONER, a query-level meta-agent that automates the design of query-level multi-agent systems, i.e., one system per user query. Our core idea is to incentivize a reasoning-based meta-agent via external execution feedback. Concretely, we first endow FLOWREASONER with basic reasoning ability for generating multi-agent systems by distilling DeepSeek R1. We then enhance it via reinforcement learning (RL) with external execution feedback, where a multi-purpose reward guides training along three axes: performance, complexity, and efficiency. In this manner, FLOWREASONER can generate a personalized multi-agent system for each user query via deliberative reasoning. Experiments on both engineering and competition code benchmarks demonstrate the superiority of FLOWREASONER.
Large language models (LLMs) [1, 40, 43, 52, 30] have exhibited remarkable power in various meaningful yet challenging domains, such as chatbots [35], code [9], math [36], and robotics [23]. LLM-based multi-agent systems [17, 51, 26], characterized by planning, reasoning, tool use, and memory, have become the foundation of these LLM-driven applications.1 While effective, most of them are manually designed, which increases human resource costs and limits scalability.
To mitigate this challenge, early automated methods optimize prompts [55, 22, 61, 53] or hyper-parameters [41]. However, they still operate on a fixed multi-agent workflow, so humans must manually design a workflow for each new scenario. Motivated by this, various graph-based methods [64, 33, 56, 11] formulate workflows as graphs or networks and automate their design. However, the structural complexity of graphs limits scalability [18]. To overcome this limitation, state-of-the-art methods represent multi-agent systems as code [18] and prompt a performant LLM, e.g., GPT-4o, as a meta-agent that optimizes workflows via complex search algorithms over carefully designed search sets [58, 42, 57].
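To make the code-as-workflow representation concrete, the following is a minimal sketch of a multi-agent workflow expressed as plain code, in the spirit of the code-based methods cited above. Note that `call_llm`, the role names, and the plan-implement-review pipeline are illustrative assumptions, not the exact representation used by any cited method.

```python
# Hedged sketch: a multi-agent workflow written as ordinary code.
# A meta-agent can then edit, mutate, or regenerate this code
# instead of manipulating a graph structure.

def call_llm(role: str, prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call.

    Here it just returns a tagged string so the sketch runs offline.
    """
    return f"[{role}] response to: {prompt}"


def coding_workflow(user_query: str) -> str:
    """A workflow as code: plan -> implement -> review."""
    plan = call_llm("planner", f"Outline steps for: {user_query}")
    draft = call_llm("coder", f"Implement this plan: {plan}")
    review = call_llm("reviewer", f"Review and fix: {draft}")
    return review
```

Because the workflow is just a function, searching over workflows reduces to searching over program text, which sidesteps the structural constraints of graph formulations.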
These previous methods focus on task-level meta-agents: each generates a single multi-agent system tailored to one kind of task, e.g., code generation, as in Figure 1 (a). For individual user queries, however, such one-size-fits-all systems cannot adapt automatically. To enhance adaptability to individual user queries, this paper aims to design a query-level meta-agent that generates a query-specific multi-agent system for each user query, e.g., build a 2048 game, as shown in Figure 1 (b).
A multi-purpose reward is designed to guide RL training, balancing performance, complexity, and efficiency. During inference, FLOWREASONER leverages deliberative reasoning to generate a tailored multi-agent system for each user query, achieving one system per user query.
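One way to read the multi-purpose reward is as a weighted combination of a performance term and penalties for complexity and inefficiency. The sketch below illustrates this reading; the weight values, the use of test pass rate, agent count, and latency as proxies, and the linear penalty forms are all our assumptions, not the paper's exact formulation.

```python
# Hedged sketch: a multi-purpose reward balancing performance,
# complexity, and efficiency. All weights and proxy signals are
# illustrative assumptions.

def multi_purpose_reward(
    passed: int,          # test cases passed by the generated system
    total: int,           # total test cases (performance proxy)
    num_agents: int,      # complexity proxy: agents in the workflow
    latency_s: float,     # efficiency proxy: end-to-end runtime
    w_perf: float = 1.0,
    w_complex: float = 0.1,
    w_eff: float = 0.05,
) -> float:
    """Higher is better: reward correctness, penalize bloat and slowness."""
    performance = passed / total if total else 0.0
    complexity_penalty = w_complex * max(0, num_agents - 1)
    efficiency_penalty = w_eff * latency_s
    return w_perf * performance - complexity_penalty - efficiency_penalty
```

Under such a reward, a workflow that passes all tests with two agents in one second scores higher than one that passes the same tests with six agents in five seconds, pushing the meta-agent toward lean, fast systems rather than performance alone.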