Can AI systems design unique multi-agent workflows per individual query?
Explores whether meta-agents trained with reinforcement learning can automatically generate personalized multi-agent system architectures tailored to individual user queries, rather than applying fixed task-level templates uniformly.
Previous approaches to automating multi-agent system design operate at the task level: design one workflow for "code generation tasks," another for "summarization tasks," and apply each uniformly to all queries of that type. FlowReasoner (2025) shifts this to the query level — generating a unique multi-agent system for each individual user query.
The training pipeline has two phases. First, the meta-agent is distilled from DeepSeek R1 so that it acquires basic reasoning about how to design multi-agent workflows. Then it is enhanced via RL with external execution feedback: the meta-agent generates a multi-agent system, that system runs on the query, and the execution result provides the reward signal. A multi-purpose reward guides training across three dimensions: performance (did it work), complexity (how many agents and steps), and efficiency (how much compute).
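This note does not give the reward formula, so the following is only a minimal sketch of how the three dimensions could be folded into one scalar. It assumes a weighted sum; the weights and the field names (`passed`, `num_agents`, `num_steps`, `tokens_used`) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ExecutionResult:
    """Outcome of running one generated multi-agent system on one query.

    All field names are hypothetical, chosen for illustration only.
    """
    passed: bool       # performance: did the system solve the query?
    num_agents: int    # complexity: how many agents were instantiated
    num_steps: int     # complexity: how many workflow steps were executed
    tokens_used: int   # efficiency: a rough proxy for compute spent


def multi_purpose_reward(
    result: ExecutionResult,
    w_complexity: float = 0.01,  # assumed weight, not from the source
    w_efficiency: float = 1e-5,  # assumed weight, not from the source
) -> float:
    """Reward success, then subtract penalties for complexity and compute."""
    performance = 1.0 if result.passed else 0.0
    complexity_penalty = w_complexity * (result.num_agents + result.num_steps)
    efficiency_penalty = w_efficiency * result.tokens_used
    return performance - complexity_penalty - efficiency_penalty


# Three hypothetical executions of generated systems.
lean = ExecutionResult(passed=True, num_agents=2, num_steps=4, tokens_used=3_000)
heavy = ExecutionResult(passed=True, num_agents=6, num_steps=20, tokens_used=40_000)
failed = ExecutionResult(passed=False, num_agents=1, num_steps=1, tokens_used=500)
```

Under these assumed weights, a lean system that succeeds scores higher than a heavy system that also succeeds, and both score higher than a failure. That ordering is the pressure the three-part reward is meant to apply.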
This matters because one-size-fits-all multi-agent systems cannot adapt automatically to individual queries. Within code generation alone, a request to "build a 2048 game" needs a fundamentally different agent composition than a request to "fix a sorting bug." The query-level approach treats multi-agent architecture design as itself a reasoning problem amenable to RL.
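The difference between the two levels can be sketched as a difference in function signature. Everything below is hypothetical: the `Workflow` type, the agent names, and especially `toy_meta_agent`, which is a keyword rule standing in for a trained meta-agent and is not how FlowReasoner decides anything.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Workflow:
    """A multi-agent system reduced to an ordered tuple of agent roles."""
    agents: tuple[str, ...]


# Task-level design: one fixed template per task type.
TASK_TEMPLATES = {
    "code_generation": Workflow(agents=("planner", "coder", "tester")),
}


def task_level_design(task_type: str, query: str) -> Workflow:
    """The query is ignored; only the task type selects the workflow."""
    return TASK_TEMPLATES[task_type]


def query_level_design(query: str, meta_agent: Callable[[str], Workflow]) -> Workflow:
    """The meta-agent sees the query itself and designs a workflow for it."""
    return meta_agent(query)


def toy_meta_agent(query: str) -> Workflow:
    """Stand-in for a trained meta-agent: a keyword rule, for illustration only."""
    if "fix" in query.lower():
        return Workflow(agents=("debugger", "tester"))
    return Workflow(agents=("planner", "ui_designer", "coder", "tester"))


game = "build a 2048 game"
bug = "fix a sorting bug"
same = task_level_design("code_generation", game) == task_level_design("code_generation", bug)
different = query_level_design(game, toy_meta_agent) != query_level_design(bug, toy_meta_agent)
```

Both queries fall under the same task type, so the task-level function returns the same workflow for each, while the query-level function can return different ones.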
The progression is notable: manual design → fixed template optimization → graph-based workflow search → code-based meta-agents → RL-trained query-level meta-agents. Each step automates one more degree of freedom. The connection to Can we automatically optimize both prompts and agent coordination? is direct: FlowReasoner represents multi-agent systems as code and optimizes them, but at the individual query level rather than the task level.
If the answer to Can computational power accelerate scientific discovery itself? is yes, the RL-trained meta-agent approach may follow similar scaling dynamics: more compute for the meta-agent would be expected to yield better per-query system designs.
MasRouter as structured alternative (from Arxiv/Routers): MasRouter provides a more constrained approach to per-query MAS design than FlowReasoner. Where FlowReasoner generates arbitrary multi-agent systems via RL-trained code generation (maximum flexibility, less interpretability), MasRouter uses a cascaded controller: variational latent variable model for topology selection → structured probabilistic cascade for role allocation → multinomial distribution for LLM routing. The cascade provides interpretable intermediate decisions at the cost of a fixed structure-type vocabulary (Chain/Tree/Graph topologies, predefined role categories). FlowReasoner trades interpretability for expressiveness; MasRouter trades expressiveness for interpretability and likely faster convergence. Both achieve per-query optimization. See What decisions must multi-agent routing systems optimize simultaneously?.
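The cascade structure described above can be sketched as three sampling stages, each conditioned on the previous one. This is not MasRouter's implementation: the role names, model names, and probability weights are hypothetical placeholders, and a learned controller would compute the weights from the query rather than use fixed tables.

```python
import random

TOPOLOGIES = ["Chain", "Tree", "Graph"]              # fixed structure vocabulary
ROLES = ["planner", "coder", "reviewer", "tester"]   # hypothetical role categories
LLMS = ["small-model", "large-model"]                # hypothetical model pool

# Assumed for illustration: each topology implies a team size.
TEAM_SIZE = {"Chain": 2, "Tree": 3, "Graph": 4}


def route(query: str, rng: random.Random) -> dict:
    """Sample one multi-agent configuration through a three-stage cascade.

    `query` is unused here because the weights are fixed placeholders;
    a learned controller would condition every stage on it.
    """
    # Stage 1: choose a topology.
    topology = rng.choices(TOPOLOGIES, weights=[0.5, 0.3, 0.2], k=1)[0]

    # Stage 2: allocate roles, conditioned on the chosen topology.
    roles = rng.sample(ROLES, k=TEAM_SIZE[topology])

    # Stage 3: route each role to an LLM.
    llms = {role: rng.choices(LLMS, weights=[0.7, 0.3], k=1)[0] for role in roles}

    return {"topology": topology, "roles": roles, "llms": llms}


config = route("fix a sorting bug", random.Random(0))
```

Each stage yields a named, inspectable decision (`topology`, `roles`, `llms`), which is the interpretability the cascade buys. The fixed `TOPOLOGIES` and `ROLES` lists are the corresponding limit on expressiveness.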
Source: Agents Multi
Related concepts in this collection
- Can we automatically optimize both prompts and agent coordination?
  This explores whether language agents can be represented as computational graphs whose structure and content adapt automatically. Why it matters: current agent systems require hand-engineered orchestration; automatic optimization could unlock more capable multi-agent systems.
  the graph formalism that this note extends to the query level
- Can computational power accelerate scientific discovery itself?
  Does the pace of research breakthroughs scale with computing resources, like model performance does? ASI-ARCH tested this by running thousands of autonomous experiments to discover neural architectures.
  scaling dynamics for architecture search
- Can multi-agent teams automatically remove their weakest members?
  Explores whether agents can score each other's contributions during problem-solving and use those scores to deactivate underperforming teammates in real time, improving overall team efficiency.
  inference-time team optimization; FlowReasoner does this at design time
- Can extreme task decomposition enable reliable execution at million-step scale?
  Can breaking tasks into maximally atomic subtasks with voting-based error correction solve the fundamental reliability problem in long-horizon tasks? This challenges whether better models or better decomposition is the path to high-reliability AI systems.
  alternative approach: fixed decomposition vs adaptive design
- What decisions must multi-agent routing systems optimize simultaneously?
  Standard LLM routing only picks which model to use. But multi-agent systems involve four interdependent choices: topology, agent count, role assignment, and per-agent model selection. Does optimizing all four together actually improve performance?
  MasRouter: more constrained per-query design with interpretable cascade
Original note title
query-level meta-agents generate personalized multi-agent systems per user query via RL with execution feedback