Can AI systems design unique multi-agent workflows per individual query?

Explores whether meta-agents trained with reinforcement learning can automatically generate personalized multi-agent system architectures tailored to individual user queries, rather than applying fixed task-level templates uniformly.

Note · 2026-02-23 · sourced from Agents Multi

Previous approaches to automating multi-agent system design operate at the task level: design one workflow for "code generation tasks," another for "summarization tasks," and apply each uniformly to all queries of that type. FlowReasoner (2025) shifts this to the query level — generating a unique multi-agent system for each individual user query.
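The contrast between the two levels can be sketched in a few lines. This is only an illustration: the template table, the role names, and `toy_meta_agent` (a keyword heuristic standing in for a trained meta-agent) are all hypothetical and do not come from FlowReasoner.

```python
from typing import Callable

Workflow = list[str]  # an ordered list of agent roles (hypothetical representation)

# Task-level design: one fixed workflow per task type (hypothetical templates).
TASK_TEMPLATES: dict[str, Workflow] = {
    "code_generation": ["planner", "coder", "tester"],
    "summarization": ["reader", "summarizer"],
}


def task_level_design(task_type: str, query: str) -> Workflow:
    """Every query of the same task type gets the same workflow; the query is ignored."""
    return TASK_TEMPLATES[task_type]


def query_level_design(query: str, meta_agent: Callable[[str], Workflow]) -> Workflow:
    """The workflow is generated from the query itself by a meta-agent."""
    return meta_agent(query)


def toy_meta_agent(query: str) -> Workflow:
    """Keyword heuristic standing in for a trained meta-agent (assumption)."""
    text = query.lower()
    if "fix" in text or "bug" in text:
        return ["debugger", "tester"]
    return ["planner", "coder", "tester", "reviewer"]
```

With these stand-ins, "build a 2048 game" and "fix a sorting bug" receive the same workflow from `task_level_design` but different ones from `query_level_design`.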

Training has two phases. First, the meta-agent is distilled from DeepSeek R1, which gives it basic reasoning about how to design multi-agent workflows. Second, it is refined with RL using external execution feedback: the meta-agent generates a multi-agent system, that system runs on the query, and the execution result provides the reward signal. A multi-purpose reward guides training across three dimensions: performance (did it work), complexity (how many agents and steps), and efficiency (how much compute).
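A minimal sketch of how a three-part reward of this kind could be computed from one execution result. The note does not say how FlowReasoner combines the three dimensions, so the weighted-sum form, the weights, the caps, and every name below are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ExecutionResult:
    """Outcome of running one generated multi-agent system on one query."""
    passed: bool       # performance: did the system solve the query?
    num_agents: int    # complexity: agents in the generated system
    num_steps: int     # complexity: workflow steps executed
    tokens_used: int   # efficiency: rough proxy for compute spent


def multi_purpose_reward(
    result: ExecutionResult,
    w_complexity: float = 0.1,    # hypothetical weight
    w_efficiency: float = 0.1,    # hypothetical weight
    max_units: int = 20,          # hypothetical cap on agents + steps
    token_budget: int = 50_000,   # hypothetical compute budget
) -> float:
    """Reward success, then subtract small penalties for size and compute."""
    performance = 1.0 if result.passed else 0.0
    # Normalize both penalties to [0, 1] so the weights are comparable.
    complexity = min((result.num_agents + result.num_steps) / max_units, 1.0)
    compute = min(result.tokens_used / token_budget, 1.0)
    return performance - w_complexity * complexity - w_efficiency * compute
```

Under these assumptions, a small system that solves the query scores higher than a large one that also solves it, and any successful system scores higher than a failed one, because the combined penalties stay below the performance term.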

This matters because one-size-fits-all multi-agent systems lack the capability for automatic adaptation to individual queries. A code generation task where the user wants "build a 2048 game" needs a fundamentally different agent composition than "fix a sorting bug." The query-level approach treats multi-agent architecture design as itself a reasoning problem amenable to RL.

The progression is notable: manual design → fixed template optimization → graph-based workflow search → code-based meta-agents → RL-trained query-level meta-agents. Each step automates one more degree of freedom. The connection to "Can we automatically optimize both prompts and agent coordination?" is direct: FlowReasoner represents multi-agent systems as code and optimizes them, but at the individual query level rather than the task level.

If the scaling dynamics explored in "Can computational power accelerate scientific discovery itself?" carry over, the RL-trained meta-agent approach may behave similarly: more compute for the meta-agent could yield better per-query system designs.

MasRouter as structured alternative (from Arxiv/Routers): MasRouter provides a more constrained approach to per-query MAS design than FlowReasoner. Where FlowReasoner generates arbitrary multi-agent systems via RL-trained code generation (maximum flexibility, less interpretability), MasRouter uses a cascaded controller: variational latent variable model for topology selection → structured probabilistic cascade for role allocation → multinomial distribution for LLM routing. The cascade provides interpretable intermediate decisions at the cost of a fixed structure-type vocabulary (Chain/Tree/Graph topologies, predefined role categories). FlowReasoner trades interpretability for expressiveness; MasRouter trades expressiveness for interpretability and likely faster convergence. Both achieve per-query optimization. See "What decisions must multi-agent routing systems optimize simultaneously?"
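The shape of such a cascade can be sketched as three dependent decisions. This is a toy stand-in, not MasRouter's implementation: the probabilities are fixed rather than learned, the query is not used to condition them, role allocation is a plain lookup instead of a probabilistic cascade, and the role and model names are hypothetical.

```python
import random

# Stage 1 vocabulary: fixed topology types with assumed probabilities.
TOPOLOGIES = {"chain": 0.5, "tree": 0.3, "graph": 0.2}

# Stage 2 stand-in: predefined role sets per topology (hypothetical).
ROLES_BY_TOPOLOGY = {
    "chain": ["planner", "coder", "reviewer"],
    "tree": ["planner", "coder", "tester", "reviewer"],
    "graph": ["planner", "coder", "tester", "critic", "reviewer"],
}

# Stage 3 vocabulary: candidate LLMs with assumed routing probabilities.
LLM_POOL = {"small-model": 0.6, "large-model": 0.4}


def sample_categorical(probs: dict[str, float], rng: random.Random) -> str:
    """Draw one key from a categorical distribution given as {name: probability}."""
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


def route_query(query: str, rng: random.Random) -> dict:
    """Topology first, then roles given the topology, then one LLM per role."""
    topology = sample_categorical(TOPOLOGIES, rng)
    roles = ROLES_BY_TOPOLOGY[topology]
    assignment = {role: sample_categorical(LLM_POOL, rng) for role in roles}
    return {"query": query, "topology": topology, "assignment": assignment}


plan = route_query("fix a sorting bug", random.Random(0))
```

Each intermediate choice in `plan` (topology, role set, per-role model) can be inspected directly, which is the interpretability the cascade buys at the cost of a fixed vocabulary.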

Related concepts in this collection

Can we automatically optimize both prompts and agent coordination? This explores whether language agents can be represented as computational graphs whose structure and content adapt automatically. Why it matters: current agent systems require hand-engineered orchestration; automatic optimization could unlock more capable multi-agent systems.
the graph formalism this extends to query-level
Can computational power accelerate scientific discovery itself? Does the pace of research breakthroughs scale with computing resources, like model performance does? ASI-ARCH tested this by running thousands of autonomous experiments to discover neural architectures.
scaling dynamics for architecture search
Can multi-agent teams automatically remove their weakest members? Explores whether agents can score each other's contributions during problem-solving and use those scores to deactivate underperforming teammates in real time, improving overall team efficiency.
inference-time team optimization; FlowReasoner does this at design time
Can extreme task decomposition enable reliable execution at million-step scale? Can breaking tasks into maximally atomic subtasks with voting-based error correction solve the fundamental reliability problem in long-horizon tasks? This challenges whether better models or better decomposition is the path to high-reliability AI systems.
alternative approach: fixed decomposition vs adaptive design
What decisions must multi-agent routing systems optimize simultaneously? Standard LLM routing only picks which model to use. But multi-agent systems involve four interdependent choices: topology, agent count, role assignment, and per-agent model selection. Does optimizing all four together actually improve performance?
MasRouter: more constrained per-query design with interpretable cascade

Original note title

query-level meta-agents generate personalized multi-agent systems per user query via RL with execution feedback