Agentic and Multi-Agent Systems

Can AI systems design unique multi-agent workflows per individual query?

Explores whether meta-agents trained with reinforcement learning can automatically generate personalized multi-agent system architectures tailored to individual user queries, rather than applying fixed task-level templates uniformly.

Note · 2026-02-23 · sourced from Agents Multi

Previous approaches to automating multi-agent system design operate at the task level: design one workflow for "code generation tasks," another for "summarization tasks," and apply each uniformly to all queries of that type. FlowReasoner (2025) shifts this to the query level — generating a unique multi-agent system for each individual user query.

The architecture has two phases. First, distillation from DeepSeek-R1 gives the meta-agent basic reasoning about how to design multi-agent workflows. Then reinforcement learning with external execution feedback enhances it: the meta-agent generates a multi-agent system, that system runs on the query, and the execution result provides the reward signal. A multi-purpose reward guides training along three dimensions: performance (did the system solve the query), complexity (how many agents and steps it uses), and efficiency (how much compute it consumes).
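Below is a minimal sketch of the execution-feedback reward. It assumes the three dimensions are combined as a weighted sum. The `ExecutionResult` fields, the weights, and the group-mean baseline in `score_candidates` are hypothetical illustrations, not FlowReasoner's published formulation. `generate` and `execute` stand in for the meta-agent and a sandboxed runner.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ExecutionResult:
    passed: bool      # did the generated system solve the query? (performance)
    num_agents: int   # number of agents in the system (complexity)
    num_steps: int    # number of agent invocations (complexity)
    tokens_used: int  # compute consumed while running (efficiency)


def multi_purpose_reward(
    r: ExecutionResult,
    w_perf: float = 1.0,         # hypothetical weight
    w_complexity: float = 0.05,  # hypothetical weight
    w_efficiency: float = 1e-5,  # hypothetical weight
) -> float:
    # Assumed weighted sum: reward success, penalize system size and compute.
    performance = 1.0 if r.passed else 0.0
    complexity = r.num_agents + r.num_steps
    return w_perf * performance - w_complexity * complexity - w_efficiency * r.tokens_used


def score_candidates(
    query: str,
    generate: Callable[[str], str],
    execute: Callable[[str, str], ExecutionResult],
    num_samples: int = 4,
) -> List[float]:
    # One round of execution feedback: sample systems, run each on the query, score them.
    rewards = []
    for _ in range(num_samples):
        system = generate(query)         # meta-agent proposes a multi-agent system
        result = execute(system, query)  # the proposed system actually runs on the query
        rewards.append(multi_purpose_reward(result))
    # Advantages relative to the group mean (an assumed baseline) for a policy-gradient update.
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]
```

With the default weights, a passing run that uses 3 agents, 5 steps, and 2,000 tokens scores about 0.58. A smaller system that also passes scores higher. The complexity and efficiency terms are meant to create exactly this pressure toward leaner designs.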

This matters because one-size-fits-all multi-agent systems cannot adapt automatically to individual queries. Within the same code-generation task, a user asking to "build a 2048 game" needs a fundamentally different agent composition than one asking to "fix a sorting bug." The query-level approach treats multi-agent architecture design as itself a reasoning problem amenable to RL.
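The task-level versus query-level distinction can be sketched as two dispatch functions. This is a toy illustration. The workflow representation (an ordered list of role names) and the role names are hypothetical. `toy_meta_agent` is also hypothetical: it is a keyword heuristic standing in for an RL-trained meta-agent.

```python
from typing import Callable, Dict, List

# Hypothetical representation: a workflow is an ordered list of agent role names.
Workflow = List[str]

# Task-level design: one fixed workflow per task type, reused for every query of that type.
TASK_TEMPLATES: Dict[str, Workflow] = {
    "code_generation": ["planner", "coder", "tester"],
    "summarization": ["reader", "summarizer"],
}


def task_level_design(task_type: str, query: str) -> Workflow:
    # The query text is ignored: every query of this task type gets the same workflow.
    return TASK_TEMPLATES[task_type]


def query_level_design(query: str, meta_agent: Callable[[str], Workflow]) -> Workflow:
    # The meta-agent is consulted per query and may return a different workflow each time.
    return meta_agent(query)


def toy_meta_agent(query: str) -> Workflow:
    # Hypothetical stand-in for a trained meta-agent: a simple keyword rule.
    if "fix" in query.lower():
        return ["debugger", "coder"]
    return ["planner", "ui_designer", "coder", "tester"]
```

For "build a 2048 game" and "fix a sorting bug", `task_level_design("code_generation", q)` returns the same three-role workflow both times. By contrast, `query_level_design(q, toy_meta_agent)` returns `['planner', 'ui_designer', 'coder', 'tester']` for the first query and `['debugger', 'coder']` for the second.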

The progression is notable: manual design → fixed template optimization → graph-based workflow search → code-based meta-agents → RL-trained query-level meta-agents. Each step automates one more degree of freedom. The connection to the note "Can we automatically optimize both prompts and agent coordination?" is direct: FlowReasoner represents multi-agent systems as code and optimizes them, but at the individual query level rather than the task level.
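The sketch below makes "multi-agent systems as code" concrete. It assumes the meta-agent emits Python source that defines a `workflow(query, call_agent)` entry point. `GENERATED_SYSTEM`, `load_workflow`, and `fake_call_agent` are hypothetical names. A real system would call LLMs and run generated code in a sandbox rather than with a bare `exec`.

```python
from typing import Any, Callable, Dict

# Hypothetical meta-agent output: a multi-agent system written as Python source.
GENERATED_SYSTEM = '''
def workflow(query, call_agent):
    plan = call_agent("planner", query)
    code = call_agent("coder", plan)
    review = call_agent("reviewer", code)
    return review
'''


def load_workflow(source: str) -> Callable[..., str]:
    # Compile the generated source and return its entry point.
    # exec() on model output is unsafe outside a sandbox; illustration only.
    namespace: Dict[str, Any] = {}
    exec(source, namespace)
    return namespace["workflow"]


def fake_call_agent(role: str, message: str) -> str:
    # Stand-in for an LLM call: wraps the message with the role that handled it.
    return f"{role}({message})"


run = load_workflow(GENERATED_SYSTEM)
print(run("fix a sorting bug", fake_call_agent))  # prints "reviewer(coder(planner(fix a sorting bug)))"
```

The system is ordinary code, so a meta-agent working this way can express loops, branches, and any number of agents. The trade-off is that the structure has to be read out of the program rather than from a small set of named choices.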

In line with the note "Can computational power accelerate scientific discovery itself?", the RL-trained meta-agent approach may follow similar scaling dynamics: more compute for the meta-agent should yield better per-query system designs.

MasRouter as a structured alternative (from Arxiv/Routers): MasRouter offers a more constrained approach to per-query multi-agent system (MAS) design than FlowReasoner. FlowReasoner generates arbitrary multi-agent systems through RL-trained code generation, which gives maximum flexibility but less interpretability. MasRouter instead uses a cascaded controller: a variational latent-variable model for topology selection → a structured probabilistic cascade for role allocation → a multinomial distribution for LLM routing. The cascade yields interpretable intermediate decisions. The cost is a fixed vocabulary of structure types (Chain/Tree/Graph topologies, predefined role categories). FlowReasoner trades interpretability for expressiveness; MasRouter trades expressiveness for interpretability and likely faster convergence. Both optimize at the per-query level. See the note "What decisions must multi-agent routing systems optimize simultaneously?"
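Below is a rough sketch of the three-stage cascade. It assumes a scalar query-difficulty score is the only input feature. It replaces MasRouter's variational latent-variable model and learned cascade with hand-set categorical distributions. The role vocabulary and model names are hypothetical. The sketch only shows the shape of the decision: topology first, then roles conditioned on the topology, then one model draw per role.

```python
import random
from dataclasses import dataclass
from typing import Dict, List

TOPOLOGIES = ["chain", "tree", "graph"]
ROLES = ["planner", "coder", "tester", "critic"]  # hypothetical predefined role vocabulary
LLMS = ["small-llm", "medium-llm", "large-llm"]   # hypothetical model pool


@dataclass
class RoutingDecision:
    topology: str
    roles: List[str]
    llm_for_role: Dict[str, str]


def route(query_difficulty: float, rng: random.Random) -> RoutingDecision:
    d = min(max(query_difficulty, 0.0), 1.0)
    # Stage 1: topology. A hand-set categorical (not MasRouter's variational model);
    # harder queries shift probability mass toward richer topologies.
    topology = rng.choices(TOPOLOGIES, weights=[1.1 - d, 0.5, d + 0.1])[0]
    # Stage 2: role allocation conditioned on the chosen topology.
    team_size = {"chain": 2, "tree": 3, "graph": 4}[topology]
    roles = rng.sample(ROLES, team_size)
    # Stage 3: one model draw per allocated role (a multinomial with n=1);
    # weights favor larger models as difficulty rises.
    llm_weights = [1.1 - d, 0.5, d + 0.1]
    llm_for_role = {role: rng.choices(LLMS, weights=llm_weights)[0] for role in roles}
    return RoutingDecision(topology, roles, llm_for_role)


decision = route(0.9, random.Random(0))
print(decision.topology, decision.roles, decision.llm_for_role)
```

Each field of `RoutingDecision` can be logged and inspected on its own, which is the interpretability advantage described above. The expressiveness cost is also visible: the router can never produce a structure outside the `TOPOLOGIES` and `ROLES` vocabularies.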



Original note title

query-level meta-agents generate personalized multi-agent systems per user query via RL with execution feedback