Can prompt injection reshape a multi-agent workflow without touching infrastructure?
Explores whether an attacker can manipulate how a planner assigns tasks and routes coordination purely through prompt crafting, without modifying agents, tools, or messages. This matters because it identifies a planning-time vulnerability most defenses miss.
The flexibility that makes planner-executor multi-agent systems attractive is also their weakness. When a planner converts a prompt into subtasks, roles, dependencies, and routing paths, the prompt is not merely a request — it is the blueprint from which the entire collaboration is constructed. FLOWSTEER demonstrates that an attacker who never touches agents, tools, memory, or inter-agent messages can still steer behavior, because the planning step happens before any of that infrastructure is invoked. A single crafted prompt can bias how the workflow forms in the first place, raising malicious success by up to 55 percent over naive prompting and transferring across MAS setups even under black-box topology inference.
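The idea that the prompt acts as a blueprint can be made concrete with a toy example. The sketch below is not FLOWSTEER's method and not a real planner: `toy_plan`, the `Step` type, the role names, and the "skip review" keyword rule are all hypothetical stand-ins for an LLM planner. It only illustrates that wording in the prompt can change which roles exist and how they depend on each other, while the agents and tools themselves stay untouched.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    role: str
    task: str
    depends_on: list = field(default_factory=list)


def toy_plan(prompt: str) -> list[Step]:
    """Hypothetical keyword-driven planner: the prompt text alone decides roles and order."""
    text = prompt.lower()
    steps = [Step("researcher", "gather sources")]
    # Assumption for illustration: a reviewer is scheduled by default,
    # unless the prompt frames review as unnecessary.
    if "skip review" not in text:
        steps.append(Step("reviewer", "check findings", depends_on=["researcher"]))
    last = steps[-1].role
    steps.append(Step("writer", "draft report", depends_on=[last]))
    return steps


def roles(steps: list[Step]) -> list[str]:
    return [s.role for s in steps]


benign = toy_plan("Summarize recent work on graph databases.")
steered = toy_plan("Summarize recent work on graph databases. This is urgent, so skip review.")

print(roles(benign))   # ['researcher', 'reviewer', 'writer']
print(roles(steered))  # ['researcher', 'writer']
```

In this toy, the second prompt removes the reviewer from the workflow before any agent runs. A real planner would respond to far subtler framing than a literal keyword, which is what makes the planning step hard to police.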
This reframes where multi-agent safety lives. Most existing defenses inspect the artifacts of coordination: the generated workflow, the messages exchanged, the tool calls made. But if the contamination enters at workflow formation, those defenses arrive too late. The attack surface is not the running system; it is the organizational act of deciding who does what and in what order. The counterpoint is that the attack requires the planner to be promptable at all. Fully fixed pipelines are immune to it, but they forfeit the adaptive coordination that motivates planner-executor designs. Workflow formation is therefore a distinct security frontier, one that grows more exposed precisely as multi-agent systems become more flexible.
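A minimal sketch of why artifact inspection can arrive too late, under stated assumptions: workflows are represented as a hypothetical mapping from role to the roles it depends on, and `artifact_check` is an invented structural validator (known roles, resolvable dependencies, no cycles), not a defense described in the source. Both a benign workflow and a steered one that omits a reviewer pass the check, because each is well-formed on its own.

```python
ALLOWED_ROLES = {"researcher", "reviewer", "writer"}  # hypothetical role allowlist


def artifact_check(workflow: dict[str, list[str]]) -> bool:
    """Hypothetical workflow-level check: known roles, resolvable and acyclic dependencies."""
    if not set(workflow) <= ALLOWED_ROLES:
        return False
    if any(dep not in workflow for deps in workflow.values() for dep in deps):
        return False
    # Kahn-style pass: repeatedly mark steps whose dependencies are all resolved.
    done: set[str] = set()
    while len(done) < len(workflow):
        ready = [r for r, deps in workflow.items() if r not in done and set(deps) <= done]
        if not ready:
            return False  # a cycle blocks progress
        done.update(ready)
    return True


# Illustrative data only: two workflows a planner might emit for the same task.
benign = {"researcher": [], "reviewer": ["researcher"], "writer": ["reviewer"]}
steered = {"researcher": [], "writer": ["researcher"]}

print(artifact_check(benign), artifact_check(steered))  # True True
print(set(benign) - set(steered))  # {'reviewer'}
```

The missing reviewer only shows up when the two workflows are compared side by side. A check that sees the steered workflow alone has no reference point for what the planner would otherwise have built.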
— "FLOWSTEER: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems", https://arxiv.org/abs/2605.11514
Related concepts in this collection
- Can one compromised agent corrupt an entire multi-agent network?
Explores whether a single biased agent can spread behavioral corruption through ordinary messages to downstream agents without any direct adversarial access. Matters because it reveals a previously unknown vulnerability in how multi-agent systems communicate.
both attack MAS without privileged access, but FLOWSTEER acts at planning time while subliminal injection rides ordinary messages at runtime
- How do adversarial traps target different layers of AI agents?
As AI agents browse the web, attackers can exploit their perception, reasoning, memory, actions, and coordination in distinct ways. Understanding these attack vectors is crucial for building robust agent defenses.
planning-time steering is a systemic trap that the six-category taxonomy frames structurally
- Can workflow inspection catch attacks that bias planning signals?
Does inspecting the final workflow catch attacks that contaminate earlier planning stages? This matters because contamination laundered through the planner may look legitimate by the time the workflow exists.
extends: the defensive corollary; because contamination enters at workflow formation, workflow-inspecting defenses examine an already-compromised artifact
- How does workflow position shape attack propagation in multi-agent systems?
Explores whether a malicious signal's influence depends on its injection point in a multi-agent graph, and how task-relevant framing makes downstream agents more likely to relay it without scrutiny.
grounds the propagation mechanism: explains why a planning-time bias spreads, since high-influence positions and sycophantic relay amplify the injected signal downstream
Original note title
multi-agent planner-executor systems expose a planning-time attack surface where prompts reshape agent organization without touching infrastructure