FLOWSTEER: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems
Multi-agent systems (MAS) powered by large language models (LLMs) increasingly adopt planner–executor architectures, where planners convert prompts into subtasks, roles, dependencies, and routing paths. This flexibility enables adaptive coordination, but it also exposes an attack surface in workflow formation: prompts can shape agent organization without modifying MAS infrastructure. We study this risk through the lens of social influence, probing workflows to identify high-impact subtasks and to trace how malicious signals propagate. The analysis reveals two vulnerabilities: workflow position can amplify or suppress a malicious signal, and sycophantic framing makes downstream agents more likely to relay it. We translate these findings into FLOWSTEER, a prompt-only workflow steering attack that converts vulnerability priors into one crafted prompt. FLOWSTEER aligns a malicious signal with influential task components and guides replanning toward dependencies that preserve propagation. Experiments show that FLOWSTEER increases malicious success by up to 55% over naive prompting, transfers across MAS setups, and remains effective with black-box topology inference. Because FLOWSTEER biases the planning signals that generate the workflow, MAS defenses that inspect only the generated workflow provide limited protection. We therefore introduce FLOWGUARD, an input-side defense that reduces malicious success by up to 34% while preserving prompt utility. Our results position workflow formation as a new safety frontier for multi-agent LLM systems, opening a planning-time security perspective on how agent coordination itself can be attacked and defended.
LLM-based agents are increasingly organized into collaborative multi-agent systems (MAS), where specialized agents exchange intermediate outputs and aggregate decentralized decisions. A prominent instantiation is the planner–executor architecture: a planner decomposes a user task into subtasks, assigns execution roles, constructs communication dependencies, and coordinates executor agents toward a final response. Such systems are already appearing in consequential workflows, including agentic coding assistance, financial risk analysis, and public policy simulation. This shift makes coordination itself a safety-critical object. Existing MAS security research has primarily examined attacks within already formed workflows, such as hijacking executor agents, poisoning shared memory, manipulating tool calls, or corrupting inter-agent messages. These attacks expose important failure modes, but they typically assume that the workflow already exists and that the adversary can intervene in some internal component during execution. We study a higher-level attack surface: workflow formation, where the ordinary prompt interface becomes a point of leverage for biasing coordination structure without modifying agents, tools, memory, messages, or execution-time dependencies.
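To make the notion of position-dependent influence concrete, the following is a minimal diagnostic sketch, not the analysis procedure used in this work. It assumes a planner-generated workflow can be written as an acyclic dependency graph, that each dependency edge relays an upstream signal independently with a fixed probability, and that relays combine by a noisy-OR rule. The function name `relay_scores`, the parameter `relay_prob`, and the example subtask names are all hypothetical and chosen only for illustration.

```python
from functools import lru_cache


def relay_scores(downstream, sink, relay_prob):
    """Toy estimate of how likely a signal at each subtask reaches the sink.

    downstream: dict mapping each subtask to the subtasks that consume its output
                (assumed acyclic; this is an illustrative assumption).
    sink:       the subtask that produces the final system output.
    relay_prob: assumed probability that one dependency edge relays the signal.
    """

    @lru_cache(maxsize=None)
    def score(node):
        if node == sink:
            return 1.0
        # Noisy-OR: the signal is lost only if every outgoing route drops it.
        miss = 1.0
        for nxt in downstream.get(node, ()):
            miss *= 1.0 - relay_prob * score(nxt)
        return 1.0 - miss

    return {node: score(node) for node in downstream}


# Hypothetical five-subtask workflow, used only to exercise the function.
workflow = {
    "research": ["analysis", "fact_check"],
    "analysis": ["synthesis"],
    "fact_check": ["synthesis"],
    "style_review": ["synthesis"],
    "synthesis": [],
}
scores = relay_scores(workflow, sink="synthesis", relay_prob=0.5)
for name, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:13s} {value:.4f}")
```

Under these toy assumptions, subtasks adjacent to the final aggregation step score higher than an upstream subtask two hops away, even though the upstream subtask feeds two routes. A planning-time audit could use a score of this kind to flag which subtasks deserve the closest scrutiny; measured relay rates would have to replace the fixed `relay_prob` before drawing any real conclusion.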
Motivated by these vulnerabilities, we propose FLOWSTEER, a prompt-only workflow steering attack designed around a simple principle: prompt-only attacks can be strengthened by controlling where a malicious signal enters a workflow and how the planner routes it afterward. The first stage, a task-aware sycophantic argument, exploits structural sensitivity by aligning the malicious signal with a high-influence subtask, while using framing cues to make it appear as task-relevant evidence. The second stage, dependency-guided workflow steering, addresses replanning instability: because a manipulated prompt may cause the planner to regenerate roles and dependencies, FLOWSTEER expresses propagation-favorable dependency patterns as natural-language guidance that biases the newly formed workflow. In this way, FLOWSTEER does not require access to agents, tools, memory, or messages; it steers the planning signals from which collaboration is constructed. This same observation motivates FLOWGUARD, an input-side defense that strengthens the planning boundary by separating task, methodological, and framing intents, then reframing workflow-contaminating cues while preserving the original task objective.
This work identifies planner–executor MAS as vulnerable at a boundary that current safety methods largely overlook: the moment a user prompt is converted into subtasks, roles, dependencies, and routing paths. Our findings show that workflow formation is not a neutral preprocessing step. It can determine which signals become influential, how they are framed for downstream adoption, and how they propagate toward the final system output. This shifts the safety question from guarding individual agents, tools, memory, or messages after collaboration begins to protecting the planning process that organizes collaboration in the first place. As LLM systems increasingly rely on dynamic planning, workflow-level safety must become a first-class design principle for multi-agent systems. Future work should develop planning-time defenses, broader diagnostics for workflow influence, and safeguards that remain robust under realistic prompt-only access, especially in high-stakes settings where task decomposition and information routing shape consequential decisions.