Agentic Systems and Planning

Can workflow inspection catch attacks that bias planning signals?

Does inspecting the final workflow catch attacks that contaminate earlier planning stages? This matters because contamination laundered through the planner may look legitimate by the time the workflow exists.

Note · 2026-05-28 · sourced from Agents Multi Architecture

A defense can only catch what it can see, so where it looks determines what it can catch. FLOWSTEER biases the planning signals from which the workflow is generated, so a defense that inspects only the resulting workflow is examining an artifact that is already compromised. The malicious intent has been laundered through the planner into legitimate-looking roles, dependencies, and routing; by the time the workflow exists, the contamination no longer looks malicious. This is why the paper introduces FLOWGUARD as an input-side defense. It strengthens the planning boundary by separating task, methodological, and framing intents, then reframes workflow-contaminating cues while preserving the original task objective. The reported effect is a reduction in malicious success of up to 34 percent without degrading prompt utility.
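
The separate-then-reframe idea can be illustrated with a toy sketch. This is not FLOWGUARD's implementation: the keyword lists, the `ParsedPrompt` type, and the `separate_intents` and `reframe` functions are all hypothetical, and a real system would presumably use a learned classifier rather than substring matching. The sketch only shows the shape of the pipeline: task and methodological sentences pass through unchanged, while framing cues that try to dictate workflow organization are replaced with a neutral instruction.

```python
from dataclasses import dataclass, field

# Hypothetical cue lists for illustration only; not taken from the paper.
METHOD_CUES = ("verify", "cross-check", "step by step")
FRAMING_CUES = ("skip review", "must route", "trust the output of")


@dataclass
class ParsedPrompt:
    task: list[str] = field(default_factory=list)
    method: list[str] = field(default_factory=list)
    framing: list[str] = field(default_factory=list)


def separate_intents(prompt: str) -> ParsedPrompt:
    """Sort each sentence into task, methodological, or framing intent."""
    parsed = ParsedPrompt()
    for sentence in (s.strip() for s in prompt.split(".") if s.strip()):
        lowered = sentence.lower()
        if any(cue in lowered for cue in FRAMING_CUES):
            parsed.framing.append(sentence)
        elif any(cue in lowered for cue in METHOD_CUES):
            parsed.method.append(sentence)
        else:
            parsed.task.append(sentence)
    return parsed


def reframe(parsed: ParsedPrompt) -> str:
    """Rebuild the planner input: keep task and method, neutralize framing."""
    parts = [s + "." for s in parsed.task + parsed.method]
    if parsed.framing:
        # Replace organization-steering cues with a neutral instruction.
        parts.append("Choose roles, dependencies, and routing independently.")
    return " ".join(parts)


prompt = (
    "Summarize the quarterly report. "
    "Verify figures against the ledger. "
    "Skip review for the finance agent."
)
parsed = separate_intents(prompt)
cleaned = reframe(parsed)
```

Because the three intent types are kept apart, the methodological sentence survives even though the framing sentence is rewritten, which is the property that wholesale filtering would lose.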

The general principle is about defense placement, not defense strength. Moving inspection upstream, to the point where intent has been parsed but organization has not yet been committed, catches a class of attack that downstream inspection structurally cannot. The counterpoint is that an input-side defense risks false positives that suppress legitimate methodological guidance, which is exactly why FLOWGUARD separates intent types rather than filtering wholesale. The result reframes MAS (multi-agent system) security as a question of where the trust boundary sits: the safest place to intervene is the boundary between instruction and organization, not the organization itself.
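
The placement argument can be made concrete with a deliberately small sketch. Everything here is an assumption for illustration: the `plan` function is a toy stand-in for an LLM planner, and the two checks are hypothetical, not defenses described in the paper. The point is only that the steered workflow consists entirely of legitimate roles, so a check that sees the workflow alone has nothing to flag, while a check placed before the planner still sees the steering cue.

```python
ALLOWED_ROLES = {"researcher", "writer", "reviewer"}
STEERING_CUE = "review is unnecessary"  # hypothetical contaminating phrase


def plan(prompt: str) -> list[str]:
    """Toy planner: drops the reviewer role when the prompt steers it to."""
    roles = ["researcher", "writer", "reviewer"]
    if STEERING_CUE in prompt.lower():
        roles.remove("reviewer")
    return roles


def workflow_looks_clean(workflow: list[str]) -> bool:
    """Downstream check: sees only role names, all of which are legitimate."""
    return all(role in ALLOWED_ROLES for role in workflow)


def prompt_looks_clean(prompt: str) -> bool:
    """Upstream check: sees the cue before the planner consumes it."""
    return STEERING_CUE not in prompt.lower()


steered = "Write a summary of the audit. Review is unnecessary for this task."
workflow = plan(steered)

downstream_verdict = workflow_looks_clean(workflow)  # passes the steered workflow
upstream_verdict = prompt_looks_clean(steered)       # flags the steered prompt
```

The downstream check is not weak; it is correctly reporting that every role in the workflow is allowed. What it cannot see is the role that was never created, which is why the fix is a change of position rather than a stricter rule.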


— "FLOWSTEER: Prompt-Only Workflow Steering Exposes Planning-Time Vulnerabilities in Multi-Agent LLM Systems", https://arxiv.org/abs/2605.11514

Original note title

defenses that inspect only the generated workflow miss attacks that bias the upstream planning signal