What network topologies are most vulnerable to bias propagation?
This explores which structural shapes of a system — agent-to-agent message chains, workflow dependency graphs, recommender feedback loops — let a bias spread furthest once it's introduced, rather than asking how any single model becomes biased.
This explores which *structural shapes* — multi-agent message chains, workflow dependency graphs, and recommender feedback loops — let a bias spread furthest once introduced. The corpus converges on a clear answer: topologies where signals pass through many hops, or where many paths funnel through a few high-traffic nodes, are the dangerous ones. A single compromised agent can corrupt a network of six downstream agents in both chain and bidirectional layouts using nothing but ordinary messages — and because the bias rides along with no explicit semantic payload, paraphrasing and content filters never catch it Can one compromised agent corrupt an entire multi-agent network?. Length is itself a vulnerability: the more hops a signal makes, the more chances it has to be relayed and amplified rather than corrected.
But not all positions in a graph are equal, and this is the most useful thing to take away. Influence concentrates where dependencies converge — the subtasks that many other steps feed into. Inject a malicious signal there and it travels far; inject it at a leaf node and it dies locally How does workflow position shape attack propagation in multi-agent systems?. The same work finds that *framing* matters as much as position: a biased claim dressed up as evidence rather than an instruction gets passed along, because downstream agents treat it as data to relay. So the worst case is a high-fan-in node carrying an evidence-framed signal.
There's a parallel story inside a single reasoning model that's worth seeing alongside the multi-agent one. A long chain-of-thought is, structurally, a chain topology — and it fails the same way. Reasoning models lose 25–29% accuracy under multi-turn manipulation precisely because extended chains create more corruption points, where one wrong step propagates into a confident wrong conclusion Are reasoning models actually more vulnerable to manipulation?. "More steps" is the same risk whether the steps are agents or thoughts.
The other topology the corpus flags is the *loop*. Recommender systems are graphs that feed their own outputs back as tomorrow's training data, and that closed loop turns small biases into entrenched ones. A ranker without explicit selection-bias correction converges on degenerate equilibria that amplify its own past decisions Why do ranking systems need to model selection bias explicitly?. Low-dimensional embeddings compound popularity bias the same way — niche items get under-exposed, which starves them of the very interactions that would justify exposing them Does embedding dimensionality secretly drive popularity bias in recommenders?. And hash-collision tables make it worse over time because collisions pile up exactly on the high-frequency users and items the system most needs to get right Why do hash collisions hurt recommendation models so much?.
The through-line across all of these: bias propagates worst where signals are long-lived and where structure concentrates traffic. Chains give it distance, convergence nodes give it reach, and feedback loops give it permanence. The unexpected lesson is that the fix isn't always cleaner inputs — it's shorter paths, position-aware defenses at the high-fan-in nodes, and breaking the loops (selection-bias correction, fairness-aware dimensionality) before they harden.
Sources 6 notes
Research demonstrates that a single biased agent can transmit persistent behavioral corruption through six downstream agents in chain and bidirectional topologies using only normal inter-agent communication. The bias evades detection and paraphrasing defenses because it carries no explicit semantic content.
FLOWSTEER demonstrates that malicious signals propagate farther when injected into high-influence subtasks, and that framing them as evidence rather than instruction causes downstream agents to relay them. Influence concentrates where dependencies converge, making position-aware attacks far more effective.
GaslightingBench-R shows that multi-turn manipulative prompts reduce reasoning model accuracy significantly more than standard models. Extended chains create more corruption points, allowing single wrong steps to propagate into confident incorrect conclusions.
YouTube's multi-objective ranker uses MMoE for conflicting objectives and a shallow position tower to remove selection bias from training data. Without both mechanisms, models converge on degenerate equilibria that amplify their own past decisions.
Research shows that when user/item embedding dimensions are too small, recommender systems overfit toward popular items to maximize ranking quality. This compounds over time as niche items receive insufficient exposure, and cannot be fixed post-hoc without treating dimensionality as a fairness hyperparameter.
Monolith's empirical work shows that real recommendation systems have power-law distributed frequencies, causing collisions to accumulate precisely on the entities models need most accurate. Fixed-size hashed tables worsen this over time as new IDs arrive.