Can language models be hijacked to hide covert advertising?

Explores whether LLMs can be compromised to inject promotional or propaganda content into outputs without degrading accuracy, and whether attackers can exploit distribution channels to do so at scale.

Synthesis note · 2026-06-03 · sourced from Flaws

Most adversarial-attack research targets accuracy: degrade the model, induce wrong answers, jailbreak safety. Advertisement Embedding Attacks (AEA) name a different objective — information integrity. They stealthily inject promotional or malicious content (covert ads, propaganda, hate speech) into outputs while the response otherwise appears normal and accurate. Two low-cost vectors carry it: hijacking third-party service-distribution platforms to prepend adversarial prompts, and publishing backdoored open-source checkpoints fine-tuned with attacker data.

What makes AEA distinctive is the commercial incentive structure and the invisibility. Because accuracy is untouched, standard quality metrics and many safety filters miss it; the harm is the insertion of an interested party into ostensibly neutral output, mapped across five stakeholder victim groups. The proposed mitigation is a prompt-based self-inspection defense requiring no retraining — the model audits its own output for injected content.

This extends the vault's injection/poisoning cluster along a new axis. Where Can one compromised agent corrupt an entire multi-agent network? concerns behavioral bias and Can we defend RAG systems from corpus poisoning without retraining? concerns retrieval, AEA targets the commercial integrity of generation itself — and the authors warn it could become "as prevalent as web viruses," since the economic motive (paid placement) is durable in a way that pure sabotage is not.

Inquiring lines that use this note as a source 4

This note is a source for these synthesized inquiries. Follow a line forward into its question, or open it to trace back to all of its sources.

Related concepts in this collection 3

This note in its neighbourhood — explore the map, then jump to a related concept in the list below.

Concept map

13 direct connections · 118 in 2-hop network ·dense cluster Open in graph ↗

Can language models be hijacked to hide covert a… Can one compromised agent corrupt an entire multi-… Can we defend RAG systems from corpus poisoning wi… Does advanced technology eventually function like …

Click a node to walk · click center to open · click Open in graph to see this note in the full knowledge graph

your link semantically near linked from elsewhere

Can one compromised agent corrupt an entire multi-agent network? Explores whether a single biased agent can spread behavioral corruption through ordinary messages to downstream agents without any direct adversarial access. Matters because it reveals a previously unknown vulnerability in how multi-agent systems communicate.
adjacent injection vector; AEA carries commercial payloads rather than behavioral traits
Can we defend RAG systems from corpus poisoning without retraining? Explores whether retrieval-time defenses can catch and block poisoned documents before they reach the generator, without expensive retraining cycles. Matters because corpus updates outpace model retraining in production RAG systems.
both are integrity attacks with lightweight, retraining-free defenses
Does advanced technology eventually function like cultural myth? Explores whether the most sophisticated technical systems—particularly AI—end up operating in culture the way traditional myths do: as unquestionable authorities accepted on faith rather than verified on merit.
AEA exploits exactly the unearned authority of fluent normal-looking output

Related papers in this collection 8

Papers most semantically related to this note, ranked by cosine similarity in the embedding space.

Original note title

advertisement embedding attacks are a new threat class that subverts information integrity rather than accuracy — covert ads and propaganda while output appears normal

Can language models be hijacked to hide covert advertising?

Related concepts in this collection 3

Related papers in this collection 8

Search by related questions 4