Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models

Paper · arXiv 2508.17674 · Published August 25, 2025

Abstract—We introduce Advertisement Embedding Attacks (AEA), a new class of LLM security threats that stealthily inject promotional or malicious content into model outputs and AI agents. AEA operate through two low-cost vectors: (i) hijacking third-party service-distribution platforms to prepend adversarial prompts, and (ii) publishing back-doored open-source checkpoints fine-tuned with attacker data. Unlike conventional attacks that degrade accuracy, AEA subvert information integrity, causing models to return covert ads, propaganda, or hate speech while appearing normal. We detail the attack pipeline, map five stakeholder victim groups, and present an initial prompt-based self-inspection defense that mitigates these injections without additional model retraining. Our findings reveal an urgent, under-addressed gap in LLM security and call for coordinated detection, auditing, and policy responses from the AI-safety community. Index Terms—Large Language Models (LLMs); LLM Security; Prompt Injection; Backdoor; Advertisement Embedding Attack; AI Agents

Introduction. Over the past decade, AI has rapidly evolved from computer vision and machine learning to NLP and today’s progression toward AGI through LLMs, multimodal models, and AI agents [1]–[3]. Since 2023, GPT-based LLM services and open-source models like Meta’s LLaMA [4] have gained widespread adoption. These AI models, including LLMs, have been extensively deployed in people’s daily lives, encompassing online LLM services such as ChatGPT, Gemini, Grok, and Claude, as well as real-time traffic prediction, weather forecasting, urban-water system prediction, medical diagnosis, psychological counseling, autonomous driving systems like FSD, and aerospace applications [5]–[9]. These technologies play increasingly crucial roles in automating human life. Consequently, ensuring AI model security has become paramount given emerging attacks including hijacking, backdoor attacks, membership inference, model stealing, and adversarial attacks [10]–[13], which can cause device failures, operational disruptions, and data theft.

Discussion / Conclusion. In this research, we propose and define the newly discovered Advertisement Embedding Attacks (AEA) against LLM and AI agent systems, which can cause model responses to contain attacker-desired advertisements, misinformation, and other harmful information. In our experiments, the state-of-the-art Gemini 2.5 model can be easily misled by our proposed AEA attack prompts and prioritize returning our predefined attack data, which can be readily exploited by attackers on inference service distribution platforms, potentially causing significant harm to all parties. Addressing this threat, we believe that AEA will become as prevalent as web viruses. Researchers and LLM service providers should urgently investigate how to counter such attacks.

Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models

Synthesis notes that discuss concepts related to this paper