Indirect Prompt Injection: Why AI Agents Get Hijacked by Hidden Web Text
TL;DR
- Google's April 2026 research found a 32% jump in malicious prompt injection content embedded across the public web between Nov 2025 and Feb 2026.
- Attackers hide instructions inside ordinary pages โ white-on-white text, CSS tricks, zero-width Unicode โ that humans never see but AI agents read and obey.
- The vulnerability is architectural, not a bug: language models treat every token in their context as potentially executable, so the line between "data to summarize" and "command to follow" doesn't really exist.
- OWASP ranks prompt injection the #1 AI risk for 2026, and OpenAI itself says the problem may never be fully "solved."
- Defense is layered probability, not a fix โ which changes how you should think about deploying AI agents that browse the open web.
The Trap Was Hiding in the HTML
In late April 2026, Google's security team published the result of an unusual scan. They took CommonCrawl โ the open archive of the public web โ and looked for pages containing prompt injection patterns aimed at AI assistants. The malicious-content category had grown 32% between November 2025 and February 2026.
The technique is called indirect prompt injection (IPI). Attackers don't talk to your AI directly. Instead, they seed instructions inside ordinary-looking web pages and wait for an AI agent โ your AI browser, your enterprise assistant, your research helper โ to scrape that page. The agent reads the hidden text and executes it as if you had typed it.
This is not a fringe research scenario. It is the OWASP-rated #1 AI security threat of 2026, and the design choices that enable it sit at the heart of how every modern language model works.
What Is Indirect Prompt Injection?
Indirect prompt injection is an attack where malicious instructions are hidden inside content that an AI later ingests, causing the AI to act on those instructions instead of (or in addition to) the user's original request. Direct injection is a user typing "ignore previous instructions"; indirect injection is a third party planting that text on a webpage so your agent will swallow it without you noticing.
The "indirect" part is what makes it dangerous. You never asked your AI to do anything bad. You asked it to summarize a vendor page, read a job posting, or check a competitor's site. The instructions came along for the ride.
How Hidden Web Prompts Work
Attackers have a long menu of techniques for embedding text that humans cannot see but LLMs will happily read.
Visual concealment
| Technique | Mechanism | Why AI still reads it |
|---|---|---|
| White-on-white text | Foreground color matches background | LLM sees raw text, not rendered colors |
| Off-screen positioning | position: absolute; left: -9999px |
Style is rendering hint, not content filter |
| Zero-pixel font | font-size: 0 |
Text remains in DOM and HTML source |
| Hidden HTML elements | display: none / visibility: hidden |
Many scrapers extract text regardless |
Encoding tricks
Zero-width Unicode characters (U+200B, U+FEFF, etc.) can be inserted between letters of a normal sentence. To a human reader, the sentence is unchanged. To a tokenizer, the byte sequence is different โ and an attacker can use that hidden channel to smuggle in a parallel instruction.
CDATA sections inside SVG files are another favorite. The XML parser ignores the contents as markup, so designers and humans never see them, but most LLM ingestion pipelines just strip tags and pass the raw text through.
Image-based payloads
Researchers have demonstrated readable instructions hidden as faint text in screenshots โ light blue on yellow, for example โ that a vision model lifts out perfectly while humans squint and see noise. Because multimodal AI processes pixels through the same pipeline as text, a screenshot of a webpage can carry payloads that the page itself wouldn't survive a basic content scan.
The common thread: the human eye depends on rendering, contrast, and visual context. The AI depends on the underlying byte stream.
The Root Cause: Tokens Are Tokens
Here is the part that gets glossed over in security blogs and is the actual key to the whole problem.
A large language model does not have a separate "instruction channel" and "data channel." When you give it a system prompt, your question, and the contents of a webpage, it concatenates all of them into one long sequence of tokens and tries to predict the next one. Every token in that context window has the same status: it is evidence about what the next token should be.
This is why prompt injection is structurally different from, say, SQL injection. In a database, queries and data are different artifacts handled by different parsers. You can escape inputs, parameterize queries, and create a hard boundary. In an LLM, there is no parser-level boundary. There is just a big text blob and a model's learned tendency to follow instruction-shaped patterns.
The UK National Cyber Security Centre put it bluntly in 2023: prompt injection "may simply be an inherent issue with LLM technology." Three years and many billions of training dollars later, that assessment has held.
You can train the model to prefer the system prompt. You can fine-tune it to resist obvious overrides. But you cannot build a hard wall between "stuff I should do" and "stuff I should only read about," because the model has no concept of those categories at the architectural level.
Why Agentic AI Multiplies the Danger
A chatbot that produces wrong text is annoying. An agent that takes actions in the world is something else entirely.
Modern AI agents combine three capabilities that, on their own, are fine but together create what security researchers now call the lethal trifecta:
- Reading untrusted external content (web pages, emails, documents shared by others).
- Accessing private or sensitive data (your inbox, your files, your company database).
- Taking external actions (sending messages, running code, making purchases, calling APIs).
Any agent with all three is a high-impact target. A poisoned webpage doesn't just trick the model into saying something silly โ it can instruct the agent to read a private email, exfiltrate the contents to a webhook, and then delete the trail. Each individual capability passed an internal review. The combination did not.
This connects directly to the supply-chain attack on Vercel's Context.ai integration in April 2026, where the attack surface wasn't the AI itself but the OAuth-connected tools the AI could reach. The pattern is the same: capabilities compound, and so do the consequences when one piece is compromised.
Can Prompt Injection Be Fixed?
Short answer: probably not in a way that ever lets you fully trust an agent on the open web. OpenAI has publicly said the problem may never be "solved" โ only mitigated.
What does mitigation look like? A layered defense, none of which is bulletproof on its own.
| Defense layer | What it does | What it can't do |
|---|---|---|
| Input filtering | Strips suspicious patterns from scraped content | Misses obfuscated or novel payloads |
| Instruction hierarchy training | Teaches model to prefer system prompts | Defeated by sufficiently authoritative injections |
| Sandboxing & permissions | Limits what an agent can actually do | Only as good as the permission model |
| Human-in-the-loop confirmation | Requires user OK on sensitive actions | Defeated by alert fatigue |
| Output monitoring | Watches for exfiltration patterns | Detects, but only after the action |
| Provenance tracking | Marks tokens as "external" vs "trusted" | Still experimental; performance cost |
Google's own framing is honest: their strategy is "layered defense," not elimination. Each layer reduces the probability of a successful attack. None of them brings it to zero.
How Do You Know If Your AI Browser Got Hijacked?
You usually don't โ at least not in real time. Indirect prompt injection is designed to be silent. The agent does what the attacker said, often in addition to what you said, and the visible response looks normal. By the time the credential is exfiltrated or the email is sent, the page is closed.
The practical signals are indirect:
- Unexpected actions in connected tools (sent emails, calendar invites, file changes you didn't initiate).
- Account activity alerts from services your AI has access to.
- Drift in agent behavior โ answers suddenly hostile to a particular vendor, sudden recommendations to visit specific URLs, refusals on topics it used to handle.
If your AI agent has access to anything that matters โ your inbox, your bank, your code โ the right mental model is "browser-grade trust, not assistant-grade trust." Treat every webpage your agent reads as potentially hostile, the way you treat every email attachment.
What This Means for You
If you use AI agents โ or build them โ three principles follow from the architecture, not the news.
1. Limit the lethal trifecta on purpose. If an agent can read untrusted web content, do not also give it write access to your private systems. If it has access to your data, do not give it autonomous external actions. Two of the three is uncomfortable. All three is a vulnerability waiting for a target.
2. Stop trusting "the model knows better." Modern LLMs are confident, fluent, and sometimes obedient to the wrong master. The defense is not better models โ it is fewer permissions and more confirmation steps at the boundary where actions actually happen.
3. Read the consent screen like a security boundary. When an AI tool asks for OAuth access to your inbox or your drive, you are not granting it to the AI โ you are granting it to every webpage that AI will ever read for you. Scope accordingly.
This is the same logic as the five locks of basic cybersecurity: the goal is not perfection, it's making the cheap attacks expensive. Indirect prompt injection just adds a new lock to the door โ one that didn't exist when the cybersecurity playbook was written.
Bottom Line
Indirect prompt injection isn't a temporary glitch on the road to safe AI agents. It is the predictable consequence of building systems that turn arbitrary text into actions and then point them at the open internet. Google's 32% increase isn't a sudden surge โ it's the visible part of a steady professionalization. Attackers have figured out that prompt injection is a structural foothold, and structural footholds get exploited.
The good news: this is a known problem with a known shape. Treat AI agents like browsers, not like assistants. Limit what they can touch. Confirm sensitive actions. And stop assuming a confident answer is a safe one.
The bad news: nobody is shipping a patch that makes the question go away.
Related reading:
- Cybersecurity Essentials: 5 Locks Every Digital Door Needs โ the foundational layered-defense mindset this post extends
- Vercel's Context.ai Breach: Why Your AI Assistant Is Now an Attack Surface โ when the AI's tools are the weak link
๐ Sources
- Google Online Security Blog, "AI threats in the wild: The current state of prompt injections on the web" (April 2026)
- Help Net Security, "Indirect prompt injection is taking hold in the wild" (April 24, 2026)
- Securance, "Prompt injection: the OWASP #1 AI threat in 2026"
- Fortune, "OpenAI says prompt injections that can trick AI browsers may never be fully 'solved'" (December 2025)
- Palo Alto Networks Unit 42, "Fooling AI Agents: Web-Based Indirect Prompt Injection Observed in the Wild"
- OWASP Foundation, "Prompt Injection" reference page
- UK National Cyber Security Centre, advisory on LLM prompt injection (2023)
- SecurityWeek, "Google DeepMind Researchers Map Web Attacks Against AI Agents"
'๐ฌ Science & Tech' ์นดํ ๊ณ ๋ฆฌ์ ๋ค๋ฅธ ๊ธ
| Pipeline AI vs Tool AI: Lessons from Novo Nordisk ร OpenAI (0) | 2026.05.02 |
|---|---|
| mRNA Cancer Vaccine: 6-Year Pancreatic Survival Data (0) | 2026.05.01 |
| Otarmeni Approved: How Gene Therapy Reversed Genetic Deafness (0) | 2026.04.30 |
| Neuromorphic Chips: Why a 70% AI Energy Cut Starts with Brain-Like Memory (0) | 2026.04.28 |
| How AI Remembers: DeepSeek V4 and the Million-Token Breakthrough (0) | 2026.04.25 |