Prompt Injection

noun

Foundational concepts Reporting on AI

A cyberattack on AI systems where malicious instructions are hidden inside content the AI reads — such as a webpage, email, or document — tricking it into abandoning its original instructions and doing the attacker's bidding instead.

The risk becomes acute as AI "agents" gain the ability to browse the web, read emails, and take actions on a user's behalf. Imagine a reporter's AI assistant is tasked with reviewing public documents from a government website. If an attacker has embedded hidden commands in those pages, the agent could be tricked into forwarding the reporter's login credentials, deleting emails, or leaking private files — all without the reporter knowing anything went wrong. Because agents can act autonomously across many steps, a single injected instruction can cause a cascade of unauthorized actions. The Model Context Protocol makes agents more powerful by connecting them to outside tools and data, but also expands the attack surface for prompt injection.

OpenAI, Anthropic, and other AI companies have acknowledged that prompt injection may never be fully solved. The U.K.'s National Cyber Security Centre warned it "may never be totally mitigated." Security researchers compare it to social engineering and phishing: there is no one-time fix, only ongoing defenses. The problem is closely tied to alignment — the broader challenge of ensuring AI systems follow the instructions of the people they're supposed to serve, not the ones they've been tricked into serving.

_Prompt injection_s are commands that can derail bots from their normal processes, sometimes allowing hackers to trick them into sharing sensitive user information with them or performing tasks that a user may not want the bots to perform. — NBC News

"Prompt injection remains a frontier, unsolved security problem, and our adversaries will spend significant time and resources to find ways to make ChatGPT agent fall for these attacks." — Dane Stuckey, OpenAI chief information security officer — NBC News

Entry by Ryan Serpico

Flag Changelog

About this glossary — who's behind this site and how you can contribute.