Introduction to Prompt Injection
Prompt injection is the most critical security vulnerability in LLM-powered applications. Understanding it is essential for anyone building, deploying, or securing AI systems.
What is Prompt Injection?
Prompt injection is a class of attacks where an attacker crafts input that causes a language model to ignore its original instructions and follow attacker-supplied instructions instead. It is analogous to SQL injection in traditional web applications — untrusted user input is interpreted as commands rather than data.
Why LLMs Are Vulnerable
Language models are vulnerable to prompt injection because of their core architecture:
- No privilege separation: System prompts and user inputs are processed in the same context window with no architectural boundary
- Instruction following: Models are specifically trained to follow instructions, making them responsive to injected commands
- Context confusion: Models cannot reliably distinguish between legitimate instructions and adversarial instructions embedded in data
- Tool access: Modern AI agents have access to tools (web browsing, code execution, APIs) that amplify the impact of successful injection
The Threat Landscape
| Attack Vector | Description | Severity |
|---|---|---|
| Direct Injection | User directly inputs malicious instructions to override system behavior | High |
| Indirect Injection | Malicious instructions embedded in external data sources (websites, emails, documents) | Critical |
| Jailbreaking | Techniques to bypass safety training and content policies | High |
| Data Exfiltration | Extracting sensitive information through crafted prompts | Critical |
| Privilege Escalation | Gaining access to tools or data beyond intended scope | Critical |
Real-World Incidents
-
Bing Chat System Prompt Leak (2023)
Users discovered they could extract Bing Chat's system prompt by asking the model to "ignore previous instructions." The leaked prompt revealed internal code names and behavioral guidelines.
-
Indirect Injection via Web Browsing
Researchers demonstrated that hidden text on web pages could instruct AI assistants to exfiltrate conversation data or execute unauthorized actions when browsing the web.
-
ChatGPT Plugin Exploits
Third-party ChatGPT plugins were shown to be vulnerable to injection attacks where malicious websites could trigger plugin actions without user consent.
-
Customer Service Bot Manipulation
A car dealership's AI chatbot was tricked into agreeing to sell a car for one dollar by users who crafted clever prompt injections that overrode the bot's pricing instructions.
Why This Course Matters
For Developers
Learn to build LLM applications that are resilient to injection attacks through defense-in-depth strategies and secure architecture patterns.
For Security Teams
Understand the unique threat model of LLM applications and how to conduct effective security assessments and penetration testing.
For Product Managers
Make informed decisions about AI deployment risk, understand what security measures are needed, and plan appropriate testing timelines.
For Researchers
Explore the cutting edge of LLM security research, from automated red teaming to novel defense mechanisms.
Lilly Tech Systems