Beginner

Introduction to Prompt Injection

Prompt injection is the most critical security vulnerability in LLM-powered applications. Understanding it is essential for anyone building, deploying, or securing AI systems.

What is Prompt Injection?

Prompt injection is a class of attacks where an attacker crafts input that causes a language model to ignore its original instructions and follow attacker-supplied instructions instead. It is analogous to SQL injection in traditional web applications — untrusted user input is interpreted as commands rather than data.

Critical Understanding: Unlike traditional software vulnerabilities that can be patched, prompt injection is an inherent property of how language models work. LLMs process all text in their context window as instructions, making it fundamentally difficult to separate trusted instructions from untrusted input.

Why LLMs Are Vulnerable

Language models are vulnerable to prompt injection because of their core architecture:

  • No privilege separation: System prompts and user inputs are processed in the same context window with no architectural boundary
  • Instruction following: Models are specifically trained to follow instructions, making them responsive to injected commands
  • Context confusion: Models cannot reliably distinguish between legitimate instructions and adversarial instructions embedded in data
  • Tool access: Modern AI agents have access to tools (web browsing, code execution, APIs) that amplify the impact of successful injection

The Threat Landscape

Attack Vector Description Severity
Direct Injection User directly inputs malicious instructions to override system behavior High
Indirect Injection Malicious instructions embedded in external data sources (websites, emails, documents) Critical
Jailbreaking Techniques to bypass safety training and content policies High
Data Exfiltration Extracting sensitive information through crafted prompts Critical
Privilege Escalation Gaining access to tools or data beyond intended scope Critical

Real-World Incidents

  1. Bing Chat System Prompt Leak (2023)

    Users discovered they could extract Bing Chat's system prompt by asking the model to "ignore previous instructions." The leaked prompt revealed internal code names and behavioral guidelines.

  2. Indirect Injection via Web Browsing

    Researchers demonstrated that hidden text on web pages could instruct AI assistants to exfiltrate conversation data or execute unauthorized actions when browsing the web.

  3. ChatGPT Plugin Exploits

    Third-party ChatGPT plugins were shown to be vulnerable to injection attacks where malicious websites could trigger plugin actions without user consent.

  4. Customer Service Bot Manipulation

    A car dealership's AI chatbot was tricked into agreeing to sell a car for one dollar by users who crafted clever prompt injections that overrode the bot's pricing instructions.

Why This Course Matters

For Developers

Learn to build LLM applications that are resilient to injection attacks through defense-in-depth strategies and secure architecture patterns.

For Security Teams

Understand the unique threat model of LLM applications and how to conduct effective security assessments and penetration testing.

For Product Managers

Make informed decisions about AI deployment risk, understand what security measures are needed, and plan appropriate testing timelines.

For Researchers

Explore the cutting edge of LLM security research, from automated red teaming to novel defense mechanisms.

💡
Looking Ahead: In the next lesson, we will explore specific attack types in detail — from simple jailbreaks to sophisticated indirect injection chains that can compromise entire AI agent systems.