LLM Security Landscape

Lesson 1 of 7 in the LLM Security & Prompt Injection course.

Understanding LLM Security Landscape

Large language models (LLMs) have introduced an entirely new class of security vulnerabilities that traditional application security frameworks were not designed to address. Unlike conventional software where inputs are strictly typed and behavior is deterministic, LLMs accept free-form natural language and produce probabilistic outputs, creating a vast and unpredictable attack surface.

The rapid deployment of LLMs in customer-facing applications, internal tools, and autonomous agents has outpaced the development of security best practices. Organizations are integrating LLMs into workflows that handle sensitive data, make business decisions, and interact with external systems, often without adequate security controls. Understanding the full security landscape is the essential first step toward securing these systems.

Core Concepts

The LLM security landscape encompasses several categories of risk that security teams must address:

Prompt injection: Attackers craft inputs that override the system prompt or intended behavior, causing the LLM to follow attacker instructions instead of developer instructions
Data exfiltration: LLMs may leak sensitive information from their training data, system prompts, or context windows through carefully crafted queries
Jailbreaking: Techniques that bypass the model's safety alignment to produce harmful, unethical, or policy-violating outputs
Hallucination exploitation: Attackers leverage the model's tendency to generate plausible-sounding but false information to spread misinformation or manipulate users
Agent-based risks: LLM agents with tool access can be manipulated to execute unauthorized actions like sending emails, modifying databases, or accessing file systems

💡

Key insight: The most dangerous LLM vulnerabilities are not in the model itself but in how the application integrates the model. A well-designed application architecture can mitigate many LLM security risks even if the underlying model has vulnerabilities.

The OWASP LLM Top 10

OWASP maintains a dedicated top 10 list for LLM applications that provides a structured view of the most critical risks:

Risk Categories

The OWASP LLM Top 10 covers prompt injection (LLM01), insecure output handling (LLM02), training data poisoning (LLM03), model denial of service (LLM04), supply chain vulnerabilities (LLM05), sensitive information disclosure (LLM06), insecure plugin design (LLM07), excessive agency (LLM08), overreliance (LLM09), and model theft (LLM10). Each category requires specific mitigations.

Identify all LLM integration points: Map every location where your application sends data to or receives data from an LLM, including system prompts, user inputs, context injections, and tool calls
Classify data sensitivity: Determine what sensitive data the LLM has access to through its context window, connected tools, and training data
Assess trust boundaries: Identify where untrusted input (user messages, external data) enters the LLM pipeline and where LLM output is used in trusted contexts
Evaluate tool permissions: For LLM agents, audit what actions the model can take and whether those permissions follow the principle of least privilege
Review output handling: Examine how LLM outputs are processed, displayed, and used in downstream systems to identify injection and XSS risks

Threat Modeling for LLM Applications

When threat modeling LLM applications, consider both the direct attack surface (user inputs to the model) and the indirect attack surface (external data sources that feed into the model's context). Indirect prompt injection through retrieved documents, emails, or web content is particularly dangerous because it can be invisible to the user.

Python

# LLM Security Assessment Framework
class LLMSecurityAssessment:
    """Systematic security assessment for LLM-powered applications."""

    def __init__(self, app_name):
        self.app_name = app_name
        self.findings = []

    def assess_input_handling(self, config):
        """Check input security controls."""
        checks = {
            "input_length_limit": config.get("max_input_tokens", 0) > 0,
            "input_sanitization": config.get("sanitize_inputs", False),
            "system_prompt_isolation": config.get("system_prompt_protected", False),
            "user_role_separation": config.get("role_based_prompts", False),
        }
        for check, passed in checks.items():
            if not passed:
                self.findings.append({
                    "category": "Input Handling",
                    "check": check,
                    "severity": "HIGH",
                    "recommendation": f"Enable {check} for {self.app_name}"
                })
        return checks

    def assess_output_handling(self, config):
        """Check output security controls."""
        checks = {
            "output_filtering": config.get("filter_outputs", False),
            "html_escaping": config.get("escape_html", False),
            "pii_detection": config.get("detect_pii_in_output", False),
            "confidence_thresholds": config.get("min_confidence", 0) > 0,
        }
        for check, passed in checks.items():
            if not passed:
                self.findings.append({
                    "category": "Output Handling",
                    "check": check,
                    "severity": "MEDIUM",
                    "recommendation": f"Implement {check}"
                })
        return checks

    def assess_agent_permissions(self, tools):
        """Assess tool permissions for LLM agents."""
        high_risk_actions = ["delete", "send_email", "execute_code",
                            "modify_database", "transfer_funds"]
        for tool in tools:
            for action in high_risk_actions:
                if action in tool.get("capabilities", []):
                    self.findings.append({
                        "category": "Agent Permissions",
                        "check": f"{tool['name']}/{action}",
                        "severity": "CRITICAL",
                        "recommendation": f"Add human-in-the-loop for {action}"
                    })

    def generate_report(self):
        """Generate assessment report."""
        print(f"\nSecurity Assessment: {self.app_name}")
        print(f"Total findings: {len(self.findings)}")
        for f in sorted(self.findings, key=lambda x: x['severity']):
            print(f"  [{f['severity']}] {f['category']}: {f['check']}")

Best Practices for LLM Security

Implement these foundational security practices for any LLM application:

Defense in depth: Never rely solely on the model's alignment or system prompt for security. Implement input validation, output filtering, and application-level access controls as independent layers
Least privilege for agents: LLM agents should have the minimum permissions necessary. Use separate API keys with restricted scopes for each tool the agent can access
Monitor and log: Log all LLM interactions including inputs, outputs, and tool calls. Monitor for anomalous patterns that may indicate prompt injection or data exfiltration attempts
Assume breach: Design your application assuming the LLM will eventually be compromised. Ensure that even if an attacker controls the model's output, the damage is contained through application-level controls

Building an LLM Security Program

Establishing a comprehensive LLM security program requires ongoing effort. Start by inventorying all LLM deployments in your organization, classify them by risk level based on data sensitivity and autonomy, and apply appropriate security controls to each. Regular red teaming, prompt injection testing, and security reviews should be integrated into the development lifecycle for any LLM-powered application.

Implementation Checklist

Inventory all LLM integrations and classify by risk level
Implement input validation and sanitization for all user-provided prompts
Add output filtering to detect and prevent data leakage
Configure rate limiting and usage monitoring for all LLM API endpoints
Conduct regular prompt injection testing as part of security assessments
Establish incident response procedures specifically for LLM security events

⚠

Warning: System prompts are NOT a security boundary. They can be extracted through various prompt injection techniques. Never embed secrets, API keys, or sensitive instructions in system prompts. Treat them as public information from a security perspective.

Summary and Next Steps

The LLM security landscape is broad and rapidly evolving. By understanding the key risk categories, conducting thorough threat modeling, and implementing defense-in-depth strategies, organizations can deploy LLMs with confidence. In the next lesson, we will explore Direct Prompt Injection.

Next →Direct Prompt Injection