LLM Security Landscape
Lesson 1 of 7 in the LLM Security & Prompt Injection course.
Understanding LLM Security Landscape
Large language models (LLMs) have introduced an entirely new class of security vulnerabilities that traditional application security frameworks were not designed to address. Unlike conventional software where inputs are strictly typed and behavior is deterministic, LLMs accept free-form natural language and produce probabilistic outputs, creating a vast and unpredictable attack surface.
The rapid deployment of LLMs in customer-facing applications, internal tools, and autonomous agents has outpaced the development of security best practices. Organizations are integrating LLMs into workflows that handle sensitive data, make business decisions, and interact with external systems, often without adequate security controls. Understanding the full security landscape is the essential first step toward securing these systems.
Core Concepts
The LLM security landscape encompasses several categories of risk that security teams must address:
- Prompt injection: Attackers craft inputs that override the system prompt or intended behavior, causing the LLM to follow attacker instructions instead of developer instructions
- Data exfiltration: LLMs may leak sensitive information from their training data, system prompts, or context windows through carefully crafted queries
- Jailbreaking: Techniques that bypass the model's safety alignment to produce harmful, unethical, or policy-violating outputs
- Hallucination exploitation: Attackers leverage the model's tendency to generate plausible-sounding but false information to spread misinformation or manipulate users
- Agent-based risks: LLM agents with tool access can be manipulated to execute unauthorized actions like sending emails, modifying databases, or accessing file systems
The OWASP LLM Top 10
OWASP maintains a dedicated top 10 list for LLM applications that provides a structured view of the most critical risks:
Risk Categories
The OWASP LLM Top 10 covers prompt injection (LLM01), insecure output handling (LLM02), training data poisoning (LLM03), model denial of service (LLM04), supply chain vulnerabilities (LLM05), sensitive information disclosure (LLM06), insecure plugin design (LLM07), excessive agency (LLM08), overreliance (LLM09), and model theft (LLM10). Each category requires specific mitigations.
- Identify all LLM integration points: Map every location where your application sends data to or receives data from an LLM, including system prompts, user inputs, context injections, and tool calls
- Classify data sensitivity: Determine what sensitive data the LLM has access to through its context window, connected tools, and training data
- Assess trust boundaries: Identify where untrusted input (user messages, external data) enters the LLM pipeline and where LLM output is used in trusted contexts
- Evaluate tool permissions: For LLM agents, audit what actions the model can take and whether those permissions follow the principle of least privilege
- Review output handling: Examine how LLM outputs are processed, displayed, and used in downstream systems to identify injection and XSS risks
Threat Modeling for LLM Applications
When threat modeling LLM applications, consider both the direct attack surface (user inputs to the model) and the indirect attack surface (external data sources that feed into the model's context). Indirect prompt injection through retrieved documents, emails, or web content is particularly dangerous because it can be invisible to the user.
# LLM Security Assessment Framework
class LLMSecurityAssessment:
"""Systematic security assessment for LLM-powered applications."""
def __init__(self, app_name):
self.app_name = app_name
self.findings = []
def assess_input_handling(self, config):
"""Check input security controls."""
checks = {
"input_length_limit": config.get("max_input_tokens", 0) > 0,
"input_sanitization": config.get("sanitize_inputs", False),
"system_prompt_isolation": config.get("system_prompt_protected", False),
"user_role_separation": config.get("role_based_prompts", False),
}
for check, passed in checks.items():
if not passed:
self.findings.append({
"category": "Input Handling",
"check": check,
"severity": "HIGH",
"recommendation": f"Enable {check} for {self.app_name}"
})
return checks
def assess_output_handling(self, config):
"""Check output security controls."""
checks = {
"output_filtering": config.get("filter_outputs", False),
"html_escaping": config.get("escape_html", False),
"pii_detection": config.get("detect_pii_in_output", False),
"confidence_thresholds": config.get("min_confidence", 0) > 0,
}
for check, passed in checks.items():
if not passed:
self.findings.append({
"category": "Output Handling",
"check": check,
"severity": "MEDIUM",
"recommendation": f"Implement {check}"
})
return checks
def assess_agent_permissions(self, tools):
"""Assess tool permissions for LLM agents."""
high_risk_actions = ["delete", "send_email", "execute_code",
"modify_database", "transfer_funds"]
for tool in tools:
for action in high_risk_actions:
if action in tool.get("capabilities", []):
self.findings.append({
"category": "Agent Permissions",
"check": f"{tool['name']}/{action}",
"severity": "CRITICAL",
"recommendation": f"Add human-in-the-loop for {action}"
})
def generate_report(self):
"""Generate assessment report."""
print(f"\nSecurity Assessment: {self.app_name}")
print(f"Total findings: {len(self.findings)}")
for f in sorted(self.findings, key=lambda x: x['severity']):
print(f" [{f['severity']}] {f['category']}: {f['check']}")
Best Practices for LLM Security
Implement these foundational security practices for any LLM application:
- Defense in depth: Never rely solely on the model's alignment or system prompt for security. Implement input validation, output filtering, and application-level access controls as independent layers
- Least privilege for agents: LLM agents should have the minimum permissions necessary. Use separate API keys with restricted scopes for each tool the agent can access
- Monitor and log: Log all LLM interactions including inputs, outputs, and tool calls. Monitor for anomalous patterns that may indicate prompt injection or data exfiltration attempts
- Assume breach: Design your application assuming the LLM will eventually be compromised. Ensure that even if an attacker controls the model's output, the damage is contained through application-level controls
Building an LLM Security Program
Establishing a comprehensive LLM security program requires ongoing effort. Start by inventorying all LLM deployments in your organization, classify them by risk level based on data sensitivity and autonomy, and apply appropriate security controls to each. Regular red teaming, prompt injection testing, and security reviews should be integrated into the development lifecycle for any LLM-powered application.
Implementation Checklist
- Inventory all LLM integrations and classify by risk level
- Implement input validation and sanitization for all user-provided prompts
- Add output filtering to detect and prevent data leakage
- Configure rate limiting and usage monitoring for all LLM API endpoints
- Conduct regular prompt injection testing as part of security assessments
- Establish incident response procedures specifically for LLM security events
Summary and Next Steps
The LLM security landscape is broad and rapidly evolving. By understanding the key risk categories, conducting thorough threat modeling, and implementing defense-in-depth strategies, organizations can deploy LLMs with confidence. In the next lesson, we will explore Direct Prompt Injection.
Lilly Tech Systems