Agent Security
When LLMs gain the ability to use tools, browse the web, execute code, and take autonomous actions, the security stakes rise dramatically: a prompt injection can escalate into remote code execution.
The Agent Threat Model
AI agents amplify every LLM vulnerability because they can translate compromised decisions into real-world actions. The combination of natural language understanding with tool access creates a unique class of security risks.
Tool Use Risks
| Risk | Description | Mitigation |
|---|---|---|
| Confused Deputy | Agent performs actions on behalf of attacker using legitimate user's permissions | Per-action authorization checks |
| Privilege Escalation | Agent accesses tools or data beyond intended scope | Least privilege, tool allowlisting |
| Side-channel Exfiltration | Agent uses legitimate tools to exfiltrate data (e.g., sending email with stolen info) | Output monitoring, rate limiting |
| Chain-of-action Attacks | Combining multiple benign tools to achieve malicious goals | Action sequence analysis |
| Persistent Manipulation | Agent modifying its own instructions or memory via tools | Immutable system prompts, memory isolation |
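The per-action authorization check listed against the confused-deputy risk can be sketched as follows. This is a minimal illustration with hypothetical `ToolCall`/`UserContext` types: the key idea is that each action is authorized against the *end user's* permissions, not the agent's broader service credentials.

```python
# Sketch of a per-action authorization check (hypothetical types).
# Mitigates the confused-deputy risk: every tool call is checked
# against the requesting user's own permissions.
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    tool: str
    args: dict

@dataclass
class UserContext:
    user_id: str
    permissions: set = field(default_factory=set)

def authorize_action(user: UserContext, call: ToolCall) -> bool:
    """Allow a tool call only if the user holds the permission
    for that specific tool."""
    return f"tool:{call.tool}" in user.permissions

user = UserContext("alice", {"tool:search", "tool:read_file"})
assert authorize_action(user, ToolCall("search", {"q": "agents"}))
assert not authorize_action(user, ToolCall("send_email", {}))
```

Even if an attacker manipulates the agent's reasoning, the check fails closed: actions outside the user's own permission set are refused regardless of what the model requests.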
Securing Agent Architectures
- **Principle of Least Privilege.** Give agents access only to the minimum set of tools and permissions required for their specific task. Use scoped API tokens with short expirations. Never grant admin-level access.
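One way to enforce least privilege is a per-task tool allowlist. A minimal sketch, with hypothetical task names and tools; the important property is that unknown tasks default to *no* tools rather than all of them:

```python
# Per-task tool allowlisting (hypothetical task/tool names).
# Each task profile exposes only the tools it actually needs.
TASK_TOOL_ALLOWLIST = {
    "research": {"web_search", "read_document"},
    "scheduling": {"read_calendar", "create_event"},
}

def tools_for_task(task: str) -> set:
    # Fail closed: an unrecognized task gets no tools at all.
    return TASK_TOOL_ALLOWLIST.get(task, set())

assert "web_search" in tools_for_task("research")
assert tools_for_task("unknown_task") == set()
```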
- **Human-in-the-Loop for High-Risk Actions.** Require explicit human approval for actions that modify data, send communications, make purchases, or access sensitive resources. The agent should present its plan and wait for confirmation.
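An approval gate for high-risk actions can be sketched as below. The tool names and the `approve` callback are illustrative assumptions; in practice the callback would surface the agent's plan in a UI and block until a human responds:

```python
# Sketch of a human-approval gate (hypothetical tool names).
# High-risk actions block on an approval callback; read-only
# actions proceed without interruption.
HIGH_RISK_TOOLS = {"send_email", "delete_record", "make_purchase"}

def execute_with_approval(tool: str, args: dict, approve) -> str:
    """`approve(tool, args)` shows the planned action to a human
    and returns True/False. Only low-risk tools skip the gate."""
    if tool in HIGH_RISK_TOOLS and not approve(tool, args):
        return "rejected"
    return "executed"

# Usage: a reviewer that denies every high-risk request.
deny_all = lambda tool, args: False
assert execute_with_approval("send_email", {}, deny_all) == "rejected"
assert execute_with_approval("web_search", {}, deny_all) == "executed"
```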
- **Tool Input Validation.** Validate all arguments the LLM passes to tools. Do not trust the LLM to construct safe SQL queries, shell commands, or API calls; apply the same input validation you would to untrusted user input.
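Treating model output as untrusted input looks much like classic injection defense. A sketch using the standard library: SQL arguments go through parameterized queries so a model-supplied value can never alter query structure, and a filename argument is validated against a strict pattern (the table schema and pattern here are illustrative assumptions):

```python
# Sketch: LLM-produced tool arguments handled as untrusted input.
import re
import sqlite3

def lookup_user(conn: sqlite3.Connection, username: str):
    # Parameterized query: the value is bound, never interpolated,
    # so injection strings cannot change the SQL structure.
    cur = conn.execute("SELECT id FROM users WHERE name = ?", (username,))
    return cur.fetchone()

SAFE_NAME = re.compile(r"^[A-Za-z0-9_.-]{1,64}$")

def validate_filename(name: str) -> str:
    # Reject path separators and anything outside a strict charset.
    if not SAFE_NAME.fullmatch(name):
        raise ValueError(f"rejected unsafe filename: {name!r}")
    return name

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
assert lookup_user(conn, "alice") == (1,)
assert lookup_user(conn, "alice' OR '1'='1") is None  # injection attempt finds nothing
validate_filename("report.txt")
```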
- **Action Rate Limiting and Budgets.** Set limits on the number of actions an agent can take per session, the cost of those actions, and the rate at which it can invoke tools. This limits the blast radius if the agent is compromised.
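A per-session action budget can be sketched as a small counter object (the specific limits below are hypothetical defaults): every tool invocation is charged against caps on action count and cumulative cost, and the agent halts when either is exhausted.

```python
# Sketch of a per-session action budget (hypothetical limits).
class ActionBudget:
    def __init__(self, max_actions: int = 50, max_cost: float = 5.00):
        self.max_actions = max_actions
        self.max_cost = max_cost
        self.actions = 0
        self.cost = 0.0

    def charge(self, cost: float = 0.0) -> None:
        """Debit one action; refuse it if either cap would be exceeded."""
        if self.actions + 1 > self.max_actions or self.cost + cost > self.max_cost:
            raise RuntimeError("agent action budget exceeded")
        self.actions += 1
        self.cost += cost

budget = ActionBudget(max_actions=2, max_cost=1.00)
budget.charge(0.40)
budget.charge(0.40)
# A third charge would raise RuntimeError, stopping the agent.
```

Checking the budget *before* executing each action means a compromised agent is cut off mid-sequence rather than after the damage is done.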
Multi-Agent Security
```python
# Secure multi-agent communication pattern (sketch; helpers such as
# is_authorized, sanitize_message, and rate_limit_exceeded are
# assumed to exist on the orchestrator).
class SecurityError(Exception):
    pass

class RateLimitError(Exception):
    pass

class SecureAgentOrchestrator:
    def route_message(self, source_agent, target_agent, message):
        # 1. Validate source agent authorization
        if not self.is_authorized(source_agent, target_agent):
            raise SecurityError("Unauthorized agent communication")
        # 2. Apply message size and rate limits before doing any work
        if self.rate_limit_exceeded(source_agent):
            raise RateLimitError("Agent communication rate exceeded")
        # 3. Sanitize inter-agent messages
        sanitized = self.sanitize_message(message)
        # 4. Log all inter-agent communication
        self.audit_log.record(source_agent, target_agent, sanitized)
        # 5. Deliver with isolation: the target treats the message
        #    as external, untrusted input
        return target_agent.receive(sanitized, context="external")
```