Agent Security

When LLMs gain the ability to use tools, browse the web, execute code, and take autonomous actions, the security stakes increase dramatically: a successful prompt injection can become remote code execution.

The Agent Threat Model

AI agents amplify every LLM vulnerability because they can translate compromised decisions into real-world actions. The combination of natural language understanding with tool access creates a unique class of security risks.

Excessive Agency (OWASP LLM08): Granting LLMs more permissions, tools, or autonomy than necessary is one of the most dangerous mistakes in AI application design. Every tool is an attack surface. Every permission is a potential escalation path.

Tool Use Risks

| Risk | Description | Mitigation |
|------|-------------|------------|
| Confused Deputy | Agent performs actions on behalf of an attacker using the legitimate user's permissions | Per-action authorization checks |
| Privilege Escalation | Agent accesses tools or data beyond its intended scope | Least privilege, tool allowlisting |
| Side-channel Exfiltration | Agent uses legitimate tools to exfiltrate data (e.g., sending email containing stolen info) | Output monitoring, rate limiting |
| Chain-of-action Attacks | Combining multiple benign tools to achieve a malicious goal | Action sequence analysis |
| Persistent Manipulation | Agent modifies its own instructions or memory via tools | Immutable system prompts, memory isolation |

Securing Agent Architectures

  1. Principle of Least Privilege

    Give agents access only to the minimum set of tools and permissions required for their specific task. Use short-lived, narrowly scoped API tokens, and never grant admin-level access. A tool-allowlist sketch follows this list.

  2. Human-in-the-Loop for High-Risk Actions

    Require explicit human approval for actions that modify data, send communications, make purchases, or access sensitive resources. The agent should present its plan and wait for confirmation, as in the approval-gate sketch after this list.

  3. Tool Input Validation

    Validate all arguments the LLM passes to tools. Do not trust the LLM to construct safe SQL queries, shell commands, or API calls; apply the same input validation you would apply to any untrusted user input. A parameterized-query sketch follows this list.

  4. Action Rate Limiting and Budgets

    Set limits on the number of actions an agent can take per session, the cumulative cost of those actions, and the rate at which it can invoke tools. This bounds the blast radius if the agent is compromised, as in the budget sketch below.
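
As a concrete illustration of least privilege, the sketch below gates tool lookup behind an explicit per-task allowlist. The task names, tool names, and the TASK_TOOL_ALLOWLIST mapping are all hypothetical, invented for this example:

# Minimal sketch: per-task tool allowlisting (all names are illustrative)
TASK_TOOL_ALLOWLIST = {
    "summarize_tickets": {"read_ticket", "search_tickets"},
    "draft_reply": {"read_ticket", "draft_email"},  # deliberately no send_email
}

def resolve_tool(task_name, tool_name, tool_registry):
    """Return a tool implementation only if it is allowlisted for this task."""
    allowed = TASK_TOOL_ALLOWLIST.get(task_name, set())
    if tool_name not in allowed:
        raise PermissionError(
            f"Tool {tool_name!r} is not allowlisted for task {task_name!r}"
        )
    return tool_registry[tool_name]

Denying by default (an unknown task gets an empty set) means a new task has no tools at all until someone consciously grants them.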
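
One way to implement the approval gate from point 2 is to classify proposed actions by risk and pause on the high-risk ones. The HIGH_RISK_ACTIONS set and the request_human_approval callback are assumed interfaces for this sketch, not part of any specific framework:

# Minimal sketch: human approval gate for high-risk agent actions
HIGH_RISK_ACTIONS = {"send_email", "delete_record", "make_purchase"}

def execute_with_approval(action, args, execute, request_human_approval):
    """Run a tool action, pausing for human confirmation when it is high risk."""
    if action in HIGH_RISK_ACTIONS:
        # Show the human exactly what the agent intends to do, then wait
        approved = request_human_approval(
            f"Agent requests {action} with arguments {args!r}. Approve?"
        )
        if not approved:
            return {"status": "rejected", "action": action}
    return execute(action, args)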
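
For tool input validation (point 3), the key move is to bind LLM-supplied values as data rather than splicing them into query strings. This sketch uses Python's built-in sqlite3 module with a parameterized query; the tickets table and id format are invented for the example:

# Minimal sketch: validating an LLM-supplied tool argument (schema is illustrative)
import re
import sqlite3

TICKET_ID_PATTERN = re.compile(r"^TICKET-\d{1,8}$")

def lookup_ticket(conn: sqlite3.Connection, ticket_id: str):
    """Fetch a ticket row, treating the LLM-supplied id as untrusted input."""
    if not isinstance(ticket_id, str) or not TICKET_ID_PATTERN.match(ticket_id):
        raise ValueError(f"Rejected malformed ticket id: {ticket_id!r}")
    # Parameterized query: the id is bound as data, never interpolated into SQL
    return conn.execute(
        "SELECT id, status, summary FROM tickets WHERE id = ?", (ticket_id,)
    ).fetchone()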
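
An action budget (point 4) can be as simple as a counter and a cost ceiling checked on every tool call. The specific limits below are arbitrary placeholders chosen to show the shape of the check:

# Minimal sketch: per-session action and cost budgets (limits are placeholders)
class ActionBudget:
    def __init__(self, max_actions=50, max_cost_usd=1.00):
        self.max_actions = max_actions
        self.max_cost_usd = max_cost_usd
        self.actions_taken = 0
        self.cost_spent_usd = 0.0

    def charge(self, action_cost_usd=0.0):
        """Record one tool invocation; raise once the session budget is spent."""
        self.actions_taken += 1
        self.cost_spent_usd += action_cost_usd
        if self.actions_taken > self.max_actions:
            raise RuntimeError("Action budget exhausted for this session")
        if self.cost_spent_usd > self.max_cost_usd:
            raise RuntimeError("Cost budget exhausted for this session")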

Multi-Agent Security

When agents delegate work to other agents, every inter-agent message is a potential injection path: a compromised agent can try to manipulate its peers. Routing all traffic through an orchestrator lets you enforce authorization, sanitization, auditing, and rate limits in one place:

# Secure multi-agent communication pattern
class SecurityError(Exception):
    """Raised on unauthorized inter-agent communication."""

class RateLimitError(Exception):
    """Raised when an agent exceeds its messaging rate limit."""

class SecureAgentOrchestrator:
    def route_message(self, source_agent, target_agent, message):
        # 1. Validate source agent authorization
        if not self.is_authorized(source_agent, target_agent):
            raise SecurityError("Unauthorized agent communication")

        # 2. Apply message size and rate limits before doing further work
        if self.rate_limit_exceeded(source_agent):
            raise RateLimitError("Agent communication rate exceeded")

        # 3. Sanitize inter-agent messages before they reach the target
        sanitized = self.sanitize_message(message)

        # 4. Log all inter-agent communication for audit and forensics
        self.audit_log.record(source_agent, target_agent, sanitized)

        # 5. Deliver with isolation: the receiver treats the payload as
        #    untrusted external input, never as its own instructions
        return target_agent.receive(sanitized, context="external")
💡 Looking Ahead: In the next lesson, we will cover monitoring: how to build observability and security alerting systems that detect threats and anomalies in LLM applications in real time.