Agent Security
When LLMs gain the ability to use tools, browse the web, execute code, and take autonomous actions, the security stakes rise dramatically: a prompt injection can escalate into remote code execution.
The Agent Threat Model
AI agents amplify every LLM vulnerability because they can translate compromised decisions into real-world actions. The combination of natural language understanding with tool access creates a unique class of security risks.
Tool Use Risks
| Risk | Description | Mitigation |
|---|---|---|
| Confused Deputy | Agent performs actions on behalf of attacker using legitimate user's permissions | Per-action authorization checks |
| Privilege Escalation | Agent accesses tools or data beyond intended scope | Least privilege, tool allowlisting |
| Side-channel Exfiltration | Agent uses legitimate tools to exfiltrate data (e.g., sending email with stolen info) | Output monitoring, rate limiting |
| Chain-of-action Attacks | Combining multiple benign tools to achieve malicious goals | Action sequence analysis |
| Persistent Manipulation | Agent modifying its own instructions or memory via tools | Immutable system prompts, memory isolation |
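The per-action authorization check listed against the confused-deputy risk can be sketched as follows. This is a minimal illustration with hypothetical `ToolCall`/`UserContext` types: the key idea is that each action is authorized against the *end user's* permissions, not the agent's broader service credentials.

```python
# Sketch of a per-action authorization check (hypothetical types).
# Mitigates the confused-deputy risk: every tool call is checked
# against the requesting user's own permissions.
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    tool: str
    args: dict

@dataclass
class UserContext:
    user_id: str
    permissions: set = field(default_factory=set)

def authorize_action(user: UserContext, call: ToolCall) -> bool:
    """Allow a tool call only if the user holds the permission
    for that specific tool."""
    return f"tool:{call.tool}" in user.permissions

user = UserContext("alice", {"tool:search", "tool:read_file"})
assert authorize_action(user, ToolCall("search", {"q": "agents"}))
assert not authorize_action(user, ToolCall("send_email", {}))
```

Even if an attacker manipulates the agent's reasoning, the check fails closed: actions outside the user's own permission set are refused regardless of what the model requests.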
Securing Agent Architectures
- **Principle of Least Privilege.** Give agents access only to the minimum set of tools and permissions required for their specific task. Use scoped API tokens with short expirations. Never grant admin-level access.
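One way to enforce least privilege is a per-task tool allowlist. A minimal sketch, with hypothetical task names and tools; the important property is that unknown tasks default to *no* tools rather than all of them:

```python
# Per-task tool allowlisting (hypothetical task/tool names).
# Each task profile exposes only the tools it actually needs.
TASK_TOOL_ALLOWLIST = {
    "research": {"web_search", "read_document"},
    "scheduling": {"read_calendar", "create_event"},
}

def tools_for_task(task: str) -> set:
    # Fail closed: an unrecognized task gets no tools at all.
    return TASK_TOOL_ALLOWLIST.get(task, set())

assert "web_search" in tools_for_task("research")
assert tools_for_task("unknown_task") == set()
```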
- **Human-in-the-Loop for High-Risk Actions.** Require explicit human approval for actions that modify data, send communications, make purchases, or access sensitive resources. The agent should present its plan and wait for confirmation.
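An approval gate for high-risk actions can be sketched as below. The tool names and the `approve` callback are illustrative assumptions; in practice the callback would surface the agent's plan in a UI and block until a human responds:

```python
# Sketch of a human-approval gate (hypothetical tool names).
# High-risk actions block on an approval callback; read-only
# actions proceed without interruption.
HIGH_RISK_TOOLS = {"send_email", "delete_record", "make_purchase"}

def execute_with_approval(tool: str, args: dict, approve) -> str:
    """`approve(tool, args)` shows the planned action to a human
    and returns True/False. Only low-risk tools skip the gate."""
    if tool in HIGH_RISK_TOOLS and not approve(tool, args):
        return "rejected"
    return "executed"

# Usage: a reviewer that denies every high-risk request.
deny_all = lambda tool, args: False
assert execute_with_approval("send_email", {}, deny_all) == "rejected"
assert execute_with_approval("web_search", {}, deny_all) == "executed"
```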
- **Tool Input Validation.** Validate all arguments the LLM passes to tools. Do not trust the LLM to construct safe SQL queries, shell commands, or API calls; apply the same input validation you would to untrusted user input.
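Treating model output as untrusted input looks much like classic injection defense. A sketch using the standard library: SQL arguments go through parameterized queries so a model-supplied value can never alter query structure, and a filename argument is validated against a strict pattern (the table schema and pattern here are illustrative assumptions):

```python
# Sketch: LLM-produced tool arguments handled as untrusted input.
import re
import sqlite3

def lookup_user(conn: sqlite3.Connection, username: str):
    # Parameterized query: the value is bound, never interpolated,
    # so injection strings cannot change the SQL structure.
    cur = conn.execute("SELECT id FROM users WHERE name = ?", (username,))
    return cur.fetchone()

SAFE_NAME = re.compile(r"^[A-Za-z0-9_.-]{1,64}$")

def validate_filename(name: str) -> str:
    # Reject path separators and anything outside a strict charset.
    if not SAFE_NAME.fullmatch(name):
        raise ValueError(f"rejected unsafe filename: {name!r}")
    return name

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
assert lookup_user(conn, "alice") == (1,)
assert lookup_user(conn, "alice' OR '1'='1") is None  # injection attempt finds nothing
validate_filename("report.txt")
```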
- **Action Rate Limiting and Budgets.** Set limits on the number of actions an agent can take per session, the cost of those actions, and the rate at which it can invoke tools. This limits the blast radius if the agent is compromised.
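A per-session action budget can be sketched as a small counter object (the specific limits below are hypothetical defaults): every tool invocation is charged against caps on action count and cumulative cost, and the agent halts when either is exhausted.

```python
# Sketch of a per-session action budget (hypothetical limits).
class ActionBudget:
    def __init__(self, max_actions: int = 50, max_cost: float = 5.00):
        self.max_actions = max_actions
        self.max_cost = max_cost
        self.actions = 0
        self.cost = 0.0

    def charge(self, cost: float = 0.0) -> None:
        """Debit one action; refuse it if either cap would be exceeded."""
        if self.actions + 1 > self.max_actions or self.cost + cost > self.max_cost:
            raise RuntimeError("agent action budget exceeded")
        self.actions += 1
        self.cost += cost

budget = ActionBudget(max_actions=2, max_cost=1.00)
budget.charge(0.40)
budget.charge(0.40)
# A third charge would raise RuntimeError, stopping the agent.
```

Checking the budget *before* executing each action means a compromised agent is cut off mid-sequence rather than after the damage is done.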
Multi-Agent Security
```python
# Secure multi-agent communication pattern (sketch; helpers such as
# is_authorized, sanitize_message, and rate_limit_exceeded are
# assumed to exist on the orchestrator).
class SecurityError(Exception):
    pass

class RateLimitError(Exception):
    pass

class SecureAgentOrchestrator:
    def route_message(self, source_agent, target_agent, message):
        # 1. Validate source agent authorization
        if not self.is_authorized(source_agent, target_agent):
            raise SecurityError("Unauthorized agent communication")
        # 2. Apply message size and rate limits before doing any work
        if self.rate_limit_exceeded(source_agent):
            raise RateLimitError("Agent communication rate exceeded")
        # 3. Sanitize inter-agent messages
        sanitized = self.sanitize_message(message)
        # 4. Log all inter-agent communication
        self.audit_log.record(source_agent, target_agent, sanitized)
        # 5. Deliver with isolation: the target treats the message
        #    as external, untrusted input
        return target_agent.receive(sanitized, context="external")
```