Advanced

Building a Complete Agent

This tutorial walks through building a fully functional AI agent from scratch. It covers requirements, LLM selection, tool design, memory, testing, deployment, safety guardrails, and monitoring.

Step 1: Requirements Gathering

Before writing code, define what your agent needs to do:

  • Goal: What task(s) should the agent accomplish?
  • Scope: What are the boundaries? What should it NOT do?
  • Tools needed: What external systems must it interact with?
  • User interaction: Fully autonomous or human-in-the-loop?
  • Quality bar: How good does it need to be? What is the cost of errors?
  • Volume: How many tasks per day? Concurrent users?
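
One way to make these answers concrete is to record them in a small structure that the rest of the code can read. The sketch below is illustrative only: the `AgentRequirements` name, its fields, and the default values are all assumptions chosen for this example, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRequirements:
    """Hypothetical container for the answers to the questions above."""
    goal: str
    out_of_scope: tuple[str, ...] = ()   # things the agent must NOT do
    tools: tuple[str, ...] = ()          # external systems it may call
    human_in_the_loop: bool = True       # False means fully autonomous
    min_success_rate: float = 0.9        # quality bar, as a fraction
    tasks_per_day: int = 100
    max_concurrent_users: int = 1

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the spec is usable."""
        problems = []
        if not self.goal.strip():
            problems.append("goal must not be empty")
        if not 0.0 < self.min_success_rate <= 1.0:
            problems.append("min_success_rate must be in (0, 1]")
        if self.tasks_per_day <= 0:
            problems.append("tasks_per_day must be positive")
        if self.max_concurrent_users <= 0:
            problems.append("max_concurrent_users must be positive")
        return problems


# Example values only; adjust to your own project.
research_spec = AgentRequirements(
    goal="Answer research questions with cited sources",
    out_of_scope=("sending email", "making purchases"),
    tools=("web_search", "read_url", "save_report"),
    human_in_the_loop=False,
)
print(research_spec.validate())
```

Writing the spec down this way makes scope decisions reviewable and gives later steps (tool definitions, guardrails, monitoring thresholds) a single place to read from.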

Step 2: Choosing the LLM

Requirement                         Recommended Model
Complex reasoning, high accuracy    Claude Opus 4, o3
Good balance, most use cases        Claude Sonnet 4, GPT-4o
High volume, cost-sensitive         GPT-4o mini, Gemini Flash
Privacy-critical, self-hosted       Llama 3.3 70B, Qwen 2.5 72B
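
The table can also be expressed as a small lookup, which is handy when the choice is driven by configuration. This is a minimal sketch: the model names are copied from the table as display names (not verified API identifiers), and the order in which the requirements are checked is an assumption, not a rule from this guide.

```python
# Display names copied from the table above; map them to real
# API model identifiers yourself before using them in requests.
MODEL_RECOMMENDATIONS = {
    "complex_reasoning": ["Claude Opus 4", "o3"],
    "balanced": ["Claude Sonnet 4", "GPT-4o"],
    "high_volume": ["GPT-4o mini", "Gemini Flash"],
    "self_hosted": ["Llama 3.3 70B", "Qwen 2.5 72B"],
}


def recommend_models(*, privacy_critical=False,
                     needs_complex_reasoning=False,
                     cost_sensitive=False):
    """Pick one table row. The precedence below is an assumption:
    privacy constraints win, then accuracy needs, then cost."""
    if privacy_critical:
        key = "self_hosted"
    elif needs_complex_reasoning:
        key = "complex_reasoning"
    elif cost_sensitive:
        key = "high_volume"
    else:
        key = "balanced"
    return MODEL_RECOMMENDATIONS[key]


print(recommend_models(cost_sensitive=True))
```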

Step 3: Complete Agent Implementation

Python - Complete Research Agent
import anthropic
import json
from datetime import datetime

class ResearchAgent:
    def __init__(self):
        self.client = anthropic.Anthropic()
        self.model = "claude-sonnet-4-20250514"
        self.max_steps = 15
        self.tools = self._define_tools()
        self.system_prompt = """You are a research agent.
Given a research question, use your tools to find
information, analyze it, and produce a report.

Always cite your sources. If you cannot find reliable
information, say so rather than speculating."""

    def _define_tools(self):
        return [
            {
                "name": "web_search",
                "description": "Search the web for information",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string"}
                    },
                    "required": ["query"]
                }
            },
            {
                "name": "read_url",
                "description": "Read content from a URL",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "url": {"type": "string"}
                    },
                    "required": ["url"]
                }
            },
            {
                "name": "save_report",
                "description": "Save the final research report",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "title": {"type": "string"},
                        "content": {"type": "string"}
                    },
                    "required": ["title", "content"]
                }
            }
        ]

    def run(self, question):
        """Execute the research agent loop."""
        messages = [{
            "role": "user",
            "content": f"Research this: {question}"
        }]

        for step in range(self.max_steps):
            response = self.client.messages.create(
                model=self.model,
                max_tokens=4096,
                system=self.system_prompt,
                tools=self.tools,
                messages=messages
            )

            # Add assistant response to history
            messages.append({
                "role": "assistant",
                "content": response.content
            })

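            # Stop unless the model asked for a tool. This also
            # covers max_tokens and other stop reasons, which
            # would otherwise send an empty tool-result message.
            if response.stop_reason != "tool_use":
                return self._extract_text(response)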

            # Execute each tool call
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = self._execute(block)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })

            messages.append({
                "role": "user",
                "content": tool_results
            })

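        return "Max steps reached"

    def _extract_text(self, response):
        """Join the text blocks of a response into one string."""
        return "\n".join(
            block.text for block in response.content
            if block.type == "text"
        )

    def _execute(self, block):
        """Dispatch a tool call and return its result as a string.

        The handlers below are placeholders for illustration;
        replace them with real implementations.
        """
        handlers = {
            "web_search": self._web_search,
            "read_url": self._read_url,
            "save_report": self._save_report,
        }
        handler = handlers.get(block.name)
        if handler is None:
            return f"Error: unknown tool '{block.name}'"
        try:
            return handler(**block.input)
        except Exception as exc:
            # Report the failure to the model so it can recover
            return f"Error running {block.name}: {exc}"

    def _web_search(self, query):
        # Placeholder: call your search provider here
        return json.dumps({"query": query, "results": []})

    def _read_url(self, url):
        # Placeholder: fetch and clean the page here
        return f"(no content fetched for {url})"

    def _save_report(self, title, content):
        # Placeholder persistence: write the report to a local file
        filename = f"report_{datetime.now():%Y%m%d_%H%M%S}.md"
        with open(filename, "w", encoding="utf-8") as f:
            f.write(f"# {title}\n\n{content}\n")
        return f"Saved report to {filename}"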

# Usage
agent = ResearchAgent()
report = agent.run("Latest advances in AI agents")
print(report)

Step 4: Testing Agents

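Agent testing is harder than traditional software testing because behavior is non-deterministic: the same input can lead to different tool calls and different wording from run to run. Combine several approaches: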

  • Unit test tools: Test each tool function independently with known inputs and expected outputs
  • Scenario testing: Run the agent against a set of predefined scenarios and check outcomes
  • Regression testing: Keep a suite of tasks the agent should handle and verify after changes
  • Adversarial testing: Try edge cases, ambiguous inputs, and adversarial prompts
  • Cost monitoring: Track token usage per task to catch cost regressions
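
Scenario and regression tests become deterministic if the LLM is replaced by a scripted fake. The sketch below assumes a condensed variant of the Step 3 loop with the client and the tool executor passed in as arguments; `ScriptedMessages`, `tool_use`, `final_text`, and `run_loop` are hypothetical helpers written for this example, and the fake only imitates the few response attributes the loop reads.

```python
from types import SimpleNamespace


class ScriptedMessages:
    """Fake `messages` API that replays canned responses in order."""

    def __init__(self, responses):
        self._responses = list(responses)
        self.calls = 0

    def create(self, **kwargs):
        response = self._responses[self.calls]
        self.calls += 1
        return response


def tool_use(tool_id, name, tool_input):
    """Build a fake response in which the model requests one tool call."""
    block = SimpleNamespace(type="tool_use", id=tool_id,
                            name=name, input=tool_input)
    return SimpleNamespace(stop_reason="tool_use", content=[block])


def final_text(text):
    """Build a fake response that ends the run with plain text."""
    block = SimpleNamespace(type="text", text=text)
    return SimpleNamespace(stop_reason="end_turn", content=[block])


def run_loop(messages_api, execute, question, max_steps=15):
    """Condensed agent loop with injectable dependencies."""
    messages = [{"role": "user", "content": f"Research this: {question}"}]
    for _ in range(max_steps):
        response = messages_api.create(messages=messages)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return "\n".join(b.text for b in response.content
                             if b.type == "text")
        results = [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": execute(b)}
            for b in response.content if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    return "Max steps reached"


def test_search_then_answer():
    executed = []

    def fake_execute(block):
        executed.append((block.name, block.input))
        return "stub search result"

    api = ScriptedMessages([
        tool_use("t1", "web_search", {"query": "AI agents"}),
        final_text("Report: agents are improving."),
    ])
    result = run_loop(api, fake_execute, "AI agents")
    assert result == "Report: agents are improving."
    assert executed == [("web_search", {"query": "AI agents"})]
    assert api.calls == 2


def test_step_limit_is_enforced():
    # The model keeps asking for tools; the loop must give up.
    api = ScriptedMessages(
        [tool_use(f"t{i}", "web_search", {"query": "x"}) for i in range(3)]
    )
    assert run_loop(api, lambda b: "r", "x", max_steps=3) == "Max steps reached"
    assert api.calls == 3


if __name__ == "__main__":
    test_search_then_answer()
    test_step_limit_is_enforced()
    print("all scenario tests passed")
```

Scripted tests like these check the control flow (tool dispatch, termination, step limits) cheaply. They do not replace runs against the real model, which are still needed to judge output quality.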

Step 5: Deployment

  • API server: Wrap the agent in a REST API (FastAPI, Flask) for integration
  • Queue-based: For async tasks, use a job queue (Redis, SQS) to process requests
  • Containerized: Docker containers for consistent deployment
  • Serverless: AWS Lambda or Google Cloud Functions for event-driven agents
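
The queue-based pattern can be sketched with the standard library alone. This is an in-process illustration under stated assumptions: `queue.Queue` stands in for an external broker such as Redis or SQS, and `run_agent` is a hypothetical stand-in for `ResearchAgent().run(question)`.

```python
import queue
import threading


def run_agent(question):
    """Hypothetical stand-in for a real agent call."""
    return f"report for: {question}"


def worker(jobs, results, lock):
    """Process jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        job_id, question = job
        try:
            outcome = {"status": "done", "report": run_agent(question)}
        except Exception as exc:
            # Record the failure instead of crashing the worker
            outcome = {"status": "failed", "error": str(exc)}
        with lock:
            results[job_id] = outcome
        jobs.task_done()


def process_all(questions, num_workers=2):
    """Fan questions out to worker threads and collect the results."""
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(jobs, results, lock))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for job_id, question in enumerate(questions):
        jobs.put((job_id, question))
    for _ in threads:
        jobs.put(None)  # one shutdown sentinel per worker
    jobs.join()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(process_all(["agents", "tool use", "memory"]))
```

With a real broker the producer and the workers run in separate processes, so the API server can return a job ID immediately and let clients poll for the result.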

Step 6: Safety Guardrails

Essential safety measures for production agents:

  • Action whitelists: Explicitly define allowed actions. Deny by default.
  • Rate limits: Cap the number of actions per minute and per task
  • Budget limits: Set maximum token/cost budget per task
  • Sandboxing: Execute code in sandboxed environments (Docker, gVisor)
  • Human approval: Require human confirmation for high-risk actions (delete, send, purchase)
  • Audit logging: Log every action for review and debugging
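
Several of these measures can be enforced at a single checkpoint that every tool call passes through. The sketch below is one possible shape, not a standard API: the `Guardrails` class, its parameters, and the default limits are assumptions for illustration. It covers the allow-by-list check, rate and budget limits, human approval, and audit logging; sandboxing is out of scope here.

```python
import time


class GuardrailViolation(Exception):
    """Raised when an action is blocked."""


class Guardrails:
    def __init__(self, allowed_actions, high_risk_actions=(),
                 max_actions_per_minute=30, max_tokens_per_task=200_000,
                 approve=None, clock=time.monotonic):
        self.allowed_actions = set(allowed_actions)
        self.high_risk_actions = set(high_risk_actions)
        self.max_actions_per_minute = max_actions_per_minute
        self.max_tokens_per_task = max_tokens_per_task
        # approve(action, args) -> bool; by default nothing is approved
        self.approve = approve or (lambda action, args: False)
        self.clock = clock
        self.tokens_used = 0
        self.audit_log = []
        self._timestamps = []

    def record_tokens(self, count):
        """Add to the task's token usage and enforce the budget."""
        self.tokens_used += count
        if self.tokens_used > self.max_tokens_per_task:
            raise GuardrailViolation("token budget exceeded")

    def check(self, action, args):
        """Raise GuardrailViolation unless the action may run now."""
        now = self.clock()
        try:
            # Deny by default: only listed actions may run
            if action not in self.allowed_actions:
                raise GuardrailViolation(f"action not allowed: {action}")
            # Sliding one-minute window for the rate limit
            self._timestamps = [t for t in self._timestamps if now - t < 60]
            if len(self._timestamps) >= self.max_actions_per_minute:
                raise GuardrailViolation("rate limit exceeded")
            if action in self.high_risk_actions and not self.approve(action, args):
                raise GuardrailViolation(f"approval denied: {action}")
        except GuardrailViolation as exc:
            self._log(now, action, False, str(exc))
            raise
        self._timestamps.append(now)
        self._log(now, action, True, "")

    def _log(self, now, action, allowed, reason):
        self.audit_log.append(
            {"time": now, "action": action, "allowed": allowed, "reason": reason}
        )


if __name__ == "__main__":
    guard = Guardrails({"web_search", "send_email"},
                       high_risk_actions={"send_email"})
    guard.check("web_search", {"query": "AI agents"})
    try:
        guard.check("send_email", {"to": "someone@example.com"})
    except GuardrailViolation as exc:
        print("blocked:", exc)
```

In the Step 3 agent, a call such as `guard.check(block.name, block.input)` would go immediately before the tool is executed, with the violation message returned to the model as the tool result.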

Step 7: Monitoring

In production, monitor these metrics:

  • Task completion rate: Percentage of tasks successfully completed
  • Average steps per task: Efficiency metric; fewer steps usually means lower cost and latency, as long as the completion rate holds
  • Cost per task: Token usage and API costs
  • Error rate: Tool failures, LLM errors, timeout rates
  • Latency: Time from request to completion
  • User satisfaction: Ratings, feedback, escalation rates
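
Most of these metrics can be derived from one record per finished task. The sketch below is a minimal aggregation under stated assumptions: the `TaskRecord` fields and the `summarize` function are hypothetical, the token prices in the usage example are made-up numbers, and the 95th percentile uses the simple nearest-rank method. User satisfaction is left out because it comes from feedback data rather than task logs.

```python
import math
from dataclasses import dataclass


@dataclass
class TaskRecord:
    """Hypothetical per-task log entry."""
    succeeded: bool
    steps: int
    input_tokens: int
    output_tokens: int
    latency_s: float
    tool_errors: int = 0


def summarize(records, usd_per_input_token, usd_per_output_token):
    """Aggregate task records into the metrics listed above."""
    if not records:
        return {}
    n = len(records)
    latencies = sorted(r.latency_s for r in records)
    p95_index = max(0, math.ceil(0.95 * n) - 1)  # nearest-rank percentile
    total_cost = sum(
        r.input_tokens * usd_per_input_token
        + r.output_tokens * usd_per_output_token
        for r in records
    )
    return {
        "completion_rate": sum(r.succeeded for r in records) / n,
        "avg_steps": sum(r.steps for r in records) / n,
        "cost_per_task_usd": total_cost / n,
        "tool_errors_per_task": sum(r.tool_errors for r in records) / n,
        "p95_latency_s": latencies[p95_index],
    }


if __name__ == "__main__":
    records = [
        TaskRecord(True, 4, 1000, 200, 10.0),
        TaskRecord(True, 6, 2000, 400, 20.0),
        TaskRecord(False, 15, 3000, 600, 60.0, tool_errors=3),
        TaskRecord(True, 3, 2000, 800, 30.0),
    ]
    # Prices are placeholders, not real rates for any provider.
    for name, value in summarize(records, 3e-6, 15e-6).items():
        print(f"{name}: {value}")
```

Tracking these per release makes regressions visible: a rise in average steps or cost per task after a prompt change is often the first sign that the agent has started looping or over-using a tool.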