Beginner
Project Setup
Architecture overview, FastAPI and Redis setup, and project scaffolding.
What We Are Building
A centralized gateway that sits between your applications and LLM providers:
- Route requests to multiple LLM providers with fallback
- Enforce per-user and per-team rate limits
- Cache semantically similar requests to reduce costs
- Track costs and enforce department budgets
- Provide an admin dashboard for monitoring
Architecture
Client Request
|
v
+-------------------+ +-------------------+
| API Gateway | --> | Rate Limiter |
| (FastAPI) | | (Redis) |
+-------------------+ +-------------------+
| |
v v
+-------------------+ +-------------------+
| Semantic Cache | | Provider Router |
| (Embeddings) | | (OpenAI/Anthropic)|
+-------------------+ +-------------------+
| |
v v
+-------------------+ +-------------------+
| Cost Tracker | | Admin Dashboard |
| (PostgreSQL) | | (FastAPI + HTML) |
+-------------------+ +-------------------+
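The Semantic Cache box matches a new prompt against cached embeddings by cosine similarity. A minimal sketch of that comparison with numpy — the function names, the in-memory cache list, and the 0.95 threshold (mirroring the config below) are illustrative, not the lesson's actual implementation:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cache_lookup(query_emb: np.ndarray, cache: list, threshold: float = 0.95):
    """Return the cached response whose embedding is most similar to the
    query, but only if it clears the similarity threshold; else None."""
    best, best_score = None, threshold
    for emb, response in cache:
        score = cosine_similarity(query_emb, emb)
        if score >= best_score:
            best, best_score = response, score
    return best
```

A real deployment would store the embeddings in Redis (per the config below) rather than a Python list, but the matching logic is the same.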
Tech Stack
FastAPI
High-performance async API framework. Handles proxying requests to LLM providers.
Redis
Rate limiting with sliding windows and semantic cache storage.
PostgreSQL
Cost tracking, usage logs, and budget management.
OpenAI + Anthropic SDKs
Official SDKs for routing requests to multiple LLM providers.
Step 1: Create the Project
mkdir ai-gateway && cd ai-gateway
python -m venv venv && source venv/bin/activate
pip install fastapi uvicorn redis openai anthropic numpy psycopg2-binary sqlalchemy
Step 2: Project Structure
ai-gateway/
  src/
    main.py         # FastAPI app entry point
    router.py       # Provider routing (Lesson 2)
    limiter.py      # Rate limiting (Lesson 3)
    cache.py        # Semantic caching (Lesson 4)
    costs.py        # Cost tracking (Lesson 5)
    dashboard.py    # Admin dashboard (Lesson 6)
    models.py       # Pydantic models
  config.yaml
  requirements.txt
Step 3: Configuration
# config.yaml
providers:
  openai:
    api_key: ${OPENAI_API_KEY}
    models: [gpt-4o, gpt-4o-mini]
    priority: 1
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
    models: [claude-sonnet-4-20250514, claude-3-5-haiku-20241022]
    priority: 2

rate_limits:
  default:
    requests_per_minute: 60
    tokens_per_minute: 100000

cache:
  enabled: true
  similarity_threshold: 0.95
  ttl_seconds: 3600

redis:
  url: redis://localhost:6379

database:
  url: postgresql://user:pass@localhost:5432/gateway
Step 4: Base Server
# src/main.py
from contextlib import asynccontextmanager

import redis.asyncio as redis
from fastapi import FastAPI, Request


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Open one shared Redis connection for the app's lifetime
    app.state.redis = redis.from_url("redis://localhost:6379")
    yield
    await app.state.redis.aclose()  # close() is deprecated in redis-py 5+


app = FastAPI(title="AI API Gateway", lifespan=lifespan)


@app.get("/health")
async def health():
    return {"status": "ok"}


@app.post("/v1/chat/completions")
async def chat_completions(request: Request):
    # Placeholder: later lessons add routing, limits, caching, and cost tracking
    body = await request.json()
    return {"message": "Gateway ready", "model": body.get("model")}


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8080)
Next: We will build multi-provider routing with automatic fallback chains.
Lilly Tech Systems