Beginner
Project Setup
Architecture overview, FastAPI and Redis setup, and project scaffolding.
What We Are Building
A centralized gateway that sits between your applications and LLM providers:
- Route requests to multiple LLM providers with fallback
- Enforce per-user and per-team rate limits
- Cache semantically similar requests to reduce costs
- Track costs and enforce department budgets
- Provide an admin dashboard for monitoring
Architecture
Client Request
|
v
+-------------------+ +-------------------+
| API Gateway | --> | Rate Limiter |
| (FastAPI) | | (Redis) |
+-------------------+ +-------------------+
| |
v v
+-------------------+ +-------------------+
| Semantic Cache | | Provider Router |
| (Embeddings) | | (OpenAI/Anthropic)|
+-------------------+ +-------------------+
| |
v v
+-------------------+ +-------------------+
| Cost Tracker | | Admin Dashboard |
| (PostgreSQL) | | (FastAPI + HTML) |
+-------------------+ +-------------------+
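The Semantic Cache box matches a new prompt against cached embeddings by cosine similarity. A minimal sketch of that comparison with numpy — the function names, the in-memory cache list, and the 0.95 threshold (mirroring the config below) are illustrative, not the lesson's actual implementation:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cache_lookup(query_emb: np.ndarray, cache: list, threshold: float = 0.95):
    """Return the cached response whose embedding is most similar to the
    query, but only if it clears the similarity threshold; else None."""
    best, best_score = None, threshold
    for emb, response in cache:
        score = cosine_similarity(query_emb, emb)
        if score >= best_score:
            best, best_score = response, score
    return best
```

A real deployment would store the embeddings in Redis (per the config below) rather than a Python list, but the matching logic is the same.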
Tech Stack
FastAPI
High-performance async API framework. Handles proxying requests to LLM providers.
Redis
Rate limiting with sliding windows and semantic cache storage.
PostgreSQL
Cost tracking, usage logs, and budget management.
OpenAI + Anthropic SDKs
Official SDKs for routing requests to multiple LLM providers.
Step 1: Create the Project
mkdir ai-gateway && cd ai-gateway
python -m venv venv && source venv/bin/activate
pip install fastapi uvicorn redis openai anthropic numpy psycopg2-binary sqlalchemy
Step 2: Project Structure
ai-gateway/
  src/
    main.py         # FastAPI app entry point
    router.py       # Provider routing (Lesson 2)
    limiter.py      # Rate limiting (Lesson 3)
    cache.py        # Semantic caching (Lesson 4)
    costs.py        # Cost tracking (Lesson 5)
    dashboard.py    # Admin dashboard (Lesson 6)
    models.py       # Pydantic models
  config.yaml
  requirements.txt
Step 3: Configuration
# config.yaml
providers:
  openai:
    api_key: ${OPENAI_API_KEY}
    models: [gpt-4o, gpt-4o-mini]
    priority: 1
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
    models: [claude-sonnet-4-20250514, claude-3-5-haiku-20241022]
    priority: 2

rate_limits:
  default:
    requests_per_minute: 60
    tokens_per_minute: 100000

cache:
  enabled: true
  similarity_threshold: 0.95
  ttl_seconds: 3600

redis:
  url: redis://localhost:6379

database:
  url: postgresql://user:pass@localhost:5432/gateway
Step 4: Base Server
# src/main.py
from contextlib import asynccontextmanager

import redis.asyncio as redis
from fastapi import FastAPI, Request


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Open one shared Redis connection for the app's lifetime
    app.state.redis = redis.from_url("redis://localhost:6379")
    yield
    await app.state.redis.aclose()  # close() is deprecated in redis-py 5+


app = FastAPI(title="AI API Gateway", lifespan=lifespan)


@app.get("/health")
async def health():
    return {"status": "ok"}


@app.post("/v1/chat/completions")
async def chat_completions(request: Request):
    # Placeholder: later lessons add routing, limits, caching, and cost tracking
    body = await request.json()
    return {"message": "Gateway ready", "model": body.get("model")}


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8080)
Next: We will build multi-provider routing with automatic fallback chains.
Lilly Tech Systems