Best Practices

Production-ready patterns for Docker deployment, health checks, structured logging, monitoring, testing, and scaling your FastAPI ML service.

Docker Deployment

Dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first (cached layer)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY app/ ./app/
COPY models/ ./models/

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
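
A .dockerignore keeps caches, virtualenvs, and test artifacts out of the build context, which speeds up builds and keeps the image small. A minimal sketch (adjust the entries to your repository layout):

```
__pycache__/
*.pyc
.venv/
.git/
tests/
*.ipynb
```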

Health Checks

Python
from fastapi import HTTPException

@app.get("/health")
async def health():
    # Liveness: the process is up and responding
    return {"status": "healthy"}

@app.get("/health/ready")
async def readiness():
    # Readiness: refuse traffic until the model is actually loaded
    if not ml_models.get("classifier"):
        raise HTTPException(status_code=503, detail="Model not loaded")
    return {"status": "ready", "models": list(ml_models.keys())}
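
These two endpoints map directly onto Kubernetes liveness and readiness probes. A sketch of the relevant container spec, assuming the service listens on port 8000 as in the Dockerfile above (the timing values are illustrative, not tuned):

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8000
  initialDelaySeconds: 5   # give the model time to load before accepting traffic
  periodSeconds: 10
```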

Structured Logging

Python
import logging
import time

from fastapi import Request

# The root logger defaults to WARNING; without this, the INFO lines below are dropped.
logging.basicConfig(level=logging.INFO)

@app.middleware("http")
async def log_requests(request: Request, call_next):
    start = time.time()
    response = await call_next(request)
    duration = time.time() - start
    logging.info(
        f"{request.method} {request.url.path} "
        f"status={response.status_code} "
        f"duration={duration:.3f}s"
    )
    return response
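
The f-string above emits key=value pairs; for logs that aggregators can parse without regexes, one option is a JSON formatter built on the standard logging module. A minimal sketch (the field names method/path/status/duration are assumptions matching the middleware above, not FastAPI conventions):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON line."""

    # Request fields we expect to arrive via `extra={...}`
    EXTRA_FIELDS = ("method", "path", "status", "duration")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy over any of the expected extra fields that were supplied
        for key in self.EXTRA_FIELDS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("api")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Inside the middleware you would then log:
logger.info("request", extra={"method": "GET", "path": "/health",
                              "status": 200, "duration": 0.012})
```

Logging with `extra` instead of interpolating into the message keeps each field machine-readable.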

Testing

Python
from fastapi.testclient import TestClient
from app.main import app

client = TestClient(app)

def test_health():
    response = client.get("/health")
    assert response.status_code == 200

def test_predict():
    response = client.post(
        "/predict",
        json={"text": "test input"},
        headers={"X-API-Key": "test-key"}
    )
    assert response.status_code == 200
    assert "prediction" in response.json()

Production Checklist

🔨 Workers

Run multiple Uvicorn workers (--workers 4). A common starting point is 2x CPU cores for I/O-bound services and 1x CPU cores for CPU-bound ML inference.

📊 Monitoring

Export metrics with Prometheus. Track request latency, error rates, model inference time, and queue depth.

🔒 Security

Enforce HTTPS, configure CORS restrictively, validate all inputs, apply rate limiting, and never expose internal error details to clients.

🚀 Scaling

Use Kubernetes or cloud auto-scaling. Separate model loading from serving for faster cold starts.
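
The worker-count rule of thumb above can be expressed as a small helper (a sketch; `recommended_workers` is a hypothetical name, and real worker counts should be confirmed by load testing):

```python
import os

def recommended_workers(io_bound: bool = True) -> int:
    """Rule of thumb from the checklist: 2x CPU cores for I/O-bound
    services, 1x CPU cores for CPU-bound ML inference."""
    cores = os.cpu_count() or 1  # cpu_count() can return None
    return 2 * cores if io_bound else cores

print(recommended_workers(io_bound=False))  # pass the result to uvicorn --workers
```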

Key takeaway: FastAPI is production-ready out of the box, but always add health checks, structured logging, input validation, and proper error handling before deploying ML APIs.

Course Complete!

Congratulations! You can now serve ML models as production-grade APIs with FastAPI, including streaming, authentication, and Docker deployment.