Advanced
Best Practices
Production-ready patterns for Docker deployment, health checks, structured logging, monitoring, testing, and scaling your FastAPI ML service.
Docker Deployment
Dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first (cached layer)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY app/ ./app/
COPY models/ ./models/

EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
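For local development, the image above can be run with Docker Compose. This is a minimal sketch; the service name, port mapping, and health-check command are assumptions based on the Dockerfile above and the /health endpoint defined later in this guide.

```yaml
services:
  api:
    build: .
    ports:
      - "8000:8000"
    healthcheck:
      # Uses the container's own Python so no curl/wget is needed in the slim image
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 5s
      retries: 3
```

Compose marks the container unhealthy after three failed probes, which orchestrators and `docker ps` surface without any extra tooling.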
Health Checks
Python
from fastapi import HTTPException

@app.get("/health")
async def health():
    return {"status": "healthy"}

@app.get("/health/ready")
async def readiness():
    if not ml_models.get("classifier"):
        raise HTTPException(503, "Model not loaded")
    return {"status": "ready", "models": list(ml_models.keys())}
Structured Logging
Python
import logging
import time

from fastapi import Request

@app.middleware("http")
async def log_requests(request: Request, call_next):
    start = time.time()
    response = await call_next(request)
    duration = time.time() - start
    logging.info(
        f"{request.method} {request.url.path} "
        f"status={response.status_code} "
        f"duration={duration:.3f}s"
    )
    return response
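Plain f-string logs work, but log aggregators prefer one JSON object per line. A stdlib-only sketch of a JSON formatter; the field names here are a convention, not anything FastAPI requires.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "timestamp": self.formatTime(record),
        }
        # Carry over request fields attached via logging's `extra=` kwarg
        for key in ("method", "path", "status", "duration"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("api")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```

Inside the middleware you would then call `logger.info("request", extra={"method": request.method, "path": request.url.path, "status": response.status_code, "duration": duration})` instead of formatting the string by hand.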
Testing
Python
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)

def test_health():
    response = client.get("/health")
    assert response.status_code == 200

def test_predict():
    response = client.post(
        "/predict",
        json={"text": "test input"},
        headers={"X-API-Key": "test-key"},
    )
    assert response.status_code == 200
    assert "prediction" in response.json()
Production Checklist
Workers
Run multiple Uvicorn workers (e.g. --workers 4). A common heuristic is roughly 2x CPU cores for I/O-bound services and 1x for CPU-bound ML inference, since extra workers only add memory pressure once cores are saturated.
Monitoring
Export metrics with Prometheus. Track latency, error rates, model inference time, and queue depth.
Security
Enforce HTTPS, restrict CORS to known origins, validate all inputs, rate-limit requests, and never expose internal error details to clients.
Scaling
Use Kubernetes or cloud auto-scaling. Separate model loading from serving for faster cold starts.
Key takeaway: FastAPI is production-ready out of the box, but always add health checks, structured logging, input validation, and proper error handling before deploying ML APIs.
Course Complete!
Congratulations! You can now serve ML models as production-grade APIs with FastAPI, including streaming, authentication, and Docker deployment.
Lilly Tech Systems