Intermediate

Service Design for AI Systems

Learn how to decompose AI systems into well-bounded microservices with clear API contracts, proper data ownership, and manageable dependency graphs.

Service Boundary Identification

Defining the right service boundaries is the most critical design decision in a microservice architecture. For AI systems, boundaries typically align with these dimensions:

| Boundary Type | Rationale | Example |
| --- | --- | --- |
| Model Domain | Each model type has a different lifecycle | Recommendation service, fraud detection service |
| Data Domain | Data ownership and access patterns differ | Customer features service, transaction data service |
| Compute Profile | Different hardware and scaling needs | GPU inference service, CPU preprocessing service |
| Team Ownership | Conway's Law alignment | Search team's ranking service, risk team's scoring service |

API Contract Design

  1. Choose the Right Protocol

    Use gRPC for internal service-to-service communication where low latency matters. Use REST with OpenAPI for external-facing APIs and developer portals.

  2. Design for Evolution

    Use schema versioning with backward compatibility. Add new fields as optional, never remove or rename existing fields without a deprecation cycle.

  3. Define Clear Request/Response Schemas

    Specify input feature schemas, output prediction formats, confidence scores, and metadata. Use protocol buffers or JSON Schema for formal contracts.

  4. Include Health and Metadata Endpoints

    Every service should expose health checks, readiness probes, model version information, and feature dependency metadata.
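A minimal Python sketch of steps 2 and 3, using stdlib dataclasses (the class and field names here are illustrative, not part of any real contract): a v2 request schema adds an optional field so v1 payloads keep parsing, and the response carries confidence scores and model metadata.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# --- Request schema: v2 adds a field without breaking v1 clients ---
@dataclass
class PredictRequest:
    user_id: str
    item_ids: list
    context: Optional[dict] = None  # new in v2; optional, so v1 payloads still parse

def parse_request(payload: dict) -> PredictRequest:
    """Accept both v1 (no 'context') and v2 payloads."""
    return PredictRequest(
        user_id=payload["user_id"],
        item_ids=payload["item_ids"],
        context=payload.get("context"),  # absent in v1 payloads
    )

# --- Response schema: predictions plus confidence and model metadata ---
@dataclass
class Prediction:
    item_id: str
    score: float  # confidence in [0, 1]

@dataclass
class PredictResponse:
    predictions: list   # list of Prediction
    model_version: str  # metadata: which model produced the result
```

In production these shapes would typically be generated from protocol buffers or validated against a JSON Schema; the dataclasses only illustrate the contract's structure and the "add optional fields, never remove" rule.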

Design Principle: Each AI microservice should own its model artifacts and feature dependencies. If two services need the same model, consider a shared model serving service rather than duplicating the model across services.
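The health and metadata endpoints described in step 4 can be sketched as a small routing function (the paths and metadata fields are illustrative assumptions, not a standard):

```python
MODEL_INFO = {
    # Illustrative metadata; a real service would populate this at startup.
    "model_version": "recommender:2.4.1",
    "feature_dependencies": ["user_embedding", "item_popularity"],
}

def handle(path: str):
    """Map health/metadata paths to an (HTTP status, JSON body) pair."""
    if path == "/healthz":
        return 200, {"status": "ok"}  # liveness: the process is up
    if path == "/readyz":
        return 200, {"ready": True}   # readiness: model loaded, dependencies reachable
    if path == "/metadata":
        return 200, MODEL_INFO        # model version + feature dependency metadata
    return 404, {"error": "not found"}
```

Wired behind any HTTP server, this lets an orchestrator probe liveness and readiness separately while clients and debugging tools inspect which model version is actually deployed.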

Data Ownership Patterns

Database per Service

Each service owns its data store. Other services access data through APIs only, ensuring loose coupling and independent schema evolution.

Shared Feature Store

A centralized feature store provides computed features to multiple model services, avoiding duplication while maintaining a single source of truth.
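As a rough in-memory illustration (the `FeatureStore` class and feature names are hypothetical), multiple model services read the same computed feature through one interface instead of each recomputing it:

```python
class FeatureStore:
    """Toy centralized feature store: one write path, many readers."""

    def __init__(self):
        self._features = {}  # (entity_id, feature_name) -> value

    def write(self, entity_id: str, feature_name: str, value) -> None:
        self._features[(entity_id, feature_name)] = value

    def get_features(self, entity_id: str, feature_names: list) -> dict:
        """Return the requested features for one entity; missing features are None."""
        return {name: self._features.get((entity_id, name)) for name in feature_names}

store = FeatureStore()
store.write("user-42", "avg_txn_amount", 57.0)

# Both the fraud service and the recommendation service read the same value,
# so there is a single source of truth for the computed feature.
fraud_view = store.get_features("user-42", ["avg_txn_amount"])
```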

Event Sourcing

Services publish data change events that other services consume to build their own read-optimized views, enabling eventual consistency.
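A toy sketch of the consuming side, assuming a simple `FeatureUpdated` event shape (the event fields are invented for illustration): the consumer folds the producer's event stream into its own read-optimized view.

```python
# Event log published by the owning service (shape is hypothetical).
events = [
    {"type": "FeatureUpdated", "user": "u1", "feature": "clicks", "value": 3},
    {"type": "FeatureUpdated", "user": "u1", "feature": "clicks", "value": 5},
    {"type": "FeatureUpdated", "user": "u2", "feature": "clicks", "value": 1},
]

def build_read_view(event_log: list) -> dict:
    """Consumer folds the event stream into its own read-optimized view."""
    view = {}
    for e in event_log:
        if e["type"] == "FeatureUpdated":
            view[(e["user"], e["feature"])] = e["value"]  # last write wins
    return view

view = build_read_view(events)  # {("u1", "clicks"): 5, ("u2", "clicks"): 1}
```

Because the consumer rebuilds its view from the log, it can lag the producer and catch up later, which is exactly the eventual consistency the pattern trades for decoupling.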

CQRS Pattern

Separate read and write models for AI services. Write paths handle feature updates while read paths serve optimized prediction queries.
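Sketching the split in Python (class names are illustrative): the write model validates feature updates and appends them to an event bus, while a separate read model folds those events into a denormalized per-user view that prediction queries hit directly.

```python
class WriteModel:
    """Write path: validates feature updates and appends them to an event bus."""

    def __init__(self, bus: list):
        self.bus = bus

    def update_feature(self, user: str, name: str, value) -> None:
        if not user or not name:
            raise ValueError("user and feature name are required")
        self.bus.append({"user": user, "name": name, "value": value})

class ReadModel:
    """Read path: a denormalized per-user view optimized for prediction queries."""

    def __init__(self):
        self._view = {}

    def apply(self, event: dict) -> None:
        self._view.setdefault(event["user"], {})[event["name"]] = event["value"]

    def features_for(self, user: str) -> dict:
        return self._view.get(user, {})

# The write path records updates; the read model consumes them (possibly async).
bus = []
WriteModel(bus).update_feature("u1", "clicks", 7)
reader = ReadModel()
for event in bus:
    reader.apply(event)
```

The payoff is that each side can be scaled and indexed independently: writes stay normalized and validated, while reads are shaped for low-latency prediction lookups.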

Dependency Management

Managing dependencies between AI microservices requires careful attention to avoid cascading failures:

  • Service Registry: Use service discovery to dynamically resolve service endpoints rather than hardcoding URLs
  • Circuit Breakers: Implement circuit breaker patterns to prevent cascading failures when downstream services become unavailable
  • Timeout Policies: Set aggressive timeouts for inter-service calls and define fallback behaviors for when predictions are unavailable
  • Dependency Graphs: Map and visualize service dependencies to identify critical paths and single points of failure
  • Contract Testing: Use consumer-driven contract tests to verify API compatibility between services before deployment
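A compact sketch combining two of the bullets above, a circuit breaker with a fallback (the thresholds and names are illustrative; production services would reach for a hardened library):

```python
import time

class CircuitBreaker:
    """Open after `max_failures` consecutive errors; allow one probe after `reset_after` s."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()  # open: fail fast instead of hitting the dependency
            self.opened_at = None  # half-open: let one probe through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            return fallback()
        self.failures = 0  # success closes the circuit fully
        return result
```

A prediction client would wrap each downstream call, e.g. `breaker.call(lambda: score_service.predict(x), fallback=lambda: DEFAULT_SCORE)` (hypothetical names), so an unhealthy dependency degrades to a default prediction instead of stalling the request path.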
💡 Looking Ahead: In the next lesson, we will explore model serving in depth, covering how to package models as services, optimize inference performance, and manage GPU resources efficiently.