Intermediate

Service Design for AI Systems

Learn how to decompose AI systems into well-bounded microservices with clear API contracts, proper data ownership, and manageable dependency graphs.

Service Boundary Identification

Defining the right service boundaries is the most critical design decision in a microservice architecture. For AI systems, boundaries typically align with these dimensions:

| Boundary Type | Rationale | Example |
| --- | --- | --- |
| Model Domain | Each model type has a different lifecycle | Recommendation service, fraud detection service |
| Data Domain | Data ownership and access patterns differ | Customer features service, transaction data service |
| Compute Profile | Different hardware and scaling needs | GPU inference service, CPU preprocessing service |
| Team Ownership | Conway's Law alignment | Search team's ranking service, risk team's scoring service |

API Contract Design

  1. Choose the Right Protocol

    Use gRPC for internal service-to-service communication where low latency matters. Use REST with OpenAPI for external-facing APIs and developer portals.

  2. Design for Evolution

    Use schema versioning with backward compatibility. Add new fields as optional, never remove or rename existing fields without a deprecation cycle.

  3. Define Clear Request/Response Schemas

    Specify input feature schemas, output prediction formats, confidence scores, and metadata. Use protocol buffers or JSON Schema for formal contracts.

  4. Include Health and Metadata Endpoints

    Every service should expose health checks, readiness probes, model version information, and feature dependency metadata.
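A minimal Python sketch of steps 2 and 3, using stdlib dataclasses (the class and field names here are illustrative, not part of any real contract): a v2 request schema adds an optional field so v1 payloads keep parsing, and the response carries confidence scores and model metadata.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# --- Request schema: v2 adds a field without breaking v1 clients ---
@dataclass
class PredictRequest:
    user_id: str
    item_ids: list
    context: Optional[dict] = None  # new in v2; optional, so v1 payloads still parse

def parse_request(payload: dict) -> PredictRequest:
    """Accept both v1 (no 'context') and v2 payloads."""
    return PredictRequest(
        user_id=payload["user_id"],
        item_ids=payload["item_ids"],
        context=payload.get("context"),  # absent in v1 payloads
    )

# --- Response schema: predictions plus confidence and model metadata ---
@dataclass
class Prediction:
    item_id: str
    score: float  # confidence in [0, 1]

@dataclass
class PredictResponse:
    predictions: list   # list of Prediction
    model_version: str  # metadata: which model produced the result
```

In production these shapes would typically be generated from protocol buffers or validated against a JSON Schema; the dataclasses only illustrate the contract's structure and the "add optional fields, never remove" rule.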

Design Principle: Each AI microservice should own its model artifacts and feature dependencies. If two services need the same model, consider a shared model serving service rather than duplicating the model across services.
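The health and metadata endpoints described in step 4 can be sketched as a small routing function (the paths and metadata fields are illustrative assumptions, not a standard):

```python
MODEL_INFO = {
    # Illustrative metadata; a real service would populate this at startup.
    "model_version": "recommender:2.4.1",
    "feature_dependencies": ["user_embedding", "item_popularity"],
}

def handle(path: str):
    """Map health/metadata paths to an (HTTP status, JSON body) pair."""
    if path == "/healthz":
        return 200, {"status": "ok"}  # liveness: the process is up
    if path == "/readyz":
        return 200, {"ready": True}   # readiness: model loaded, dependencies reachable
    if path == "/metadata":
        return 200, MODEL_INFO        # model version + feature dependency metadata
    return 404, {"error": "not found"}
```

Wired behind any HTTP server, this lets an orchestrator probe liveness and readiness separately while clients and debugging tools inspect which model version is actually deployed.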

Data Ownership Patterns

Database per Service

Each service owns its data store. Other services access data through APIs only, ensuring loose coupling and independent schema evolution.

Shared Feature Store

A centralized feature store provides computed features to multiple model services, avoiding duplication while maintaining a single source of truth.
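As a rough in-memory illustration (the `FeatureStore` class and feature names are hypothetical), multiple model services read the same computed feature through one interface instead of each recomputing it:

```python
class FeatureStore:
    """Toy centralized feature store: one write path, many readers."""

    def __init__(self):
        self._features = {}  # (entity_id, feature_name) -> value

    def write(self, entity_id: str, feature_name: str, value) -> None:
        self._features[(entity_id, feature_name)] = value

    def get_features(self, entity_id: str, feature_names: list) -> dict:
        """Return the requested features for one entity; missing features are None."""
        return {name: self._features.get((entity_id, name)) for name in feature_names}

store = FeatureStore()
store.write("user-42", "avg_txn_amount", 57.0)

# Both the fraud service and the recommendation service read the same value,
# so there is a single source of truth for the computed feature.
fraud_view = store.get_features("user-42", ["avg_txn_amount"])
```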

Event Sourcing

Services publish data change events that other services consume to build their own read-optimized views, enabling eventual consistency.
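A toy sketch of the consuming side, assuming a simple `FeatureUpdated` event shape (the event fields are invented for illustration): the consumer folds the producer's event stream into its own read-optimized view.

```python
# Event log published by the owning service (shape is hypothetical).
events = [
    {"type": "FeatureUpdated", "user": "u1", "feature": "clicks", "value": 3},
    {"type": "FeatureUpdated", "user": "u1", "feature": "clicks", "value": 5},
    {"type": "FeatureUpdated", "user": "u2", "feature": "clicks", "value": 1},
]

def build_read_view(event_log: list) -> dict:
    """Consumer folds the event stream into its own read-optimized view."""
    view = {}
    for e in event_log:
        if e["type"] == "FeatureUpdated":
            view[(e["user"], e["feature"])] = e["value"]  # last write wins
    return view

view = build_read_view(events)  # {("u1", "clicks"): 5, ("u2", "clicks"): 1}
```

Because the consumer rebuilds its view from the log, it can lag the producer and catch up later, which is exactly the eventual consistency the pattern trades for decoupling.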

CQRS Pattern

Separate read and write models for AI services. Write paths handle feature updates while read paths serve optimized prediction queries.
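Sketching the split in Python (class names are illustrative): the write model validates feature updates and appends them to an event bus, while a separate read model folds those events into a denormalized per-user view that prediction queries hit directly.

```python
class WriteModel:
    """Write path: validates feature updates and appends them to an event bus."""

    def __init__(self, bus: list):
        self.bus = bus

    def update_feature(self, user: str, name: str, value) -> None:
        if not user or not name:
            raise ValueError("user and feature name are required")
        self.bus.append({"user": user, "name": name, "value": value})

class ReadModel:
    """Read path: a denormalized per-user view optimized for prediction queries."""

    def __init__(self):
        self._view = {}

    def apply(self, event: dict) -> None:
        self._view.setdefault(event["user"], {})[event["name"]] = event["value"]

    def features_for(self, user: str) -> dict:
        return self._view.get(user, {})

# The write path records updates; the read model consumes them (possibly async).
bus = []
WriteModel(bus).update_feature("u1", "clicks", 7)
reader = ReadModel()
for event in bus:
    reader.apply(event)
```

The payoff is that each side can be scaled and indexed independently: writes stay normalized and validated, while reads are shaped for low-latency prediction lookups.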

Dependency Management

Managing dependencies between AI microservices requires careful attention to avoid cascading failures:

  • Service Registry: Use service discovery to dynamically resolve service endpoints rather than hardcoding URLs
  • Circuit Breakers: Implement circuit breaker patterns to prevent cascading failures when downstream services become unavailable
  • Timeout Policies: Set aggressive timeouts for inter-service calls and define fallback behaviors for when predictions are unavailable
  • Dependency Graphs: Map and visualize service dependencies to identify critical paths and single points of failure
  • Contract Testing: Use consumer-driven contract tests to verify API compatibility between services before deployment
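A compact sketch combining two of the bullets above, a circuit breaker with a fallback (the thresholds and names are illustrative; production services would reach for a hardened library):

```python
import time

class CircuitBreaker:
    """Open after `max_failures` consecutive errors; allow one probe after `reset_after` s."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()  # open: fail fast instead of hitting the dependency
            self.opened_at = None  # half-open: let one probe through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            return fallback()
        self.failures = 0  # success closes the circuit fully
        return result
```

A prediction client would wrap each downstream call, e.g. `breaker.call(lambda: score_service.predict(x), fallback=lambda: DEFAULT_SCORE)` (hypothetical names), so an unhealthy dependency degrades to a default prediction instead of stalling the request path.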
💡 Looking Ahead: In the next lesson, we will explore model serving in depth, covering how to package models as services, optimize inference performance, and manage GPU resources efficiently.