Service Design for AI Systems
Learn how to decompose AI systems into well-bounded microservices with clear API contracts, proper data ownership, and manageable dependency graphs.
Service Boundary Identification
Defining the right service boundaries is one of the most consequential design decisions in a microservice architecture. For AI systems, boundaries typically align with one or more of these dimensions:
| Boundary Type | Rationale | Example |
|---|---|---|
| Model Domain | Each model type has a different lifecycle | Recommendation service, fraud detection service |
| Data Domain | Data ownership and access patterns differ | Customer features service, transaction data service |
| Compute Profile | Different hardware and scaling needs | GPU inference service, CPU preprocessing service |
| Team Ownership | Conway's Law alignment | Search team's ranking service, risk team's scoring service |
API Contract Design
Choose the Right Protocol
Use gRPC for internal service-to-service communication where low latency matters. Use REST with OpenAPI for external-facing APIs and developer portals.
Design for Evolution
Use schema versioning with backward compatibility. Add new fields as optional, and never remove or rename an existing field without a deprecation cycle.
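As a minimal sketch of the "add fields as optional" rule, the snippet below parses both an older and a newer payload with one consumer-side type. The `PredictionResponse` class, its field names, and the `parse_response` helper are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass(frozen=True)
class PredictionResponse:
    # Fields assumed to exist since the first version of the contract.
    prediction: float
    model_version: str
    # Added later: optional with a default, so older producers stay valid.
    confidence: Optional[float] = None


def parse_response(payload: dict[str, Any]) -> PredictionResponse:
    """Build a response, ignoring fields this consumer does not know yet."""
    known = {f.name for f in fields(PredictionResponse)}
    return PredictionResponse(**{k: v for k, v in payload.items() if k in known})


old_payload = {"prediction": 0.82, "model_version": "v1"}
new_payload = {
    "prediction": 0.82,
    "model_version": "v2",
    "confidence": 0.9,
    "explanation": "top features",  # field from a future producer version
}

old = parse_response(old_payload)  # confidence falls back to None
new = parse_response(new_payload)  # the unknown "explanation" field is ignored
```

Ignoring unknown fields on the consumer side is what lets producers ship additive changes first; removing or renaming `prediction` would still break this consumer, which is why those changes need a deprecation cycle.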
Define Clear Request/Response Schemas
Specify input feature schemas, output prediction formats, confidence scores, and metadata. Use protocol buffers or JSON Schema for formal contracts.
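One way to make such a contract explicit in application code is a pair of typed, self-validating request and response objects. This is a sketch only: `ScoreRequest`, `ScoreResponse`, and the validation rules are assumptions for illustration, and in practice the same shapes would usually be generated from a protocol buffer or JSON Schema definition.

```python
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class ScoreRequest:
    request_id: str
    features: dict[str, float]  # input feature schema: name -> numeric value

    def __post_init__(self) -> None:
        if not self.features:
            raise ValueError("features must not be empty")
        for name, value in self.features.items():
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f"feature {name!r} must be numeric")


@dataclass(frozen=True)
class ScoreResponse:
    request_id: str
    prediction: float
    confidence: float
    metadata: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


request = ScoreRequest(request_id="r-1", features={"txn_amount": 120.5})
response = ScoreResponse(
    request_id=request.request_id,
    prediction=0.07,
    confidence=0.93,
    metadata={"model_version": "v2"},
)
wire_format = asdict(response)  # plain dict, ready for JSON serialization
```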
Include Health and Metadata Endpoints
Every service should expose health checks, readiness probes, model version information, and feature dependency metadata.
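The endpoints above can be sketched with the standard library's WSGI support. The paths `/healthz`, `/readyz`, and `/metadata`, the `STATE` dictionary, and the version string are illustrative assumptions, not a required convention.

```python
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical service state; a real service would set this during startup.
STATE = {
    "model_loaded": False,
    "model_version": "1.4.0",
    "feature_dependencies": ["txn_amount", "account_age_days"],
}


def app(environ, start_response):
    """Minimal WSGI app exposing liveness, readiness, and metadata."""
    path = environ.get("PATH_INFO", "/")
    if path == "/healthz":
        status, body = "200 OK", {"status": "alive"}
    elif path == "/readyz":
        # Ready only once the model is loaded and can serve predictions.
        ready = STATE["model_loaded"]
        status = "200 OK" if ready else "503 Service Unavailable"
        body = {"ready": ready}
    elif path == "/metadata":
        status = "200 OK"
        body = {
            "model_version": STATE["model_version"],
            "feature_dependencies": STATE["feature_dependencies"],
        }
    else:
        status, body = "404 Not Found", {"error": "unknown path"}
    payload = json.dumps(body).encode("utf-8")
    headers = [("Content-Type", "application/json"),
               ("Content-Length", str(len(payload)))]
    start_response(status, headers)
    return [payload]


def call(path):
    """Invoke the app in-process, without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)
```

Keeping liveness and readiness separate lets an orchestrator restart a hung process while merely withholding traffic from one that is still loading its model. To serve the app over HTTP, it could be passed to `wsgiref.simple_server.make_server`.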
Data Ownership Patterns
Database per Service
Each service owns its data store. Other services access data through APIs only, ensuring loose coupling and independent schema evolution.
Shared Feature Store
A centralized feature store provides computed features to multiple model services, avoiding duplication while maintaining a single source of truth.
Event Sourcing
Services publish data change events that other services consume to build their own read-optimized views. These views are eventually consistent with the source, so consumers must tolerate briefly stale data.
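A minimal in-process sketch of this flow follows. `EventBus` stands in for a real message broker, and `TransactionRecorded` and `SpendingView` are hypothetical names. Delivery here is synchronous, so the view is always current; with a real broker it would lag behind the publisher.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TransactionRecorded:
    """Data change event published by the (hypothetical) transaction service."""
    customer_id: str
    amount: float


class EventBus:
    """In-process stand-in for a message broker."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[TransactionRecorded], None]] = []

    def subscribe(self, handler: Callable[[TransactionRecorded], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: TransactionRecorded) -> None:
        for handler in self._subscribers:
            handler(event)


class SpendingView:
    """Read-optimized view owned and maintained by a consuming service."""

    def __init__(self) -> None:
        self.total_by_customer: dict[str, float] = defaultdict(float)

    def apply(self, event: TransactionRecorded) -> None:
        self.total_by_customer[event.customer_id] += event.amount


bus = EventBus()
view = SpendingView()
bus.subscribe(view.apply)

bus.publish(TransactionRecorded("c1", 20.0))
bus.publish(TransactionRecorded("c1", 5.0))
bus.publish(TransactionRecorded("c2", 7.5))
```

The consuming service never queries the publisher's database; it only sees events, which keeps the two schemas free to evolve independently.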
CQRS Pattern
Separate read and write models for AI services. Write paths handle feature updates while read paths serve optimized prediction queries.
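The split can be sketched as two small classes, assuming an append-only write log and a denormalized read model that is updated through a callback. All class, method, and feature names here are hypothetical.

```python
class FeatureReadModel:
    """Read path: latest values per entity, served as ordered feature vectors."""

    def __init__(self, feature_order):
        self.feature_order = list(feature_order)
        self._latest = {}

    def project(self, entity_id, name, value):
        # Keep only the newest value; history lives on the write side.
        self._latest.setdefault(entity_id, {})[name] = value

    def feature_vector(self, entity_id, default=0.0):
        row = self._latest.get(entity_id, {})
        return [row.get(name, default) for name in self.feature_order]


class FeatureWriteModel:
    """Write path: normalizes feature updates and records them in a log."""

    def __init__(self, on_update):
        self.log = []
        self._on_update = on_update

    def update_feature(self, entity_id, name, value):
        value = float(value)  # raises if the value is not numeric
        self.log.append((entity_id, name, value))
        self._on_update(entity_id, name, value)


read_model = FeatureReadModel(["txn_count_7d", "avg_amount_7d"])
write_model = FeatureWriteModel(on_update=read_model.project)

write_model.update_feature("c1", "txn_count_7d", 3)
write_model.update_feature("c1", "avg_amount_7d", 41.5)
write_model.update_feature("c1", "txn_count_7d", 4)

vector = read_model.feature_vector("c1")
```

The prediction path only calls `feature_vector`, so it can be cached or scaled separately from the write path that ingests updates.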
Dependency Management
Managing dependencies between AI microservices requires careful attention to avoid cascading failures:
- Service Registry: Use service discovery to dynamically resolve service endpoints rather than hardcoding URLs
- Circuit Breakers: Implement circuit breaker patterns to prevent cascading failures when downstream services become unavailable
- Timeout Policies: Set explicit, tight timeouts on inter-service calls, and define fallback behavior for when a prediction is unavailable, such as returning a cached or default score
- Dependency Graphs: Map and visualize service dependencies to identify critical paths and single points of failure
- Contract Testing: Use consumer-driven contract tests to verify API compatibility between services before deployment
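To make the circuit breaker and fallback ideas concrete, here is a minimal single-threaded sketch. The thresholds, the `CircuitBreaker` API, and the `flaky_model` and `default_score` functions are assumptions for illustration; the downstream timeout is simulated by raising `TimeoutError`, and the clock is injectable so the behavior can be exercised without waiting.

```python
import time


class CircuitBreaker:
    """Opens after repeated failures, then allows a trial call after a cooldown."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_after:
            return "half-open"
        return "open"

    def call(self, func, fallback):
        if self.state == "open":
            return fallback()  # fail fast without touching the dependency
        try:
            result = func()
        except Exception:
            self._failures += 1
            # A failed half-open trial, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            return fallback()
        self._failures = 0
        self._opened_at = None
        return result


now = [0.0]  # fake clock so the example does not sleep
breaker = CircuitBreaker(failure_threshold=2, reset_after=10.0, clock=lambda: now[0])
calls = []


def flaky_model():
    calls.append(1)
    raise TimeoutError("scoring service timed out")


def default_score():
    return 0.5  # hypothetical fallback prediction


results = [breaker.call(flaky_model, default_score) for _ in range(3)]
```

After two failures the breaker opens, so the third request is answered by the fallback without calling the failing service at all. That skipped call is what stops one slow dependency from tying up every upstream caller.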
Lilly Tech Systems