Core Concepts
The fundamental building blocks of Kubernetes — Pods, Deployments, Services, and resource management — applied to machine learning workloads.
Pods: The Smallest Unit
A Pod is the smallest deployable unit in Kubernetes. It wraps one or more containers that share the same network namespace and storage volumes. For ML workloads, a Pod typically runs a single training script or inference server.
# Example: Pod running a PyTorch training container
apiVersion: v1
kind: Pod
metadata:
  name: pytorch-training
  labels:
    app: ml-training
    framework: pytorch
spec:
  containers:
  - name: trainer
    image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
    command: ["python", "train.py"]
    resources:
      requests:
        memory: "8Gi"
        cpu: "4"
      limits:
        memory: "16Gi"
        cpu: "8"
Deployments: Managing Replicas
A Deployment manages a set of identical Pod replicas, handling rolling updates, rollbacks, and scaling. For ML workloads, Deployments are ideal for model serving: running multiple replicas of an inference server behind a load balancer.
# Example: Deployment for model serving
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-serving
spec:
  replicas: 3
  selector:
    matchLabels:
      app: inference-server
  template:
    metadata:
      labels:
        app: inference-server
    spec:
      containers:
      - name: server
        image: myregistry/bert-serving:v1.2
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "4Gi"
            cpu: "2"
          limits:
            memory: "8Gi"
            cpu: "4"
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
Key Deployment Features for ML
- Rolling updates — Deploy new model versions without downtime. Gradually replace old Pods with new ones.
- Rollbacks — If a new model version performs poorly, roll back to the previous version with one command.
- Scaling — Scale up replicas during peak inference load, scale down during quiet periods.
- Readiness probes — Essential for ML: models take time to load into memory. The probe ensures traffic is only sent to Pods that have finished loading.
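The rolling-update behavior described above can be tuned in the Deployment spec. A sketch of the standard strategy fields; the surge and unavailability values here are illustrative, not required:

```yaml
# Illustrative rolling-update tuning for a model-serving Deployment
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # allow at most one extra Pod above the desired replica count
      maxUnavailable: 0  # never drop below the desired replica count during an update
```

With maxUnavailable set to 0, a new replica must pass its readiness probe (i.e., finish loading the model) before an old replica is removed.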
Services: Exposing Workloads
A Service provides a stable network endpoint for a set of Pods. Pods are ephemeral (they can be created and destroyed), but Services provide a consistent IP and DNS name.
Service Types
- ClusterIP (default) — Internal cluster access only. Use for internal model APIs that other services call.
- NodePort — Exposes on a static port on each node. Use for development and testing.
- LoadBalancer — Provisions an external load balancer (cloud providers). Use for production model endpoints.
# Example: Service for model serving
apiVersion: v1
kind: Service
metadata:
  name: model-api
spec:
  selector:
    app: inference-server
  ports:
  - port: 80
    targetPort: 8080
  type: LoadBalancer
Namespaces: Isolation for Teams
Namespaces provide logical isolation within a cluster. In ML organizations, you typically have separate namespaces for different teams or environments.
- ml-training — For training jobs with GPU access
- ml-serving — For production inference endpoints
- ml-experiments — For data scientists running experiments
- data-pipeline — For ETL and data preprocessing
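Creating one of these namespaces is a one-line manifest. A minimal sketch, using a name from the list above; the label is illustrative:

```yaml
# Example: Namespace for the training team
apiVersion: v1
kind: Namespace
metadata:
  name: ml-training
  labels:
    team: ml-platform   # illustrative label; adjust to your org
```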
Resource Management
Resource management is critical for ML workloads because training jobs consume significant CPU, memory, and GPU resources.
Requests vs Limits
- Requests — The minimum resources the container needs. The scheduler uses this to find a suitable node.
- Limits — The maximum resources the container can use. If exceeded, the container is throttled (CPU) or killed (memory).
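GPUs are handled slightly differently from CPU and memory: an extended resource such as nvidia.com/gpu (exposed by the NVIDIA device plugin) is specified under limits and cannot be overcommitted, so the request is implicitly equal to the limit. A sketch of a container resources fragment; the CPU and memory values are illustrative:

```yaml
# Illustrative container spec fragment requesting one NVIDIA GPU
resources:
  limits:
    nvidia.com/gpu: 1   # GPU count; the request is implied equal to the limit
  requests:
    memory: "16Gi"
    cpu: "4"
```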
ResourceQuotas
ResourceQuotas limit the total resources a namespace can consume. This prevents a single team from monopolizing the cluster.
# Example: ResourceQuota for ML training namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ml-training-quota
  namespace: ml-training
spec:
  hard:
    requests.cpu: "32"
    requests.memory: "128Gi"
    limits.cpu: "64"
    limits.memory: "256Gi"
    pods: "20"
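A ResourceQuota is often paired with a LimitRange, which sets per-container defaults and maximums within the namespace rather than an aggregate cap. A sketch with illustrative values:

```yaml
# Illustrative LimitRange companion for the ml-training namespace
apiVersion: v1
kind: LimitRange
metadata:
  name: ml-training-limits
  namespace: ml-training
spec:
  limits:
  - type: Container
    default:            # applied when a container omits limits
      memory: "8Gi"
      cpu: "2"
    defaultRequest:     # applied when a container omits requests
      memory: "4Gi"
      cpu: "1"
    max:
      memory: "64Gi"
      cpu: "16"
```

Defaults matter for quota enforcement: once a namespace has a ResourceQuota on CPU and memory, Pods that omit requests or limits are rejected unless a LimitRange fills them in.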
Labels and Selectors
Labels are key-value pairs attached to objects. Selectors filter objects by labels. For ML, use labels to organize workloads by framework, team, experiment, or model version.
- framework: pytorch or framework: tensorflow
- workload-type: training or workload-type: inference
- model-version: v1.2.3
- team: nlp or team: computer-vision
Practice Questions
Which Kubernetes feature ensures that traffic is only sent to a serving Pod after its model has finished loading into memory?

A) Liveness probe
B) Readiness probe
C) Startup probe
D) Init container

Answer: B) Readiness probe. A readiness probe determines when a Pod is ready to receive traffic. Until the probe succeeds, the Pod is removed from Service endpoints. This is essential for ML serving because models need time to load into memory. A liveness probe checks if the container is alive (restart if not), which is different.
Which Kubernetes object limits the total aggregate resources a namespace can consume?

A) LimitRange
B) ResourceQuota
C) PriorityClass
D) NetworkPolicy

Answer: B) ResourceQuota. A ResourceQuota limits the total aggregate resources that can be consumed in a namespace. LimitRange sets default and max resources per Pod/container, not total namespace limits. PriorityClass controls Pod scheduling priority, and NetworkPolicy controls network traffic.
Which workload object should you use to deploy a new model version with zero downtime?

A) Job
B) DaemonSet
C) Deployment
D) StatefulSet

Answer: C) Deployment. Deployments support rolling updates by default, gradually replacing old Pods with new ones to maintain availability. Jobs are for batch tasks, DaemonSets run one Pod per node, and StatefulSets are for stateful applications that need stable identities.
A training Pod is repeatedly terminated with status OOMKilled. What is the most likely cause?

A) The CPU limit is too low
B) The memory limit is too low
C) The readiness probe is failing
D) The node has no GPU

Answer: B) The memory limit is too low. OOMKilled (Out Of Memory Killed) means the container exceeded its memory limit and was terminated. The fix is to increase the memory limit in the Pod spec. ML training jobs often need large amounts of memory for loading datasets and model parameters.
Which Service type should you use to expose a production model API to external clients on a cloud provider?

A) ClusterIP
B) NodePort
C) LoadBalancer
D) ExternalName

Answer: C) LoadBalancer. On cloud providers, a LoadBalancer Service automatically provisions an external load balancer with a public IP, making it the standard choice for production APIs. ClusterIP is internal only, NodePort exposes on a static high port (not ideal for production), and ExternalName is for mapping to external DNS names.