Intermediate

Kubernetes Security for ML Workloads

Kubernetes is the dominant orchestration platform for ML at scale. Securing ML workloads requires proper RBAC, pod security standards, network policies, and GPU-aware scheduling controls.

Pod Security Standards for ML

Kubernetes Pod Security Standards (PSS) define three levels of security. ML workloads should target the restricted profile wherever possible:

  • Privileged: GPU driver installation (DaemonSet only). Restrictions: none — unrestricted access.
  • Baseline: Training jobs requiring host networking or IPC. Restrictions: prevents known privilege escalations.
  • Restricted: Model inference, data preprocessing, API serving. Restrictions: enforces hardening best practices.
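Profiles are enforced per namespace through the built-in Pod Security Admission labels. A minimal sketch, assuming a dedicated inference namespace (the namespace name is illustrative):

```yaml
# Enforce the "restricted" profile on every pod created in this namespace.
# The warn and audit labels surface violations without blocking, useful
# while migrating existing workloads up to the restricted profile.
apiVersion: v1
kind: Namespace
metadata:
  name: ml-inference                              # hypothetical name
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```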

RBAC for ML Namespaces

Implement role-based access control that separates concerns across ML teams:

  1. Namespace Isolation

    Create dedicated namespaces for training, inference, and data processing. Apply resource quotas per namespace to prevent a single training job from consuming all cluster GPU resources.

  2. Service Account Scoping

    Each ML workload type should have its own service account with minimal permissions. Training jobs need access to data volumes and model storage. Inference pods need only read access to model artifacts.

  3. Secret Access Control

    Use RBAC to restrict which service accounts can read which secrets. Data pipeline credentials should not be accessible from inference pods.

  4. Audit Logging

    Enable Kubernetes audit logging for all API server requests in ML namespaces. Track who created, modified, or accessed GPU workloads and their associated secrets.
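As one sketch of service-account scoping (steps 2 and 3 above), an inference workload's Role can be limited to read-only access to its model configuration and a single named secret. All resource and account names below are hypothetical:

```yaml
# Read-only role for inference pods: configmaps in the namespace,
# plus exactly one secret, scoped by resourceNames.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: inference-model-reader        # hypothetical
  namespace: ml-inference
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: ["model-registry-creds"]   # no access to other secrets
  verbs: ["get"]
---
# Bind the role to the inference service account only.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: inference-model-reader-binding
  namespace: ml-inference
subjects:
- kind: ServiceAccount
  name: inference-sa                  # hypothetical
  namespace: ml-inference
roleRef:
  kind: Role
  name: inference-model-reader
  apiGroup: rbac.authorization.k8s.io
```

Because data-pipeline credentials are never listed in this role's rules, an attacker who compromises an inference pod cannot read them through the Kubernetes API.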

Network Policies for ML

ML clusters have specific network security requirements:

  • Training isolation: Multi-GPU training pods need to communicate with each other (NCCL, Gloo) but should not reach the internet or unrelated services
  • Inference lockdown: Model serving pods should only accept traffic from the API gateway and reach the model storage backend
  • Data pipeline controls: Restrict data preprocessing pods to only access approved data sources and the training namespace
  • Egress filtering: Block outbound internet access from ML pods except for explicitly whitelisted endpoints (model registries, package mirrors)
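The inference-lockdown pattern above might look like the following NetworkPolicy. Pod labels, namespace labels, and ports are assumptions for illustration:

```yaml
# Allow ingress only from the API gateway namespace and egress only to
# the model storage namespace (plus DNS, which most pods need to resolve
# service names). All other traffic to/from model-server pods is denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: inference-lockdown
  namespace: ml-inference
spec:
  podSelector:
    matchLabels:
      app: model-server              # hypothetical pod label
  policyTypes: ["Ingress", "Egress"]
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          role: api-gateway          # hypothetical namespace label
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          role: model-storage        # hypothetical namespace label
  - ports:
    - protocol: UDP
      port: 53                       # DNS
```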

GPU Scheduling Security

GPU Resource Risks: Kubernetes GPU scheduling uses device plugins that allocate whole GPUs by default. Without proper controls, a pod requesting one GPU could be scheduled on a node with sensitive workloads, creating co-tenancy risks.

Node Affinity

Use node labels and affinity rules to ensure sensitive ML workloads (e.g., training on proprietary data) run on dedicated GPU nodes separate from shared workloads.
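A pod-spec fragment sketching this, assuming the dedicated nodes carry a `gpu-pool` label (the key and value are hypothetical):

```yaml
# Require scheduling onto nodes labeled for the dedicated training pool.
# Pods with this affinity will stay Pending rather than land on shared nodes.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: gpu-pool                      # hypothetical node label
          operator: In
          values: ["dedicated-training"]
```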

Taints and Tolerations

Apply taints to GPU nodes so that only authorized ML workloads with matching tolerations can be scheduled there. This prevents non-ML pods from landing on expensive GPU hardware.
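A sketch of the pairing, with an illustrative taint key and node name:

```yaml
# Assumes the GPU nodes were tainted beforehand, e.g.:
#   kubectl taint nodes gpu-node-1 nvidia.com/gpu=present:NoSchedule
# Pod spec fragment: only pods carrying a matching toleration can be
# scheduled onto those nodes.
tolerations:
- key: "nvidia.com/gpu"
  operator: "Equal"
  value: "present"
  effect: "NoSchedule"
```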

Resource Quotas

Set per-namespace GPU quotas to prevent any single team from monopolizing cluster GPU resources. Use priority classes for critical inference workloads.
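GPUs are exposed as extended resources, so they can be capped with the `requests.<resource-name>` quota syntax. A minimal sketch (namespace, quota name, and limit are illustrative):

```yaml
# Cap the training namespace at 8 GPUs total across all pods.
# Pods in this namespace must request nvidia.com/gpu explicitly,
# or they will be rejected once a quota on that resource exists.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota                # hypothetical
  namespace: ml-training         # hypothetical
spec:
  hard:
    requests.nvidia.com/gpu: "8"
```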

MIG Support

On supported GPUs (A100, H100), use NVIDIA MIG to partition GPUs into isolated instances. Each partition has its own memory and compute, providing hardware-level isolation.
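With the NVIDIA device plugin configured for a per-profile MIG strategy, each partition size appears as its own extended resource that pods request like any other. A sketch, assuming the "mixed" strategy and an A100 profile name:

```yaml
# Pod spec fragment: request one 1g.5gb MIG slice instead of a full GPU.
# The resource name depends on the device plugin's MIG strategy and the
# profiles configured on the node; 1g.5gb is an A100 example.
resources:
  limits:
    nvidia.com/mig-1g.5gb: 1
```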

💡 Next Up: In the next lesson, we cover image scanning — using Trivy, Snyk, and Grype to detect vulnerabilities in your ML container images before they reach production.