Intermediate
Configuring Ray Clusters
Configure head and worker nodes, set up Ray autoscaling with KubeRay, manage GPU resources, and build heterogeneous clusters with multiple worker groups.
Autoscaling Configuration
```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: autoscaling-cluster
spec:
  rayVersion: "2.9.0"
  enableInTreeAutoscaling: true
  autoscalerOptions:
    upscalingMode: Default
    idleTimeoutSeconds: 300
  headGroupSpec:
    rayStartParams:
      num-cpus: "0"  # Don't schedule tasks on head
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0
            resources:
              limits:
                cpu: "2"
                memory: "4Gi"
  workerGroupSpecs:
    - groupName: gpu-workers
      replicas: 1
      minReplicas: 0
      maxReplicas: 10
      rayStartParams:
        num-gpus: "1"
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.9.0-gpu
              resources:
                limits:
                  cpu: "8"
                  memory: "32Gi"
                  nvidia.com/gpu: 1
```
Heterogeneous Clusters
Define multiple worker groups with different resource profiles for varied workloads:
```yaml
workerGroupSpecs:
  - groupName: cpu-workers  # For data processing
    replicas: 4
    minReplicas: 2
    maxReplicas: 20
    template:
      spec:
        containers:
          - name: ray-worker
            image: rayproject/ray:2.9.0
            resources:
              limits:
                cpu: "16"
                memory: "64Gi"
  - groupName: gpu-workers  # For training/inference
    replicas: 2
    minReplicas: 0
    maxReplicas: 8
    template:
      spec:
        containers:
          - name: ray-worker
            image: rayproject/ray:2.9.0-gpu
            resources:
              limits:
                cpu: "8"
                memory: "64Gi"
                nvidia.com/gpu: 4
```
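Beyond physical CPU and GPU counts, a worker group can advertise a Ray custom resource so that jobs target it explicitly. A hedged sketch of the pattern (the `preprocess` resource name is invented for illustration; the nested quoting is what `ray start` expects when the value is forwarded through `rayStartParams`):

```yaml
workerGroupSpecs:
  - groupName: cpu-workers
    rayStartParams:
      # Advertise 1 unit of a custom "preprocess" resource per worker.
      # The JSON is escaped because KubeRay passes it to `ray start`.
      resources: '"{\"preprocess\": 1}"'
```

Tasks or actors declared with `@ray.remote(resources={"preprocess": 1})` will then only schedule on workers in this group.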
Autoscaling behavior: Ray's autoscaler cooperates with Kubernetes in three stages: the Ray autoscaler requests more workers based on the resource demands of pending tasks and actors, KubeRay creates the corresponding worker pods, and the Kubernetes cluster autoscaler provisions new nodes if those pods cannot be scheduled. Scale-down removes workers that have been idle longer than idleTimeoutSeconds.
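The autoscaler itself is tuned through `autoscalerOptions`, shown briefly at the top of the manifest. A sketch of the commonly adjusted fields, assuming the KubeRay v1 CRD (the values here are illustrative, not recommendations):

```yaml
autoscalerOptions:
  # Conservative rate-limits upscaling (pending worker pods are bounded
  # by the number of connected workers); Default batches scale-up requests.
  upscalingMode: Conservative
  # Seconds a worker must sit idle before it is removed.
  idleTimeoutSeconds: 60
  # Requests/limits for the autoscaler sidecar container itself.
  resources:
    requests:
      cpu: "500m"
      memory: "512Mi"
```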
Cost optimization: Set minReplicas: 0 for GPU worker groups so they scale to zero when idle. Set num-cpus: "0" on the head node to prevent it from running compute tasks, keeping it available for cluster management processes such as the GCS and dashboard.
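Isolated from the full manifest above, the two cost-saving settings look like this (a fragment of a RayCluster spec):

```yaml
spec:
  headGroupSpec:
    rayStartParams:
      num-cpus: "0"     # head advertises zero CPUs, so no compute tasks land on it
  workerGroupSpecs:
    - groupName: gpu-workers
      minReplicas: 0    # group scales to zero when no GPU work is pending
      maxReplicas: 8
```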
Lilly Tech Systems