W&B Launch
Submit, manage, and track ML training jobs across any compute infrastructure from a unified interface.
What is Launch?
W&B Launch lets you package training code into reproducible jobs and run them on any compute backend: local Docker, Kubernetes, AWS SageMaker, or GCP Vertex AI. It separates "what to run" from "where to run it."
Launch Architecture
Create a Launch Job
Package your training code, dependencies, and configuration into a reproducible job definition.
Configure a Queue
Set up compute queues that point to specific backends (K8s cluster, cloud provider, local Docker).
Submit to Queue
Push jobs to a queue with specific resource requirements and hyperparameters.
Launch Agent Executes
A Launch Agent monitors the queue and executes jobs on the configured compute backend.
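The four steps above boil down to a queue/agent pattern. The sketch below illustrates that pattern in plain Python; names like `JobQueue`, `run_agent`, and the backend callable are hypothetical stand-ins for illustration, not part of the wandb API:

```python
from collections import deque


class JobQueue:
    """Illustrative stand-in for a Launch queue: a FIFO of job specs."""

    def __init__(self):
        self._jobs = deque()

    def submit(self, spec):
        # "Submit to Queue": push a job spec with its config/resources
        self._jobs.append(spec)

    def pop(self):
        # Agents pull the oldest pending job, or None when the queue is empty
        return self._jobs.popleft() if self._jobs else None


def run_agent(queue, backend):
    """Illustrative agent loop: drain the queue, executing each job on the backend."""
    results = []
    while (spec := queue.pop()) is not None:
        results.append(backend(spec))
    return results


# Usage: one queue, a fake "local Docker" backend that just reports what it ran
q = JobQueue()
q.submit({"job": "train.py", "config": {"lr": 1e-3}})
q.submit({"job": "train.py", "config": {"lr": 1e-4}})
done = run_agent(q, backend=lambda spec: f"ran {spec['job']} with {spec['config']}")
```

The key property this models: the submitter only knows about the queue, while the backend details live entirely on the agent side.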
Setting Up Launch
Terminal — Install and configure Launch
# Install launch dependencies
pip install "wandb[launch]"
# Start a local Launch Agent (Docker backend)
wandb launch-agent --queue default --entity my-team
# Or configure for Kubernetes
wandb launch-agent --queue k8s-gpu \
  --entity my-team \
  --config launch-config.yaml
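A minimal `launch-config.yaml` for that agent might look like the sketch below. The exact schema depends on your wandb version, so treat the key names (`entity`, `max_jobs`, `queues`, `builder`, `registry`) as a starting point and verify them against the Launch agent documentation; the registry URI is hypothetical:

```yaml
# Sketch of a Launch agent config (verify keys against your wandb version)
entity: my-team
max_jobs: 2            # run at most two jobs concurrently
queues:
  - k8s-gpu
builder:
  type: kaniko         # build images in-cluster; use "docker" with a local daemon
registry:
  uri: registry.example.com/ml-training   # hypothetical image registry
```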
Launching Jobs
Python — Submit a job programmatically
import wandb
# Launch from a git repository
wandb.launch(
    uri="https://github.com/org/ml-training",
    job="train.py",
    project="my-project",
    entity="my-team",
    queue="gpu-queue",
    resource="kubernetes",
    resource_args={
        "kubernetes": {
            "namespace": "ml-training",
            "resources": {
                "requests": {"nvidia.com/gpu": "1", "memory": "16Gi"},
                "limits": {"nvidia.com/gpu": "1", "memory": "32Gi"},
            },
        }
    },
    config={"learning_rate": 0.001, "epochs": 50},
)
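The `resource_args` mapping is plain data, so it can also be kept outside your code as a JSON file and reused across submissions (the `wandb launch` CLI can take resource arguments from a file in recent versions; check `wandb launch --help` for the exact flag on your install). A minimal sketch of writing and validating such a file:

```python
import json

# The same Kubernetes resource_args as above, expressed as plain data
resource_args = {
    "kubernetes": {
        "namespace": "ml-training",
        "resources": {
            "requests": {"nvidia.com/gpu": "1", "memory": "16Gi"},
            "limits": {"nvidia.com/gpu": "1", "memory": "32Gi"},
        },
    }
}

# Write the spec to disk for reuse across submissions
with open("resource_args.json", "w") as f:
    json.dump(resource_args, f, indent=2)

# Round-trip to confirm the file is valid JSON and nothing was lost
with open("resource_args.json") as f:
    loaded = json.load(f)
```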
Supported Backends
| Backend | Use Case | Setup Complexity |
|---|---|---|
| Local Docker | Development, testing | Low |
| Kubernetes | Production, on-prem GPU clusters | Medium |
| AWS SageMaker | AWS-native ML workflows | Medium |
| GCP Vertex AI | GCP-native ML workflows | Medium |
Launch with Sweeps
Python — Run sweeps via Launch
import wandb

# Combine Sweeps with Launch for distributed HPO:
# each sweep run is submitted as a Launch job
sweep_config = {
    "method": "bayes",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-2},
        "batch_size": {"values": [32, 64, 128]},
    },
    "launch": {
        "queue": "gpu-queue",
        "resource": "kubernetes",
    },
}

sweep_id = wandb.sweep(sweep_config, project="launch-sweep")
# Sweep runs are automatically submitted to the queue
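To make the `parameters` block concrete, here is an illustrative sampler over the search space declared above. It uses plain random sampling for clarity; the actual sweep controller runs on W&B's side and, with `method: "bayes"`, chooses points with a Bayesian model rather than this code:

```python
import random


def sample_params(parameters, rng):
    """Draw one hyperparameter combination from a sweep-style parameter spec."""
    out = {}
    for name, spec in parameters.items():
        if "values" in spec:
            # Categorical parameter: pick one of the listed values
            out[name] = rng.choice(spec["values"])
        else:
            # Continuous parameter: sample uniformly in [min, max]
            out[name] = rng.uniform(spec["min"], spec["max"])
    return out


parameters = {
    "learning_rate": {"min": 1e-5, "max": 1e-2},
    "batch_size": {"values": [32, 64, 128]},
}

rng = random.Random(0)
trial = sample_params(parameters, rng)
```

Each combination like `trial` corresponds to one sweep run, which Launch then submits to the configured queue as its own job.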
Key benefit: Launch lets ML engineers focus on experiments while platform engineers manage infrastructure. Researchers submit jobs to queues without needing to know about Kubernetes, Docker, or cloud APIs.
Lilly Tech Systems