W&B Launch
Submit, manage, and track ML training jobs across any compute infrastructure from a unified interface.
What is Launch?
W&B Launch lets you package training code into reproducible jobs and run them on any compute backend: local Docker, Kubernetes, AWS SageMaker, or GCP Vertex AI. It separates "what to run" from "where to run it."
Launch Architecture
Create a Launch Job
Package your training code, dependencies, and configuration into a reproducible job definition.
Configure a Queue
Set up compute queues that point to specific backends (K8s cluster, cloud provider, local Docker).
Submit to Queue
Push jobs to a queue with specific resource requirements and hyperparameters.
Launch Agent Executes
A Launch Agent monitors the queue and executes jobs on the configured compute backend.
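The four steps above boil down to a queue/agent pattern. The sketch below illustrates that pattern in plain Python; names like `JobQueue`, `run_agent`, and the backend callable are hypothetical stand-ins for illustration, not part of the wandb API:

```python
from collections import deque


class JobQueue:
    """Illustrative stand-in for a Launch queue: a FIFO of job specs."""

    def __init__(self):
        self._jobs = deque()

    def submit(self, spec):
        # "Submit to Queue": push a job spec with its config/resources
        self._jobs.append(spec)

    def pop(self):
        # Agents pull the oldest pending job, or None when the queue is empty
        return self._jobs.popleft() if self._jobs else None


def run_agent(queue, backend):
    """Illustrative agent loop: drain the queue, executing each job on the backend."""
    results = []
    while (spec := queue.pop()) is not None:
        results.append(backend(spec))
    return results


# Usage: one queue, a fake "local Docker" backend that just reports what it ran
q = JobQueue()
q.submit({"job": "train.py", "config": {"lr": 1e-3}})
q.submit({"job": "train.py", "config": {"lr": 1e-4}})
done = run_agent(q, backend=lambda spec: f"ran {spec['job']} with {spec['config']}")
```

The key property this models: the submitter only knows about the queue, while the backend details live entirely on the agent side.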
Setting Up Launch
Terminal — Install and configure Launch
# Install launch dependencies
pip install "wandb[launch]"
# Start a local Launch Agent (Docker backend)
wandb launch-agent --queue default --entity my-team
# Or configure for Kubernetes
wandb launch-agent --queue k8s-gpu \
  --entity my-team \
  --config launch-config.yaml
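A minimal `launch-config.yaml` for that agent might look like the sketch below. The exact schema depends on your wandb version, so treat the key names (`entity`, `max_jobs`, `queues`, `builder`, `registry`) as a starting point and verify them against the Launch agent documentation; the registry URI is hypothetical:

```yaml
# Sketch of a Launch agent config (verify keys against your wandb version)
entity: my-team
max_jobs: 2            # run at most two jobs concurrently
queues:
  - k8s-gpu
builder:
  type: kaniko         # build images in-cluster; use "docker" with a local daemon
registry:
  uri: registry.example.com/ml-training   # hypothetical image registry
```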
Launching Jobs
Python — Submit a job programmatically
import wandb
# Launch from a git repository
wandb.launch(
    uri="https://github.com/org/ml-training",
    job="train.py",
    project="my-project",
    entity="my-team",
    queue="gpu-queue",
    resource="kubernetes",
    resource_args={
        "kubernetes": {
            "namespace": "ml-training",
            "resources": {
                "requests": {"nvidia.com/gpu": "1", "memory": "16Gi"},
                "limits": {"nvidia.com/gpu": "1", "memory": "32Gi"},
            },
        }
    },
    config={"learning_rate": 0.001, "epochs": 50},
)
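The `resource_args` mapping is plain data, so it can also be kept outside your code as a JSON file and reused across submissions (the `wandb launch` CLI can take resource arguments from a file in recent versions; check `wandb launch --help` for the exact flag on your install). A minimal sketch of writing and validating such a file:

```python
import json

# The same Kubernetes resource_args as above, expressed as plain data
resource_args = {
    "kubernetes": {
        "namespace": "ml-training",
        "resources": {
            "requests": {"nvidia.com/gpu": "1", "memory": "16Gi"},
            "limits": {"nvidia.com/gpu": "1", "memory": "32Gi"},
        },
    }
}

# Write the spec to disk for reuse across submissions
with open("resource_args.json", "w") as f:
    json.dump(resource_args, f, indent=2)

# Round-trip to confirm the file is valid JSON and nothing was lost
with open("resource_args.json") as f:
    loaded = json.load(f)
```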
Supported Backends
| Backend | Use Case | Setup Complexity |
|---|---|---|
| Local Docker | Development, testing | Low |
| Kubernetes | Production, on-prem GPU clusters | Medium |
| AWS SageMaker | AWS-native ML workflows | Medium |
| GCP Vertex AI | GCP-native ML workflows | Medium |
Launch with Sweeps
Python — Run sweeps via Launch
import wandb

# Combine Sweeps with Launch for distributed HPO:
# each sweep run is submitted as a Launch job
sweep_config = {
    "method": "bayes",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-2},
        "batch_size": {"values": [32, 64, 128]},
    },
    "launch": {
        "queue": "gpu-queue",
        "resource": "kubernetes",
    },
}

sweep_id = wandb.sweep(sweep_config, project="launch-sweep")
# Sweep runs are automatically submitted to the queue
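To make the `parameters` block concrete, here is an illustrative sampler over the search space declared above. It uses plain random sampling for clarity; the actual sweep controller runs on W&B's side and, with `method: "bayes"`, chooses points with a Bayesian model rather than this code:

```python
import random


def sample_params(parameters, rng):
    """Draw one hyperparameter combination from a sweep-style parameter spec."""
    out = {}
    for name, spec in parameters.items():
        if "values" in spec:
            # Categorical parameter: pick one of the listed values
            out[name] = rng.choice(spec["values"])
        else:
            # Continuous parameter: sample uniformly in [min, max]
            out[name] = rng.uniform(spec["min"], spec["max"])
    return out


parameters = {
    "learning_rate": {"min": 1e-5, "max": 1e-2},
    "batch_size": {"values": [32, 64, 128]},
}

rng = random.Random(0)
trial = sample_params(parameters, rng)
```

Each combination like `trial` corresponds to one sweep run, which Launch then submits to the configured queue as its own job.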
Key benefit: Launch lets ML engineers focus on experiments while platform engineers manage infrastructure. Researchers submit jobs to queues without needing to know about Kubernetes, Docker, or cloud APIs.
Lilly Tech Systems