W&B Launch

Submit, manage, and track ML training jobs across any compute infrastructure from a unified interface.

What is Launch?

W&B Launch lets you package training code into reproducible jobs and run them on any compute backend: local Docker, Kubernetes, AWS SageMaker, or GCP Vertex AI. It separates "what to run" from "where to run it."

Launch Architecture

  1. Create a Launch Job

    Package your training code, dependencies, and configuration into a reproducible job definition.

  2. Configure a Queue

    Set up compute queues that point to specific backends (K8s cluster, cloud provider, local Docker).

  3. Submit to Queue

    Push jobs to a queue with specific resource requirements and hyperparameters.

  4. Launch Agent executes

    A Launch Agent monitors the queue and executes jobs on the configured compute backend.

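The queue/agent pattern in the steps above can be sketched in plain Python. This is a conceptual illustration only, not wandb internals: `JobSpec`, `submit`, and `agent_poll_once` are made-up names standing in for the Launch job definition, queue submission, and agent polling loop.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class JobSpec:
    """Illustrative stand-in for a Launch job: code reference + config."""
    uri: str
    entry_point: str
    config: dict = field(default_factory=dict)


job_queue: "queue.Queue[JobSpec]" = queue.Queue()


def submit(job: JobSpec) -> None:
    """Researcher side: push a job description onto the queue."""
    job_queue.put(job)


def agent_poll_once(execute) -> bool:
    """Agent side: pop one job, if any, and run it on the backend."""
    try:
        job = job_queue.get_nowait()
    except queue.Empty:
        return False
    execute(job)
    return True


# Demo: submit two jobs, then let the "agent" drain the queue.
submit(JobSpec("https://github.com/org/ml-training", "train.py", {"lr": 1e-3}))
submit(JobSpec("https://github.com/org/ml-training", "train.py", {"lr": 1e-4}))

executed = []
while agent_poll_once(executed.append):
    pass
print(len(executed))  # → 2
```

The point of the split is visible even in this toy version: the submitter only describes *what* to run; the agent owns *where and how* it runs.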
Setting Up Launch

Terminal — Install and configure Launch
# Install launch dependencies
pip install "wandb[launch]"

# Start a local Launch Agent (Docker backend)
wandb launch-agent --queue default --entity my-team

# Or configure for Kubernetes
wandb launch-agent --queue k8s-gpu \
    --entity my-team \
    --config launch-config.yaml
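The `launch-config.yaml` passed to the agent describes which queues it serves and how it builds job images. A minimal sketch is below; the keys shown are typical agent-config fields, but check the Launch agent documentation for your wandb version before relying on them.

```yaml
# launch-config.yaml — agent configuration (illustrative values)
entity: my-team
max_jobs: 2          # run up to 2 jobs concurrently
queues:
  - k8s-gpu
builder:
  type: docker       # build job images with local Docker
```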

Launching Jobs

Python — Submit a job programmatically
from wandb.sdk.launch import launch_add  # queues a job for a Launch Agent

# Submit a job from a git repository to the gpu-queue.
# (Exact parameter names can vary across SDK versions; see the
# wandb.sdk.launch reference for your installed version.)
launch_add(
    uri="https://github.com/org/ml-training",
    entry_point="train.py",
    project="my-project",
    entity="my-team",
    queue_name="gpu-queue",
    resource="kubernetes",
    resource_args={
        "kubernetes": {
            "namespace": "ml-training",
            "resources": {
                "requests": {"nvidia.com/gpu": "1", "memory": "16Gi"},
                "limits": {"nvidia.com/gpu": "1", "memory": "32Gi"},
            },
        }
    },
    config={"learning_rate": 0.001, "epochs": 50},
)
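The `resource_args` block is backend-specific: for Kubernetes, the agent folds it into the job's pod spec. A plain-Python sketch of that kind of merge follows; `deep_merge` and `default_spec` are illustrative, not wandb internals.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base (override wins on conflicts)."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Default container settings an agent might start from (illustrative).
default_spec = {
    "namespace": "default",
    "resources": {"requests": {"memory": "4Gi"}},
}

# The per-job overrides from resource_args["kubernetes"].
resource_args = {
    "namespace": "ml-training",
    "resources": {
        "requests": {"nvidia.com/gpu": "1", "memory": "16Gi"},
        "limits": {"nvidia.com/gpu": "1", "memory": "32Gi"},
    },
}

spec = deep_merge(default_spec, resource_args)
print(spec["namespace"])                        # → ml-training
print(spec["resources"]["requests"]["memory"])  # → 16Gi
```

Note that per-job values override the defaults (namespace, request sizes) while untouched defaults survive the merge.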

Supported Backends

Backend          Use Case                           Setup Complexity
Local Docker     Development, testing               Low
Kubernetes       Production, on-prem GPU clusters   Medium
AWS SageMaker    AWS-native ML workflows            Medium
GCP Vertex AI    GCP-native ML workflows            Medium

Launch with Sweeps

Python — Run sweeps via Launch
# Combine Sweeps with Launch for distributed HPO: each sweep run is
# submitted to a Launch queue instead of running on the local machine.
sweep_config = {
    "method": "bayes",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-2},
        "batch_size": {"values": [32, 64, 128]},
    },
}

# Save the config to a YAML file and create the sweep through Launch,
# naming the target queue at creation time:
#   wandb launch-sweep sweep.yaml --queue gpu-queue --project launch-sweep
# Sweep runs are then submitted to the queue automatically.
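To get a feel for the search space the sweep explores, it can be sampled with plain Python. Uniform random sampling is shown purely for illustration; a `bayes` sweep controller picks points adaptively, and `sample_point` is a made-up helper, not a wandb API.

```python
import random

random.seed(0)


def sample_point(parameters: dict) -> dict:
    """Draw one configuration from a sweep-style parameter spec."""
    point = {}
    for name, spec in parameters.items():
        if "values" in spec:          # categorical: pick one listed value
            point[name] = random.choice(spec["values"])
        else:                         # continuous: uniform over [min, max]
            point[name] = random.uniform(spec["min"], spec["max"])
    return point


parameters = {
    "learning_rate": {"min": 1e-5, "max": 1e-2},
    "batch_size": {"values": [32, 64, 128]},
}

points = [sample_point(parameters) for _ in range(3)]
print(len(points))  # → 3
```

Each sampled point corresponds to one Launch job, so queue capacity and agent concurrency bound how many of these configurations run in parallel.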

Key benefit: Launch lets ML engineers focus on experiments while platform engineers manage infrastructure. Researchers submit jobs to queues without needing to know about Kubernetes, Docker, or cloud APIs.