GPU & TPU Acceleration

One of Colab's most powerful features is free access to GPUs and TPUs for accelerating machine learning workloads. Learn how to enable GPU/TPU, check hardware specifications, train models on accelerators, and manage runtime limits.

Enabling GPU

By default, Colab uses a CPU runtime. To enable GPU acceleration:

  1. Open Runtime Settings

    Go to Runtime → Change runtime type from the menu bar.

  2. Select GPU

    In the "Hardware accelerator" dropdown, select GPU (or TPU for tensor processing).

  3. Save and Connect

    Click Save. Colab will restart the runtime and connect you to a GPU-enabled virtual machine.

💡 Note: Changing the runtime type will restart your session. All variables and installed packages will be lost. Run your setup cells again after switching.

GPU Types Available

GPU          VRAM    Tier                Best For
NVIDIA T4    16 GB   Free / Pro          Inference, fine-tuning small models, general ML
NVIDIA V100  16 GB   Pro / Pro+          Training medium models, faster computation
NVIDIA A100  40 GB   Pro+ / Enterprise   Training large models, LLM fine-tuning
NVIDIA L4    24 GB   Pro / Pro+          Inference optimization, efficient training
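Since you cannot choose which GPU you are assigned, it can help to adapt hyperparameters to the detected hardware. The sketch below picks a starting batch size from the detected VRAM; the thresholds are illustrative starting points of our own, not official recommendations, and `suggest_batch_size` is a hypothetical helper name.

```python
import torch

def suggest_batch_size(default: int = 32) -> int:
    """Pick a starting batch size from the detected GPU's VRAM.

    The thresholds are illustrative -- tune them for your own model.
    """
    if not torch.cuda.is_available():
        return default  # CPU fallback
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    if vram_gb >= 40:   # e.g. A100
        return 256
    if vram_gb >= 24:   # e.g. L4
        return 128
    return 64           # e.g. T4 / V100 (16 GB)

print(f"Suggested batch size: {suggest_batch_size()}")
```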

Checking Your GPU

Verify your GPU allocation and check hardware details:

# Check GPU with nvidia-smi
!nvidia-smi

# Check GPU in Python
import torch
print(f"GPU Available: {torch.cuda.is_available()}")
print(f"GPU Name: {torch.cuda.get_device_name(0)}")
print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

# TensorFlow GPU check
import tensorflow as tf
print(f"TF GPUs: {tf.config.list_physical_devices('GPU')}")

TPU Access

Google's Tensor Processing Units (TPUs) are custom accelerators designed for machine learning:

# Check TPU availability (this env var is set on legacy TPU runtimes;
# newer TPU VM runtimes may not define it)
import os
tpu_address = os.environ.get('COLAB_TPU_ADDR')
print(f"TPU Address: {tpu_address}")

# Using TPU with TensorFlow
import tensorflow as tf
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

print(f"Number of TPU cores: {strategy.num_replicas_in_sync}")
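Once the strategy exists, models should be built and compiled inside strategy.scope() so their variables are placed and replicated by the strategy. A minimal sketch, with a fallback to TensorFlow's default strategy (our addition, for running the same code off-TPU):

```python
import tensorflow as tf

# Try to attach to a TPU; fall back to the default (CPU/GPU) strategy
# so the same code runs on any runtime.
try:
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
except (ValueError, tf.errors.NotFoundError):
    strategy = tf.distribute.get_strategy()  # default single-device strategy

# Variables created inside scope() are managed by the strategy
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer='adam',
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

print(f"Replicas in sync: {strategy.num_replicas_in_sync}")
```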

CUDA and cuDNN

Colab comes with CUDA and cuDNN pre-installed and configured:

# Check CUDA version
!nvcc --version

# Check cuDNN version
!cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2

# Verify CUDA works with PyTorch
import torch
print(f"CUDA Version: {torch.version.cuda}")
print(f"cuDNN Version: {torch.backends.cudnn.version()}")
print(f"cuDNN Enabled: {torch.backends.cudnn.enabled}")
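A quick way to confirm the CUDA stack actually works end to end is to time a large matrix multiply on whatever device is available. A minimal sketch (the matrix size is arbitrary; note that CUDA kernels launch asynchronously, so timing needs an explicit synchronize):

```python
import time
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
a = torch.randn(2048, 2048, device=device)
b = torch.randn(2048, 2048, device=device)

if device.type == 'cuda':
    torch.cuda.synchronize()  # make sure setup kernels have finished
start = time.perf_counter()
c = a @ b
if device.type == 'cuda':
    torch.cuda.synchronize()  # wait for the matmul before stopping the clock
elapsed = time.perf_counter() - start

print(f"{device}: 2048x2048 matmul took {elapsed * 1000:.1f} ms, result {tuple(c.shape)}")
```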

Training Models on GPU

PyTorch on GPU

import torch
import torch.nn as nn

# Move model and data to GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10)
).to(device)

# Training loop
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Move data to GPU
x = torch.randn(64, 784).to(device)
y = torch.randint(0, 10, (64,)).to(device)

# Forward and backward pass
output = model(x)
loss = criterion(output, y)
optimizer.zero_grad()  # clear gradients left over from any previous step
loss.backward()
optimizer.step()
print(f"Loss: {loss.item():.4f}")
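With a real dataset, the host-to-GPU copy in the loop above can become a bottleneck. One common mitigation is to load batches into page-locked (pinned) host memory and transfer them with non_blocking=True, which lets the copy overlap with GPU compute. A sketch using a synthetic dataset (the data is a stand-in; pin_memory only helps when a CUDA device is present):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Synthetic stand-in for a real dataset
dataset = TensorDataset(torch.randn(512, 784), torch.randint(0, 10, (512,)))

# pin_memory keeps batches in page-locked host RAM, so the
# .to(device, non_blocking=True) copy can overlap with compute
loader = DataLoader(dataset, batch_size=64, shuffle=True,
                    pin_memory=torch.cuda.is_available())

for x, y in loader:
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
    # ... forward/backward pass as in the loop above ...
    break  # one batch is enough for the demo

print(f"Batch on {x.device}: {tuple(x.shape)}")
```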

TensorFlow on GPU

import tensorflow as tf

# TensorFlow automatically uses GPU when available
print(f"GPUs: {tf.config.list_physical_devices('GPU')}")

# Build and train a model
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Training will automatically use GPU
import numpy as np
x_train = np.random.randn(1000, 784)
y_train = np.random.randint(0, 10, 1000)
model.fit(x_train, y_train, epochs=5, batch_size=32)
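Keras can also train in mixed precision with a one-line global policy: compute runs in float16 (using tensor cores on T4/V100/A100) while variables stay in float32. A minimal sketch; keeping the final softmax in float32 is a common stability choice, and the data here is synthetic:

```python
import numpy as np
import tensorflow as tf

# Compute in float16, keep variables in float32. On CPU this still
# runs, but without any speedup.
tf.keras.mixed_precision.set_global_policy('mixed_float16')

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    # Keep the output layer in float32 for numerical stability
    tf.keras.layers.Dense(10, activation='softmax', dtype='float32'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

x = np.random.randn(128, 784).astype('float32')
y = np.random.randint(0, 10, 128)
model.fit(x, y, epochs=1, batch_size=32, verbose=0)

print(f"Compute dtype: {tf.keras.mixed_precision.global_policy().compute_dtype}")
```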

Runtime Limits

Tier   Max Session   Idle Timeout   GPU Priority
Free   ~12 hours     ~90 minutes    Standard (may be preempted)
Pro    ~24 hours     ~90 minutes    Higher priority, background execution
Pro+   ~24 hours     ~90 minutes    Highest priority, background execution
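Because sessions can time out or be preempted, it pays to checkpoint training state regularly so you can resume instead of restarting. A minimal sketch, assuming a PyTorch model; in Colab you would point the path at a mounted Drive folder (e.g. under /content/drive/MyDrive/), and the helper names here are our own:

```python
import torch
import torch.nn as nn

CKPT_PATH = 'checkpoint.pt'  # stand-in path; use a Drive path in Colab

model = nn.Linear(784, 10)
optimizer = torch.optim.Adam(model.parameters())

def save_checkpoint(step: int) -> None:
    # Save everything needed to resume: step, model and optimizer state
    torch.save({'step': step,
                'model': model.state_dict(),
                'optimizer': optimizer.state_dict()}, CKPT_PATH)

def load_checkpoint() -> int:
    state = torch.load(CKPT_PATH)
    model.load_state_dict(state['model'])
    optimizer.load_state_dict(state['optimizer'])
    return state['step']

save_checkpoint(step=100)
print(f"Resumed at step {load_checkpoint()}")
```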

Memory Management

GPU memory is limited. Use these techniques to avoid out-of-memory errors:

# Check GPU memory usage
!nvidia-smi

# Clear GPU cache (PyTorch)
torch.cuda.empty_cache()

# Monitor memory in Python
print(f"Allocated: {torch.cuda.memory_allocated()/1e9:.2f} GB")
print(f"Cached: {torch.cuda.memory_reserved()/1e9:.2f} GB")

# Use mixed precision for lower memory usage
from torch.cuda.amp import autocast, GradScaler
scaler = GradScaler()

with autocast():
    output = model(x)
    loss = criterion(output, y)

# Scale the loss so small float16 gradients don't underflow
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()

# Reduce batch size if running out of memory
# Use gradient accumulation for effective larger batches

💡 Tip: If you run out of GPU memory, try: (1) reducing batch size, (2) using mixed precision training, (3) clearing cache with torch.cuda.empty_cache(), or (4) restarting the runtime from Runtime → Restart runtime.
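The gradient-accumulation suggestion above can be sketched as follows: run several small micro-batches, accumulate their gradients in place, and apply one optimizer step, giving the effect of a larger batch without the memory cost. The model and batch sizes here are illustrative:

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = nn.Linear(784, 10).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

ACCUM_STEPS = 4  # 4 micro-batches of 16 ~ effective batch size 64

optimizer.zero_grad()
for step in range(ACCUM_STEPS):
    x = torch.randn(16, 784, device=device)   # small micro-batch fits in VRAM
    y = torch.randint(0, 10, (16,), device=device)
    loss = criterion(model(x), y) / ACCUM_STEPS  # scale so grads average out
    loss.backward()                              # gradients accumulate in place

optimizer.step()       # one update from the accumulated gradient
optimizer.zero_grad()
print(f"Applied one optimizer step from {ACCUM_STEPS} micro-batches")
```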