GPU & TPU Acceleration

One of Colab's most powerful features is free access to GPUs and TPUs for accelerating machine learning workloads. Learn how to enable GPU/TPU, check hardware specifications, train models on accelerators, and manage runtime limits.

Enabling GPU

By default, Colab uses a CPU runtime. To enable GPU acceleration:

  1. Open Runtime Settings

    Go to Runtime → Change runtime type from the menu bar.

  2. Select GPU

    In the "Hardware accelerator" dropdown, select GPU (or TPU for tensor processing).

  3. Save and Connect

    Click Save. Colab will restart the runtime and connect you to a GPU-enabled virtual machine.

💡 Note: Changing the runtime type will restart your session. All variables and installed packages will be lost. Run your setup cells again after switching.

GPU Types Available

GPU          VRAM    Tier                Best For
NVIDIA T4    16 GB   Free / Pro          Inference, fine-tuning small models, general ML
NVIDIA V100  16 GB   Pro / Pro+          Training medium models, faster computation
NVIDIA A100  40 GB   Pro+ / Enterprise   Training large models, LLM fine-tuning
NVIDIA L4    24 GB   Pro / Pro+          Inference optimization, efficient training
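Since you cannot choose which GPU you are assigned, it can help to adapt hyperparameters to the detected hardware. The sketch below picks a starting batch size from the detected VRAM; the thresholds are illustrative starting points of our own, not official recommendations, and `suggest_batch_size` is a hypothetical helper name.

```python
import torch

def suggest_batch_size(default: int = 32) -> int:
    """Pick a starting batch size from the detected GPU's VRAM.

    The thresholds are illustrative -- tune them for your own model.
    """
    if not torch.cuda.is_available():
        return default  # CPU fallback
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    if vram_gb >= 40:   # e.g. A100
        return 256
    if vram_gb >= 24:   # e.g. L4
        return 128
    return 64           # e.g. T4 / V100 (16 GB)

print(f"Suggested batch size: {suggest_batch_size()}")
```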

Checking Your GPU

Verify your GPU allocation and check hardware details:

# Check GPU with nvidia-smi
!nvidia-smi

# Check GPU in Python
import torch
print(f"GPU Available: {torch.cuda.is_available()}")
print(f"GPU Name: {torch.cuda.get_device_name(0)}")
print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

# TensorFlow GPU check
import tensorflow as tf
print(f"TF GPUs: {tf.config.list_physical_devices('GPU')}")

TPU Access

Google's Tensor Processing Units (TPUs) are custom accelerators designed for machine learning:

# Check TPU availability (this env var is set on legacy TPU runtimes;
# newer TPU VM runtimes may not define it)
import os
tpu_address = os.environ.get('COLAB_TPU_ADDR')
print(f"TPU Address: {tpu_address}")

# Using TPU with TensorFlow
import tensorflow as tf
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

print(f"Number of TPU cores: {strategy.num_replicas_in_sync}")
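Once the strategy exists, models should be built and compiled inside strategy.scope() so their variables are placed and replicated by the strategy. A minimal sketch, with a fallback to TensorFlow's default strategy (our addition, for running the same code off-TPU):

```python
import tensorflow as tf

# Try to attach to a TPU; fall back to the default (CPU/GPU) strategy
# so the same code runs on any runtime.
try:
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
except (ValueError, tf.errors.NotFoundError):
    strategy = tf.distribute.get_strategy()  # default single-device strategy

# Variables created inside scope() are managed by the strategy
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer='adam',
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

print(f"Replicas in sync: {strategy.num_replicas_in_sync}")
```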

CUDA and cuDNN

Colab comes with CUDA and cuDNN pre-installed and configured:

# Check CUDA version
!nvcc --version

# Check cuDNN version
!cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2

# Verify CUDA works with PyTorch
import torch
print(f"CUDA Version: {torch.version.cuda}")
print(f"cuDNN Version: {torch.backends.cudnn.version()}")
print(f"cuDNN Enabled: {torch.backends.cudnn.enabled}")
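A quick way to confirm the CUDA stack actually works end to end is to time a large matrix multiply on whatever device is available. A minimal sketch (the matrix size is arbitrary; note that CUDA kernels launch asynchronously, so timing needs an explicit synchronize):

```python
import time
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
a = torch.randn(2048, 2048, device=device)
b = torch.randn(2048, 2048, device=device)

if device.type == 'cuda':
    torch.cuda.synchronize()  # make sure setup kernels have finished
start = time.perf_counter()
c = a @ b
if device.type == 'cuda':
    torch.cuda.synchronize()  # wait for the matmul before stopping the clock
elapsed = time.perf_counter() - start

print(f"{device}: 2048x2048 matmul took {elapsed * 1000:.1f} ms, result {tuple(c.shape)}")
```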

Training Models on GPU

PyTorch on GPU

import torch
import torch.nn as nn

# Move model and data to GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10)
).to(device)

# Training loop
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Move data to GPU
x = torch.randn(64, 784).to(device)
y = torch.randint(0, 10, (64,)).to(device)

# Forward and backward pass
output = model(x)
loss = criterion(output, y)
optimizer.zero_grad()  # clear gradients left over from any previous step
loss.backward()
optimizer.step()
print(f"Loss: {loss.item():.4f}")
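With a real dataset, the host-to-GPU copy in the loop above can become a bottleneck. One common mitigation is to load batches into page-locked (pinned) host memory and transfer them with non_blocking=True, which lets the copy overlap with GPU compute. A sketch using a synthetic dataset (the data is a stand-in; pin_memory only helps when a CUDA device is present):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Synthetic stand-in for a real dataset
dataset = TensorDataset(torch.randn(512, 784), torch.randint(0, 10, (512,)))

# pin_memory keeps batches in page-locked host RAM, so the
# .to(device, non_blocking=True) copy can overlap with compute
loader = DataLoader(dataset, batch_size=64, shuffle=True,
                    pin_memory=torch.cuda.is_available())

for x, y in loader:
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
    # ... forward/backward pass as in the loop above ...
    break  # one batch is enough for the demo

print(f"Batch on {x.device}: {tuple(x.shape)}")
```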

TensorFlow on GPU

import tensorflow as tf

# TensorFlow automatically uses GPU when available
print(f"GPUs: {tf.config.list_physical_devices('GPU')}")

# Build and train a model
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Training will automatically use GPU
import numpy as np
x_train = np.random.randn(1000, 784)
y_train = np.random.randint(0, 10, 1000)
model.fit(x_train, y_train, epochs=5, batch_size=32)
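Keras can also train in mixed precision with a one-line global policy: compute runs in float16 (using tensor cores on T4/V100/A100) while variables stay in float32. A minimal sketch; keeping the final softmax in float32 is a common stability choice, and the data here is synthetic:

```python
import numpy as np
import tensorflow as tf

# Compute in float16, keep variables in float32. On CPU this still
# runs, but without any speedup.
tf.keras.mixed_precision.set_global_policy('mixed_float16')

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    # Keep the output layer in float32 for numerical stability
    tf.keras.layers.Dense(10, activation='softmax', dtype='float32'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

x = np.random.randn(128, 784).astype('float32')
y = np.random.randint(0, 10, 128)
model.fit(x, y, epochs=1, batch_size=32, verbose=0)

print(f"Compute dtype: {tf.keras.mixed_precision.global_policy().compute_dtype}")
```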

Runtime Limits

Tier   Max Session   Idle Timeout   GPU Priority
Free   ~12 hours     ~90 minutes    Standard (may be preempted)
Pro    ~24 hours     ~90 minutes    Higher priority, background execution
Pro+   ~24 hours     ~90 minutes    Highest priority, background execution
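Because sessions can time out or be preempted, it pays to checkpoint training state regularly so you can resume instead of restarting. A minimal sketch, assuming a PyTorch model; in Colab you would point the path at a mounted Drive folder (e.g. under /content/drive/MyDrive/), and the helper names here are our own:

```python
import torch
import torch.nn as nn

CKPT_PATH = 'checkpoint.pt'  # stand-in path; use a Drive path in Colab

model = nn.Linear(784, 10)
optimizer = torch.optim.Adam(model.parameters())

def save_checkpoint(step: int) -> None:
    # Save everything needed to resume: step, model and optimizer state
    torch.save({'step': step,
                'model': model.state_dict(),
                'optimizer': optimizer.state_dict()}, CKPT_PATH)

def load_checkpoint() -> int:
    state = torch.load(CKPT_PATH)
    model.load_state_dict(state['model'])
    optimizer.load_state_dict(state['optimizer'])
    return state['step']

save_checkpoint(step=100)
print(f"Resumed at step {load_checkpoint()}")
```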

Memory Management

GPU memory is limited. Use these techniques to avoid out-of-memory errors:

# Check GPU memory usage
!nvidia-smi

# Clear GPU cache (PyTorch)
torch.cuda.empty_cache()

# Monitor memory in Python
print(f"Allocated: {torch.cuda.memory_allocated()/1e9:.2f} GB")
print(f"Cached: {torch.cuda.memory_reserved()/1e9:.2f} GB")

# Use mixed precision for lower memory usage
from torch.cuda.amp import autocast, GradScaler
scaler = GradScaler()

with autocast():
    output = model(x)
    loss = criterion(output, y)

# Scale the loss so small float16 gradients don't underflow
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()

# Reduce batch size if running out of memory
# Use gradient accumulation for effective larger batches

💡 Tip: If you run out of GPU memory, try: (1) reducing batch size, (2) using mixed precision training, (3) clearing cache with torch.cuda.empty_cache(), or (4) restarting the runtime from Runtime → Restart runtime.
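The gradient-accumulation suggestion above can be sketched as follows: run several small micro-batches, accumulate their gradients in place, and apply one optimizer step, giving the effect of a larger batch without the memory cost. The model and batch sizes here are illustrative:

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = nn.Linear(784, 10).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

ACCUM_STEPS = 4  # 4 micro-batches of 16 ~ effective batch size 64

optimizer.zero_grad()
for step in range(ACCUM_STEPS):
    x = torch.randn(16, 784, device=device)   # small micro-batch fits in VRAM
    y = torch.randint(0, 10, (16,), device=device)
    loss = criterion(model(x), y) / ACCUM_STEPS  # scale so grads average out
    loss.backward()                              # gradients accumulate in place

optimizer.step()       # one update from the accumulated gradient
optimizer.zero_grad()
print(f"Applied one optimizer step from {ACCUM_STEPS} micro-batches")
```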