GPU & TPU Acceleration
One of Colab's most powerful features is free access to GPUs and TPUs for accelerating machine learning workloads. Learn how to enable GPU/TPU, check hardware specifications, train models on accelerators, and manage runtime limits.
Enabling GPU
By default, Colab uses a CPU runtime. To enable GPU acceleration:
1. **Open Runtime Settings.** Go to Runtime → Change runtime type from the menu bar.
2. **Select GPU.** In the "Hardware accelerator" dropdown, select GPU (or TPU for tensor processing).
3. **Save and Connect.** Click Save. Colab will restart the runtime and connect you to a GPU-enabled virtual machine.
GPU Types Available
| GPU | VRAM | Tier | Best For |
|---|---|---|---|
| NVIDIA T4 | 16 GB | Free / Pro | Inference, fine-tuning small models, general ML |
| NVIDIA V100 | 16 GB | Pro / Pro+ | Training medium models, faster computation |
| NVIDIA A100 | 40 GB | Pro+ / Enterprise | Training large models, LLM fine-tuning |
| NVIDIA L4 | 24 GB | Pro / Pro+ | Inference optimization, efficient training |
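As a rough illustration of how the VRAM column above can drive configuration, here is a sketch that scales a batch size with available memory. The `GPU_VRAM_GB` mapping mirrors the table; `suggest_batch_size` and its linear scaling rule are made-up helpers for illustration, not a Colab API:

```python
# VRAM per Colab GPU model, in GB (mirrors the table above)
GPU_VRAM_GB = {"T4": 16, "V100": 16, "A100": 40, "L4": 24}

def suggest_batch_size(gpu_name: str, base_batch: int = 32, base_vram: int = 16) -> int:
    """Scale a batch size linearly with VRAM (a crude heuristic)."""
    for model, vram in GPU_VRAM_GB.items():
        if model in gpu_name:
            return base_batch * vram // base_vram
    return base_batch  # unknown GPU: keep the conservative default

print(suggest_batch_size("NVIDIA A100-SXM4-40GB"))  # 80
print(suggest_batch_size("Tesla T4"))               # 32
```

In practice you would pass `torch.cuda.get_device_name(0)` as `gpu_name`; real memory headroom also depends on model size and activations, so treat this as a starting point, not a rule.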
Checking Your GPU
Verify your GPU allocation and check hardware details:
```python
# Check GPU with nvidia-smi
!nvidia-smi

# Check GPU in Python
import torch
print(f"GPU Available: {torch.cuda.is_available()}")
print(f"GPU Name: {torch.cuda.get_device_name(0)}")
print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

# TensorFlow GPU check
import tensorflow as tf
print(f"TF GPUs: {tf.config.list_physical_devices('GPU')}")
```
TPU Access
Google's Tensor Processing Units (TPUs) are custom accelerators designed for machine learning:
```python
# Check TPU availability
import os
tpu_address = os.environ.get('COLAB_TPU_ADDR')
print(f"TPU Address: {tpu_address}")

# Using TPU with TensorFlow
import tensorflow as tf
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)
print(f"Number of TPU cores: {strategy.num_replicas_in_sync}")
```
CUDA and cuDNN
Colab comes with CUDA and cuDNN pre-installed and configured:
```python
# Check CUDA version
!nvcc --version

# Check cuDNN version
!cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2

# Verify CUDA works with PyTorch
import torch
print(f"CUDA Version: {torch.version.cuda}")
print(f"cuDNN Version: {torch.backends.cudnn.version()}")
print(f"cuDNN Enabled: {torch.backends.cudnn.enabled}")
```
Training Models on GPU
PyTorch on GPU
```python
import torch
import torch.nn as nn

# Move the model to the GPU if one is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10)
).to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Move data to the same device as the model
x = torch.randn(64, 784).to(device)
y = torch.randint(0, 10, (64,)).to(device)

# One training step: zero grads, forward, backward, update
optimizer.zero_grad()
output = model(x)
loss = criterion(output, y)
loss.backward()
optimizer.step()
print(f"Loss: {loss.item():.4f}")
```
TensorFlow on GPU
```python
import tensorflow as tf
import numpy as np

# TensorFlow automatically uses the GPU when available
print(f"GPUs: {tf.config.list_physical_devices('GPU')}")

# Build and train a model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Training will automatically run on the GPU
x_train = np.random.randn(1000, 784)
y_train = np.random.randint(0, 10, 1000)
model.fit(x_train, y_train, epochs=5, batch_size=32)
```
Runtime Limits
| Tier | Max Session | Idle Timeout | GPU Priority |
|---|---|---|---|
| Free | ~12 hours | ~90 minutes | Standard (may be preempted) |
| Pro | ~24 hours | ~90 minutes | Higher priority, background execution |
| Pro+ | ~24 hours | ~90 minutes | Highest priority, background execution |
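Because a session can end at the limits above (or earlier on the free tier, where GPUs may be preempted), it pays to checkpoint on a schedule. A minimal sketch of the scheduling logic; the 12-hour cap and the 30-minute interval are illustrative choices, and saving the actual checkpoint is left to your framework's own API:

```python
import time

SESSION_LIMIT_S = 12 * 3600    # free-tier session cap (~12 h)
CHECKPOINT_EVERY_S = 30 * 60   # save every 30 minutes (arbitrary choice)

def should_checkpoint(start: float, last_save: float, now: float) -> bool:
    """Checkpoint on the regular interval, or when the session cap is near."""
    near_limit = (now - start) > SESSION_LIMIT_S - CHECKPOINT_EVERY_S
    interval_due = (now - last_save) >= CHECKPOINT_EVERY_S
    return interval_due or near_limit

# Example: 31 minutes into training with no save yet -> time to save
print(should_checkpoint(start=0.0, last_save=0.0, now=31 * 60))  # True
```

In a real loop you would call this once per epoch (with `time.time()` as `now`) and write checkpoints to Google Drive so they survive a disconnect.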
Memory Management
GPU memory is limited. Use these techniques to avoid out-of-memory errors:
```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Check GPU memory usage
!nvidia-smi

# Clear GPU cache (PyTorch)
torch.cuda.empty_cache()

# Monitor memory in Python
print(f"Allocated: {torch.cuda.memory_allocated()/1e9:.2f} GB")
print(f"Cached: {torch.cuda.memory_reserved()/1e9:.2f} GB")

# Use mixed precision for lower memory usage
scaler = GradScaler()
with autocast():
    output = model(x)
    loss = criterion(output, y)
# Scale the loss, then step and update through the scaler
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```
If you still hit out-of-memory errors, try (1) reducing the batch size, (2) using gradient accumulation to keep an effectively larger batch, (3) calling torch.cuda.empty_cache(), or (4) restarting the runtime from Runtime → Restart runtime.
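Gradient accumulation splits a large batch into micro-batches and calls `backward()` on each before a single optimizer step; gradients sum across the calls, so the update approximates a larger batch while only one micro-batch's activations occupy memory at a time. A sketch (the shapes and the accumulation count are arbitrary):

```python
import torch
import torch.nn as nn

model = nn.Linear(784, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

accum_steps = 4                  # 4 micro-batches of 16 ~ one batch of 64
optimizer.zero_grad()
for step in range(accum_steps):
    x = torch.randn(16, 784)
    y = torch.randint(0, 10, (16,))
    loss = criterion(model(x), y) / accum_steps  # scale so grads average
    loss.backward()              # gradients accumulate in .grad
optimizer.step()                 # one update for the whole effective batch
optimizer.zero_grad()
```

Dividing each micro-batch loss by `accum_steps` keeps the accumulated gradient equal to the mean over the full effective batch, matching what a single large-batch step would compute.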