Right-Sizing AI Workloads
Select the optimal instance types, GPU configurations, and resource allocations for your ML workloads by profiling actual usage patterns.
GPU Instance Selection Guide
| Workload | Recommended | Why |
|---|---|---|
| LLM inference (small) | g5.xlarge / inf2.xlarge | 24-32 GB GPU memory, cost-effective |
| LLM inference (large) | inf2.48xlarge / p4d.24xlarge | 320-384 GB aggregate accelerator memory for 70B+ models |
| Fine-tuning | g5.12xlarge / p4d.24xlarge | Multi-GPU for LoRA/QLoRA training |
| Pre-training | p5.48xlarge / trn1.32xlarge | Maximum throughput, EFA networking |
| Vision inference | g5.xlarge / inf2.xlarge | Single GPU sufficient for most models |
| Batch processing | g5.xlarge (Spot) | Cost-optimized with Spot pricing |
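The selection table above can be captured as a small lookup helper. The workload categories and instance types come straight from the table; the function and dictionary names are illustrative, not an AWS API.

```python
# Illustrative encoding of the instance-selection table above.
# The mapping mirrors the table; names and structure are hypothetical.
RECOMMENDED_INSTANCES = {
    "llm_inference_small": ["g5.xlarge", "inf2.xlarge"],
    "llm_inference_large": ["inf2.48xlarge", "p4d.24xlarge"],
    "fine_tuning": ["g5.12xlarge", "p4d.24xlarge"],
    "pre_training": ["p5.48xlarge", "trn1.32xlarge"],
    "vision_inference": ["g5.xlarge", "inf2.xlarge"],
    "batch_processing": ["g5.xlarge"],  # run on Spot for cost savings
}

def recommend(workload: str) -> list[str]:
    """Return candidate instance types for a workload category."""
    try:
        return RECOMMENDED_INSTANCES[workload]
    except KeyError:
        raise ValueError(f"Unknown workload: {workload!r}") from None
```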
Profiling Your Workloads
GPU Utilization
Monitor with nvidia-smi or DCGM. If utilization is consistently below 50%, consider a smaller instance, or GPU sharing with MIG (available on A100/H100-class GPUs such as p4d and p5).
GPU Memory
Track peak memory usage. If using less than 50% of GPU memory, a smaller GPU type may be sufficient.
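Both checks above can be scripted against the CSV output of `nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits`. A minimal sketch, with the 50% thresholds taken from the guidance above and the sample readings made up for illustration:

```python
# Sketch: flag a GPU as oversized from sampled nvidia-smi readings.
# Each line is "utilization %, memory used MiB, memory total MiB",
# as produced by the --query-gpu command quoted in the lead-in.
def looks_oversized(samples: str, threshold: float = 0.5) -> bool:
    utils, mem_fracs = [], []
    for line in samples.strip().splitlines():
        util, used, total = (float(x) for x in line.split(","))
        utils.append(util / 100)        # average utilization matters
        mem_fracs.append(used / total)  # but peak memory matters
    avg_util = sum(utils) / len(utils)
    peak_mem = max(mem_fracs)
    return avg_util < threshold and peak_mem < threshold

# Illustrative samples from a 24 GiB GPU (e.g. g5.xlarge):
samples = """\
34, 8120, 24576
28, 9300, 24576
41, 8900, 24576"""
print(looks_oversized(samples))  # True: avg util ~34%, peak memory ~38%
```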
CPU & System RAM
Data preprocessing often bottlenecks on CPU. If GPU waits for data, increase CPU count or optimize data loading.
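One way to quantify this is to compare per-step data-loading time against GPU compute time: if loading (even when overlapped with compute) takes longer than the GPU needs, the GPU idles for the difference. The timings below are illustrative assumptions.

```python
# Sketch: estimate how long the GPU waits on the input pipeline each
# step, assuming data loading overlaps with compute (prefetching).
def gpu_wait_fraction(data_load_s: float, compute_s: float) -> float:
    """Fraction of each training step the GPU sits idle."""
    step = max(data_load_s, compute_s)  # overlapped pipeline
    return (step - compute_s) / step

# Example: a batch takes 0.30 s to load but only 0.18 s to compute,
# so the GPU is idle 40% of the time -- add workers, don't add GPUs.
print(round(gpu_wait_fraction(0.30, 0.18), 2))  # 0.4
```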
I/O Throughput
Storage bandwidth affects training speed. If I/O-bound, upgrade to FSx for Lustre or increase EBS throughput.
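A quick back-of-envelope check: required bandwidth is batch size x steps per second x data per sample. If that exceeds what the volume delivers, the job is I/O-bound. The workload numbers below are illustrative.

```python
# Required streaming bandwidth for a training job (illustrative numbers).
def required_mb_per_s(batch_size: int, steps_per_s: float,
                      mb_per_sample: float) -> float:
    return batch_size * steps_per_s * mb_per_sample

need = required_mb_per_s(batch_size=256, steps_per_s=2.0, mb_per_sample=0.5)
print(need)  # 256.0 MB/s
# A gp3 EBS volume delivers 125 MB/s at baseline (up to 1,000 MB/s
# provisioned), so this job would need provisioned throughput or
# FSx for Lustre to avoid stalling the GPU.
```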
Common Right-Sizing Mistakes
- Over-provisioning GPUs: Using p4d.24xlarge for a model that fits on a single g5.xlarge GPU
- Ignoring Inferentia: Running inference on expensive GPUs when Inferentia can serve the same model at lower cost
- Wrong GPU generation: Using older P3 instances when G5 offers better price-performance for most workloads
- Idle endpoints: Keeping SageMaker endpoints running 24/7 for workloads with sporadic traffic
- One-size-fits-all: Using the same instance type for training and inference without considering each workload's requirements
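The idle-endpoint mistake is easy to put in numbers: an always-on endpoint bills for every hour of the month regardless of traffic. The hourly rate below is an assumed figure for a g5.xlarge-class endpoint, not a current AWS list price.

```python
# Sketch: always-on endpoint cost vs paying only for busy hours
# (e.g. via auto-scaling to zero or serverless inference).
HOURS_PER_MONTH = 730

def monthly_cost(hourly_rate: float, busy_hours: float,
                 always_on: bool) -> float:
    hours = HOURS_PER_MONTH if always_on else busy_hours
    return hours * hourly_rate

rate = 1.41  # assumed $/hr, illustrative only
print(round(monthly_cost(rate, busy_hours=40, always_on=True), 2))   # 1029.3
print(round(monthly_cost(rate, busy_hours=40, always_on=False), 2))  # 56.4
```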
Pro tip: Run a systematic benchmark across instance types before committing to a fleet. A 2-hour benchmark comparing g5, p4d, and inf2 instances can save thousands of dollars per month by identifying the most cost-effective option for your specific model and throughput requirements.
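The benchmark comparison described above reduces to ranking instance types by cost per unit of work. A minimal sketch, where the throughput and price figures are made up for illustration and should be replaced with your own measured results:

```python
# Rank instance types by cost per million tokens from benchmark data.
def cost_per_million_tokens(price_per_hr: float,
                            tokens_per_s: float) -> float:
    tokens_per_hr = tokens_per_s * 3600
    return price_per_hr / tokens_per_hr * 1_000_000

# ($/hr, measured tokens/s) -- illustrative numbers, not real benchmarks
benchmarks = {
    "g5.xlarge": (1.01, 400.0),
    "p4d.24xlarge": (32.77, 9000.0),
    "inf2.xlarge": (0.76, 450.0),
}
ranked = sorted(benchmarks,
                key=lambda k: cost_per_million_tokens(*benchmarks[k]))
print(ranked[0])  # cheapest per token: inf2.xlarge with these numbers
```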
Lilly Tech Systems