Intermediate

Right-Sizing AI Workloads

Select the optimal instance types, GPU configurations, and resource allocations for your ML workloads by profiling actual usage patterns.

GPU Instance Selection Guide

Workload               Recommended                    Why
LLM inference (small)  g5.xlarge / inf2.xlarge        24-32 GB accelerator memory, cost-effective
LLM inference (large)  inf2.48xlarge / p4d.24xlarge   320-384 GB of aggregate accelerator memory for 70B+ models
Fine-tuning            g5.12xlarge / p4d.24xlarge     Multi-GPU for LoRA/QLoRA training
Pre-training           p5.48xlarge / trn1.32xlarge    Maximum throughput, EFA networking
Vision inference       g5.xlarge / inf2.xlarge        A single accelerator is sufficient for most models
Batch processing       g5.xlarge (Spot)               Cost-optimized with Spot pricing
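
As a rough cross-check on these recommendations, a model's memory footprint can be estimated from its parameter count and numeric precision. The sketch below is a back-of-the-envelope calculation, not a sizing tool: the 20% overhead for KV cache and activations is an assumed placeholder, the function names are illustrative, and 1 GB is treated as 10^9 bytes. The memory figures are aggregate accelerator memory per instance, so models that only fit on the larger instances must be sharded across devices.

```python
# Aggregate accelerator memory per instance, in GB.
ACCELERATOR_MEMORY_GB = {
    "g5.xlarge": 24,       # 1 x A10G
    "inf2.xlarge": 32,     # 1 x Inferentia2
    "g5.12xlarge": 96,     # 4 x A10G
    "p4d.24xlarge": 320,   # 8 x A100 (40 GB each)
    "inf2.48xlarge": 384,  # 12 x Inferentia2
    "p5.48xlarge": 640,    # 8 x H100 (80 GB each)
}

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def estimate_inference_memory_gb(params_billion, dtype="fp16", overhead=0.2):
    """Weights plus an assumed fractional overhead (KV cache, activations)."""
    weights_gb = params_billion * BYTES_PER_PARAM[dtype]  # 1e9 params * bytes / 1e9
    return weights_gb * (1 + overhead)


def instances_that_fit(params_billion, dtype="fp16", overhead=0.2):
    """Instances whose aggregate memory covers the estimate, smallest first."""
    need = estimate_inference_memory_gb(params_billion, dtype, overhead)
    by_size = sorted(ACCELERATOR_MEMORY_GB.items(), key=lambda kv: kv[1])
    return [name for name, mem in by_size if mem >= need]
```

For example, instances_that_fit(7) starts with g5.xlarge, while instances_that_fit(70) leaves only the multi-accelerator instances. Measure real peak usage before relying on an estimate like this.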

Profiling Your Workloads

GPU Utilization

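Monitor with nvidia-smi or DCGM. If utilization is consistently below 50%, consider a smaller instance or GPU sharing with MIG (supported on A100- and H100-class GPUs such as those in p4d and p5 instances, but not on the A10G GPUs in g5).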

GPU Memory

Track peak memory usage over a representative run. If the peak stays below 50% of GPU memory, a smaller GPU type may be sufficient.
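
The utilization and memory checks can be sketched with nvidia-smi's CSV query mode. This is a minimal example, assuming nvidia-smi is on the PATH and reports numeric values for every field. The helper names are hypothetical, and treating "consistently below 50%" as "every sample below 50%" is an assumption you may want to relax.

```python
import subprocess

QUERY = "utilization.gpu,memory.used,memory.total"


def parse_nvidia_smi(csv_text):
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output.

    Returns one dict per GPU line.
    """
    samples = []
    for line in csv_text.strip().splitlines():
        util, used, total = (float(field) for field in line.split(","))
        samples.append({"util_pct": util, "mem_pct": 100.0 * used / total})
    return samples


def sample_gpus():
    """Take one reading for every GPU on the host."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_nvidia_smi(result.stdout)


def summarize(history, threshold_pct=50.0):
    """Summarize the readings collected for one GPU over a run."""
    max_util = max(s["util_pct"] for s in history)
    peak_mem = max(s["mem_pct"] for s in history)
    return {
        "max_util_pct": max_util,
        "peak_mem_pct": peak_mem,
        "low_utilization": max_util < threshold_pct,  # every sample under threshold
        "low_memory": peak_mem < threshold_pct,
    }
```

In practice you would call sample_gpus() on a timer during a representative workload, keep the readings per GPU, and pass each GPU's list to summarize().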

CPU & System RAM

Data preprocessing often bottlenecks on the CPU. If the GPU sits idle waiting for data, increase the vCPU count or optimize data loading.
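
One way to check for this bottleneck is to measure what fraction of wall-clock time a training loop spends waiting for the next batch. The sketch below is framework-agnostic, and the function name is illustrative. It assumes train_step blocks until the work is finished; with asynchronous GPU execution the step needs an explicit device synchronization, otherwise the timings are misleading.

```python
import time


def data_wait_fraction(batches, train_step):
    """Fraction of total loop time spent waiting on the data iterator."""
    wait = 0.0
    loop_start = time.perf_counter()
    iterator = iter(batches)
    while True:
        fetch_start = time.perf_counter()
        try:
            batch = next(iterator)
        except StopIteration:
            break
        wait += time.perf_counter() - fetch_start
        train_step(batch)  # assumed to block until the step completes
    total = time.perf_counter() - loop_start
    return wait / total if total > 0 else 0.0
```

A fraction close to 1 suggests the loop is input-bound and would benefit from more CPU or a faster loader. A fraction close to 0 suggests the accelerator is the limiting factor.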

I/O Throughput

Storage bandwidth affects training speed. If the job is I/O-bound, move the dataset to FSx for Lustre or increase the provisioned EBS throughput.
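
A quick way to see what the training job actually gets from storage is to time a sequential read of a representative data file. This is a rough sketch with an illustrative function name. It measures a single sequential reader only, and repeated runs on the same file may be served from the operating system's page cache and look faster than the underlying volume.

```python
import time


def read_throughput_mb_s(path, chunk_size=8 * 1024 * 1024):
    """Sequentially read a file and return the observed throughput in MB/s."""
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    return total_bytes / 1e6 / elapsed if elapsed > 0 else float("inf")
```

Comparing this number with the rate at which the training loop consumes data indicates whether storage, rather than CPU preprocessing, is the limiting factor.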

Common Right-Sizing Mistakes

  • Over-provisioning GPUs: Using a p4d.24xlarge for a model that fits on the single GPU of a g5.xlarge
  • Ignoring Inferentia: Running inference on expensive GPUs when Inferentia can serve the same model at lower cost
  • Wrong GPU generation: Using older P3 instances instead of G5 instances, which often offer better price-performance
  • Idle endpoints: Keeping SageMaker endpoints running 24/7 for workloads with sporadic traffic
  • One-size-fits-all: Using the same instance type for training and inference without considering each workload's requirements

Pro tip: Benchmark systematically across instance types before committing to a fleet. A 2-hour benchmark comparing g5, p4d, and inf2 instances can save thousands of dollars per month by identifying the most cost-effective option for your specific model and throughput requirements.
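
The comparison in such a benchmark usually comes down to cost per unit of work: the hourly price divided by the throughput you measured. A minimal sketch follows. The function names are illustrative, and the prices and throughputs in the usage example are made-up placeholders, not real AWS pricing or benchmark results.

```python
def cost_per_million(price_per_hour, throughput_per_sec):
    """Cost in dollars to process one million requests (or tokens)."""
    units_per_hour = throughput_per_sec * 3600
    return price_per_hour / units_per_hour * 1_000_000


def rank_by_cost(results):
    """Rank candidates from cheapest to most expensive per million units.

    results maps a candidate name to (price_per_hour, measured_throughput_per_sec).
    """
    costs = [(name, cost_per_million(price, tput))
             for name, (price, tput) in results.items()]
    return sorted(costs, key=lambda item: item[1])


# Placeholder numbers for illustration only.
measured = {
    "candidate-a": (1.00, 50.0),   # cheaper per hour, lower throughput
    "candidate-b": (4.00, 400.0),  # pricier per hour, much higher throughput
}
for name, cost in rank_by_cost(measured):
    print(f"{name}: ${cost:.2f} per million requests")
```

With these placeholder inputs the instance that costs more per hour is cheaper per request, which is why hourly price alone is a poor basis for choosing an instance type.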