Right-Sizing AI Workloads
Select the optimal instance types, GPU configurations, and resource allocations for your ML workloads by profiling actual usage patterns.
GPU Instance Selection Guide
| Workload | Recommended | Why |
|---|---|---|
| LLM inference (small) | g5.xlarge / inf2.xlarge | 24-32 GB GPU memory, cost-effective |
| LLM inference (large) | inf2.48xlarge / p4d.24xlarge | 320-384 GB aggregate accelerator memory for 70B+ models |
| Fine-tuning | g5.12xlarge / p4d.24xlarge | Multi-GPU for LoRA/QLoRA training |
| Pre-training | p5.48xlarge / trn1.32xlarge | Maximum throughput, EFA networking |
| Vision inference | g5.xlarge / inf2.xlarge | Single GPU sufficient for most models |
| Batch processing | g5.xlarge (Spot) | Cost-optimized with Spot pricing |
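The selection table above can be captured as a small lookup helper. The workload categories and instance types come straight from the table; the function and dictionary names are illustrative, not an AWS API.

```python
# Illustrative encoding of the instance-selection table above.
# The mapping mirrors the table; names and structure are hypothetical.
RECOMMENDED_INSTANCES = {
    "llm_inference_small": ["g5.xlarge", "inf2.xlarge"],
    "llm_inference_large": ["inf2.48xlarge", "p4d.24xlarge"],
    "fine_tuning": ["g5.12xlarge", "p4d.24xlarge"],
    "pre_training": ["p5.48xlarge", "trn1.32xlarge"],
    "vision_inference": ["g5.xlarge", "inf2.xlarge"],
    "batch_processing": ["g5.xlarge"],  # run on Spot for cost savings
}

def recommend(workload: str) -> list[str]:
    """Return candidate instance types for a workload category."""
    try:
        return RECOMMENDED_INSTANCES[workload]
    except KeyError:
        raise ValueError(f"Unknown workload: {workload!r}") from None
```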
Profiling Your Workloads
GPU Utilization
Monitor with nvidia-smi or DCGM. If utilization is consistently below 50%, consider a smaller instance, or GPU sharing with MIG (available on A100/H100-class GPUs such as p4d and p5).
GPU Memory
Track peak memory usage. If using less than 50% of GPU memory, a smaller GPU type may be sufficient.
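Both checks above can be scripted against the CSV output of `nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits`. A minimal sketch, with the 50% thresholds taken from the guidance above and the sample readings made up for illustration:

```python
# Sketch: flag a GPU as oversized from sampled nvidia-smi readings.
# Each line is "utilization %, memory used MiB, memory total MiB",
# as produced by the --query-gpu command quoted in the lead-in.
def looks_oversized(samples: str, threshold: float = 0.5) -> bool:
    utils, mem_fracs = [], []
    for line in samples.strip().splitlines():
        util, used, total = (float(x) for x in line.split(","))
        utils.append(util / 100)        # average utilization matters
        mem_fracs.append(used / total)  # but peak memory matters
    avg_util = sum(utils) / len(utils)
    peak_mem = max(mem_fracs)
    return avg_util < threshold and peak_mem < threshold

# Illustrative samples from a 24 GiB GPU (e.g. g5.xlarge):
samples = """\
34, 8120, 24576
28, 9300, 24576
41, 8900, 24576"""
print(looks_oversized(samples))  # True: avg util ~34%, peak memory ~38%
```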
CPU & System RAM
Data preprocessing often bottlenecks on CPU. If GPU waits for data, increase CPU count or optimize data loading.
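One way to quantify this is to compare per-step data-loading time against GPU compute time: if loading (even when overlapped with compute) takes longer than the GPU needs, the GPU idles for the difference. The timings below are illustrative assumptions.

```python
# Sketch: estimate how long the GPU waits on the input pipeline each
# step, assuming data loading overlaps with compute (prefetching).
def gpu_wait_fraction(data_load_s: float, compute_s: float) -> float:
    """Fraction of each training step the GPU sits idle."""
    step = max(data_load_s, compute_s)  # overlapped pipeline
    return (step - compute_s) / step

# Example: a batch takes 0.30 s to load but only 0.18 s to compute,
# so the GPU is idle 40% of the time -- add workers, don't add GPUs.
print(round(gpu_wait_fraction(0.30, 0.18), 2))  # 0.4
```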
I/O Throughput
Storage bandwidth affects training speed. If I/O-bound, upgrade to FSx for Lustre or increase EBS throughput.
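A quick back-of-envelope check: required bandwidth is batch size x steps per second x data per sample. If that exceeds what the volume delivers, the job is I/O-bound. The workload numbers below are illustrative.

```python
# Required streaming bandwidth for a training job (illustrative numbers).
def required_mb_per_s(batch_size: int, steps_per_s: float,
                      mb_per_sample: float) -> float:
    return batch_size * steps_per_s * mb_per_sample

need = required_mb_per_s(batch_size=256, steps_per_s=2.0, mb_per_sample=0.5)
print(need)  # 256.0 MB/s
# A gp3 EBS volume delivers 125 MB/s at baseline (up to 1,000 MB/s
# provisioned), so this job would need provisioned throughput or
# FSx for Lustre to avoid stalling the GPU.
```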
Common Right-Sizing Mistakes
- Over-provisioning GPUs: Using p4d.24xlarge for a model that fits on a single g5.xlarge GPU
- Ignoring Inferentia: Running inference on expensive GPUs when Inferentia can serve the same model at lower cost
- Wrong GPU generation: Using older P3 instances when G5 offers better price-performance for most workloads
- Idle endpoints: Keeping SageMaker endpoints running 24/7 for workloads with sporadic traffic
- One-size-fits-all: Using the same instance type for training and inference without considering each workload's requirements
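The idle-endpoint mistake is easy to put in numbers: an always-on endpoint bills for every hour of the month regardless of traffic. The hourly rate below is an assumed figure for a g5.xlarge-class endpoint, not a current AWS list price.

```python
# Sketch: always-on endpoint cost vs paying only for busy hours
# (e.g. via auto-scaling to zero or serverless inference).
HOURS_PER_MONTH = 730

def monthly_cost(hourly_rate: float, busy_hours: float,
                 always_on: bool) -> float:
    hours = HOURS_PER_MONTH if always_on else busy_hours
    return hours * hourly_rate

rate = 1.41  # assumed $/hr, illustrative only
print(round(monthly_cost(rate, busy_hours=40, always_on=True), 2))   # 1029.3
print(round(monthly_cost(rate, busy_hours=40, always_on=False), 2))  # 56.4
```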
Pro tip: Run a systematic benchmark across instance types before committing to a fleet. A 2-hour benchmark comparing g5, p4d, and inf2 instances can save thousands of dollars per month by identifying the most cost-effective option for your specific model and throughput requirements.
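The benchmark comparison described above reduces to ranking instance types by cost per unit of work. A minimal sketch, where the throughput and price figures are made up for illustration and should be replaced with your own measured results:

```python
# Rank instance types by cost per million tokens from benchmark data.
def cost_per_million_tokens(price_per_hr: float,
                            tokens_per_s: float) -> float:
    tokens_per_hr = tokens_per_s * 3600
    return price_per_hr / tokens_per_hr * 1_000_000

# ($/hr, measured tokens/s) -- illustrative numbers, not real benchmarks
benchmarks = {
    "g5.xlarge": (1.01, 400.0),
    "p4d.24xlarge": (32.77, 9000.0),
    "inf2.xlarge": (0.76, 450.0),
}
ranked = sorted(benchmarks,
                key=lambda k: cost_per_million_tokens(*benchmarks[k]))
print(ranked[0])  # cheapest per token: inf2.xlarge with these numbers
```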
Lilly Tech Systems