AI Storage Tiers
Different AI data types have different access patterns, performance requirements, and retention needs. A well-designed tiered storage architecture places data on the most cost-effective medium that meets its performance requirements, from ultra-fast NVMe for active training to cold object storage for archived datasets.
Storage Tier Comparison
| Tier | Technology | Throughput | Cost/TB/mo | AI Use Case |
|---|---|---|---|---|
| Hot | Local NVMe | 3-7 GB/s | $$$ | Active training data, checkpoints |
| Warm | Lustre/GPFS (SSD) | 100+ GB/s aggregate | $$ | Shared datasets, model artifacts |
| Cool | NFS/GPFS (HDD) | 10-50 GB/s aggregate | $ | Infrequently accessed datasets |
| Cold | S3/GCS/MinIO | Variable | ~$20 ($0.02/GB) | Archived data, compliance retention |
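The cost gap between tiers is easiest to see with a quick back-of-the-envelope calculation. The per-GB prices below are illustrative assumptions (only the $0.02/GB object-storage figure comes from the table above), not vendor quotes:

```python
# Illustrative monthly prices per GB for each tier (assumptions, not quotes;
# only the cold-tier $0.02/GB figure matches the table above).
TIER_PRICE_PER_GB = {
    "hot":  0.50,   # local NVMe, amortized hardware cost
    "warm": 0.15,   # parallel file system on SSD
    "cool": 0.05,   # NFS/GPFS on HDD
    "cold": 0.02,   # object storage (S3/GCS standard-class pricing)
}

def monthly_cost(size_tb: float, tier: str) -> float:
    """Monthly storage cost in dollars for size_tb terabytes on a given tier."""
    return size_tb * 1000 * TIER_PRICE_PER_GB[tier]

for tier in TIER_PRICE_PER_GB:
    print(f"{tier:>4}: ${monthly_cost(100, tier):,.0f}/month for 100 TB")
```

Even under these rough assumptions, the same 100 TB dataset differs in cost by more than an order of magnitude between hot and cold tiers.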
Data Placement Strategy
- Training datasets — Warm tier (Lustre) with hot tier caching (local NVMe) for active experiments
- Checkpoints — Write to hot tier (local NVMe), async replicate to warm tier, archive to cold tier after training completes
- Model artifacts — Warm tier for recent versions, cold tier for historical versions
- Experiment logs — Warm tier during experiment, cold tier after analysis
- Raw data — Cold tier with warm tier staging for active preprocessing
Automated Tiering
Implement policies that automatically move data between tiers based on access patterns:
- Access-based tiering — Move data to a colder tier after N days of no access
- Size-based tiering — Keep datasets under a threshold on fast storage; overflow to cheaper tiers
- Policy-based tiering — Apply rules based on data tags (e.g., production datasets stay on warm tier)
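Access-based tiering, the first policy above, can be implemented as a periodic sweep over file access times. This is a minimal sketch: the 30-day threshold is an assumed value, and the "move to cold" step is a local copy standing in for an object-storage upload (note that file systems mounted with `noatime` will not track access times at all):

```python
import os
import shutil
import time
from pathlib import Path

COLD_AFTER_DAYS = 30  # assumed threshold: demote after 30 days without access

def sweep(warm_dir: Path, cold_dir: Path, now=None):
    """Move files not accessed for COLD_AFTER_DAYS from warm_dir to cold_dir.

    Returns the list of destination paths that were moved.
    """
    now = time.time() if now is None else now
    cutoff = now - COLD_AFTER_DAYS * 86400
    cold_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in warm_dir.iterdir():
        if path.is_file() and os.stat(path).st_atime < cutoff:
            target = cold_dir / path.name
            # In production this would be an upload to S3/GCS/MinIO.
            shutil.move(str(path), target)
            moved.append(target)
    return moved
```

Run as a cron job or scheduled task, this keeps the warm tier populated only with data that is actually being read.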
Cost Tip: Object storage (S3, GCS) runs roughly $0.02/GB/month. Keeping 100 TB of rarely accessed data on Lustre instead of object storage wastes thousands of dollars every month. Implement automated tiering so costs stay under control as datasets grow.
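The arithmetic behind this tip is simple. The warm-tier price here is an assumed figure for illustration; only the $0.02/GB object-storage rate comes from the text:

```python
size_gb = 100 * 1000     # 100 TB expressed in GB
lustre_per_gb = 0.15     # assumed warm-tier cost, $/GB/month (illustrative)
object_per_gb = 0.02     # cold object storage, $/GB/month

monthly_waste = size_gb * (lustre_per_gb - object_per_gb)
print(f"${monthly_waste:,.0f} wasted per month")  # 100,000 GB at a $0.13/GB delta
```

Even halving the assumed Lustre price still leaves the waste in the thousands of dollars per month, which is why tiering pays for itself quickly at this scale.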
Ready to Learn File Systems?
The next lesson covers deploying NFS and Lustre for shared AI data access.
Lilly Tech Systems