Intermediate
Azure Storage for AI
Choose and configure the right Azure storage services for training data, model artifacts, checkpoints, and feature stores.
Storage Options for AI Workloads
| Service | Use Case | Performance | Cost Tier |
|---|---|---|---|
| Blob Storage | Datasets, model artifacts | High throughput | Hot/Cool/Archive |
| Data Lake Gen2 | Large-scale analytics data | Hierarchical namespace | Hot/Cool/Archive |
| Azure Files (NFS) | Shared training data | Premium NFS 4.1 | Premium only |
| Managed Disks | Local compute storage | Ultra/Premium SSD | Per-disk pricing |
| Azure NetApp Files | HPC training datasets | Ultra-high IOPS | Standard/Premium/Ultra |
Azure ML Datastores
```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AzureBlobDatastore
from azure.identity import DefaultAzureCredential

# Connect to the Azure ML workspace
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Register a blob datastore for training data
blob_datastore = AzureBlobDatastore(
    name="training_data",
    account_name="mymlstorage",
    container_name="datasets",
    description="Training datasets for ML models",
)
ml_client.datastores.create_or_update(blob_datastore)
```
Storage Architecture Patterns
Data Lake Pattern
Use ADLS Gen2 as a central data lake with bronze/silver/gold zones for raw, processed, and feature-ready data.
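The zone layout reduces to a simple path convention inside one ADLS Gen2 container. A minimal sketch, assuming an illustrative container (`datalake`) and storage account (`mylakestore`); the `lake_path` helper is not part of any Azure SDK:

```python
# Sketch of a bronze/silver/gold path convention for an ADLS Gen2 container.
# Container, account, and helper name are illustrative assumptions.

ZONES = ("bronze", "silver", "gold")  # raw, processed, feature-ready

def lake_path(zone: str, dataset: str, partition: str = "") -> str:
    """Build an abfss:// URI for a dataset in a given zone."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    suffix = f"/{partition}" if partition else ""
    return f"abfss://datalake@mylakestore.dfs.core.windows.net/{zone}/{dataset}{suffix}"

print(lake_path("bronze", "clickstream", "date=2024-01-01"))
# abfss://datalake@mylakestore.dfs.core.windows.net/bronze/clickstream/date=2024-01-01
```

Keeping zone names in code rather than scattered string literals makes promotion from bronze to silver a mechanical path change.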
High-Performance Training
Mount Azure NetApp Files or Premium NFS for datasets that require low-latency random access during training.
Model Registry
Use Azure ML Model Registry backed by Blob Storage for versioned model artifacts with lineage tracking.
Checkpoint Storage
Use Blob Storage with lifecycle policies that auto-delete old checkpoints and tier cold data to Archive.
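Such a policy is defined as a management-policy rule on the storage account. A hedged sketch of the rule as a Python dict (the rule name, `checkpoints/` prefix, and 30/90-day thresholds are illustrative choices; the structure follows the Azure Storage lifecycle management schema):

```python
import json

# Lifecycle rule: archive checkpoints after 30 days, delete after 90.
# Rule name, prefix, and day thresholds are illustrative assumptions.
checkpoint_policy = {
    "rules": [
        {
            "enabled": True,
            "name": "expire-old-checkpoints",
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    "prefixMatch": ["checkpoints/"],
                },
                "actions": {
                    "baseBlob": {
                        "tierToArchive": {"daysAfterModificationGreaterThan": 30},
                        "delete": {"daysAfterModificationGreaterThan": 90},
                    }
                },
            },
        }
    ]
}

print(json.dumps(checkpoint_policy, indent=2))
```

Saved as JSON, a policy like this can be applied with `az storage account management-policy create`.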
Performance Optimization
- Co-locate storage and compute: Keep storage accounts in the same region as your compute to minimize latency
- Use private endpoints: Connect via Private Link to avoid public internet and improve throughput
- Enable NFS for training: NFS mounts provide better random-read performance than blobfuse for small files
- Lifecycle policies: Automatically tier old training data to Cool or Archive storage to reduce costs
- Parallel downloads: Use AzCopy or Azure ML data mounts with parallel I/O for large dataset transfers
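The parallel-I/O idea in the last bullet can be sketched with a thread pool. `fetch_blob` below is a hypothetical stand-in for whatever per-blob download you actually use (for example a Blob SDK call or an AzCopy subprocess); the point is that I/O-bound downloads overlap well across threads:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_blob(name: str) -> bytes:
    """Stand-in for a real download call (e.g. BlobClient.download_blob)."""
    return f"contents-of-{name}".encode()

def parallel_download(blob_names, max_workers: int = 16) -> dict:
    """Fetch many blobs concurrently; map preserves input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(blob_names, pool.map(fetch_blob, blob_names)))

results = parallel_download(["train/part-000.parquet", "train/part-001.parquet"])
print(len(results))  # 2
```

Tune `max_workers` to your NIC and storage-account throughput rather than CPU count, since the work is network-bound.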
Pro tip: For large training datasets with many small files, use Azure Blob Storage with NFS 3.0 protocol or convert your dataset to fewer large files (like TFRecords or WebDataset format). Many small files create significant I/O overhead that can bottleneck GPU training.
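Packing small files into larger shards needs nothing beyond the standard library; tools such as WebDataset then read plain tar shards sequentially. A minimal sketch (shard name and sample sizes are illustrative):

```python
import io
import tarfile

def write_shard(shard_path: str, samples: list[tuple[str, bytes]]) -> None:
    """Pack (name, payload) samples into one tar shard to cut per-file I/O."""
    with tarfile.open(shard_path, "w") as tar:
        for name, payload in samples:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))

# Example: 1000 tiny samples become a single sequential read at training time.
samples = [(f"sample_{i:04d}.jpg", b"\x00" * 128) for i in range(1000)]
write_shard("shard-000.tar", samples)

with tarfile.open("shard-000.tar") as tar:
    print(len(tar.getmembers()))  # 1000
```

One shard of a few hundred megabytes turns thousands of random reads into one large sequential read, which is exactly what blob and NFS backends serve fastest.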