
Azure Storage for AI

Choose and configure the right Azure storage services for training data, model artifacts, checkpoints, and feature stores.

Storage Options for AI Workloads

| Service | Use Case | Performance | Cost Tier |
|---|---|---|---|
| Blob Storage | Datasets, model artifacts | High throughput | Hot/Cool/Archive |
| Data Lake Gen2 | Large-scale analytics data | Hierarchical namespace | Hot/Cool/Archive |
| Azure Files (NFS) | Shared training data | Premium NFS 4.1 | Premium/Standard |
| Managed Disks | Local compute storage | Ultra/Premium SSD | Per-disk pricing |
| Azure NetApp Files | HPC training datasets | Ultra-high IOPS | Standard/Premium/Ultra |

Azure ML Datastores

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AzureBlobDatastore
from azure.identity import DefaultAzureCredential

# Connect to the Azure ML workspace (substitute your own identifiers)
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Register a blob datastore for training data
blob_datastore = AzureBlobDatastore(
    name="training_data",
    account_name="mymlstorage",
    container_name="datasets",
    description="Training datasets for ML models",
)
ml_client.datastores.create_or_update(blob_datastore)
```
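Once registered, the datastore can be referenced from jobs through Azure ML's `azureml://` URI scheme rather than raw storage URLs. A minimal sketch of building such a URI (the `images/train` path is a placeholder; the resulting string is what you would pass as a `uri_folder` input to a training job):

```python
def datastore_uri(datastore: str, relative_path: str) -> str:
    """Build an azureml:// URI pointing into a registered Azure ML datastore."""
    return f"azureml://datastores/{datastore}/paths/{relative_path}"

# Points at the "training_data" datastore registered above
uri = datastore_uri("training_data", "images/train")
print(uri)  # azureml://datastores/training_data/paths/images/train
```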

Storage Architecture Patterns

💾

Data Lake Pattern

Use ADLS Gen2 as a central data lake with bronze/silver/gold zones for raw, processed, and feature-ready data.
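As one possible layout (account and container names here are placeholders), the zones map to `abfss://` paths in a single ADLS Gen2 filesystem:

```
abfss://lake@mydatalake.dfs.core.windows.net/bronze/raw_events/      # raw ingested data
abfss://lake@mydatalake.dfs.core.windows.net/silver/cleaned_events/  # validated, processed data
abfss://lake@mydatalake.dfs.core.windows.net/gold/features/          # feature-ready tables
```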

🚀

High-Performance Training

Mount Azure NetApp Files or Premium NFS for datasets that require low-latency random access during training.

📦

Model Registry

Use Azure ML Model Registry backed by Blob Storage for versioned model artifacts with lineage tracking.

📈

Checkpoint Storage

Blob Storage with lifecycle policies to auto-delete old checkpoints and tier cold data to Archive.
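A lifecycle management policy like the following sketch implements the checkpoint pattern above; the `checkpoints/` prefix and day thresholds are example values you would tune to your retention needs:

```json
{
  "rules": [
    {
      "enabled": true,
      "name": "checkpoint-cleanup",
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": ["blockBlob"],
          "prefixMatch": ["checkpoints/"]
        },
        "actions": {
          "baseBlob": {
            "tierToArchive": { "daysAfterModificationGreaterThan": 30 },
            "delete": { "daysAfterModificationGreaterThan": 90 }
          }
        }
      }
    }
  ]
}
```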

Performance Optimization

  • Co-locate storage and compute: Keep storage accounts in the same region as your compute to minimize latency
  • Use private endpoints: Connect via Private Link to avoid public internet and improve throughput
  • Enable NFS for training: NFS mounts provide better random-read performance than blobfuse for small files
  • Lifecycle policies: Automatically tier old training data to Cool or Archive storage to reduce costs
  • Parallel downloads: Use AzCopy or Azure ML data mounts with parallel I/O for large dataset transfers
Pro tip: For large training datasets with many small files, use Azure Blob Storage with NFS 3.0 protocol or convert your dataset to fewer large files (like TFRecords or WebDataset format). Many small files create significant I/O overhead that can bottleneck GPU training.
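To illustrate the pro tip, here is a minimal sketch of packing many small files into a few tar shards (the WebDataset-style layout; function name and shard size are illustrative, using only the standard library):

```python
import tarfile
import tempfile
from pathlib import Path

def shard_files(files, out_dir, shard_size=1000):
    """Pack many small files into a few tar shards to reduce I/O overhead."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    shards = []
    for i in range(0, len(files), shard_size):
        shard_path = out_dir / f"shard-{i // shard_size:05d}.tar"
        with tarfile.open(shard_path, "w") as tar:
            for f in files[i:i + shard_size]:
                tar.add(f, arcname=Path(f).name)
        shards.append(shard_path)
    return shards

# Demo: 2,500 tiny files become 3 tar shards
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "small_files"
    src.mkdir()
    files = []
    for i in range(2500):
        p = src / f"sample-{i:04d}.txt"
        p.write_text(f"record {i}")
        files.append(p)
    shards = shard_files(files, Path(tmp) / "shards")
    print(len(shards))  # 3 shards of up to 1,000 files each
```

Training readers then stream sequentially through each shard instead of issuing one storage request per file, which is what actually removes the bottleneck.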