Intermediate
Azure OpenAI Deployment
Learn to create model deployments, manage versions and quotas, and architect multi-region deployments for high availability.
Creating a Deployment
# Azure CLI
az cognitiveservices account deployment create \
--name my-openai-resource \
--resource-group ai-rg \
--deployment-name gpt4o-deployment \
--model-name gpt-4o \
--model-version "2024-08-06" \
--model-format OpenAI \
--sku-capacity 80 \
--sku-name Standard
# Python SDK
from openai import AzureOpenAI
client = AzureOpenAI(
azure_endpoint="https://my-openai-resource.openai.azure.com/",
api_key=os.getenv("AZURE_OPENAI_API_KEY"),
api_version="2024-06-01"
)
Deployment Types
| Type | Pricing | Latency | Best For |
|---|---|---|---|
| Standard | Pay-per-token | Variable | Development, variable traffic |
| Provisioned (PTU) | Reserved throughput | Guaranteed low | Production, consistent traffic |
| Global Standard | Pay-per-token | Optimized routing | Global apps, best availability |
| Data Zone Standard | Pay-per-token | Variable | Data residency requirements |
Multi-Region Architecture
- Active-passive: Primary region handles traffic; secondary region for failover with same model deployment
- Active-active: Load balance across regions using Azure API Management or Azure Front Door
- Quota spreading: Distribute quota across regions to maximize total available throughput
- Model availability: Not all models are available in all regions; check regional availability first
Version Management
- Pin versions: Always specify a model version to avoid unexpected behavior changes
- Auto-upgrade policy: Configure deployments to auto-upgrade on default version changes or require manual upgrade
- Testing: Test new model versions in a staging deployment before updating production
- Retirement: Monitor model retirement dates and plan migrations well in advance
Pro tip: Deploy the same model across multiple Azure regions and use API Management with a load-balancing policy to distribute traffic. This gives you higher aggregate throughput (quotas are per-region) and built-in disaster recovery.
Lilly Tech Systems