Intermediate

Azure OpenAI Deployment

Learn to create model deployments, manage versions and quotas, and architect multi-region deployments for high availability.

Creating a Deployment

# Azure CLI
az cognitiveservices account deployment create \
  --name my-openai-resource \
  --resource-group ai-rg \
  --deployment-name gpt4o-deployment \
  --model-name gpt-4o \
  --model-version "2024-08-06" \
  --model-format OpenAI \
  --sku-capacity 80 \
  --sku-name Standard

# Python SDK
from openai import AzureOpenAI
client = AzureOpenAI(
    azure_endpoint="https://my-openai-resource.openai.azure.com/",
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-06-01"
)

Deployment Types

TypePricingLatencyBest For
StandardPay-per-tokenVariableDevelopment, variable traffic
Provisioned (PTU)Reserved throughputGuaranteed lowProduction, consistent traffic
Global StandardPay-per-tokenOptimized routingGlobal apps, best availability
Data Zone StandardPay-per-tokenVariableData residency requirements

Multi-Region Architecture

  • Active-passive: Primary region handles traffic; secondary region for failover with same model deployment
  • Active-active: Load balance across regions using Azure API Management or Azure Front Door
  • Quota spreading: Distribute quota across regions to maximize total available throughput
  • Model availability: Not all models are available in all regions; check regional availability first

Version Management

  • Pin versions: Always specify a model version to avoid unexpected behavior changes
  • Auto-upgrade policy: Configure deployments to auto-upgrade on default version changes or require manual upgrade
  • Testing: Test new model versions in a staging deployment before updating production
  • Retirement: Monitor model retirement dates and plan migrations well in advance
Pro tip: Deploy the same model across multiple Azure regions and use API Management with a load-balancing policy to distribute traffic. This gives you higher aggregate throughput (quotas are per-region) and built-in disaster recovery.