PTU vs Pay-As-You-Go
Compare Provisioned Throughput Units (PTU) and Pay-As-You-Go (PayGo) pricing models to choose the right option for your Azure OpenAI workloads.
Pricing Model Comparison
| Feature | Pay-As-You-Go | Provisioned (PTU) |
|---|---|---|
| Pricing | Per 1K tokens (input/output) | Per PTU per hour |
| Latency | Variable (shared infra) | Guaranteed low (dedicated) |
| Rate limits | Subject to TPM quota | No TPM limits within PTU capacity |
| Commitment | None | Monthly or yearly reservation |
| Cost predictability | Variable with usage | Fixed monthly cost |
| Best for | Dev, variable workloads | Production, consistent traffic |
When to Choose PTU
Consistent Traffic
Your application has steady, predictable token consumption that justifies dedicated capacity allocation.
Latency Requirements
Your SLA requires guaranteed low latency that shared PayGo infrastructure cannot consistently deliver.
Cost Optimization
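At high, sustained volume, PTU can be cheaper than PayGo. The crossover point depends on your utilization rate: PTU is billed for every provisioned hour whether or not you use it, so it only pays off when you consistently consume enough of that capacity.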
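To make the crossover concrete, here is a minimal break-even sketch. It compares monthly PayGo spend (billed per 1K input and output tokens) with monthly PTU spend (billed per PTU per hour) and solves for the monthly token volume at which they are equal. All prices, the PTU count, and the 80/20 input/output split are hypothetical placeholders, not Azure list prices. Substitute current rates for your model and region.

```python
# Hypothetical break-even sketch: PTU vs PayGo monthly cost.
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12 months


def paygo_cost(input_tokens: float, output_tokens: float,
               input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Monthly PayGo cost: input and output tokens are billed separately per 1K tokens."""
    return (input_tokens / 1000) * input_price_per_1k + (output_tokens / 1000) * output_price_per_1k


def ptu_cost(ptus: int, price_per_ptu_hour: float, hours: float = HOURS_PER_MONTH) -> float:
    """Monthly PTU cost: a flat hourly rate per PTU, independent of token volume."""
    return ptus * price_per_ptu_hour * hours


def break_even_tokens(ptus: int, price_per_ptu_hour: float, input_share: float,
                      input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Total monthly tokens at which PayGo spend equals PTU spend.

    input_share is the fraction of all tokens that are input (prompt) tokens.
    Above this volume PTU is cheaper, provided the PTUs can actually serve the traffic.
    """
    blended_per_1k = (input_share * input_price_per_1k
                      + (1 - input_share) * output_price_per_1k)
    return ptu_cost(ptus, price_per_ptu_hour) / blended_per_1k * 1000


if __name__ == "__main__":
    # Placeholder prices for illustration only, not Azure list prices.
    tokens = break_even_tokens(ptus=50, price_per_ptu_hour=1.0, input_share=0.8,
                               input_price_per_1k=0.0025, output_price_per_1k=0.01)
    print(f"Break-even: ~{tokens / 1e9:.1f}B tokens/month")
```

With these made-up numbers the break-even is roughly 9.1B tokens per month. Below that volume PayGo is cheaper; above it PTU wins, but only if the provisioned capacity can actually absorb the traffic.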
No Rate Limiting
You need to absorb traffic spikes without hitting PayGo TPM rate limits, as long as those spikes stay within your provisioned capacity.
Calculating PTU Requirements
- Estimate tokens per minute: Calculate your average and peak TPM based on request patterns
- Use the capacity calculator: Azure provides a PTU sizing calculator in the Azure portal
- Factor in prompt/completion ratio: PTU capacity depends on the mix of input and output tokens, since output tokens typically consume more capacity than input tokens
- Plan for growth: Size for 6-12 month projected traffic, not just current needs
- Start with monitoring: Run on PayGo first to establish baseline usage patterns before committing to PTU
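The sizing steps above can be roughed out in code before you open the portal calculator. This is a sketch under heavy assumptions. The per-PTU throughput (`tpm_per_ptu`), the output-token weighting (`output_weight`), and the minimum deployment size and increment are all hypothetical placeholders that vary by model and deployment type. Treat Azure's capacity calculator as the authoritative source.

```python
import math
from statistics import mean


def tpm_stats(tokens_per_minute: list[int]) -> tuple[float, int]:
    """Return (average TPM, peak TPM) from per-minute token counts."""
    return mean(tokens_per_minute), max(tokens_per_minute)


def estimate_ptus(input_tpm: float, output_tpm: float, tpm_per_ptu: float,
                  output_weight: float, growth_factor: float = 1.0,
                  minimum: int = 15, increment: int = 5) -> int:
    """Rough PTU estimate. Every default and parameter value here is a hypothetical placeholder.

    tpm_per_ptu:   input-token throughput one PTU sustains for your model (assumed)
    output_weight: how many input tokens' worth of capacity one output token uses (assumed)
    growth_factor: projected traffic multiplier, e.g. 1.5 for 50% growth over 12 months
    minimum, increment: deployment size rules (assumed; check your model and region)
    """
    # Express all traffic in input-token equivalents, then scale for growth.
    effective_tpm = (input_tpm + output_tpm * output_weight) * growth_factor
    ptus = max(minimum, math.ceil(effective_tpm / tpm_per_ptu))
    # Round up to the next allowed deployment increment.
    return math.ceil(ptus / increment) * increment


if __name__ == "__main__":
    # Size on peak, not average, so spikes fit inside provisioned capacity.
    avg_in, peak_in = tpm_stats([30_000, 35_000, 40_000, 32_000])
    avg_out, peak_out = tpm_stats([6_000, 7_000, 8_000, 6_500])
    print(f"Average input TPM: {avg_in:.0f}, peak input TPM: {peak_in}")
    print("Estimated PTUs:", estimate_ptus(peak_in, peak_out, tpm_per_ptu=2_500,
                                           output_weight=4, growth_factor=1.5))
```

With these illustrative inputs, the effective peak load is 108,000 input-token equivalents per minute. That works out to 43.2 PTUs, which rounds up to 45 at an increment of 5.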
Hybrid Approach
Many organizations use a hybrid strategy combining PTU and PayGo:
- Base load on PTU: Cover your minimum sustained traffic with provisioned capacity
- Burst to PayGo: Configure Azure API Management (APIM) to route overflow traffic to PayGo deployments during spikes
- Dev on PayGo: Use PayGo for development, testing, and experimentation environments
- Production on PTU: Run production workloads on PTU for guaranteed performance
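The burst-to-PayGo rule comes down to simple routing logic: send each request to the PTU deployment first, and if it responds with HTTP 429 (capacity exhausted), resend it to a PayGo deployment. In practice this logic lives in your APIM configuration. The Python sketch below only illustrates the decision. `Response`, `Backend`, and `fake_backend` are hypothetical stand-ins for real HTTP calls to your deployments.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Response:
    status: int      # HTTP status code
    body: str
    backend: str     # which deployment served the request


# A backend is any callable that sends a request payload and returns a Response.
Backend = Callable[[dict], Response]


def route_with_spillover(request: dict, ptu: Backend, paygo: Backend) -> Response:
    """Send to the PTU deployment first; on HTTP 429 (capacity exhausted), spill to PayGo."""
    response = ptu(request)
    if response.status == 429:
        return paygo(request)
    return response


def fake_backend(name: str, capacity: int) -> Backend:
    """Hypothetical stand-in for a deployment that accepts `capacity` requests, then returns 429."""
    served = 0

    def call(request: dict) -> Response:
        nonlocal served
        if served >= capacity:
            return Response(429, "Too Many Requests", name)
        served += 1
        return Response(200, "ok", name)

    return call


if __name__ == "__main__":
    ptu = fake_backend("ptu", capacity=2)
    paygo = fake_backend("paygo", capacity=100)
    for i in range(3):
        r = route_with_spillover({"prompt": f"request {i}"}, ptu, paygo)
        print(i, r.status, r.backend)
```

In this run, the first two requests are served by `ptu` and the third spills over to `paygo`. If PayGo also returns 429, the sketch passes that response back to the caller, which would then apply its own retry or backoff.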
Pro tip: Use Azure Monitor metrics to track your PTU utilization rate. If utilization is consistently below 60%, you may be over-provisioned. If you're regularly hitting 90%+, consider adding more PTUs or using PayGo spillover. The sweet spot is 70-85% average utilization.
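The thresholds in this tip can be applied to utilization samples exported from Azure Monitor, for example hourly averages. The sketch assumes one definition for "regularly hitting 90%+": at least 10% of samples at or above 90%. That cutoff (`high_fraction`) is an assumption you should tune to your own traffic.

```python
from statistics import mean


def assess_utilization(samples: list[float], high_threshold: float = 90.0,
                       high_fraction: float = 0.1) -> str:
    """Classify PTU utilization samples (percent, 0-100) using the tip's rules of thumb.

    "Regularly hitting 90%+" is interpreted here as at least `high_fraction`
    of samples at or above `high_threshold` (an assumed definition).
    """
    if not samples:
        raise ValueError("need at least one utilization sample")
    avg = mean(samples)
    share_high = sum(s >= high_threshold for s in samples) / len(samples)
    # Check saturation first: frequent 90%+ peaks matter even if the average is low.
    if share_high >= high_fraction:
        return "near capacity: add PTUs or use PayGo spillover"
    if avg < 60:
        return "over-provisioned: consider fewer PTUs"
    if 70 <= avg <= 85:
        return "sweet spot: 70-85% average utilization"
    return "acceptable: keep monitoring"


if __name__ == "__main__":
    hourly = [72.0, 78.5, 81.0, 76.0, 88.0, 74.5]
    print(assess_utilization(hourly))
```

The sample series averages about 78% with no readings at or above 90%, so it lands in the sweet spot. Checking for saturation before the low-average case means bursty traffic with frequent peaks is flagged even when the average looks low.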
Lilly Tech Systems