Best Practices
Optimize costs, handle rate limits, implement fallbacks, monitor usage, and secure your API keys for production-grade OpenRouter deployments.
Cost Optimization
AI API costs can add up quickly. Here are proven strategies to keep spending under control:
Choose the Right Model for Each Task
Not every task needs the most expensive model. Match the model to the task complexity:
- Simple tasks (classification, formatting, extraction): Use GPT-4o mini, Claude 3.5 Haiku, or Gemini 2.5 Flash (on the order of $0.15 per million input tokens).
- Medium tasks (code generation, analysis, writing): Use Claude Sonnet 4, GPT-4o, or Gemini 2.5 Pro.
- Complex tasks (deep reasoning, multi-step analysis): Use Claude Opus 4, o3, or DeepSeek R1 only when quality truly matters.
- Development/testing: Use free models like Llama 4 Scout to avoid any costs during development.
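The tiering above can be sketched as a simple lookup table. This is a minimal sketch: the tier names are our own, and the model slugs are illustrative; confirm the exact slugs on the OpenRouter models page.

```python
# Hypothetical task tiers mapped to OpenRouter model slugs.
# Slugs are illustrative -- verify them against the current model list.
MODEL_TIERS = {
    "simple": "openai/gpt-4o-mini",         # classification, formatting, extraction
    "medium": "anthropic/claude-sonnet-4",  # code generation, analysis, writing
    "complex": "anthropic/claude-opus-4",   # deep reasoning, multi-step analysis
}

def pick_model(tier: str) -> str:
    # Default to the cheapest paid tier when the tier name is unknown.
    return MODEL_TIERS.get(tier, MODEL_TIERS["simple"])
```

Routing through a function like this keeps model choices in one place, so switching a tier to a cheaper model is a one-line change.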
Reduce Token Usage
- Be concise in prompts: Remove unnecessary words, examples, and context that the model does not need.
- Set max_tokens: Always set a reasonable max_tokens limit to prevent unexpectedly long (and expensive) responses.
- Cache responses: Store responses for identical or similar queries to avoid making the same API call twice.
- Truncate context: Only send the relevant portion of long documents rather than the entire file.
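As a minimal illustration of the last point, here is a crude truncation helper. The 8,000-character cap is arbitrary, and keeping the tail assumes the most relevant text comes last; real applications should select the relevant section (e.g. via retrieval) rather than truncating blindly.

```python
def truncate_context(document: str, max_chars: int = 8000) -> str:
    """Keep only the tail of a long document to bound token usage.

    A naive sketch: the cap and the keep-the-tail strategy are
    placeholder choices, not recommendations for every workload.
    """
    if len(document) <= max_chars:
        return document
    return document[-max_chars:]
```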
```python
import hashlib
import json

cache = {}

def cached_completion(model, messages, **kwargs):
    # Create a cache key from the request
    key = hashlib.md5(
        json.dumps({"model": model, "messages": messages}).encode()
    ).hexdigest()
    if key in cache:
        return cache[key]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        **kwargs,
    )
    result = response.choices[0].message.content
    cache[key] = result
    return result
```
Rate Limit Handling
When you hit rate limits (HTTP 429), implement exponential backoff:
```python
import random
import time

from openai import RateLimitError

def call_with_backoff(func, max_retries=5):
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError:
            # Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
            wait = (2 ** attempt) + random.random()
            print(f"Rate limited. Retrying in {wait:.1f}s...")
            time.sleep(wait)
    raise Exception("Max retries exceeded")
```
Fallback Strategies
Build resilient applications by implementing model fallbacks:
```python
FALLBACK_MODELS = [
    "anthropic/claude-sonnet-4",
    "openai/gpt-4o",
    "google/gemini-2.5-pro",
    "meta-llama/llama-4-maverick",
]

def chat_with_fallback(messages):
    for model in FALLBACK_MODELS:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
            )
            return response.choices[0].message.content
        except Exception as e:
            print(f"{model} failed: {e}. Trying next...")
    raise Exception("All models failed")
```
You can also use OpenRouter's built-in provider fallback by specifying provider preferences in your request:
```python
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=messages,
    extra_body={
        "provider": {
            "order": ["Anthropic", "AWS Bedrock", "GCP Vertex"],
            "allow_fallbacks": True,
        }
    },
)
```
Monitoring Usage
Track your API usage to identify cost drivers and optimize spending:
- Dashboard: Check the OpenRouter Activity page regularly for usage trends and cost breakdowns.
- Per-key limits: Create separate API keys for different projects or environments, each with its own credit limit.
- Log token counts: Record the usage field from API responses to track token consumption in your own systems.
- Set alerts: Configure credit alerts on your OpenRouter account to get notified before running out of credits.
```python
def tracked_completion(model, messages):
    response = client.chat.completions.create(
        model=model,
        messages=messages,
    )
    # Log usage details
    usage = response.usage
    print(f"Model: {model}")
    print(f"Input tokens: {usage.prompt_tokens}")
    print(f"Output tokens: {usage.completion_tokens}")
    print(f"Total tokens: {usage.total_tokens}")
    return response.choices[0].message.content
```
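Logged token counts can also be turned into rough dollar estimates with a local price table. A sketch under stated assumptions: the model name and prices below are placeholders, to be filled in from each model's OpenRouter listing.

```python
# Placeholder prices in USD per million tokens -- NOT real figures;
# look up current prices on each model's OpenRouter page.
PRICES = {
    "example/model": {"input": 0.15, "output": 0.60},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate request cost in USD from logged token counts."""
    p = PRICES[model]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1_000_000
```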
Security: API Key Management
Proper API key security is critical for production deployments:
- Environment variables: Never hardcode API keys. Use environment variables (OPENROUTER_API_KEY) or a secrets manager.
- Key rotation: Rotate API keys periodically. Create a new key, update your applications, then revoke the old key.
- Separate keys: Use different API keys for development, staging, and production environments.
- Credit limits: Set per-key credit limits to cap potential damage from a leaked key.
- Revoke immediately: If a key is compromised, revoke it immediately at openrouter.ai/keys.
- .gitignore: Always include .env files in your .gitignore to prevent accidental commits.
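A minimal sketch of the first point above: read the key from the environment and fail fast when it is missing (the helper name and error message are our own).

```python
import os

def load_api_key() -> str:
    """Read the OpenRouter key from the environment; never hardcode it."""
    key = os.environ.get("OPENROUTER_API_KEY")
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is not set")
    return key
```

Failing at startup with a clear error is easier to debug than a 401 response deep inside request-handling code.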
Production Deployment Checklist
- Use environment variables for all API keys
- Implement retry logic with exponential backoff
- Set up model fallback chains
- Configure per-key credit limits
- Set max_tokens on all requests
- Log token usage and costs
- Set up monitoring alerts for spending
- Use streaming for user-facing responses
- Cache identical requests where appropriate
- Test with free models before switching to paid
Frequently Asked Questions
Does OpenRouter charge fees on top of model prices?
OpenRouter itself does not charge a platform fee. You pay only for the tokens you consume at each model's listed price. Several models are available completely free of charge, making it easy to get started without any payment.
Does OpenRouter add latency to requests?
OpenRouter adds minimal latency (typically a few milliseconds) for routing. For most use cases, the difference is negligible. The benefits of fallbacks, unified billing, and model flexibility far outweigh the minor latency addition.
Is OpenRouter reliable enough for production use?
Yes. Many companies and applications use OpenRouter in production. Its automatic fallbacks and multi-provider routing actually make it more reliable than connecting to a single provider directly. Implement proper error handling and fallback strategies for the best results.
What happens to my data?
OpenRouter routes requests to the underlying providers, which have their own data policies. OpenRouter logs metadata (model used, token counts, costs) for billing purposes. Check OpenRouter's privacy policy and each provider's terms for specific data handling details.
What happens if a model I rely on is deprecated?
If a model is deprecated or removed, API calls to that model will return an error. Implement fallback chains (as shown above) to automatically switch to an alternative model. OpenRouter typically provides advance notice before removing popular models.
Lilly Tech Systems