Best Practices
Optimize costs, handle rate limits, implement fallbacks, monitor usage, and secure your API keys for production-grade OpenRouter deployments.
Cost Optimization
AI API costs can add up quickly. Here are proven strategies to keep spending under control:
Choose the Right Model for Each Task
Not every task needs the most expensive model. Match the model to the task complexity:
- Simple tasks (classification, formatting, extraction): Use GPT-4o mini, Claude 3.5 Haiku, or Gemini 2.5 Flash (on the order of $0.15 per million input tokens).
- Medium tasks (code generation, analysis, writing): Use Claude Sonnet 4, GPT-4o, or Gemini 2.5 Pro.
- Complex tasks (deep reasoning, multi-step analysis): Use Claude Opus 4, o3, or DeepSeek R1 only when quality truly matters.
- Development/testing: Use free models like Llama 4 Scout to avoid any costs during development.
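The tiering above can be sketched as a simple lookup table. This is a minimal sketch: the tier names are our own, and the model slugs are illustrative; confirm the exact slugs on the OpenRouter models page.

```python
# Hypothetical task tiers mapped to OpenRouter model slugs.
# Slugs are illustrative -- verify them against the current model list.
MODEL_TIERS = {
    "simple": "openai/gpt-4o-mini",         # classification, formatting, extraction
    "medium": "anthropic/claude-sonnet-4",  # code generation, analysis, writing
    "complex": "anthropic/claude-opus-4",   # deep reasoning, multi-step analysis
}

def pick_model(tier: str) -> str:
    # Default to the cheapest paid tier when the tier name is unknown.
    return MODEL_TIERS.get(tier, MODEL_TIERS["simple"])
```

Routing through a function like this keeps model choices in one place, so switching a tier to a cheaper model is a one-line change.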
Reduce Token Usage
- Be concise in prompts: Remove unnecessary words, examples, and context that the model does not need.
- Set max_tokens: Always set a reasonable max_tokens limit to prevent unexpectedly long (and expensive) responses.
- Cache responses: Store responses for identical or similar queries to avoid making the same API call twice.
- Truncate context: Only send the relevant portion of long documents rather than the entire file.
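As a minimal illustration of the last point, here is a crude truncation helper. The 8,000-character cap is arbitrary, and keeping the tail assumes the most relevant text comes last; real applications should select the relevant section (e.g. via retrieval) rather than truncating blindly.

```python
def truncate_context(document: str, max_chars: int = 8000) -> str:
    """Keep only the tail of a long document to bound token usage.

    A naive sketch: the cap and the keep-the-tail strategy are
    placeholder choices, not recommendations for every workload.
    """
    if len(document) <= max_chars:
        return document
    return document[-max_chars:]
```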
```python
import hashlib
import json

cache = {}

def cached_completion(model, messages, **kwargs):
    # Create a cache key from the request
    key = hashlib.md5(
        json.dumps({"model": model, "messages": messages}).encode()
    ).hexdigest()
    if key in cache:
        return cache[key]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        **kwargs,
    )
    result = response.choices[0].message.content
    cache[key] = result
    return result
```
Rate Limit Handling
When you hit rate limits (HTTP 429), implement exponential backoff:
```python
import random
import time

from openai import RateLimitError

def call_with_backoff(func, max_retries=5):
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError:
            # Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
            wait = (2 ** attempt) + random.random()
            print(f"Rate limited. Retrying in {wait:.1f}s...")
            time.sleep(wait)
    raise Exception("Max retries exceeded")
```
Fallback Strategies
Build resilient applications by implementing model fallbacks:
```python
FALLBACK_MODELS = [
    "anthropic/claude-sonnet-4",
    "openai/gpt-4o",
    "google/gemini-2.5-pro",
    "meta-llama/llama-4-maverick",
]

def chat_with_fallback(messages):
    for model in FALLBACK_MODELS:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
            )
            return response.choices[0].message.content
        except Exception as e:
            print(f"{model} failed: {e}. Trying next...")
    raise Exception("All models failed")
```
You can also use OpenRouter's built-in provider fallback by specifying provider preferences in your request:
```python
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=messages,
    extra_body={
        "provider": {
            "order": ["Anthropic", "AWS Bedrock", "GCP Vertex"],
            "allow_fallbacks": True,
        }
    },
)
```
Monitoring Usage
Track your API usage to identify cost drivers and optimize spending:
- Dashboard: Check the OpenRouter Activity page regularly for usage trends and cost breakdowns.
- Per-key limits: Create separate API keys for different projects or environments, each with its own credit limit.
- Log token counts: Record the usage field from API responses to track token consumption in your own systems.
- Set alerts: Configure credit alerts on your OpenRouter account to get notified before running out of credits.
```python
def tracked_completion(model, messages):
    response = client.chat.completions.create(
        model=model,
        messages=messages,
    )
    # Log usage details
    usage = response.usage
    print(f"Model: {model}")
    print(f"Input tokens: {usage.prompt_tokens}")
    print(f"Output tokens: {usage.completion_tokens}")
    print(f"Total tokens: {usage.total_tokens}")
    return response.choices[0].message.content
```
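Logged token counts can also be turned into rough dollar estimates with a local price table. A sketch under stated assumptions: the model name and prices below are placeholders, to be filled in from each model's OpenRouter listing.

```python
# Placeholder prices in USD per million tokens -- NOT real figures;
# look up current prices on each model's OpenRouter page.
PRICES = {
    "example/model": {"input": 0.15, "output": 0.60},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate request cost in USD from logged token counts."""
    p = PRICES[model]
    return (prompt_tokens * p["input"] + completion_tokens * p["output"]) / 1_000_000
```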
Security: API Key Management
Proper API key security is critical for production deployments:
- Environment variables: Never hardcode API keys. Use environment variables (OPENROUTER_API_KEY) or a secrets manager.
- Key rotation: Rotate API keys periodically. Create a new key, update your applications, then revoke the old key.
- Separate keys: Use different API keys for development, staging, and production environments.
- Credit limits: Set per-key credit limits to cap potential damage from a leaked key.
- Revoke immediately: If a key is compromised, revoke it immediately at openrouter.ai/keys.
- .gitignore: Always include .env files in your .gitignore to prevent accidental commits.
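A minimal sketch of the first point above: read the key from the environment and fail fast when it is missing (the helper name and error message are our own).

```python
import os

def load_api_key() -> str:
    """Read the OpenRouter key from the environment; never hardcode it."""
    key = os.environ.get("OPENROUTER_API_KEY")
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is not set")
    return key
```

Failing at startup with a clear error is easier to debug than a 401 response deep inside request-handling code.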
Production Deployment Checklist
- Use environment variables for all API keys
- Implement retry logic with exponential backoff
- Set up model fallback chains
- Configure per-key credit limits
- Set max_tokens on all requests
- Log token usage and costs
- Set up monitoring alerts for spending
- Use streaming for user-facing responses
- Cache identical requests where appropriate
- Test with free models before switching to paid
Frequently Asked Questions
Does OpenRouter charge fees on top of model prices?
OpenRouter itself does not charge a platform fee. You pay only for the tokens you consume at each model's listed price. Several models are available completely free of charge, making it easy to get started without any payment.
Does OpenRouter add latency to requests?
OpenRouter adds minimal latency (typically a few milliseconds) for routing. For most use cases, the difference is negligible. The benefits of fallbacks, unified billing, and model flexibility far outweigh the minor latency addition.
Is OpenRouter reliable enough for production use?
Yes. Many companies and applications use OpenRouter in production. Its automatic fallbacks and multi-provider routing actually make it more reliable than connecting to a single provider directly. Implement proper error handling and fallback strategies for the best results.
What happens to my data?
OpenRouter routes requests to the underlying providers, which have their own data policies. OpenRouter logs metadata (model used, token counts, costs) for billing purposes. Check OpenRouter's privacy policy and each provider's terms for specific data handling details.
What happens if a model I rely on is deprecated?
If a model is deprecated or removed, API calls to that model will return an error. Implement fallback chains (as shown above) to automatically switch to an alternative model. OpenRouter typically provides advance notice before removing popular models.
Lilly Tech Systems