GitHub Models

Discover, compare, and integrate AI models from the GitHub Models marketplace — including GPT-4o, Claude, Llama, Mistral, and more — directly from your GitHub account.

What is GitHub Models?

GitHub Models is a marketplace and playground built into GitHub that gives developers direct access to a wide range of AI models from leading providers. Instead of signing up for separate accounts with OpenAI, Anthropic, Meta, and others, you can explore and use their models through a single, unified interface at github.com/marketplace/models.

GitHub Models serves several important purposes for developers:

  • Discovery — Browse and learn about available AI models without leaving GitHub
  • Experimentation — Test models in an interactive playground before writing any code
  • Comparison — Run the same prompt against multiple models side by side to evaluate quality
  • Integration — Use models via a standard API with your GitHub personal access token
  • Prototyping — Build and test AI features rapidly with free-tier access
💡 Good to know: GitHub Models is accessible to anyone with a GitHub account. The playground is free to use for experimentation, so you can try different AI models without a credit card or a separate sign-up.

Available Models

GitHub Models hosts a growing catalog of models from multiple providers. Each model has different strengths, context window sizes, and pricing. The catalog changes over time, so treat the table below as a snapshot and check the marketplace for current availability:

| Model | Provider | Best for | Context window |
| --- | --- | --- | --- |
| GPT-4o | OpenAI | General purpose, multimodal (text + image) | 128K tokens |
| GPT-4o mini | OpenAI | Fast responses, cost-efficient tasks | 128K tokens |
| o1 / o1-mini | OpenAI | Complex reasoning, math, code | 200K tokens (o1), 128K tokens (o1-mini) |
| Claude 3.5 Sonnet | Anthropic | Nuanced writing, analysis, coding | 200K tokens |
| Llama 3.1 (405B/70B/8B) | Meta | Open-source, versatile, self-hostable | 128K tokens |
| Mistral Large / Nemo | Mistral AI | Multilingual, efficient, European AI | 128K tokens |
| Phi-3 / Phi-3.5 | Microsoft | Small, fast, on-device capable | Up to 128K tokens |
| Command R+ | Cohere | RAG, enterprise search, tool use | 128K tokens |

Key takeaway: No single model is best for everything. GPT-4o and Claude excel at general tasks, o1 models shine at complex reasoning, Llama is ideal when you need open-source flexibility, and Phi models work well for lightweight or edge deployments.

The Playground: Trying Models Interactively

The GitHub Models playground lets you interact with any model directly in your browser. Navigate to a model's page and click "Playground" to start a conversation. The interface provides several configuration options:

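  • System prompt: instructions that guide the model's behavior (e.g., "You are a senior Python developer")
  • Temperature: controls randomness. Values near 0 give focused, nearly deterministic responses; values around 1 give more varied, creative output
  • Max tokens: caps the response length, which helps control cost and keeps answers focused
  • Top-p: nucleus sampling. The model samples only from the smallest set of tokens whose cumulative probability reaches p, so lower values narrow the choice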
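
To make the Temperature and Top-p settings more concrete, here is a minimal sketch of how the two knobs reshape a toy distribution over three candidate tokens. It illustrates the general sampling technique and is not a description of how GitHub Models or any provider implements it; the function names `softmax_with_temperature` and `top_p_filter` and the example scores are assumptions made for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize what is left."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for index, prob in ranked:
        kept.append((index, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    mass = sum(prob for _, prob in kept)
    return {index: prob / mass for index, prob in kept}


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
base = softmax_with_temperature(logits, 1.0)
sharp = softmax_with_temperature(logits, 0.5)
nucleus = top_p_filter(base, 0.9)
print("T=1.0:", [round(p, 3) for p in base])
print("T=0.5:", [round(p, 3) for p in sharp])
print("top-p=0.9 keeps token indexes:", sorted(nucleus))
```

Lowering the temperature moves probability toward the top-scoring token, while lowering top-p removes the unlikely tail entirely. In practice it is common to adjust one of the two and leave the other at its default.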

One of the most powerful playground features is the Compare mode. Select two or more models, type a single prompt, and see how each model responds side by side. This is invaluable for deciding which model to use in your application.

For example, you might compare how different models handle a code generation task:

Prompt
Write a TypeScript function that validates an email address
using a regex pattern. Include JSDoc comments and handle
edge cases like empty strings and null values.

Running this across GPT-4o, Claude, and Llama will show you differences in code style, error handling approaches, regex patterns chosen, and documentation quality — helping you make an informed choice for your project.
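
The same side-by-side comparison can also be scripted. The sketch below is one possible shape for that, not a GitHub Models feature: `compare_models`, `format_side_by_side`, and the `ask` callable are hypothetical names, and `ask` stands in for whatever function actually sends a prompt to a model and returns its reply. The stub used here keeps the example runnable offline.

```python
from typing import Callable, Dict, List


def compare_models(prompt: str, models: List[str],
                   ask: Callable[[str, str], str]) -> Dict[str, str]:
    """Send one prompt to each model and collect the replies keyed by model name."""
    results: Dict[str, str] = {}
    for model in models:
        try:
            results[model] = ask(model, prompt)
        except Exception as error:
            # Record the failure and keep going so one model cannot hide the rest
            results[model] = f"<error: {error}>"
    return results


def format_side_by_side(results: Dict[str, str]) -> str:
    """Render the replies as labeled sections for manual review."""
    sections = [f"=== {model} ===\n{reply}" for model, reply in results.items()]
    return "\n\n".join(sections)


def stub_ask(model: str, prompt: str) -> str:
    """Offline stand-in for a real API call (assumption for this sketch)."""
    return f"[{model}] would answer a {len(prompt)}-character prompt here"


prompt = "Write a TypeScript function that validates an email address."
# "example-model-b" is a placeholder identifier, not a real catalog entry
print(format_side_by_side(compare_models(prompt, ["gpt-4o", "example-model-b"], stub_ask)))
```

Swapping `stub_ask` for a function that calls the API turns this into a small evaluation harness; keeping the network call behind a callable also makes the comparison logic easy to test.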

Using the GitHub Models API

Beyond the playground, GitHub Models provides a REST API that you can call from your applications. Authentication uses your GitHub personal access token (PAT), which means no additional API keys are needed.

The API follows the OpenAI-compatible chat completions format, making it easy to integrate with existing tooling:

Python
import requests
import os

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]

response = requests.post(
    "https://models.inference.ai.azure.com/chat/completions",
    headers={
        "Authorization": f"Bearer {GITHUB_TOKEN}",
        "Content-Type": "application/json"
    },
    json={
        "model": "gpt-4o",
        "messages": [
            {"role": "system", "content": "You are a helpful coding assistant."},
            {"role": "user", "content": "Write a Python function to merge two sorted lists."}
        ],
        "temperature": 0.7,
        "max_tokens": 1000
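    },
    timeout=30,  # fail instead of hanging if the connection stalls
)

# Raise on HTTP errors such as 401 (bad token) or 429 (rate limited)
response.raise_for_status()
result = response.json()
print(result["choices"][0]["message"]["content"])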
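
Indexing straight into `choices` works for a successful call but fails with an unhelpful `KeyError` when the service returns an error body. A slightly more defensive way to read the response might look like the sketch below. It assumes the response follows the OpenAI-style chat completions shape (`choices`, `finish_reason`, `usage.total_tokens`); `extract_reply` is a hypothetical helper name, and the sample dictionary is invented for illustration.

```python
def extract_reply(result: dict) -> dict:
    """Pull the assistant text and token count out of a chat completions response."""
    choices = result.get("choices") or []
    if not choices:
        # Error responses typically carry an "error" object instead of choices
        raise ValueError(f"no choices in response: {result.get('error', result)}")
    first = choices[0]
    usage = result.get("usage") or {}
    return {
        "content": first["message"]["content"],
        # "length" here means the reply was cut off by max_tokens
        "finish_reason": first.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
    }


# Invented sample payload in the OpenAI-compatible shape
sample = {
    "choices": [{"message": {"role": "assistant", "content": "def merge(a, b): ..."},
                 "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 25, "completion_tokens": 40, "total_tokens": 65},
}
print(extract_reply(sample))
```

Tracking `total_tokens` per call is also a simple way to see how quickly a prototype consumes its free-tier allowance.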

You can also use the OpenAI SDK directly by pointing it at the GitHub Models endpoint:

JavaScript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://models.inference.ai.azure.com",
  apiKey: process.env.GITHUB_TOKEN,
});

async function main() {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: "You are a senior code reviewer." },
      { role: "user", content: "Review this function for bugs:\n\nfunction divide(a, b) {\n  return a / b;\n}" }
    ],
    temperature: 0.3,
  });

  console.log(response.choices[0].message.content);
}

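// Surface API or network failures instead of leaving an unhandled rejection
main().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});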
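💡 Good to know: To create a personal access token for GitHub Models, go to Settings → Developer settings → Personal access tokens → Fine-grained tokens. Early versions of the service required no special scopes. GitHub's current documentation asks for the Models read permission (models:read) on the token, so grant it if requests come back unauthorized.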
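
Once the token exists, a common pattern is to read it from the environment rather than hard-coding it. The sketch below assumes the token is stored in a `GITHUB_TOKEN` environment variable, as in the examples above; `build_auth_headers` is a hypothetical helper name.

```python
import os


def build_auth_headers(env=None) -> dict:
    """Build request headers from a token stored in the environment."""
    env = os.environ if env is None else env
    token = (env.get("GITHUB_TOKEN") or "").strip()
    if not token:
        # Failing early gives a clearer message than a 401 from the API
        raise RuntimeError("GITHUB_TOKEN is not set; export a personal access token first")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


# Example with an obviously fake token passed in explicitly
headers = build_auth_headers({"GITHUB_TOKEN": "example-token-not-real"})
print(sorted(headers))
```

Accepting the environment as a parameter keeps the helper testable and avoids printing or logging the real token.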

Model Selection: Choosing the Right Model

Selecting the right model for your use case is critical for balancing quality, speed, and cost. Here is a practical guide:

  • Code generation and debugging — GPT-4o or Claude 3.5 Sonnet offer the best code quality. For simpler tasks, GPT-4o mini is faster and cheaper.
  • Complex reasoning and math — OpenAI's o1 models are specifically designed for multi-step reasoning tasks where accuracy matters more than speed.
  • Content writing and analysis — Claude 3.5 Sonnet excels at nuanced, well-structured writing. GPT-4o is also strong here.
  • Multilingual applications — Mistral models have strong multilingual capabilities, especially for European languages.
  • Cost-sensitive or high-volume — GPT-4o mini, Phi-3, or Llama 8B provide good quality at lower cost per token.
  • RAG and enterprise search — Cohere's Command R+ is purpose-built for retrieval-augmented generation.
  • Privacy-sensitive or on-premise — Llama and Mistral are open-source and can be self-hosted.
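
If an application needs to pick a model programmatically, the guidance above can be captured as a small lookup table. This is only a sketch: the task keys and the `suggest_models` helper are hypothetical, and the values are display names from this guide rather than verified API model identifiers.

```python
# Task categories mapped to the models suggested in this guide, best fit first
RECOMMENDATIONS = {
    "code": ["GPT-4o", "Claude 3.5 Sonnet", "GPT-4o mini"],
    "reasoning": ["o1", "o1-mini"],
    "writing": ["Claude 3.5 Sonnet", "GPT-4o"],
    "multilingual": ["Mistral Large", "Mistral Nemo"],
    "high_volume": ["GPT-4o mini", "Phi-3", "Llama 3.1 8B"],
    "rag": ["Command R+"],
    "self_hosted": ["Llama 3.1", "Mistral"],
}


def suggest_models(task: str) -> list:
    """Return the suggested models for a task category, best fit first."""
    key = task.strip().lower().replace("-", "_").replace(" ", "_")
    if key not in RECOMMENDATIONS:
        raise KeyError(f"unknown task {task!r}; expected one of {sorted(RECOMMENDATIONS)}")
    return list(RECOMMENDATIONS[key])  # copy so callers cannot mutate the table


print(suggest_models("reasoning"))
print(suggest_models("High volume"))
```

Keeping the mapping in data rather than in branching code makes it easy to revise as the catalog changes or as your own playground comparisons suggest different defaults.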

Rate Limits and Pricing

GitHub Models offers free-tier access for experimentation, with paid usage for production. Exact limits vary by model and account plan and change over time, so treat the figures below as approximate:

| Tier | Rate limit | Daily token limit | Use case |
| --- | --- | --- | --- |
| Free (Playground) | ~15 requests/min | ~150K tokens/day | Exploration and learning |
| Free (API) | ~15 requests/min | ~150K tokens/day | Prototyping and testing |
| Pay-as-you-go (via Azure) | Configurable | Unlimited | Production applications |

When you are ready to move from prototyping to production, you can switch to pay-as-you-go pricing through Azure AI. Because the request format stays the same, your application logic does not need to change: you only replace the endpoint and authenticate with an Azure API key instead of your GitHub token.
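
One way to keep that switch out of application code is to resolve the endpoint and credential from configuration. The sketch below is an assumption about how you might structure this, not an official migration path: `Backend`, `resolve_backend`, and the `AZURE_AI_ENDPOINT` and `AZURE_AI_KEY` variable names are hypothetical, and the exact Azure endpoint and header format should come from the Azure documentation for your deployment.

```python
import os
from dataclasses import dataclass

# Endpoint used in the examples earlier in this guide
GITHUB_MODELS_URL = "https://models.inference.ai.azure.com"


@dataclass(frozen=True)
class Backend:
    name: str
    base_url: str
    credential: str


def resolve_backend(env=None) -> Backend:
    """Prefer Azure settings when both are present, else fall back to GitHub Models."""
    env = os.environ if env is None else env
    # AZURE_AI_ENDPOINT and AZURE_AI_KEY are hypothetical names for this sketch
    endpoint, key = env.get("AZURE_AI_ENDPOINT"), env.get("AZURE_AI_KEY")
    if endpoint and key:
        return Backend("azure", endpoint.rstrip("/"), key)
    token = env.get("GITHUB_TOKEN")
    if not token:
        raise RuntimeError("set GITHUB_TOKEN, or both AZURE_AI_ENDPOINT and AZURE_AI_KEY")
    return Backend("github-models", GITHUB_MODELS_URL, token)


print(resolve_backend({"GITHUB_TOKEN": "example-token-not-real"}).name)
```

With this in place, moving a deployment to production becomes a change to its environment variables rather than to the calling code.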

Warning: Free-tier rate limits are shared across all models. If you hit the limit with GPT-4o, you will also be rate-limited on other models. Plan your experimentation sessions accordingly, and switch to Azure for production workloads.
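
A client-side throttle can help a prototype stay under a requests-per-minute limit instead of discovering it through rejected calls. The following is a minimal sliding-window sketch, assuming the approximate figure of 15 requests per minute from the table above; `RequestThrottle` is a hypothetical class, and the real limits for your account may differ.

```python
import time
from collections import deque


class RequestThrottle:
    """Client-side sliding-window limiter: at most max_requests per window_seconds."""

    def __init__(self, max_requests=15, window_seconds=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._sent = deque()  # timestamps of requests still inside the window

    def wait_for_slot(self) -> float:
        """Block until a request may be sent; return the seconds spent waiting."""
        waited = 0.0
        while True:
            now = self._clock()
            # Forget requests that have aged out of the window
            while self._sent and now - self._sent[0] >= self.window_seconds:
                self._sent.popleft()
            if len(self._sent) < self.max_requests:
                self._sent.append(now)
                return waited
            # Sleep until the oldest request leaves the window, then re-check
            delay = self.window_seconds - (now - self._sent[0])
            self._sleep(delay)
            waited += delay


throttle = RequestThrottle(max_requests=15, window_seconds=60.0)
throttle.wait_for_slot()  # call this immediately before each API request
```

This only smooths out your own traffic. The service can still reject requests, for example when a daily quota runs out, so error handling for rate-limit responses remains necessary.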

Integration with GitHub Copilot

GitHub Models connects directly to GitHub Copilot through the model switching feature. Copilot Pro and Enterprise users can select which underlying model powers their Copilot experience:

  • In VS Code or JetBrains, open the model picker in the Copilot Chat panel
  • Choose from available models like GPT-4o, Claude 3.5 Sonnet, or o1
  • The selected model powers Copilot Chat; inline completions may use a separately configured model, depending on your editor and Copilot version
  • You can switch models mid-session based on your current task

This means the models you test in the GitHub Models playground are the same models you can use directly in your coding workflow through Copilot. Test a model's strengths in the playground, then select it in Copilot for your daily work.

Key takeaway: Use the GitHub Models playground to evaluate models for your specific use cases, then set your preferred model in Copilot for day-to-day coding. Switch to reasoning models like o1 when tackling complex algorithmic problems, and use faster models like GPT-4o mini for routine tasks.