GitHub Models
Discover, compare, and integrate AI models from the GitHub Models marketplace — including GPT-4o, Claude, Llama, Mistral, and more — directly from your GitHub account.
What is GitHub Models?
GitHub Models is a marketplace and playground built into GitHub that gives developers direct access to a wide range of AI models from leading providers. Instead of signing up for separate accounts with OpenAI, Anthropic, Meta, and others, you can explore and use their models through a single, unified interface at github.com/marketplace/models.
GitHub Models serves several important purposes for developers:
- Discovery — Browse and learn about available AI models without leaving GitHub
- Experimentation — Test models in an interactive playground before writing any code
- Comparison — Run the same prompt against multiple models side by side to evaluate quality
- Integration — Use models via a standard API with your GitHub personal access token
- Prototyping — Build and test AI features rapidly with free-tier access
Available Models
GitHub Models hosts a growing catalog of models from multiple providers. Each model has different strengths, context window sizes, and pricing. Here are the key models available:
| Model | Provider | Best For | Context Window |
|---|---|---|---|
| GPT-4o | OpenAI | General purpose, multimodal (text + image) | 128K tokens |
| GPT-4o mini | OpenAI | Fast responses, cost-efficient tasks | 128K tokens |
| o1 / o1-mini | OpenAI | Complex reasoning, math, code | 200K tokens (o1), 128K tokens (o1-mini) |
| Claude 3.5 Sonnet | Anthropic | Nuanced writing, analysis, coding | 200K tokens |
| Llama 3.1 (405B/70B/8B) | Meta | Open weights, versatile, self-hostable | 128K tokens |
| Mistral Large / Nemo | Mistral AI | Multilingual, efficient, European AI | 128K tokens |
| Phi-3 / Phi-3.5 | Microsoft | Small, fast, on-device capable | Up to 128K tokens (varies by variant) |
| Command R+ | Cohere | RAG, enterprise search, tool use | 128K tokens |
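The context window bounds how much text a single request can carry. The following is a minimal sketch of checking whether a prompt is likely to fit. It assumes a rough rule of thumb of about four characters per token, which is only an approximation for English text and not a real tokenizer. `CONTEXT_WINDOWS`, `estimate_tokens`, and `fits_context` are hypothetical names for this illustration; the window sizes are copied from the table above.

```python
# Context window sizes in tokens, copied from the table above.
CONTEXT_WINDOWS = {
    "GPT-4o": 128_000,
    "GPT-4o mini": 128_000,
    "Claude 3.5 Sonnet": 200_000,
    "Command R+": 128_000,
}

# Assumption: about 4 characters per token, a rough heuristic for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate (not a real tokenizer)."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def fits_context(model: str, prompt: str, reserved_for_output: int = 1_000) -> bool:
    """Check whether the prompt plus an output budget fits the model's window."""
    window = CONTEXT_WINDOWS[model]
    return estimate_tokens(prompt) + reserved_for_output <= window


if __name__ == "__main__":
    long_prompt = "x" * 600_000  # about 150K tokens under the heuristic
    print(fits_context("GPT-4o", long_prompt))             # False
    print(fits_context("Claude 3.5 Sonnet", long_prompt))  # True
```

For real budgeting, replace the heuristic with the tokenizer that matches the model you call.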
The Playground: Trying Models Interactively
The GitHub Models playground lets you interact with any model directly in your browser. Navigate to a model's page and click "Playground" to start a conversation. The interface provides several configuration options:
- System prompt: Set instructions that guide the model's behavior (e.g., "You are a senior Python developer")
- Temperature: Control randomness. Values near 0 make responses focused and nearly deterministic; values near 1 produce more varied, creative output
- Max tokens: Cap the response length to control cost and keep answers focused
- Top-p: Restrict sampling to the most likely tokens whose combined probability reaches p (nucleus sampling); lower values narrow the choice of tokens
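To make the temperature and top-p settings concrete, here is a small sketch of how the two knobs reshape a probability distribution over candidate tokens. It is a toy illustration using made-up scores, not the sampling code of any particular model or provider; the function names are hypothetical.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert scores to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: all mass on the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: list[float], top_p: float) -> list[float]:
    """Keep the smallest set of most likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
    for t in (0.0, 0.3, 1.0):
        print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
    print([round(p, 3) for p in top_p_filter(softmax_with_temperature(logits, 1.0), 0.9)])
```

Running it shows the top token taking more of the probability mass as temperature drops, and the least likely token being removed entirely once top-p is below 1.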
One of the most powerful playground features is the Compare mode. Select two or more models, type a single prompt, and see how each model responds side by side. This is invaluable for deciding which model to use in your application.
For example, you might compare how different models handle a code generation task:
Write a TypeScript function that validates an email address using a regex pattern. Include JSDoc comments and handle edge cases like empty strings and null values.
Running this across GPT-4o, Claude, and Llama will show you differences in code style, error handling approaches, regex patterns chosen, and documentation quality — helping you make an informed choice for your project.
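The same side-by-side comparison can be scripted. The sketch below sends one prompt to several models with identical settings and collects the replies. It assumes the chat completions endpoint used elsewhere in this article and a `GITHUB_TOKEN` environment variable. The model ID `gpt-4o-mini` is an assumed example; copy the exact IDs for Claude, Llama, or other models from their model pages. The helper names are hypothetical.

```python
import json
import os
import urllib.request

# Endpoint taken from the API example in this article.
ENDPOINT = "https://models.inference.ai.azure.com/chat/completions"

PROMPT = (
    "Write a TypeScript function that validates an email address using a regex "
    "pattern. Include JSDoc comments and handle edge cases like empty strings "
    "and null values."
)


def build_payload(model: str, prompt: str) -> dict:
    """Build one chat completions request body for the given model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,  # keep sampling settings identical across models
    }


def send(payload: dict, token: str) -> str:
    """POST the payload and return the first choice's text."""
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


def compare(models: list[str], prompt: str, ask) -> dict[str, str]:
    """Run the same prompt against each model; `ask` maps a payload to reply text."""
    return {model: ask(build_payload(model, prompt)) for model in models}


if __name__ == "__main__":
    token = os.environ.get("GITHUB_TOKEN")
    if token:
        # "gpt-4o" appears in this article; the second ID is an assumed example.
        models = ["gpt-4o", "gpt-4o-mini"]
        for name, reply in compare(models, PROMPT, lambda p: send(p, token)).items():
            print(f"=== {name} ===\n{reply}\n")
```

Passing the sender in as `ask` keeps the comparison logic separate from the network call, so it can be exercised with a stub before spending any free-tier quota.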
Using the GitHub Models API
Beyond the playground, GitHub Models provides a REST API that you can call from your applications. Authentication uses your GitHub personal access token (PAT), which means no additional API keys are needed.
The API follows the OpenAI-compatible chat completions format, making it easy to integrate with existing tooling:

    import os

    import requests

    # Read the GitHub personal access token from the environment.
    GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]

    response = requests.post(
        "https://models.inference.ai.azure.com/chat/completions",
        headers={
            "Authorization": f"Bearer {GITHUB_TOKEN}",
            "Content-Type": "application/json",
        },
        json={
            "model": "gpt-4o",
            "messages": [
                {"role": "system", "content": "You are a helpful coding assistant."},
                {"role": "user", "content": "Write a Python function to merge two sorted lists."},
            ],
            "temperature": 0.7,
            "max_tokens": 1000,
        },
        timeout=60,  # avoid hanging indefinitely on a stalled connection
    )
    response.raise_for_status()  # fail loudly on HTTP errors such as 401 or 429

    result = response.json()
    print(result["choices"][0]["message"]["content"])

You can also use the OpenAI SDK directly by pointing it at the GitHub Models endpoint:

    import OpenAI from "openai";

    const client = new OpenAI({
      baseURL: "https://models.inference.ai.azure.com",
      apiKey: process.env.GITHUB_TOKEN,
    });

    async function main() {
      const response = await client.chat.completions.create({
        model: "gpt-4o",
        messages: [
          { role: "system", content: "You are a senior code reviewer." },
          { role: "user", content: "Review this function for bugs:\n\nfunction divide(a, b) {\n return a / b;\n}" }
        ],
        temperature: 0.3,
      });
      console.log(response.choices[0].message.content);
    }

    main();

Model Selection: Choosing the Right Model
Selecting the right model for your use case is critical for balancing quality, speed, and cost. Here is a practical guide:
- Code generation and debugging: GPT-4o and Claude 3.5 Sonnet are strong choices for code quality. For simpler tasks, GPT-4o mini is faster and cheaper.
- Complex reasoning and math: OpenAI's o1 models are designed for multi-step reasoning tasks where accuracy matters more than speed.
- Content writing and analysis: Claude 3.5 Sonnet excels at nuanced, well-structured writing. GPT-4o is also strong here.
- Multilingual applications: Mistral models have strong multilingual capabilities, especially for European languages.
- Cost-sensitive or high-volume: GPT-4o mini, Phi-3, or Llama 3.1 8B provide good quality at lower cost per token.
- RAG and enterprise search: Cohere's Command R+ is purpose-built for retrieval-augmented generation.
- Privacy-sensitive or on-premise: Llama and some Mistral models (such as Mistral Nemo) have openly available weights and can be self-hosted, subject to their licenses.
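The guide above can be captured as a small lookup table so that an application picks a model by task category rather than hard-coding one. This is only a sketch: the category keys, the `RECOMMENDATIONS` table, and `pick_model` are hypothetical names, and the entries use display names from this article rather than API model IDs.

```python
from typing import Optional

# Recommendations transcribed from the list above; the first entry is the primary pick.
RECOMMENDATIONS = {
    "code": ["GPT-4o", "Claude 3.5 Sonnet", "GPT-4o mini"],
    "reasoning": ["o1", "o1-mini"],
    "writing": ["Claude 3.5 Sonnet", "GPT-4o"],
    "multilingual": ["Mistral Large", "Mistral Nemo"],
    "high_volume": ["GPT-4o mini", "Phi-3", "Llama 3.1 8B"],
    "rag": ["Command R+"],
    "self_hosted": ["Llama 3.1", "Mistral Nemo"],
}


def pick_model(task: str, available: Optional[set] = None) -> str:
    """Return the first recommended model for a task that is also available."""
    try:
        candidates = RECOMMENDATIONS[task]
    except KeyError:
        raise ValueError(f"unknown task category: {task!r}") from None
    for model in candidates:
        if available is None or model in available:
            return model
    raise LookupError(f"no recommended model for {task!r} is available")


if __name__ == "__main__":
    print(pick_model("code"))                              # GPT-4o
    print(pick_model("code", available={"GPT-4o mini"}))   # GPT-4o mini
```

Keeping the fallbacks in order means a model that is unavailable or rate limited can be skipped without changing the calling code.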
Rate Limits and Pricing
GitHub Models offers free-tier access for experimentation, with paid usage for production. Free-tier limits vary by model and account plan, so treat the figures below as approximate:
| Tier | Rate Limit | Daily Token Limit | Use Case |
|---|---|---|---|
| Free (Playground) | ~15 requests/min | ~150K tokens/day | Exploration and learning |
| Free (API) | ~15 requests/min | ~150K tokens/day | Prototyping and testing |
| Pay-as-you-go (via Azure) | Configurable | Unlimited | Production applications |
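To stay within a per-minute request limit, a client can throttle itself before calling the API. The sketch below is a simple sliding-window limiter that defaults to the approximate 15 requests per minute shown in the table. `RequestThrottle` is a hypothetical helper, the server remains the source of truth, and a real client should still handle HTTP 429 responses.

```python
import time
from collections import deque


class RequestThrottle:
    """Client-side sliding-window limiter: at most `max_requests` per `window` seconds."""

    def __init__(self, max_requests: int = 15, window: float = 60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of recent requests, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0 if allowed now)."""
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window - (now - self._sent[0])

    def acquire(self) -> None:
        """Block until a request slot is free, then record the request."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            self._sleep(delay)
        self._sent.append(self._clock())


if __name__ == "__main__":
    throttle = RequestThrottle(max_requests=3, window=1.0)
    start = time.monotonic()
    for i in range(5):
        throttle.acquire()  # call this right before each API request
        print(f"request {i} at {time.monotonic() - start:.2f}s")
```

The clock and sleep functions are injectable so the limiter can be tested without real waiting.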
When you are ready to move from prototyping to production, GitHub Models integrates with Azure AI. Moving to pay-as-you-go pricing is a configuration change rather than a rewrite: point your client at the Azure endpoint and authenticate with an Azure API key instead of your GitHub token, and the rest of your code can stay as it is.
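One way to keep that switch out of application code is to resolve the endpoint and credential from the environment. The sketch below is an assumption-laden illustration: `AZURE_AI_ENDPOINT` and `AZURE_AI_KEY` are hypothetical variable names, `InferenceConfig` and `load_config` are hypothetical helpers, and the way the Azure key is sent on the wire may differ from the GitHub token, so check the Azure documentation for your deployment.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceConfig:
    base_url: str
    api_key: str


# Endpoint used with a GitHub token, taken from the examples in this article.
GITHUB_MODELS_URL = "https://models.inference.ai.azure.com"


def load_config(env=os.environ) -> InferenceConfig:
    """Prefer Azure settings when both are present; otherwise use the GitHub token.

    AZURE_AI_ENDPOINT and AZURE_AI_KEY are hypothetical variable names chosen
    for this sketch; use whatever your Azure deployment provides.
    """
    endpoint = env.get("AZURE_AI_ENDPOINT")
    key = env.get("AZURE_AI_KEY")
    if endpoint and key:
        return InferenceConfig(base_url=endpoint, api_key=key)
    return InferenceConfig(base_url=GITHUB_MODELS_URL, api_key=env["GITHUB_TOKEN"])


if __name__ == "__main__":
    # A plain dict stands in for the environment in this demonstration.
    config = load_config({"GITHUB_TOKEN": "example-token"})
    print(config.base_url)
```

The resulting `base_url` and `api_key` can then be handed to whichever OpenAI-compatible client the application already uses.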
Integration with GitHub Copilot
GitHub Models connects directly to GitHub Copilot through the model switching feature. Copilot Pro and Enterprise users can select which underlying model powers their Copilot experience:
- In VS Code or JetBrains, click the Copilot model selector in the status bar
- Choose from available models like GPT-4o, Claude 3.5 Sonnet, or o1
- The selected model powers Copilot Chat; inline completions may use a separately configured model, depending on your editor and plan
- You can switch models mid-session based on your current task
This means you can test a model's strengths in the GitHub Models playground, then select it in Copilot for your daily coding work, provided Copilot offers that model.