LLM Comparison Guide
Side-by-side comparison of all major LLMs with a decision guide to help you choose the right model for your use case, budget, and requirements.
Master Comparison Table: Frontier Models
The top-tier models from each provider, compared on key dimensions:
| Model | Provider | Context | Multimodal | Code | Input $/1M | Output $/1M | Open |
|---|---|---|---|---|---|---|---|
| GPT-4o | OpenAI | 128K | Text, Vision, Audio | Strong | $2.50 | $10.00 | No |
| o3 | OpenAI | 200K | Text, Vision | Excellent | $10.00 | $40.00 | No |
| Claude Opus 4 | Anthropic | 200K | Text, Vision | Excellent | $15.00 | $75.00 | No |
| Claude Sonnet 4 | Anthropic | 200K | Text, Vision | Excellent | $3.00 | $15.00 | No |
| Gemini 2.5 Pro | Google | 1M+ | Text, Vision, Audio, Video | Excellent | $1.25 | $10.00 | No |
| Llama 4 Maverick | Meta | 1M | Text, Vision | Strong | Varies* | Varies* | Yes |
| DeepSeek V3 | DeepSeek | 128K | Text | Excellent | $0.27 | $1.10 | Yes |
| Mistral Large | Mistral | 128K | Text | Strong | $2.00 | $6.00 | No |
*Open-weight models have variable pricing depending on the inference provider. Self-hosted cost depends on your hardware.
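The per-token prices above translate into per-request cost with simple arithmetic: cost = (input tokens / 1M) × input price + (output tokens / 1M) × output price. A quick sketch, with prices copied from the table; the token counts are made-up examples, not measurements:

```python
# Estimate per-request API cost from the $/1M-token prices in the table above.
# Prices are (input, output) in USD per million tokens.

PRICES = {
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "DeepSeek V3": (0.27, 1.10),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request."""
    inp, out = PRICES[model]
    return input_tokens / 1_000_000 * inp + output_tokens / 1_000_000 * out

# Example: a 2,000-token prompt producing a 500-token answer.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000, 500):.4f}")
```

At these volumes the spread is stark: the same request costs roughly ten times more on Claude Sonnet 4 than on DeepSeek V3, which is why the routing strategies later in this guide matter.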
Budget-Tier Comparison
Cost-effective models for high-volume or budget-conscious applications:
| Model | Provider | Context | Input $/1M | Output $/1M | Best For |
|---|---|---|---|---|---|
| GPT-4o mini | OpenAI | 128K | $0.15 | $0.60 | General purpose, high volume |
| Gemini 2.5 Flash | Google | 1M | $0.15 | $0.60 | Reasoning at low cost |
| Claude Haiku 3.5 | Anthropic | 200K | $0.80 | $4.00 | Fast, instruction-following |
| o4-mini | OpenAI | 200K | $1.10 | $4.40 | Reasoning tasks on budget |
| Mistral Small | Mistral | 32K | $0.20 | $0.60 | European data sovereignty |
| Command R | Cohere | 128K | $0.15 | $0.60 | Enterprise RAG |
Decision Guide: By Use Case
Coding and Software Development
| Need | Recommended Model | Why |
|---|---|---|
| Agentic coding (autonomous) | Claude Opus 4 / Sonnet 4 | Best sustained agentic performance, tool use |
| Complex code reasoning | o3 / o4-mini | Chain-of-thought for complex logic |
| General code assistance | Claude Sonnet 4 / GPT-4o | Fast, accurate, good cost balance |
| Open-source code model | Qwen 2.5 Coder 32B | Best open-weight code model |
| Local code completion | Phi-4 / CodeGemma 7B | Runs on consumer hardware |
Content Writing and Creative Work
| Need | Recommended Model | Why |
|---|---|---|
| Long-form content | Claude Sonnet 4 / Opus 4 | Excellent at following style guides, long output |
| Marketing copy | GPT-4o | Creative, good at varying tone |
| Technical writing | Claude Sonnet 4 | Precise, well-structured output |
| Bulk content generation | GPT-4o mini / Gemini Flash | Low cost for high volume |
Data Analysis and Research
| Need | Recommended Model | Why |
|---|---|---|
| Analyzing large documents | Gemini 2.5 Pro | 1M+ context window |
| Scientific reasoning | o3 / Claude Opus 4 | Deep reasoning capabilities |
| Structured data extraction | GPT-4o / Claude Sonnet 4 | Reliable JSON/structured output |
| Video/audio analysis | Gemini 2.5 Pro | Native multimodal processing |
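Whichever model you pick for structured data extraction, it pays to validate the returned JSON rather than trust it blindly. A minimal stdlib-only sketch; the field names and types here are a hypothetical schema for illustration:

```python
import json

# Fields we asked the model to extract -- a hypothetical schema for illustration.
REQUIRED_FIELDS = {"name": str, "date": str, "amount": float}

def parse_extraction(raw: str) -> dict:
    """Parse a model's JSON reply and verify the expected fields and types."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    for field, typ in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], typ):
            raise ValueError(f"wrong type for {field}: {type(data[field]).__name__}")
    return data

# A well-formed model reply passes; a malformed or incomplete one raises.
reply = '{"name": "Acme Corp", "date": "2025-01-15", "amount": 1234.5}'
print(parse_extraction(reply)["amount"])  # 1234.5
```

Retrying the request (or falling back to a stronger model) when validation fails is a common pattern for making extraction pipelines reliable.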
Decision Guide: By Budget
Free / Minimal Budget
- Best approach: Run open-weight models locally with Ollama
- Models: Llama 3.2 3B (tiny), Qwen 2.5 7B (good), Llama 3.3 70B (best quality if you have the hardware)
- Free API tiers: Google AI Studio (Gemini), OpenAI free tier (limited)
Small Budget ($10-100/month)
- Best approach: Use budget-tier API models
- Models: GPT-4o mini, Gemini 2.5 Flash, Claude Haiku 3.5
- Strategy: Use cheap models for most tasks, reserve frontier models for complex tasks
Production Budget ($100-1000+/month)
- Best approach: Mix of frontier and budget models with intelligent routing
- Strategy: Route simple queries to cheap models, complex queries to frontier models
- Consider: Self-hosting open models for predictable costs at scale
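The routing strategy above can be sketched as a simple classifier that maps each query to a model name. The keyword list, length threshold, and model identifiers below are illustrative assumptions, not a production policy:

```python
# Minimal model-routing sketch: send easy queries to a budget model and
# hard ones to a frontier model. Heuristics and thresholds are illustrative.

BUDGET_MODEL = "gpt-4o-mini"        # ~$0.15/$0.60 per 1M tokens
FRONTIER_MODEL = "claude-sonnet-4"  # ~$3/$15 per 1M tokens

# Crude difficulty signals; a real router might use a small classifier model.
HARD_KEYWORDS = {"prove", "refactor", "debug", "architecture", "derive"}

def route(query: str) -> str:
    """Pick a model name for a query based on crude difficulty signals."""
    words = query.lower().split()
    if len(words) > 200 or HARD_KEYWORDS.intersection(words):
        return FRONTIER_MODEL
    return BUDGET_MODEL

print(route("What is the capital of France?"))                    # gpt-4o-mini
print(route("Refactor this module to remove the global state"))   # claude-sonnet-4
```

Even a crude router like this can cut API spend substantially if most traffic is simple, since the budget tier is roughly 20x cheaper per token.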
Decision Guide: By Privacy Requirements
| Privacy Level | Approach | Models |
|---|---|---|
| Maximum privacy | Self-hosted, air-gapped | Llama, Qwen, Mistral (open-weight, run locally) |
| Enterprise privacy | API with data processing agreements | Any provider with enterprise agreements (Anthropic, OpenAI, Google Cloud) |
| Standard privacy | API with data retention controls | All major providers offer opt-out of training |
| European sovereignty | EU-hosted providers | Mistral (French), local deployment of open models |
Which LLM Should I Use? Decision Flowchart
Follow this decision process to narrow down your model choice:
1. Must data stay on your infrastructure?
   - Yes: Use open-weight models (Llama, Qwen, Mistral, DeepSeek). Skip to step 5 for size selection.
   - No: Continue to step 2.
2. What is your primary use case?
   - Complex reasoning / math / science: o3, Claude Opus 4, Gemini 2.5 Pro
   - Coding: Claude Sonnet 4, o4-mini, GPT-4o
   - General purpose: Continue to step 3.
   - High volume / low cost: GPT-4o mini, Gemini Flash, Haiku 3.5
3. Do you need multimodal capabilities?
   - Video/audio: Gemini 2.5 Pro (the only frontier model here with native video input)
   - Images only: Any frontier model (GPT-4o, Claude Sonnet 4, Gemini Pro)
   - Text only: Continue to step 4.
4. What is your budget sensitivity?
   - Cost is critical: GPT-4o mini ($0.15/$0.60) or Gemini 2.5 Flash ($0.15/$0.60)
   - Balanced: Claude Sonnet 4 ($3/$15) or GPT-4o ($2.50/$10)
   - Quality is paramount: Claude Opus 4, o3, or Gemini 2.5 Pro
5. For self-hosted models: what hardware do you have?
   - Phone / laptop CPU: Llama 3.2 1B-3B, Phi-3 Mini
   - Single GPU (8-24GB): Qwen 2.5 7B, Gemma 2 9B, Phi-4, or a quantized 70B
   - Multi-GPU / cloud: Llama 3.3 70B, Qwen 2.5 72B, DeepSeek V3
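A rough rule of thumb for the hardware step: a model's weights need roughly (parameter count) × (bytes per weight) of memory, plus headroom for the KV cache and activations. A sketch of that estimate; the 20% headroom figure is a loose assumption, and real requirements vary with context length and runtime:

```python
# Rough check of whether a model's weights fit in a given amount of GPU memory.
# Rule of thumb: weights take params * bytes_per_weight; add ~20% headroom
# for KV cache and activations (the 20% figure is a loose assumption).

BYTES_PER_WEIGHT = {"fp16": 2.0, "q8": 1.0, "q4": 0.5}  # GB per billion params

def fits(params_billions: float, quant: str, vram_gb: float) -> bool:
    """True if the model is likely to fit entirely in vram_gb of memory."""
    needed_gb = params_billions * BYTES_PER_WEIGHT[quant] * 1.2
    return needed_gb <= vram_gb

# A 7B model at fp16 needs ~17 GB, so it fits on a 24 GB card; a 70B model
# needs ~42 GB even at 4-bit, so it wants a multi-GPU or 48 GB setup.
print(fits(7, "fp16", 24))   # True
print(fits(70, "q4", 48))    # True
print(fits(70, "fp16", 48))  # False
```

This is why the flowchart pairs 7B-9B models with single consumer GPUs and sends 70B-class models to multi-GPU or cloud hardware unless they are aggressively quantized.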
Lilly Tech Systems