Advanced

LLM Comparison Guide

Side-by-side comparison of all major LLMs with a decision guide to help you choose the right model for your use case, budget, and requirements.

Master Comparison Table: Frontier Models

The top-tier models from each provider, compared on key dimensions:

ModelProviderContextMultimodalCodeInput $/1MOutput $/1MOpen
GPT-4oOpenAI128KText, Vision, AudioStrong$2.50$10.00No
o3OpenAI200KText, VisionExcellent$10.00$40.00No
Claude Opus 4Anthropic200KText, VisionExcellent$15.00$75.00No
Claude Sonnet 4Anthropic200KText, VisionExcellent$3.00$15.00No
Gemini 2.5 ProGoogle1M+Text, Vision, Audio, VideoExcellent$1.25$10.00No
Llama 4 MaverickMeta1MText, VisionStrongVaries*Varies*Yes
DeepSeek V3DeepSeek128KTextExcellent$0.27$1.10Yes
Mistral LargeMistral128KTextStrong$2.00$6.00No

*Open-weight models have variable pricing depending on the inference provider. Self-hosted cost depends on your hardware.

Budget-Tier Comparison

Cost-effective models for high-volume or budget-conscious applications:

ModelProviderContextInput $/1MOutput $/1MBest For
GPT-4o miniOpenAI128K$0.15$0.60General purpose, high volume
Gemini 2.5 FlashGoogle1M$0.15$0.60Reasoning at low cost
Claude Haiku 3.5Anthropic200K$0.80$4.00Fast, instruction-following
o4-miniOpenAI200K$1.10$4.40Reasoning tasks on budget
Mistral SmallMistral32K$0.20$0.60European data sovereignty
Command RCohere128K$0.15$0.60Enterprise RAG

Decision Guide: By Use Case

Coding and Software Development

NeedRecommended ModelWhy
Agentic coding (autonomous)Claude Opus 4 / Sonnet 4Best sustained agentic performance, tool use
Complex code reasoningo3 / o4-miniChain-of-thought for complex logic
General code assistanceClaude Sonnet 4 / GPT-4oFast, accurate, good cost balance
Open-source code modelQwen 2.5 Coder 32BBest open-weight code model
Local code completionPhi-4 / CodeGemma 7BRuns on consumer hardware

Content Writing and Creative Work

NeedRecommended ModelWhy
Long-form contentClaude Sonnet 4 / Opus 4Excellent at following style guides, long output
Marketing copyGPT-4oCreative, good at varying tone
Technical writingClaude Sonnet 4Precise, well-structured output
Bulk content generationGPT-4o mini / Gemini FlashLow cost for high volume

Data Analysis and Research

NeedRecommended ModelWhy
Analyzing large documentsGemini 2.5 Pro1M+ context window
Scientific reasoningo3 / Claude Opus 4Deep reasoning capabilities
Structured data extractionGPT-4o / Claude Sonnet 4Reliable JSON/structured output
Video/audio analysisGemini 2.5 ProNative multimodal processing

Decision Guide: By Budget

Free / Minimal Budget

  • Best approach: Run open-weight models locally with Ollama
  • Models: Llama 3.2 3B (tiny), Qwen 2.5 7B (good), Llama 3.3 70B (best quality if you have the hardware)
  • Free API tiers: Google AI Studio (Gemini), OpenAI free tier (limited)

Small Budget ($10-100/month)

  • Best approach: Use budget-tier API models
  • Models: GPT-4o mini, Gemini 2.5 Flash, Claude Haiku 3.5
  • Strategy: Use cheap models for most tasks, reserve frontier models for complex tasks

Production Budget ($100-1000+/month)

  • Best approach: Mix of frontier and budget models with intelligent routing
  • Strategy: Route simple queries to cheap models, complex queries to frontier models
  • Consider: Self-hosting open models for predictable costs at scale

Decision Guide: By Privacy Requirements

Privacy LevelApproachModels
Maximum privacySelf-hosted, air-gappedLlama, Qwen, Mistral (open-weight, run locally)
Enterprise privacyAPI with data processing agreementsAny provider with enterprise agreements (Anthropic, OpenAI, Google Cloud)
Standard privacyAPI with data retention controlsAll major providers offer opt-out of training
European sovereigntyEU-hosted providersMistral (French), local deployment of open models

Which LLM Should I Use? Decision Flowchart

Follow this decision process to narrow down your model choice:

  1. Must data stay on your infrastructure?

    Yes: Use open-weight models (Llama, Qwen, Mistral, DeepSeek). Skip to step 5 for size selection.

    No: Continue to step 2.

  2. What is your primary use case?

    Complex reasoning / math / science: o3, Claude Opus 4, Gemini 2.5 Pro

    Coding: Claude Sonnet 4, o4-mini, GPT-4o

    General purpose: Continue to step 3

    High volume / low cost: GPT-4o mini, Gemini Flash, Haiku 3.5

  3. Do you need multimodal capabilities?

    Video/audio: Gemini 2.5 Pro (only model with native video)

    Images only: Any frontier model (GPT-4o, Claude Sonnet 4, Gemini Pro)

    Text only: Continue to step 4

  4. What is your budget sensitivity?

    Cost is critical: GPT-4o mini ($0.15/$0.60) or Gemini 2.5 Flash ($0.15/$0.60)

    Balanced: Claude Sonnet 4 ($3/$15) or GPT-4o ($2.50/$10)

    Quality is paramount: Claude Opus 4, o3, or Gemini 2.5 Pro

  5. For self-hosted: what hardware do you have?

    Phone / laptop CPU: Llama 3.2 1B-3B, Phi-3 Mini

    Single GPU (8-24GB): Qwen 2.5 7B, Gemma 2 9B, Phi-4, quantized 70B

    Multi-GPU / cloud: Llama 3.3 70B, Qwen 2.5 72B, DeepSeek V3

Best practice: Do not commit to a single model. Build your application with an abstraction layer that allows easy model switching. Use cheap models for development and testing, frontier models for production. Re-evaluate monthly as new models release frequently.
Pricing changes frequently. The prices listed in this directory are approximate and based on information available as of early 2025. Always check the provider's current pricing page before making budget decisions. Most providers adjust pricing as they release new models.