Other LLM Providers

Beyond the big four (OpenAI, Anthropic, Google, Meta), a rich ecosystem of LLM providers offers compelling models for different needs, from European sovereignty to code specialization to cost efficiency.

Mistral AI

A French AI company that has rapidly become one of the most important LLM providers, known for efficient open-weight models and a strong commercial API.

| Model | Released | Parameters | Context | Open Weight | Pricing (Input/Output per 1M) |
|---|---|---|---|---|---|
| Mistral Large | Feb 2024 (v2 Jul 2024) | ~123B | 128K | No | $2.00 / $6.00 |
| Mistral Medium | Dec 2023 | Undisclosed | 32K | No | $2.70 / $8.10 |
| Mistral Small | Feb 2024 | ~22B | 32K | No | $0.20 / $0.60 |
| Mixtral 8x22B | Apr 2024 | 141B (39B active) | 64K | Yes (Apache 2.0) | $0.90 / $0.90 |
| Mixtral 8x7B | Dec 2023 | 46.7B (12.9B active) | 32K | Yes (Apache 2.0) | $0.24 / $0.24 |
| Mistral 7B | Sep 2023 | 7.3B | 32K | Yes (Apache 2.0) | $0.06 / $0.06 |
| Codestral | May 2024 | 22B | 32K | Non-commercial | $0.20 / $0.60 |
| Pixtral | Sep 2024 | 12B | 128K | Yes (Apache 2.0) | $0.15 / $0.15 |

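Given per-1M-token prices like those above, estimating the cost of a single request is simple arithmetic. A quick sketch (prices change frequently, so always check the provider's pricing page; the token counts here are illustrative):

```python
def request_cost(input_tokens, output_tokens, in_price, out_price):
    """Cost in dollars for one request, given per-1M-token prices."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Mistral Large at $2.00 / $6.00 per 1M tokens:
# a 3,000-token prompt with a 500-token completion
cost = request_cost(3_000, 500, in_price=2.00, out_price=6.00)  # → $0.009
```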
Why Mistral matters: Mistral pioneered the efficient MoE (Mixture of Experts) approach with Mixtral, proving that sparse models can match dense models at a fraction of the compute cost. Their open-weight models (Apache 2.0) are among the most permissively licensed in the industry.
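The sparse-routing idea behind Mixtral can be sketched in a few lines: a router scores every expert for each token, but only the top-k (k=2 for Mixtral's 8 experts) actually run, so most expert parameters sit idle per token. This is an illustrative sketch of top-k gating, not Mistral's implementation:

```python
import math

def top_k_routing(logits, k=2):
    """Pick the top-k experts by router logit and softmax-normalize
    their weights -- the gating scheme used by Mixtral-style MoE layers."""
    top = sorted(range(len(logits)), key=lambda i: -logits[i])[:k]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}

# 8 experts, but the token is dispatched to only 2 of them
weights = top_k_routing([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3], k=2)
```

Because only the selected experts' feed-forward weights are multiplied per token, a 46.7B-parameter Mixtral 8x7B does roughly the compute of a 12.9B dense model at inference time.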

DeepSeek

A Chinese AI company that gained global attention by producing frontier-competitive models at remarkably low training costs.

| Model | Released | Parameters | Context | License |
|---|---|---|---|---|
| DeepSeek V3 | Jan 2025 | 671B MoE (37B active) | 128K | MIT |
| DeepSeek V2 | May 2024 | 236B MoE (21B active) | 128K | MIT |
| DeepSeek Coder V2 | Jun 2024 | 236B MoE | 128K | MIT |

DeepSeek V3

Reportedly trained for just $5.6 million in compute, DeepSeek V3 competes with models that cost 10-100x more to train. It uses novel architectural techniques, including Multi-head Latent Attention (MLA) and an auxiliary-loss-free load-balancing strategy.

  • Best for: General purpose, coding, math, research
  • Standout: Exceptional cost-efficiency in both training and inference
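The reported figure checks out under the standard ~6·N·D training-FLOPs rule (N = active parameters, D = training tokens). The token and GPU-hour numbers below come from DeepSeek's V3 technical report; the $/GPU-hour rate is an assumption, so treat this as a back-of-the-envelope sketch:

```python
# ~6 * N * D rule: N = active parameters per token, D = training tokens
active_params = 37e9     # 37B active (of 671B total)
tokens = 14.8e12         # ~14.8T training tokens (reported by DeepSeek)
train_flops = 6 * active_params * tokens   # ≈ 3.3e24 FLOPs

gpu_hours = 2.788e6      # reported H800 GPU-hours for the full run
rate = 2.00              # assumed rental cost in $/GPU-hour
cost = gpu_hours * rate  # ≈ $5.6M, matching the reported figure
```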

Qwen (Alibaba Cloud)

Alibaba's open-weight model family, among the strongest multilingual models with particular excellence in Chinese and English.

| Model | Released | Parameters | Context | License |
|---|---|---|---|---|
| Qwen 2.5 72B | Sep 2024 | 72B | 128K | Apache 2.0 |
| Qwen 2.5 32B | Sep 2024 | 32B | 128K | Apache 2.0 |
| Qwen 2.5 7B | Sep 2024 | 7B | 128K | Apache 2.0 |
| Qwen 2.5 Coder 32B | Nov 2024 | 32B | 128K | Apache 2.0 |
| QwQ 32B | Nov 2024 | 32B | 32K | Apache 2.0 |

QwQ (Qwen with Questions)

A reasoning-focused model similar in concept to OpenAI's o1: it uses extended thinking to work through complex problems before answering, and it is surprisingly capable for a 32B model.
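In practice, reasoning models like QwQ emit their chain of thought before the final answer, and client code usually strips it before display. A minimal parser, assuming `<think>…</think>` delimiters (the exact format varies by model and serving stack, so the tag names here are an assumption):

```python
import re

def split_reasoning(text, open_tag="<think>", close_tag="</think>"):
    """Split a reasoning model's output into (thinking, answer).
    If no thinking block is found, the whole text is the answer."""
    pattern = re.escape(open_tag) + r"(.*?)" + re.escape(close_tag)
    m = re.search(pattern, text, re.S)
    if not m:
        return "", text.strip()
    return m.group(1).strip(), text[m.end():].strip()

thinking, answer = split_reasoning("<think>2 + 2 = 4</think>The answer is 4.")
```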

Microsoft Phi Models

Microsoft Research's small language models that punch well above their weight class.

| Model | Released | Parameters | Context | License |
|---|---|---|---|---|
| Phi-4 | Dec 2024 | 14B | 16K | MIT |
| Phi-3 Medium | May 2024 | 14B | 128K | MIT |
| Phi-3 Small | May 2024 | 7B | 128K | MIT |
| Phi-3 Mini | Apr 2024 | 3.8B | 128K | MIT |

Phi models demonstrate that small models trained on high-quality data can match much larger models. Phi-4 at 14B rivals models 5-10x its size on reasoning benchmarks.

Cohere

Enterprise-focused AI company with models optimized for business applications, particularly RAG and enterprise search.

| Model | Released | Context | Pricing (Input/Output per 1M) |
|---|---|---|---|
| Command R+ | Apr 2024 | 128K | $2.50 / $10.00 |
| Command R | Mar 2024 | 128K | $0.15 / $0.60 |
| Embed v3 | Nov 2023 | 512 | $0.10 per 1M tokens |

  • Best for: Enterprise RAG, search, multilingual business applications
  • Key features: Built-in RAG with citation, strong multilingual (100+ languages), grounded generation
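The retrieve-then-ground pattern Cohere targets can be sketched provider-agnostically. Here bag-of-words cosine similarity stands in for a real embedding model such as Embed v3; the function names, documents, and scoring are all illustrative:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Rank documents against the query and return the top-k indices --
    the IDs a grounded model would cite in its answer."""
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(d.lower().split())), i) for i, d in enumerate(docs)]
    return [i for _, i in sorted(scored, reverse=True)[:k]]

docs = ["quarterly revenue grew 12 percent",
        "the office cafeteria menu changed",
        "revenue growth was driven by enterprise sales"]
top = retrieve("what drove revenue growth", docs)  # → [2, 0]
```

In a production pipeline the retrieved snippets (with their IDs) are passed to the model as grounding context, and the response cites those IDs back, which is what Command R's built-in citation support automates.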

xAI (Grok)

Elon Musk's AI company, integrated with the X (Twitter) platform.

| Model | Released | Context | Key Feature |
|---|---|---|---|
| Grok-2 | Aug 2024 | 128K | Real-time X data access, image generation |
| Grok-1.5 | Mar 2024 | 128K | Vision, strong reasoning |

  • Best for: Real-time information, social media analysis, conversational AI with personality
  • Note: Grok-1 (314B MoE) was open-sourced under Apache 2.0

AI21 Labs (Jamba)

| Model | Released | Parameters | Context | Architecture |
|---|---|---|---|---|
| Jamba 1.5 Large | Aug 2024 | 398B MoE (94B active) | 256K | SSM-Transformer hybrid |
| Jamba 1.5 Mini | Aug 2024 | 52B MoE (12B active) | 256K | SSM-Transformer hybrid |

Jamba uses a novel hybrid architecture combining Structured State Space Models (SSMs, specifically Mamba) with Transformer layers. This enables very long context windows with efficient memory usage.
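The memory win is easy to see from KV-cache arithmetic: cache size scales with the number of attention layers, and in a Jamba-style stack only a small fraction of layers use attention (the SSM layers carry constant-size state instead). The head counts and layer counts below are illustrative assumptions, not Jamba's published configuration:

```python
def kv_cache_gb(attn_layers, context, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    """Approximate KV-cache size: 2 (K and V) * attention layers * KV heads
    * head dim * context length * bytes per element, in GB."""
    return 2 * attn_layers * n_kv_heads * head_dim * context * dtype_bytes / 1e9

dense = kv_cache_gb(attn_layers=32, context=256_000)  # every layer is attention
hybrid = kv_cache_gb(attn_layers=4, context=256_000)  # ~1-in-8 layers, Jamba-style
# hybrid cache is 8x smaller at the same 256K context
```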

Amazon (Titan)

| Model | Type | Available Via |
|---|---|---|
| Titan Text Express | Text generation | Amazon Bedrock |
| Titan Text Lite | Lightweight text | Amazon Bedrock |
| Titan Embeddings V2 | Embeddings | Amazon Bedrock |
| Titan Image Generator | Image generation | Amazon Bedrock |

Amazon's own models, available exclusively through AWS Bedrock. Positioned as enterprise-grade with AWS integration rather than frontier capability.

Inflection (Pi)

  • Inflection 2.5: Powers the Pi assistant. Known for empathetic, conversational responses
  • Focus: Personal AI assistant, emotional intelligence, natural conversation
  • Note: Most of Inflection's team joined Microsoft in 2024; the company pivoted to enterprise AI

Stability AI

  • Stable LM 2 (1.6B, 12B): Open-weight text models
  • Stable Diffusion XL / 3: Leading open-source image generation models
  • Stable Audio: Audio generation
  • Stable Video: Video generation
  • License: Various (Stability AI Community License for most models)
💡 Rapidly evolving space: New providers and models emerge frequently. This page covers the major players as of early 2025. Check provider websites for the latest model releases and pricing updates.