Other LLM Providers
Beyond the big four (OpenAI, Anthropic, Google, Meta), a rich ecosystem of LLM providers offers compelling models for different needs, from European sovereignty to code specialization to cost efficiency.
Mistral AI
A French AI company that has rapidly become one of the most important LLM providers, known for efficient open-weight models and a strong commercial API.
| Model | Released | Parameters | Context | Open Weight | Pricing (Input/Output per 1M) |
|---|---|---|---|---|---|
| Mistral Large | Feb 2024 (v2 Jul 2024) | ~123B | 128K | No | $2.00 / $6.00 |
| Mistral Medium | Dec 2023 | Undisclosed | 32K | No | $2.70 / $8.10 |
| Mistral Small | Feb 2024 | ~22B | 32K | No | $0.20 / $0.60 |
| Mixtral 8x22B | Apr 2024 | 141B (39B active) | 64K | Yes (Apache 2.0) | $0.90 / $0.90 |
| Mixtral 8x7B | Dec 2023 | 46.7B (12.9B active) | 32K | Yes (Apache 2.0) | $0.24 / $0.24 |
| Mistral 7B | Sep 2023 | 7.3B | 32K | Yes (Apache 2.0) | $0.06 / $0.06 |
| Codestral | May 2024 | 22B | 32K | Non-commercial | $0.20 / $0.60 |
| Pixtral | Sep 2024 | 12B | 128K | Yes (Apache 2.0) | $0.15 / $0.15 |
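To make the pricing column concrete, here is a minimal cost estimator using the rates listed above. The dictionary keys are shorthand for this sketch, not Mistral's official API model identifiers, and prices change over time, so check Mistral's pricing page before relying on these numbers.

```python
# Per-1M-token prices (input, output) in USD, taken from the table above.
# Illustrative only -- verify against Mistral's current pricing page.
PRICES = {
    "mistral-large": (2.00, 6.00),
    "mistral-small": (0.20, 0.60),
    "mixtral-8x7b": (0.24, 0.24),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-1M-token rates."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# A 10K-token prompt with a 2K-token reply on Mistral Large:
cost = request_cost("mistral-large", 10_000, 2_000)  # 0.032 dollars
```

The same request on Mistral Small comes to $0.0032, a tenth of the cost, which is why routing easy queries to smaller models is a common pattern.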
DeepSeek
A Chinese AI company that gained global attention by producing frontier-competitive models at remarkably low training costs.
| Model | Released | Parameters | Context | License |
|---|---|---|---|---|
| DeepSeek V3 | Dec 2024 | 671B MoE (37B active) | 128K | MIT |
| DeepSeek V2 | May 2024 | 236B MoE (21B active) | 128K | MIT |
| DeepSeek Coder V2 | Jun 2024 | 236B MoE | 128K | MIT |
DeepSeek V3
Reportedly trained for just $5.6 million in compute, DeepSeek V3 competes with models that cost 10-100x more to train. It uses novel architectural techniques, including Multi-head Latent Attention (MLA) and an auxiliary-loss-free load-balancing strategy.
- Best for: General purpose, coding, math, research
- Standout: Exceptional cost-efficiency in both training and inference
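The inference-efficiency claim follows directly from the MoE arithmetic: per-token compute scales with the *active* parameter count, not the total. A back-of-the-envelope sketch, using the common rule of thumb of roughly 2 FLOPs per active parameter per token (ignoring attention overhead):

```python
# DeepSeek V3 activates 37B of its 671B parameters per token.
TOTAL_B, ACTIVE_B = 671, 37

active_fraction = ACTIVE_B / TOTAL_B  # ~5.5% of weights used per token

# Rough forward-pass cost: ~2 FLOPs per active parameter per token.
moe_flops = 2 * ACTIVE_B * 1e9     # ~7.4e10 FLOPs/token
dense_flops = 2 * TOTAL_B * 1e9    # what a dense 671B model would cost
speedup = dense_flops / moe_flops  # ~18x cheaper per token than dense
```

The ~18x gap between total and active compute is the core of why MoE models can serve frontier-scale capacity at small-model inference prices.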
Qwen (Alibaba Cloud)
Alibaba's open-weight model family, among the strongest multilingual models with particular excellence in Chinese and English.
| Model | Released | Parameters | Context | License |
|---|---|---|---|---|
| Qwen 2.5 72B | Sep 2024 | 72B | 128K | Apache 2.0 |
| Qwen 2.5 32B | Sep 2024 | 32B | 128K | Apache 2.0 |
| Qwen 2.5 7B | Sep 2024 | 7B | 128K | Apache 2.0 |
| Qwen 2.5 Coder 32B | Nov 2024 | 32B | 128K | Apache 2.0 |
| QwQ 32B | Nov 2024 | 32B | 32K | Apache 2.0 |
QwQ (Qwen with Questions)
A reasoning-focused model in the vein of OpenAI's o1. It generates an extended chain of thought to work through complex problems before answering, and it is surprisingly strong for a 32B model.
Microsoft Phi Models
Microsoft Research's small language models that punch well above their weight class.
| Model | Released | Parameters | Context | License |
|---|---|---|---|---|
| Phi-4 | Dec 2024 | 14B | 16K | MIT |
| Phi-3 Medium | May 2024 | 14B | 128K | MIT |
| Phi-3 Small | May 2024 | 7B | 128K | MIT |
| Phi-3 Mini | Apr 2024 | 3.8B | 128K | MIT |
Phi models demonstrate that small models trained on high-quality data can match much larger models. Phi-4 at 14B rivals models 5-10x its size on reasoning benchmarks.
Cohere
Enterprise-focused AI company with models optimized for business applications, particularly RAG and enterprise search.
| Model | Released | Context | Pricing (Input/Output per 1M) |
|---|---|---|---|
| Command R+ | Apr 2024 | 128K | $2.50 / $10.00 |
| Command R | Mar 2024 | 128K | $0.15 / $0.60 |
| Embed v3 | Nov 2023 | 512 | $0.10 / — (embeddings are input-only) |
- Best for: Enterprise RAG, search, multilingual business applications
- Key features: Built-in RAG with citation, strong multilingual (100+ languages), grounded generation
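To illustrate what grounded generation looks like in practice, here is a sketch of the request shape for a document-grounded chat call: the model answers from the supplied documents and returns citations pointing back into them. Field names follow Cohere's v1 chat API as commonly documented (`model`, `message`, `documents`), but this builds the payload offline rather than calling the service, so verify against the current API reference before use.

```python
# Sketch of a grounded-generation (RAG) request payload for Cohere's
# chat endpoint. Built offline here; an actual call would send this
# through Cohere's client or HTTP API.

def build_rag_request(question: str, docs: list[dict]) -> dict:
    return {
        "model": "command-r",  # RAG-optimized model from the table above
        "message": question,
        "documents": docs,     # each doc is a dict of free-form string fields
    }

request = build_rag_request(
    "What was Q3 revenue?",
    [{"title": "Q3 report", "snippet": "Revenue was $4.2M, up 12% YoY."}],
)
```

The key design point is that retrieval stays in your hands: you fetch the documents, the model grounds its answer in them and cites which snippet supported each claim.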
xAI (Grok)
Elon Musk's AI company, integrated with the X (Twitter) platform.
| Model | Released | Context | Key Feature |
|---|---|---|---|
| Grok-2 | Aug 2024 | 128K | Real-time X data access, image generation |
| Grok-1.5 | Mar 2024 | 128K | Vision, strong reasoning |
- Best for: Real-time information, social media analysis, conversational AI with personality
- Note: Grok-1 (314B MoE) was open-sourced under Apache 2.0
AI21 Labs (Jamba)
| Model | Released | Parameters | Context | Architecture |
|---|---|---|---|---|
| Jamba 1.5 Large | Aug 2024 | 398B MoE (94B active) | 256K | SSM-Transformer Hybrid |
| Jamba 1.5 Mini | Aug 2024 | 52B MoE (12B active) | 256K | SSM-Transformer Hybrid |
Jamba uses a novel hybrid architecture combining Structured State Space Models (SSMs, specifically Mamba) with Transformer layers. This enables very long context windows with efficient memory usage.
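The memory argument is easy to quantify: every transformer attention layer must cache keys and values for all past tokens, so KV-cache size grows linearly with context length, while an SSM layer carries only a fixed-size state. A sketch with illustrative hyperparameters (these are assumptions for the example, not Jamba's actual configuration):

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """fp16 KV-cache size for a pure-transformer stack: one K and one V
    vector per KV head, per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

# Illustrative 32-layer model at a 256K-token context:
full_attention = kv_cache_bytes(256_000, n_layers=32, n_kv_heads=8,
                                head_dim=128)  # ~33.5 GB of cache

# If only 1 in 8 layers uses attention (the rest SSM, with O(1) state),
# the cache shrinks proportionally:
hybrid = kv_cache_bytes(256_000, n_layers=4, n_kv_heads=8,
                        head_dim=128)  # ~4.2 GB
```

An 8x reduction in cache memory at long context is the kind of saving that makes a 256K window practical to serve, which is the design rationale behind the hybrid.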
Amazon (Titan)
| Model | Type | Available Via |
|---|---|---|
| Titan Text Express | Text generation | Amazon Bedrock |
| Titan Text Lite | Lightweight text | Amazon Bedrock |
| Titan Embeddings V2 | Embeddings | Amazon Bedrock |
| Titan Image Generator | Image generation | Amazon Bedrock |
Amazon's own models, available exclusively through Amazon Bedrock. They are positioned as enterprise-grade options with deep AWS integration rather than frontier capability.
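As a sketch of what invoking a Titan text model looks like, the JSON request body below follows Bedrock's Titan text-generation format (`inputText` plus a `textGenerationConfig` block); the payload is built offline here, with the boto3 call it would feed noted in a comment, and the parameter values are illustrative.

```python
import json

def titan_body(prompt: str, max_tokens: int = 512,
               temperature: float = 0.5) -> str:
    """Serialize a request body in the shape Bedrock expects for
    Titan Text models."""
    return json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {
            "maxTokenCount": max_tokens,
            "temperature": temperature,
            "topP": 0.9,
        },
    })

body = titan_body("Summarize our AWS spend policy in one paragraph.")
# The actual invocation would be something like:
#   client = boto3.client("bedrock-runtime")
#   client.invoke_model(modelId="amazon.titan-text-express-v1", body=body)
```

Note that each model family on Bedrock defines its own body schema, so this format applies to Titan Text specifically, not to other Bedrock-hosted models.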
Inflection (Pi)
- Inflection 2.5: Powers the Pi assistant. Known for empathetic, conversational responses
- Focus: Personal AI assistant, emotional intelligence, natural conversation
- Note: Most of Inflection's team joined Microsoft in 2024; the company pivoted to enterprise AI
Stability AI
- Stable LM 2 (1.6B, 12B): Open-weight text models
- Stable Diffusion XL / 3: Leading open-source image generation models
- Stable Audio: Audio generation
- Stable Video: Video generation
- License: Various (Stability AI Community License for most models)