Other LLM Providers
Beyond the big four (OpenAI, Anthropic, Google, Meta), a rich ecosystem of LLM providers offers compelling models for different needs, from European sovereignty to code specialization to cost efficiency.
Mistral AI
A French AI company that has rapidly become one of the most important LLM providers, known for efficient open-weight models and a strong commercial API.
| Model | Released | Parameters | Context | Open Weight | Pricing (Input/Output per 1M) |
|---|---|---|---|---|---|
| Mistral Large | Feb 2024 (v2 Jul 2024) | ~123B | 128K | No | $2.00 / $6.00 |
| Mistral Medium | Dec 2023 | Undisclosed | 32K | No | $2.70 / $8.10 |
| Mistral Small | Feb 2024 | ~22B | 32K | No | $0.20 / $0.60 |
| Mixtral 8x22B | Apr 2024 | 141B (39B active) | 64K | Yes (Apache 2.0) | $0.90 / $0.90 |
| Mixtral 8x7B | Dec 2023 | 46.7B (12.9B active) | 32K | Yes (Apache 2.0) | $0.24 / $0.24 |
| Mistral 7B | Sep 2023 | 7.3B | 32K | Yes (Apache 2.0) | $0.06 / $0.06 |
| Codestral | May 2024 | 22B | 32K | Non-commercial | $0.20 / $0.60 |
| Pixtral | Sep 2024 | 12B | 128K | Yes (Apache 2.0) | $0.15 / $0.15 |
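To make the pricing column concrete, here is a minimal cost estimator using the rates listed above. The dictionary keys are shorthand for this sketch, not Mistral's official API model identifiers, and prices change over time, so check Mistral's pricing page before relying on these numbers.

```python
# Per-1M-token prices (input, output) in USD, taken from the table above.
# Illustrative only -- verify against Mistral's current pricing page.
PRICES = {
    "mistral-large": (2.00, 6.00),
    "mistral-small": (0.20, 0.60),
    "mixtral-8x7b": (0.24, 0.24),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-1M-token rates."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# A 10K-token prompt with a 2K-token reply on Mistral Large:
cost = request_cost("mistral-large", 10_000, 2_000)  # 0.032 dollars
```

The same request on Mistral Small comes to $0.0032, a tenth of the cost, which is why routing easy queries to smaller models is a common pattern.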
DeepSeek
A Chinese AI company that gained global attention by producing frontier-competitive models at remarkably low training costs.
| Model | Released | Parameters | Context | License |
|---|---|---|---|---|
| DeepSeek V3 | Dec 2024 | 671B MoE (37B active) | 128K | MIT |
| DeepSeek V2 | May 2024 | 236B MoE (21B active) | 128K | MIT |
| DeepSeek Coder V2 | Jun 2024 | 236B MoE | 128K | MIT |
DeepSeek V3
Reportedly trained for just $5.6 million in compute, DeepSeek V3 competes with models that cost 10-100x more to train. It uses novel architectural techniques, including Multi-head Latent Attention (MLA) and an auxiliary-loss-free load-balancing strategy.
- Best for: General purpose, coding, math, research
- Standout: Exceptional cost-efficiency in both training and inference
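The inference-efficiency claim follows directly from the MoE arithmetic: per-token compute scales with the *active* parameter count, not the total. A back-of-the-envelope sketch, using the common rule of thumb of roughly 2 FLOPs per active parameter per token (ignoring attention overhead):

```python
# DeepSeek V3 activates 37B of its 671B parameters per token.
TOTAL_B, ACTIVE_B = 671, 37

active_fraction = ACTIVE_B / TOTAL_B  # ~5.5% of weights used per token

# Rough forward-pass cost: ~2 FLOPs per active parameter per token.
moe_flops = 2 * ACTIVE_B * 1e9     # ~7.4e10 FLOPs/token
dense_flops = 2 * TOTAL_B * 1e9    # what a dense 671B model would cost
speedup = dense_flops / moe_flops  # ~18x cheaper per token than dense
```

The ~18x gap between total and active compute is the core of why MoE models can serve frontier-scale capacity at small-model inference prices.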
Qwen (Alibaba Cloud)
Alibaba's open-weight model family, among the strongest multilingual models with particular excellence in Chinese and English.
| Model | Released | Parameters | Context | License |
|---|---|---|---|---|
| Qwen 2.5 72B | Sep 2024 | 72B | 128K | Apache 2.0 |
| Qwen 2.5 32B | Sep 2024 | 32B | 128K | Apache 2.0 |
| Qwen 2.5 7B | Sep 2024 | 7B | 128K | Apache 2.0 |
| Qwen 2.5 Coder 32B | Nov 2024 | 32B | 128K | Apache 2.0 |
| QwQ 32B | Nov 2024 | 32B | 32K | Apache 2.0 |
QwQ (Qwen with Questions)
A reasoning-focused model in the vein of OpenAI's o1. It generates an extended chain of thought to work through complex problems before answering, and it is surprisingly strong for a 32B model.
Microsoft Phi Models
Microsoft Research's small language models that punch well above their weight class.
| Model | Released | Parameters | Context | License |
|---|---|---|---|---|
| Phi-4 | Dec 2024 | 14B | 16K | MIT |
| Phi-3 Medium | May 2024 | 14B | 128K | MIT |
| Phi-3 Small | May 2024 | 7B | 128K | MIT |
| Phi-3 Mini | Apr 2024 | 3.8B | 128K | MIT |
Phi models demonstrate that small models trained on high-quality data can match much larger models. Phi-4 at 14B rivals models 5-10x its size on reasoning benchmarks.
Cohere
Enterprise-focused AI company with models optimized for business applications, particularly RAG and enterprise search.
| Model | Released | Context | Pricing (Input/Output per 1M) |
|---|---|---|---|
| Command R+ | Apr 2024 | 128K | $2.50 / $10.00 |
| Command R | Mar 2024 | 128K | $0.15 / $0.60 |
| Embed v3 | Nov 2023 | 512 | $0.10 / — (embeddings are input-only) |
- Best for: Enterprise RAG, search, multilingual business applications
- Key features: Built-in RAG with citation, strong multilingual (100+ languages), grounded generation
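To illustrate what grounded generation looks like in practice, here is a sketch of the request shape for a document-grounded chat call: the model answers from the supplied documents and returns citations pointing back into them. Field names follow Cohere's v1 chat API as commonly documented (`model`, `message`, `documents`), but this builds the payload offline rather than calling the service, so verify against the current API reference before use.

```python
# Sketch of a grounded-generation (RAG) request payload for Cohere's
# chat endpoint. Built offline here; an actual call would send this
# through Cohere's client or HTTP API.

def build_rag_request(question: str, docs: list[dict]) -> dict:
    return {
        "model": "command-r",  # RAG-optimized model from the table above
        "message": question,
        "documents": docs,     # each doc is a dict of free-form string fields
    }

request = build_rag_request(
    "What was Q3 revenue?",
    [{"title": "Q3 report", "snippet": "Revenue was $4.2M, up 12% YoY."}],
)
```

The key design point is that retrieval stays in your hands: you fetch the documents, the model grounds its answer in them and cites which snippet supported each claim.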
xAI (Grok)
Elon Musk's AI company, integrated with the X (Twitter) platform.
| Model | Released | Context | Key Feature |
|---|---|---|---|
| Grok-2 | Aug 2024 | 128K | Real-time X data access, image generation |
| Grok-1.5 | Mar 2024 | 128K | Vision, strong reasoning |
- Best for: Real-time information, social media analysis, conversational AI with personality
- Note: Grok-1 (314B MoE) was open-sourced under Apache 2.0
AI21 Labs (Jamba)
| Model | Released | Parameters | Context | Architecture |
|---|---|---|---|---|
| Jamba 1.5 Large | Aug 2024 | 398B MoE (94B active) | 256K | SSM-Transformer Hybrid |
| Jamba 1.5 Mini | Aug 2024 | 52B MoE (12B active) | 256K | SSM-Transformer Hybrid |
Jamba uses a novel hybrid architecture combining Structured State Space Models (SSMs, specifically Mamba) with Transformer layers. This enables very long context windows with efficient memory usage.
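The memory argument is easy to quantify: every transformer attention layer must cache keys and values for all past tokens, so KV-cache size grows linearly with context length, while an SSM layer carries only a fixed-size state. A sketch with illustrative hyperparameters (these are assumptions for the example, not Jamba's actual configuration):

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """fp16 KV-cache size for a pure-transformer stack: one K and one V
    vector per KV head, per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

# Illustrative 32-layer model at a 256K-token context:
full_attention = kv_cache_bytes(256_000, n_layers=32, n_kv_heads=8,
                                head_dim=128)  # ~33.5 GB of cache

# If only 1 in 8 layers uses attention (the rest SSM, with O(1) state),
# the cache shrinks proportionally:
hybrid = kv_cache_bytes(256_000, n_layers=4, n_kv_heads=8,
                        head_dim=128)  # ~4.2 GB
```

An 8x reduction in cache memory at long context is the kind of saving that makes a 256K window practical to serve, which is the design rationale behind the hybrid.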
Amazon (Titan)
| Model | Type | Available Via |
|---|---|---|
| Titan Text Express | Text generation | Amazon Bedrock |
| Titan Text Lite | Lightweight text | Amazon Bedrock |
| Titan Embeddings V2 | Embeddings | Amazon Bedrock |
| Titan Image Generator | Image generation | Amazon Bedrock |
Amazon's own models, available exclusively through Amazon Bedrock. They are positioned as enterprise-grade options with deep AWS integration rather than frontier capability.
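As a sketch of what invoking a Titan text model looks like, the JSON request body below follows Bedrock's Titan text-generation format (`inputText` plus a `textGenerationConfig` block); the payload is built offline here, with the boto3 call it would feed noted in a comment, and the parameter values are illustrative.

```python
import json

def titan_body(prompt: str, max_tokens: int = 512,
               temperature: float = 0.5) -> str:
    """Serialize a request body in the shape Bedrock expects for
    Titan Text models."""
    return json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {
            "maxTokenCount": max_tokens,
            "temperature": temperature,
            "topP": 0.9,
        },
    })

body = titan_body("Summarize our AWS spend policy in one paragraph.")
# The actual invocation would be something like:
#   client = boto3.client("bedrock-runtime")
#   client.invoke_model(modelId="amazon.titan-text-express-v1", body=body)
```

Note that each model family on Bedrock defines its own body schema, so this format applies to Titan Text specifically, not to other Bedrock-hosted models.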
Inflection (Pi)
- Inflection 2.5: Powers the Pi assistant. Known for empathetic, conversational responses
- Focus: Personal AI assistant, emotional intelligence, natural conversation
- Note: Most of Inflection's team joined Microsoft in 2024; the company pivoted to enterprise AI
Stability AI
- Stable LM 2 (1.6B, 12B): Open-weight text models
- Stable Diffusion XL / 3: Leading open-source image generation models
- Stable Audio: Audio generation
- Stable Video: Video generation
- License: Various (Stability AI Community License for most models)