Google Models
The full Google DeepMind model lineup: Gemini (commercial API), Gemma (open-weight), and PaLM 2 (legacy). Google offers some of the largest context windows in the industry, up to 2 million tokens.
Gemini 2.5 Series
Google's latest and most capable models, featuring native "thinking" capabilities for enhanced reasoning.
| Model | Released | Context | Max Output | Multimodal | Pricing (Input/Output per 1M) |
|---|---|---|---|---|---|
| Gemini 2.5 Pro | Mar 2025 | 1M (2M preview) | 65,536 | Text, Vision, Audio, Video | $1.25 / $10.00 (≤200K), $2.50 / $15.00 (>200K) |
| Gemini 2.5 Flash | Mar 2025 | 1M | 65,536 | Text, Vision, Audio, Video | $0.15 / $0.60 (non-thinking), $0.30 / $2.50 (thinking) |
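The tiered rates in the table translate directly into a cost estimate. A minimal sketch, using the Gemini 2.5 Pro rates copied from above (the helper name is illustrative, not part of any SDK):

```python
def gemini_25_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate Gemini 2.5 Pro cost in USD from the tiered rates above.

    Rates are per 1M tokens; the higher tier applies once the prompt
    exceeds 200K tokens. Illustrative helper, not an official API call.
    """
    if input_tokens <= 200_000:
        in_rate, out_rate = 1.25, 10.00
    else:
        in_rate, out_rate = 2.50, 15.00
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A 100K-token prompt with a 5K-token answer:
print(f"${gemini_25_pro_cost(100_000, 5_000):.4f}")  # $0.1750
```

Note that crossing the 200K threshold reprices the entire request, not just the tokens above the threshold.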
Gemini 2.5 Pro
Google's most intelligent model. Features built-in "thinking" that allows the model to reason through complex problems before responding. Leads many public benchmarks in coding, math, and science.
- Best for: Complex reasoning, coding, math, scientific analysis, long-document processing
- Key features: Native thinking mode, 1M context (2M in preview), processes text/images/audio/video natively, tool use, code execution
- Standout: Industry-leading context window allows processing of entire codebases or hundreds of documents at once
Gemini 2.5 Flash
A fast, efficient thinking model. Offers strong reasoning capabilities at much lower cost and latency than 2.5 Pro. The "thinking" can be toggled or given a token budget for cost control.
- Best for: Production applications needing reasoning, high-volume processing, cost-sensitive deployments
- Key features: Configurable thinking budget, very fast, multimodal
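The thinking budget is set per request. A minimal sketch of the `generateContent` REST body; the field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) follow the public Gemini API docs at the time of writing, but verify them against the current reference:

```python
def flash_request(prompt: str, thinking_budget: int = 0) -> dict:
    """Build a Gemini 2.5 Flash generateContent request body with an
    explicit thinking budget. A budget of 0 disables thinking entirely
    (assumption: field names taken from the public REST docs; check
    the current API reference before relying on them)."""
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        },
    }

body = flash_request("Summarize this contract.", thinking_budget=1024)
```

Because thinking tokens are billed at the higher output rate, capping the budget is the main lever for keeping Flash in its low-cost tier.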
Gemini 2.0 Series
| Model | Released | Context | Multimodal | Pricing (Input/Output per 1M) |
|---|---|---|---|---|
| Gemini 2.0 Flash | Dec 2024 | 1M | Text, Vision, Audio, Video | $0.10 / $0.40 |
Gemini 2.0 Flash
Introduced multimodal output (text, images, audio) and agentic capabilities. Designed for building AI agents with tool use, code execution, and Google Search grounding.
- Best for: Agent-based applications, multimodal generation, real-time interactions
- Key features: Native tool use, multimodal output, Google Search grounding
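Search grounding is enabled by attaching a tool entry to the request. A minimal sketch of the request body; the `google_search` tool name matches the public REST docs at the time of writing and should be treated as an assumption to verify:

```python
def grounded_request(prompt: str) -> dict:
    """Build a generateContent body that enables Google Search grounding
    for Gemini 2.0 Flash. The empty object opts in to the tool with
    default settings (assumption: tool name from the public REST docs)."""
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "tools": [{"google_search": {}}],
    }
```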
Gemini 1.5 Series
| Model | Released | Context | Multimodal | Pricing (Input/Output per 1M) |
|---|---|---|---|---|
| Gemini 1.5 Pro | Feb 2024 | 2M | Text, Vision, Audio, Video | $1.25 / $5.00 (≤128K), $2.50 / $10.00 (>128K) |
| Gemini 1.5 Flash | May 2024 | 1M | Text, Vision, Audio, Video | $0.075 / $0.30 (≤128K), $0.15 / $0.60 (>128K) |
Gemini 1.5 Pro
The first model to offer a 2 million token context window, enough to process multiple entire books or hours of video at once. Pioneered the "long context" paradigm.
- Best for: Very long document analysis, video understanding, large codebase processing
- Historical significance: Demonstrated that million-token contexts could be practical and useful
Gemini 1.5 Flash
A lighter, faster variant of 1.5 Pro. Optimized for speed and efficiency while maintaining strong capabilities across modalities.
Gemini 1.0 & Legacy
Gemini 1.0 Pro
- Released: December 2023
- Context: 32K tokens
- Status: Superseded by newer Gemini models
- Significance: Google's first Gemini model; replaced PaLM 2 in most applications
Gemini Nano
- Released: December 2023
- Sizes: Nano-1 (1.8B) and Nano-2 (3.25B)
- Context: 32K tokens
- Purpose: On-device model for mobile phones (Pixel, Samsung Galaxy). Runs locally without internet. Used for Smart Reply, summarization, and other device features.
PaLM 2
- Released: May 2023
- Variants: Gecko (smallest), Otter, Bison, Unicorn (largest)
- Significance: Preceded Gemini. Powered early Bard, MakerSuite, and Google Cloud AI
- Status: Deprecated in favor of Gemini
Gemma Family (Open-Weight)
Gemma is Google's family of open-weight models based on the same research and technology as Gemini but released for the community to use, fine-tune, and deploy.
| Model | Released | Parameters | Context | License |
|---|---|---|---|---|
| Gemma 2 27B | Jun 2024 | 27B | 8K | Gemma License |
| Gemma 2 9B | Jun 2024 | 9B | 8K | Gemma License |
| Gemma 2 2B | Jun 2024 | 2B | 8K | Gemma License |
| Gemma 7B | Feb 2024 | 7B | 8K | Gemma License |
| Gemma 2B | Feb 2024 | 2B | 8K | Gemma License |
| CodeGemma | Apr 2024 | 2B / 7B | 8K | Gemma License |
| RecurrentGemma | Apr 2024 | 2B / 9B | 8K | Gemma License |
Gemma 2
The second generation of Google's open models. Gemma 2 27B is particularly notable for outperforming many larger models. Uses a novel architecture with alternating local and global attention layers.
- Best for: Self-hosted applications, fine-tuning, research, edge deployment (2B)
- Key features: Strong benchmark scores for size class, instruction-tuned variants available
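The alternating local/global attention mentioned above can be illustrated with a toy mask builder. Window size and the even/odd layer ordering are illustrative only; Gemma 2's actual sliding window is 4096 tokens:

```python
def attention_mask(seq_len: int, layer_idx: int, window: int = 4):
    """Toy sketch of Gemma 2's interleaved attention: even layers use a
    sliding local window, odd layers attend globally. Both are causal.
    Sizes and the layer ordering here are illustrative, not the real
    config (Gemma 2's actual window is 4096 tokens)."""
    local = layer_idx % 2 == 0
    mask = []
    for q in range(seq_len):          # query position
        row = []
        for k in range(seq_len):      # key position
            causal = k <= q
            in_window = q - k < window
            row.append(causal and (in_window if local else True))
        mask.append(row)
    return mask

# Layer 0 (local, window=4): position 5 cannot see position 0.
m = attention_mask(6, layer_idx=0)
print(m[5][0], m[5][2])  # False True
```

The local layers keep the KV cache small for most of the stack, while the global layers preserve long-range information flow.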
CodeGemma
Specialized for code generation and understanding. Available as a 2B variant optimized for fast code completion (fill-in-the-middle) and 7B variants (pretrained and instruction-tuned) for code generation and chat.
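Fill-in-the-middle prompts use special control tokens. A minimal formatter; the token strings below match the CodeGemma model card at the time of writing and should be double-checked there:

```python
def fim_prompt(prefix: str, suffix: str) -> str:
    """Format a fill-in-the-middle prompt for CodeGemma. The model
    generates the missing code after <|fim_middle|> (assumption: token
    strings taken from the CodeGemma model card)."""
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

prompt = fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
```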
RecurrentGemma
An experimental model using a novel recurrent architecture (Griffin) instead of pure Transformers. Designed for efficient inference on long sequences.
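The efficiency claim follows from the recurrence: state per token is fixed-size, unlike an attention KV cache that grows with sequence length. A heavily simplified scalar sketch of a gated linear recurrence (the real Griffin RG-LRU uses learned vector gates and mixes in local attention):

```python
def gated_recurrence(xs, gates):
    """Toy scalar version of a gated linear recurrence: state h is
    updated as h = g*h + (1-g)*x, so per-token inference cost is
    constant regardless of sequence length. This is a sketch, not
    the actual Griffin RG-LRU update rule."""
    h = 0.0
    for x, g in zip(xs, gates):
        h = g * h + (1.0 - g) * x
    return h

print(gated_recurrence([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))  # 2.125
```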