Google Models
The full Google DeepMind model lineup: Gemini (commercial API), Gemma (open-weight), and PaLM 2 (legacy). Google offers some of the largest context windows in the industry, up to 2 million tokens.
Gemini 2.5 Series
Google's latest and most capable models, featuring native "thinking" capabilities for enhanced reasoning.
| Model | Released | Context | Max Output | Multimodal | Pricing (Input/Output per 1M) |
|---|---|---|---|---|---|
| Gemini 2.5 Pro | Mar 2025 | 1M (2M preview) | 65,536 | Text, Vision, Audio, Video | $1.25 / $10.00 (≤200K), $2.50 / $15.00 (>200K) |
| Gemini 2.5 Flash | Mar 2025 | 1M | 65,536 | Text, Vision, Audio, Video | $0.15 / $0.60 (non-thinking), $0.30 / $2.50 (thinking) |
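The tiered rates in the table translate directly into a cost estimate. A minimal sketch, using the Gemini 2.5 Pro rates copied from above (the helper name is illustrative, not part of any SDK):

```python
def gemini_25_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate Gemini 2.5 Pro cost in USD from the tiered rates above.

    Rates are per 1M tokens; the higher tier applies once the prompt
    exceeds 200K tokens. Illustrative helper, not an official API call.
    """
    if input_tokens <= 200_000:
        in_rate, out_rate = 1.25, 10.00
    else:
        in_rate, out_rate = 2.50, 15.00
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A 100K-token prompt with a 5K-token answer:
print(f"${gemini_25_pro_cost(100_000, 5_000):.4f}")  # $0.1750
```

Note that crossing the 200K threshold reprices the entire request, not just the tokens above the threshold.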
Gemini 2.5 Pro
Google's most intelligent model. Features built-in "thinking" that allows the model to reason through complex problems before responding. Leads many public benchmarks in coding, math, and science.
- Best for: Complex reasoning, coding, math, scientific analysis, long-document processing
- Key features: Native thinking mode, 1M context (2M in preview), processes text/images/audio/video natively, tool use, code execution
- Standout: Industry-leading context window allows processing of entire codebases or hundreds of documents at once
Gemini 2.5 Flash
A fast, efficient thinking model. Offers strong reasoning capabilities at much lower cost and latency than 2.5 Pro. The "thinking" can be toggled or given a token budget for cost control.
- Best for: Production applications needing reasoning, high-volume processing, cost-sensitive deployments
- Key features: Configurable thinking budget, very fast, multimodal
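The thinking budget is set per request. A minimal sketch of the `generateContent` REST body; the field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) follow the public Gemini API docs at the time of writing, but verify them against the current reference:

```python
def flash_request(prompt: str, thinking_budget: int = 0) -> dict:
    """Build a Gemini 2.5 Flash generateContent request body with an
    explicit thinking budget. A budget of 0 disables thinking entirely
    (assumption: field names taken from the public REST docs; check
    the current API reference before relying on them)."""
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        },
    }

body = flash_request("Summarize this contract.", thinking_budget=1024)
```

Because thinking tokens are billed at the higher output rate, capping the budget is the main lever for keeping Flash in its low-cost tier.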
Gemini 2.0 Series
| Model | Released | Context | Multimodal | Pricing (Input/Output per 1M) |
|---|---|---|---|---|
| Gemini 2.0 Flash | Dec 2024 | 1M | Text, Vision, Audio, Video | $0.10 / $0.40 |
Gemini 2.0 Flash
Introduced multimodal output (text, images, audio) and agentic capabilities. Designed for building AI agents with tool use, code execution, and Google Search grounding.
- Best for: Agent-based applications, multimodal generation, real-time interactions
- Key features: Native tool use, multimodal output, Google Search grounding
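Search grounding is enabled by attaching a tool entry to the request. A minimal sketch of the request body; the `google_search` tool name matches the public REST docs at the time of writing and should be treated as an assumption to verify:

```python
def grounded_request(prompt: str) -> dict:
    """Build a generateContent body that enables Google Search grounding
    for Gemini 2.0 Flash. The empty object opts in to the tool with
    default settings (assumption: tool name from the public REST docs)."""
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "tools": [{"google_search": {}}],
    }
```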
Gemini 1.5 Series
| Model | Released | Context | Multimodal | Pricing (Input/Output per 1M) |
|---|---|---|---|---|
| Gemini 1.5 Pro | Feb 2024 | 2M | Text, Vision, Audio, Video | $1.25 / $5.00 (≤128K), $2.50 / $10.00 (>128K) |
| Gemini 1.5 Flash | May 2024 | 1M | Text, Vision, Audio, Video | $0.075 / $0.30 (≤128K), $0.15 / $0.60 (>128K) |
Gemini 1.5 Pro
The first model to offer a 2 million token context window, enough to process multiple entire books or hours of video at once. Pioneered the "long context" paradigm.
- Best for: Very long document analysis, video understanding, large codebase processing
- Historical significance: Demonstrated that million-token contexts could be practical and useful
Gemini 1.5 Flash
A lighter, faster variant of 1.5 Pro. Optimized for speed and efficiency while maintaining strong capabilities across modalities.
Gemini 1.0 & Legacy
Gemini 1.0 Pro
- Released: December 2023
- Context: 32K tokens
- Status: Superseded by newer Gemini models
- Significance: Google's first Gemini model; replaced PaLM 2 in most applications
Gemini Nano
- Released: December 2023
- Sizes: Nano-1 (1.8B) and Nano-2 (3.25B)
- Context: 32K tokens
- Purpose: On-device model for mobile phones (Pixel, Samsung Galaxy). Runs locally without internet. Used for Smart Reply, summarization, and other device features.
PaLM 2
- Released: May 2023
- Variants: Gecko (smallest), Otter, Bison, Unicorn (largest)
- Significance: Preceded Gemini. Powered early Bard, MakerSuite, and Google Cloud AI
- Status: Deprecated in favor of Gemini
Gemma Family (Open-Weight)
Gemma is Google's family of open-weight models based on the same research and technology as Gemini but released for the community to use, fine-tune, and deploy.
| Model | Released | Parameters | Context | License |
|---|---|---|---|---|
| Gemma 2 27B | Jun 2024 | 27B | 8K | Gemma License |
| Gemma 2 9B | Jun 2024 | 9B | 8K | Gemma License |
| Gemma 2 2B | Jun 2024 | 2B | 8K | Gemma License |
| Gemma 7B | Feb 2024 | 7B | 8K | Gemma License |
| Gemma 2B | Feb 2024 | 2B | 8K | Gemma License |
| CodeGemma | Apr 2024 | 2B / 7B | 8K | Gemma License |
| RecurrentGemma | Apr 2024 | 2B / 9B | 8K | Gemma License |
Gemma 2
The second generation of Google's open models. Gemma 2 27B is particularly notable for outperforming many larger models. Uses a novel architecture with alternating local and global attention layers.
- Best for: Self-hosted applications, fine-tuning, research, edge deployment (2B)
- Key features: Strong benchmark scores for size class, instruction-tuned variants available
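The alternating local/global attention mentioned above can be illustrated with a toy mask builder. Window size and the even/odd layer ordering are illustrative only; Gemma 2's actual sliding window is 4096 tokens:

```python
def attention_mask(seq_len: int, layer_idx: int, window: int = 4):
    """Toy sketch of Gemma 2's interleaved attention: even layers use a
    sliding local window, odd layers attend globally. Both are causal.
    Sizes and the layer ordering here are illustrative, not the real
    config (Gemma 2's actual window is 4096 tokens)."""
    local = layer_idx % 2 == 0
    mask = []
    for q in range(seq_len):          # query position
        row = []
        for k in range(seq_len):      # key position
            causal = k <= q
            in_window = q - k < window
            row.append(causal and (in_window if local else True))
        mask.append(row)
    return mask

# Layer 0 (local, window=4): position 5 cannot see position 0.
m = attention_mask(6, layer_idx=0)
print(m[5][0], m[5][2])  # False True
```

The local layers keep the KV cache small for most of the stack, while the global layers preserve long-range information flow.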
CodeGemma
Specialized for code generation and understanding. Available as a 2B variant optimized for fast code completion (fill-in-the-middle) and 7B variants (pretrained and instruction-tuned) for code generation and chat.
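Fill-in-the-middle prompts use special control tokens. A minimal formatter; the token strings below match the CodeGemma model card at the time of writing and should be double-checked there:

```python
def fim_prompt(prefix: str, suffix: str) -> str:
    """Format a fill-in-the-middle prompt for CodeGemma. The model
    generates the missing code after <|fim_middle|> (assumption: token
    strings taken from the CodeGemma model card)."""
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

prompt = fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
```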
RecurrentGemma
An experimental model using a novel recurrent architecture (Griffin) instead of pure Transformers. Designed for efficient inference on long sequences.
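The efficiency claim follows from the recurrence: state per token is fixed-size, unlike an attention KV cache that grows with sequence length. A heavily simplified scalar sketch of a gated linear recurrence (the real Griffin RG-LRU uses learned vector gates and mixes in local attention):

```python
def gated_recurrence(xs, gates):
    """Toy scalar version of a gated linear recurrence: state h is
    updated as h = g*h + (1-g)*x, so per-token inference cost is
    constant regardless of sequence length. This is a sketch, not
    the actual Griffin RG-LRU update rule."""
    h = 0.0
    for x, g in zip(xs, gates):
        h = g * h + (1.0 - g) * x
    return h

print(gated_recurrence([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))  # 2.125
```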