# Meta Llama Models

Meta's Llama family is among the most influential open-weight model series, powering thousands of applications and fine-tunes. The lineage runs from the original Llama through to Llama 4, the latest generation, which adopts a mixture-of-experts architecture.
## Llama 4 Series
Meta's latest generation introduces a Mixture-of-Experts (MoE) architecture, enabling massive total parameter counts while keeping inference efficient by only activating a subset of parameters per token.
| Model | Released | Total Params | Active Params | Context | License |
|---|---|---|---|---|---|
| Llama 4 Scout | Apr 2025 | 109B (16 experts) | 17B | 10M | Llama 4 Community License |
| Llama 4 Maverick | Apr 2025 | 400B (128 experts) | 17B | 1M | Llama 4 Community License |
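The sparse-activation idea behind these numbers can be illustrated with a toy top-k router. This is a simplified sketch, not Llama 4's actual routing code (real MoE layers route every token inside each transformer block, and the function and variable names here are illustrative):

```python
import math
import random

def moe_forward(x, experts, gate, top_k=1):
    """Route token vector x to the top_k highest-scoring experts.

    experts: one tiny weight matrix per expert (stand-in for an expert FFN).
    gate:    one scoring vector per expert.
    Only the chosen experts run, so per-token compute grows with top_k,
    not with the total expert count -- the same reason a 400B-total model
    can have only 17B active parameters per token.
    """
    scores = [sum(g_i * x_i for g_i, x_i in zip(g, x)) for g in gate]
    chosen = sorted(range(len(scores)), key=lambda i: scores[i])[-top_k:]
    # Softmax over the selected experts' scores (shifted for stability).
    z = max(scores[i] for i in chosen)
    weights = [math.exp(scores[i] - z) for i in chosen]
    total = sum(weights)
    out = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        y = [sum(w_rc * x_c for w_rc, x_c in zip(row, x)) for row in experts[i]]
        out = [o + (w / total) * y_r for o, y_r in zip(out, y)]
    return out, chosen

random.seed(0)
d, n_experts = 8, 16
experts = [[[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]
           for _ in range(n_experts)]
gate = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_experts)]
x = [random.gauss(0, 1) for _ in range(d)]

out, chosen = moe_forward(x, experts, gate, top_k=1)
print(len(chosen), "of", n_experts, "experts active for this token")
```

The key property is the last line: however many experts exist in total, only `top_k` of them execute per token, which is why total and active parameter counts diverge so sharply in the table above.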
### Llama 4 Scout
A 16-expert MoE model with an industry-leading 10 million token context window. Designed for applications requiring massive context, such as processing entire repositories or long document collections.
- Best for: Long-context applications, document analysis, codebase processing
- Key features: 10M context window, multimodal (text + images), MoE architecture, fits on a single H100 GPU with Int4 quantization
- Where to run: Cloud providers (AWS, Azure, GCP), Together AI, Fireworks AI, Groq
### Llama 4 Maverick
A larger 128-expert MoE model offering stronger performance on reasoning and coding tasks. Despite 400B total parameters, the active parameter count per token is only 17B, enabling efficient inference.
- Best for: Complex reasoning, coding, multilingual tasks, high-quality generation
- Key features: 128 experts, 1M context, multimodal, strong multilingual performance
- Where to run: Cloud providers, dedicated inference endpoints
## Llama 3.3
| Model | Released | Parameters | Context | License |
|---|---|---|---|---|
| Llama 3.3 70B | Dec 2024 | 70B | 128K | Llama 3.3 Community License |
A refined 70B model that approaches the performance of the much larger Llama 3.1 405B on many benchmarks. This represents a significant efficiency improvement: near-405B quality at a fraction of the serving cost.
- Best for: High-quality open-weight deployment where 405B is too expensive to run
- Where to run: 2x A100 80GB, cloud providers, Ollama, vLLM
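The hardware guidance above follows from a weights-only back-of-envelope estimate. This is a rough sketch (the helper name is ours, and real deployments also need memory for the KV cache and activations on top of the weights):

```python
def weight_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold the model weights, in GB."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# Llama 3.3 70B at FP16/BF16: 2 bytes per parameter.
fp16 = weight_gb(70, 2)    # 140 GB of weights -> spans 2x A100 80GB
# The same model quantized to 4-bit: 0.5 bytes per parameter.
int4 = weight_gb(70, 0.5)  # 35 GB of weights -> fits a single 48GB GPU
print(fp16, int4)
```

The same arithmetic explains the rest of the catalog: an 8B model at FP16 is about 16 GB, and the 1B/3B models quantized to 4-bit shrink to well under 2 GB, which is what makes on-device deployment feasible.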
## Llama 3.2 Series
Introduced lightweight and multimodal models to the Llama family.
| Model | Released | Parameters | Context | Multimodal |
|---|---|---|---|---|
| Llama 3.2 90B Vision | Sep 2024 | 90B | 128K | Text + Vision |
| Llama 3.2 11B Vision | Sep 2024 | 11B | 128K | Text + Vision |
| Llama 3.2 3B | Sep 2024 | 3B | 128K | Text only |
| Llama 3.2 1B | Sep 2024 | 1B | 128K | Text only |
### Llama 3.2 Vision Models (90B, 11B)
The first multimodal Llama models. Can process images alongside text for tasks like image captioning, visual Q&A, and document understanding.
- Best for: Multimodal applications, image understanding, document OCR
- 11B: Runs on a single GPU, good for development and lightweight deployment
- 90B: The strongest open-weight multimodal model in the Llama family at release
### Llama 3.2 Lightweight Models (3B, 1B)
Tiny models designed for on-device and edge deployment. Can run on mobile phones and IoT devices.
- Best for: Mobile apps, edge computing, IoT, privacy-sensitive on-device inference
- Where to run: Phones, laptops, Raspberry Pi, any device with 2-4GB RAM
## Llama 3.1 Series
| Model | Released | Parameters | Context | License |
|---|---|---|---|---|
| Llama 3.1 405B | Jul 2024 | 405B | 128K | Llama 3.1 Community License |
| Llama 3.1 70B | Jul 2024 | 70B | 128K | Llama 3.1 Community License |
| Llama 3.1 8B | Jul 2024 | 8B | 128K | Llama 3.1 Community License |
### Llama 3.1 405B
The largest open-weight dense model released up to that point, and competitive with GPT-4o and Claude 3.5 Sonnet at launch. A landmark for the open model community.
- Best for: Synthetic data generation, model distillation, highest quality open-weight inference
- Where to run: a single 8x H100 node with FP8 quantization (BF16 requires two nodes), or cloud providers
## Llama 3 Series
| Model | Released | Parameters | Context |
|---|---|---|---|
| Llama 3 70B | Apr 2024 | 70B | 8K |
| Llama 3 8B | Apr 2024 | 8B | 8K |
The Llama 3 base models with 8K context. Superseded by Llama 3.1 which extended context to 128K and improved quality.
## Llama 2 Series (Legacy)
| Model | Released | Parameters | Context |
|---|---|---|---|
| Llama 2 70B | Jul 2023 | 70B | 4K |
| Llama 2 13B | Jul 2023 | 13B | 4K |
| Llama 2 7B | Jul 2023 | 7B | 4K |
The first commercially licensable Llama models. Sparked a massive wave of fine-tuning and community development. Still widely used as a baseline for research.
## Code Llama
| Model | Released | Parameters | Context | Specialization |
|---|---|---|---|---|
| Code Llama | Aug 2023 (70B: Jan 2024) | 7B / 13B / 34B / 70B | 16K trained, usable up to 100K | General code |
| Code Llama Python | Aug 2023 (70B: Jan 2024) | 7B / 13B / 34B / 70B | 16K trained, usable up to 100K | Python-specific |
| Code Llama Instruct | Aug 2023 (70B: Jan 2024) | 7B / 13B / 34B / 70B | 16K trained, usable up to 100K | Instruction-tuned for chat |
Code-specialized models fine-tuned from Llama 2. Support fill-in-the-middle (code completion) and long context for code. Largely superseded by Llama 3+ models which have strong native coding abilities.
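Fill-in-the-middle works by rearranging the prompt with sentinel tokens so the model sees the code before and after a hole, then generates the missing span. A minimal sketch of the prompt layout described in the Code Llama release (the exact sentinel spellings vary by tokenizer, so verify them against your model's special tokens before relying on this format):

```python
def fim_prompt(prefix: str, suffix: str) -> str:
    """Build a Code Llama-style fill-in-the-middle prompt.

    Prefix-suffix-middle ordering: the model conditions on the code
    surrounding the hole and generates the middle until an
    end-of-text sentinel. Sentinel spellings here follow the Code
    Llama release; check your tokenizer before use.
    """
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"

before = "def add(a, b):\n    "
after = "\n    return result"
prompt = fim_prompt(before, after)
print(prompt)
```

The generated completion would then be spliced back between `before` and `after`; editor plugins use exactly this pattern for cursor-position code completion.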
## Where to Run Llama Models
- Local: Ollama, llama.cpp, LM Studio, GPT4All
- Cloud inference: Together AI, Fireworks AI, Groq, Anyscale, Replicate
- Cloud hosting: AWS SageMaker, Azure ML, Google Cloud Vertex AI
- Fine-tuning: Hugging Face, Axolotl, Unsloth, Modal