Meta Llama Models

Meta's Llama family is among the most influential open-weight model series, powering thousands of applications and fine-tunes. It spans from the original Llama to the latest Llama 4 generation, which uses a mixture-of-experts architecture.

Llama 4 Series

Meta's latest generation introduces a Mixture-of-Experts (MoE) architecture, enabling massive total parameter counts while keeping inference efficient by only activating a subset of parameters per token.

Model            | Released | Total Params       | Active Params | Context | License
Llama 4 Scout    | Apr 2025 | 109B (16 experts)  | 17B           | 10M     | Llama 4 Community License
Llama 4 Maverick | Apr 2025 | 400B (128 experts) | 17B           | 1M      | Llama 4 Community License
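The efficiency idea behind MoE can be sketched with a toy top-k router: a small gating network scores every expert for each token, but only the k best-scoring experts actually run, so per-token compute scales with the active parameters rather than the total. The sizes and k below are illustrative, not Llama 4's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 16  # matches Scout's expert count, but this is a toy model
TOP_K = 2         # assumption: real routers activate a small k per token
D_MODEL = 32      # toy hidden size

# Toy "experts": each is a single linear layer.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) / np.sqrt(D_MODEL)
           for _ in range(NUM_EXPERTS)]
gate = rng.standard_normal((D_MODEL, NUM_EXPERTS)) / np.sqrt(D_MODEL)

def moe_forward(x):
    """Route one token through only its top-k experts."""
    logits = x @ gate                     # gating scores for all experts
    top = np.argsort(logits)[-TOP_K:]     # indices of the k best experts
    w = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over top-k
    # Only TOP_K of NUM_EXPERTS expert matmuls actually execute:
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top)), top

token = rng.standard_normal(D_MODEL)
out, used = moe_forward(token)
print(f"experts used: {sorted(used.tolist())} of {NUM_EXPERTS}")
```

With 2 of 16 experts active, this layer does roughly an eighth of the expert compute of a dense equivalent, which is the same lever that lets Maverick keep 17B active parameters out of 400B total.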

Llama 4 Scout

A 16-expert MoE model with an industry-leading 10 million token context window. Designed for applications requiring massive context, such as processing entire repositories or long document collections.

  • Best for: Long-context applications, document analysis, codebase processing
  • Key features: 10M context window, multimodal (text + images), MoE architecture, fits on a single H100 node
  • Where to run: Cloud providers (AWS, Azure, GCP), Together AI, Fireworks AI, Groq
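A back-of-envelope check on the "fits on a single H100 node" claim: all 109B parameters must stay resident in GPU memory even though only 17B are active per token (MoE saves compute, not weight memory). Assuming a node of 8x H100 80GB and ignoring KV cache and activation memory:

```python
def weight_memory_gb(total_params_b, bytes_per_param):
    """Back-of-envelope weight memory, ignoring KV cache and activations."""
    return total_params_b * 1e9 * bytes_per_param / 1e9  # GB

scout_total_b = 109    # Llama 4 Scout total parameters, in billions
h100_node_gb = 8 * 80  # assumption: one node = 8x H100 80GB

for bytes_pp, label in [(2, "bf16"), (1, "int8"), (0.5, "int4")]:
    gb = weight_memory_gb(scout_total_b, bytes_pp)
    fits = "fits" if gb <= h100_node_gb else "does not fit"
    print(f"{label}: ~{gb:.0f} GB -> {fits} in {h100_node_gb} GB")
```

Even at bf16 (~218 GB), the weights fit comfortably in a 640 GB node, leaving headroom for the KV cache that a long-context workload needs.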

Llama 4 Maverick

A larger 128-expert MoE model offering stronger performance on reasoning and coding tasks. Despite 400B total parameters, the active parameter count per token is only 17B, enabling efficient inference.

  • Best for: Complex reasoning, coding, multilingual tasks, high-quality generation
  • Key features: 128 experts, 1M context, multimodal, strong multilingual performance
  • Where to run: Cloud providers, dedicated inference endpoints

Llama 3.3

Model         | Released | Parameters | Context | License
Llama 3.3 70B | Dec 2024 | 70B        | 128K    | Llama 3.3 Community License

A refined 70B model that matches the much larger Llama 3.1 405B on many benchmarks, a significant efficiency improvement: roughly 405B-level quality from a model less than a fifth the size.

  • Best for: High-quality open-weight deployment where 405B is too expensive to run
  • Where to run: 2x A100 80GB, cloud providers, Ollama, vLLM

Llama 3.2 Series

Introduced lightweight and multimodal models to the Llama family.

Model                | Released | Parameters | Context | Multimodal
Llama 3.2 90B Vision | Sep 2024 | 90B        | 128K    | Text + Vision
Llama 3.2 11B Vision | Sep 2024 | 11B        | 128K    | Text + Vision
Llama 3.2 3B         | Sep 2024 | 3B         | 128K    | Text only
Llama 3.2 1B         | Sep 2024 | 1B         | 128K    | Text only

Llama 3.2 Vision Models (90B, 11B)

The first multimodal Llama models. Can process images alongside text for tasks like image captioning, visual Q&A, and document understanding.

  • Best for: Multimodal applications, image understanding, document OCR
  • 11B: Runs on a single GPU, good for development and lightweight deployment
  • 90B: The highest-quality open-weight multimodal model at release

Llama 3.2 Lightweight Models (3B, 1B)

Tiny models designed for on-device and edge deployment. Can run on mobile phones and IoT devices.

  • Best for: Mobile apps, edge computing, IoT, privacy-sensitive on-device inference
  • Where to run: Phones, laptops, Raspberry Pi, any device with 2-4GB RAM
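The "2-4GB RAM" figure follows from quantization arithmetic: weight size is parameters times bits per weight, and on-device runtimes typically ship 4-bit quantized weights. A rough sizing sketch (real files add some overhead for embeddings and metadata):

```python
def quantized_size_gb(params_b, bits):
    """Approximate weight size at a given quantization width."""
    return params_b * 1e9 * bits / 8 / 1e9  # GB

for params in (1, 3):
    for bits in (16, 8, 4):
        print(f"{params}B @ {bits}-bit: ~{quantized_size_gb(params, bits):.2f} GB")
```

At 4-bit, the 1B model is about 0.5 GB and the 3B about 1.5 GB, which is why both run within a 2-4 GB RAM budget once runtime overhead and context are included.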

Llama 3.1 Series

Model          | Released | Parameters | Context | License
Llama 3.1 405B | Jul 2024 | 405B       | 128K    | Llama 3.1 Community License
Llama 3.1 70B  | Jul 2024 | 70B        | 128K    | Llama 3.1 Community License
Llama 3.1 8B   | Jul 2024 | 8B         | 128K    | Llama 3.1 Community License

Llama 3.1 405B

The largest open-weight dense model ever released. Competitive with GPT-4o and Claude 3.5 Sonnet at launch. A landmark for the open model community.

  • Best for: Synthetic data generation, model distillation, highest quality open-weight inference
  • Where to run: 8x A100/H100 GPUs or cloud providers

Llama 3 Series

Model        | Released | Parameters | Context
Llama 3 70B  | Apr 2024 | 70B        | 8K
Llama 3 8B   | Apr 2024 | 8B         | 8K

The Llama 3 base models with 8K context. Superseded by Llama 3.1 which extended context to 128K and improved quality.

Llama 2 Series (Legacy)

Model        | Released | Parameters | Context
Llama 2 70B  | Jul 2023 | 70B        | 4K
Llama 2 13B  | Jul 2023 | 13B        | 4K
Llama 2 7B   | Jul 2023 | 7B         | 4K

The first commercially licensable Llama models. Sparked a massive wave of fine-tuning and community development. Still widely used as a baseline for research.

Code Llama

Model               | Released | Parameters           | Context                | Specialization
Code Llama          | Aug 2023 | 7B / 13B / 34B / 70B | 16K (100K for 7B/13B)  | General code
Code Llama Python   | Aug 2023 | 7B / 13B / 34B / 70B | 16K                    | Python-specific
Code Llama Instruct | Aug 2023 | 7B / 13B / 34B / 70B | 16K                    | Instruction-tuned for chat

Code-specialized models fine-tuned from Llama 2. Support fill-in-the-middle (code completion) and long context for code. Largely superseded by Llama 3+ models which have strong native coding abilities.
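Fill-in-the-middle works by reordering the document around sentinel tokens so a left-to-right model can condition on both the code before and after the cursor. The sketch below uses the <PRE>/<SUF>/<MID> sentinel spellings described in the Code Llama paper; verify the exact special tokens against your tokenizer before relying on them.

```python
def fim_prompt(prefix, suffix):
    """Assemble a prefix-suffix-middle (PSM) infilling prompt.

    Sentinel spellings follow the Code Llama paper's <PRE>/<SUF>/<MID>
    tokens; real tokenizers register these as special tokens, so check
    your tokenizer's vocabulary rather than trusting the literal strings.
    """
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"

prompt = fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
print(prompt)
```

The model then generates the "middle" (here, the function body's return expression) after the <MID> sentinel, which is how editors implement cursor-position completion.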

💡
Llama License: The Llama Community License is not a standard open-source license. Key restrictions: organizations with 700M+ monthly active users must request a special license from Meta. The license permits commercial use, modification, and redistribution with attribution. Read the full license at llama.meta.com.

Where to Run Llama Models

  • Local: Ollama, llama.cpp, LM Studio, GPT4All
  • Cloud inference: Together AI, Fireworks AI, Groq, Anyscale, Replicate
  • Cloud hosting: AWS SageMaker, Azure ML, Google Cloud Vertex AI
  • Fine-tuning: Hugging Face, Axolotl, Unsloth, Modal
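Most of the cloud inference providers above expose OpenAI-compatible chat-completion endpoints, so a request body is provider-agnostic. A minimal sketch of such a payload; the model identifier and parameter values are illustrative, and each provider's docs list the exact model names it serves:

```python
import json

# Assumption: the provider exposes an OpenAI-compatible /chat/completions
# endpoint; model names vary by provider and are illustrative here.
payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the Llama model family."},
    ],
    "max_tokens": 256,
    "temperature": 0.7,
}

body = json.dumps(payload)
print(body[:72], "...")
```

The same body works against local servers too: Ollama and vLLM both serve an OpenAI-compatible API, so switching between local and hosted inference is usually just a base-URL change.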