llama.cpp Mastery

Master llama.cpp, the C++ inference engine that runs LLMs on a wide range of hardware, from laptops to edge devices. Learn the GGUF file format, quantization formats, the Metal and CUDA backends, and performance tuning for CPU, GPU, and edge.


6 Lessons | Code Examples | Production-Ready | 100% Free

Lessons in This Topic

Work through these 6 lessons in order, or jump to whichever lesson you need most.

1. llama.cpp Overview [Beginner]
2. GGUF Format [Intermediate]
3. GGUF Quantization Formats (Q4, Q5, Q8) [Intermediate]
4. Metal, CUDA, ROCm, Vulkan Backends [Advanced]
5. llama.cpp Server Mode [Intermediate]
6. Tuning llama.cpp [Advanced]