AI Models
Master every AI model that matters. 50 deep dives covering frontier closed LLMs (GPT-5, Claude Opus 4.7, Gemini 2.5 Pro, Grok 4, o-series, Mistral Large 2), open-weight LLMs (Llama 3.3/4, DeepSeek-V3/R1, Qwen 2.5/QwQ, Mixtral, Gemma, Phi-4, DBRX, Falcon, Yi, Nemotron, Command R+, SmolLM2), image generation (SD 3.5, SDXL, FLUX.1, DALL-E 3, gpt-image-1, Midjourney, Imagen 4, Ideogram), video generation (Sora, Runway Gen-3, Luma, Kling, Pika, HunyuanVideo), audio (Whisper, ElevenLabs, Suno, Udio, F5-TTS), embeddings (text-embedding-3, Cohere v3, BGE-M3, Voyage-3), and specialized foundation models (CLIP, SAM 2).
All Models
50 model deep dives organized into 7 categories spanning the full AI model landscape.
Frontier Closed LLMs
GPT-5
Master GPT-5 — OpenAI's flagship model. Learn capabilities, context window, multimodal inputs, native tool use, pricing, and the patterns for production GPT-5 use.
6 Lessons
GPT-4o
Master GPT-4o — OpenAI's omni-modal workhorse. Learn vision, audio, native tool use, structured outputs, and the patterns behind most production OpenAI apps today.
6 Lessons
Claude Opus 4.7
Master Claude Opus 4.7 — Anthropic's flagship for complex reasoning, coding, and agents. Learn 1M context, prompt caching, computer use, and Opus-specific patterns.
6 Lessons
Claude Sonnet 4.6
Master Claude Sonnet 4.6 — Anthropic's balanced workhorse. Learn the cost/quality sweet spot, tool use, vision, and the patterns for high-volume Sonnet workloads.
6 Lessons
Claude Haiku 4.5
Master Claude Haiku 4.5 — Anthropic's fast, cheap workhorse. Learn the latency/cost edge, batch use, and the patterns for high-throughput Haiku deployments.
6 Lessons
Gemini 2.5 Pro
Master Gemini 2.5 Pro — Google's flagship long-context multimodal model. Learn 1M-2M context, native multimodal (image, video, audio), and search grounding.
6 Lessons
Gemini 2.0 Flash
Master Gemini 2.0 Flash — Google's fast, cheap workhorse with native multimodal. Learn the speed/cost edge, agentic native tools, and high-throughput patterns.
6 Lessons
Grok 4
Master Grok 4 — xAI's frontier model with real-time X data access. Learn the unique data advantages, voice mode, and patterns for using Grok effectively.
6 Lessons
OpenAI o-series (Reasoning)
Master OpenAI's reasoning models: o1, o3, o4. Learn the chain-of-thought-as-a-service paradigm, when reasoning models beat regular ones, and cost-optimization patterns.
6 Lessons
Mistral Large 2
Master Mistral Large 2 — France's frontier model. Learn its tool use, JSON mode, multilingual strengths, and the European data sovereignty story.
6 Lessons
Open-Weight LLMs
Llama 3.3 70B
Master Meta Llama 3.3 70B — the most popular open-weight LLM. Learn its capabilities, fine-tuning, deployment, and why it powers most production open-LLM apps.
6 Lessons
Llama 4 Family
Master Meta's Llama 4 family: Scout, Maverick, Behemoth. Learn the MoE architecture, native multimodality, 10M context, and what's new vs Llama 3.3.
6 Lessons
DeepSeek-V3
Master DeepSeek-V3 — the open-weight MoE model that matches GPT-4 at 1/10 the cost. Learn the architecture, training innovations, and self-hosting patterns.
6 Lessons
DeepSeek-R1 Reasoning
Master DeepSeek-R1 — the open-weight reasoning model that matches OpenAI o1. Learn the reasoning architecture, distillations, and reasoning model patterns.
6 Lessons
Qwen 2.5 Family
Master Alibaba Qwen 2.5: 0.5B-72B sizes, Qwen-VL multimodal, Qwen-Coder. Learn the family, strengths in Chinese/English, and deployment patterns.
6 Lessons
QwQ-32B Reasoning
Master Alibaba's QwQ-32B — the open-weight reasoning model. Learn the reasoning approach, when QwQ beats DeepSeek-R1, and self-hosted reasoning patterns.
6 Lessons
Mixtral 8x22B
Master Mistral's Mixtral 8x22B — a sparse MoE with 39B active params. Learn the MoE pattern, deployment cost, and when Mixtral beats dense alternatives.
6 Lessons
Gemma 2 / Gemma 3
Master Google's open-weight Gemma 2 (2B/9B/27B) and Gemma 3. Learn their strengths at small scale, tokenizer differences, and on-device deployment patterns.
6 Lessons
Phi-4 (Microsoft)
Master Microsoft Phi-4 — small but mighty 14B model. Learn synthetic data training, when Phi beats much larger models, and its niche in edge/coding.
6 Lessons
DBRX (Databricks)
Master Databricks DBRX — 132B MoE with 36B active. Learn the architecture, Databricks integration, and DBRX's niche in enterprise data workloads.
6 Lessons
Falcon (TII)
Master TII's Falcon family (7B-180B). Learn the Mamba+attention hybrid Falcon3, training data story, and when Falcon fits a workload.
6 Lessons
Yi (01.AI)
Master 01.AI's Yi family (Yi-34B, Yi-Lightning). Learn the Chinese/English balance, long-context variants, and Yi's positioning in the open-weight market.
6 Lessons
NVIDIA Nemotron
Master NVIDIA's Nemotron family (Llama-3.1-Nemotron-70B, Nemotron-Mini). Learn how NVIDIA tunes Llama for steerability and RLHF improvements.
6 Lessons
Cohere Command R+ (Open)
Master Cohere Command R+ open weights — RAG-native LLM. Learn the citation-built-in design, tool use, and self-hosted Command R+ deployment.
6 Lessons
SmolLM2 / TinyLlama
Master tiny LLMs: SmolLM2 (135M-1.7B), TinyLlama (1.1B). Learn the on-device, edge, and CPU-only deployment patterns where tiny LLMs shine.
6 Lessons
Image Generation Models
Stable Diffusion 3.5
Master Stability AI's SD 3.5 (Large, Medium, Turbo). Learn the MMDiT architecture, prompt engineering for SD3, and the open-image-model frontier.
6 Lessons
SDXL
Master Stable Diffusion XL — still the most-fine-tuned base. Learn the architecture, refiner pipeline, ControlNet, LoRA ecosystem, and when SDXL beats SD 3.5.
6 Lessons
FLUX.1 [pro/dev/schnell]
Master Black Forest Labs FLUX.1 — currently the best open image model. Learn pro vs dev vs schnell, prompt engineering, and FLUX-specific deployment patterns.
6 Lessons
DALL-E 3
Master DALL-E 3 — OpenAI's text-to-image with built-in prompt rewriting. Learn the strengths in text rendering, ChatGPT integration, and production patterns.
6 Lessons
gpt-image-1 (4o-image)
Master gpt-image-1 — OpenAI's flagship image model. Learn its strengths in editing, character consistency, multi-turn editing, and the patterns that beat DALL-E 3.
6 Lessons
Midjourney v7
Master Midjourney v7 — the artistic-quality leader. Learn parameters (--ar, --s, --c, --w), personalization, style references, and the Midjourney production patterns.
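The flag-style parameters named above can be assembled programmatically when generating prompts in bulk. A minimal sketch — the helper function is my own invention, not a Midjourney API; only the flags themselves (--ar, --s, --c, --w) come from Midjourney's parameter syntax:

```python
def build_mj_prompt(text, ar=None, s=None, c=None, w=None):
    """Append Midjourney-style flags to a prompt string:
    --ar aspect ratio, --s stylize, --c chaos, --w weird.
    (Hypothetical helper; the flags are standard Midjourney parameters.)"""
    parts = [text]
    for flag, value in (("--ar", ar), ("--s", s), ("--c", c), ("--w", w)):
        if value is not None:
            parts.append(f"{flag} {value}")
    return " ".join(parts)

prompt = build_mj_prompt("a lighthouse at dusk, film grain", ar="16:9", s=250)
# → "a lighthouse at dusk, film grain --ar 16:9 --s 250"
```

Flags the caller omits are simply left off, so the same helper covers everything from a bare prompt to a fully parameterized one.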
6 Lessons
Imagen 4 (Google)
Master Google Imagen 4 — frontier photorealistic image gen. Learn safety controls, aspect ratios, Vertex AI integration, and Imagen-specific prompting.
6 Lessons
Ideogram
Master Ideogram (1.0, 2.0, 3.0) — the text-rendering champion. Learn typography prompting, magic prompt, style references, and Ideogram's design-focused niche.
6 Lessons
Video Generation Models
Sora (OpenAI)
Master OpenAI Sora — frontier text-to-video. Learn the diffusion-transformer architecture, capabilities, limitations, and patterns for production Sora use.
6 Lessons
Runway Gen-3 Alpha
Master Runway Gen-3 Alpha and Gen-3 Alpha Turbo. Learn text-to-video, image-to-video, motion control, and the Runway video production workflow.
6 Lessons
Luma Dream Machine
Master Luma Dream Machine — fast cinematic video. Learn keyframe conditioning, camera motion, loop generation, and the Dream Machine production patterns.
6 Lessons
Kling 1.5 / 1.6
Master Kuaishou Kling 1.5/1.6 — long-form video model. Learn motion brush, camera movement, the std vs pro modes, and Kling's production strengths.
6 Lessons
Pika 1.5
Master Pika Labs 1.5 — creative video with Pikaffects. Learn the effects-driven approach, image-to-video, lip sync, and Pika's stylized strengths.
6 Lessons
HunyuanVideo (Tencent)
Master Tencent HunyuanVideo — 13B open-source video model. Learn the architecture, prompt engineering, self-hosting, and when HunyuanVideo wins on cost.
6 Lessons
Audio & Speech Models
Whisper Large v3
Master OpenAI Whisper Large v3 — the open-weight ASR standard. Learn the architecture, multilingual support, fine-tuning, distil-whisper, and production deployment.
6 Lessons
ElevenLabs Multilingual v2
Master ElevenLabs Multilingual v2 + Turbo v2.5 — state-of-the-art TTS. Learn voice settings, language coverage, voice cloning, and production patterns.
6 Lessons
Suno v4 (Music)
Master Suno v4 — frontier text-to-music. Learn lyric prompting, custom mode, style descriptors, audio extension, and Suno's production music workflow.
6 Lessons
Udio
Master Udio — premium text-to-music model. Learn prompting, extend functionality, and when Udio beats Suno for specific music tasks.
6 Lessons
F5-TTS (Open)
Master F5-TTS — frontier open-source TTS with voice cloning. Learn the architecture, voice reference, multilingual capability, and self-hosted deployment.
6 Lessons
Embedding Models
text-embedding-3-large/small
Master OpenAI text-embedding-3-large and -small. Learn Matryoshka embeddings (variable dimensions), MTEB scores, and the patterns for production OpenAI embeddings.
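The Matryoshka property means you can keep only the first k dimensions of an embedding and re-normalize, trading accuracy for storage. A minimal sketch of that truncation step — the helper is my own; the OpenAI SDK usage in the comment follows the documented `dimensions` parameter, and assumes an API key is configured:

```python
import math

def truncate_embedding(vec, dims):
    """Matryoshka-style shortening: keep the first `dims` values,
    then re-normalize to unit length so cosine similarity still works."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# With the OpenAI SDK you can instead request short vectors server-side:
#
#   from openai import OpenAI
#   client = OpenAI()  # assumes OPENAI_API_KEY is set
#   resp = client.embeddings.create(
#       model="text-embedding-3-large",
#       input="hello world",
#       dimensions=256,  # server-side Matryoshka truncation
#   )
#   vec = resp.data[0].embedding  # a 256-dim unit vector

full = [0.5, 0.5, 0.5, 0.5]          # stand-in for a full embedding
short = truncate_embedding(full, 2)  # first 2 dims, renormalized
```

Shorter vectors cut index size and search latency roughly in proportion to the dimension count, which is why the variable-dimension trick matters in production.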
6 Lessons
Cohere Embed v3
Master Cohere Embed v3 (English, Multilingual, Image). Learn input_type=query/document, multilingual coverage, and the patterns for Cohere embeddings.
6 Lessons
BGE-M3 (BAAI)
Master BGE-M3 — multi-functional, multi-lingual, multi-granularity open embedding model. Learn dense+sparse+multi-vector outputs and self-hosted patterns.
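Dense and sparse outputs from a model like BGE-M3 are typically fused at scoring time. A minimal sketch of weighted hybrid scoring — the helper names, toy values, and 0.7/0.3 weights are illustrative choices, not part of the BGE-M3 API:

```python
def sparse_score(query_weights, doc_weights):
    """Lexical score: sum of term-weight products over shared tokens,
    mirroring how learned sparse (lexical) vectors are compared."""
    return sum(w * doc_weights[t] for t, w in query_weights.items()
               if t in doc_weights)

def hybrid_score(dense_sim, query_sparse, doc_sparse,
                 w_dense=0.7, w_sparse=0.3):
    """Weighted fusion of dense cosine similarity and sparse lexical score."""
    return w_dense * dense_sim + w_sparse * sparse_score(query_sparse, doc_sparse)

# Toy numbers standing in for real model outputs:
score = hybrid_score(
    dense_sim=0.82,
    query_sparse={"solar": 0.9, "panel": 0.7},
    doc_sparse={"solar": 0.8, "roof": 0.5},
)
```

Tuning the fusion weights per corpus is the usual knob: lexical-heavy domains (codes, part numbers) push weight toward the sparse term, paraphrase-heavy ones toward the dense term.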
6 Lessons
Voyage-3-large
Master Voyage-3 and Voyage-3-large — top-of-MTEB embeddings + voyage-rerank-2. Learn domain-tuned variants (code, finance, legal) and production patterns.
6 Lessons
Specialized Foundation Models
CLIP (OpenAI)
Master CLIP — the multimodal embedding model that connects images and text. Learn variants (ViT-L/14, OpenCLIP, SigLIP), zero-shot classification, and production CLIP.
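Zero-shot classification with CLIP reduces to cosine similarity between one image embedding and a set of text-prompt embeddings. A minimal sketch of that scoring step with toy 3-dim vectors standing in for real CLIP outputs (in practice the embeddings come from a CLIP model such as ViT-L/14):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def zero_shot_classify(image_emb, label_embs):
    """Pick the label whose text embedding is closest to the image
    embedding. CLIP's trick: each label is embedded via a prompt
    template like 'a photo of a {label}'."""
    scores = {label: cosine(image_emb, emb) for label, emb in label_embs.items()}
    return max(scores, key=scores.get)

# Hypothetical toy embeddings:
image = [0.9, 0.1, 0.0]
labels = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.0],
}
best = zero_shot_classify(image, labels)
# → "a photo of a cat"
```

No per-class training is needed: adding a class is just adding another text prompt, which is what makes CLIP's zero-shot setup so flexible.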
6 Lessons
SAM 2 (Meta Segment Anything)
Master SAM 2 — Meta's image+video segmentation foundation model. Learn point/box/text prompting, video tracking, and production segmentation patterns.
6 Lessons
Why an AI Models Track?
There are 100+ models that matter and a new frontier model ships every month. This track gives you a single up-to-date map.
Frontier + Open
10 frontier closed LLMs (GPT-5, Claude Opus 4.7, Gemini 2.5 Pro, Grok 4, o-series, Mistral Large 2) + 15 open-weight LLMs (Llama 3.3/4, DeepSeek-V3/R1, Qwen, Mixtral, Gemma, Phi-4, DBRX, Falcon, Yi, Nemotron, Command R+, SmolLM2).
Image + Video
Image generation (SD 3.5, SDXL, FLUX.1, DALL-E 3, gpt-image-1, Midjourney, Imagen 4, Ideogram) and video generation (Sora, Runway Gen-3, Luma, Kling, Pika, HunyuanVideo).
Audio + Embeddings
Audio (Whisper Large v3, ElevenLabs Multilingual v2, Suno, Udio, F5-TTS) + embeddings (text-embedding-3, Cohere Embed v3, BGE-M3, Voyage-3-large).
Foundation Models
Specialized: CLIP for multimodal embeddings, SAM 2 for image and video segmentation.
Lilly Tech Systems