AI Models
Master every AI model that matters. 50 deep dives covering frontier closed LLMs (GPT-5, Claude Opus 4.7, Gemini 2.5 Pro, Grok 4, o-series, Mistral Large 2), open-weight LLMs (Llama 3.3/4, DeepSeek-V3/R1, Qwen 2.5/QwQ, Mixtral, Gemma, Phi-4, DBRX, Falcon, Yi, Nemotron, Command R+, SmolLM2), image generation (SD 3.5, SDXL, FLUX.1, DALL-E 3, gpt-image-1, Midjourney, Imagen 4, Ideogram), video generation (Sora, Runway Gen-3, Luma, Kling, Pika, HunyuanVideo), audio (Whisper, ElevenLabs, Suno, Udio, F5-TTS), embeddings (text-embedding-3, Cohere v3, BGE-M3, Voyage-3), and specialized foundation models (CLIP, SAM 2).
All Models
50 model deep dives organized into 7 categories spanning the full AI model landscape.
Frontier Closed LLMs
GPT-5
Master GPT-5 — OpenAI's flagship model. Learn capabilities, context window, multimodal inputs, native tool use, pricing, and the patterns for production GPT-5 use.
6 Lessons
GPT-4o
Master GPT-4o — OpenAI's omni-modal workhorse. Learn vision, audio, native tool use, structured outputs, and the patterns behind most production OpenAI apps today.
6 Lessons
Claude Opus 4.7
Master Claude Opus 4.7 — Anthropic's flagship for complex reasoning, coding, and agents. Learn 1M context, prompt caching, computer use, and Opus-specific patterns.
6 Lessons
Claude Sonnet 4.6
Master Claude Sonnet 4.6 — Anthropic's balanced workhorse. Learn the cost/quality sweet spot, tool use, vision, and the patterns for high-volume Sonnet workloads.
6 Lessons
Claude Haiku 4.5
Master Claude Haiku 4.5 — Anthropic's fast, cheap workhorse. Learn the latency/cost edge, batch use, and the patterns for high-throughput Haiku deployments.
6 Lessons
Gemini 2.5 Pro
Master Gemini 2.5 Pro — Google's flagship long-context multimodal model. Learn 1M-2M context, native multimodal (image, video, audio), and search grounding.
6 Lessons
Gemini 2.0 Flash
Master Gemini 2.0 Flash — Google's fast, cheap workhorse with native multimodal. Learn the speed/cost edge, agentic native tools, and high-throughput patterns.
6 Lessons
Grok 4
Master Grok 4 — xAI's frontier model with real-time X data access. Learn the unique data advantages, voice mode, and patterns for using Grok effectively.
6 Lessons
OpenAI o-series (Reasoning)
Master OpenAI's reasoning models: o1, o3, o4. Learn the chain-of-thought-as-a-service paradigm, when reasoning models beat regular ones, and cost-optimization patterns.
6 Lessons
Mistral Large 2
Master Mistral Large 2 — France's frontier model. Learn its tool use, JSON mode, multilingual strengths, and the European data sovereignty story.
6 Lessons
Open-Weight LLMs
Llama 3.3 70B
Master Meta Llama 3.3 70B — the most popular open-weight LLM. Learn its capabilities, fine-tuning, deployment, and why it powers most production open-LLM apps.
6 Lessons
Llama 4 Family
Master Meta's Llama 4 family: Scout, Maverick, Behemoth. Learn the MoE architecture, native multimodality, 10M context, and what's new vs Llama 3.3.
6 Lessons
DeepSeek-V3
Master DeepSeek-V3 — the open-weight MoE model that matches GPT-4 at 1/10 the cost. Learn the architecture, training innovations, and self-hosting patterns.
6 Lessons
DeepSeek-R1 Reasoning
Master DeepSeek-R1 — the open-weight reasoning model that matches OpenAI o1. Learn the reasoning architecture, distillations, and reasoning model patterns.
6 Lessons
Qwen 2.5 Family
Master Alibaba Qwen 2.5: 0.5B-72B sizes, Qwen-VL multimodal, Qwen-Coder. Learn the family, strengths in Chinese/English, and deployment patterns.
6 Lessons
QwQ-32B Reasoning
Master Alibaba's QwQ-32B — the open-weight reasoning model. Learn the reasoning approach, when QwQ beats DeepSeek-R1, and self-hosted reasoning patterns.
6 Lessons
Mixtral 8x22B
Master Mistral's Mixtral 8x22B — a sparse MoE with 39B active params. Learn the MoE pattern, deployment cost, and when Mixtral beats dense alternatives.
6 Lessons
Gemma 2 / Gemma 3
Master Google's open-weight Gemma 2 (2B/9B/27B) and Gemma 3. Learn their strengths at small scale, tokenizer differences, and on-device deployment patterns.
6 Lessons
Phi-4 (Microsoft)
Master Microsoft Phi-4 — small but mighty 14B model. Learn synthetic data training, when Phi beats much larger models, and its niche in edge/coding.
6 Lessons
DBRX (Databricks)
Master Databricks DBRX — 132B MoE with 36B active. Learn the architecture, Databricks integration, and DBRX's niche in enterprise data workloads.
6 Lessons
Falcon (TII)
Master TII's Falcon family (7B-180B). Learn the Mamba+attention hybrid Falcon3, training data story, and when Falcon fits a workload.
6 Lessons
Yi (01.AI)
Master 01.AI's Yi family (Yi-34B, Yi-Lightning). Learn the Chinese/English balance, long-context variants, and Yi's positioning in the open-weight market.
6 Lessons
NVIDIA Nemotron
Master NVIDIA's Nemotron family (Llama-3.1-Nemotron-70B, Nemotron-Mini). Learn how NVIDIA tunes Llama for steerability and RLHF improvements.
6 Lessons
Cohere Command R+ (Open)
Master Cohere Command R+ open weights — RAG-native LLM. Learn the citation-built-in design, tool use, and self-hosted Command R+ deployment.
6 Lessons
SmolLM2 / TinyLlama
Master tiny LLMs: SmolLM2 (135M-1.7B), TinyLlama (1.1B). Learn the on-device, edge, and CPU-only deployment patterns where tiny LLMs shine.
6 Lessons
Image Generation Models
Stable Diffusion 3.5
Master Stability AI's SD 3.5 (Large, Medium, Turbo). Learn the MMDiT architecture, prompt engineering for SD3, and the open-image-model frontier.
6 Lessons
SDXL
Master Stable Diffusion XL — still the most-fine-tuned base. Learn the architecture, refiner pipeline, ControlNet, LoRA ecosystem, and when SDXL beats SD 3.5.
6 Lessons
FLUX.1 [pro/dev/schnell]
Master Black Forest Labs FLUX.1 — currently the best open image model. Learn pro vs dev vs schnell, prompt engineering, and FLUX-specific deployment patterns.
6 Lessons
DALL-E 3
Master DALL-E 3 — OpenAI's text-to-image with built-in prompt rewriting. Learn the strengths in text rendering, ChatGPT integration, and production patterns.
6 Lessons
gpt-image-1 (4o-image)
Master gpt-image-1 — OpenAI's flagship image model. Learn its strengths in editing, character consistency, multi-turn editing, and the patterns that beat DALL-E 3.
6 Lessons
Midjourney v7
Master Midjourney v7 — the artistic-quality leader. Learn parameters (--ar, --s, --c, --w), personalization, style references, and the Midjourney production patterns.
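The flag-style parameters named above can be assembled programmatically when generating prompts in bulk. A minimal sketch — the helper function is my own invention, not a Midjourney API; only the flags themselves (--ar, --s, --c, --w) come from Midjourney's parameter syntax:

```python
def build_mj_prompt(text, ar=None, s=None, c=None, w=None):
    """Append Midjourney-style flags to a prompt string:
    --ar aspect ratio, --s stylize, --c chaos, --w weird.
    (Hypothetical helper; the flags are standard Midjourney parameters.)"""
    parts = [text]
    for flag, value in (("--ar", ar), ("--s", s), ("--c", c), ("--w", w)):
        if value is not None:
            parts.append(f"{flag} {value}")
    return " ".join(parts)

prompt = build_mj_prompt("a lighthouse at dusk, film grain", ar="16:9", s=250)
# → "a lighthouse at dusk, film grain --ar 16:9 --s 250"
```

Flags the caller omits are simply left off, so the same helper covers everything from a bare prompt to a fully parameterized one.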
6 Lessons
Imagen 4 (Google)
Master Google Imagen 4 — frontier photorealistic image gen. Learn safety controls, aspect ratios, Vertex AI integration, and Imagen-specific prompting.
6 Lessons
Ideogram
Master Ideogram (1.0, 2.0, 3.0) — the text-rendering champion. Learn typography prompting, magic prompt, style references, and Ideogram's design-focused niche.
6 Lessons
Video Generation Models
Sora (OpenAI)
Master OpenAI Sora — frontier text-to-video. Learn the diffusion-transformer architecture, capabilities, limitations, and patterns for production Sora use.
6 Lessons
Runway Gen-3 Alpha
Master Runway Gen-3 Alpha and Gen-3 Alpha Turbo. Learn text-to-video, image-to-video, motion control, and the Runway video production workflow.
6 Lessons
Luma Dream Machine
Master Luma Dream Machine — fast cinematic video. Learn keyframe conditioning, camera motion, loop generation, and the Dream Machine production patterns.
6 Lessons
Kling 1.5 / 1.6
Master Kuaishou Kling 1.5/1.6 — long-form video model. Learn motion brush, camera movement, the std vs pro modes, and Kling's production strengths.
6 Lessons
Pika 1.5
Master Pika Labs 1.5 — creative video with Pikaffects. Learn the effects-driven approach, image-to-video, lip sync, and Pika's stylized strengths.
6 Lessons
HunyuanVideo (Tencent)
Master Tencent HunyuanVideo — 13B open-source video model. Learn the architecture, prompt engineering, self-hosting, and when HunyuanVideo wins on cost.
6 Lessons
Audio & Speech Models
Whisper Large v3
Master OpenAI Whisper Large v3 — the open-weight ASR standard. Learn the architecture, multilingual support, fine-tuning, distil-whisper, and production deployment.
6 Lessons
ElevenLabs Multilingual v2
Master ElevenLabs Multilingual v2 + Turbo v2.5 — state-of-the-art TTS. Learn voice settings, language coverage, voice cloning, and production patterns.
6 Lessons
Suno v4 (Music)
Master Suno v4 — frontier text-to-music. Learn lyric prompting, custom mode, style descriptors, audio extension, and Suno's production music workflow.
6 Lessons
Udio
Master Udio — premium text-to-music model. Learn prompting, extend functionality, and when Udio beats Suno for specific music tasks.
6 Lessons
F5-TTS (Open)
Master F5-TTS — frontier open-source TTS with voice cloning. Learn the architecture, voice reference, multilingual capability, and self-hosted deployment.
6 Lessons
Embedding Models
text-embedding-3-large/small
Master OpenAI text-embedding-3-large and -small. Learn Matryoshka embeddings (variable dimensions), MTEB scores, and the patterns for production OpenAI embeddings.
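The Matryoshka property means you can keep only the first k dimensions of an embedding and re-normalize, trading accuracy for storage. A minimal sketch of that truncation step — the helper is my own; the OpenAI SDK usage in the comment follows the documented `dimensions` parameter, and assumes an API key is configured:

```python
import math

def truncate_embedding(vec, dims):
    """Matryoshka-style shortening: keep the first `dims` values,
    then re-normalize to unit length so cosine similarity still works."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# With the OpenAI SDK you can instead request short vectors server-side:
#
#   from openai import OpenAI
#   client = OpenAI()  # assumes OPENAI_API_KEY is set
#   resp = client.embeddings.create(
#       model="text-embedding-3-large",
#       input="hello world",
#       dimensions=256,  # server-side Matryoshka truncation
#   )
#   vec = resp.data[0].embedding  # a 256-dim unit vector

full = [0.5, 0.5, 0.5, 0.5]          # stand-in for a full embedding
short = truncate_embedding(full, 2)  # first 2 dims, renormalized
```

Shorter vectors cut index size and search latency roughly in proportion to the dimension count, which is why the variable-dimension trick matters in production.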
6 Lessons
Cohere Embed v3
Master Cohere Embed v3 (English, Multilingual, Image). Learn input_type=query/document, multilingual coverage, and the patterns for Cohere embeddings.
6 Lessons
BGE-M3 (BAAI)
Master BGE-M3 — multi-functional, multi-lingual, multi-granularity open embedding model. Learn dense+sparse+multi-vector outputs and self-hosted patterns.
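Dense and sparse outputs from a model like BGE-M3 are typically fused at scoring time. A minimal sketch of weighted hybrid scoring — the helper names, toy values, and 0.7/0.3 weights are illustrative choices, not part of the BGE-M3 API:

```python
def sparse_score(query_weights, doc_weights):
    """Lexical score: sum of term-weight products over shared tokens,
    mirroring how learned sparse (lexical) vectors are compared."""
    return sum(w * doc_weights[t] for t, w in query_weights.items()
               if t in doc_weights)

def hybrid_score(dense_sim, query_sparse, doc_sparse,
                 w_dense=0.7, w_sparse=0.3):
    """Weighted fusion of dense cosine similarity and sparse lexical score."""
    return w_dense * dense_sim + w_sparse * sparse_score(query_sparse, doc_sparse)

# Toy numbers standing in for real model outputs:
score = hybrid_score(
    dense_sim=0.82,
    query_sparse={"solar": 0.9, "panel": 0.7},
    doc_sparse={"solar": 0.8, "roof": 0.5},
)
```

Tuning the fusion weights per corpus is the usual knob: lexical-heavy domains (codes, part numbers) push weight toward the sparse term, paraphrase-heavy ones toward the dense term.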
6 Lessons
Voyage-3-large
Master Voyage-3 and Voyage-3-large — top-of-MTEB embeddings + voyage-rerank-2. Learn domain-tuned variants (code, finance, legal) and production patterns.
6 Lessons
Specialized Foundation Models
CLIP (OpenAI)
Master CLIP — the multimodal embedding model that connects images and text. Learn variants (ViT-L/14, OpenCLIP, SigLIP), zero-shot classification, and production CLIP.
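Zero-shot classification with CLIP reduces to cosine similarity between one image embedding and a set of text-prompt embeddings. A minimal sketch of that scoring step with toy 3-dim vectors standing in for real CLIP outputs (in practice the embeddings come from a CLIP model such as ViT-L/14):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def zero_shot_classify(image_emb, label_embs):
    """Pick the label whose text embedding is closest to the image
    embedding. CLIP's trick: each label is embedded via a prompt
    template like 'a photo of a {label}'."""
    scores = {label: cosine(image_emb, emb) for label, emb in label_embs.items()}
    return max(scores, key=scores.get)

# Hypothetical toy embeddings:
image = [0.9, 0.1, 0.0]
labels = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.0],
}
best = zero_shot_classify(image, labels)
# → "a photo of a cat"
```

No per-class training is needed: adding a class is just adding another text prompt, which is what makes CLIP's zero-shot setup so flexible.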
6 Lessons
SAM 2 (Meta Segment Anything)
Master SAM 2 — Meta's image+video segmentation foundation model. Learn point/box/text prompting, video tracking, and production segmentation patterns.
6 Lessons
Why an AI Models Track?
There are 100+ models that matter and a new frontier model ships every month. This track gives you a single up-to-date map.
Frontier + Open
10 frontier closed LLMs (GPT-5, Claude Opus 4.7, Gemini 2.5 Pro, Grok 4, o-series, Mistral Large 2) + 15 open-weight LLMs (Llama 3.3/4, DeepSeek-V3/R1, Qwen, Mixtral, Gemma, Phi-4, DBRX, Falcon, Yi, Nemotron, Command R+, SmolLM2).
Image + Video
Image generation (SD 3.5, SDXL, FLUX.1, DALL-E 3, gpt-image-1, Midjourney, Imagen 4, Ideogram) and video generation (Sora, Runway Gen-3, Luma, Kling, Pika, HunyuanVideo).
Audio + Embeddings
Audio (Whisper Large v3, ElevenLabs Multilingual v2, Suno, Udio, F5-TTS) + embeddings (text-embedding-3, Cohere Embed v3, BGE-M3, Voyage-3-large).
Foundation Models
Specialized: CLIP for multimodal embeddings, SAM 2 for image and video segmentation.
Lilly Tech Systems