AI Skills
Master the practitioner skills that AI engineers reach for every day. Not theory, not framework tours — the practical techniques that separate engineers who ship reliable AI from those who don't. 47 skills, 282 hands-on lessons.
All Skills
47 skills organized into 6 categories spanning the full AI engineering stack — from prompts to production.
Prompting & LLM Mastery
Advanced Prompt Engineering
Master the craft of prompt engineering: structured prompts, system messages, role design, output contracts, and reliable instruction-following at scale.
6 Lessons
Chain-of-Thought Prompting
Use step-by-step reasoning to dramatically improve LLM accuracy on math, logic, multi-hop QA, and planning tasks. Learn zero-shot CoT, few-shot CoT, and self-consistency.
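As a taste of the lesson material, here is a minimal sketch of the two core moves: a zero-shot CoT trigger appended to the question, and self-consistency majority voting over sampled answers. The function names are illustrative, and the model call itself is left out.

```python
from collections import Counter

COT_SUFFIX = "\n\nLet's think step by step."

def build_cot_prompt(question: str) -> str:
    # Zero-shot CoT: append a reasoning trigger to the plain question.
    return question + COT_SUFFIX

def self_consistency(final_answers: list[str]) -> str:
    # Self-consistency: sample several reasoning paths (temperature > 0),
    # extract each path's final answer, and return the majority vote.
    return Counter(final_answers).most_common(1)[0][0]
```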
6 Lessons
Few-Shot Learning Skills
Teach LLMs new tasks by showing 2-10 examples instead of fine-tuning. Master example selection, ordering, and the bias-variance tradeoffs of in-context learning.
6 Lessons
Role and Persona Prompting
Use roles and personas to steer LLM tone, expertise, and behavior. Build expert assistants, customer service voices, and domain specialists with persona prompts.
6 Lessons
Prompt Compression
Cut prompt tokens 50-90% without losing accuracy. Learn LLMLingua, summarization, schema compression, and token-aware prompt engineering for cost reduction.
6 Lessons
Multimodal Prompting
Prompt vision-language and audio-language models. Build OCR pipelines, chart readers, document analyzers, and image-grounded chat with GPT-4V, Claude, and Gemini.
6 Lessons
Structured Output Prompting
Force LLMs to emit valid JSON, XML, YAML, or any structured format. Master JSON mode, function calling, Pydantic schemas, and grammar-constrained decoding.
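The core pattern looks like this: parse the model's raw output, then reject anything that violates the schema before it reaches downstream code. The sketch below uses only stdlib `json` with a hand-rolled type check; in production a Pydantic model would play the validator role, as the lessons cover.

```python
import json

def parse_structured(raw: str, required: dict[str, type]) -> dict:
    # Parse an LLM's JSON output and enforce a minimal schema:
    # every required field must be present with the expected type.
    # (A Pydantic model would replace this check in real code.)
    data = json.loads(raw)
    for field, expected in required.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise ValueError(f"bad type for field: {field}")
    return data
```

On a validation failure, a common follow-up is to feed the error message back to the model and retry.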
6 Lessons
Prompt Injection Defense
Defend production LLM apps from prompt injection, jailbreaks, and indirect attacks. Layer detection, isolation, and constraint techniques to harden user-facing AI.
6 Lessons
RAG & Retrieval
Document Chunking Strategies
The chunking strategy makes or breaks RAG quality. Master fixed-size, recursive, semantic, structural, and late chunking patterns for documents, code, and PDFs.
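The simplest of these patterns, fixed-size chunking with overlap, can be sketched in a few lines. This is an illustrative helper (character-based for clarity; token-based windows work the same way), not the course's reference implementation.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Fixed-size chunking: slide a window of `size` characters,
    # stepping by (size - overlap) so adjacent chunks share context.
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of some index redundancy.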
6 Lessons
Vector Embedding Selection
Choose the right embedding model for your data, language, and budget. Compare OpenAI, Cohere, Voyage, BGE, E5, and learn when to fine-tune your own.
6 Lessons
Hybrid Search (BM25 + Vector)
Combine keyword (BM25) and semantic (vector) search to beat either alone. Learn fusion techniques, RRF, weighted scoring, and tuning for production retrieval.
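Reciprocal Rank Fusion (RRF), one of the fusion techniques covered, is small enough to show whole: each document scores the sum of 1/(k + rank) across the ranked lists it appears in. The function name is illustrative; k = 60 is the conventional default.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank),
    # where rank is the 1-based position of d in each ranked list.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF uses only ranks, it needs no score normalization between BM25 and cosine similarity, which is why it is a popular first choice.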
6 Lessons
Reranking Models
Boost RAG accuracy 10-30% with a second-stage reranker. Master cross-encoders, Cohere Rerank, BGE-Reranker, and ColBERT for precision-critical retrieval.
6 Lessons
Query Rewriting and Expansion
Rewrite vague user queries into search-optimized forms. Master HyDE, multi-query, query decomposition, and step-back prompting for better retrieval.
6 Lessons
Metadata Filtering
Combine vector search with structured filters: dates, tenants, permissions, categories. Build multi-tenant RAG that returns only the documents users are allowed to see.
6 Lessons
RAG Evaluation Metrics
Measure RAG quality across retrieval (hit rate, MRR, NDCG) and generation (faithfulness, answer relevance). Build eval suites with RAGAS, TruLens, and DeepEval.
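Two of the retrieval metrics named above fit in a few lines each. This sketch assumes a single relevant document per query (the common benchmark simplification); the helper names are illustrative.

```python
def hit_rate(results: list[list[str]], relevant: list[str], k: int = 5) -> float:
    # Fraction of queries whose relevant doc appears in the top-k results.
    hits = sum(rel in res[:k] for res, rel in zip(results, relevant))
    return hits / len(results)

def mrr(results: list[list[str]], relevant: list[str]) -> float:
    # Mean Reciprocal Rank: average of 1/rank of the first relevant hit,
    # contributing 0 when the relevant doc is not retrieved at all.
    total = 0.0
    for res, rel in zip(results, relevant):
        if rel in res:
            total += 1.0 / (res.index(rel) + 1)
    return total / len(results)
```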
6 Lessons
Long Context RAG
Stuff 100K-1M token contexts into Claude, Gemini, and GPT-4 long-context models. Learn when long-context replaces RAG and when to combine them.
6 LessonsModel Customization
Fine-Tuning with LoRA
Fine-tune 7B-70B parameter LLMs on consumer GPUs using LoRA adapters. Train domain-specific models for 1-10% of the cost of full fine-tuning.
6 Lessons
QLoRA Quantized Fine-Tuning
Fine-tune 70B parameter models on a single A100 with 4-bit quantization. Master bitsandbytes, NF4, double quantization, and the QLoRA training recipe.
6 Lessons
Instruction Tuning
Turn a base LLM into an instruction-following assistant. Curate datasets like Alpaca, Dolly, and OpenHermes; format with chat templates; train SFT pipelines.
6 Lessons
DPO and RLHF Alignment
Align LLMs with human preferences using Direct Preference Optimization (DPO) and Reinforcement Learning from Human Feedback (RLHF). Build preference datasets and training loops.
6 Lessons
Model Distillation
Compress a large teacher model into a small student model. Cut inference cost 10-100x while preserving most capability through knowledge distillation.
6 Lessons
Model Quantization (GGUF, AWQ, GPTQ)
Run 70B models on a laptop or 7B models on a phone via 4-bit and 2-bit quantization. Master GGUF, AWQ, GPTQ, and bitsandbytes quantization formats.
6 Lessons
Model Merging
Combine the strengths of multiple fine-tuned models without further training. Master TIES, DARE, SLERP, and frankenmerges with mergekit.
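SLERP, one of the merge methods listed, interpolates along the great-circle arc between two weight tensors instead of the straight line. A minimal stdlib sketch on plain Python lists (mergekit applies the same math per tensor):

```python
import math

def slerp(a: list[float], b: list[float], t: float) -> list[float]:
    # Spherical linear interpolation: walk along the great-circle arc
    # from a to b by fraction t, preserving vector norm better than lerp.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    theta = math.acos(cos_theta)
    if theta < 1e-6:
        # Nearly parallel vectors: fall back to linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    s = math.sin(theta)
    wa = math.sin((1 - t) * theta) / s
    wb = math.sin(t * theta) / s
    return [wa * x + wb * y for x, y in zip(a, b)]
```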
6 Lessons
Continued Pretraining
Inject new domain knowledge by continuing pretraining on raw text. Build models that know your codebase, medical literature, or legal corpus deeply.
6 Lessons
Agent Skills
Tool Calling Design
Design tool definitions LLMs can use reliably. Master tool descriptions, parameter schemas, error handling, and the patterns that turn LLMs into capable agents.
6 Lessons
Agent Memory Engineering
Give agents short-term, working, episodic, and semantic memory. Build conversational, file-based, and vector memory systems for stateful AI agents.
6 Lessons
Multi-Agent Orchestration
Coordinate multiple specialized agents. Master supervisor patterns, router agents, blackboard architecture, and handoffs with LangGraph and CrewAI.
6 Lessons
Agent Evaluation
Measure agent quality on tool use, planning, task completion, and cost. Build trajectory evaluations, golden tests, and online monitoring for production agents.
6 Lessons
ReAct Agent Patterns
Implement Reason+Act agents that interleave thinking and tool use. Master the ReAct pattern, scratchpads, and the variants used by LangChain and AutoGPT.
6 Lessons
Planning and Task Decomposition
Teach agents to break large tasks into executable subtasks. Master Plan-and-Execute, Tree-of-Thoughts, hierarchical planning, and replanning under failure.
6 Lessons
Agent Guardrails
Constrain what agents can do. Layer permissions, action whitelists, cost caps, output validators, and human-in-the-loop checkpoints for production safety.
6 Lessons
Production AI
LLM Cost Optimization
Cut LLM bills 50-90% without sacrificing quality. Master model routing, prompt caching, batch inference, and the levers that drive production LLM cost.
6 Lessons
Prompt Caching Mastery
Slash latency and cost with prompt caching. Master Anthropic prompt caching, OpenAI auto-caching, semantic caching, and cache invalidation patterns.
6 Lessons
Streaming LLM Responses
Build snappy chat UIs with streaming responses. Master SSE, WebSockets, partial JSON parsing, and stream cancellation across web, mobile, and serverless apps.
6 Lessons
Inference Latency Tuning
Cut p50 and p99 LLM latency with the right techniques. Master TTFT optimization, speculative decoding, KV cache reuse, and batching for low-latency inference.
6 Lessons
Model Serving with vLLM
Serve LLMs in production with vLLM. Master PagedAttention, continuous batching, tensor parallelism, and the OpenAI-compatible API for high-throughput inference.
6 Lessons
GPU Memory Management
Fit big models on small GPUs. Master OOM debugging, gradient checkpointing, model offloading, FlashAttention, and the techniques that double effective memory.
6 Lessons
Batch Inference Optimization
Process millions of LLM calls cheaply with batch APIs. Master OpenAI/Anthropic batch APIs, Ray Data, async parallelism, and offline LLM workflows.
6 Lessons
Token Budget Management
Stay within context windows and cost ceilings. Master tokenizers, dynamic context trimming, summarization, and budget-aware request routing.
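The simplest trimming strategy, dropping oldest messages first, looks like this. The token estimate here is a crude chars/4 heuristic for English text; a real tokenizer such as tiktoken should replace it, and the helper names are illustrative.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Swap in a real tokenizer (e.g. tiktoken) for production use.
    return max(1, len(text) // 4)

def trim_history(messages: list[str], budget: int) -> list[str]:
    # Keep the most recent messages that fit inside the token budget,
    # dropping the oldest first, a common context-window strategy.
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if total + cost > budget:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))
```

Summarizing the dropped prefix instead of discarding it is the usual next refinement.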
6 LessonsEvaluation & Safety
LLM-as-Judge Evaluation
Use a strong LLM to grade outputs of another LLM. Master pairwise comparison, rubric scoring, judge calibration, and avoiding the bias traps.
6 Lessons
Hallucination Detection
Detect when LLMs make things up. Master citation grounding, NLI verification, self-consistency checks, and SelfCheckGPT for production hallucination detection.
6 Lessons
Bias and Fairness Auditing
Audit LLMs and ML models for demographic bias. Master fairness metrics, counterfactual probing, BBQ benchmark, and bias mitigation techniques.
6 Lessons
AI Red Teaming
Stress-test AI systems for harmful, unsafe, and policy-violating behavior. Master jailbreak techniques, automated red teaming, and reporting findings.
6 Lessons
PII Detection and Redaction
Strip personal data from prompts and outputs. Master Microsoft Presidio, regex, NER, and LLM-based redaction for HIPAA, GDPR, and CCPA compliance.
6 Lessons
AI Output Validation
Validate every LLM output before it ships to users or systems. Master Pydantic, Guardrails AI, JSON Schema, and the validation patterns that prevent disasters.
6 Lessons
Eval Dataset Creation
Build the eval datasets your AI features need. Master synthetic generation, human labeling, edge case mining, and the iteration loop that compounds quality.
6 Lessons
Production AI Monitoring
Monitor AI in production: cost, latency, quality, drift, and abuse. Master LangSmith, Arize, Helicone, and the observability stack for LLM applications.
6 Lessons
Why a Skills Track?
Projects show you what to build. Skills show you the techniques you reuse across every project.
Transferable
Each skill applies to dozens of projects. Learn it once, use it for the rest of your career.
Practical
Every lesson includes runnable code and a production checklist. No theory dumps.
Job-Ready
The exact skills hiring managers screen for in AI engineer, ML engineer, and applied scientist roles.
Composable
Skills are designed to combine. Pair them to solve problems no single technique can.
Lilly Tech Systems