AI Skills
Master the practitioner skills that AI engineers reach for every day. Not theory, not framework tours — the practical techniques that separate engineers who ship reliable AI from those who don't. 47 skills, 282 hands-on lessons.
All Skills
47 skills organized into 6 categories spanning the full AI engineering stack — from prompts to production.
Prompting & LLM Mastery
Advanced Prompt Engineering
Master the craft of prompt engineering: structured prompts, system messages, role design, output contracts, and reliable instruction-following at scale.
6 Lessons
Chain-of-Thought Prompting
Use step-by-step reasoning to dramatically improve LLM accuracy on math, logic, multi-hop QA, and planning tasks. Learn zero-shot CoT, few-shot CoT, and self-consistency.
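As a taste of the lesson material, here is a minimal sketch of the two core moves: a zero-shot CoT trigger appended to the question, and self-consistency majority voting over sampled answers. The function names are illustrative, and the model call itself is left out.

```python
from collections import Counter

COT_SUFFIX = "\n\nLet's think step by step."

def build_cot_prompt(question: str) -> str:
    # Zero-shot CoT: append a reasoning trigger to the plain question.
    return question + COT_SUFFIX

def self_consistency(final_answers: list[str]) -> str:
    # Self-consistency: sample several reasoning paths (temperature > 0),
    # extract each path's final answer, and return the majority vote.
    return Counter(final_answers).most_common(1)[0][0]
```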
6 Lessons
Few-Shot Learning Skills
Teach LLMs new tasks by showing 2-10 examples instead of fine-tuning. Master example selection, ordering, and the bias-variance tradeoffs of in-context learning.
6 Lessons
Role and Persona Prompting
Use roles and personas to steer LLM tone, expertise, and behavior. Build expert assistants, customer service voices, and domain specialists with persona prompts.
6 Lessons
Prompt Compression
Cut prompt tokens 50-90% without losing accuracy. Learn LLMLingua, summarization, schema compression, and token-aware prompt engineering for cost reduction.
6 Lessons
Multimodal Prompting
Prompt vision-language and audio-language models. Build OCR pipelines, chart readers, document analyzers, and image-grounded chat with GPT-4V, Claude, and Gemini.
6 Lessons
Structured Output Prompting
Force LLMs to emit valid JSON, XML, YAML, or any structured format. Master JSON mode, function calling, Pydantic schemas, and grammar-constrained decoding.
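The core pattern looks like this: parse the model's raw output, then reject anything that violates the schema before it reaches downstream code. The sketch below uses only stdlib `json` with a hand-rolled type check; in production a Pydantic model would play the validator role, as the lessons cover.

```python
import json

def parse_structured(raw: str, required: dict[str, type]) -> dict:
    # Parse an LLM's JSON output and enforce a minimal schema:
    # every required field must be present with the expected type.
    # (A Pydantic model would replace this check in real code.)
    data = json.loads(raw)
    for field, expected in required.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise ValueError(f"bad type for field: {field}")
    return data
```

On a validation failure, a common follow-up is to feed the error message back to the model and retry.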
6 Lessons
Prompt Injection Defense
Defend production LLM apps from prompt injection, jailbreaks, and indirect attacks. Layer detection, isolation, and constraint techniques to harden user-facing AI.
6 Lessons
RAG & Retrieval
Document Chunking Strategies
The chunking strategy makes or breaks RAG quality. Master fixed-size, recursive, semantic, structural, and late chunking patterns for documents, code, and PDFs.
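The simplest of these patterns, fixed-size chunking with overlap, can be sketched in a few lines. This is an illustrative helper (character-based for clarity; token-based windows work the same way), not the course's reference implementation.

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Fixed-size chunking: slide a window of `size` characters,
    # stepping by (size - overlap) so adjacent chunks share context.
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of some index redundancy.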
6 Lessons
Vector Embedding Selection
Choose the right embedding model for your data, language, and budget. Compare OpenAI, Cohere, Voyage, BGE, E5, and learn when to fine-tune your own.
6 Lessons
Hybrid Search (BM25 + Vector)
Combine keyword (BM25) and semantic (vector) search to beat either alone. Learn fusion techniques, RRF, weighted scoring, and tuning for production retrieval.
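Reciprocal Rank Fusion (RRF), one of the fusion techniques covered, is small enough to show whole: each document scores the sum of 1/(k + rank) across the ranked lists it appears in. The function name is illustrative; k = 60 is the conventional default.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank),
    # where rank is the 1-based position of d in each ranked list.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF uses only ranks, it needs no score normalization between BM25 and cosine similarity, which is why it is a popular first choice.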
6 Lessons
Reranking Models
Boost RAG accuracy 10-30% with a second-stage reranker. Master cross-encoders, Cohere Rerank, BGE-Reranker, and ColBERT for precision-critical retrieval.
6 Lessons
Query Rewriting and Expansion
Rewrite vague user queries into search-optimized forms. Master HyDE, multi-query, query decomposition, and step-back prompting for better retrieval.
6 Lessons
Metadata Filtering
Combine vector search with structured filters: dates, tenants, permissions, categories. Build multi-tenant RAG that returns only the documents users are allowed to see.
6 Lessons
RAG Evaluation Metrics
Measure RAG quality across retrieval (hit rate, MRR, NDCG) and generation (faithfulness, answer relevance). Build eval suites with RAGAS, TruLens, and DeepEval.
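Two of the retrieval metrics named above fit in a few lines each. This sketch assumes a single relevant document per query (the common benchmark simplification); the helper names are illustrative.

```python
def hit_rate(results: list[list[str]], relevant: list[str], k: int = 5) -> float:
    # Fraction of queries whose relevant doc appears in the top-k results.
    hits = sum(rel in res[:k] for res, rel in zip(results, relevant))
    return hits / len(results)

def mrr(results: list[list[str]], relevant: list[str]) -> float:
    # Mean Reciprocal Rank: average of 1/rank of the first relevant hit,
    # contributing 0 when the relevant doc is not retrieved at all.
    total = 0.0
    for res, rel in zip(results, relevant):
        if rel in res:
            total += 1.0 / (res.index(rel) + 1)
    return total / len(results)
```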
6 Lessons
Long Context RAG
Stuff 100K-1M token contexts into Claude, Gemini, and GPT-4 long-context models. Learn when long-context replaces RAG and when to combine them.
6 LessonsModel Customization
Fine-Tuning with LoRA
Fine-tune 7B-70B parameter LLMs on consumer GPUs using LoRA adapters. Train domain-specific models for 1-10% of the cost of full fine-tuning.
6 Lessons
QLoRA Quantized Fine-Tuning
Fine-tune 70B parameter models on a single A100 with 4-bit quantization. Master bitsandbytes, NF4, double quantization, and the QLoRA training recipe.
6 Lessons
Instruction Tuning
Turn a base LLM into an instruction-following assistant. Curate datasets like Alpaca, Dolly, and OpenHermes; format with chat templates; train SFT pipelines.
6 Lessons
DPO and RLHF Alignment
Align LLMs with human preferences using Direct Preference Optimization (DPO) and Reinforcement Learning from Human Feedback (RLHF). Build preference datasets and training loops.
6 Lessons
Model Distillation
Compress a large teacher model into a small student model. Cut inference cost 10-100x while preserving most capability through knowledge distillation.
6 Lessons
Model Quantization (GGUF, AWQ, GPTQ)
Run 70B models on a laptop or 7B models on a phone via 4-bit and 2-bit quantization. Master GGUF, AWQ, GPTQ, and bitsandbytes quantization formats.
6 Lessons
Model Merging
Combine the strengths of multiple fine-tuned models without further training. Master TIES, DARE, SLERP, and frankenmerges with mergekit.
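SLERP, one of the merge methods listed, interpolates along the great-circle arc between two weight tensors instead of the straight line. A minimal stdlib sketch on plain Python lists (mergekit applies the same math per tensor):

```python
import math

def slerp(a: list[float], b: list[float], t: float) -> list[float]:
    # Spherical linear interpolation: walk along the great-circle arc
    # from a to b by fraction t, preserving vector norm better than lerp.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    theta = math.acos(cos_theta)
    if theta < 1e-6:
        # Nearly parallel vectors: fall back to linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    s = math.sin(theta)
    wa = math.sin((1 - t) * theta) / s
    wb = math.sin(t * theta) / s
    return [wa * x + wb * y for x, y in zip(a, b)]
```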
6 Lessons
Continued Pretraining
Inject new domain knowledge by continuing pretraining on raw text. Build models that know your codebase, medical literature, or legal corpus deeply.
6 Lessons
Agent Skills
Tool Calling Design
Design tool definitions LLMs can use reliably. Master tool descriptions, parameter schemas, error handling, and the patterns that turn LLMs into capable agents.
6 Lessons
Agent Memory Engineering
Give agents short-term, working, episodic, and semantic memory. Build conversational, file-based, and vector memory systems for stateful AI agents.
6 Lessons
Multi-Agent Orchestration
Coordinate multiple specialized agents. Master supervisor patterns, router agents, blackboard architecture, and handoffs with LangGraph and CrewAI.
6 Lessons
Agent Evaluation
Measure agent quality on tool use, planning, task completion, and cost. Build trajectory evaluations, golden tests, and online monitoring for production agents.
6 Lessons
ReAct Agent Patterns
Implement Reason+Act agents that interleave thinking and tool use. Master the ReAct pattern, scratchpads, and the variants used by LangChain and AutoGPT.
6 Lessons
Planning and Task Decomposition
Teach agents to break large tasks into executable subtasks. Master Plan-and-Execute, Tree-of-Thoughts, hierarchical planning, and replanning under failure.
6 Lessons
Agent Guardrails
Constrain what agents can do. Layer permissions, action whitelists, cost caps, output validators, and human-in-the-loop checkpoints for production safety.
6 Lessons
Production AI
LLM Cost Optimization
Cut LLM bills 50-90% without sacrificing quality. Master model routing, prompt caching, batch inference, and the levers that drive production LLM cost.
6 Lessons
Prompt Caching Mastery
Slash latency and cost with prompt caching. Master Anthropic prompt caching, OpenAI auto-caching, semantic caching, and cache invalidation patterns.
6 Lessons
Streaming LLM Responses
Build snappy chat UIs with streaming responses. Master SSE, WebSockets, partial JSON parsing, and stream cancellation across web, mobile, and serverless apps.
6 Lessons
Inference Latency Tuning
Cut p50 and p99 LLM latency with the right techniques. Master TTFT optimization, speculative decoding, KV cache reuse, and batching for low-latency inference.
6 Lessons
Model Serving with vLLM
Serve LLMs in production with vLLM. Master PagedAttention, continuous batching, tensor parallelism, and the OpenAI-compatible API for high-throughput inference.
6 Lessons
GPU Memory Management
Fit big models on small GPUs. Master OOM debugging, gradient checkpointing, model offloading, FlashAttention, and the techniques that double effective memory.
6 Lessons
Batch Inference Optimization
Process millions of LLM calls cheaply with batch APIs. Master OpenAI/Anthropic batch APIs, Ray Data, async parallelism, and offline LLM workflows.
6 Lessons
Token Budget Management
Stay within context windows and cost ceilings. Master tokenizers, dynamic context trimming, summarization, and budget-aware request routing.
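The simplest trimming strategy, dropping oldest messages first, looks like this. The token estimate here is a crude chars/4 heuristic for English text; a real tokenizer such as tiktoken should replace it, and the helper names are illustrative.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Swap in a real tokenizer (e.g. tiktoken) for production use.
    return max(1, len(text) // 4)

def trim_history(messages: list[str], budget: int) -> list[str]:
    # Keep the most recent messages that fit inside the token budget,
    # dropping the oldest first, a common context-window strategy.
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if total + cost > budget:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))
```

Summarizing the dropped prefix instead of discarding it is the usual next refinement.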
6 LessonsEvaluation & Safety
LLM-as-Judge Evaluation
Use a strong LLM to grade outputs of another LLM. Master pairwise comparison, rubric scoring, judge calibration, and avoiding the bias traps.
6 Lessons
Hallucination Detection
Detect when LLMs make things up. Master citation grounding, NLI verification, self-consistency checks, and SelfCheckGPT for production hallucination detection.
6 Lessons
Bias and Fairness Auditing
Audit LLMs and ML models for demographic bias. Master fairness metrics, counterfactual probing, BBQ benchmark, and bias mitigation techniques.
6 Lessons
AI Red Teaming
Stress-test AI systems for harmful, unsafe, and policy-violating behavior. Master jailbreak techniques, automated red teaming, and reporting findings.
6 Lessons
PII Detection and Redaction
Strip personal data from prompts and outputs. Master Microsoft Presidio, regex, NER, and LLM-based redaction for HIPAA, GDPR, and CCPA compliance.
6 Lessons
AI Output Validation
Validate every LLM output before it ships to users or systems. Master Pydantic, Guardrails AI, JSON Schema, and the validation patterns that prevent disasters.
6 Lessons
Eval Dataset Creation
Build the eval datasets your AI features need. Master synthetic generation, human labeling, edge case mining, and the iteration loop that compounds quality.
6 Lessons
Production AI Monitoring
Monitor AI in production: cost, latency, quality, drift, and abuse. Master LangSmith, Arize, Helicone, and the observability stack for LLM applications.
6 Lessons
Why a Skills Track?
Projects show you what to build. Skills show you the techniques you reuse across every project.
Transferable
Each skill applies to dozens of projects. Learn it once, use it for the rest of your career.
Practical
Every lesson includes runnable code and a production checklist. No theory dumps.
Job-Ready
The exact skills hiring managers screen for in AI engineer, ML engineer, and applied scientist roles.
Composable
Skills are designed to combine. Pair them to solve problems no single technique can.
Lilly Tech Systems