Optimization for Machine Learning
Master the algorithms that train machine learning models. From gradient descent fundamentals to modern optimizers like Adam, learn how to find good model parameters efficiently. Understand convex optimization, learning rate schedules, and hyperparameter tuning.
What You'll Learn
By the end of this course, you'll understand how optimization algorithms train ML models and how to tune them.
Gradient Descent
Master the foundational algorithm in its batch, stochastic, and mini-batch variants, plus momentum.
Modern Optimizers
Understand Adam, AdaGrad, RMSProp, and when to use each optimizer for different ML tasks.
Convex Optimization
Learn convexity, duality, and constrained optimization that underpin classical ML algorithms.
Hyperparameter Tuning
Systematic approaches to finding optimal learning rates, batch sizes, and architecture choices.
Course Lessons
Follow the lessons in order or jump to any topic you need.
1. Introduction
Why optimization is the core of ML training. Overview of the optimization landscape and key challenges.
2. Gradient Descent
The foundational algorithm: batch, stochastic, mini-batch GD. Momentum, learning rates, and convergence.
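To preview what this lesson covers, here is a minimal NumPy sketch of mini-batch gradient descent with momentum on a made-up noiseless least-squares problem (the data, batch size, and hyperparameters are all illustrative, not from the course):

```python
import numpy as np

# Hypothetical objective: f(w) = 0.5 * ||X @ w - y||^2 (least squares).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                               # noiseless targets

def grad(w, Xb, yb):
    # Gradient of the mean squared residual over one mini-batch.
    return Xb.T @ (Xb @ w - yb) / len(yb)

w = np.zeros(3)
velocity = np.zeros(3)
lr, beta = 0.1, 0.9                          # learning rate, momentum coefficient
for epoch in range(200):
    idx = rng.permutation(100)               # reshuffle each epoch
    for start in range(0, 100, 20):          # mini-batches of 20
        batch = idx[start:start + 20]
        velocity = beta * velocity + grad(w, X[batch], y[batch])
        w -= lr * velocity

print(np.round(w, 3))                        # converges toward true_w
```

Setting the batch size to 100 recovers batch GD, and to 1 recovers pure SGD; the lesson discusses how this choice trades gradient noise against per-step cost.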
3. Adam & Optimizers
Adaptive optimizers: AdaGrad, RMSProp, Adam, AdamW. How they work and when to use each one.
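As a taste of the adaptive-optimizer material, the Adam update rule can be written in a few lines of NumPy. The toy objective and hyperparameters below are illustrative choices, not prescribed by the course:

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new (w, m, v)."""
    m = beta1 * m + (1 - beta1) * g          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction for zero init
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize the toy objective f(w) = (w - 3)^2 starting from w = 0.
w = np.array(0.0)
m = v = np.array(0.0)
for t in range(1, 5001):
    g = 2 * (w - 3)                          # analytic gradient
    w, m, v = adam_step(w, g, m, v, t, lr=0.05)
print(float(w))                              # approaches 3.0
```

Dividing by the per-coordinate second-moment estimate is what makes Adam "adaptive": coordinates with consistently large gradients get smaller effective steps. AdamW differs only in applying weight decay directly to `w` rather than through the gradient.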
4. Convex Optimization
Convex functions, duality, KKT conditions, and constrained optimization for ML.
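For orientation, the two central definitions of this lesson can be stated compactly (standard formulations, written here in LaTeX):

```latex
% f is convex iff for all x, y and all t in [0, 1]:
f\bigl(t x + (1 - t) y\bigr) \le t\, f(x) + (1 - t)\, f(y)

% KKT conditions for  min f(x)  s.t.  g_i(x) \le 0,\; h_j(x) = 0:
\nabla f(x^*) + \sum_i \lambda_i \nabla g_i(x^*) + \sum_j \nu_j \nabla h_j(x^*) = 0
    \quad \text{(stationarity)}
\lambda_i \ge 0, \qquad \lambda_i\, g_i(x^*) = 0
    \quad \text{(dual feasibility, complementary slackness)}
```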
5. Hyperparameter Tuning
Grid search, random search, Bayesian optimization, learning rate schedules, and automated tuning.
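Random search, one of the simplest techniques in this lesson, fits in a few lines. Here the "validation metric" is just the final loss of gradient descent on a toy quadratic; everything in this sketch is a stand-in for a real training run:

```python
import numpy as np

# Hypothetical tuning target: final loss of GD on f(w) = w^2,
# a stand-in for a real validation metric.
def train(lr, steps=50):
    w = 5.0
    for _ in range(steps):
        w -= lr * 2 * w          # gradient of w^2 is 2w
    return w * w                 # final loss

rng = np.random.default_rng(42)
# Sample learning rates log-uniformly over [1e-4, 1e0]: log scale matters
# because a learning rate's effect is multiplicative.
candidates = 10 ** rng.uniform(-4, 0, size=20)
losses = [train(lr) for lr in candidates]
best = candidates[int(np.argmin(losses))]
print(f"best lr ~ {best:.4g}, loss = {min(losses):.3g}")
```

Grid search replaces the random samples with a fixed lattice; Bayesian optimization replaces them with samples chosen by a surrogate model of the loss surface.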
6. Best Practices
Training recipes, debugging optimization, learning rate warmup, weight decay, and practical tips.
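One recipe from this lesson, linear warmup followed by cosine decay, can be sketched as a schedule function (the step counts and peak rate below are assumed example values, not recommendations):

```python
import math

def lr_at(step, total_steps=1000, warmup=100, peak_lr=3e-4):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup:
        return peak_lr * (step + 1) / warmup             # linear ramp-up
    progress = (step - warmup) / (total_steps - warmup)  # in [0, 1]
    return peak_lr * 0.5 * (1 + math.cos(math.pi * progress))

# Rate rises during warmup, peaks, then decays smoothly toward zero.
print(lr_at(0), lr_at(99), lr_at(500), lr_at(999))
```

Warmup avoids large, noisy updates while the optimizer's statistics (and the model) are still uninitialized; the cosine tail lets training settle into a minimum.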
Prerequisites
What you need before starting this course.
- Understanding of derivatives and gradients (see the Calculus for ML course)
- Basic linear algebra knowledge (vectors, matrices)
- Python with NumPy and PyTorch installed
- Recommended: complete the Linear Algebra, Calculus, and Probability courses first
Lilly Tech Systems