Beginner

Introduction to Gradient Boosting

Understand the fundamental concepts behind gradient boosting: decision trees, ensemble methods, the bias-variance tradeoff, and why boosting consistently wins on tabular data.

Why Gradient Boosting?

Gradient boosting is the dominant algorithm for structured/tabular data. It wins the vast majority of Kaggle competitions involving tabular datasets and powers production ML systems at companies like Airbnb, Uber, and Netflix. While deep learning excels at images and text, gradient boosting remains king for tables.

Kaggle Dominance: Over 70% of winning solutions on Kaggle for tabular data competitions use XGBoost, LightGBM, or CatBoost — often all three in an ensemble.

Decision Trees Refresher

Gradient boosting builds on decision trees. A single decision tree splits data based on feature thresholds to make predictions:

Python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Example tabular dataset (any classification dataset works)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A single decision tree (weak learner)
tree = DecisionTreeClassifier(max_depth=3)
tree.fit(X_train, y_train)
print(f"Single tree accuracy: {tree.score(X_test, y_test):.3f}")

# Problem: single trees are weak learners
# Solution: combine many trees via boosting!

How Gradient Boosting Works

  1. Start with a simple prediction

    Initialize with a constant (e.g., mean of target for regression, log-odds for classification).

  2. Compute residuals

    Calculate the error (residuals) between current predictions and actual values.

  3. Fit a tree to residuals

    Train a new shallow decision tree to predict the residuals — the mistakes of the current model.

  4. Update predictions

    Add the new tree's predictions (scaled by learning rate) to the running total.

  5. Repeat

    Continue adding trees that correct remaining errors until a stopping criterion is met.
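The five steps above can be sketched from scratch for regression. This is a minimal illustration, not a production implementation; the names (n_trees, learning_rate) and the synthetic sine data are chosen here for demonstration.

Python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data: noisy sine wave
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

n_trees, learning_rate = 100, 0.1

# 1. Start with a constant prediction: the mean of the target
pred = np.full_like(y, y.mean())
trees = []
for _ in range(n_trees):
    # 2. Compute residuals (the current model's errors)
    residuals = y - pred
    # 3. Fit a shallow tree to the residuals
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X, residuals)
    # 4. Update predictions, scaled by the learning rate
    pred += learning_rate * tree.predict(X)
    trees.append(tree)  # 5. Repeat

mse = np.mean((y - pred) ** 2)
print(f"Training MSE after boosting: {mse:.4f}")

Each tree only needs to predict the remaining error, which is why shallow trees suffice: the ensemble's capacity comes from the sequence, not from any one tree.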

Boosting vs Bagging

Aspect           | Boosting (XGBoost)                    | Bagging (Random Forest)
Strategy         | Sequential: each tree corrects errors | Parallel: independent trees, average results
Bias/Variance    | Reduces bias (and variance)           | Primarily reduces variance
Trees            | Shallow trees (depth 3-8)             | Deep trees (often unlimited)
Overfitting risk | Higher (needs tuning)                 | Lower (more robust by default)
Performance      | Usually higher with tuning            | Good out of the box
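The contrast can be seen directly with scikit-learn's built-in implementations; the dataset here is just an illustrative choice.

Python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Boosting: sequential shallow trees (max_depth=3 is sklearn's default)
boost = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=42)
# Bagging: independent deep trees (max_depth=None by default)
forest = RandomForestClassifier(n_estimators=100, random_state=42)

for name, model in [("Boosting", boost), ("Random Forest", forest)]:
    model.fit(X_train, y_train)
    print(f"{name} accuracy: {model.score(X_test, y_test):.3f}")

On a small, clean dataset like this, the two often score similarly; boosting's edge tends to show up on larger, messier tabular data once its hyperparameters are tuned.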

The Three Frameworks

Framework | Key Innovation                         | Best For
XGBoost   | Regularized objective, sparse-aware    | General purpose, most battle-tested
LightGBM  | Histogram-based, leaf-wise growth      | Large datasets, fastest training
CatBoost  | Ordered boosting, native categoricals  | Categorical-heavy data, least tuning
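All three frameworks expose a similar fit/predict interface. For a dependency-free taste of the histogram technique that LightGBM popularized, scikit-learn ships HistGradientBoostingClassifier, which its documentation describes as inspired by LightGBM; the dataset below is again just an illustrative choice.

Python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Histogram-based boosting: continuous features are binned
# (max_bins=255 by default), the same trick LightGBM uses for speed
clf = HistGradientBoostingClassifier(max_iter=100, learning_rate=0.1)
clf.fit(X_train, y_train)
print(f"Accuracy: {clf.score(X_test, y_test):.3f}")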

Ready to Learn XGBoost?

Dive into the most popular gradient boosting framework and learn its API, parameters, and features.

Next: XGBoost →