Beginner

Introduction to Gradient Boosting

Understand the fundamental concepts behind gradient boosting: decision trees, ensemble methods, the bias-variance tradeoff, and why boosting consistently wins on tabular data.

Why Gradient Boosting?

Gradient boosting is the dominant algorithm for structured/tabular data. It wins the vast majority of Kaggle competitions involving tabular datasets and powers production ML systems at companies like Airbnb, Uber, and Netflix. While deep learning excels at images and text, gradient boosting remains king for tables.

Kaggle Dominance: Over 70% of winning solutions on Kaggle for tabular data competitions use XGBoost, LightGBM, or CatBoost — often all three in an ensemble.

Decision Trees Refresher

Gradient boosting builds on decision trees. A single decision tree splits data based on feature thresholds to make predictions:

Python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Example tabular dataset (any classification dataset works)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A single decision tree (weak learner)
tree = DecisionTreeClassifier(max_depth=3)
tree.fit(X_train, y_train)
print(f"Single tree accuracy: {tree.score(X_test, y_test):.3f}")

# Problem: single trees are weak learners
# Solution: combine many trees via boosting!

How Gradient Boosting Works

  1. Start with a simple prediction

    Initialize with a constant (e.g., mean of target for regression, log-odds for classification).

  2. Compute residuals

    Calculate the error (residuals) between current predictions and actual values.

  3. Fit a tree to residuals

    Train a new shallow decision tree to predict the residuals — the mistakes of the current model.

  4. Update predictions

    Add the new tree's predictions (scaled by learning rate) to the running total.

  5. Repeat

    Continue adding trees that correct remaining errors until a stopping criterion is met.
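The five steps above can be sketched from scratch for regression. This is a minimal illustration, not a production implementation; the names (n_trees, learning_rate) and the synthetic sine data are chosen here for demonstration.

Python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data: noisy sine wave
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

n_trees, learning_rate = 100, 0.1

# 1. Start with a constant prediction: the mean of the target
pred = np.full_like(y, y.mean())
trees = []
for _ in range(n_trees):
    # 2. Compute residuals (the current model's errors)
    residuals = y - pred
    # 3. Fit a shallow tree to the residuals
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X, residuals)
    # 4. Update predictions, scaled by the learning rate
    pred += learning_rate * tree.predict(X)
    trees.append(tree)  # 5. Repeat

mse = np.mean((y - pred) ** 2)
print(f"Training MSE after boosting: {mse:.4f}")

Each tree only needs to predict the remaining error, which is why shallow trees suffice: the ensemble's capacity comes from the sequence, not from any one tree.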

Boosting vs Bagging

Aspect           | Boosting (XGBoost)                    | Bagging (Random Forest)
Strategy         | Sequential: each tree corrects errors | Parallel: independent trees, average results
Bias/Variance    | Reduces bias (and variance)           | Primarily reduces variance
Trees            | Shallow trees (depth 3-8)             | Deep trees (often unlimited)
Overfitting risk | Higher (needs tuning)                 | Lower (more robust by default)
Performance      | Usually higher with tuning            | Good out of the box
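The contrast can be seen directly with scikit-learn's built-in implementations; the dataset here is just an illustrative choice.

Python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Boosting: sequential shallow trees (max_depth=3 is sklearn's default)
boost = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=42)
# Bagging: independent deep trees (max_depth=None by default)
forest = RandomForestClassifier(n_estimators=100, random_state=42)

for name, model in [("Boosting", boost), ("Random Forest", forest)]:
    model.fit(X_train, y_train)
    print(f"{name} accuracy: {model.score(X_test, y_test):.3f}")

On a small, clean dataset like this, the two often score similarly; boosting's edge tends to show up on larger, messier tabular data once its hyperparameters are tuned.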

The Three Frameworks

Framework | Key Innovation                         | Best For
XGBoost   | Regularized objective, sparse-aware    | General purpose, most battle-tested
LightGBM  | Histogram-based, leaf-wise growth      | Large datasets, fastest training
CatBoost  | Ordered boosting, native categoricals  | Categorical-heavy data, least tuning
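All three frameworks expose a similar fit/predict interface. For a dependency-free taste of the histogram technique that LightGBM popularized, scikit-learn ships HistGradientBoostingClassifier, which its documentation describes as inspired by LightGBM; the dataset below is again just an illustrative choice.

Python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Histogram-based boosting: continuous features are binned
# (max_bins=255 by default), the same trick LightGBM uses for speed
clf = HistGradientBoostingClassifier(max_iter=100, learning_rate=0.1)
clf.fit(X_train, y_train)
print(f"Accuracy: {clf.score(X_test, y_test):.3f}")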

Ready to Learn XGBoost?

Dive into the most popular gradient boosting framework and learn its API, parameters, and features.

Next: XGBoost →