Introduction to Gradient Boosting
Understand the fundamental concepts behind gradient boosting: decision trees, ensemble methods, the bias-variance tradeoff, and why boosting consistently wins on tabular data.
Why Gradient Boosting?
Gradient boosting is the dominant algorithm family for structured/tabular data. It has a long track record of winning Kaggle competitions on tabular datasets and powers production ML systems at companies like Airbnb, Uber, and Netflix. While deep learning excels at images and text, gradient boosting remains the method to beat for tables.
Decision Trees Refresher
Gradient boosting builds on decision trees. A single decision tree splits data based on feature thresholds to make predictions:
```python
from sklearn.tree import DecisionTreeClassifier

# A single decision tree (weak learner)
tree = DecisionTreeClassifier(max_depth=3)
tree.fit(X_train, y_train)
print(f"Single tree accuracy: {tree.score(X_test, y_test):.3f}")

# Problem: single trees are weak learners
# Solution: combine many trees via boosting!
```
How Gradient Boosting Works
Start with a simple prediction
Initialize with a constant (e.g., mean of target for regression, log-odds for classification).
Compute residuals
Calculate the error (residuals) between current predictions and actual values.
Fit a tree to residuals
Train a new shallow decision tree to predict the residuals — the mistakes of the current model.
Update predictions
Add the new tree's predictions (scaled by learning rate) to the running total.
Repeat
Continue adding trees that correct remaining errors until a stopping criterion is met.
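The five steps above can be sketched in a few lines of Python. This is a minimal from-scratch illustration of gradient boosting for regression (squared-error loss, where the residuals are exactly the negative gradients), using shallow scikit-learn trees as the weak learners; the toy sine dataset is made up for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (hypothetical, for illustration only)
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

# Step 1: start with a constant prediction (the mean of the target)
prediction = np.full_like(y, y.mean())

learning_rate = 0.1
trees = []
for _ in range(100):
    # Step 2: residuals = what the current model still gets wrong
    residuals = y - prediction
    # Step 3: fit a shallow tree to those residuals
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X, residuals)
    # Step 4: add the new tree's output, scaled by the learning rate
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)  # Step 5: repeat

mse = np.mean((y - prediction) ** 2)
print(f"Training MSE after 100 trees: {mse:.4f}")
```

Each tree only has to nudge the ensemble in the right direction, which is why shallow trees and a small learning rate work so well together.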
Boosting vs Bagging
| Aspect | Boosting (XGBoost) | Bagging (Random Forest) |
|---|---|---|
| Strategy | Sequential: each tree corrects errors | Parallel: independent trees, average results |
| Bias/Variance | Reduces bias (and variance) | Primarily reduces variance |
| Trees | Shallow trees (depth 3-8) | Deep trees (often unlimited) |
| Overfitting risk | Higher (needs tuning) | Lower (more robust by default) |
| Performance | Usually higher with tuning | Good out of the box |
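The contrast in the table can be seen directly with scikit-learn's built-in implementations of both strategies. This is a hedged sketch on a synthetic dataset, not a benchmark; exact scores will vary with the data and parameters.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem (illustrative only)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Boosting: sequential shallow trees, each correcting the last
boost = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=42)
boost.fit(X_train, y_train)

# Bagging: independent deep trees whose votes are averaged
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

print(f"Boosting accuracy:      {boost.score(X_test, y_test):.3f}")
print(f"Random forest accuracy: {forest.score(X_test, y_test):.3f}")
```

Note the depth choices mirror the table: the boosted trees are capped at depth 3, while the random forest's trees grow unrestricted by default.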
The Three Frameworks
| Framework | Key Innovation | Best For |
|---|---|---|
| XGBoost | Regularized objective, sparse-aware | General purpose, most battle-tested |
| LightGBM | Histogram-based, leaf-wise growth | Large datasets, fastest training |
| CatBoost | Ordered boosting, native categoricals | Categorical-heavy data, least tuning |
Ready to Learn XGBoost?
Dive into the most popular gradient boosting framework and learn its API, parameters, and features.
Next: XGBoost →
Lilly Tech Systems