Regression Algorithms (15)
Algorithms for predicting continuous numeric values
Regression algorithms predict a continuous output variable based on one or more input features. From simple linear models to powerful gradient boosting machines, these 15 algorithms cover most common regression needs.
Quick Reference Table
| Algorithm | Complexity | Interpretability | Best For |
|---|---|---|---|
| Linear Regression | Low | High | Linearly related data |
| Polynomial Regression | Low-Med | Medium | Non-linear curves |
| Ridge Regression | Low | High | Multicollinear features |
| Lasso Regression | Low | High | Feature selection |
| Elastic Net | Low | High | High-dimensional sparse data |
| Bayesian Linear Regression | Medium | High | Uncertainty quantification |
| SVR | High | Low | Small-medium nonlinear data |
| Decision Tree Regression | Medium | High | Non-linear, interpretable models |
| Random Forest Regression | High | Medium | General-purpose, robust |
| Gradient Boosting Regression | High | Low | High accuracy needed |
| XGBoost Regression | High | Low | Competitions, structured data |
| LightGBM Regression | High | Low | Large datasets, fast training |
| CatBoost Regression | High | Low | Categorical features |
| Quantile Regression | Low-Med | High | Prediction intervals |
| Poisson Regression | Low | High | Count data |
1. Linear Regression
Description: Fits a linear equation (y = w₀ + w₁x₁ + ... + wₚxₚ) to the data by minimizing the sum of squared residuals (Ordinary Least Squares).
Use Cases: House price prediction, sales forecasting, trend analysis, any problem where the relationship between features and target is approximately linear.
Key Parameters:
- fit_intercept: Whether to calculate the intercept (default: True)
- normalize: Whether to normalize features before fitting (removed in scikit-learn 1.2; scale features with StandardScaler in a pipeline instead)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np
# Generate sample data
np.random.seed(42)
X = np.random.rand(100, 1) * 10
y = 2.5 * X.squeeze() + np.random.randn(100) * 2
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Coefficients: {model.coef_}")
print(f"Intercept: {model.intercept_:.4f}")
# squared=False was removed from mean_squared_error in recent scikit-learn; take the square root instead
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, y_pred)):.4f}")
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
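The least-squares fit described in this section can also be computed directly. The following is a minimal sketch, assuming synthetic data and NumPy's `np.linalg.lstsq`; the names `X_demo`, `y_demo`, and `w_hat` are illustrative and not part of the original example.

```python
import numpy as np

# Synthetic data with a known intercept (1.0) and slope (2.5); an assumption for illustration.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 1))
y_demo = 1.0 + 2.5 * X_demo[:, 0] + rng.normal(0, 0.5, size=200)

# Prepend a column of ones so the first weight acts as the intercept w0.
A = np.column_stack([np.ones(len(X_demo)), X_demo])

# Solve min_w ||A w - y||^2, i.e. ordinary least squares.
w_hat, *_ = np.linalg.lstsq(A, y_demo, rcond=None)
intercept_hat, slope_hat = w_hat
print(f"intercept={intercept_hat:.3f}, slope={slope_hat:.3f}")
```

The recovered weights should land close to the values used to generate the data, which is the same solution `LinearRegression` targets.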
2. Polynomial Regression
Description: Extends linear regression by adding polynomial terms (x², x³, etc.) to capture non-linear relationships while still using a linear model framework.
Use Cases: Growth curves, quadratic trends, any non-linear relationship that can be captured by polynomial features.
Key Parameters:
- degree: Degree of the polynomial features (2, 3, etc.)
- interaction_only: If True, only interaction features are produced
- include_bias: Include a bias column of ones
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
# Create polynomial regression (degree=3)
model = make_pipeline(
    PolynomialFeatures(degree=3),
    LinearRegression()
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# squared=False was removed from mean_squared_error in recent scikit-learn; take the square root instead
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, y_pred)):.4f}")
3. Ridge Regression
Description: Linear regression with L2 regularization. Adds a penalty term (λ · ||w||²) to the loss function to prevent overfitting and handle multicollinearity.
Use Cases: When features are correlated, when you have more features than samples, preventing overfitting in linear models.
Key Parameters:
- alpha: Regularization strength (higher = more regularization, default: 1.0)
- solver: Computational algorithm (auto, svd, cholesky, lsqr, sparse_cg, sag, saga)
from sklearn.linear_model import Ridge, RidgeCV
# Ridge with cross-validation to find best alpha
model = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0], cv=5)
model.fit(X_train, y_train)
print(f"Best alpha: {model.alpha_}")
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
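For the no-intercept case, the L2-penalized problem has a closed-form solution, w = (XᵀX + αI)⁻¹Xᵀy. The sketch below, on assumed synthetic data with illustrative names, checks that formula against scikit-learn's `Ridge`.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data; the true weights are an assumption for illustration.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, size=50)
alpha = 1.0

# Closed form without an intercept: solve (X^T X + alpha * I) w = X^T y
w_closed = np.linalg.solve(X_demo.T @ X_demo + alpha * np.eye(3), X_demo.T @ y_demo)

# Same problem solved by scikit-learn
w_sklearn = Ridge(alpha=alpha, fit_intercept=False, solver="cholesky").fit(X_demo, y_demo).coef_
print(np.allclose(w_closed, w_sklearn))
```

Adding αI to XᵀX is also what keeps the system well conditioned when features are strongly correlated.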
4. Lasso Regression
Description: Linear regression with L1 regularization. Adds a penalty term (λ · ||w||₁) that can shrink some coefficients to exactly zero, performing automatic feature selection.
Use Cases: Feature selection, sparse models, high-dimensional data where many features are irrelevant.
Key Parameters:
- alpha: Regularization strength (default: 1.0)
- max_iter: Maximum number of iterations (default: 1000)
- tol: Tolerance for the optimization
from sklearn.linear_model import Lasso, LassoCV
model = LassoCV(cv=5, random_state=42)
model.fit(X_train, y_train)
print(f"Best alpha: {model.alpha_:.6f}")
print(f"Non-zero coefficients: {(model.coef_ != 0).sum()}")
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
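The reason an L1 penalty can produce exact zeros is the soft-thresholding step used by coordinate descent solvers. The sketch below shows that operator in isolation; the function name and the sample coefficients are hypothetical, and this is not scikit-learn's internal code.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; anything within [-gamma, gamma] becomes 0."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

coefs = np.array([3.0, -0.4, 0.2, -2.5])
shrunk = soft_threshold(coefs, 0.5)
print(shrunk)  # large coefficients shrink by 0.5, small ones are set to zero
```

Coefficients whose magnitude is below the threshold are removed entirely, which is the feature-selection behavior described above.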
5. Elastic Net Regression
Description: Combines L1 (Lasso) and L2 (Ridge) regularization. The loss function includes both penalty terms, controlled by a mixing parameter. This gives the benefits of both feature selection and coefficient shrinkage.
Use Cases: High-dimensional data with correlated features, when you want both feature selection and regularization.
Key Parameters:
- alpha: Overall regularization strength
- l1_ratio: Mix between L1 and L2 (0 = Ridge, 1 = Lasso, 0.5 = equal mix)
from sklearn.linear_model import ElasticNet, ElasticNetCV
model = ElasticNetCV(
    l1_ratio=[0.1, 0.5, 0.7, 0.9, 1.0],
    cv=5, random_state=42
)
model.fit(X_train, y_train)
print(f"Best alpha: {model.alpha_:.6f}")
print(f"Best l1_ratio: {model.l1_ratio_}")
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
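How `alpha` and `l1_ratio` combine can be made concrete by computing the penalty term by hand. The sketch below follows the penalty form documented for scikit-learn's `ElasticNet`, alpha · l1_ratio · ||w||₁ + 0.5 · alpha · (1 − l1_ratio) · ||w||²; the helper function and weight vector are illustrative.

```python
import numpy as np

def elastic_net_penalty(w, alpha, l1_ratio):
    """Penalty term added to the squared-error loss for weight vector w."""
    l1 = np.sum(np.abs(w))
    l2_squared = np.sum(w ** 2)
    return alpha * l1_ratio * l1 + 0.5 * alpha * (1.0 - l1_ratio) * l2_squared

w = np.array([1.0, -2.0, 0.0])
pure_l1 = elastic_net_penalty(w, alpha=1.0, l1_ratio=1.0)  # Lasso-style penalty
pure_l2 = elastic_net_penalty(w, alpha=1.0, l1_ratio=0.0)  # Ridge-style penalty
mixed = elastic_net_penalty(w, alpha=1.0, l1_ratio=0.5)    # equal mix
print(pure_l1, pure_l2, mixed)  # 3.0 2.5 2.75
```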
6. Bayesian Linear Regression
Description: A probabilistic approach to linear regression that places prior distributions on the model parameters. Instead of point estimates, it provides a full posterior distribution over weights, enabling uncertainty quantification.
Use Cases: When you need uncertainty estimates on predictions, small datasets, incorporating prior knowledge, medical/scientific applications where confidence intervals matter.
Key Parameters:
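- alpha_1, alpha_2: Shape and rate (inverse scale) parameters of the Gamma prior over alpha, the noise precision
- lambda_1, lambda_2: Shape and rate (inverse scale) parameters of the Gamma prior over lambda, the precision of the weights
- compute_score: If True, compute the log marginal likelihood at each iteration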
from sklearn.linear_model import BayesianRidge
model = BayesianRidge(compute_score=True)
model.fit(X_train, y_train)
# Get predictions with uncertainty
y_pred, y_std = model.predict(X_test, return_std=True)
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
print(f"Mean prediction std: {y_std.mean():.4f}")
print(f"Alpha: {model.alpha_:.4f}")
print(f"Lambda: {model.lambda_:.4f}")
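The standard deviations returned by `return_std=True` can be turned into approximate prediction intervals. The sketch below assumes a Gaussian predictive distribution and synthetic data; the 1.96 multiplier for a roughly 95% interval and all variable names are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Synthetic linear data with Gaussian noise (an assumption for illustration).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 1))
y_demo = 2.5 * X_demo[:, 0] + rng.normal(0, 2.0, size=200)

model_demo = BayesianRidge().fit(X_demo[:150], y_demo[:150])
mean, std = model_demo.predict(X_demo[150:], return_std=True)

# Approximate 95% predictive interval: mean +/- 1.96 standard deviations.
lower, upper = mean - 1.96 * std, mean + 1.96 * std
coverage = np.mean((y_demo[150:] >= lower) & (y_demo[150:] <= upper))
print(f"Empirical coverage on held-out points: {coverage:.2f}")
```

If the model's assumptions hold, the empirical coverage should sit near the nominal level; a large gap suggests the noise is not well described by the model.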
7. Support Vector Regression (SVR)
Description: Applies the Support Vector Machine framework to regression. SVR fits a function that is as flat as possible while ignoring errors smaller than epsilon (ε); points that fall outside this ε-tube are penalized, with the strength of that penalty set by C. The kernel trick extends the method to non-linear regression.
Use Cases: Non-linear regression with small-to-medium datasets, high-dimensional data, financial time series.
Key Parameters:
- kernel: Kernel type: 'linear', 'poly', 'rbf', 'sigmoid' (default: 'rbf')
- C: Regularization parameter (default: 1.0)
- epsilon: Epsilon-tube width (default: 0.1)
- gamma: Kernel coefficient for rbf/poly/sigmoid
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
model = make_pipeline(
    StandardScaler(),
    SVR(kernel='rbf', C=100, epsilon=0.1, gamma='scale')
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
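The role of epsilon is easiest to see in the loss that SVR penalizes: errors inside the tube cost nothing, and errors outside it cost only the excess. A minimal sketch of that epsilon-insensitive loss, with a hypothetical helper name and made-up values:

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample loss: zero inside the epsilon-tube, linear outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 2.0])
losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(losses)  # first error (0.05) is inside the tube, so its loss is zero
```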
8. Decision Tree Regression
Description: Recursively partitions the feature space into regions and predicts the mean target value in each region. Builds a tree structure by choosing the best split at each node to minimize variance (MSE).
Use Cases: Interpretable non-linear regression, feature importance analysis, data exploration.
Key Parameters:
- max_depth: Maximum depth of the tree (controls overfitting)
- min_samples_split: Minimum samples required to split a node (default: 2)
- min_samples_leaf: Minimum samples in a leaf node (default: 1)
- criterion: 'squared_error', 'friedman_mse', 'absolute_error', 'poisson'
from sklearn.tree import DecisionTreeRegressor
model = DecisionTreeRegressor(
    max_depth=5,
    min_samples_split=10,
    min_samples_leaf=5,
    random_state=42
)
model.fit(X_train, y_train)
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
print(f"Tree depth: {model.get_depth()}")
print(f"Number of leaves: {model.get_n_leaves()}")
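The split search at a single node can be sketched for one feature: try each candidate threshold and keep the one with the lowest summed squared error on the two sides. This is a simplified illustration with a hypothetical `best_split` helper, not scikit-learn's optimized implementation.

```python
import numpy as np

def best_split(x, y):
    """Return (threshold, sse) minimizing the summed squared error of the two sides."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best_threshold, best_sse = None, np.inf
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no threshold can separate equal feature values
        left, right = y_sorted[:i], y_sorted[i:]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if sse < best_sse:
            best_sse = sse
            best_threshold = (x_sorted[i - 1] + x_sorted[i]) / 2  # midpoint between neighbors
    return best_threshold, best_sse

# Step-shaped toy data: the target jumps between x=3 and x=10.
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 5.0, 5.0, 20.0, 20.0, 20.0])
threshold, sse = best_split(x, y)
print(threshold, sse)  # 6.5 0.0
```

A full tree repeats this search over every feature and then recurses into each side until a stopping rule such as `max_depth` or `min_samples_leaf` applies.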
9. Random Forest Regression
Description: An ensemble of decision trees trained on random subsets of data (bagging) and random subsets of features. Predictions are averaged across all trees, reducing variance and overfitting.
Use Cases: General-purpose regression, robust to outliers, handles non-linear relationships, feature importance ranking.
Key Parameters:
- n_estimators: Number of trees (default: 100)
- max_depth: Maximum tree depth (default: None = unlimited)
- max_features: Number of features to consider for each split
- min_samples_leaf: Minimum samples per leaf
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(
    n_estimators=200,
    max_depth=10,
    min_samples_leaf=5,
    max_features='sqrt',
    random_state=42,
    n_jobs=-1
)
model.fit(X_train, y_train)
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
print(f"Feature importances: {model.feature_importances_}")
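That the forest's prediction is the average over its trees can be checked directly through the fitted `estimators_` attribute. A small sketch on assumed synthetic data, with illustrative names:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic non-linear data (an assumption for illustration).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 2))
y_demo = np.sin(X_demo[:, 0]) + 0.1 * X_demo[:, 1]

forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X_demo, y_demo)

# Collect each tree's predictions and average them by hand.
per_tree = np.stack([tree.predict(X_demo) for tree in forest.estimators_])
manual_mean = per_tree.mean(axis=0)
print(np.allclose(manual_mean, forest.predict(X_demo)))
```

The spread of `per_tree` across trees can also serve as a rough, uncalibrated indication of how much the ensemble disagrees on a given sample.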
10. Gradient Boosting Regression
Description: Builds an ensemble of weak learners (usually shallow decision trees) sequentially. Each new tree corrects the errors of the previous ensemble by fitting the negative gradient of the loss function.
Use Cases: High-accuracy predictions, structured/tabular data, competitions, any regression task where performance is critical.
Key Parameters:
- n_estimators: Number of boosting stages (default: 100)
- learning_rate: Shrinkage factor (default: 0.1)
- max_depth: Maximum depth of individual trees (default: 3)
- subsample: Fraction of samples used per tree (stochastic gradient boosting)
from sklearn.ensemble import GradientBoostingRegressor
model = GradientBoostingRegressor(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=4,
    subsample=0.8,
    random_state=42
)
model.fit(X_train, y_train)
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
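For squared-error loss, the negative gradient is simply the residual, so the sequential procedure can be sketched by hand: repeatedly fit a shallow tree to the current residuals and add a shrunken copy of its predictions. This is a simplified illustration on assumed synthetic data, not the library's implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic non-linear data (an assumption for illustration).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(300, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(0, 0.1, size=300)

learning_rate = 0.1
prediction = np.full_like(y_demo, y_demo.mean())  # start from a constant model
mse_history = [np.mean((y_demo - prediction) ** 2)]

for _ in range(50):
    residuals = y_demo - prediction  # negative gradient of 0.5 * squared error
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_demo, residuals)
    prediction += learning_rate * tree.predict(X_demo)  # shrunken correction step
    mse_history.append(np.mean((y_demo - prediction) ** 2))

print(f"Training MSE: {mse_history[0]:.4f} -> {mse_history[-1]:.4f}")
```

A smaller `learning_rate` makes each correction more cautious, which is why it is usually paired with a larger number of boosting stages.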
11. XGBoost Regression
Description: An optimized implementation of gradient boosting with built-in regularization (L1 and L2), efficient handling of sparse data, parallel tree construction, and automatic handling of missing values.
Use Cases: Kaggle competitions, production ML systems, structured data, when you need the best accuracy on tabular data.
Key Parameters:
- n_estimators: Number of boosting rounds
- learning_rate / eta: Step size shrinkage
- max_depth: Maximum tree depth (default: 6)
- reg_alpha: L1 regularization on weights
- reg_lambda: L2 regularization on weights
- subsample: Row subsampling ratio
- colsample_bytree: Column subsampling ratio
import xgboost as xgb
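# Hold out a validation set for early stopping so the test set stays untouched
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.2, random_state=42
)
model = xgb.XGBRegressor(
    n_estimators=500,
    learning_rate=0.05,
    max_depth=5,
    reg_alpha=0.1,
    reg_lambda=1.0,
    subsample=0.8,
    colsample_bytree=0.8,
    early_stopping_rounds=50,  # constructor argument in xgboost >= 1.6
    random_state=42,
    n_jobs=-1
)
model.fit(
    X_tr, y_tr,
    eval_set=[(X_val, y_val)],
    verbose=False
)
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
# best_iteration is only meaningful when early stopping is enabled
print(f"Best iteration: {model.best_iteration}")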
12. LightGBM Regression
Description: A fast, distributed gradient boosting framework that uses histogram-based learning (binning continuous features) and leaf-wise tree growth instead of level-wise, making it faster and often more accurate than traditional implementations.
Use Cases: Large datasets (millions of rows), fast training needed, structured/tabular data, production systems.
Key Parameters:
- n_estimators: Number of boosting rounds
- learning_rate: Step size (default: 0.1)
- num_leaves: Maximum number of leaves per tree (default: 31)
- max_depth: Tree depth limit (-1 = no limit)
- min_child_samples: Minimum data in a leaf
- reg_alpha, reg_lambda: L1 and L2 regularization
import lightgbm as lgb
model = lgb.LGBMRegressor(
    n_estimators=500,
    learning_rate=0.05,
    num_leaves=31,
    max_depth=-1,
    min_child_samples=20,
    reg_alpha=0.1,
    reg_lambda=0.1,
    subsample=0.8,
    subsample_freq=1,  # row subsampling is only applied when subsample_freq > 0
    colsample_bytree=0.8,
    random_state=42,
    n_jobs=-1,
    verbose=-1
)
model.fit(X_train, y_train)
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
13. CatBoost Regression
Description: A gradient boosting library developed by Yandex with built-in support for categorical features (no manual encoding needed), ordered boosting to reduce overfitting, and symmetric tree structure for fast inference.
Use Cases: Datasets with many categorical features, production systems, when minimal preprocessing is desired.
Key Parameters:
- iterations: Number of boosting rounds
- learning_rate: Step size
- depth: Tree depth (default: 6)
- l2_leaf_reg: L2 regularization coefficient
- cat_features: Indices of categorical columns
from catboost import CatBoostRegressor
model = CatBoostRegressor(
    iterations=500,
    learning_rate=0.05,
    depth=6,
    l2_leaf_reg=3,
    random_seed=42,
    verbose=0
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
14. Quantile Regression
Description: Instead of predicting the conditional mean (like ordinary regression), quantile regression predicts conditional quantiles (e.g., median, 10th percentile, 90th percentile). This allows constructing prediction intervals and understanding the full distribution of the target variable.
Use Cases: Risk assessment, financial modeling, prediction intervals, when you care about the distribution of outcomes rather than just the mean.
Key Parameters:
- quantile: The quantile to predict (0.5 = median)
- alpha: Regularization strength
from sklearn.linear_model import QuantileRegressor
# Predict lower bound (10th percentile), median, and upper bound (90th percentile)
quantiles = [0.1, 0.5, 0.9]
predictions = {}
for q in quantiles:
    model = QuantileRegressor(quantile=q, alpha=0.01, solver='highs')
    model.fit(X_train, y_train)
    predictions[q] = model.predict(X_test)
print(f"10th percentile (lower bound): {predictions[0.1][:5]}")
print(f"Median prediction: {predictions[0.5][:5]}")
print(f"90th percentile (upper bound): {predictions[0.9][:5]}")
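Quantile regression targets a quantile rather than the mean because it minimizes the pinball (quantile) loss, which weights under- and over-prediction asymmetrically. A minimal sketch with a hand-written helper and made-up values:

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss for quantile q: under-prediction costs q, over-prediction costs 1 - q."""
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

y_true = np.array([10.0, 10.0])
under = pinball_loss(y_true, np.array([8.0, 8.0]), q=0.9)   # predictions too low by 2
over = pinball_loss(y_true, np.array([12.0, 12.0]), q=0.9)  # predictions too high by 2
print(under, over)
```

For q = 0.9, predicting too low is penalized nine times more heavily than predicting too high, which pushes the fitted line up toward the 90th percentile.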
15. Poisson Regression
Description: A generalized linear model (GLM) for count data. Assumes the target variable follows a Poisson distribution and uses a log link function. Suitable when predicting non-negative integer counts.
Use Cases: Insurance claim counts, website visits, number of defects, customer arrivals, any count-based outcome.
Key Parameters:
- alpha: Regularization strength (default: 1.0)
- max_iter: Maximum number of iterations (default: 100)
from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import mean_poisson_deviance
import numpy as np
# Generate count data
np.random.seed(42)
X_count = np.random.rand(200, 3)
y_count = np.random.poisson(lam=np.exp(X_count @ [0.5, 1.0, -0.3]))
X_tr, X_te, y_tr, y_te = train_test_split(X_count, y_count, test_size=0.2)
model = PoissonRegressor(alpha=0.01, max_iter=300)
model.fit(X_tr, y_tr)
y_pred = model.predict(X_te)
# For PoissonRegressor, score() returns D2 (fraction of Poisson deviance explained), not R2
print(f"D2 Score: {model.score(X_te, y_te):.4f}")
print(f"Mean Poisson deviance: {mean_poisson_deviance(y_te, y_pred):.4f}")
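The log link means the predicted mean is the exponential of the linear predictor, which keeps predictions positive. The sketch below, on assumed synthetic count data with illustrative names, reproduces `predict` from the fitted coefficients.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Synthetic count data generated from a log-linear rate (an assumption for illustration).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 1, size=(300, 2))
y_demo = rng.poisson(lam=np.exp(0.3 + X_demo @ np.array([1.0, -0.5])))

glm = PoissonRegressor(alpha=0.01, max_iter=300).fit(X_demo, y_demo)

# With a log link, the predicted mean is exp(intercept + X @ coef).
manual = np.exp(glm.intercept_ + X_demo @ glm.coef_)
print(np.allclose(manual, glm.predict(X_demo)))
```

Because of the exponential, each coefficient acts multiplicatively: a one-unit increase in a feature scales the expected count by exp(coefficient).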