Regression Algorithms (15)

Algorithms for predicting continuous numeric values

Regression algorithms predict a continuous output variable from one or more input features. The 15 algorithms below range from simple linear models to gradient boosting machines and cover most common regression tasks.

Quick Reference Table

| Algorithm | Complexity | Interpretability | Best For |
| --- | --- | --- | --- |
| Linear Regression | Low | High | Linearly related data |
| Polynomial Regression | Low-Med | Medium | Non-linear curves |
| Ridge Regression | Low | High | Multicollinear features |
| Lasso Regression | Low | High | Feature selection |
| Elastic Net | Low | High | High-dimensional sparse data |
| Bayesian Linear Regression | Medium | High | Uncertainty quantification |
| SVR | High | Low | Small-medium nonlinear data |
| Decision Tree Regression | Medium | High | Non-linear, interpretable models |
| Random Forest Regression | High | Medium | General-purpose, robust |
| Gradient Boosting Regression | High | Low | High accuracy needed |
| XGBoost Regression | High | Low | Competitions, structured data |
| LightGBM Regression | High | Low | Large datasets, fast training |
| CatBoost Regression | High | Low | Categorical features |
| Quantile Regression | Low-Med | High | Prediction intervals |
| Poisson Regression | Low | High | Count data |

1. Linear Regression

Description: Fits a linear equation (y = w_0 + w_1·x_1 + ... + w_p·x_p) to the data by minimizing the sum of squared residuals (Ordinary Least Squares).

Use Cases: House price prediction, sales forecasting, trend analysis, any problem where the relationship between features and target is approximately linear.

Key Parameters:

  • fit_intercept — Whether to calculate the intercept (default: True)
  • normalize — Removed in scikit-learn 1.2; to scale features, place a StandardScaler in front of the model in a pipeline
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate sample data
np.random.seed(42)
X = np.random.rand(100, 1) * 10
y = 2.5 * X.squeeze() + np.random.randn(100) * 2

# Fix random_state so the split (and the scores below) are reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(f"Coefficients: {model.coef_}")
print(f"Intercept: {model.intercept_:.4f}")
# The squared=False argument was removed in recent scikit-learn releases; take the square root instead
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, y_pred)):.4f}")
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
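
To make the "minimizing the sum of squared residuals" step concrete, here is a minimal NumPy sketch that solves the same least-squares problem directly. The toy data (intercept 1.0, slope 2.5) is an assumption chosen for illustration, and the explicit column of ones stands in for fit_intercept=True.

```python
import numpy as np

# Noise-free toy data generated from y = 1.0 + 2.5 * x (values chosen for illustration).
x = np.linspace(0.0, 10.0, 20)
y = 1.0 + 2.5 * x

# Design matrix with a leading column of ones so that w[0] acts as the intercept w_0.
A = np.column_stack([np.ones_like(x), x])

# Least-squares solution of A @ w ≈ y, i.e. the w that minimizes the sum of squared residuals.
w, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = w
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

Because the data is noise-free, the recovered weights match the generating values up to floating-point error.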

2. Polynomial Regression

Description: Extends linear regression by adding polynomial terms (x^2, x^3, etc.) to capture non-linear relationships while still using a linear model framework.

Use Cases: Growth curves, quadratic trends, any non-linear relationship that can be captured by polynomial features.

Key Parameters:

  • degree — Degree of the polynomial features (2, 3, etc.)
  • interaction_only — If True, only interaction features are produced
  • include_bias — Include a bias column of ones
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Create polynomial regression (degree=3)
model = make_pipeline(
    PolynomialFeatures(degree=3),
    LinearRegression()
)

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
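# np and mean_squared_error come from the Linear Regression example; squared=False is no longer supported
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, y_pred)):.4f}")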
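
The phrase "still using a linear model framework" can be illustrated without scikit-learn: expand x into the columns [1, x, x^2] and run ordinary least squares on them. This is a sketch on an assumed quadratic, not a replacement for the pipeline above.

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 25)
y = 1.0 - 3.0 * x + 0.5 * x**2   # illustrative quadratic target

# Columns are [x^0, x^1, x^2]; the model is non-linear in x but linear in the weights.
A = np.vander(x, N=3, increasing=True)

# Plain least squares on the expanded features recovers the polynomial coefficients.
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(w, 3))
```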

3. Ridge Regression

Description: Linear regression with L2 regularization. Adds a penalty term (λ · ||w||_2^2, where λ corresponds to the alpha parameter in scikit-learn) to the loss function to prevent overfitting and handle multicollinearity.

Use Cases: When features are correlated, when you have more features than samples, preventing overfitting in linear models.

Key Parameters:

  • alpha — Regularization strength (higher = more regularization, default: 1.0)
  • solver — Computational algorithm (auto, svd, cholesky, lsqr, sparse_cg, sag, saga)
from sklearn.linear_model import Ridge, RidgeCV

# Ridge with cross-validation to find best alpha
model = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0], cv=5)
model.fit(X_train, y_train)

print(f"Best alpha: {model.alpha_}")
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
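
As a rough illustration of how the L2 penalty stabilizes correlated features, the sketch below solves the ridge normal equations (X^T X + λI) w = X^T y by hand. The two nearly identical features, the omitted intercept, and the helper name ridge_weights are assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 0.01 * rng.normal(size=50)   # nearly a copy of x1 (multicollinearity)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=50)

def ridge_weights(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y (no intercept, for brevity)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# lam = 0.0 is ordinary least squares; larger lam shrinks the weight vector.
norms = {lam: np.linalg.norm(ridge_weights(X, y, lam)) for lam in (0.0, 1.0, 100.0)}
for lam, norm in norms.items():
    print(f"lambda={lam:>5}: ||w|| = {norm:.3f}")
```

The norm of the weight vector shrinks as λ grows, which is the behaviour the alpha parameter controls.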

4. Lasso Regression

Description: Linear regression with L1 regularization. Adds a penalty term (λ · ||w||_1) that can shrink some coefficients to exactly zero, performing automatic feature selection.

Use Cases: Feature selection, sparse models, high-dimensional data where many features are irrelevant.

Key Parameters:

  • alpha — Regularization strength (default: 1.0)
  • max_iter — Maximum number of iterations (default: 1000)
  • tol — Tolerance for the optimization
from sklearn.linear_model import Lasso, LassoCV

model = LassoCV(cv=5, random_state=42)
model.fit(X_train, y_train)

print(f"Best alpha: {model.alpha_:.6f}")
print(f"Non-zero coefficients: {(model.coef_ != 0).sum()}")
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
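
Why the L1 penalty produces exact zeros is easiest to see through the soft-thresholding operator, which coordinate-descent Lasso solvers apply at each coefficient update. The sketch below applies it to a hypothetical coefficient vector; for an orthonormal design this is, up to scaling of the threshold, the Lasso solution itself.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink each entry of z toward zero by t; entries with |z| <= t become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

ols_coefs = np.array([3.0, -0.5, 1.2, 0.05])   # hypothetical unpenalized coefficients
lasso_like = soft_threshold(ols_coefs, 1.0)
n_nonzero = int(np.count_nonzero(lasso_like))
print(lasso_like, n_nonzero)
```

Small coefficients are removed entirely while large ones are only shrunk, which is the feature-selection effect described above.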

5. Elastic Net Regression

Description: Combines L1 (Lasso) and L2 (Ridge) regularization. The loss function includes both penalty terms, controlled by a mixing parameter. This gives the benefits of both feature selection and coefficient shrinkage.

Use Cases: High-dimensional data with correlated features, when you want both feature selection and regularization.

Key Parameters:

  • alpha — Overall regularization strength
  • l1_ratio — Mix between L1 and L2 (0 = Ridge, 1 = Lasso, 0.5 = equal mix)
from sklearn.linear_model import ElasticNet, ElasticNetCV

model = ElasticNetCV(
    l1_ratio=[0.1, 0.5, 0.7, 0.9, 1.0],
    cv=5, random_state=42
)
model.fit(X_train, y_train)

print(f"Best alpha: {model.alpha_:.6f}")
print(f"Best l1_ratio: {model.l1_ratio_}")
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
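
The mixing between the two penalties can be written out explicitly. The sketch below follows the penalty form documented for scikit-learn's ElasticNet objective, alpha · l1_ratio · ||w||_1 + 0.5 · alpha · (1 − l1_ratio) · ||w||_2^2; the function name and the example weights are illustrative assumptions.

```python
import numpy as np

def elastic_net_penalty(w, alpha, l1_ratio):
    """Penalty term only (the data-fit term of the objective is omitted)."""
    l1 = np.sum(np.abs(w))
    l2_squared = np.sum(w ** 2)
    return alpha * l1_ratio * l1 + 0.5 * alpha * (1.0 - l1_ratio) * l2_squared

w = np.array([2.0, -1.0, 0.0])
pure_l1 = elastic_net_penalty(w, alpha=1.0, l1_ratio=1.0)   # Lasso-style penalty
pure_l2 = elastic_net_penalty(w, alpha=1.0, l1_ratio=0.0)   # Ridge-style penalty
mixed = elastic_net_penalty(w, alpha=1.0, l1_ratio=0.5)     # equal mix
print(pure_l1, pure_l2, mixed)
```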

6. Bayesian Linear Regression

Description: A probabilistic approach to linear regression that places prior distributions on the model parameters. Instead of point estimates, it provides a full posterior distribution over weights, enabling uncertainty quantification.

Use Cases: When you need uncertainty estimates on predictions, small datasets, incorporating prior knowledge, medical/scientific applications where confidence intervals matter.

Key Parameters:

  • alpha_1, alpha_2 — Shape and rate (inverse scale) parameters of the Gamma prior over alpha, the noise precision
  • lambda_1, lambda_2 — Shape and rate (inverse scale) parameters of the Gamma prior over lambda, the precision of the weights
  • compute_score — Compute the log marginal likelihood at each iteration
from sklearn.linear_model import BayesianRidge

model = BayesianRidge(compute_score=True)
model.fit(X_train, y_train)

# Get predictions with uncertainty
y_pred, y_std = model.predict(X_test, return_std=True)

print(f"R2 Score: {model.score(X_test, y_test):.4f}")
print(f"Mean prediction std: {y_std.mean():.4f}")
print(f"Alpha: {model.alpha_:.4f}")
print(f"Lambda: {model.lambda_:.4f}")
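
To show where the predictive uncertainty comes from, here is a minimal sketch of the Gaussian posterior over weights when both precisions are fixed by hand. That is a simplifying assumption: BayesianRidge estimates alpha and lambda from the data, whereas the values below are simply chosen.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=30)
X = np.column_stack([np.ones_like(x), x])          # intercept column + one feature
y = 1.0 + 2.0 * x + rng.normal(scale=0.2, size=30)

noise_precision = 1.0 / 0.2**2    # "alpha", fixed by hand here (assumption)
weight_precision = 1.0            # "lambda", fixed by hand here (assumption)

# With a zero-mean Gaussian prior, the posterior over weights is Gaussian N(mean, cov).
cov = np.linalg.inv(weight_precision * np.eye(2) + noise_precision * X.T @ X)
mean = noise_precision * cov @ X.T @ y

def predictive_std(x_new):
    """Std of the predictive distribution: noise variance plus weight uncertainty."""
    phi = np.array([1.0, x_new])
    return float(np.sqrt(1.0 / noise_precision + phi @ cov @ phi))

std_inside = predictive_std(0.5)    # within the range of the training data
std_outside = predictive_std(5.0)   # far outside it
print(f"std at x=0.5: {std_inside:.3f}, std at x=5.0: {std_outside:.3f}")
```

The predictive standard deviation grows away from the training data, which is the kind of information return_std=True exposes.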

7. Support Vector Regression (SVR)

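Description: Applies the Support Vector Machine framework to regression. SVR looks for a function that is as flat as possible while keeping the training targets within a tube of half-width epsilon (ε) around its predictions; deviations beyond the tube are allowed but penalized, with C controlling the trade-off. It uses the kernel trick for non-linear regression.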

Use Cases: Non-linear regression with small-to-medium datasets, high-dimensional data, financial time series.

Key Parameters:

  • kernel — Kernel type: 'linear', 'poly', 'rbf', 'sigmoid' (default: 'rbf')
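  • C — Regularization parameter; the strength of regularization is inversely proportional to C (default: 1.0)
  • epsilon — Half-width of the epsilon-tube inside which errors incur no penalty (default: 0.1)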
  • gamma — Kernel coefficient for rbf/poly/sigmoid
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

model = make_pipeline(
    StandardScaler(),
    SVR(kernel='rbf', C=100, epsilon=0.1, gamma='scale')
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
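
The epsilon-tube idea corresponds to the epsilon-insensitive loss, sketched below in NumPy. The function name and sample values are illustrative; the loss itself, max(0, |y − f(x)| − ε), is the standard one used by SVR.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the tube |y - f(x)| <= epsilon, linear outside it."""
    return np.maximum(np.abs(y_true - y_pred) - epsilon, 0.0)

y_true = np.array([1.0, 1.0, 1.0])
y_pred = np.array([1.05, 1.3, 0.5])   # first prediction lies inside the tube
losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(losses)
```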

8. Decision Tree Regression

Description: Recursively partitions the feature space into regions and predicts the mean target value in each region. Builds a tree structure by choosing the best split at each node to minimize variance (MSE).

Use Cases: Interpretable non-linear regression, feature importance analysis, data exploration.

Key Parameters:

  • max_depth — Maximum depth of the tree (controls overfitting)
  • min_samples_split — Minimum samples required to split a node (default: 2)
  • min_samples_leaf — Minimum samples in a leaf node (default: 1)
  • criterion — 'squared_error', 'friedman_mse', 'absolute_error', 'poisson'
from sklearn.tree import DecisionTreeRegressor

model = DecisionTreeRegressor(
    max_depth=5,
    min_samples_split=10,
    min_samples_leaf=5,
    random_state=42
)
model.fit(X_train, y_train)

print(f"R2 Score: {model.score(X_test, y_test):.4f}")
print(f"Tree depth: {model.get_depth()}")
print(f"Number of leaves: {model.get_n_leaves()}")
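
The "best split at each node" step can be sketched for a single feature as an exhaustive search over candidate thresholds, scoring each by the total squared error of the two resulting regions. The helper best_split is a hypothetical name, and real implementations are considerably more optimized.

```python
import numpy as np

def best_split(x, y):
    """Return (threshold, sse) for the single-feature split minimizing total squared error."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best_threshold, best_sse = None, np.inf
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no threshold can separate equal feature values
        left, right = y_sorted[:i], y_sorted[i:]
        # Each region predicts its own mean, so its error is the sum of squared deviations.
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if sse < best_sse:
            best_threshold = 0.5 * (x_sorted[i - 1] + x_sorted[i])
            best_sse = sse
    return best_threshold, best_sse

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 5.0, 5.0, 20.0, 20.0, 20.0])
threshold, sse = best_split(x, y)
print(threshold, sse)
```

A full tree applies this search recursively to each region until a stopping rule such as max_depth or min_samples_leaf is reached.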

9. Random Forest Regression

Description: An ensemble of decision trees trained on random subsets of data (bagging) and random subsets of features. Predictions are averaged across all trees, reducing variance and overfitting.

Use Cases: General-purpose regression, robust to outliers, handles non-linear relationships, feature importance ranking.

Key Parameters:

  • n_estimators — Number of trees (default: 100)
  • max_depth — Maximum tree depth (default: None = unlimited)
  • max_features — Number of features to consider for each split
  • min_samples_leaf — Minimum samples per leaf
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(
    n_estimators=200,
    max_depth=10,
    min_samples_leaf=5,
    max_features='sqrt',
    random_state=42,
    n_jobs=-1
)
model.fit(X_train, y_train)

print(f"R2 Score: {model.score(X_test, y_test):.4f}")
print(f"Feature importances: {model.feature_importances_}")
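
Bagging and feature randomness can be sketched with depth-1 trees (stumps) as base learners. This is a deliberately simplified illustration: each stump here is trained on a bootstrap sample and one randomly chosen feature, whereas an actual random forest grows deeper trees and draws a feature subset at every split. The helper names and toy data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(x, y):
    """Depth-1 regression tree on one feature: returns (threshold, left_mean, right_mean)."""
    best_sse, best = np.inf, None
    for t in np.unique(x)[:-1]:           # excluding the max keeps the right side non-empty
        left, right = y[x <= t], y[x > t]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if sse < best_sse:
            best_sse, best = sse, (t, left.mean(), right.mean())
    return best

def predict_stump(stump, x):
    t, left_mean, right_mean = stump
    return np.where(x <= t, left_mean, right_mean)

# Toy data: two features, but only feature 0 drives the target.
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = np.where(X[:, 0] > 0.5, 2.0, 0.0) + rng.normal(scale=0.3, size=200)

forest = []
for _ in range(25):
    rows = rng.integers(0, len(X), size=len(X))   # bootstrap sample (drawn with replacement)
    feature = rng.integers(0, X.shape[1])         # random feature for this stump
    forest.append((feature, fit_stump(X[rows, feature], y[rows])))

def predict_forest(X_new):
    """Average the predictions of all stumps."""
    preds = [predict_stump(stump, X_new[:, f]) for f, stump in forest]
    return np.mean(preds, axis=0)

y_hat = predict_forest(X)
print(f"Training MSE: {np.mean((y - y_hat) ** 2):.3f}  (variance of y: {np.var(y):.3f})")
```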

10. Gradient Boosting Regression

Description: Builds an ensemble of weak learners (usually shallow decision trees) sequentially. Each new tree corrects the errors of the previous ensemble by fitting the negative gradient of the loss function.

Use Cases: High-accuracy predictions, structured/tabular data, competitions, any regression task where performance is critical.

Key Parameters:

  • n_estimators — Number of boosting stages (default: 100)
  • learning_rate — Shrinkage factor (default: 0.1)
  • max_depth — Maximum depth of individual trees (default: 3)
  • subsample — Fraction of samples used per tree (stochastic gradient boosting)
from sklearn.ensemble import GradientBoostingRegressor

model = GradientBoostingRegressor(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=4,
    subsample=0.8,
    random_state=42
)
model.fit(X_train, y_train)

print(f"R2 Score: {model.score(X_test, y_test):.4f}")
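
For squared-error loss the negative gradient is simply the residual, so the sequential procedure reduces to repeatedly fitting a weak learner to the current residuals. The sketch below does this with stumps on an assumed sine-shaped target; helper names are hypothetical and the loop omits subsampling and deeper trees.

```python
import numpy as np

def fit_stump(x, y):
    """Depth-1 regression tree on one feature: returns (threshold, left_mean, right_mean)."""
    best_sse, best = np.inf, None
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if sse < best_sse:
            best_sse, best = sse, (t, left.mean(), right.mean())
    return best

def predict_stump(stump, x):
    t, left_mean, right_mean = stump
    return np.where(x <= t, left_mean, right_mean)

x = np.linspace(0.0, 6.0, 80)
y = np.sin(x)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())    # stage 0: a constant model
mse_history = [np.mean((y - prediction) ** 2)]

for _ in range(50):
    residuals = y - prediction            # negative gradient of 0.5 * (y - F)^2 w.r.t. F
    stump = fit_stump(x, residuals)       # weak learner fitted to the residuals
    prediction += learning_rate * predict_stump(stump, x)   # shrunken update
    mse_history.append(np.mean((y - prediction) ** 2))

print(f"MSE before boosting: {mse_history[0]:.4f}, after 50 stages: {mse_history[-1]:.4f}")
```

A smaller learning_rate makes each stage contribute less, which is why it is usually paired with a larger n_estimators.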

11. XGBoost Regression

Description: An optimized implementation of gradient boosting with built-in regularization (L1 and L2), efficient handling of sparse data, parallel tree construction, and automatic handling of missing values.

Use Cases: Kaggle competitions, production ML systems, structured data, when you need the best accuracy on tabular data.

Key Parameters:

  • n_estimators — Number of boosting rounds
  • learning_rate / eta — Step size shrinkage
  • max_depth — Maximum tree depth (default: 6)
  • reg_alpha — L1 regularization on weights
  • reg_lambda — L2 regularization on weights
  • subsample — Row subsampling ratio
  • colsample_bytree — Column subsampling ratio
import xgboost as xgb

model = xgb.XGBRegressor(
    n_estimators=500,
    learning_rate=0.05,
    max_depth=5,
    reg_alpha=0.1,
    reg_lambda=1.0,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=42,
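    n_jobs=-1,
    early_stopping_rounds=50  # best_iteration is only defined with early stopping (constructor argument in xgboost >= 1.6)
)
# Early stopping monitors eval_set. In practice, pass a separate validation split
# here rather than the test set, so the reported test score stays unbiased.
model.fit(
    X_train, y_train,
    eval_set=[(X_test, y_test)],
    verbose=False
)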

print(f"R2 Score: {model.score(X_test, y_test):.4f}")
print(f"Best iteration: {model.best_iteration}")
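
The effect of the built-in L2 regularization can be seen in the leaf-value and split-gain formulas from the XGBoost objective: the optimal leaf weight is −G / (H + λ) and the gain compares the scores G² / (H + λ) of the children against the parent. The sketch below covers only reg_lambda and gamma; the L1 term (reg_alpha) is left out for brevity, and the function names are illustrative.

```python
import numpy as np

def leaf_weight(g, h, reg_lambda):
    """Optimal leaf value -G / (H + lambda) for summed gradients G and Hessians H."""
    return -np.sum(g) / (np.sum(h) + reg_lambda)

def leaf_score(g, h, reg_lambda):
    return np.sum(g) ** 2 / (np.sum(h) + reg_lambda)

def split_gain(g, h, left_mask, reg_lambda, gamma=0.0):
    """Gain of splitting one leaf in two; a split is only worthwhile when the gain is positive."""
    left = leaf_score(g[left_mask], h[left_mask], reg_lambda)
    right = leaf_score(g[~left_mask], h[~left_mask], reg_lambda)
    parent = leaf_score(g, h, reg_lambda)
    return 0.5 * (left + right - parent) - gamma

# Squared-error loss 0.5 * (pred - y)^2: gradient = pred - y, Hessian = 1.
y = np.array([1.0, 1.2, 0.8, 5.0, 5.2, 4.8])
pred = np.zeros_like(y)
g, h = pred - y, np.ones_like(y)

w_unregularized = leaf_weight(g, h, reg_lambda=0.0)   # equals the mean residual
w_regularized = leaf_weight(g, h, reg_lambda=1.0)     # shrunk toward zero
left_mask = np.array([True, True, True, False, False, False])
gain = split_gain(g, h, left_mask, reg_lambda=1.0)
print(w_unregularized, w_regularized, gain)
```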

12. LightGBM Regression

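Description: A fast, distributed gradient boosting framework that uses histogram-based learning (binning continuous features) and leaf-wise tree growth instead of level-wise growth. Binning reduces the number of split candidates per feature, which usually makes training faster than exact split search, and leaf-wise growth often reaches a lower loss for the same number of leaves, at the cost of a higher overfitting risk on small datasets.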

Use Cases: Large datasets (millions of rows), fast training needed, structured/tabular data, production systems.

Key Parameters:

  • n_estimators — Number of boosting rounds
  • learning_rate — Step size (default: 0.1)
  • num_leaves — Maximum number of leaves per tree (default: 31)
  • max_depth — Tree depth limit (-1 = no limit)
  • min_child_samples — Minimum data in a leaf
  • reg_alpha, reg_lambda — L1 and L2 regularization
import lightgbm as lgb

model = lgb.LGBMRegressor(
    n_estimators=500,
    learning_rate=0.05,
    num_leaves=31,
    max_depth=-1,
    min_child_samples=20,
    reg_alpha=0.1,
    reg_lambda=0.1,
    subsample=0.8,
    subsample_freq=1,  # row subsampling is only applied when subsample_freq > 0
    colsample_bytree=0.8,
    random_state=42,
    n_jobs=-1,
    verbose=-1
)
model.fit(X_train, y_train)

print(f"R2 Score: {model.score(X_test, y_test):.4f}")
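
Histogram-based split finding can be sketched as follows: bucket the feature once, accumulate per-bin sums, then scan only the bin boundaries instead of every distinct value. This is a simplified illustration with equal-width bins and raw targets; LightGBM itself builds histograms of gradient statistics and uses its own binning scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=1000)
y = np.where(x > 4.0, 3.0, 0.0) + rng.normal(scale=0.1, size=1000)

n_bins = 16
edges = np.linspace(x.min(), x.max(), n_bins + 1)     # equal-width bins for simplicity
bin_index = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)

# One pass over the data builds the histogram; the split search then touches only the bins.
bin_sum = np.bincount(bin_index, weights=y, minlength=n_bins)
bin_count = np.bincount(bin_index, minlength=n_bins)

best_edge, best_score = None, -np.inf
left_sum = left_count = 0.0
total_sum, total_count = bin_sum.sum(), bin_count.sum()
for b in range(n_bins - 1):
    left_sum += bin_sum[b]
    left_count += bin_count[b]
    right_sum, right_count = total_sum - left_sum, total_count - left_count
    if left_count == 0 or right_count == 0:
        continue
    # Maximizing this score is equivalent to minimizing the squared error of the split.
    score = left_sum**2 / left_count + right_sum**2 / right_count
    if score > best_score:
        best_edge, best_score = edges[b + 1], score

print(f"Best split threshold among {n_bins - 1} candidates: {best_edge:.3f}")
```

Only 15 thresholds are evaluated for 1000 samples, and the chosen boundary lies within one bin width of the true change point at 4.0.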

13. CatBoost Regression

Description: A gradient boosting library developed by Yandex with built-in support for categorical features (no manual encoding needed), ordered boosting to reduce overfitting, and symmetric tree structure for fast inference.

Use Cases: Datasets with many categorical features, production systems, when minimal preprocessing is desired.

Key Parameters:

  • iterations — Number of boosting rounds
  • learning_rate — Step size
  • depth — Tree depth (default: 6)
  • l2_leaf_reg — L2 regularization coefficient
  • cat_features — Indices of categorical columns
from catboost import CatBoostRegressor

model = CatBoostRegressor(
    iterations=500,
    learning_rate=0.05,
    depth=6,
    l2_leaf_reg=3,
    random_seed=42,
    verbose=0
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
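
The idea behind handling categorical features without manual encoding can be sketched as an ordered target statistic: each row's category is replaced by a smoothed mean of the targets seen in earlier rows only, so a row never sees its own label. This is a simplified illustration under assumed names and a hand-picked prior; CatBoost applies the idea over random permutations of the data and with its own smoothing.

```python
import numpy as np

def ordered_target_encoding(categories, targets, prior, prior_weight=1.0):
    """Encode each row using only the targets of earlier rows with the same category."""
    sums, counts = {}, {}
    encoded = np.empty(len(categories))
    for i, (cat, target) in enumerate(zip(categories, targets)):
        s, c = sums.get(cat, 0.0), counts.get(cat, 0)
        encoded[i] = (s + prior_weight * prior) / (c + prior_weight)
        # Update the running statistics only after the row has been encoded.
        sums[cat], counts[cat] = s + target, c + 1
    return encoded

categories = ["red", "blue", "red", "red", "blue"]
targets = np.array([10.0, 20.0, 14.0, 12.0, 24.0])
encoded = ordered_target_encoding(categories, targets, prior=16.0)  # prior chosen by hand
print(encoded)
```

The first occurrence of each category falls back to the prior, and later occurrences move toward the running category mean.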

14. Quantile Regression

Description: Instead of predicting the conditional mean (like ordinary regression), quantile regression predicts conditional quantiles (e.g., median, 10th percentile, 90th percentile). This allows constructing prediction intervals and understanding the full distribution of the target variable.

Use Cases: Risk assessment, financial modeling, prediction intervals, when you care about the distribution of outcomes rather than just the mean.

Key Parameters:

  • quantile — The quantile to predict (0.5 = median)
  • alpha — Regularization strength
from sklearn.linear_model import QuantileRegressor

# Predict lower bound (10th percentile), median, and upper bound (90th percentile)
quantiles = [0.1, 0.5, 0.9]
predictions = {}

for q in quantiles:
    model = QuantileRegressor(quantile=q, alpha=0.01, solver='highs')
    model.fit(X_train, y_train)
    predictions[q] = model.predict(X_test)

print(f"10th percentile (lower bound): {predictions[0.1][:5]}")
print(f"Median prediction: {predictions[0.5][:5]}")
print(f"90th percentile (upper bound): {predictions[0.9][:5]}")
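
What makes a model predict a quantile rather than the mean is the pinball (quantile) loss, which penalizes under- and over-prediction asymmetrically. The sketch below, on an assumed skewed toy sample, searches a grid of constant predictions and shows that the loss minimizer sits near the empirical quantile.

```python
import numpy as np

def pinball_loss(y_true, y_pred, quantile):
    """Mean pinball loss: weight `quantile` on under-prediction, `1 - quantile` on over-prediction."""
    diff = y_true - y_pred
    return np.mean(np.maximum(quantile * diff, (quantile - 1.0) * diff))

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=2000)   # skewed toy target

# For each quantile, pick the constant prediction with the lowest pinball loss.
candidates = np.linspace(0.0, 10.0, 1001)
best_constant = {
    q: candidates[np.argmin([pinball_loss(y, c, q) for c in candidates])]
    for q in (0.1, 0.5, 0.9)
}

for q, c in best_constant.items():
    print(f"q={q}: loss minimizer {c:.2f}, empirical quantile {np.quantile(y, q):.2f}")
```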

15. Poisson Regression

Description: A generalized linear model (GLM) for count data. Assumes the target variable follows a Poisson distribution and uses a log link function. Suitable when predicting non-negative integer counts.

Use Cases: Insurance claim counts, website visits, number of defects, customer arrivals, any count-based outcome.

Key Parameters:

  • alpha — Regularization strength (default: 1.0)
  • max_iter — Maximum number of iterations (default: 100)
from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import mean_poisson_deviance
from sklearn.model_selection import train_test_split
import numpy as np

# Generate count data
np.random.seed(42)
X_count = np.random.rand(200, 3)
y_count = np.random.poisson(lam=np.exp(X_count @ [0.5, 1.0, -0.3]))

X_tr, X_te, y_tr, y_te = train_test_split(X_count, y_count, test_size=0.2, random_state=42)

model = PoissonRegressor(alpha=0.01, max_iter=300)
model.fit(X_tr, y_tr)

y_pred = model.predict(X_te)
# For PoissonRegressor, score() returns D^2 (fraction of Poisson deviance explained), not R^2
print(f"D2 Score: {model.score(X_te, y_te):.4f}")
print(f"Mean Poisson deviance: {mean_poisson_deviance(y_te, y_pred):.4f}")
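
The log link and the deviance metric can both be written in a few lines of NumPy. In this sketch the coefficients and counts are made-up values, and mean_poisson_deviance_np is a hypothetical helper that follows the usual Poisson deviance formula 2 · (y · log(y / μ) − y + μ).

```python
import numpy as np

def mean_poisson_deviance_np(y_true, mu):
    """Mean of 2 * (y * log(y / mu) - y + mu), with y * log(y / mu) taken as 0 when y == 0."""
    y_true = np.asarray(y_true, dtype=float)
    mu = np.asarray(mu, dtype=float)
    ratio = np.where(y_true > 0, y_true / mu, 1.0)   # log(1) = 0 handles the y == 0 case
    log_term = y_true * np.log(ratio)
    return float(np.mean(2.0 * (log_term - y_true + mu)))

# Log link: the linear predictor X @ w is exponentiated, so predicted means stay positive.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
w = np.array([0.5, -2.0])        # hypothetical coefficients
mu = np.exp(X @ w)

y = np.array([0, 2, 1])          # hypothetical observed counts
deviance = mean_poisson_deviance_np(y, mu)
perfect = mean_poisson_deviance_np([1, 2, 3], [1.0, 2.0, 3.0])
print(f"predicted means: {np.round(mu, 3)}, deviance: {deviance:.4f}, perfect fit: {perfect}")
```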