Regression Algorithms (15)

Algorithms for predicting continuous numeric values

Regression algorithms predict a continuous output variable from one or more input features. From simple linear models to gradient boosting machines, the 15 algorithms below cover most common regression tasks.

Quick Reference Table

Algorithm	Complexity	Interpretability	Best For
Linear Regression	Low	High	Linearly related data
Polynomial Regression	Low-Med	Medium	Non-linear curves
Ridge Regression	Low	High	Multicollinear features
Lasso Regression	Low	High	Feature selection
Elastic Net	Low	High	High-dimensional sparse data
Bayesian Linear Regression	Medium	High	Uncertainty quantification
SVR	High	Low	Small-medium nonlinear data
Decision Tree Regression	Medium	High	Non-linear, interpretable models
Random Forest Regression	High	Medium	General-purpose, robust
Gradient Boosting Regression	High	Low	High accuracy needed
XGBoost Regression	High	Low	Competitions, structured data
LightGBM Regression	High	Low	Large datasets, fast training
CatBoost Regression	High	Low	Categorical features
Quantile Regression	Low-Med	High	Prediction intervals
Poisson Regression	Low	High	Count data

1. Linear Regression

Description: Fits a linear equation (y = w₀ + w₁x₁ + ... + wₚxₚ) to the data by minimizing the sum of squared residuals (Ordinary Least Squares).

Use Cases: House price prediction, sales forecasting, trend analysis, any problem where the relationship between features and target is approximately linear.

Key Parameters:

fit_intercept — Whether to calculate the intercept (default: True)
normalize — Removed in scikit-learn 1.2 (deprecated since 1.0); scale features with StandardScaler in a pipeline instead

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate sample data
np.random.seed(42)
X = np.random.rand(100, 1) * 10
y = 2.5 * X.squeeze() + np.random.randn(100) * 2

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(f"Coefficients: {model.coef_}")
print(f"Intercept: {model.intercept_:.4f}")
# squared=False was removed from mean_squared_error in recent scikit-learn releases
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, y_pred)):.4f}")
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
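As a cross-check on what Ordinary Least Squares means, the fit can be reproduced with plain NumPy. This is a minimal sketch, not scikit-learn's internal code path; the names X_design and w are illustrative, and the data is regenerated with its own random generator so the block runs on its own.

```python
import numpy as np

# Synthetic data in the same spirit as the example above: y ≈ 2.5 * x + noise
rng = np.random.default_rng(42)
X = rng.random((100, 1)) * 10
y = 2.5 * X.squeeze() + rng.normal(scale=2.0, size=100)

# Prepend a column of ones so the first weight acts as the intercept w0
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution; lstsq is numerically safer than inverting X^T X
w, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, slope = w

residuals = y - X_design @ w
print(f"Intercept: {intercept:.4f}, slope: {slope:.4f}")
print(f"Sum of squared residuals: {np.sum(residuals ** 2):.4f}")
```

At the least-squares solution the residuals are orthogonal to every column of the design matrix, which is a convenient sanity check.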

2. Polynomial Regression

Description: Extends linear regression by adding polynomial terms (x², x³, etc.) to capture non-linear relationships while still using a linear model framework.

Use Cases: Growth curves, quadratic trends, any non-linear relationship that can be captured by polynomial features.

Key Parameters:

degree — Degree of the polynomial features (2, 3, etc.)
interaction_only — If True, only interaction features are produced
include_bias — Include a bias column of ones

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Create polynomial regression (degree=3)
model = make_pipeline(
    PolynomialFeatures(degree=3),
    LinearRegression()
)

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Reuses np and mean_squared_error imported in the Linear Regression example
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, y_pred)):.4f}")

3. Ridge Regression

Description: Linear regression with L2 regularization. Adds a penalty term (λ · ||w||²) to the loss function to prevent overfitting and handle multicollinearity.

Use Cases: When features are correlated, when you have more features than samples, preventing overfitting in linear models.

Key Parameters:

alpha — Regularization strength, the λ in the penalty above (higher = more regularization, default: 1.0)
solver — Computational algorithm (auto, svd, cholesky, lsqr, sparse_cg, sag, saga, lbfgs)

from sklearn.linear_model import Ridge, RidgeCV

# Ridge with cross-validation to find best alpha
model = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0], cv=5)
model.fit(X_train, y_train)

print(f"Best alpha: {model.alpha_}")
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
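The effect of the L2 penalty can also be written in closed form: without an intercept, the ridge weights solve (XᵀX + αI)w = Xᵀy. The following sketch compares that formula with scikit-learn's Ridge on illustrative, nearly collinear data; X_demo, y_demo and w_closed are names introduced here for demonstration only.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
# Make the third column almost a copy of the second (multicollinearity)
X_demo[:, 2] = X_demo[:, 1] + rng.normal(scale=0.01, size=50)
y_demo = X_demo @ np.array([1.0, 2.0, 0.0]) + rng.normal(scale=0.1, size=50)

alpha = 1.0

# Closed-form ridge solution: (X^T X + alpha * I) w = X^T y
A = X_demo.T @ X_demo + alpha * np.eye(X_demo.shape[1])
w_closed = np.linalg.solve(A, X_demo.T @ y_demo)

# Same problem via scikit-learn (no intercept, to match the formula)
ridge = Ridge(alpha=alpha, fit_intercept=False, solver="cholesky")
w_sklearn = ridge.fit(X_demo, y_demo).coef_

print("Closed form :", w_closed)
print("scikit-learn:", w_sklearn)
```

Adding αI keeps the system well conditioned even when XᵀX is close to singular, which is why ridge copes with correlated features.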

4. Lasso Regression

Description: Linear regression with L1 regularization. Adds a penalty term (λ · ||w||₁) that can shrink some coefficients to exactly zero, performing automatic feature selection.

Use Cases: Feature selection, sparse models, high-dimensional data where many features are irrelevant.

Key Parameters:

alpha — Regularization strength, the λ in the penalty above (default: 1.0)
max_iter — Maximum number of iterations (default: 1000)
tol — Tolerance for the optimization

from sklearn.linear_model import Lasso, LassoCV

model = LassoCV(cv=5, random_state=42)
model.fit(X_train, y_train)

print(f"Best alpha: {model.alpha_:.6f}")
print(f"Non-zero coefficients: {(model.coef_ != 0).sum()}")
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
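The single-feature data used above cannot show feature selection, so here is a hedged sketch on synthetic data where only two of ten features matter. The data, the true_coef vector and the choice alpha=0.3 are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))

# Only the first two features influence the target
true_coef = np.zeros(10)
true_coef[:2] = [3.0, -2.0]
y_demo = X_demo @ true_coef + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.3).fit(X_demo, y_demo)

print("Coefficients:", np.round(lasso.coef_, 3))
print("Selected feature indices:", np.flatnonzero(lasso.coef_))
```

The surviving coefficients are shrunk toward zero relative to the true values, which is the price paid for the sparsity.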

5. Elastic Net Regression

Description: Combines L1 (Lasso) and L2 (Ridge) regularization. The loss function includes both penalty terms, controlled by a mixing parameter. This gives the benefits of both feature selection and coefficient shrinkage.

Use Cases: High-dimensional data with correlated features, when you want both feature selection and regularization.

Key Parameters:

alpha — Overall regularization strength
l1_ratio — Mix between L1 and L2 (0 = Ridge, 1 = Lasso, 0.5 = equal mix)

from sklearn.linear_model import ElasticNet, ElasticNetCV

model = ElasticNetCV(
    l1_ratio=[0.1, 0.5, 0.7, 0.9, 1.0],
    cv=5, random_state=42
)
model.fit(X_train, y_train)

print(f"Best alpha: {model.alpha_:.6f}")
print(f"Best l1_ratio: {model.l1_ratio_}")
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
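To make the role of alpha and l1_ratio concrete, the sketch below evaluates the objective as given in scikit-learn's ElasticNet documentation and checks that the fitted weights score lower than a perturbed copy. The helper elastic_net_objective and the demo data are illustrative, and the intercept is disabled to keep the formula short.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def elastic_net_objective(X, y, w, alpha, l1_ratio):
    """Squared error / (2n) plus the mixed L1 and L2 penalties (no intercept)."""
    n = len(y)
    squared_error = np.sum((y - X @ w) ** 2) / (2 * n)
    l1_penalty = alpha * l1_ratio * np.sum(np.abs(w))
    l2_penalty = 0.5 * alpha * (1 - l1_ratio) * np.sum(w ** 2)
    return squared_error + l1_penalty + l2_penalty

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
y_demo = X_demo @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.3, size=100)

alpha, l1_ratio = 0.1, 0.5
model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, fit_intercept=False)
model.fit(X_demo, y_demo)

fitted = elastic_net_objective(X_demo, y_demo, model.coef_, alpha, l1_ratio)
perturbed = elastic_net_objective(X_demo, y_demo, model.coef_ + 0.1, alpha, l1_ratio)
print(f"Objective at fitted weights:    {fitted:.4f}")
print(f"Objective at perturbed weights: {perturbed:.4f}")
```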

6. Bayesian Linear Regression

Description: A probabilistic approach to linear regression that places prior distributions on the model parameters. Instead of point estimates, it provides a full posterior distribution over weights, enabling uncertainty quantification.

Use Cases: When you need uncertainty estimates on predictions, small datasets, incorporating prior knowledge, medical/scientific applications where confidence intervals matter.

Key Parameters:

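alpha_1, alpha_2 — Shape and rate (inverse scale) parameters of the Gamma prior over alpha, the noise precision (default: 1e-6 each)
lambda_1, lambda_2 — Shape and rate (inverse scale) parameters of the Gamma prior over lambda, the weight precision (default: 1e-6 each)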
compute_score — Compute the log marginal likelihood at each iteration

from sklearn.linear_model import BayesianRidge

model = BayesianRidge(compute_score=True)
model.fit(X_train, y_train)

# Get predictions with uncertainty
y_pred, y_std = model.predict(X_test, return_std=True)

print(f"R2 Score: {model.score(X_test, y_test):.4f}")
print(f"Mean prediction std: {y_std.mean():.4f}")
print(f"Alpha: {model.alpha_:.4f}")
print(f"Lambda: {model.lambda_:.4f}")
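One common use of the returned standard deviation is an approximate 95% predictive interval of mean ± 1.96 standard deviations. The sketch below checks the empirical coverage on synthetic data; the Gaussian approximation, the data and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(42)
X_demo = rng.random((400, 1)) * 10
y_demo = 2.5 * X_demo.squeeze() + rng.normal(scale=2.0, size=400)

# Simple split: first half for fitting, second half for checking coverage
X_fit, X_check = X_demo[:200], X_demo[200:]
y_fit, y_check = y_demo[:200], y_demo[200:]

model = BayesianRidge().fit(X_fit, y_fit)
y_mean, y_std = model.predict(X_check, return_std=True)

# Approximate 95% predictive interval under a Gaussian assumption
lower = y_mean - 1.96 * y_std
upper = y_mean + 1.96 * y_std
coverage = np.mean((y_check >= lower) & (y_check <= upper))
print(f"Empirical coverage of the 95% interval: {coverage:.2%}")
```

Coverage close to the nominal level suggests the noise model is reasonable; on real data with non-Gaussian noise it can be noticeably off.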

7. Support Vector Regression (SVR)

Description: Applies the Support Vector Machine framework to regression. SVR fits a function that is as flat as possible while ignoring errors smaller than epsilon (ε); only points on or outside this ε-tube contribute to the loss. It uses the kernel trick for non-linear regression.

Use Cases: Non-linear regression with small-to-medium datasets, high-dimensional data, financial time series.

Key Parameters:

kernel — Kernel type: 'linear', 'poly', 'rbf', 'sigmoid' (default: 'rbf')
C — Regularization parameter (default: 1.0)
epsilon — Epsilon-tube width (default: 0.1)
gamma — Kernel coefficient for rbf/poly/sigmoid

from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

model = make_pipeline(
    StandardScaler(),
    SVR(kernel='rbf', C=100, epsilon=0.1, gamma='scale')
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(f"R2 Score: {model.score(X_test, y_test):.4f}")
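The epsilon parameter is easier to reason about with the ε-insensitive loss written out: errors inside the tube cost nothing, and errors outside it cost only the excess. This is a small NumPy sketch of that loss with made-up numbers, not the solver SVR uses internally.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon-tube, linear in the excess error outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.05, 1.95, 3.3, 3.5])

# The first two errors (0.05) fall inside the tube; the last two exceed it
losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(losses)
```

A larger epsilon ignores more of the small errors and usually leaves fewer support vectors.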

8. Decision Tree Regression

Description: Recursively partitions the feature space into regions and predicts the mean target value in each region. Builds a tree structure by choosing the best split at each node to minimize variance (MSE).

Use Cases: Interpretable non-linear regression, feature importance analysis, data exploration.

Key Parameters:

max_depth — Maximum depth of the tree (controls overfitting)
min_samples_split — Minimum samples required to split a node (default: 2)
min_samples_leaf — Minimum samples in a leaf node (default: 1)
criterion — 'squared_error', 'friedman_mse', 'absolute_error', 'poisson'

from sklearn.tree import DecisionTreeRegressor

model = DecisionTreeRegressor(
    max_depth=5,
    min_samples_split=10,
    min_samples_leaf=5,
    random_state=42
)
model.fit(X_train, y_train)

print(f"R2 Score: {model.score(X_test, y_test):.4f}")
print(f"Tree depth: {model.get_depth()}")
print(f"Number of leaves: {model.get_n_leaves()}")
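Because a fitted tree is just a set of threshold rules with a mean value per leaf, it can be printed and read directly. The sketch below fits a depth-1 tree on a tiny hand-made dataset (an illustrative assumption) and prints its rules with export_text.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Two obvious groups: small x with y near 1, large x with y near 5
X_demo = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y_demo = np.array([1.0, 1.2, 0.8, 5.0, 5.3, 4.7])

tree = DecisionTreeRegressor(max_depth=1, random_state=0)
tree.fit(X_demo, y_demo)

# Each leaf predicts the mean target of the training samples that reach it
print(export_text(tree, feature_names=["x"]))
print("Prediction at x=2: ", tree.predict([[2.0]])[0])
print("Prediction at x=11:", tree.predict([[11.0]])[0])
```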

9. Random Forest Regression

Description: An ensemble of decision trees trained on random subsets of data (bagging) and random subsets of features. Predictions are averaged across all trees, reducing variance and overfitting.

Use Cases: General-purpose regression, data with outliers or non-linear relationships, feature importance ranking.

Key Parameters:

n_estimators — Number of trees (default: 100)
max_depth — Maximum tree depth (default: None = unlimited)
max_features — Number of features to consider for each split
min_samples_leaf — Minimum samples per leaf

from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(
    n_estimators=200,
    max_depth=10,
    min_samples_leaf=5,
    max_features='sqrt',
    random_state=42,
    n_jobs=-1
)
model.fit(X_train, y_train)

print(f"R2 Score: {model.score(X_test, y_test):.4f}")
print(f"Feature importances: {model.feature_importances_}")
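The averaging step can be verified by hand: the forest's prediction should equal the mean of its individual trees' predictions. This sketch uses illustrative synthetic data and relies on the fitted trees being exposed through the estimators_ attribute.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_demo = rng.random((200, 2)) * 10
y_demo = np.sin(X_demo[:, 0]) + 0.5 * X_demo[:, 1] + rng.normal(scale=0.2, size=200)

forest = RandomForestRegressor(n_estimators=25, random_state=0)
forest.fit(X_demo, y_demo)

X_new = rng.random((5, 2)) * 10

# Collect each tree's prediction, then average across trees
per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

print("Manual average:  ", np.round(manual_average, 4))
print("forest.predict():", np.round(forest.predict(X_new), 4))
print("Spread across trees (std):", np.round(per_tree.std(axis=0), 4))
```

The spread across trees gives a rough, uncalibrated sense of how much the individual trees disagree on a given input.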

10. Gradient Boosting Regression

Description: Builds an ensemble of weak learners (usually shallow decision trees) sequentially. Each new tree corrects the errors of the previous ensemble by fitting the negative gradient of the loss function.

Use Cases: High-accuracy predictions, structured/tabular data, competitions, any regression task where performance is critical.

Key Parameters:

n_estimators — Number of boosting stages (default: 100)
learning_rate — Shrinkage factor (default: 0.1)
max_depth — Maximum depth of individual trees (default: 3)
subsample — Fraction of samples used per tree (stochastic gradient boosting)

from sklearn.ensemble import GradientBoostingRegressor

model = GradientBoostingRegressor(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=4,
    subsample=0.8,
    random_state=42
)
model.fit(X_train, y_train)

print(f"R2 Score: {model.score(X_test, y_test):.4f}")
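For squared-error loss the negative gradient is simply the residual, so the sequential idea can be sketched in a few lines. This is a simplified illustration, not GradientBoostingRegressor's implementation; the data, the learning rate of 0.1 and the 50 rounds are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_demo = rng.random((200, 1)) * 10
y_demo = np.sin(X_demo.squeeze()) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_rounds = 50

# Start from a constant prediction: the mean of the targets
prediction = np.full_like(y_demo, y_demo.mean())
mse_history = [np.mean((y_demo - prediction) ** 2)]

for _ in range(n_rounds):
    # Residual = negative gradient of 0.5 * (y - F)^2 with respect to F
    residual = y_demo - prediction
    weak_learner = DecisionTreeRegressor(max_depth=2).fit(X_demo, residual)
    # Shrink each tree's contribution by the learning rate
    prediction += learning_rate * weak_learner.predict(X_demo)
    mse_history.append(np.mean((y_demo - prediction) ** 2))

print(f"Training MSE before boosting: {mse_history[0]:.4f}")
print(f"Training MSE after {n_rounds} rounds: {mse_history[-1]:.4f}")
```

Training error keeps falling with more rounds, so in practice the number of rounds is chosen on held-out data.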

11. XGBoost Regression

Description: An optimized implementation of gradient boosting with built-in regularization (L1 and L2), efficient handling of sparse data, parallel tree construction, and automatic handling of missing values.

Use Cases: Kaggle competitions, production ML systems, structured data, when you need strong accuracy on tabular data.

Key Parameters:

n_estimators — Number of boosting rounds
learning_rate / eta — Step size shrinkage
max_depth — Maximum tree depth (default: 6)
reg_alpha — L1 regularization on weights
reg_lambda — L2 regularization on weights
subsample — Row subsampling ratio
colsample_bytree — Column subsampling ratio

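import xgboost as xgb
from sklearn.model_selection import train_test_split

# Hold out a validation set for early stopping so the test set stays unseen
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.2, random_state=42
)

model = xgb.XGBRegressor(
    n_estimators=500,
    learning_rate=0.05,
    max_depth=5,
    reg_alpha=0.1,
    reg_lambda=1.0,
    subsample=0.8,
    colsample_bytree=0.8,
    early_stopping_rounds=50,  # constructor argument in xgboost >= 1.6
    random_state=42,
    n_jobs=-1
)
model.fit(
    X_tr, y_tr,
    eval_set=[(X_val, y_val)],
    verbose=False
)

print(f"R2 Score: {model.score(X_test, y_test):.4f}")
# best_iteration is only defined when early stopping is used
print(f"Best iteration: {model.best_iteration}")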

12. LightGBM Regression

Description: A fast, distributed gradient boosting framework that uses histogram-based learning (binning continuous features) and leaf-wise rather than level-wise tree growth, which typically makes training faster than exact-split, level-wise implementations at comparable accuracy.

Use Cases: Large datasets (millions of rows), fast training needed, structured/tabular data, production systems.

Key Parameters:

n_estimators — Number of boosting rounds
learning_rate — Step size (default: 0.1)
num_leaves — Maximum number of leaves per tree (default: 31)
max_depth — Tree depth limit (-1 = no limit)
min_child_samples — Minimum data in a leaf
reg_alpha, reg_lambda — L1 and L2 regularization

import lightgbm as lgb

model = lgb.LGBMRegressor(
    n_estimators=500,
    learning_rate=0.05,
    num_leaves=31,
    max_depth=-1,
    min_child_samples=20,
    reg_alpha=0.1,
    reg_lambda=0.1,
    subsample=0.8,
    subsample_freq=1,  # row subsampling only takes effect when subsample_freq > 0
    colsample_bytree=0.8,
    random_state=42,
    n_jobs=-1,
    verbose=-1
)
model.fit(X_train, y_train)

print(f"R2 Score: {model.score(X_test, y_test):.4f}")

13. CatBoost Regression

Description: A gradient boosting library developed by Yandex with built-in support for categorical features (no manual encoding needed), ordered boosting to reduce overfitting, and symmetric tree structure for fast inference.

Use Cases: Datasets with many categorical features, production systems, when minimal preprocessing is desired.

Key Parameters:

iterations — Number of boosting rounds
learning_rate — Step size
depth — Tree depth (default: 6)
l2_leaf_reg — L2 regularization coefficient
cat_features — Indices of categorical columns

from catboost import CatBoostRegressor

model = CatBoostRegressor(
    iterations=500,
    learning_rate=0.05,
    depth=6,
    l2_leaf_reg=3,
    random_seed=42,
    verbose=0
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(f"R2 Score: {model.score(X_test, y_test):.4f}")

14. Quantile Regression

Description: Instead of predicting the conditional mean (like ordinary regression), quantile regression predicts conditional quantiles (e.g., median, 10th percentile, 90th percentile). This allows constructing prediction intervals and understanding the full distribution of the target variable.

Use Cases: Risk assessment, financial modeling, prediction intervals, when you care about the distribution of outcomes rather than just the mean.

Key Parameters:

quantile — The quantile to predict (default: 0.5 = median)
alpha — L1 regularization strength (default: 1.0)

from sklearn.linear_model import QuantileRegressor

# Predict lower bound (10th percentile), median, and upper bound (90th percentile)
quantiles = [0.1, 0.5, 0.9]
predictions = {}

for q in quantiles:
    model = QuantileRegressor(quantile=q, alpha=0.01, solver='highs')
    model.fit(X_train, y_train)
    predictions[q] = model.predict(X_test)

print(f"10th percentile (lower bound): {predictions[0.1][:5]}")
print(f"Median prediction: {predictions[0.5][:5]}")
print(f"90th percentile (upper bound): {predictions[0.9][:5]}")
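Quantile regression gets its behavior from the pinball (quantile) loss, which penalizes under- and over-prediction asymmetrically. The sketch below implements that loss in NumPy and checks, on illustrative skewed data, that the constant minimizing it for q = 0.9 lands near the empirical 90th percentile. The grid search is for demonstration only.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss: weight q on under-prediction, (1 - q) on over-prediction."""
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

rng = np.random.default_rng(0)
y_demo = rng.exponential(scale=1.0, size=2000)  # skewed target values

q = 0.9
candidates = np.linspace(0.0, 6.0, 601)  # constant predictions to try
losses = np.array([pinball_loss(y_demo, c, q) for c in candidates])
best_constant = candidates[np.argmin(losses)]

print(f"Constant minimizing pinball loss (q={q}): {best_constant:.2f}")
print(f"Empirical {int(q * 100)}th percentile: {np.quantile(y_demo, q):.2f}")
```

With q = 0.5 the loss reduces to half the mean absolute error, which is why the 0.5 quantile model predicts the conditional median.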

15. Poisson Regression

Description: A generalized linear model (GLM) for count data. Assumes the target variable follows a Poisson distribution and uses a log link function. Suitable when predicting non-negative integer counts.

Use Cases: Insurance claim counts, website visits, number of defects, customer arrivals, any count-based outcome.

Key Parameters:

alpha — Regularization strength (default: 1.0)
max_iter — Maximum number of iterations (default: 100)

from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import mean_poisson_deviance
from sklearn.model_selection import train_test_split
import numpy as np

# Generate count data
np.random.seed(42)
X_count = np.random.rand(200, 3)
y_count = np.random.poisson(lam=np.exp(X_count @ [0.5, 1.0, -0.3]))

X_tr, X_te, y_tr, y_te = train_test_split(X_count, y_count, test_size=0.2)

model = PoissonRegressor(alpha=0.01, max_iter=300)
model.fit(X_tr, y_tr)

y_pred = model.predict(X_te)
# For PoissonRegressor, score() returns D2 (fraction of deviance explained), not R2
print(f"D2 Score: {model.score(X_te, y_te):.4f}")
print(f"Mean Poisson deviance: {mean_poisson_deviance(y_te, y_pred):.4f}")
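The log link means the model is linear on the log scale: the predicted mean is exp(intercept + X · coef). The sketch below confirms this against predict() on illustrative count data generated the same way as above; variable names are introduced here for demonstration.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(42)
X_demo = rng.random((200, 3))
rate = np.exp(X_demo @ np.array([0.5, 1.0, -0.3]))
y_demo = rng.poisson(lam=rate)

model = PoissonRegressor(alpha=0.01, max_iter=300).fit(X_demo, y_demo)

# Apply the inverse link by hand: exp of the linear predictor
linear_predictor = model.intercept_ + X_demo @ model.coef_
manual_mean = np.exp(linear_predictor)

print("Manual:   ", np.round(manual_mean[:5], 4))
print("predict():", np.round(model.predict(X_demo)[:5], 4))
```

Because of the exponential, predictions are always positive, and each coefficient acts multiplicatively on the expected count.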
