Classification Algorithms (17)
Algorithms for predicting discrete class labels
Classification algorithms assign input data to predefined categories. From simple logistic regression to powerful gradient boosting, these 17 algorithms cover binary, multi-class, and multi-label classification tasks.
Quick Reference Table
| Algorithm | Type | Best For |
|---|---|---|
| Logistic Regression | Linear | Baseline, interpretable binary classification |
| KNN | Instance-based | Small datasets, simple boundaries |
| SVM | Kernel-based | High-dimensional, clear margins |
| Decision Tree | Tree-based | Interpretable, non-linear |
| Random Forest | Ensemble | General-purpose, robust |
| Gaussian NB | Probabilistic | Continuous features, fast |
| Bernoulli NB | Probabilistic | Binary features, text |
| Multinomial NB | Probabilistic | Text classification |
| Gradient Boosting | Ensemble | High accuracy, tabular data |
| AdaBoost | Ensemble | Weak learner boosting |
| XGBoost | Ensemble | Competitions, production |
| LightGBM | Ensemble | Large datasets, fast |
| CatBoost | Ensemble | Categorical features |
| SGD Classifier | Linear | Very large datasets, online learning |
| Perceptron | Linear | Linearly separable data |
| Passive Aggressive | Linear | Online learning, streaming |
| Naive Bayes (family) | Probabilistic | Choosing among NB variants |
1. Logistic Regression
Description: Despite its name, logistic regression is a classification algorithm. It models the probability of class membership using the logistic (sigmoid) function, producing probabilities between 0 and 1. The decision boundary is linear in the feature space.
Use Cases: Medical diagnosis (disease/no disease), spam detection, credit scoring, any binary classification baseline.
```python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=5000, C=1.0, solver='lbfgs')
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
```
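Under the hood, `predict_proba` is nothing more than the sigmoid applied to the linear score. A small self-contained check (same dataset, minimal settings) makes that concrete:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000).fit(X, y)

# Linear score z = X @ w + b, then sigmoid(z) = 1 / (1 + exp(-z))
z = X @ model.coef_.ravel() + model.intercept_[0]
manual_proba = 1.0 / (1.0 + np.exp(-z))

# Matches the positive-class column of predict_proba
assert np.allclose(manual_proba, model.predict_proba(X)[:, 1])
print("sigmoid(X @ coef + intercept) reproduces predict_proba")
```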
2. K-Nearest Neighbors (KNN)
Description: A lazy learner that classifies new data points based on the majority class of their k nearest neighbors in the feature space. Requires no training phase but stores all training data.
Use Cases: Recommendation systems, image recognition, anomaly detection, small-to-medium datasets.
```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

model = make_pipeline(
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=5, weights='distance', metric='minkowski')
)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
```
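The majority-vote idea fits in a few lines. A from-scratch toy sketch (illustrative only, not a replacement for `KNeighborsClassifier`):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances to x
    nearest = np.argsort(dists)[:k]              # indices of the k closest
    votes = Counter(y_train[nearest])            # count their class labels
    return votes.most_common(1)[0][0]            # majority class wins

# Toy data: class 0 clustered near the origin, class 1 near (5, 5)
X_train = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5]), k=3))  # -> 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.5]), k=3))  # -> 1
```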
3. Support Vector Machine (SVM)
Description: Finds the optimal hyperplane that maximizes the margin between classes. Uses the kernel trick to project data into higher-dimensional space for non-linear classification.
Use Cases: Text classification, image recognition, bioinformatics, high-dimensional data with clear margins.
```python
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

model = make_pipeline(
    StandardScaler(),
    SVC(kernel='rbf', C=1.0, gamma='scale', probability=True)
)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")

# Get probability predictions
proba = model.predict_proba(X_test)[:5]
print(f"Sample probabilities: {proba}")
```
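The kernel trick replaces dot products with a kernel function; for the RBF kernel used above, similarity decays with squared distance as K(x, z) = exp(-gamma * ||x - z||^2). A quick check of that formula against scikit-learn's implementation:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[1.0, 2.0]])
z = np.array([[2.0, 0.0]])
gamma = 0.5

# Manual evaluation of K(x, z) = exp(-gamma * ||x - z||^2)
manual = np.exp(-gamma * np.sum((x - z) ** 2))
library = rbf_kernel(x, z, gamma=gamma)[0, 0]

assert np.isclose(manual, library)
print(f"K(x, z) = {manual:.4f}")
```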
4. Decision Tree Classifier
Description: Builds a tree structure by recursively splitting the data based on feature values that maximize information gain (or minimize Gini impurity). Each leaf node represents a class prediction.
Use Cases: Medical diagnosis, customer segmentation, rule extraction, interpretable models.
```python
from sklearn.tree import DecisionTreeClassifier, export_text

model = DecisionTreeClassifier(
    max_depth=5,
    min_samples_split=10,
    criterion='gini',
    random_state=42
)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
print(f"Tree depth: {model.get_depth()}")
print(f"Number of leaves: {model.get_n_leaves()}")

# Print tree rules (first few lines)
tree_rules = export_text(model, feature_names=load_breast_cancer().feature_names.tolist())
print(tree_rules[:500])
```
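The Gini impurity mentioned above, and the impurity decrease that drives split selection, are easy to compute by hand. A minimal sketch:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_c^2) over class proportions p_c."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 50/50 mix: impurity 0.5
left   = np.array([0, 0, 0, 0])              # pure node: impurity 0.0
right  = np.array([1, 1, 1, 1])              # pure node: impurity 0.0

# The tree greedily picks the split with the largest weighted impurity drop
gain = gini(parent) - (len(left) * gini(left) + len(right) * gini(right)) / len(parent)
print(f"Impurity decrease: {gain:.2f}")  # -> 0.50
```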
5. Random Forest Classifier
Description: An ensemble of decision trees trained on bootstrap samples with random feature subsets. Final prediction is the majority vote across all trees. Reduces overfitting and variance compared to single trees.
Use Cases: General-purpose classification, fraud detection, churn prediction, feature importance analysis.
```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=200,
    max_depth=10,
    max_features='sqrt',
    min_samples_leaf=5,
    random_state=42,
    n_jobs=-1
)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")

# Feature importance
importances = model.feature_importances_
top_features = sorted(zip(load_breast_cancer().feature_names, importances),
                      key=lambda x: x[1], reverse=True)[:5]
for name, imp in top_features:
    print(f"  {name}: {imp:.4f}")
```
6. Gaussian Naive Bayes
Description: Applies Bayes' theorem with the assumption that features are independent and follow a Gaussian (normal) distribution. Despite its simplicity, often performs surprisingly well.
Use Cases: Text classification, medical diagnosis, real-time prediction, when features are approximately normal.
```python
from sklearn.naive_bayes import GaussianNB

model = GaussianNB(var_smoothing=1e-9)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
print(f"Class priors: {model.class_prior_}")
```
7. Bernoulli Naive Bayes
Description: A variant of Naive Bayes designed for binary/boolean features. Models each feature as a Bernoulli distribution. Penalizes the absence of a feature, unlike Multinomial NB.
Use Cases: Short text classification, binary feature data, sentiment analysis with binary bag-of-words.
```python
from sklearn.naive_bayes import BernoulliNB
from sklearn.feature_extraction.text import CountVectorizer

# Example with text data
texts = ["good movie", "bad movie", "great film", "terrible film", "amazing", "awful"]
labels = [1, 0, 1, 0, 1, 0]

vectorizer = CountVectorizer(binary=True)
X_text = vectorizer.fit_transform(texts)

model = BernoulliNB(alpha=1.0, binarize=0.0)
model.fit(X_text, labels)
print(f"Classes: {model.classes_}")
```
8. Multinomial Naive Bayes
Description: Designed for count-based features (word counts, term frequencies). Models the distribution of word occurrences using the multinomial distribution. The standard choice for text classification.
Use Cases: Document classification, spam filtering, topic labeling, language detection.
```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Text classification pipeline
model = make_pipeline(
    TfidfVectorizer(max_features=5000, ngram_range=(1, 2)),
    MultinomialNB(alpha=0.1)
)

# Tiny illustrative corpus; swap in 20newsgroups or your own text data
texts = ["free money now", "meeting at noon", "win a free prize", "project status update"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = ham
model.fit(texts, labels)
print(f"Prediction for 'free prize inside': {model.predict(['free prize inside'])[0]}")
```
9. Gradient Boosting Classifier
Description: Sequentially builds an ensemble of weak decision trees, where each new tree corrects the errors of the ensemble so far. Uses gradient descent on the loss function to determine the direction of improvement.
Use Cases: Credit risk modeling, churn prediction, click-through rate prediction, any structured data classification task.
```python
from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=3,
    subsample=0.8,
    random_state=42
)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
```
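The "each new tree corrects the ensemble so far" behavior can be watched directly with `staged_predict`, which replays predictions after each boosting round. A self-contained sketch on the same breast-cancer data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                   max_depth=3, random_state=42)
model.fit(X_train, y_train)

# staged_predict yields test-set predictions after each boosting round,
# so accuracy can be tracked as trees are added
for n, y_pred in enumerate(model.staged_predict(X_test), start=1):
    if n in (1, 10, 100):
        print(f"{n:3d} trees: accuracy = {accuracy_score(y_test, y_pred):.4f}")
```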
10. AdaBoost
Description: Adaptive Boosting combines multiple weak classifiers (typically decision stumps) by reweighting misclassified samples at each iteration. Samples that are hard to classify get higher weights, forcing subsequent learners to focus on them.
Use Cases: Face detection (Viola-Jones), binary classification, when a simple model needs boosting.
```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

model = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision stumps
    n_estimators=200,
    learning_rate=0.1,
    algorithm='SAMME',
    random_state=42
)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
```
11. XGBoost Classifier
Description: Extreme Gradient Boosting for classification. Adds L1/L2 regularization, handles missing values natively, supports parallel processing, and includes built-in cross-validation.
Use Cases: Competition-winning models, production ML, structured/tabular data, imbalanced classes (with scale_pos_weight).
```python
import xgboost as xgb

model = xgb.XGBClassifier(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=5,
    reg_alpha=0.1,
    reg_lambda=1.0,
    subsample=0.8,
    colsample_bytree=0.8,
    eval_metric='logloss',
    random_state=42,
    n_jobs=-1
)
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
```
12. LightGBM Classifier
Description: Microsoft's gradient boosting framework using histogram-based learning and leaf-wise tree growth. Faster than XGBoost on large datasets while achieving comparable or better accuracy.
Use Cases: Large-scale classification, real-time systems, when training speed matters.
```python
import lightgbm as lgb

model = lgb.LGBMClassifier(
    n_estimators=300,
    learning_rate=0.05,
    num_leaves=31,
    max_depth=-1,
    min_child_samples=20,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=42,
    n_jobs=-1,
    verbose=-1
)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
```
13. CatBoost Classifier
Description: Yandex's gradient boosting library with native categorical feature handling, ordered boosting to prevent target leakage, and a symmetric (oblivious) tree structure that makes inference fast; training can also run on GPU.
Use Cases: Datasets with many categorical features, minimal preprocessing, production deployment.
```python
from catboost import CatBoostClassifier

model = CatBoostClassifier(
    iterations=300,
    learning_rate=0.05,
    depth=6,
    l2_leaf_reg=3,
    random_seed=42,
    verbose=0
)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
```
14. Stochastic Gradient Descent (SGD) Classifier
Description: Implements linear classifiers (SVM, logistic regression) using stochastic gradient descent optimization. Processes one sample at a time, making it extremely efficient for very large datasets and online learning.
Use Cases: Very large datasets (millions of samples), online/incremental learning, text classification at scale.
```python
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

model = make_pipeline(
    StandardScaler(),
    SGDClassifier(
        loss='hinge',    # SVM loss; use 'log_loss' for logistic regression
        alpha=0.0001,    # Regularization strength
        max_iter=1000,
        tol=1e-3,
        random_state=42
    )
)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
```
15. Perceptron
Description: The simplest neural network: a single-layer linear classifier. Updates weights only when a misclassification occurs. Converges for linearly separable data but cannot handle XOR-like problems.
Use Cases: Linearly separable problems, educational purposes, extremely fast baseline classifier.
```python
from sklearn.linear_model import Perceptron
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

model = make_pipeline(
    StandardScaler(),
    Perceptron(
        max_iter=1000,
        eta0=0.1,
        tol=1e-3,
        random_state=42
    )
)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
```
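The mistake-driven update rule itself fits in a few lines: with labels in {-1, +1}, the weights move by eta * y * x only when a point is misclassified. A from-scratch sketch on toy separable data:

```python
import numpy as np

def perceptron_train(X, y, eta=0.1, epochs=20):
    """Perceptron rule: update w, b only when sign(w.x + b) disagrees with y."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified (or on the boundary)
                w += eta * yi * xi
                b += eta * yi
    return w, b

# Linearly separable toy data, labels in {-1, +1}
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 3.0], [4.0, 3.0]])
y = np.array([-1, -1, 1, 1])
w, b = perceptron_train(X, y)

preds = np.sign(X @ w + b)
print(preds)  # recovers the labels [-1, -1, 1, 1] on separable data
```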
16. Passive Aggressive Classifier
Description: An online learning algorithm that remains passive for correct classifications and becomes aggressive for misclassifications. Updates the model just enough to correct each mistake, controlled by the parameter C.
Use Cases: Online learning from streaming data, text classification, when data arrives sequentially.
```python
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

model = make_pipeline(
    StandardScaler(),
    PassiveAggressiveClassifier(
        C=1.0,           # Aggressiveness parameter
        max_iter=1000,
        tol=1e-3,
        random_state=42
    )
)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")

# Partial fit for online learning (scale X_new with the fitted scaler first,
# since calling the step directly bypasses the pipeline's StandardScaler)
# model.named_steps['passiveaggressiveclassifier'].partial_fit(X_new, y_new)
```
17. Naive Bayes (General)
Description: The Naive Bayes family applies Bayes' theorem with the "naive" assumption that all features are conditionally independent given the class. Despite this strong assumption, Naive Bayes classifiers work remarkably well in practice, especially for text data.
Use Cases: All variants are fast, work well with small training data, and provide probabilistic outputs. Choose Gaussian for continuous features, Multinomial for counts, and Bernoulli for binary features.
```python
from sklearn.naive_bayes import GaussianNB, BernoulliNB

# Compare Naive Bayes variants on the continuous breast-cancer features
# (MultinomialNB is omitted here: it expects count-style features)
classifiers = {
    'GaussianNB': GaussianNB(),
    'BernoulliNB': BernoulliNB(),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    acc = clf.score(X_test, y_test)
    print(f"{name}: Accuracy = {acc:.4f}")
```