Classification Algorithms (17)
Algorithms for predicting discrete class labels
Classification algorithms assign input data to predefined categories. From simple logistic regression to powerful gradient boosting, these 17 algorithms cover binary, multi-class, and multi-label classification tasks.
Quick Reference Table
| Algorithm | Type | Best For |
|---|---|---|
| Logistic Regression | Linear | Baseline, interpretable binary classification |
| KNN | Instance-based | Small datasets, simple boundaries |
| SVM | Kernel-based | High-dimensional, clear margins |
| Decision Tree | Tree-based | Interpretable, non-linear |
| Random Forest | Ensemble | General-purpose, robust |
| Gaussian NB | Probabilistic | Continuous features, fast |
| Bernoulli NB | Probabilistic | Binary features, text |
| Multinomial NB | Probabilistic | Text classification |
| Gradient Boosting | Ensemble | High accuracy, tabular data |
| AdaBoost | Ensemble | Weak learner boosting |
| XGBoost | Ensemble | Competitions, production |
| LightGBM | Ensemble | Large datasets, fast |
| CatBoost | Ensemble | Categorical features |
| SGD Classifier | Linear | Very large datasets, online learning |
| Perceptron | Linear | Linearly separable data |
| Passive Aggressive | Linear | Online learning, streaming |
| Naive Bayes (family) | Probabilistic | Choosing among NB variants |
1. Logistic Regression
Description: Despite its name, logistic regression is a classification algorithm. It models the probability of class membership using the logistic (sigmoid) function, producing probabilities between 0 and 1. The decision boundary is linear in the feature space.
Use Cases: Medical diagnosis (disease/no disease), spam detection, credit scoring, any binary classification baseline.
```python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=5000, C=1.0, solver='lbfgs')
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
```
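Under the hood, `predict_proba` is nothing more than the sigmoid applied to the linear score. A small self-contained check (same dataset, minimal settings) makes that concrete:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000).fit(X, y)

# Linear score z = X @ w + b, then sigmoid(z) = 1 / (1 + exp(-z))
z = X @ model.coef_.ravel() + model.intercept_[0]
manual_proba = 1.0 / (1.0 + np.exp(-z))

# Matches the positive-class column of predict_proba
assert np.allclose(manual_proba, model.predict_proba(X)[:, 1])
print("sigmoid(X @ coef + intercept) reproduces predict_proba")
```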
2. K-Nearest Neighbors (KNN)
Description: A lazy learner that classifies new data points based on the majority class of their k nearest neighbors in the feature space. Requires no training phase but stores all training data.
Use Cases: Recommendation systems, image recognition, anomaly detection, small-to-medium datasets.
```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

model = make_pipeline(
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=5, weights='distance', metric='minkowski')
)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
```
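The majority-vote idea fits in a few lines. A from-scratch toy sketch (illustrative only, not a replacement for `KNeighborsClassifier`):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances to x
    nearest = np.argsort(dists)[:k]              # indices of the k closest
    votes = Counter(y_train[nearest])            # count their class labels
    return votes.most_common(1)[0][0]            # majority class wins

# Toy data: class 0 clustered near the origin, class 1 near (5, 5)
X_train = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5]), k=3))  # -> 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.5]), k=3))  # -> 1
```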
3. Support Vector Machine (SVM)
Description: Finds the optimal hyperplane that maximizes the margin between classes. Uses the kernel trick to project data into higher-dimensional space for non-linear classification.
Use Cases: Text classification, image recognition, bioinformatics, high-dimensional data with clear margins.
```python
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

model = make_pipeline(
    StandardScaler(),
    SVC(kernel='rbf', C=1.0, gamma='scale', probability=True)
)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")

# Get probability predictions
proba = model.predict_proba(X_test)[:5]
print(f"Sample probabilities: {proba}")
```
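The kernel trick replaces dot products with a kernel function; for the RBF kernel used above, similarity decays with squared distance as K(x, z) = exp(-gamma * ||x - z||^2). A quick check of that formula against scikit-learn's implementation:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[1.0, 2.0]])
z = np.array([[2.0, 0.0]])
gamma = 0.5

# Manual evaluation of K(x, z) = exp(-gamma * ||x - z||^2)
manual = np.exp(-gamma * np.sum((x - z) ** 2))
library = rbf_kernel(x, z, gamma=gamma)[0, 0]

assert np.isclose(manual, library)
print(f"K(x, z) = {manual:.4f}")
```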
4. Decision Tree Classifier
Description: Builds a tree structure by recursively splitting the data based on feature values that maximize information gain (or minimize Gini impurity). Each leaf node represents a class prediction.
Use Cases: Medical diagnosis, customer segmentation, rule extraction, interpretable models.
```python
from sklearn.tree import DecisionTreeClassifier, export_text

model = DecisionTreeClassifier(
    max_depth=5,
    min_samples_split=10,
    criterion='gini',
    random_state=42
)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
print(f"Tree depth: {model.get_depth()}")
print(f"Number of leaves: {model.get_n_leaves()}")

# Print tree rules (first few lines)
tree_rules = export_text(model, feature_names=load_breast_cancer().feature_names.tolist())
print(tree_rules[:500])
```
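The Gini impurity mentioned above, and the impurity decrease that drives split selection, are easy to compute by hand. A minimal sketch:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_c^2) over class proportions p_c."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 50/50 mix: impurity 0.5
left   = np.array([0, 0, 0, 0])              # pure node: impurity 0.0
right  = np.array([1, 1, 1, 1])              # pure node: impurity 0.0

# The tree greedily picks the split with the largest weighted impurity drop
gain = gini(parent) - (len(left) * gini(left) + len(right) * gini(right)) / len(parent)
print(f"Impurity decrease: {gain:.2f}")  # -> 0.50
```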
5. Random Forest Classifier
Description: An ensemble of decision trees trained on bootstrap samples with random feature subsets. Final prediction is the majority vote across all trees. Reduces overfitting and variance compared to single trees.
Use Cases: General-purpose classification, fraud detection, churn prediction, feature importance analysis.
```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=200,
    max_depth=10,
    max_features='sqrt',
    min_samples_leaf=5,
    random_state=42,
    n_jobs=-1
)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")

# Feature importance
importances = model.feature_importances_
top_features = sorted(zip(load_breast_cancer().feature_names, importances),
                      key=lambda x: x[1], reverse=True)[:5]
for name, imp in top_features:
    print(f"  {name}: {imp:.4f}")
```
6. Gaussian Naive Bayes
Description: Applies Bayes' theorem with the assumption that features are independent and follow a Gaussian (normal) distribution. Despite its simplicity, often performs surprisingly well.
Use Cases: Text classification, medical diagnosis, real-time prediction, when features are approximately normal.
```python
from sklearn.naive_bayes import GaussianNB

model = GaussianNB(var_smoothing=1e-9)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
print(f"Class priors: {model.class_prior_}")
```
7. Bernoulli Naive Bayes
Description: A variant of Naive Bayes designed for binary/boolean features. Models each feature as a Bernoulli distribution. Penalizes the absence of a feature, unlike Multinomial NB.
Use Cases: Short text classification, binary feature data, sentiment analysis with binary bag-of-words.
```python
from sklearn.naive_bayes import BernoulliNB
from sklearn.feature_extraction.text import CountVectorizer

# Example with text data
texts = ["good movie", "bad movie", "great film", "terrible film", "amazing", "awful"]
labels = [1, 0, 1, 0, 1, 0]

vectorizer = CountVectorizer(binary=True)
X_text = vectorizer.fit_transform(texts)

model = BernoulliNB(alpha=1.0, binarize=0.0)
model.fit(X_text, labels)
print(f"Classes: {model.classes_}")
```
8. Multinomial Naive Bayes
Description: Designed for count-based features (word counts, term frequencies). Models the distribution of word occurrences using the multinomial distribution. The standard choice for text classification.
Use Cases: Document classification, spam filtering, topic labeling, language detection.
```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Text classification pipeline
model = make_pipeline(
    TfidfVectorizer(max_features=5000, ngram_range=(1, 2)),
    MultinomialNB(alpha=0.1)
)

# Tiny illustrative corpus; swap in 20newsgroups or your own text data
texts = ["free money now", "meeting at noon", "win a free prize", "project status update"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = ham
model.fit(texts, labels)
print(f"Prediction for 'free prize inside': {model.predict(['free prize inside'])[0]}")
```
9. Gradient Boosting Classifier
Description: Sequentially builds an ensemble of weak decision trees, where each new tree corrects the errors of the ensemble so far. Uses gradient descent on the loss function to determine the direction of improvement.
Use Cases: Credit risk modeling, churn prediction, click-through rate prediction, any structured data classification task.
```python
from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=3,
    subsample=0.8,
    random_state=42
)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
```
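The "each new tree corrects the ensemble so far" behavior can be watched directly with `staged_predict`, which replays predictions after each boosting round. A self-contained sketch on the same breast-cancer data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                   max_depth=3, random_state=42)
model.fit(X_train, y_train)

# staged_predict yields test-set predictions after each boosting round,
# so accuracy can be tracked as trees are added
for n, y_pred in enumerate(model.staged_predict(X_test), start=1):
    if n in (1, 10, 100):
        print(f"{n:3d} trees: accuracy = {accuracy_score(y_test, y_pred):.4f}")
```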
10. AdaBoost
Description: Adaptive Boosting combines multiple weak classifiers (typically decision stumps) by reweighting misclassified samples at each iteration. Samples that are hard to classify get higher weights, forcing subsequent learners to focus on them.
Use Cases: Face detection (Viola-Jones), binary classification, when a simple model needs boosting.
```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

model = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision stumps
    n_estimators=200,
    learning_rate=0.1,
    algorithm='SAMME',
    random_state=42
)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
```
11. XGBoost Classifier
Description: Extreme Gradient Boosting for classification. Adds L1/L2 regularization, handles missing values natively, supports parallel processing, and includes built-in cross-validation.
Use Cases: Competition-winning models, production ML, structured/tabular data, imbalanced classes (with scale_pos_weight).
```python
import xgboost as xgb

model = xgb.XGBClassifier(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=5,
    reg_alpha=0.1,
    reg_lambda=1.0,
    subsample=0.8,
    colsample_bytree=0.8,
    eval_metric='logloss',
    random_state=42,
    n_jobs=-1
)
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
```
12. LightGBM Classifier
Description: Microsoft's gradient boosting framework using histogram-based learning and leaf-wise tree growth. Faster than XGBoost on large datasets while achieving comparable or better accuracy.
Use Cases: Large-scale classification, real-time systems, when training speed matters.
```python
import lightgbm as lgb

model = lgb.LGBMClassifier(
    n_estimators=300,
    learning_rate=0.05,
    num_leaves=31,
    max_depth=-1,
    min_child_samples=20,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=42,
    n_jobs=-1,
    verbose=-1
)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
```
13. CatBoost Classifier
Description: Yandex's gradient boosting library with native categorical feature handling, ordered boosting to prevent target leakage, and a symmetric (oblivious) tree structure that makes inference fast; training can also run on GPU.
Use Cases: Datasets with many categorical features, minimal preprocessing, production deployment.
```python
from catboost import CatBoostClassifier

model = CatBoostClassifier(
    iterations=300,
    learning_rate=0.05,
    depth=6,
    l2_leaf_reg=3,
    random_seed=42,
    verbose=0
)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
```
14. Stochastic Gradient Descent (SGD) Classifier
Description: Implements linear classifiers (SVM, logistic regression) using stochastic gradient descent optimization. Processes one sample at a time, making it extremely efficient for very large datasets and online learning.
Use Cases: Very large datasets (millions of samples), online/incremental learning, text classification at scale.
```python
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

model = make_pipeline(
    StandardScaler(),
    SGDClassifier(
        loss='hinge',    # SVM loss; use 'log_loss' for logistic regression
        alpha=0.0001,    # Regularization strength
        max_iter=1000,
        tol=1e-3,
        random_state=42
    )
)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
```
15. Perceptron
Description: The simplest neural network: a single-layer linear classifier. Updates weights only when a misclassification occurs. Converges for linearly separable data but cannot handle XOR-like problems.
Use Cases: Linearly separable problems, educational purposes, extremely fast baseline classifier.
```python
from sklearn.linear_model import Perceptron
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

model = make_pipeline(
    StandardScaler(),
    Perceptron(
        max_iter=1000,
        eta0=0.1,
        tol=1e-3,
        random_state=42
    )
)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")
```
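The mistake-driven update rule itself fits in a few lines: with labels in {-1, +1}, the weights move by eta * y * x only when a point is misclassified. A from-scratch sketch on toy separable data:

```python
import numpy as np

def perceptron_train(X, y, eta=0.1, epochs=20):
    """Perceptron rule: update w, b only when sign(w.x + b) disagrees with y."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified (or on the boundary)
                w += eta * yi * xi
                b += eta * yi
    return w, b

# Linearly separable toy data, labels in {-1, +1}
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 3.0], [4.0, 3.0]])
y = np.array([-1, -1, 1, 1])
w, b = perceptron_train(X, y)

preds = np.sign(X @ w + b)
print(preds)  # recovers the labels [-1, -1, 1, 1] on separable data
```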
16. Passive Aggressive Classifier
Description: An online learning algorithm that remains passive for correct classifications and becomes aggressive for misclassifications. Updates the model just enough to correct each mistake, controlled by the parameter C.
Use Cases: Online learning from streaming data, text classification, when data arrives sequentially.
```python
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

model = make_pipeline(
    StandardScaler(),
    PassiveAggressiveClassifier(
        C=1.0,           # Aggressiveness parameter
        max_iter=1000,
        tol=1e-3,
        random_state=42
    )
)
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.4f}")

# Partial fit for online learning (scale X_new with the fitted scaler first,
# since calling the step directly bypasses the pipeline's StandardScaler)
# model.named_steps['passiveaggressiveclassifier'].partial_fit(X_new, y_new)
```
17. Naive Bayes (General)
Description: The Naive Bayes family applies Bayes' theorem with the "naive" assumption that all features are conditionally independent given the class. Despite this strong assumption, Naive Bayes classifiers work remarkably well in practice, especially for text data.
Use Cases: All variants are fast, work well with small training data, and provide probabilistic outputs. Choose Gaussian for continuous features, Multinomial for counts, and Bernoulli for binary features.
```python
from sklearn.naive_bayes import GaussianNB, BernoulliNB

# Compare Naive Bayes variants on the continuous breast-cancer features
# (MultinomialNB is omitted here: it expects count-style features)
classifiers = {
    'GaussianNB': GaussianNB(),
    'BernoulliNB': BernoulliNB(),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    acc = clf.score(X_test, y_test)
    print(f"{name}: Accuracy = {acc:.4f}")
```