Dimensionality Reduction Algorithms (10)

Algorithms for reducing feature space while preserving information

Dimensionality reduction transforms high-dimensional data into a lower-dimensional representation while retaining the most important structure. These techniques are essential for visualization, noise reduction, computation speedup, and overcoming the curse of dimensionality.

Quick Reference Table

AlgorithmLinear?Supervised?PreservesBest For
PCAYesNoVarianceGeneral-purpose reduction
Kernel PCANoNoVariance (in kernel space)Non-linear manifolds
LDAYesYesClass separabilityPre-classification feature extraction
t-SNENoNoLocal structureVisualization (2D/3D)
UMAPNoNoLocal + global structureVisualization, general reduction
ICAYesNoStatistical independenceSignal separation
Factor AnalysisYesNoLatent factorsPsychology, social science
NMFYesNoNon-negative partsTopic modeling, image parts
IsomapNoNoGeodesic distancesNon-linear manifolds
LLENoNoLocal geometryManifold unfolding

1. Principal Component Analysis (PCA)

Description: Finds orthogonal axes (principal components) that capture the maximum variance in the data. Projects data onto these axes to reduce dimensionality. The first component captures the most variance, the second captures the next most (orthogonal to the first), and so on.

Use Cases: Feature reduction, data visualization, noise filtering, preprocessing for ML models, image compression.

from sklearn.decomposition import PCA
from sklearn.datasets import load_digits
import numpy as np

X, y = load_digits(return_X_y=True)  # 64 features (8x8 images)

# Reduce to 2D for visualization
pca_2d = PCA(n_components=2)
X_2d = pca_2d.fit_transform(X)
print(f"Explained variance (2 components): {pca_2d.explained_variance_ratio_.sum():.4f}")

# Find number of components for 95% variance
pca_95 = PCA(n_components=0.95)
X_95 = pca_95.fit_transform(X)
print(f"Components for 95% variance: {pca_95.n_components_}")
print(f"Shape: {X.shape} -> {X_95.shape}")

2. Kernel PCA

Description: Extends PCA to capture non-linear relationships by applying the kernel trick. Maps data into a higher-dimensional space using a kernel function, then performs PCA in that space.

Use Cases: Non-linear feature extraction, when standard PCA cannot capture the data structure (e.g., concentric circles).

from sklearn.decomposition import KernelPCA

model = KernelPCA(
    n_components=2,
    kernel='rbf',        # 'linear', 'poly', 'rbf', 'sigmoid', 'cosine'
    gamma=0.01,
    random_state=42
)
X_kpca = model.fit_transform(X)
print(f"Transformed shape: {X_kpca.shape}")

3. Linear Discriminant Analysis (LDA)

Description: A supervised dimensionality reduction method that finds linear combinations of features that best separate classes. Maximizes the ratio of between-class variance to within-class variance. Can reduce to at most (n_classes - 1) dimensions.

Use Cases: Pre-processing for classification, face recognition, feature extraction when class labels are available.

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

model = LinearDiscriminantAnalysis(n_components=2)
X_lda = model.fit_transform(X, y)

print(f"Transformed shape: {X_lda.shape}")
print(f"Explained variance ratio: {model.explained_variance_ratio_}")
print(f"Max components: {min(len(set(y)) - 1, X.shape[1])}")

4. t-SNE

Description: t-distributed Stochastic Neighbor Embedding. A non-linear technique that models pairwise similarities in high-dimensional space and finds a low-dimensional embedding that preserves these similarities. Excels at revealing local cluster structure for visualization.

Use Cases: Data visualization in 2D/3D, exploring cluster structure, visualizing embeddings (word2vec, image features).

from sklearn.manifold import TSNE

model = TSNE(
    n_components=2,
    perplexity=30,          # Balance between local and global aspects
    learning_rate='auto',
    n_iter=1000,
    random_state=42
)
X_tsne = model.fit_transform(X)

print(f"Transformed shape: {X_tsne.shape}")
print(f"KL divergence: {model.kl_divergence_:.4f}")
# Note: t-SNE is for visualization only; cannot transform new data

5. UMAP

Description: Uniform Manifold Approximation and Projection. A non-linear reduction method based on manifold learning and topological data analysis. Faster than t-SNE and better at preserving global structure while still revealing local clusters.

Use Cases: Visualization (faster and more scalable than t-SNE), general-purpose dimensionality reduction, clustering preprocessing.

import umap

model = umap.UMAP(
    n_components=2,
    n_neighbors=15,          # Size of local neighborhood
    min_dist=0.1,            # Minimum distance between points
    metric='euclidean',
    random_state=42
)
X_umap = model.fit_transform(X)

print(f"Transformed shape: {X_umap.shape}")
# UMAP can transform new data (unlike t-SNE)
# X_new_umap = model.transform(X_new)

6. Independent Component Analysis (ICA)

Description: Separates a multivariate signal into independent non-Gaussian components. Unlike PCA (which finds uncorrelated components), ICA finds statistically independent components. The classic cocktail party problem: separating individual voices from mixed recordings.

Use Cases: Blind source separation, EEG/fMRI signal processing, feature extraction where independence matters.

from sklearn.decomposition import FastICA

model = FastICA(
    n_components=10,
    algorithm='parallel',    # 'parallel' or 'deflation'
    whiten='unit-variance',
    max_iter=200,
    random_state=42
)
X_ica = model.fit_transform(X)

print(f"Transformed shape: {X_ica.shape}")
print(f"Mixing matrix shape: {model.mixing_.shape}")
print(f"Number of iterations: {model.n_iter_}")

7. Factor Analysis

Description: A statistical method that models observed variables as linear combinations of a smaller number of latent (unobserved) factors plus noise. Unlike PCA, Factor Analysis models the noise (unique variance) for each variable separately.

Use Cases: Psychology (intelligence factors), social science surveys, identifying latent constructs behind observed variables.

from sklearn.decomposition import FactorAnalysis

model = FactorAnalysis(
    n_components=10,
    max_iter=1000,
    random_state=42
)
X_fa = model.fit_transform(X)

print(f"Transformed shape: {X_fa.shape}")
print(f"Log likelihood: {model.score(X):.4f}")
print(f"Noise variance (first 5): {model.noise_variance_[:5]}")

8. Non-negative Matrix Factorization (NMF)

Description: Decomposes a non-negative matrix V into two non-negative matrices W and H (V ≈ WH). The non-negativity constraint produces parts-based, additive representations that are often more interpretable than PCA.

Use Cases: Topic modeling (document-term matrices), face part decomposition, audio source separation, recommendation systems.

from sklearn.decomposition import NMF
from sklearn.datasets import load_digits

X_digits, _ = load_digits(return_X_y=True)

model = NMF(
    n_components=10,
    init='nndsvda',          # Non-negative Double SVD
    max_iter=500,
    random_state=42
)
W = model.fit_transform(X_digits)  # Document-topic matrix
H = model.components_              # Topic-term matrix

print(f"W shape (samples x components): {W.shape}")
print(f"H shape (components x features): {H.shape}")
print(f"Reconstruction error: {model.reconstruction_err_:.4f}")

9. Isomap

Description: Isometric Mapping. A non-linear reduction method that preserves geodesic distances (shortest paths along the manifold) rather than straight-line Euclidean distances. Builds a k-nearest-neighbor graph and uses shortest-path distances for embedding.

Use Cases: Unfolding curved manifolds (Swiss roll), when the true geometry is non-linear and distances along the surface matter.

from sklearn.manifold import Isomap

model = Isomap(
    n_neighbors=10,          # Neighbors for graph construction
    n_components=2
)
X_iso = model.fit_transform(X)

print(f"Transformed shape: {X_iso.shape}")
print(f"Reconstruction error: {model.reconstruction_error():.4f}")

10. Locally Linear Embedding (LLE)

Description: Preserves local geometry by representing each point as a linear combination of its nearest neighbors, then finding a lower-dimensional embedding that preserves these local relationships. Unfolds manifolds while maintaining local structure.

Use Cases: Manifold learning, unfolding curved surfaces, visualization of high-dimensional manifold data.

from sklearn.manifold import LocallyLinearEmbedding

model = LocallyLinearEmbedding(
    n_neighbors=10,
    n_components=2,
    method='standard',       # 'standard', 'hessian', 'modified', 'ltsa'
    random_state=42
)
X_lle = model.fit_transform(X)

print(f"Transformed shape: {X_lle.shape}")
print(f"Reconstruction error: {model.reconstruction_error_:.6f}")