Distribution Shift

Learn to detect when production data diverges from training data, understand the types of distribution shift, and implement monitoring systems that alert you before model performance degrades.

Types of Distribution Shift

Distribution shift occurs when the data your model encounters in production differs from its training data. Understanding the specific type of shift is crucial for choosing the right mitigation strategy.

📈

Covariate Shift

Input distribution P(X) changes but the relationship P(Y|X) stays the same. Example: a model trained on professional photos encounters smartphone images.

🔄

Concept Drift

The relationship P(Y|X) changes over time. Example: what counts as spam evolves as spammers adapt their techniques.

🚫

Prior Probability Shift

The class distribution P(Y) changes. Example: seasonal changes cause different proportions of product categories in sales data.

Out-of-Distribution Detection

OOD detection identifies inputs that are significantly different from the training distribution. These inputs are likely to produce unreliable predictions and should be flagged or rejected.

Python - OOD Detection with Energy Score
import torch
import torch.nn.functional as F

def energy_score(model, inputs, temperature=1.0):
    """Compute energy score for OOD detection.
    Lower energy = more likely in-distribution.
    Higher energy = more likely out-of-distribution."""
    with torch.no_grad():
        logits = model(inputs)
        energy = -temperature * torch.logsumexp(
            logits / temperature, dim=1
        )
    return energy

def detect_ood(model, inputs, threshold=-5.0):
    """Flag inputs as OOD if energy exceeds threshold.
    The default is a placeholder: tune the threshold on
    held-out in-distribution data, e.g. the 95th percentile
    of validation energies."""
    energies = energy_score(model, inputs)
    is_ood = energies > threshold
    return is_ood, energies
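In practice the threshold is calibrated on held-out in-distribution data rather than hard-coded. A self-contained sketch of that calibration step follows; the inline `energy` helper mirrors the energy score above, and the untrained `nn.Linear` model is a hypothetical stand-in for your trained classifier.

```python
import torch
import torch.nn as nn

def energy(logits, T=1.0):
    # Same energy score as above: -T * logsumexp(logits / T).
    return -T * torch.logsumexp(logits / T, dim=1)

model = nn.Linear(16, 4)  # stand-in for a trained classifier
model.eval()

# Calibrate on held-out in-distribution data: flag anything with
# higher energy than 95% of validation inputs.
with torch.no_grad():
    val_energies = energy(model(torch.randn(256, 16)))
threshold = torch.quantile(val_energies, 0.95).item()

with torch.no_grad():
    test_energies = energy(model(torch.randn(32, 16)))
is_ood = test_energies > threshold
```

The 95th-percentile choice fixes the false-positive rate on in-distribution data at roughly 5%; pick the percentile to match how costly false OOD alarms are in your application.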

Detecting Covariate Shift

Statistical tests and monitoring tools can detect when input distributions change:

| Method | How It Works | Best For |
| --- | --- | --- |
| KS Test | Compares CDFs of reference and production feature distributions | Univariate numerical features |
| MMD (Maximum Mean Discrepancy) | Measures distance between distributions in a kernel space | High-dimensional data, embeddings |
| Population Stability Index (PSI) | Quantifies shift in binned distributions | Tabular data, credit scoring |
| Domain Classifier | Trains a classifier to distinguish training from production data | Complex multivariate shifts |
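The KS test row can be sketched in a few lines with SciPy. The synthetic data and the 0.01 significance level are illustrative choices, not recommendations:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5000)   # training-time feature
production = rng.normal(loc=0.5, scale=1.0, size=5000)  # feature with a mean shift

# Two-sample KS test: compares the empirical CDFs of both samples.
statistic, p_value = stats.ks_2samp(reference, production)
drift_detected = p_value < 0.01
```

Note that with large production samples even tiny, harmless shifts become statistically significant, so it helps to pair the p-value with an effect-size check on the statistic itself.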

Domain Adaptation Strategies

When distribution shift is detected, several strategies can help maintain model performance:

  1. Data Augmentation

    Augment training data to cover a wider range of variations. Use transformations that simulate the types of shift you expect in production.

  2. Importance Weighting

    Re-weight training samples to match the production distribution. Samples similar to production data receive higher weights during training.

  3. Domain-Adversarial Training

    Train the model to learn domain-invariant features by adding an adversarial domain classifier that the feature extractor learns to fool.

  4. Continual Learning

    Periodically retrain or fine-tune the model on recent production data while preserving performance on older data.
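Importance weighting (strategy 2) is often implemented with the same domain classifier used for detection: if a classifier estimates P(production | x), then w(x) = P(prod | x) / (1 − P(prod | x)), rescaled by the sample-size ratio, approximates the density ratio p_prod(x) / p_train(x). A minimal sketch, assuming scikit-learn and synthetic Gaussian data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def importance_weights(X_train, X_prod):
    """Estimate w(x) = p_prod(x) / p_train(x) via a domain classifier."""
    X = np.vstack([X_train, X_prod])
    y = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_prod))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    p_prod = clf.predict_proba(X_train)[:, 1]
    # Odds ratio, corrected for the class imbalance between the two samples.
    return p_prod / (1.0 - p_prod) * (len(X_train) / len(X_prod))

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(2000, 1))
X_prod = rng.normal(1.0, 1.0, size=(2000, 1))  # production shifted right

weights = importance_weights(X_train, X_prod)
# Training samples that resemble production data receive higher weight;
# pass these as sample_weight when refitting the model.
```

Most scikit-learn estimators accept these directly via `fit(X, y, sample_weight=weights)`.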

Production tip: Set up automated drift monitoring from day one. Tools like Evidently AI, WhyLabs, and NannyML can continuously track feature distributions and model performance, alerting you before degradation becomes critical.
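If you want to understand what such tools compute under the hood, the PSI metric from the table above is easy to implement yourself. A minimal sketch using quantile bins (the 10-bin default and the epsilon clamp are illustrative choices):

```python
import numpy as np

def population_stability_index(reference, production, bins=10):
    """PSI over quantile bins of the reference distribution.
    Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift,
    > 0.25 major shift."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range values
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    prod_frac = np.histogram(production, bins=edges)[0] / len(production)
    eps = 1e-6  # avoid log(0) in empty bins
    ref_frac = np.clip(ref_frac, eps, None)
    prod_frac = np.clip(prod_frac, eps, None)
    return float(np.sum((prod_frac - ref_frac) * np.log(prod_frac / ref_frac)))

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)
psi_stable = population_stability_index(reference, rng.normal(0.0, 1.0, 5000))
psi_shifted = population_stability_index(reference, rng.normal(0.5, 1.0, 5000))
```

Quantile bins keep every reference bin equally populated, which makes the per-bin fractions stable; fixed-width bins can leave near-empty bins that dominate the sum.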