Best Practices (Intermediate)

Working with probabilities in code requires careful attention to numerical precision. Probabilities are often tiny numbers that underflow to zero, and naive implementations silently produce wrong results. This lesson covers practical techniques for writing robust probabilistic ML code.

Work in Log-Space

Golden Rule: Always work with log-probabilities instead of raw probabilities. Multiplying tiny probabilities causes underflow (numbers too small for floating point). Adding log-probabilities is both numerically stable and computationally efficient.
Python
import numpy as np

# BAD: multiplying small probabilities
probs = [1e-100, 1e-200, 1e-150]
product = np.prod(probs)  # 0.0 (underflow!)

# GOOD: add log-probabilities
log_probs = np.log(probs)
log_product = np.sum(log_probs)  # approx. -1036.2 (correct!)

# LogSumExp trick for stable softmax
def log_sum_exp(x):
    c = np.max(x)
    return c + np.log(np.sum(np.exp(x - c)))

def stable_softmax(x):
    x_shifted = x - np.max(x)  # Prevent overflow
    exp_x = np.exp(x_shifted)
    return exp_x / np.sum(exp_x)
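To see why the max-shift matters, compare the naive softmax formula against the stable versions on large logits (a minimal sketch; the logit values are illustrative):

```python
import numpy as np

def log_sum_exp(x):
    c = np.max(x)
    return c + np.log(np.sum(np.exp(x - c)))

def stable_softmax(x):
    x_shifted = x - np.max(x)  # prevent overflow
    exp_x = np.exp(x_shifted)
    return exp_x / np.sum(exp_x)

logits = np.array([1000.0, 1001.0, 1002.0])

# Naive softmax overflows: np.exp(1000) -> inf, and inf/inf -> nan
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(logits) / np.sum(np.exp(logits))

# The shifted version only ever exponentiates values <= 0
stable = stable_softmax(logits)

# log_sum_exp likewise stays finite where the direct formula overflows
lse = log_sum_exp(logits)
```

Here `naive` is all `nan`, while `stable` is a valid distribution and `lse` is a finite log-normalizer near 1002.4.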

Practical Tips

  1. Add epsilon to prevent log(0)

    Use np.log(p + 1e-8) instead of np.log(p) to avoid -inf values. Choose an epsilon small enough (relative to your smallest meaningful probability) that it does not distort results.

  2. Use framework-provided functions

    PyTorch's F.cross_entropy and F.log_softmax are numerically stable. Do not implement your own unless necessary.

  3. Set random seeds for reproducibility

    Use np.random.seed(42) and torch.manual_seed(42) for reproducible experiments.

  4. Validate probability distributions

    Check that probabilities sum to 1 (discrete) or integrate to 1 (continuous). Normalization bugs, such as dividing by the wrong total or normalizing over the wrong axis, are common.
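Tips 1, 3, and 4 can be combined in a few lines (a sketch; the Dirichlet draw is just a convenient way to generate a valid discrete distribution):

```python
import numpy as np

np.random.seed(42)                       # Tip 3: fixed seed for reproducibility

probs = np.random.dirichlet(np.ones(5))  # a random valid discrete distribution
total = probs.sum()                      # Tip 4: should be (close to) 1.0

# Tip 1: an epsilon keeps the log finite even when a probability is exactly 0
eps = 1e-8
p_with_zero = np.append(probs, 0.0)      # deliberately include an exact zero
safe_log = np.log(p_with_zero + eps)     # no -inf values
```

Without the epsilon, `np.log(0.0)` would return `-inf` and emit a RuntimeWarning.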

Common Mistakes

Watch Out For:
  • Confusing PDF with probability: For continuous distributions, P(X = x) = 0. The PDF gives density, not probability.
  • Ignoring class imbalance: Prior probabilities matter. A classifier can achieve 99% accuracy by always predicting the majority class.
  • Assuming independence: Naive Bayes assumes feature independence, which rarely holds. Know when this assumption helps vs hurts.
  • Overfitting with MLE: MLE with small data overfits. Use MAP or regularization.
  • Forgetting base rates: Always consider prior probabilities when interpreting classifier outputs.
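The PDF-versus-probability distinction is easy to check numerically. Using only the standard Normal density formula and the error function (stdlib math, no external dependencies):

```python
import math

# A density can exceed 1: Normal(mean=0, sigma=0.1) evaluated at its mean
sigma = 0.1
density = 1.0 / (sigma * math.sqrt(2 * math.pi))  # ~3.99, clearly not a probability

# Probabilities come from integrating the density, e.g. P(|X| <= sigma)
# for a zero-mean Normal, computed via the error function:
prob = math.erf(1.0 / math.sqrt(2.0))  # ~0.683, the familiar one-sigma rule
```

A density value at a point is a rate per unit of x, not a probability; only integrals over intervals are probabilities.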

Useful Libraries

  • scipy.stats: distribution functions, statistical tests
  • PyTorch Distributions: differentiable probability distributions for deep learning
  • Pyro / NumPyro: probabilistic programming, Bayesian inference
  • scikit-learn: Naive Bayes, GMMs, density estimation

Course Complete!

You have completed the Probability for AI course. Continue with Optimization for ML to learn how these mathematical foundations come together in training algorithms.

Next Course: Optimization for ML →