Best Practices (Intermediate)

Working with probabilities in code requires careful attention to numerical precision. Probabilities are often tiny numbers that underflow to zero, and naive implementations silently produce wrong results. This lesson covers practical techniques for writing robust probabilistic ML code.

Work in Log-Space

Golden Rule: Always work with log-probabilities instead of raw probabilities. Multiplying tiny probabilities causes underflow (numbers too small for floating point). Adding log-probabilities is both numerically stable and computationally efficient.
Python
import numpy as np

# BAD: multiplying small probabilities
probs = [1e-100, 1e-200, 1e-150]
product = np.prod(probs)  # 0.0 (underflow!)

# GOOD: add log-probabilities
log_probs = np.log(probs)
log_product = np.sum(log_probs)  # approx. -1036.2 (correct!)

# LogSumExp trick for stable softmax
def log_sum_exp(x):
    c = np.max(x)
    return c + np.log(np.sum(np.exp(x - c)))

def stable_softmax(x):
    x_shifted = x - np.max(x)  # Prevent overflow
    exp_x = np.exp(x_shifted)
    return exp_x / np.sum(exp_x)
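To see why the max-shift matters, compare the naive softmax formula against the stable versions on large logits (a minimal sketch; the logit values are illustrative):

```python
import numpy as np

def log_sum_exp(x):
    c = np.max(x)
    return c + np.log(np.sum(np.exp(x - c)))

def stable_softmax(x):
    x_shifted = x - np.max(x)  # prevent overflow
    exp_x = np.exp(x_shifted)
    return exp_x / np.sum(exp_x)

logits = np.array([1000.0, 1001.0, 1002.0])

# Naive softmax overflows: np.exp(1000) -> inf, and inf/inf -> nan
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(logits) / np.sum(np.exp(logits))

# The shifted version only ever exponentiates values <= 0
stable = stable_softmax(logits)

# log_sum_exp likewise stays finite where the direct formula overflows
lse = log_sum_exp(logits)
```

Here `naive` is all `nan`, while `stable` is a valid distribution and `lse` is a finite log-normalizer near 1002.4.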

Practical Tips

  1. Add epsilon to prevent log(0)

    Use np.log(p + 1e-8) instead of np.log(p) to avoid -inf values. Choose an epsilon small enough (relative to your smallest meaningful probability) that it does not distort results.

  2. Use framework-provided functions

    PyTorch's F.cross_entropy and F.log_softmax are numerically stable. Do not implement your own unless necessary.

  3. Set random seeds for reproducibility

    Use np.random.seed(42) and torch.manual_seed(42) for reproducible experiments.

  4. Validate probability distributions

    Check that probabilities sum to 1 (discrete) or integrate to 1 (continuous). Normalization bugs, such as dividing by the wrong total or normalizing over the wrong axis, are common.
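Tips 1, 3, and 4 can be combined in a few lines (a sketch; the Dirichlet draw is just a convenient way to generate a valid discrete distribution):

```python
import numpy as np

np.random.seed(42)                       # Tip 3: fixed seed for reproducibility

probs = np.random.dirichlet(np.ones(5))  # a random valid discrete distribution
total = probs.sum()                      # Tip 4: should be (close to) 1.0

# Tip 1: an epsilon keeps the log finite even when a probability is exactly 0
eps = 1e-8
p_with_zero = np.append(probs, 0.0)      # deliberately include an exact zero
safe_log = np.log(p_with_zero + eps)     # no -inf values
```

Without the epsilon, `np.log(0.0)` would return `-inf` and emit a RuntimeWarning.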

Common Mistakes

Watch Out For:
  • Confusing PDF with probability: For continuous distributions, P(X = x) = 0. The PDF gives density, not probability.
  • Ignoring class imbalance: Prior probabilities matter. A classifier can achieve 99% accuracy by always predicting the majority class.
  • Assuming independence: Naive Bayes assumes feature independence, which rarely holds. Know when this assumption helps vs hurts.
  • Overfitting with MLE: MLE with small data overfits. Use MAP or regularization.
  • Forgetting base rates: Always consider prior probabilities when interpreting classifier outputs.
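The PDF-versus-probability distinction is easy to check numerically. Using only the standard Normal density formula and the error function (stdlib math, no external dependencies):

```python
import math

# A density can exceed 1: Normal(mean=0, sigma=0.1) evaluated at its mean
sigma = 0.1
density = 1.0 / (sigma * math.sqrt(2 * math.pi))  # ~3.99, clearly not a probability

# Probabilities come from integrating the density, e.g. P(|X| <= sigma)
# for a zero-mean Normal, computed via the error function:
prob = math.erf(1.0 / math.sqrt(2.0))  # ~0.683, the familiar one-sigma rule
```

A density value at a point is a rate per unit of x, not a probability; only integrals over intervals are probabilities.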

Useful Libraries

  • scipy.stats: distribution functions, statistical tests
  • PyTorch Distributions: differentiable probability distributions for deep learning
  • Pyro / NumPyro: probabilistic programming, Bayesian inference
  • scikit-learn: Naive Bayes, GMMs, density estimation

Course Complete!

You have completed the Probability for AI course. Continue with Optimization for ML to learn how these mathematical foundations come together in training algorithms.

Next Course: Optimization for ML →