Best Practices
Working with probabilities in code requires careful attention to numerical precision. Probabilities are often very small numbers that can underflow to zero, and naive implementations can silently produce incorrect results. This lesson covers practical techniques for writing robust probabilistic ML code.
Work in Log-Space
```python
import numpy as np

# BAD: multiplying small probabilities underflows to zero
probs = [1e-100, 1e-200, 1e-150]
product = np.prod(probs)  # 0.0 (underflow!)

# GOOD: add log-probabilities instead
log_probs = np.log(probs)
log_product = np.sum(log_probs)  # ~-1036.2 (correct)

# LogSumExp trick for numerically stable log of a sum of exponentials
def log_sum_exp(x):
    c = np.max(x)
    return c + np.log(np.sum(np.exp(x - c)))

def stable_softmax(x):
    x_shifted = x - np.max(x)  # subtract the max to prevent overflow in exp
    exp_x = np.exp(x_shifted)
    return exp_x / np.sum(exp_x)
```
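To see why the max-shift matters, here is a small check on large logits (`stable_softmax` is redefined so the snippet runs on its own; the logit values are illustrative):

```python
import numpy as np

def stable_softmax(x):
    x_shifted = x - np.max(x)  # subtract the max to prevent exp overflow
    exp_x = np.exp(x_shifted)
    return exp_x / np.sum(exp_x)

logits = np.array([1000.0, 1001.0, 1002.0])

# Naive softmax: exp(1000) overflows to inf, so inf / inf gives nan
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(logits) / np.sum(np.exp(logits))

stable = stable_softmax(logits)
print(np.isnan(naive).any())          # True: the naive version breaks
print(stable)                         # a well-defined distribution
print(np.isclose(stable.sum(), 1.0))  # True
```

The shifted version works because softmax is invariant to subtracting a constant from every logit, and after the shift the largest argument to `exp` is exactly 0.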
Practical Tips
- Add epsilon to prevent log(0): Always use `np.log(p + 1e-8)` instead of `np.log(p)` to avoid `-inf` values.
- Use framework-provided functions: PyTorch's `F.cross_entropy` and `F.log_softmax` are numerically stable. Do not implement your own unless necessary.
- Set random seeds for reproducibility: Use `np.random.seed(42)` and `torch.manual_seed(42)` for reproducible experiments.
- Validate probability distributions: Check that probabilities sum to 1 (discrete) or integrate to 1 (continuous). Off-by-one errors in normalization are common.
Common Mistakes
- Confusing PDF with probability: For continuous distributions, P(X = x) = 0. The PDF gives density, not probability.
- Ignoring class imbalance: Prior probabilities matter. A classifier can achieve 99% accuracy by always predicting the majority class.
- Assuming independence: Naive Bayes assumes feature independence, which rarely holds. Know when this assumption helps vs hurts.
- Overfitting with MLE: MLE with small data overfits. Use MAP or regularization.
- Forgetting base rates: Always consider prior probabilities when interpreting classifier outputs.
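Two of the mistakes above are easy to demonstrate concretely (the distribution parameters and 99:1 class split below are illustrative, not from any real dataset):

```python
import numpy as np
from scipy.stats import norm

# 1. A PDF is a density, not a probability -- it can exceed 1.
#    A narrow normal N(0, 0.1) has density ~3.99 at its mean.
density = norm(loc=0, scale=0.1).pdf(0.0)
print(density)  # ~3.99, perfectly valid for a density

# 2. Class imbalance: always predicting the majority class looks accurate.
y_true = np.array([0] * 99 + [1])   # 99% negative class
y_pred = np.zeros_like(y_true)      # constant majority-class predictor
accuracy = (y_pred == y_true).mean()
print(accuracy)  # 0.99, despite never detecting the positive class
```

This is why metrics like precision, recall, or AUC matter more than raw accuracy on imbalanced data.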
Useful Libraries
| Library | Use Case |
|---|---|
| scipy.stats | Distribution functions, statistical tests |
| PyTorch Distributions | Differentiable probability distributions for deep learning |
| Pyro / NumPyro | Probabilistic programming, Bayesian inference |
| scikit-learn | Naive Bayes, GMMs, density estimation |
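As a starting point with `scipy.stats`, here is a brief tour of its distribution API (the specific values are easy to verify by hand):

```python
from scipy.stats import norm, binom

# Standard normal CDF: P(Z <= 1.96) is approximately 0.975
print(norm.cdf(1.96))

# Discrete PMF: P(X = 3) for X ~ Binomial(n=10, p=0.5) is C(10,3)/2^10
print(binom.pmf(3, n=10, p=0.5))  # 0.1171875

# Reproducible sampling via random_state
sample = norm.rvs(size=1000, random_state=0)
print(sample.mean())  # close to 0 for a standard normal
```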
Course Complete!
You have completed the Probability for AI course. Continue with Optimization for ML to learn how these mathematical foundations come together in training algorithms.
Next Course: Optimization for ML →
Lilly Tech Systems