Best Practices (Intermediate)
Applying calculus correctly in ML requires practical knowledge beyond the theory. This lesson covers gradient checking, numerical stability, leveraging autograd frameworks, and the most common mistakes practitioners make.
Gradient Checking
Always verify your analytical gradients against numerical approximations when implementing custom layers:
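import numpy as np

def gradient_check(f, grad_f, x, eps=1e-5, tol=1e-5):
    """Compare an analytical gradient with a central-difference approximation.

    f(x) must return a scalar; grad_f(x) must return an array shaped like x.
    """
    x = np.asarray(x, dtype=np.float64)  # integer inputs would truncate the eps steps
    analytical = grad_f(x)
    numerical = np.zeros_like(x)
    # np.ndindex visits every element, so x may have any shape, not only 1-D
    for idx in np.ndindex(x.shape):
        x_plus = x.copy()
        x_plus[idx] += eps
        x_minus = x.copy()
        x_minus[idx] -= eps
        numerical[idx] = (f(x_plus) - f(x_minus)) / (2 * eps)
    # Relative error; the 1e-8 term avoids 0/0 when both gradients are zero
    diff = np.linalg.norm(analytical - numerical)
    diff /= np.linalg.norm(analytical) + np.linalg.norm(numerical) + 1e-8
    return diff < tol, diff  # the first value should be True for a correct gradient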
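As an illustration, the sketch below runs the same central-difference comparison on the gradient of a logistic-regression loss with synthetic data. The helpers numerical_grad and relative_error are hypothetical stand-ins written to keep the block self-contained; the point is that a gradient missing its 1/n factor is flagged immediately, while the correct one passes easily.

```python
import numpy as np

def numerical_grad(f, w, eps=1e-5):
    """Central-difference gradient of a scalar function f at w (any shape)."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        w_plus = w.copy()
        w_plus[idx] += eps
        w_minus = w.copy()
        w_minus[idx] -= eps
        grad[idx] = (f(w_plus) - f(w_minus)) / (2 * eps)
    return grad

def relative_error(a, b):
    # Same scale-invariant metric that gradient_check uses
    return np.linalg.norm(a - b) / (np.linalg.norm(a) + np.linalg.norm(b) + 1e-8)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))      # 20 synthetic samples, 3 features
y = rng.integers(0, 2, size=20)   # synthetic binary labels
w = rng.normal(size=3)

def loss(w):
    # Mean binary cross-entropy of a logistic-regression model
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad_correct(w):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

def grad_buggy(w):
    # Deliberate bug: forgot to divide by the number of samples
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y)

num = numerical_grad(loss, w)
err_ok = relative_error(grad_correct(w), num)
err_bug = relative_error(grad_buggy(w), num)
print(f"correct gradient: {err_ok:.1e}")   # far below 1e-5
print(f"buggy gradient:   {err_bug:.1e}")  # about 0.9, so the check fails
```

Because the buggy gradient is exactly 20 times the correct one here (20 samples), its relative error is about (20 - 1) / (20 + 1), roughly 0.905, whatever the data.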
Practical Tips
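- Use autograd frameworks: PyTorch, JAX, and TensorFlow compute gradients automatically. Implement gradients by hand only for custom operations or as a learning exercise.
- Monitor gradient norms: Track the L2 norm of the gradients during training. As a rough rule of thumb, norms far above 1000 (exploding) or below 1e-7 (vanishing) point to architecture or hyperparameter problems; the healthy range depends on the model and the scale of the loss.
- Use gradient clipping: When the gradient norm exceeds a maximum value (1.0 and 5.0 are common choices), rescale the gradients down to that norm. This prevents exploding gradients and is especially common when training RNNs.
- Choose activations wisely: ReLU has derivative 1 for positive inputs, so it does not shrink gradients there. Sigmoid and tanh saturate: their derivatives approach 0 for large |x| (and sigmoid's never exceeds 0.25), so multiplying them across many layers can make gradients vanish in deep networks.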
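A minimal NumPy sketch of the monitoring and clipping tips, assuming gradients arrive as a list of arrays (one per parameter tensor). It computes the global L2 norm, labels it with the rough thresholds mentioned above, and clips by global norm. The helper names global_grad_norm, diagnose, and clip_by_global_norm are hypothetical; the clipping rule is similar in spirit to PyTorch's torch.nn.utils.clip_grad_norm_, which also scales every gradient by max_norm / (total_norm + 1e-6) when that factor is below 1.

```python
import numpy as np

def global_grad_norm(grads):
    """L2 norm of all gradient arrays, treated as one concatenated vector."""
    return float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))

def diagnose(norm, high=1e3, low=1e-7):
    """Label a gradient norm using rough exploding/vanishing thresholds."""
    if norm > high:
        return "exploding"
    if norm < low:
        return "vanishing"
    return "ok"

def clip_by_global_norm(grads, max_norm=1.0, eps=1e-6):
    """Scale all gradients by one shared factor so their global norm is at most max_norm."""
    total = global_grad_norm(grads)
    scale = min(1.0, max_norm / (total + eps))  # never scales gradients up
    return [g * scale for g in grads], total

# Hypothetical gradients for two parameter tensors
grads = [np.array([30.0, 0.0]), np.array([[40.0]])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=5.0)
print(norm_before, diagnose(norm_before))   # 50.0 ok
print(f"{global_grad_norm(clipped):.3f}")   # 5.000
```

Unlike clipping each element independently (clipping by value), scaling every tensor by one shared factor preserves the direction of the overall update.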
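A toy calculation makes the activation tip concrete. Backpropagation multiplies the local activation derivative of every layer, so the sketch below multiplies those derivatives along a single path through 20 layers. Fixing all weights at 1 is a simplifying assumption that isolates the activation's effect, and chain_factor is a hypothetical helper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # peaks at 0.25 when z == 0

def relu_grad(z):
    return (z > 0).astype(float)    # exactly 1 for positive inputs, 0 otherwise

def chain_factor(act_grad, pre_activations):
    """Product of local activation derivatives along one path (weights taken as 1)."""
    return float(np.prod(act_grad(pre_activations)))

depth = 20
z_sigmoid = np.zeros(depth)      # sigmoid's best case: derivative is exactly 0.25
z_relu = np.full(depth, 0.5)     # any positive pre-activation gives ReLU derivative 1

sig_factor = chain_factor(sigmoid_grad, z_sigmoid)
relu_factor = chain_factor(relu_grad, z_relu)
print(sig_factor)    # 0.25**20 = 2**-40, about 9.1e-13
print(relu_factor)   # 1.0
```

Even in sigmoid's most favorable case the signal shrinks by a factor of about 1e-12 over 20 layers, while ReLU passes it through unchanged for positive inputs (and blocks it entirely for negative ones).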
Common Pitfalls
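- Forgetting to zero gradients: PyTorch accumulates gradients into each parameter's .grad attribute, so call optimizer.zero_grad() before each backward pass.
- In-place operations: Modifying a tensor in place (for example y += 1 or y.relu_()) can overwrite values that autograd saved for the backward pass, and PyTorch then raises a RuntimeError during backward().
- Detaching when you should not: .detach() and .item() return values that are disconnected from the computational graph, so gradients stop flowing through them.
- Wrong loss reduction: Check whether your loss averages (mean) or sums over the batch. Summing makes gradient magnitudes scale with the batch size, which changes the effective learning rate.
- NaN gradients: These are usually caused by log(0), division by zero, or overflow. Add a small epsilon to denominators and log arguments.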
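The sketch below reproduces two of these NaN sources in NumPy and applies a fix to each: an epsilon inside the log, and, for overflow, the standard max-subtraction trick for softmax (a technique not named in the list above, included as one common remedy). The eps value of 1e-12 is an illustrative assumption, not a recommended constant.

```python
import numpy as np

# 1) log(0): binary cross-entropy terms y * log(p) with a prediction of exactly 0
p = np.array([0.0, 0.5])   # predicted probabilities
y = np.array([0.0, 1.0])   # labels
with np.errstate(divide="ignore", invalid="ignore"):
    terms_naive = y * np.log(p)        # 0 * log(0) = 0 * -inf = NaN
eps = 1e-12                            # illustrative choice of epsilon
terms_safe = y * np.log(p + eps)       # every term is finite

# 2) Overflow: softmax on large logits
def softmax_naive(z):
    e = np.exp(z)
    return e / e.sum()

def softmax_stable(z):
    # Subtracting max(z) leaves the result unchanged mathematically
    # but keeps every exponent <= 0, so exp() cannot overflow
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([1000.0, 1001.0, 1002.0])
with np.errstate(over="ignore", invalid="ignore"):
    probs_naive = softmax_naive(logits)   # exp(1000) = inf, and inf / inf = NaN
probs_stable = softmax_stable(logits)

print(terms_naive, terms_safe)
print(probs_naive, probs_stable)
```

Once a single NaN appears in a forward pass, it propagates through every gradient that depends on it, which is why these guards belong in the loss computation rather than after the fact.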
Course Complete!
You have completed the Calculus for ML course. Continue with Probability for AI to round out your mathematical foundation.