Best Practices (Intermediate)

Applying calculus correctly in ML requires practical knowledge beyond the theory. This lesson covers gradient checking, numerical stability, leveraging autograd frameworks, and the most common mistakes practitioners make.

Gradient Checking

Always verify your analytical gradients against numerical approximations when implementing custom layers:

Python
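import numpy as np

def gradient_check(f, grad_f, x, eps=1e-5):
    """Compare an analytical gradient with a central-difference approximation.

    f maps a 1-D array to a scalar; grad_f returns the gradient of f at x.
    Returns (passed, relative_error).
    """
    x = np.asarray(x, dtype=float)  # integer arrays cannot hold the +/- eps steps
    analytical = grad_f(x)
    numerical = np.zeros_like(x)
    for i in range(len(x)):
        # Perturb one coordinate at a time, in both directions
        x_plus = x.copy()
        x_plus[i] += eps
        x_minus = x.copy()
        x_minus[i] -= eps
        numerical[i] = (f(x_plus) - f(x_minus)) / (2 * eps)

    # Relative error; the 1e-8 avoids dividing by zero when both gradients vanish
    diff = np.linalg.norm(analytical - numerical)
    diff /= np.linalg.norm(analytical) + np.linalg.norm(numerical) + 1e-8
    return diff < 1e-5, diff  # First value should be True for a correct gradient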
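
As a usage sketch, the checker above can be run against a function whose gradient is known. The names loss, correct_grad, and buggy_grad are made up for this illustration, and the checker is repeated in compact form only so the snippet runs on its own:

```python
import numpy as np

def gradient_check(f, grad_f, x, eps=1e-5):
    """Compact copy of the lesson's central-difference check."""
    analytical = grad_f(x)
    numerical = np.zeros_like(x)
    for i in range(len(x)):
        x_plus = x.copy()
        x_plus[i] += eps
        x_minus = x.copy()
        x_minus[i] -= eps
        numerical[i] = (f(x_plus) - f(x_minus)) / (2 * eps)
    diff = np.linalg.norm(analytical - numerical)
    diff /= np.linalg.norm(analytical) + np.linalg.norm(numerical) + 1e-8
    return diff < 1e-5, diff

def loss(x):
    """Illustrative scalar function: f(x) = sum of x_i squared."""
    return np.sum(x ** 2)

def correct_grad(x):
    """True gradient of loss: 2x."""
    return 2 * x

def buggy_grad(x):
    """Deliberately wrong gradient: the factor 2 is missing."""
    return x

x0 = np.array([1.0, -2.0, 3.0])
ok_correct, err_correct = gradient_check(loss, correct_grad, x0)
ok_buggy, err_buggy = gradient_check(loss, buggy_grad, x0)

print("correct gradient passes:", ok_correct, "relative error:", err_correct)
print("buggy gradient passes:", ok_buggy, "relative error:", err_buggy)
```

The correct gradient should pass with a very small relative error, while the buggy one fails with a relative error of roughly 1/3, because the analytical result is half the numerical one.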

Practical Tips

  1. Use autograd frameworks

    PyTorch, JAX, and TensorFlow compute gradients automatically. Only implement manual gradients for custom operations or learning purposes.

  2. Monitor gradient norms

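    Track the L2 norm of gradients during training. As rough rules of thumb, norms above about 1000 suggest exploding gradients and norms below about 1e-7 suggest vanishing gradients; either usually points to an architecture or hyperparameter issue.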

  3. Use gradient clipping

    Clip gradient norms to a maximum value (typically 1.0 or 5.0) to prevent exploding gradients, especially in RNNs.

  4. Choose activations wisely

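    ReLU has a gradient of exactly 1 for positive inputs, so it does not shrink the signal there, although units whose inputs stay negative receive zero gradient. Sigmoid and tanh saturate for large |x| (the sigmoid derivative never exceeds 0.25), so their gradients can vanish when many layers are stacked.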
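
Tips 2 to 4 can be sketched in plain NumPy. The helpers global_grad_norm, diagnose, and clip_by_global_norm are hypothetical names written for this illustration, and the gradient values are made up. The clipping rule rescales all gradients by one shared factor, which is similar in spirit to what torch.nn.utils.clip_grad_norm_ does in PyTorch. The activation comparison ignores weight matrices and only tracks the activation-derivative factor, so treat it as a simplified picture rather than a full analysis:

```python
import numpy as np

def global_grad_norm(grads):
    """L2 norm of all gradient arrays, treated as one long vector."""
    return float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))

def diagnose(norm, high=1e3, low=1e-7):
    """Label a gradient norm using the rough thresholds from tip 2."""
    if not np.isfinite(norm):
        return "non-finite"
    if norm > high:
        return "exploding"
    if norm < low:
        return "vanishing"
    return "ok"

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale every gradient by one shared factor so the global norm is at most max_norm."""
    norm = global_grad_norm(grads)
    scale = min(1.0, max_norm / (norm + 1e-6))  # 1e-6 avoids division by zero
    return [g * scale for g in grads], norm

# Tips 2 and 3: two made-up parameter gradients, global norm sqrt(9 + 16 + 144) = 13
grads = [np.array([3.0, 4.0]), np.array([[12.0]])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = global_grad_norm(clipped)
print("norm before:", norm_before, "norm after:", norm_after)

# Tip 4: each sigmoid layer multiplies the backpropagated signal by at most 0.25,
# while ReLU multiplies it by 1 on positive inputs (weights ignored here).
depth = 20
sigmoid_bound = 0.25 ** depth
relu_factor = 1.0 ** depth
print("sigmoid, 20 layers:", sigmoid_bound, diagnose(sigmoid_bound))
print("ReLU, 20 layers:", relu_factor, diagnose(relu_factor))
```

Clipping by the global norm keeps the direction of the overall update and only shortens it, which is why a single shared scale factor is used instead of clipping each array separately.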

Common Pitfalls

Watch Out For:
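  • Forgetting to zero gradients: PyTorch accumulates gradients across backward() calls, so call optimizer.zero_grad() before each backward pass.
  • In-place operations: Modifying a tensor in-place can overwrite values that autograd needs for the backward pass, which breaks the computational graph.
  • Detaching when you should not: .detach() returns a tensor cut off from the graph and .item() returns a plain Python number, so no gradient flows through either.
  • Wrong loss reduction: Make sure your loss function uses the intended mean/sum reduction. A sum scales the gradient with batch size, which changes the effective learning rate.
  • NaN gradients: Often caused by log(0), division by zero, or overflow. Add a small epsilon to denominators and log arguments.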
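
Two of these pitfalls are easy to reproduce in NumPy. The sketch below uses hypothetical helpers (bce_naive, bce_safe, mse_grad) and made-up data. It shows a binary cross-entropy that returns NaN when a predicted probability is exactly 0 or 1, one possible fix that clips probabilities by an assumed eps of 1e-7, and how a sum reduction scales the gradient by the batch size compared with a mean:

```python
import numpy as np

def bce_naive(p, y):
    """Binary cross-entropy with no protection against log(0)."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def bce_safe(p, y, eps=1e-7):
    """Same loss, with probabilities clipped away from 0 and 1 first."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def mse_grad(w, x, y, reduction="mean"):
    """Gradient of the squared error of the model w * x with respect to scalar w."""
    per_example = 2 * (w * x - y) * x
    return per_example.mean() if reduction == "mean" else per_example.sum()

# NaN pitfall: the predictions 0.0 and 1.0 are correct, yet 0 * log(0) evaluates to NaN
p = np.array([0.0, 1.0, 0.5])
y = np.array([0.0, 1.0, 1.0])
with np.errstate(divide="ignore", invalid="ignore"):
    naive_loss = bce_naive(p, y)
safe_loss = bce_safe(p, y)
print("naive:", naive_loss, "safe:", safe_loss)

# Reduction pitfall: with 4 examples, the sum-reduced gradient is 4x the mean-reduced one
x = np.array([1.0, 2.0, 3.0, 4.0])
t = np.array([2.0, 4.0, 6.0, 8.0])
grad_mean = mse_grad(1.0, x, t, reduction="mean")
grad_sum = mse_grad(1.0, x, t, reduction="sum")
print("mean:", grad_mean, "sum:", grad_sum)
```

Clipping is only one way to avoid the NaN; losses that work directly on logits are often a more stable choice where the framework provides them.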

Course Complete!

You have completed the Calculus for ML course. Continue with Probability for AI to round out your mathematical foundation.
