Quick Reference & Tips
Your one-page reference for NumPy in ML coding interviews. Print this, bookmark it, and review it before every interview. Covers the most important functions, patterns, and common mistakes.
NumPy Cheat Sheet
Array Creation
import numpy as np
# From data
np.array([1, 2, 3]) # 1D array
np.array([[1, 2], [3, 4]]) # 2D array
# Initialized arrays
np.zeros((N, D)) # all zeros
np.ones((N, D)) # all ones
np.full((N, D), fill_value=7) # all sevens
np.eye(N) # identity matrix
np.empty((N, D)) # uninitialized (fast)
# Sequences
np.arange(start, stop, step) # like range()
np.linspace(start, stop, num) # evenly spaced
# Random
rng = np.random.default_rng(42) # modern API (preferred)
rng.standard_normal((N, D)) # standard normal
rng.uniform(low, high, (N, D)) # uniform
rng.integers(low, high, (N,)) # random integers
rng.choice(arr, size=k, replace=False) # sampling
rng.permutation(N) # random permutation
rng.shuffle(arr) # in-place shuffle
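A minimal sketch of the modern Generator API in action; the shapes and seed are illustrative. The key property is that each generator carries its own state, so two generators with the same seed produce identical, reproducible streams:

```python
import numpy as np

# Two generators seeded identically produce identical streams — this is
# why the per-generator API is preferred over the global np.random.seed.
rng1 = np.random.default_rng(42)
rng2 = np.random.default_rng(42)

a = rng1.standard_normal((3, 2))   # shape (3, 2)
b = rng2.standard_normal((3, 2))
assert np.array_equal(a, b)        # reproducible

sample = rng1.choice(np.arange(10), size=4, replace=False)
assert len(set(sample.tolist())) == 4   # no repeats with replace=False

perm = rng1.permutation(5)
assert sorted(perm.tolist()) == [0, 1, 2, 3, 4]   # a shuffled 0..4
```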
Shape Operations
# Reshape (returns view when possible)
x.reshape(3, 4) # explicit shape
x.reshape(3, -1) # infer one dimension
x.ravel() # flatten to 1D (view when possible)
x.flatten() # flatten to 1D (copy)
# Transpose
x.T # matrix transpose
x.transpose(0, 2, 1) # permute axes
# Add/remove dimensions
x[None, :] # add axis at front: (D,) -> (1, D)
x[:, None] # add axis at back: (D,) -> (D, 1)
x[np.newaxis, :] # same as x[None, :]
x.squeeze() # remove size-1 dimensions
np.expand_dims(x, axis=0) # add axis
# Joining
np.concatenate([a, b], axis=0) # join along existing axis
np.stack([a, b], axis=0) # join along new axis
np.vstack([a, b]) # vertical stack
np.hstack([a, b]) # horizontal stack
np.column_stack([a, b]) # stack as columns
# Splitting
np.split(x, 3, axis=0) # split into 3 equal parts
np.array_split(x, 3, axis=0) # split (allows unequal)
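A quick sanity-check sketch of the shape operations above (shapes chosen for illustration). The distinction that trips people up most is concatenate (existing axis) versus stack (new axis):

```python
import numpy as np

x = np.arange(12)            # shape (12,)
m = x.reshape(3, -1)         # -1 infers 4 -> shape (3, 4)
assert m.shape == (3, 4)

# concatenate joins along an EXISTING axis; stack creates a NEW axis
a = np.ones((2, 3))
b = np.zeros((2, 3))
assert np.concatenate([a, b], axis=0).shape == (4, 3)
assert np.stack([a, b], axis=0).shape == (2, 2, 3)

# None / np.newaxis inserts a size-1 axis
v = np.arange(4)             # (4,)
assert v[None, :].shape == (1, 4)
assert v[:, None].shape == (4, 1)

# array_split tolerates unequal pieces; split would raise here
parts = np.array_split(np.arange(10), 3)
assert [len(p) for p in parts] == [4, 3, 3]
```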
Essential Operations
# Reductions (always specify axis!)
x.sum(axis=0) # sum per column
x.mean(axis=1) # mean per row
x.std(axis=0, ddof=1) # sample std per column
x.min(axis=1) # min per row
x.max(axis=0) # max per column
x.argmin(axis=1) # index of min per row
x.argmax(axis=0) # index of max per column
x.cumsum(axis=0) # cumulative sum
# Linear algebra
a @ b # matrix multiply (recommended)
np.dot(a, b) # dot product / matmul
np.linalg.norm(x, axis=1) # L2 norm per row
np.linalg.inv(A) # matrix inverse
np.linalg.solve(A, b) # solve Ax = b (preferred over inv)
np.linalg.eigh(A) # eigenvalues + eigenvectors (symmetric)
np.linalg.svd(A, full_matrices=False) # SVD (compact)
np.linalg.det(A) # determinant
# Element-wise
np.maximum(x, 0) # ReLU
np.exp(x) # exponential
np.log(x) # natural log
np.clip(x, a_min, a_max) # clamp values
np.where(cond, x, y) # conditional select
np.sign(x) # sign function
# Sorting and searching
np.sort(x, axis=1) # sort per row
np.argsort(x, axis=1) # indices that would sort
np.argpartition(x, k) # O(n) partial sort
np.searchsorted(sorted_arr, values) # binary search
np.unique(x, return_counts=True) # unique values
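A small worked example of the axis convention for reductions, using an illustrative 2x3 array. Remembering "the axis you name is the axis that disappears" resolves most axis confusion:

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3): 2 samples, 3 features

# axis=0 collapses rows -> one value PER COLUMN (per feature)
assert x.sum(axis=0).tolist() == [5, 7, 9]
# axis=1 collapses columns -> one value PER ROW (per sample)
assert x.sum(axis=1).tolist() == [6, 15]

# argmax follows the same convention
assert x.argmax(axis=0).tolist() == [1, 1, 1]   # row index of max per column
assert x.argmax(axis=1).tolist() == [2, 2]      # column index of max per row

# keepdims keeps the reduced axis as size 1, so broadcasting stays valid
assert x.mean(axis=1, keepdims=True).shape == (2, 1)
```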
Common ML Patterns
| Task | NumPy Pattern |
|---|---|
| Softmax | e = np.exp(z - z.max(axis=1, keepdims=True)); e / e.sum(axis=1, keepdims=True) |
| Cross-entropy | -np.log(probs[np.arange(N), labels] + 1e-12).mean() |
| ReLU | np.maximum(0, x) |
| Sigmoid | 1 / (1 + np.exp(-x)) |
| L2 normalize | x / np.linalg.norm(x, axis=1, keepdims=True) |
| Z-score | (X - X.mean(axis=0)) / X.std(axis=0) |
| Min-Max scale | (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0)) |
| One-hot encode | oh = np.zeros((N, C)); oh[np.arange(N), labels] = 1 |
| Euclidean distance | np.sqrt(A_sq + B_sq.T - 2 * A @ B.T) (expanded form) |
| Cosine similarity | (A / ||A||) @ (B / ||B||).T |
| Batch matmul | np.einsum('bij,bjk->bik', A, B) |
| Top-K | np.argpartition(x, -k)[-k:] |
| Moving average | cs = np.cumsum(x); cs = np.insert(cs,0,0); (cs[w:]-cs[:-w])/w |
| Causal mask | np.triu(np.ones((T, T), dtype=bool), k=1) |
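The softmax and cross-entropy rows above can be combined into one short, runnable sketch (logits and labels are made up for illustration). Note the max-subtraction: without it, the second row of logits would overflow np.exp:

```python
import numpy as np

def softmax(z):
    # subtract the row max first so np.exp never overflows
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1],
                   [1000.0, 999.0, 998.0]])   # naive exp overflows here
probs = softmax(logits)
assert np.allclose(probs.sum(axis=1), 1.0)    # each row is a distribution
assert np.isfinite(probs).all()               # stable even at 1000

# cross-entropy via fancy indexing: pick each row's true-class probability
labels = np.array([0, 2])
loss = -np.log(probs[np.arange(2), labels] + 1e-12).mean()
assert loss > 0
```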
Broadcasting Rules
Broadcasting is how NumPy handles operations between arrays of different shapes. The rules are simple but critical to internalize.
# BROADCASTING RULES:
# 1. Arrays are compared shape from RIGHT to LEFT
# 2. Dimensions are compatible if they are EQUAL or one is 1
# 3. Missing dimensions are treated as 1
# Examples:
# (5, 3) + (3,) -> (5, 3) + (1, 3) -> (5, 3) OK
# (5, 3) + (5, 1) -> (5, 3) OK
# (5, 3) + (5,) -> ERROR! 3 != 5
# Common patterns:
# (N, D) - (D,) -> subtract per-feature value from all samples
# (N, D) - (N, 1) -> subtract per-sample value from all features
# (N, 1) * (1, M) -> outer product: (N, M)
# (N, 1, D) - (1, M, D) -> pairwise differences: (N, M, D)
# Fix shape mismatches:
# If you have shape (N,) and need (N, 1):
x = x[:, None] # or x[:, np.newaxis] or x.reshape(-1, 1)
# If you have shape (D,) and need (1, D):
x = x[None, :] # or x[np.newaxis, :] or x.reshape(1, -1)
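The pairwise-differences pattern above, written out end to end (array contents are arbitrary). Inserting size-1 axes on opposite sides makes broadcasting enumerate every (row of A, row of B) pair:

```python
import numpy as np

A = np.arange(6, dtype=float).reshape(3, 2)   # 3 points in 2-D
B = np.arange(8, dtype=float).reshape(4, 2)   # 4 points in 2-D

# (3, 1, 2) - (1, 4, 2) broadcasts to (3, 4, 2): every pair of differences
diff = A[:, None, :] - B[None, :, :]
assert diff.shape == (3, 4, 2)

# reduce over the feature axis to get a (3, 4) distance matrix
dist = np.sqrt((diff ** 2).sum(axis=2))
assert dist.shape == (3, 4)

# sanity-check one entry against a direct computation
assert np.isclose(dist[1, 2], np.linalg.norm(A[1] - B[2]))
```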
Performance Tips
1. Avoid Python Loops
Slow: for i in range(N): result[i] = f(X[i]). Fast: result = f(X). NumPy operations run in optimized C code. Every loop you eliminate can give 10-100x speedup.
2. Use Expanded Distance Form
Slow: np.linalg.norm(A[:, None] - B[None, :], axis=2) materializes an (N, M, D) intermediate for A of shape (N, D) and B of shape (M, D). Fast: ||a||^2 + ||b||^2 - 2*a.b uses a single matrix multiply and only (N, M) memory.
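A sketch comparing the two forms on small random data (sizes are illustrative). The expanded form needs a clip before the square root because floating-point rounding can produce tiny negative squared distances:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 8))    # (N, D)
B = rng.standard_normal((30, 8))    # (M, D)

# Memory-light expanded form: ||a||^2 + ||b||^2 - 2 a.b via one matmul
A_sq = (A ** 2).sum(axis=1, keepdims=True)   # (N, 1)
B_sq = (B ** 2).sum(axis=1, keepdims=True)   # (M, 1)
d2 = A_sq + B_sq.T - 2 * A @ B.T             # (N, M)
d2 = np.clip(d2, 0, None)                    # guard tiny negatives from rounding
dist = np.sqrt(d2)

# Memory-heavy broadcast version materializes an (N, M, D) intermediate
ref = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
assert np.allclose(dist, ref, atol=1e-6)
```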
3. Prefer Views Over Copies
Slicing, reshape, and transpose return views (shared memory). Fancy indexing and boolean indexing return copies. Use np.shares_memory(a, b) to check.
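A quick demonstration of the view/copy distinction using np.shares_memory (the array contents are arbitrary):

```python
import numpy as np

X = np.arange(12).reshape(3, 4)

# Basic slicing returns a VIEW (shared memory)
row = X[0]
assert np.shares_memory(row, X)
row[0] = 99
assert X[0, 0] == 99          # the change is visible through X

# Fancy indexing and boolean indexing return COPIES
picked = X[[0, 2]]
assert not np.shares_memory(picked, X)
masked = X[X > 5]
assert not np.shares_memory(masked, X)

# .copy() makes an independent array when you need one
safe = X[0].copy()
safe[0] = -1
assert X[0, 0] == 99          # X unchanged
```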
4. Use argpartition for Top-K
np.argpartition(x, k) is O(n) for finding the k smallest/largest elements. np.argsort is O(n log n). For k << n, partition is much faster.
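A sketch of the top-k idiom on a small array. argpartition only guarantees which elements land in the last k slots, not their order, so sort just those k if order matters:

```python
import numpy as np

x = np.array([5.0, 1.0, 9.0, 3.0, 7.0, 2.0])
k = 2

# argpartition guarantees the last k positions hold the k largest (unordered)
top_idx = np.argpartition(x, -k)[-k:]
assert set(top_idx.tolist()) == {2, 4}          # values 9.0 and 7.0

# if you need them ORDERED, sort just those k: O(n + k log k) total
ordered = top_idx[np.argsort(x[top_idx])[::-1]]
assert x[ordered].tolist() == [9.0, 7.0]
```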
5. Use eigh for Symmetric Matrices
np.linalg.eigh is 2x faster than np.linalg.eig for symmetric/Hermitian matrices (like covariance matrices) and guarantees real eigenvalues.
6. Use solve Instead of inv
np.linalg.solve(A, b) is faster and more numerically stable than np.linalg.inv(A) @ b. Never compute the explicit inverse unless you actually need the entries of the inverse matrix itself — solve handles multiple right-hand sides directly by accepting a matrix b.
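A small sketch verifying both routes agree and that solve accepts a matrix right-hand side (the well-conditioned test matrix is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # well-conditioned
b = rng.standard_normal(4)

x_solve = np.linalg.solve(A, b)          # factorizes A, no explicit inverse
x_inv = np.linalg.inv(A) @ b             # slower and less stable
assert np.allclose(x_solve, x_inv)
assert np.allclose(A @ x_solve, b)       # verify it actually solves Ax = b

# solve also accepts a matrix right-hand side: one call for many systems
B = rng.standard_normal((4, 3))
X = np.linalg.solve(A, B)
assert np.allclose(A @ X, B)
```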
Common Mistakes to Avoid
1. Forgetting keepdims
Bug: X - X.mean(axis=1) fails because (N, D) - (N,) does not broadcast as intended. Fix: X - X.mean(axis=1, keepdims=True) gives (N, D) - (N, 1).
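The bug and its fix, reproduced on a small illustrative array — the broken version raises a ValueError because (4, 3) and (4,) do not align from the right:

```python
import numpy as np

X = np.arange(12, dtype=float).reshape(4, 3)   # (N=4, D=3)

# Without keepdims: (4, 3) - (4,) -> broadcasting error
try:
    _ = X - X.mean(axis=1)
    raise AssertionError("expected a broadcast failure")
except ValueError:
    pass

# With keepdims: (4, 3) - (4, 1) broadcasts row-wise as intended
centered = X - X.mean(axis=1, keepdims=True)
assert centered.shape == (4, 3)
assert np.allclose(centered.mean(axis=1), 0.0)
```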
2. Unstable Softmax
Bug: np.exp(x) / np.exp(x).sum() overflows for x > 709. Fix: Always subtract the max first: np.exp(x - x.max()).
3. Wrong Axis
Bug: Using axis=0 when you mean axis=1. Remember: for (samples, features), axis=0 is per-feature, axis=1 is per-sample.
4. Modifying Views
Bug: row = X[0]; row[0] = 99 also modifies X because row is a view. Fix: Use .copy() if you need to modify independently.
5. Division by Zero
Bug: X / X.std(axis=0) crashes if any feature has zero variance. Fix: std = X.std(axis=0); std[std == 0] = 1.0.
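The fix in context: a z-score that survives a constant feature (the toy data's second column has zero variance by construction):

```python
import numpy as np

X = np.array([[1.0, 5.0],
              [2.0, 5.0],
              [3.0, 5.0]])        # second feature is constant: std == 0

std = X.std(axis=0)
std[std == 0] = 1.0               # neutral divisor for constant features
Z = (X - X.mean(axis=0)) / std

assert np.isfinite(Z).all()       # no inf/nan from dividing by zero
assert np.allclose(Z[:, 1], 0.0)  # constant feature maps to all zeros
```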
6. Using int When You Need float
Bug: writing floats into an int array silently truncates — x = np.zeros(3, dtype=int); x[0] = 0.5 stores 0. Also remember that // floor-divides (np.array([1, 2, 3]) // 2 gives [0, 1, 1]) while / always returns floats. Be explicit about dtype when creating arrays.
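A short demonstration of the int-dtype traps (values are illustrative). Both the silent truncation and the in-place casting error disappear once the array is created as float:

```python
import numpy as np

# Assigning a float into an int array silently truncates
x = np.zeros(3, dtype=np.int64)
x[0] = 0.7
assert x[0] == 0                      # 0.7 was truncated, not rounded

# In-place ops cannot change dtype: x += 0.5 raises on an int array
try:
    x += 0.5
    raise AssertionError("expected a casting error")
except TypeError:
    pass

# Being explicit about dtype avoids both surprises
y = np.zeros(3, dtype=np.float64)
y[0] = 0.7
assert y[0] == 0.7
```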
Frequently Asked Questions
What is the difference between np.dot, np.matmul, and the @ operator?
np.dot is the oldest and most general: for 1D arrays it computes dot product, for 2D arrays it computes matrix multiply, but for higher dimensions it follows unusual contraction rules. np.matmul (and the @ operator) follow standard linear algebra conventions: for 2D it is matrix multiply, and for higher dimensions it treats the array as a batch of matrices. Best practice: Use @ for matrix multiplication, np.dot(a, b) only for 1D dot products, and np.einsum for anything more complex. In interviews, @ is the clearest notation.
When should I use np.einsum?
Use np.einsum when the operation involves multiple contractions or would require multiple separate NumPy calls. Common examples: batch matrix multiply ('bij,bjk->bik'), bilinear forms ('i,ij,j->'), attention scores ('bhid,bhjd->bhij'). For simple matrix multiply, @ is clearer. For simple element-wise operations, use standard operators. In interviews, using einsum for the right problems shows deep NumPy fluency, but using it for simple operations looks like over-engineering. Performance-wise, einsum is competitive with explicit operations and sometimes faster because it can optimize the computation order.
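Two of the einsum examples mentioned above, checked against their explicit equivalents (shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2, 3))   # batch of 4 (2x3) matrices
B = rng.standard_normal((4, 3, 5))   # batch of 4 (3x5) matrices

# batch matrix multiply: contract the shared j axis within each batch
C = np.einsum('bij,bjk->bik', A, B)
assert C.shape == (4, 2, 5)
assert np.allclose(C, A @ B)         # @ already batches over leading axes

# bilinear form x^T M y collapses to a scalar
x, y = rng.standard_normal(3), rng.standard_normal(3)
M = rng.standard_normal((3, 3))
s = np.einsum('i,ij,j->', x, M, y)
assert np.isclose(s, x @ M @ y)
```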
How do I handle NaN values in NumPy?
NumPy propagates NaN through operations: np.mean([1, np.nan, 3]) returns nan. Solutions: (1) Use nan-safe functions: np.nanmean, np.nanstd, np.nansum, np.nanmax, np.nanmin. (2) Filter NaNs: x[~np.isnan(x)]. (3) Replace NaNs: np.nan_to_num(x, nan=0.0) or x[np.isnan(x)] = 0. (4) For ML data, use boolean masking: valid = ~np.isnan(X).any(axis=1); X_clean = X[valid]. In production ML pipelines, always check for NaN before computing loss or gradients — a single NaN will poison the entire training run.
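The four strategies above in miniature (data invented for illustration):

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])

# plain reductions propagate NaN; nan-safe variants skip it
assert np.isnan(x.mean())
assert np.nanmean(x) == 2.0

# filter or replace when you need a clean array
assert x[~np.isnan(x)].tolist() == [1.0, 3.0]
assert np.nan_to_num(x, nan=0.0).tolist() == [1.0, 0.0, 3.0]

# row-wise masking for ML data: drop samples containing any NaN
X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0]])
valid = ~np.isnan(X).any(axis=1)
assert X[valid].shape == (2, 2)
```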
Why use np.random.default_rng instead of np.random.seed?
The np.random.randn / np.random.seed API uses a global random state, which makes code harder to reason about and not thread-safe. The modern API np.random.default_rng(seed) creates an independent generator with its own state. Always use the modern API in production and interviews: rng = np.random.default_rng(42); samples = rng.standard_normal((N, D)). This shows you are aware of best practices. The old API still works but is considered legacy.
Why is NumPy so much faster than Python loops?
Three main reasons: (1) C implementation: NumPy operations are implemented in optimized C code, not interpreted Python. A for loop in Python has interpreter overhead per iteration (bytecode dispatch, type checking, reference counting). (2) BLAS/LAPACK: Linear algebra operations (@, solve, svd) call BLAS/LAPACK libraries (OpenBLAS, MKL) that use SIMD instructions (AVX, SSE) and multi-threading. A single A @ B can use all CPU cores. (3) Memory locality: NumPy stores data in contiguous memory blocks, which plays well with CPU caches. Python lists store pointers to scattered objects. The actual speedup depends on the operation: simple element-wise ops get ~10-50x, matrix multiply gets ~100-1000x (due to BLAS).
When should I use NumPy vs PyTorch in an interview?
In interviews: (1) If the problem says "implement using NumPy" or "no deep learning frameworks," use NumPy. (2) If the problem involves training a model with autograd, use PyTorch. (3) If the problem is about data preprocessing or feature engineering, NumPy is typically expected. In practice: NumPy is for CPU-only computation, data preprocessing, and non-differentiable operations. PyTorch is for GPU computation, automatic differentiation, and anything that needs gradients. The APIs are nearly identical: np.array becomes torch.tensor, np.sum becomes torch.sum, @ works the same way. If you know NumPy well, PyTorch is a 30-minute transition.
Which NumPy problems come up most often in interviews?
Based on publicly shared interview experiences: (1) Softmax implementation (test numerical stability), (2) Cross-entropy loss (test fancy indexing + log-sum-exp), (3) Pairwise distance matrix (test broadcasting + memory awareness), (4) Batch normalization (test axis awareness + running stats), (5) KNN from scratch (test complete implementation), (6) Linear regression normal equation (test matrix operations + regularization), (7) PCA from scratch (test eigendecomposition), (8) "Vectorize this loop" problems (test core NumPy fluency). The common thread: every problem has both a loop-based and a vectorized solution, and you are expected to write the vectorized one.
Course Summary
- Lesson 1: NumPy fluency is the most tested skill in ML interviews — think in arrays, not loops
- Lesson 2: Reshape, broadcast, fancy index, boolean mask, stack, and split — the fundamental operations
- Lesson 3: Dot product, matmul, inverse, determinant, eigenvalues, SVD — linear algebra for ML
- Lesson 4: Statistics along axes, percentiles, correlation, normalization, z-scores, covariance
- Lesson 5: Softmax, cross-entropy, gradient descent, batch norm, cosine similarity — the real interview questions
- Lesson 6: Euclidean, Manhattan, cosine distances, pairwise matrices, KNN prediction
- Lesson 7: Loop elimination, einsum, memory efficiency, broadcasting tricks, advanced indexing
- Lesson 8 (this): Cheat sheet, patterns, performance tips, and FAQ for quick review
A final tip: use keepdims=True to avoid broadcasting bugs, and test with edge cases (empty arrays, single elements, all zeros, very large values). This shows the systematic thinking that top companies value.
Lilly Tech Systems