Matrix Operations
Six fundamental matrix operations implemented from scratch in Python. These are the building blocks of all linear algebra — every subsequent lesson in this course depends on them. No NumPy allowed. You must understand what happens inside these operations at the loop level.
Problem 1: Matrix Multiplication
Problem: Given two matrices A (m x n) and B (n x p), compute their product C (m x p) where C[i][j] = sum of A[i][k] * B[k][j] for k = 0 to n-1.
Why it matters: Matrix multiplication is the single most important operation in ML. Neural network forward passes, attention mechanisms, and linear transformations are all matrix multiplications.
def matrix_multiply(A, B):
    """
    Multiply matrices A (m x n) and B (n x p) -> C (m x p).
    Time: O(m * n * p), Space: O(m * p)
    """
    m, n = len(A), len(A[0])
    n2, p = len(B), len(B[0])
    if n != n2:
        raise ValueError(f"Incompatible dimensions: {m}x{n} and {n2}x{p}")
    C = [[0.0] * p for _ in range(m)]
    for i in range(m):
        for k in range(n):  # loop order ikj is cache-friendly
            if A[i][k] == 0:  # skip zero elements (sparse speedup)
                continue
            for j in range(p):
                C[i][j] += A[i][k] * B[k][j]
    return C
# Test
A = [[1, 2, 3], [4, 5, 6]] # 2x3
B = [[7, 8], [9, 10], [11, 12]] # 3x2
result = matrix_multiply(A, B)
# [[58, 64], [139, 154]] # 2x2
# Verify: 1*7 + 2*9 + 3*11 = 7 + 18 + 33 = 58 ✓
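The comment on the k-loop claims the ikj ordering is cache-friendly because the inner loop reads B row by row instead of column by column. A quick sketch (helper names `multiply_ijk` and `multiply_ikj` are ours, not part of the lesson) confirms the two orderings compute the same product:

```python
import random

def multiply_ijk(A, B):
    # Classic ijk order: the inner k-loop walks down a column of B.
    m, n, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

def multiply_ikj(A, B):
    # ikj order: the inner j-loop walks along a row of B (contiguous access).
    m, n, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(m)]
    for i in range(m):
        for k in range(n):
            aik = A[i][k]
            for j in range(p):
                C[i][j] += aik * B[k][j]
    return C

random.seed(0)
A = [[random.random() for _ in range(20)] for _ in range(15)]
B = [[random.random() for _ in range(10)] for _ in range(20)]
C1, C2 = multiply_ijk(A, B), multiply_ikj(A, B)
same = all(abs(C1[i][j] - C2[i][j]) < 1e-9
           for i in range(15) for j in range(10))
print(same)  # True
```

Note that in pure Python the speedup from reordering is modest because interpreter overhead dominates; the memory-access argument pays off mainly in compiled languages.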
Problem 2: Matrix Transpose
Problem: Given matrix A (m x n), compute A^T (n x m) where A^T[j][i] = A[i][j].
def transpose(A):
    """
    Transpose matrix A (m x n) -> A^T (n x m).
    Time: O(m * n), Space: O(m * n)
    """
    m, n = len(A), len(A[0])
    return [[A[i][j] for i in range(m)] for j in range(n)]

# In-place transpose for square matrices (saves memory)
def transpose_inplace(A):
    """
    Transpose square matrix A in place.
    Time: O(n^2), Space: O(1)
    """
    n = len(A)
    for i in range(n):
        for j in range(i + 1, n):
            A[i][j], A[j][i] = A[j][i], A[i][j]
    return A
# Test
A = [[1, 2, 3], [4, 5, 6]]
print(transpose(A))
# [[1, 4], [2, 5], [3, 6]]
# Properties: (A^T)^T = A, (AB)^T = B^T * A^T
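The second property is worth checking numerically. This sketch uses compact stand-ins `mm` and `tp` (our names) that mirror the matrix_multiply and transpose functions defined above:

```python
# Compact stand-ins mirroring matrix_multiply and transpose above.
def mm(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def tp(A):
    return [list(col) for col in zip(*A)]

A = [[1, 2, 3], [4, 5, 6]]       # 2x3
B = [[7, 8], [9, 10], [11, 12]]  # 3x2
lhs = tp(mm(A, B))   # (AB)^T
rhs = mm(tp(B), tp(A))  # B^T * A^T
print(lhs == rhs)  # True
```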
Problem 3: Matrix Inverse (Gauss-Jordan Elimination)
Problem: Given a square matrix A (n x n), compute A^(-1) such that A * A^(-1) = I (identity matrix). Use Gauss-Jordan elimination.
Why it matters: Matrix inversion appears in solving linear systems, computing the normal equation for linear regression (w = (X^T X)^(-1) X^T y), and Kalman filters.
def matrix_inverse(A):
    """
    Compute inverse of square matrix A using Gauss-Jordan elimination.
    Time: O(n^3), Space: O(n^2)
    """
    n = len(A)
    # Create augmented matrix [A | I]
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        # Find pivot: row with largest absolute value in this column
        max_row = col
        for row in range(col + 1, n):
            if abs(aug[row][col]) > abs(aug[max_row][col]):
                max_row = row
        if abs(aug[max_row][col]) < 1e-12:
            raise ValueError("Matrix is singular (not invertible)")
        # Swap rows
        aug[col], aug[max_row] = aug[max_row], aug[col]
        # Scale pivot row so pivot element becomes 1
        pivot = aug[col][col]
        for j in range(2 * n):
            aug[col][j] /= pivot
        # Eliminate all other rows in this column
        for row in range(n):
            if row == col:
                continue
            factor = aug[row][col]
            for j in range(2 * n):
                aug[row][j] -= factor * aug[col][j]
    # Extract inverse from right half of augmented matrix
    inverse = [row[n:] for row in aug]
    return inverse
# Test
A = [[2, 1], [5, 3]]
A_inv = matrix_inverse(A)
# [[3.0, -1.0], [-5.0, 2.0]]
# Verify: A * A_inv should be identity
product = matrix_multiply(A, A_inv)
# [[1.0, 0.0], [0.0, 1.0]] ✓
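A common use of the inverse is solving Ax = b as x = A^(-1) b. This sketch inlines a closed-form 2x2 inverse via the adjugate (a shortcut for this size; `inverse_2x2` is our name, not part of the lesson) rather than repeating the Gauss-Jordan code:

```python
def inverse_2x2(A):
    # Closed-form 2x2 inverse via the adjugate; computes the same
    # result Gauss-Jordan elimination would for this size.
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("Matrix is singular (not invertible)")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 1], [5, 3]]    # det = 2*3 - 1*5 = 1
b = [[5], [13]]         # right-hand side
A_inv = inverse_2x2(A)  # [[3.0, -1.0], [-5.0, 2.0]]
x = [[sum(A_inv[i][k] * b[k][0] for k in range(2))] for i in range(2)]
print(x)  # [[2.0], [1.0]] -> check: 2*2 + 1*1 = 5, 5*2 + 3*1 = 13
```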
Problem 4: Matrix Rank (Row Echelon Form)
Problem: Compute the rank of matrix A — the number of linearly independent rows (or equivalently, columns). Use Gaussian elimination to convert to row echelon form and count non-zero rows.
def matrix_rank(A):
    """
    Compute rank of matrix A via Gaussian elimination.
    Time: O(m * n * min(m,n)), Space: O(m * n)
    """
    m, n = len(A), len(A[0])
    # Work on a copy to avoid modifying the original
    M = [row[:] for row in A]
    EPS = 1e-10
    rank = 0
    for col in range(n):
        # Find pivot row
        pivot_row = None
        for row in range(rank, m):
            if abs(M[row][col]) > EPS:
                pivot_row = row
                break
        if pivot_row is None:
            continue  # no pivot in this column
        # Swap pivot row to current rank position
        M[rank], M[pivot_row] = M[pivot_row], M[rank]
        # Eliminate below
        pivot_val = M[rank][col]
        for row in range(rank + 1, m):
            if abs(M[row][col]) > EPS:
                factor = M[row][col] / pivot_val
                for j in range(col, n):
                    M[row][j] -= factor * M[rank][j]
        rank += 1
    return rank
# Test
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(matrix_rank(A))  # 2 (third row = 2*second row - first row, so the rows are linearly dependent)
B = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(matrix_rank(B)) # 3 (identity matrix has full rank)
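To see why the first matrix has rank 2, we can verify the linear dependence directly: the third row equals twice the second row minus the first, so only two rows are independent.

```python
# Why rank(A) == 2: the third row is a linear combination of the
# first two, specifically row3 = 2*row2 - row1.
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
combo = [2 * b - a for a, b in zip(A[0], A[1])]
print(combo == A[2])  # True
```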
Problem 5: Matrix Trace
Problem: Compute the trace of a square matrix — the sum of its diagonal elements. The trace equals the sum of eigenvalues and is invariant under cyclic permutations: tr(ABC) = tr(CAB) = tr(BCA).
def trace(A):
    """
    Compute trace of square matrix A.
    Time: O(n), Space: O(1)
    """
    n = len(A)
    if n != len(A[0]):
        raise ValueError("Trace is only defined for square matrices")
    return sum(A[i][i] for i in range(n))
# Test
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(trace(A)) # 1 + 5 + 9 = 15
# Properties:
# tr(A + B) = tr(A) + tr(B)
# tr(cA) = c * tr(A)
# tr(A^T) = tr(A)
# tr(AB) = tr(BA) (cyclic property)
# tr(A) = sum of eigenvalues of A
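The cyclic property holds even when A and B are not square, so AB and BA have different shapes but the same trace. A sketch using compact stand-ins `mm` and `tr` (our names, mirroring matrix_multiply and trace above):

```python
def mm(A, B):  # compact matmul mirroring matrix_multiply above
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def tr(M):
    return sum(M[i][i] for i in range(len(M)))

A = [[1, 2, 3], [4, 5, 6]]       # 2x3
B = [[7, 8], [9, 10], [11, 12]]  # 3x2
print(tr(mm(A, B)), tr(mm(B, A)))  # 212 212 -- AB is 2x2, BA is 3x3
```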
Problem 6: Hadamard Product (Element-wise Multiplication)
Problem: Given two matrices A and B of the same dimensions, compute their Hadamard product C where C[i][j] = A[i][j] * B[i][j]. This is NOT matrix multiplication — it is element-wise multiplication.
Why it matters: The Hadamard product appears in attention mechanisms (masking), LSTM/GRU gates (element-wise gating), and batch normalization.
def hadamard_product(A, B):
    """
    Element-wise multiplication of matrices A and B.
    Both must have the same dimensions (m x n).
    Time: O(m * n), Space: O(m * n)
    """
    m, n = len(A), len(A[0])
    if len(B) != m or len(B[0]) != n:
        raise ValueError("Matrices must have same dimensions for Hadamard product")
    return [[A[i][j] * B[i][j] for j in range(n)] for i in range(m)]
# Test
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(hadamard_product(A, B))
# [[5, 12], [21, 32]]
# Attention mask example:
scores = [[0.8, 0.5, 0.3], [0.2, 0.9, 0.1]]
mask = [[1, 1, 0], [1, 1, 1]]  # mask out padding
masked = hadamard_product(scores, mask)
# [[0.8, 0.5, 0.0], [0.2, 0.9, 0.1]]
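The gating use case mentioned above can be sketched the same way. This is a heavily simplified toy version of an LSTM cell-state update, c_new = f (.) c + i (.) g, with made-up gate values; `hadamard` is a compact stand-in for hadamard_product above:

```python
def hadamard(A, B):  # element-wise product, as in hadamard_product above
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# Simplified LSTM-style cell update: c_new = f (.) c + i (.) g
f = [[0.5, 1.0]]   # forget gate: keep half of the first cell value
c = [[4.0, 6.0]]   # previous cell state
i = [[1.0, 0.0]]   # input gate: admit only the first candidate
g = [[3.0, 9.0]]   # candidate values
fc = hadamard(f, c)
ig = hadamard(i, g)
c_new = [[a + b for a, b in zip(fc[0], ig[0])]]
print(c_new)  # [[5.0, 6.0]]
```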
Putting It All Together: A Linear Regression Solver
Here is how these operations combine to solve the normal equation for linear regression: w = (X^T X)^(-1) X^T y.
def linear_regression_normal_equation(X, y):
    """
    Solve linear regression using the normal equation.
    X: m x n feature matrix (m samples, n features)
    y: m x 1 target vector (as column matrix)
    Returns: n x 1 weight vector
    """
    Xt = transpose(X)                  # n x m
    XtX = matrix_multiply(Xt, X)       # n x n
    XtX_inv = matrix_inverse(XtX)      # n x n
    Xty = matrix_multiply(Xt, y)       # n x 1
    w = matrix_multiply(XtX_inv, Xty)  # n x 1
    return w
# Example: y = 2x + 1
# X has bias column (ones) and feature column
X = [[1, 1], [1, 2], [1, 3], [1, 4]] # 4 samples, 2 features
y = [[3], [5], [7], [9]] # y = 2x + 1
w = linear_regression_normal_equation(X, y)
print(w) # [[1.0], [2.0]] -> intercept=1, slope=2 ✓
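As a sanity check, the fitted weights should reproduce the targets exactly, since the data lies perfectly on the line y = 2x + 1. Here we hardcode the solution printed above and recompute the predictions X @ w by hand:

```python
# Sanity check: with intercept 1 and slope 2, predictions match y exactly.
X = [[1, 1], [1, 2], [1, 3], [1, 4]]
w = [[1.0], [2.0]]  # [intercept, slope] -- the solution printed above
preds = [sum(X[i][k] * w[k][0] for k in range(2)) for i in range(4)]
print(preds)  # [3.0, 5.0, 7.0, 9.0]
```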
Key Takeaways
- Matrix multiplication is O(n^3) with the naive algorithm. Loop order ikj reads B row by row and is more cache-friendly than ijk; the payoff is largest in compiled languages, since in pure Python interpreter overhead dominates.
- Matrix inverse uses Gauss-Jordan elimination with partial pivoting for numerical stability. Always check for singular matrices.
- Matrix rank is computed via Gaussian elimination — count the non-zero rows in row echelon form.
- Trace is the simplest operation (O(n)) but has deep mathematical significance: it equals the sum of eigenvalues.
- Hadamard product is element-wise multiplication, not matrix multiplication. It appears in attention and gating mechanisms.
- These six operations are the building blocks for everything in this course.
Lilly Tech Systems