Matrix Operations

Six fundamental matrix operations implemented from scratch in Python. These are the building blocks of all linear algebra — every subsequent lesson in this course depends on them. No NumPy allowed. You must understand what happens inside these operations at the loop level.

Problem 1: Matrix Multiplication

Problem: Given two matrices A (m x n) and B (n x p), compute their product C (m x p) where C[i][j] = sum of A[i][k] * B[k][j] for k = 0 to n-1.

Why it matters: Matrix multiplication is the single most important operation in ML. Neural network forward passes, attention mechanisms, and linear transformations are all matrix multiplications.

def matrix_multiply(A, B):
    """
    Multiply matrices A (m x n) and B (n x p) -> C (m x p).
    Time: O(m * n * p), Space: O(m * p)
    """
    m, n = len(A), len(A[0])
    n2, p = len(B), len(B[0])

    if n != n2:
        raise ValueError(f"Incompatible dimensions: {m}x{n} and {n2}x{p}")

    C = [[0.0] * p for _ in range(m)]

    for i in range(m):
        for k in range(n):         # loop order ikj is cache-friendly
            if A[i][k] == 0:       # skip zero elements (sparse speedup)
                continue
            for j in range(p):
                C[i][j] += A[i][k] * B[k][j]

    return C

# Test
A = [[1, 2, 3], [4, 5, 6]]        # 2x3
B = [[7, 8], [9, 10], [11, 12]]   # 3x2
result = matrix_multiply(A, B)
# [[58, 64], [139, 154]]           # 2x2

# Verify: 1*7 + 2*9 + 3*11 = 7 + 18 + 33 = 58 ✓
💡 Interview tip: The loop order matters. The ikj order (shown above) is more cache-friendly than ijk because it accesses B row-by-row rather than column-by-column. Mentioning this in an interview demonstrates systems-level thinking.
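The loop-order claim is easy to measure yourself. A rough benchmark sketch comparing the two orders (note: in pure Python, much of the ikj speedup comes from hoisting the row references and A[i][k] into locals, an optimization the ikj order makes possible; the cache argument matters most in compiled languages):

```python
import random
import time

def multiply_ijk(A, B):
    m, n, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            for k in range(n):      # innermost loop strides down B's column
                C[i][j] += A[i][k] * B[k][j]
    return C

def multiply_ikj(A, B):
    m, n, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(m)]
    for i in range(m):
        for k in range(n):
            a = A[i][k]             # hoist the loop-invariant element
            row_B, row_C = B[k], C[i]
            for j in range(p):      # innermost loop walks B's row
                row_C[j] += a * row_B[j]
    return C

size = 120
A = [[random.random() for _ in range(size)] for _ in range(size)]
B = [[random.random() for _ in range(size)] for _ in range(size)]

t0 = time.perf_counter(); C1 = multiply_ijk(A, B)
t1 = time.perf_counter(); C2 = multiply_ikj(A, B)
t2 = time.perf_counter()
print(f"ijk: {t1 - t0:.3f}s  ikj: {t2 - t1:.3f}s")  # ikj is usually faster here
```

Both orders produce the same product (up to floating-point rounding, since the additions happen in a different order).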

Problem 2: Matrix Transpose

Problem: Given matrix A (m x n), compute A^T (n x m) where A^T[j][i] = A[i][j].

def transpose(A):
    """
    Transpose matrix A (m x n) -> A^T (n x m).
    Time: O(m * n), Space: O(m * n)
    """
    m, n = len(A), len(A[0])
    return [[A[i][j] for i in range(m)] for j in range(n)]

# In-place transpose for square matrices (saves memory)
def transpose_inplace(A):
    """
    Transpose square matrix A in place.
    Time: O(n^2), Space: O(1)
    """
    n = len(A)
    for i in range(n):
        for j in range(i + 1, n):
            A[i][j], A[j][i] = A[j][i], A[i][j]
    return A

# Test
A = [[1, 2, 3], [4, 5, 6]]
print(transpose(A))
# [[1, 4], [2, 5], [3, 6]]

# Properties: (A^T)^T = A, (AB)^T = B^T * A^T
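Both properties can be checked directly. This sketch uses compact zip-based helpers rather than the loop versions above, so it stands alone:

```python
def transpose(M):
    # zip(*M) groups the i-th elements of every row, i.e. the columns of M
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]

assert transpose(transpose(A)) == A                                    # (A^T)^T = A
assert transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))   # (AB)^T = B^T A^T
```

Note the order reversal in the second property: transposing a product swaps the factors.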

Problem 3: Matrix Inverse (Gauss-Jordan Elimination)

Problem: Given a square matrix A (n x n), compute A^(-1) such that A * A^(-1) = I (identity matrix). Use Gauss-Jordan elimination.

Why it matters: Matrix inversion appears in solving linear systems, computing the normal equation for linear regression (w = (X^T X)^(-1) X^T y), and Kalman filters.

def matrix_inverse(A):
    """
    Compute inverse of square matrix A using Gauss-Jordan elimination.
    Time: O(n^3), Space: O(n^2)
    """
    n = len(A)

    # Create augmented matrix [A | I]
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]

    for col in range(n):
        # Find pivot: row with largest absolute value in this column
        max_row = col
        for row in range(col + 1, n):
            if abs(aug[row][col]) > abs(aug[max_row][col]):
                max_row = row

        if abs(aug[max_row][col]) < 1e-12:
            raise ValueError("Matrix is singular (not invertible)")

        # Swap rows
        aug[col], aug[max_row] = aug[max_row], aug[col]

        # Scale pivot row so pivot element becomes 1
        pivot = aug[col][col]
        for j in range(2 * n):
            aug[col][j] /= pivot

        # Eliminate all other rows in this column
        for row in range(n):
            if row == col:
                continue
            factor = aug[row][col]
            for j in range(2 * n):
                aug[row][j] -= factor * aug[col][j]

    # Extract inverse from right half of augmented matrix
    inverse = [row[n:] for row in aug]
    return inverse

# Test
A = [[2, 1], [5, 3]]
A_inv = matrix_inverse(A)
# [[3.0, -1.0], [-5.0, 2.0]]

# Verify: A * A_inv should be identity
product = matrix_multiply(A, A_inv)
# [[1.0, 0.0], [0.0, 1.0]] ✓
📝 Partial pivoting: The pivot selection step (finding the row with the largest absolute value) is critical for numerical stability. Without it, dividing by tiny pivot values produces huge multipliers that amplify rounding error, and a zero on the diagonal stops the algorithm entirely. Always mention this in interviews.
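To see why the row swap matters, here is a deliberately naive Gauss-Jordan sketch with the pivot search removed. It fails on a perfectly invertible matrix simply because a zero lands on the diagonal:

```python
def inverse_no_pivot(A):
    # Naive Gauss-Jordan: uses whatever sits on the diagonal as the pivot
    n = len(A)
    aug = [[float(x) for x in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        pivot = aug[col][col]        # no search for a better pivot row
        for j in range(2 * n):
            aug[col][j] /= pivot     # ZeroDivisionError if pivot == 0
        for row in range(n):
            if row != col:
                f = aug[row][col]
                for j in range(2 * n):
                    aug[row][j] -= f * aug[col][j]
    return [row[n:] for row in aug]

A = [[0.0, 1.0], [1.0, 0.0]]         # invertible: it is its own inverse
try:
    inverse_no_pivot(A)
    failed = False
except ZeroDivisionError:
    failed = True
print("naive elimination failed:", failed)  # True
```

The full `matrix_inverse` above handles this matrix fine: the pivot search finds the 1 in the second row and swaps it up before dividing.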

Problem 4: Matrix Rank (Row Echelon Form)

Problem: Compute the rank of matrix A — the number of linearly independent rows (or equivalently, columns). Use Gaussian elimination to convert to row echelon form and count non-zero rows.

def matrix_rank(A):
    """
    Compute rank of matrix A via Gaussian elimination.
    Time: O(m * n * min(m,n)), Space: O(m * n)
    """
    m, n = len(A), len(A[0])
    # Work on a copy to avoid modifying the original
    M = [row[:] for row in A]
    EPS = 1e-10

    rank = 0
    for col in range(n):
        # Find pivot row
        pivot_row = None
        for row in range(rank, m):
            if abs(M[row][col]) > EPS:
                pivot_row = row
                break

        if pivot_row is None:
            continue  # no pivot in this column

        # Swap pivot row to current rank position
        M[rank], M[pivot_row] = M[pivot_row], M[rank]

        # Eliminate below
        pivot_val = M[rank][col]
        for row in range(rank + 1, m):
            if abs(M[row][col]) > EPS:
                factor = M[row][col] / pivot_val
                for j in range(col, n):
                    M[row][j] -= factor * M[rank][j]

        rank += 1

    return rank

# Test
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(matrix_rank(A))  # 2 (rows are dependent: row3 = 2*row2 - row1)

B = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(matrix_rank(B))  # 3 (identity matrix has full rank)
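The rank deficiency in the first test can be verified by hand: the third row is a linear combination of the first two, which a two-line check confirms:

```python
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Rank is 2 because the third row depends on the first two:
combo = [2 * a2 - a1 for a1, a2 in zip(A[0], A[1])]
assert combo == A[2]   # 2*row2 - row1 == row3
```

Any square matrix with dependent rows like this is singular, which is exactly the case `matrix_inverse` rejects in Problem 3.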

Problem 5: Matrix Trace

Problem: Compute the trace of a square matrix — the sum of its diagonal elements. The trace equals the sum of eigenvalues and is invariant under cyclic permutations: tr(ABC) = tr(CAB) = tr(BCA).

def trace(A):
    """
    Compute trace of square matrix A.
    Time: O(n), Space: O(1)
    """
    n = len(A)
    if n != len(A[0]):
        raise ValueError("Trace is only defined for square matrices")
    return sum(A[i][i] for i in range(n))

# Test
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(trace(A))  # 1 + 5 + 9 = 15

# Properties:
# tr(A + B) = tr(A) + tr(B)
# tr(cA) = c * tr(A)
# tr(A^T) = tr(A)
# tr(AB) = tr(BA)  (cyclic property)
# tr(A) = sum of eigenvalues of A
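The cyclic property is worth spot-checking numerically, because it holds even for rectangular factors where AB and BA have different sizes. A quick sketch using lightweight zip-based helpers (not the loop versions above):

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

# A is 2x3 and B is 3x2, so AB is 2x2 while BA is 3x3 -- yet the traces match
A = [[random.random() for _ in range(3)] for _ in range(2)]
B = [[random.random() for _ in range(2)] for _ in range(3)]
assert abs(trace(matmul(A, B)) - trace(matmul(B, A))) < 1e-9
```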

Problem 6: Hadamard Product (Element-wise Multiplication)

Problem: Given two matrices A and B of the same dimensions, compute their Hadamard product C where C[i][j] = A[i][j] * B[i][j]. This is NOT matrix multiplication — it is element-wise multiplication.

Why it matters: The Hadamard product appears in attention mechanisms (masking), LSTM/GRU gates (element-wise gating), and batch normalization.

def hadamard_product(A, B):
    """
    Element-wise multiplication of matrices A and B.
    Both must have the same dimensions (m x n).
    Time: O(m * n), Space: O(m * n)
    """
    m, n = len(A), len(A[0])
    if len(B) != m or len(B[0]) != n:
        raise ValueError("Matrices must have same dimensions for Hadamard product")

    return [[A[i][j] * B[i][j] for j in range(n)] for i in range(m)]

# Test
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(hadamard_product(A, B))
# [[5, 12], [21, 32]]

# Attention mask example:
scores = [[0.8, 0.5, 0.3], [0.2, 0.9, 0.1]]
mask   = [[1,   1,   0  ], [1,   1,   1  ]]  # mask out padding
masked = hadamard_product(scores, mask)
# [[0.8, 0.5, 0.0], [0.2, 0.9, 0.1]]

Putting It All Together: A Linear Regression Solver

Here is how these operations combine to solve the normal equation for linear regression: w = (X^T X)^(-1) X^T y.

def linear_regression_normal_equation(X, y):
    """
    Solve linear regression using the normal equation.
    X: m x n feature matrix (m samples, n features)
    y: m x 1 target vector (as column matrix)
    Returns: n x 1 weight vector
    """
    Xt = transpose(X)                        # n x m
    XtX = matrix_multiply(Xt, X)             # n x n
    XtX_inv = matrix_inverse(XtX)            # n x n
    Xty = matrix_multiply(Xt, y)             # n x 1
    w = matrix_multiply(XtX_inv, Xty)        # n x 1
    return w

# Example: y = 2x + 1
# X has bias column (ones) and feature column
X = [[1, 1], [1, 2], [1, 3], [1, 4]]  # 4 samples, 2 features
y = [[3], [5], [7], [9]]               # y = 2x + 1

w = linear_regression_normal_equation(X, y)
print(w)  # [[1.0], [2.0]] -> intercept=1, slope=2 ✓
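Once the weights are recovered, prediction is just one more matrix multiply: y_hat = X * w. A minimal self-contained sketch using the fitted weights from the example above:

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

X = [[1, 1], [1, 2], [1, 3], [1, 4]]   # bias column + feature column
w = [[1.0], [2.0]]                     # intercept = 1, slope = 2 (fitted above)
y_hat = matmul(X, w)
print(y_hat)  # [[3.0], [5.0], [7.0], [9.0]] -- matches the targets exactly
```

The residuals are zero here because the data lies exactly on the line y = 2x + 1; on noisy data the normal equation instead gives the least-squares fit.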

Key Takeaways

  • Matrix multiplication is O(n^3) with the naive algorithm. Loop order ikj is more cache-friendly than ijk.
  • Matrix inverse uses Gauss-Jordan elimination with partial pivoting for numerical stability. Always check for singular matrices.
  • Matrix rank is computed via Gaussian elimination — count the non-zero rows in row echelon form.
  • Trace is the simplest operation (O(n)) but has deep mathematical significance: it equals the sum of eigenvalues.
  • Hadamard product is element-wise multiplication, not matrix multiplication. It appears in attention and gating mechanisms.
  • These six operations are the building blocks for everything in this course.