ML Coding Round
The ML coding round tests whether you can translate ML concepts into working code under time pressure. Unlike standard software engineering coding rounds, the focus is on ML-specific implementations — not LeetCode-style problems.
What Interviewers Actually Evaluate
Interviewers use a scoring rubric that goes beyond "does it work." Here is what they look for:
| Criteria | Weight | What Strong Looks Like |
|---|---|---|
| ML Knowledge | 30% | Correct algorithm implementation, understands the math behind it, knows when to use it |
| Code Quality | 25% | Clean, readable, well-structured code. Meaningful variable names. Modular functions. |
| Problem Solving | 25% | Clarifies requirements before coding. Breaks problem into steps. Handles edge cases. |
| Communication | 20% | Explains approach before coding. Thinks out loud. Discusses trade-offs when asked. |
The 5 Most Common ML Coding Patterns
These patterns cover approximately 80% of ML coding interview questions. Master all five.
Pattern 1: Implement Gradient Descent from Scratch
Why they ask it: Tests whether you understand optimization — the foundation of all ML training.
```python
import numpy as np

def linear_regression_gradient_descent(X, y, lr=0.01, epochs=1000):
    """
    Train a linear regression model using batch gradient descent.

    Args:
        X: Feature matrix (n_samples, n_features)
        y: Target vector (n_samples,)
        lr: Learning rate
        epochs: Number of training iterations

    Returns:
        weights: Learned weight vector
        bias: Learned bias term
        losses: List of MSE loss at each epoch
    """
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)
    bias = 0.0
    losses = []

    for epoch in range(epochs):
        # Forward pass: compute predictions
        y_pred = X @ weights + bias

        # Compute loss (MSE)
        loss = np.mean((y_pred - y) ** 2)
        losses.append(loss)

        # Compute gradients
        error = y_pred - y
        dw = (2 / n_samples) * (X.T @ error)
        db = (2 / n_samples) * np.sum(error)

        # Update parameters
        weights -= lr * dw
        bias -= lr * db

    return weights, bias, losses

# INTERVIEWER FOLLOW-UPS:
# 1. "How would you add L2 regularization?"
#    Add lambda * weights to dw, and lambda * ||w||^2 to the loss
#
# 2. "How would you convert this to stochastic GD?"
#    Sample a random mini-batch each iteration instead of using all data
#
# 3. "When would gradient descent fail?"
#    Non-convex loss surfaces, learning rate too high/low, feature scaling issues
```
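Follow-up 1 can be answered in code, not just words. A minimal sketch of the ridge-regularized variant, assuming a `lam` strength parameter; the function name and defaults here are illustrative, not a canonical solution:

```python
import numpy as np

def ridge_gradient_descent(X, y, lr=0.01, epochs=1000, lam=0.1):
    """Batch gradient descent for linear regression with an L2 penalty.

    Illustrative variant of the interview solution above; `lam` is the
    regularization strength (an assumed name, not standard API).
    """
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)
    bias = 0.0
    for _ in range(epochs):
        y_pred = X @ weights + bias
        error = y_pred - y
        # Same gradient as before, plus the derivative of lam * ||w||^2.
        # By convention the bias is not regularized.
        dw = (2 / n_samples) * (X.T @ error) + 2 * lam * weights
        db = (2 / n_samples) * np.sum(error)
        weights -= lr * dw
        bias -= lr * db
    return weights, bias
```

With `lam=0` this reduces exactly to the unregularized version; larger `lam` shrinks the learned weights toward zero.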
Pattern 2: Build a Decision Tree Classifier
Why they ask it: Tests understanding of recursive algorithms, information theory, and tree-based models.
```python
import numpy as np
from collections import Counter

class DecisionTreeNode:
    def __init__(self, feature=None, threshold=None,
                 left=None, right=None, value=None):
        self.feature = feature      # Index of feature to split on
        self.threshold = threshold  # Threshold value for split
        self.left = left            # Left subtree
        self.right = right          # Right subtree
        self.value = value          # Leaf prediction (class label)

class SimpleDecisionTree:
    def __init__(self, max_depth=10, min_samples=2):
        self.max_depth = max_depth
        self.min_samples = min_samples
        self.root = None

    def _gini(self, y):
        """Compute Gini impurity."""
        counts = Counter(y)
        n = len(y)
        return 1.0 - sum((c / n) ** 2 for c in counts.values())

    def _best_split(self, X, y):
        """Find the best feature and threshold to split on."""
        best_gain = -1
        best_feature, best_threshold = None, None
        parent_gini = self._gini(y)
        n = len(y)

        for feature_idx in range(X.shape[1]):
            thresholds = np.unique(X[:, feature_idx])
            for threshold in thresholds:
                left_mask = X[:, feature_idx] <= threshold
                right_mask = ~left_mask
                if left_mask.sum() == 0 or right_mask.sum() == 0:
                    continue

                # Weighted Gini of children
                left_gini = self._gini(y[left_mask])
                right_gini = self._gini(y[right_mask])
                weighted = (left_mask.sum() / n * left_gini +
                            right_mask.sum() / n * right_gini)
                gain = parent_gini - weighted

                if gain > best_gain:
                    best_gain = gain
                    best_feature = feature_idx
                    best_threshold = threshold

        return best_feature, best_threshold, best_gain

    def _build(self, X, y, depth):
        """Recursively build the tree."""
        # Stopping conditions
        if (depth >= self.max_depth or
                len(y) < self.min_samples or
                len(set(y)) == 1):
            return DecisionTreeNode(
                value=Counter(y).most_common(1)[0][0]
            )

        feature, threshold, gain = self._best_split(X, y)
        if gain <= 0:
            return DecisionTreeNode(
                value=Counter(y).most_common(1)[0][0]
            )

        left_mask = X[:, feature] <= threshold
        left = self._build(X[left_mask], y[left_mask], depth + 1)
        right = self._build(X[~left_mask], y[~left_mask], depth + 1)
        return DecisionTreeNode(feature=feature, threshold=threshold,
                                left=left, right=right)

    def fit(self, X, y):
        self.root = self._build(X, y, depth=0)

    def _predict_one(self, x, node):
        if node.value is not None:
            return node.value
        if x[node.feature] <= node.threshold:
            return self._predict_one(x, node.left)
        return self._predict_one(x, node.right)

    def predict(self, X):
        return np.array([self._predict_one(x, self.root) for x in X])
```
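A common follow-up here is "why Gini instead of entropy?" Both measure impurity and usually pick similar splits; Gini is slightly cheaper because it avoids a logarithm. A small standalone sketch of the entropy alternative (the helper name is mine, not part of the class above):

```python
import numpy as np
from collections import Counter

def entropy(y):
    """Shannon entropy in bits: the information-theoretic alternative
    to Gini impurity for scoring candidate splits."""
    n = len(y)
    probs = [c / n for c in Counter(y).values()]
    return -sum(p * np.log2(p) for p in probs)

# A 50/50 class split is maximally impure under both criteria:
# entropy = 1.0 bit, while Gini = 1 - (0.5^2 + 0.5^2) = 0.5
print(entropy([0, 0, 1, 1]))  # 1.0
```

To switch the tree over, you would replace `_gini` with this function and use information gain instead of Gini gain in `_best_split`.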
Pattern 3: Build a Data Processing Pipeline
Why they ask it: Real ML work is 80% data. This tests your ability to handle messy, real-world data.
```python
import numpy as np
import pandas as pd

def build_feature_pipeline(df, target_col, categorical_cols,
                           numerical_cols):
    """
    Build a complete feature engineering pipeline.

    Args:
        df: Raw DataFrame
        target_col: Name of target column
        categorical_cols: List of categorical column names
        numerical_cols: List of numerical column names

    Returns:
        X: Processed feature matrix
        y: Target vector
        pipeline_config: Dict of parameters for inference
    """
    df = df.copy()  # avoid mutating the caller's DataFrame
    pipeline_config = {}

    # Step 1: Handle missing values
    for col in numerical_cols:
        median_val = df[col].median()
        df[col] = df[col].fillna(median_val)
        pipeline_config[f'{col}_median'] = median_val

    for col in categorical_cols:
        mode_val = df[col].mode()[0]
        df[col] = df[col].fillna(mode_val)
        pipeline_config[f'{col}_mode'] = mode_val

    # Step 2: Encode categorical variables
    encoded_frames = []
    for col in categorical_cols:
        dummies = pd.get_dummies(df[col], prefix=col, drop_first=True)
        encoded_frames.append(dummies)
        pipeline_config[f'{col}_categories'] = list(dummies.columns)

    # Step 3: Scale numerical features
    for col in numerical_cols:
        mean_val = df[col].mean()
        std_val = df[col].std()
        df[col] = (df[col] - mean_val) / (std_val + 1e-8)
        pipeline_config[f'{col}_mean'] = mean_val
        pipeline_config[f'{col}_std'] = std_val

    # Step 4: Combine features
    numerical_features = df[numerical_cols]
    X = pd.concat([numerical_features] + encoded_frames, axis=1)
    y = df[target_col].values
    return X.values, y, pipeline_config

# KEY INTERVIEWER QUESTIONS:
# 1. "Why do you save the pipeline_config?"
#    For inference — you must apply the SAME transformations
#    (same medians, means, categories) to new data.
#
# 2. "What's wrong with using the test set statistics?"
#    Data leakage — the model learns information from the test set.
#
# 3. "How would you handle a category at inference time
#    that wasn't in the training data?"
#    Ignore it (treat as all zeros) or use a catch-all "other" bucket.
```
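Questions 1 and 3 come together at inference time. Here is one way the serving-side counterpart could look, assuming the same `pipeline_config` keys as above; `apply_pipeline` is a hypothetical helper, and the "treat unseen categories as all zeros" choice is implemented via a column reindex:

```python
import pandas as pd

def apply_pipeline(df, pipeline_config, categorical_cols, numerical_cols):
    """Apply training-time statistics to new data (illustrative sketch)."""
    df = df.copy()
    for col in numerical_cols:
        # Fill with the TRAINING median, scale with the TRAINING mean/std
        df[col] = df[col].fillna(pipeline_config[f'{col}_median'])
        mean = pipeline_config[f'{col}_mean']
        std = pipeline_config[f'{col}_std']
        df[col] = (df[col] - mean) / (std + 1e-8)

    encoded = []
    for col in categorical_cols:
        df[col] = df[col].fillna(pipeline_config[f'{col}_mode'])
        dummies = pd.get_dummies(df[col], prefix=col,
                                 drop_first=True, dtype=float)
        # Reindex to the training-time columns: unseen categories become
        # all-zero rows, and missing training columns are added as zeros.
        dummies = dummies.reindex(
            columns=pipeline_config[f'{col}_categories'], fill_value=0)
        encoded.append(dummies)

    X = pd.concat([df[numerical_cols]] + encoded, axis=1)
    return X.values
```

Note that no statistics are computed here: everything comes from `pipeline_config`, which is exactly what prevents leakage.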
Pattern 4: Implement K-Means Clustering
Why they ask it: Tests understanding of unsupervised learning, iterative algorithms, and convergence.
```python
import numpy as np

def kmeans(X, k, max_iters=100, tol=1e-4):
    """
    Implement K-Means clustering from scratch.

    Args:
        X: Data matrix (n_samples, n_features)
        k: Number of clusters
        max_iters: Maximum iterations
        tol: Convergence tolerance

    Returns:
        centroids: Final cluster centers (k, n_features)
        labels: Cluster assignment for each point (n_samples,)
    """
    n_samples, n_features = X.shape

    # Initialize centroids using random data points
    random_indices = np.random.choice(n_samples, k, replace=False)
    centroids = X[random_indices].copy()

    for iteration in range(max_iters):
        # Step 1: Assign each point to nearest centroid
        distances = np.zeros((n_samples, k))
        for i in range(k):
            distances[:, i] = np.linalg.norm(X - centroids[i], axis=1)
        labels = np.argmin(distances, axis=1)

        # Step 2: Update centroids
        new_centroids = np.zeros_like(centroids)
        for i in range(k):
            cluster_points = X[labels == i]
            if len(cluster_points) > 0:
                new_centroids[i] = cluster_points.mean(axis=0)
            else:
                # Handle empty cluster: reinitialize randomly
                new_centroids[i] = X[np.random.randint(n_samples)]

        # Step 3: Check for convergence
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break

    return centroids, labels

# FOLLOW-UP: "What are the limitations of K-Means?"
# 1. Must specify k in advance (use elbow method or silhouette)
# 2. Sensitive to initialization (use k-means++ instead)
# 3. Assumes spherical clusters (fails on non-convex shapes)
# 4. Sensitive to outliers (consider k-medoids instead)
```
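Limitation 2 has a standard fix worth knowing cold: k-means++ seeding picks each new centroid with probability proportional to its squared distance from the nearest centroid chosen so far, which spreads the initial centers out. A sketch of that seeding step (the function name and `rng` parameter are my own):

```python
import numpy as np

def kmeans_pp_init(X, k, rng=None):
    """k-means++ seeding sketch (Arthur & Vassilvitskii): each new
    centroid is drawn with probability proportional to its squared
    distance from the nearest already-chosen centroid."""
    rng = np.random.default_rng(rng)
    n_samples = X.shape[0]
    centroids = [X[rng.integers(n_samples)]]
    for _ in range(k - 1):
        # Squared distance from every point to its nearest centroid so far
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids],
                    axis=0)
        probs = d2 / d2.sum()
        centroids.append(X[rng.choice(n_samples, p=probs)])
    return np.array(centroids)
```

You would pass the result in place of the `random_indices` initialization above; everything after initialization stays the same.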
Pattern 5: Implement a Simple Neural Network
Why they ask it: Tests understanding of backpropagation, activation functions, and the training loop.
```python
import numpy as np

class SimpleNeuralNetwork:
    """Two-layer neural network for binary classification."""

    def __init__(self, input_size, hidden_size, lr=0.01):
        # He initialization (the sqrt(2/fan_in) scaling suited to ReLU)
        self.W1 = np.random.randn(input_size, hidden_size) * np.sqrt(2.0 / input_size)
        self.b1 = np.zeros(hidden_size)
        self.W2 = np.random.randn(hidden_size, 1) * np.sqrt(2.0 / hidden_size)
        self.b2 = np.zeros(1)
        self.lr = lr

    def _sigmoid(self, z):
        return 1 / (1 + np.exp(-np.clip(z, -500, 500)))

    def _relu(self, z):
        return np.maximum(0, z)

    def _relu_derivative(self, z):
        return (z > 0).astype(float)

    def forward(self, X):
        self.z1 = X @ self.W1 + self.b1
        self.a1 = self._relu(self.z1)
        self.z2 = self.a1 @ self.W2 + self.b2
        self.a2 = self._sigmoid(self.z2)
        return self.a2

    def backward(self, X, y):
        n = X.shape[0]
        y = y.reshape(-1, 1)

        # Output layer gradients (sigmoid + cross-entropy simplifies to a2 - y)
        dz2 = self.a2 - y                           # (n, 1)
        dW2 = (1 / n) * (self.a1.T @ dz2)           # (hidden, 1)
        db2 = (1 / n) * np.sum(dz2, axis=0)         # (1,)

        # Hidden layer gradients
        da1 = dz2 @ self.W2.T                       # (n, hidden)
        dz1 = da1 * self._relu_derivative(self.z1)  # (n, hidden)
        dW1 = (1 / n) * (X.T @ dz1)                 # (input, hidden)
        db1 = (1 / n) * np.sum(dz1, axis=0)         # (hidden,)

        # Update weights
        self.W2 -= self.lr * dW2
        self.b2 -= self.lr * db2
        self.W1 -= self.lr * dW1
        self.b1 -= self.lr * db1

    def train(self, X, y, epochs=100):
        y = y.reshape(-1, 1)  # (n, 1) so the loss broadcasts against y_pred
        losses = []
        for epoch in range(epochs):
            y_pred = self.forward(X)
            # Binary cross-entropy, with epsilon to avoid log(0)
            loss = -np.mean(y * np.log(y_pred + 1e-8) +
                            (1 - y) * np.log(1 - y_pred + 1e-8))
            losses.append(loss)
            self.backward(X, y)
        return losses

    def predict(self, X):
        return (self.forward(X) >= 0.5).astype(int).flatten()
```
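A frequent follow-up is "how would you verify your backward pass is correct?" The standard answer is a finite-difference gradient check: perturb each parameter by a small epsilon and compare the numerical slope against the analytic gradient. A self-contained sketch (the helper name is my own):

```python
import numpy as np

def numerical_grad(f, w, eps=1e-5):
    """Central-difference gradient estimate of scalar function f at w.

    Used to sanity-check a hand-written backward pass: the analytic
    gradients (dW1, dW2, ...) should match this to within ~eps.
    """
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus.flat[i] += eps
        w_minus.flat[i] -= eps
        grad.flat[i] = (f(w_plus) - f(w_minus)) / (2 * eps)
    return grad

# Sanity check on f(w) = ||w||^2, whose true gradient is 2w:
w = np.array([1.0, -2.0, 3.0])
print(numerical_grad(lambda v: np.sum(v ** 2), w))  # ~ [2., -4., 6.]
```

To apply this to the network above, wrap the loss as a function of one weight matrix (holding the rest fixed) and compare against the corresponding `dW` from `backward`.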
Coding Style That Impresses Interviewers
Your code style signals your experience level. Follow these conventions:
| Do | Do Not |
|---|---|
| Write docstrings with Args and Returns | Skip documentation entirely |
| Use descriptive variable names (n_samples, learning_rate) | Use single letters (a, b, x1) |
| Handle edge cases (empty arrays, division by zero) | Assume inputs are always clean |
| Use NumPy vectorization | Write nested for-loops over arrays |
| Explain your approach before coding | Code in silence for 30 minutes |
| Test with a simple example at the end | Say "I think it works" without testing |
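The vectorization row is the one most candidates stumble on in practice. As one concrete illustration (toy data, not from any specific interview), here are loop and broadcast versions of the pairwise squared-distance computation that appears in K-Means:

```python
import numpy as np

# Pairwise squared Euclidean distances between n points and k centers
X = np.random.randn(100, 3)   # (n, d)
C = np.random.randn(4, 3)     # (k, d)

# Loop version: O(n * k) Python-level iterations
loop = np.array([[np.sum((x - c) ** 2) for c in C] for x in X])

# Vectorized: broadcast (n, 1, d) against (1, k, d), reduce over d
vec = np.sum((X[:, None, :] - C[None, :, :]) ** 2, axis=2)

print(np.allclose(loop, vec))  # True
```

Writing the loop version first and then vectorizing it, while narrating the shapes, is a perfectly good interview strategy.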
Time Management Strategy
```
45-MINUTE ML CODING ROUND BREAKDOWN
=====================================

[0:00 - 0:05]  Read problem, ask clarifying questions
    - "Can I assume the data fits in memory?"
    - "Should I handle multi-class or just binary?"
    - "Is there a specific metric I should optimize?"

[0:05 - 0:10]  Outline approach (pseudocode or verbal)
    - "I'll implement this in three steps..."
    - Get interviewer buy-in before coding

[0:10 - 0:35]  Write code
    - Start with the main function signature
    - Build incrementally (get a basic version first)
    - Comment non-obvious steps

[0:35 - 0:40]  Test with a simple example
    - Walk through your code with a 3-4 row dataset
    - Verify the output makes sense

[0:40 - 0:45]  Discuss extensions and trade-offs
    - "If I had more time, I would add..."
    - "The time complexity is O(n * k * features)..."
    - "In production, I would use sklearn, but..."
```
Lilly Tech Systems