Intermediate
Implement Linear & Logistic Regression
The most frequently asked ML coding question. Build both models from scratch with NumPy, understand every line of the gradient descent update, and learn how to add regularization when the interviewer asks the follow-up.
Interview Question #1: “Implement linear regression from scratch using only NumPy. Your class should have fit(X, y) and predict(X) methods. Use gradient descent for optimization.”
Linear Regression: Complete Solution
The Math You Must Know
Linear regression models the relationship y = Xw + b where:
- X is the feature matrix of shape (n_samples, n_features)
- w is the weight vector of shape (n_features,)
- b is the bias (scalar)
- y is the target vector of shape (n_samples,)
The loss function is Mean Squared Error (MSE):
L = (1 / 2n) * sum((y_pred - y_true)^2)
The gradients are:
dL/dw = (1/n) * X.T @ (y_pred - y_true)
dL/db = (1/n) * sum(y_pred - y_true)
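Before coding the full class, it helps to verify these gradient formulas numerically. The sketch below (the helper names `mse_loss` and `analytic_grads` are illustrative, not part of the solution) compares the analytic gradients against central finite differences:

```python
import numpy as np

def mse_loss(w, b, X, y):
    """Half-MSE, matching L = (1/2n) * sum((Xw + b - y)^2)."""
    return np.mean((X @ w + b - y) ** 2) / 2

def analytic_grads(w, b, X, y):
    """The formulas above: dL/dw = X.T @ err / n, dL/db = sum(err) / n."""
    n = X.shape[0]
    err = X @ w + b - y
    return (X.T @ err) / n, np.sum(err) / n

# Random problem instance
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
w = rng.normal(size=3)
b = 0.5

dw, db = analytic_grads(w, b, X, y)

# Central finite differences: (L(w + eps) - L(w - eps)) / (2 * eps)
eps = 1e-6
dw_num = np.zeros_like(w)
for j in range(3):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[j] += eps
    w_minus[j] -= eps
    dw_num[j] = (mse_loss(w_plus, b, X, y) - mse_loss(w_minus, b, X, y)) / (2 * eps)
db_num = (mse_loss(w, b + eps, X, y) - mse_loss(w, b - eps, X, y)) / (2 * eps)

print(np.allclose(dw, dw_num, atol=1e-5))  # True
print(np.isclose(db, db_num, atol=1e-5))   # True
```

Being able to gradient-check on the spot is also a strong answer to "how would you debug this?"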
Full Implementation
```python
import numpy as np

class LinearRegression:
    """
    Linear regression using gradient descent.
    Interview-ready implementation with full documentation.
    """
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.lr = learning_rate
        self.n_iter = n_iterations
        self.weights = None
        self.bias = None
        self.loss_history = []

    def fit(self, X, y):
        """
        Train the model using gradient descent.

        Parameters:
            X: np.ndarray of shape (n_samples, n_features)
            y: np.ndarray of shape (n_samples,)
        """
        n_samples, n_features = X.shape

        # Initialize parameters
        self.weights = np.zeros(n_features)
        self.bias = 0.0

        for i in range(self.n_iter):
            # Forward pass
            y_pred = X @ self.weights + self.bias           # (n_samples,)

            # Compute loss (MSE)
            loss = np.mean((y_pred - y) ** 2) / 2
            self.loss_history.append(loss)

            # Compute gradients
            dw = (1 / n_samples) * (X.T @ (y_pred - y))     # (n_features,)
            db = (1 / n_samples) * np.sum(y_pred - y)       # scalar

            # Update parameters
            self.weights -= self.lr * dw
            self.bias -= self.lr * db
        return self

    def predict(self, X):
        """Predict target values for input X."""
        return X @ self.weights + self.bias

    def score(self, X, y):
        """Return R-squared score."""
        y_pred = self.predict(X)
        ss_res = np.sum((y - y_pred) ** 2)
        ss_tot = np.sum((y - np.mean(y)) ** 2)
        return 1 - (ss_res / ss_tot)

# ---- Test with known data ----
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X.squeeze() + np.random.randn(100) * 0.5    # y = 4 + 3x + noise

model = LinearRegression(learning_rate=0.1, n_iterations=1000)
model.fit(X, y)

print(f"Weight: {model.weights[0]:.4f}")    # Should be close to 3.0
print(f"Bias: {model.bias:.4f}")            # Should be close to 4.0
print(f"R^2: {model.score(X, y):.4f}")      # Should be close to 1.0
```
What interviewers look for:
- Correct gradient computation — the transpose of X multiplied by the error, divided by n
- Proper shape handling — no accidental broadcasting errors
- Loss tracking — shows you understand convergence
- Clean API matching scikit-learn conventions (fit/predict/score)
Common follow-up: “Your gradient descent is not converging. What could be wrong?” Answer: (1) Learning rate too high — try reducing by 10x. (2) Features not normalized — different scales cause zigzagging. (3) Too few iterations. Always mention feature normalization.
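The normalization fix is worth being able to sketch on the spot. A minimal z-score standardization (the `standardize` helper is illustrative, not part of the class above; remember to reuse the training-set mean and standard deviation at predict time):

```python
import numpy as np

def standardize(X):
    """Z-score each feature column: zero mean, unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    return (X - mu) / sigma, mu, sigma

# Two features on wildly different scales -- the classic zigzag case
rng = np.random.default_rng(1)
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 1000, 200)])

X_std, mu, sigma = standardize(X)
print(np.allclose(X_std.mean(axis=0), 0))  # True
print(np.allclose(X_std.std(axis=0), 1))   # True
```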
Interview Question #2: “Now implement logistic regression for binary classification. Include the sigmoid function and binary cross-entropy loss. Use the same gradient descent approach.”
Logistic Regression: Complete Solution
Key Differences from Linear Regression
| Aspect | Linear Regression | Logistic Regression |
|---|---|---|
| Output | Continuous value | Probability (0 to 1) |
| Activation | None (identity) | Sigmoid function |
| Loss function | Mean Squared Error | Binary Cross-Entropy |
| Gradient formula | Same structure | Same structure (surprisingly) |
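The "same structure" entry deserves a short derivation, since interviewers often probe it. In the document's notation, with p = sigmoid(z) and z = Xw + b, the chain rule through binary cross-entropy gives:

```
dL/dp = -(y/p - (1-y)/(1-p)) = (p - y) / (p * (1 - p))
dp/dz = p * (1 - p)                      (sigmoid derivative)
dL/dz = dL/dp * dp/dz = p - y            (the p*(1-p) factors cancel)
dL/dw = (1/n) * X.T @ (p - y)            (same form as linear regression)
```

The cancellation is not a coincidence: cross-entropy is the "matching" loss for the sigmoid, which is exactly why the gradient collapses to the error times the input.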
Full Implementation
```python
import numpy as np

class LogisticRegression:
    """
    Logistic regression using gradient descent.
    Handles binary classification with sigmoid activation.
    """
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.lr = learning_rate
        self.n_iter = n_iterations
        self.weights = None
        self.bias = None
        self.loss_history = []

    def _sigmoid(self, z):
        """
        Numerically stable sigmoid function.
        Avoids overflow by clipping input values.
        """
        z = np.clip(z, -500, 500)  # Prevent overflow
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, X, y):
        """
        Train logistic regression using gradient descent.

        Parameters:
            X: np.ndarray of shape (n_samples, n_features)
            y: np.ndarray of shape (n_samples,) with values 0 or 1
        """
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0.0

        for i in range(self.n_iter):
            # Forward pass
            z = X @ self.weights + self.bias    # (n_samples,)
            y_pred = self._sigmoid(z)           # (n_samples,)

            # Binary cross-entropy loss
            # Add epsilon to prevent log(0)
            eps = 1e-15
            loss = -np.mean(
                y * np.log(y_pred + eps) +
                (1 - y) * np.log(1 - y_pred + eps)
            )
            self.loss_history.append(loss)

            # Gradients (same form as linear regression!)
            dw = (1 / n_samples) * (X.T @ (y_pred - y))
            db = (1 / n_samples) * np.sum(y_pred - y)

            # Update
            self.weights -= self.lr * dw
            self.bias -= self.lr * db
        return self

    def predict_proba(self, X):
        """Return probability of class 1."""
        z = X @ self.weights + self.bias
        return self._sigmoid(z)

    def predict(self, X, threshold=0.5):
        """Return class labels (0 or 1)."""
        return (self.predict_proba(X) >= threshold).astype(int)

    def accuracy(self, X, y):
        """Return classification accuracy."""
        return np.mean(self.predict(X) == y)

# ---- Test with separable data ----
np.random.seed(42)
# Class 0: centered at (-1, -1), Class 1: centered at (1, 1)
X_class0 = np.random.randn(50, 2) + np.array([-1, -1])
X_class1 = np.random.randn(50, 2) + np.array([1, 1])
X = np.vstack([X_class0, X_class1])
y = np.array([0] * 50 + [1] * 50)

model = LogisticRegression(learning_rate=0.1, n_iterations=1000)
model.fit(X, y)

print(f"Weights: {model.weights}")
print(f"Bias: {model.bias:.4f}")
print(f"Accuracy: {model.accuracy(X, y):.2%}")                     # Should be >90%
print(f"Predict [2,2]: {model.predict(np.array([[2, 2]]))}")       # Should be 1
print(f"Predict [-2,-2]: {model.predict(np.array([[-2, -2]]))}")   # Should be 0
```
What interviewers look for:
- Numerical stability in the sigmoid — clipping z to prevent exp overflow
- Epsilon in log — preventing log(0) which gives -infinity
- Understanding that the gradient formula is identical to linear regression (a non-obvious insight)
- Separate predict_proba and predict methods showing understanding of thresholds
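A quick way to show you understand why predict_proba and predict are separate is to demonstrate threshold tuning on a fixed probability vector (the numbers here are illustrative):

```python
import numpy as np

proba = np.array([0.15, 0.40, 0.55, 0.70, 0.92])

# Default threshold: anything >= 0.5 is class 1
print((proba >= 0.5).astype(int).tolist())  # [0, 0, 1, 1, 1]

# Stricter threshold: fewer positives (favor precision on class 1)
print((proba >= 0.8).astype(int).tolist())  # [0, 0, 0, 0, 1]

# Looser threshold: more positives (favor recall on class 1)
print((proba >= 0.3).astype(int).tolist())  # [0, 1, 1, 1, 1]
```

Mentioning that the threshold is a business decision (cost of false positives vs false negatives), not a modeling constant, usually earns extra credit.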
Interview Question #3 (Follow-up): “Add L2 regularization (Ridge) to your logistic regression. Explain why regularization helps and what happens as lambda increases.”
Adding L2 Regularization
Regularization prevents overfitting by penalizing large weights. The regularized loss is:
L_reg = L + (lambda / 2n) * sum(w^2)
The gradient changes to include the regularization term:
dw_reg = dw + (lambda / n) * w
```python
class LogisticRegressionL2:
    """Logistic regression with L2 (Ridge) regularization."""
    def __init__(self, learning_rate=0.01, n_iterations=1000, reg_lambda=0.1):
        self.lr = learning_rate
        self.n_iter = n_iterations
        self.reg_lambda = reg_lambda
        self.weights = None
        self.bias = None

    def _sigmoid(self, z):
        z = np.clip(z, -500, 500)
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0.0

        for _ in range(self.n_iter):
            z = X @ self.weights + self.bias
            y_pred = self._sigmoid(z)

            # Gradients WITH regularization
            dw = (1 / n_samples) * (X.T @ (y_pred - y)) + \
                 (self.reg_lambda / n_samples) * self.weights
            db = (1 / n_samples) * np.sum(y_pred - y)
            # Note: bias is NOT regularized (standard practice)

            self.weights -= self.lr * dw
            self.bias -= self.lr * db
        return self

    def predict(self, X, threshold=0.5):
        z = X @ self.weights + self.bias
        return (self._sigmoid(z) >= threshold).astype(int)

# Compare regularized vs unregularized (reusing X, y from the test above)
model_noreg = LogisticRegressionL2(learning_rate=0.1, n_iterations=1000, reg_lambda=0.0)
model_reg = LogisticRegressionL2(learning_rate=0.1, n_iterations=1000, reg_lambda=1.0)
model_noreg.fit(X, y)
model_reg.fit(X, y)

print(f"No regularization - weights magnitude: {np.linalg.norm(model_noreg.weights):.4f}")
print(f"With L2 (lambda=1) - weights magnitude: {np.linalg.norm(model_reg.weights):.4f}")
# Regularized weights should have smaller magnitude
```
Interview tip: When explaining regularization, always mention: (1) It prevents overfitting by constraining weight magnitudes. (2) The bias term is not regularized. (3) As lambda increases, weights shrink toward zero, making the model simpler but potentially underfitting. (4) L1 (Lasso) promotes sparsity while L2 (Ridge) distributes weight across features.
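Point (4) can be made concrete by comparing one update step of each penalty in isolation (a minimal sketch; the `lr` and `lam` values are arbitrary). The L2 step shrinks every weight multiplicatively, while the L1 proximal step (soft-thresholding) subtracts a constant and snaps small weights to exactly zero:

```python
import numpy as np

w = np.array([2.0, 0.05, -0.5, 0.001])
lr, lam = 0.1, 1.0

# L2 step on the penalty term alone: every weight shrinks by the same
# factor (1 - lr*lam); small weights get smaller but never reach zero
w_l2 = w - lr * lam * w
print(w_l2)

# L1 proximal step (soft-thresholding): subtract lr*lam from each |w_j|;
# anything below the threshold becomes exactly zero -> sparsity
thresh = lr * lam
w_l1 = np.sign(w) * np.maximum(np.abs(w) - thresh, 0.0)
print(w_l1)  # the two small entries are now exactly 0.0
```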
Interview Question #4: “Implement the closed-form solution (Normal Equation) for linear regression. When would you prefer this over gradient descent?”
Normal Equation: Closed-Form Solution
```python
class LinearRegressionClosed:
    """Linear regression using the Normal Equation (closed-form)."""
    def __init__(self):
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        """
        Solve w = (X^T X)^(-1) X^T y directly.
        We prepend a column of ones to X to handle the bias.
        """
        n_samples = X.shape[0]
        # Add bias column
        X_b = np.column_stack([np.ones(n_samples), X])

        # Normal equation: w = (X^T X)^(-1) X^T y
        # Using np.linalg.solve for numerical stability instead of inverse
        XtX = X_b.T @ X_b
        Xty = X_b.T @ y
        theta = np.linalg.solve(XtX, Xty)

        self.bias = theta[0]
        self.weights = theta[1:]
        return self

    def predict(self, X):
        return X @ self.weights + self.bias

# Test - should match the gradient descent solution.
# Regenerate the regression data from Question #1, since X, y were
# reassigned to the classification data in the logistic section above.
np.random.seed(42)
X_lin = 2 * np.random.rand(100, 1)
y_lin = 4 + 3 * X_lin.squeeze() + np.random.randn(100) * 0.5

model_closed = LinearRegressionClosed()
model_closed.fit(X_lin, y_lin)
print(f"Closed-form weight: {model_closed.weights[0]:.4f}")  # Close to 3.0
print(f"Closed-form bias: {model_closed.bias:.4f}")          # Close to 4.0
```
When to use Normal Equation vs Gradient Descent:
| Factor | Normal Equation | Gradient Descent |
|---|---|---|
| Time complexity | O(n*d^2 + d^3) | O(n*d*k) where k = iterations |
| Best when | d < 10,000 features | d > 10,000 features |
| Feature scaling needed? | No | Yes (for convergence) |
| Hyperparameters | None | Learning rate, iterations |
| Singular matrix? | Can fail | Always works |
| Works for logistic? | No (sigmoid makes the equations non-linear in w; no closed form) | Yes |
Bonus point: Tell the interviewer you use np.linalg.solve instead of np.linalg.inv because computing the explicit inverse is numerically less stable and slower. This shows real-world engineering awareness.
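The table's "can fail" row can also be demonstrated: with an exactly duplicated feature, X^T X is singular, np.linalg.solve raises LinAlgError, and np.linalg.lstsq (minimum-norm least squares) is a common fallback. A sketch under that assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
X = np.column_stack([X, X[:, 0]])   # duplicate a column -> X^T X is singular
y = X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.1, 50)

X_b = np.column_stack([np.ones(50), X])
XtX, Xty = X_b.T @ X_b, X_b.T @ y

try:
    theta = np.linalg.solve(XtX, Xty)
except np.linalg.LinAlgError:
    # lstsq returns the minimum-norm solution even for rank-deficient systems
    theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)

pred = X_b @ theta
print(np.mean((pred - y) ** 2) < 0.05)  # predictions still fit well
```

The individual coefficients on the duplicated columns are not identifiable, but the minimum-norm solution still predicts well, which is exactly the distinction worth articulating in an interview.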
Lilly Tech Systems