Framework for ML System Design
A structured 4-step approach that works for any ML system design question. Learn how to manage your 45 minutes, what interviewers evaluate at each stage, and the scoring rubric that determines your hire/no-hire decision.
Why ML System Design Is Different
Unlike traditional system design interviews that focus on scaling web services, ML system design interviews evaluate your ability to reason about data, models, metrics, and the unique challenges of deploying machine learning in production. The interviewer wants to see that you can:
- Translate a vague business problem into a concrete ML formulation
- Design data pipelines, feature stores, and training infrastructure
- Choose appropriate model architectures and justify your decisions
- Define meaningful offline and online metrics
- Address serving latency, model freshness, and feedback loops
- Discuss trade-offs rather than presenting a single “right answer”
The 4-Step Framework
This framework works for every ML system design question. The key is allocating your time correctly and going deep in the areas where the interviewer signals interest.
Step 1: Clarify Requirements & Define the Problem (5–8 minutes)
Never jump into architecture. Start by asking clarifying questions to nail down what the system needs to do.
- Who are the users? What is the scale (DAU, QPS, data volume)?
- What is the business objective? How does ML improve it?
- What are the latency requirements? Real-time vs. batch?
- What data is available? Labels? User interactions?
- Are there fairness, privacy, or regulatory constraints?
Then formulate the ML problem clearly:
```python
# Problem formulation template
# Business goal: Increase user engagement on news feed
# ML formulation: Predict P(user clicks post | user, post, context)
# Input: User features, post features, context features
# Output: Click probability score [0, 1]
# Training data: Historical click/impression logs
# Optimization: Binary cross-entropy loss
```
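The optimization line in the template maps to a standard binary cross-entropy loss. A minimal pure-Python sketch of what that objective computes (in practice your ML framework provides this):

```python
import math

def bce_loss(y_true, p_pred, eps=1e-7):
    """Binary cross-entropy averaged over examples.

    y_true: 0/1 click labels; p_pred: predicted click probabilities.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)
```

A confident correct prediction (label 1, p=0.9) contributes a small loss; a confident wrong one (label 1, p=0.1) contributes a large loss, which is exactly the behavior you want when optimizing click probability.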
Step 2: High-Level Architecture (8–10 minutes)
Draw the system architecture showing how data flows from raw inputs to model predictions served to users. Cover these components:
| Component | Purpose | Key Questions |
|---|---|---|
| Data Pipeline | Collect, clean, and store training data | Batch vs. streaming? Schema evolution? Data quality checks? |
| Feature Store | Compute and serve features consistently | Online vs. offline features? Feature freshness? Point-in-time correctness? |
| Training Pipeline | Train and validate models | Training frequency? Distributed training? Hyperparameter tuning? |
| Model Registry | Version, validate, and promote models | A/B test integration? Rollback strategy? Approval workflow? |
| Serving Infrastructure | Serve predictions at scale | Latency budget? Batching? Caching? Fallback models? |
| Monitoring | Track model health and business metrics | Data drift? Model decay? Alert thresholds? |
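Point-in-time correctness, flagged in the Feature Store row, is worth being able to sketch on demand: a training example must only see feature values that existed at the time of the impression, or you leak future information. A minimal illustration, assuming a hypothetical `feature_log` of timestamped values per feature:

```python
from bisect import bisect_right

def point_in_time_lookup(feature_log, event_ts):
    """Return the latest feature value recorded at or before event_ts.

    feature_log: list of (timestamp, value), sorted by timestamp.
    Returns None if no value existed yet at event_ts.
    """
    times = [t for t, _ in feature_log]
    i = bisect_right(times, event_ts)  # first index strictly after event_ts
    return feature_log[i - 1][1] if i > 0 else None
```

Joining training labels against the *current* feature value instead of this point-in-time value is one of the most common sources of offline/online metric gaps.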
Step 3: Deep Dive into Key Components (15–20 minutes)
This is where you spend the most time. The interviewer will usually signal which area to go deep on. Common deep dives include:
Feature Engineering
User features (demographics, history, engagement patterns), item features (content, metadata, popularity), context features (time, device, location), and cross features.
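Cross features are typically hashed into a fixed bucket space so the feature dimensionality stays bounded no matter how many category combinations appear. A sketch of one common scheme (the bucket count and MD5-based hashing here are illustrative choices, not a prescription):

```python
import hashlib

def cross_feature(a, b, num_buckets=1000):
    """Deterministically hash the cross of two categorical values
    (e.g. user_country x item_category) into a fixed bucket id."""
    key = f"{a}_x_{b}".encode("utf-8")
    digest = int(hashlib.md5(key).hexdigest(), 16)
    return digest % num_buckets
```

Using a stable hash (rather than Python's built-in `hash`, which varies across processes) matters here: training and serving must map the same cross to the same bucket.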
Model Architecture
Why this model over alternatives? Two-tower vs. cross-network? How to handle sparse categorical features? Cold start problem?
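For the two-tower option, the key property to articulate is that users and items are embedded by separate networks and scored with a dot product, so item embeddings can be pre-computed and indexed for fast retrieval. A toy sketch of the scoring side (embeddings hard-coded; a real system would use an approximate nearest-neighbor index over millions of items):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def retrieve_top_k(user_emb, item_embs, k=2):
    """Score every item by dot product with the user embedding
    and return the k highest-scoring item ids."""
    scored = sorted(item_embs.items(),
                    key=lambda kv: dot(user_emb, kv[1]),
                    reverse=True)
    return [item_id for item_id, _ in scored[:k]]
```

This separability is exactly what a cross-network gives up: cross features between user and item improve accuracy but force you to score every candidate with the full model.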
Metrics & Evaluation
Offline metrics (AUC, NDCG, precision@k), online metrics (CTR, engagement time, revenue), and how to run A/B tests correctly.
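Offline ranking metrics like precision@k are simple enough to define on a whiteboard from logged data. A minimal sketch:

```python
def precision_at_k(ranked_items, relevant, k):
    """Fraction of the top-k ranked items that are in the relevant set."""
    top = ranked_items[:k]
    return sum(1 for item in top if item in relevant) / k
```

Being able to state precisely what a metric measures (here: quality of the head of the ranking, ignoring order within the top k) makes the later A/B-testing discussion much sharper.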
Serving & Scaling
Latency optimization, model distillation, feature caching, candidate generation + ranking pipeline, and graceful degradation.
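The candidate generation + ranking split, plus graceful degradation, can be sketched as a two-stage pipeline that falls back to a precomputed popularity list if the ranker fails. All function and variable names below are hypothetical placeholders for real services:

```python
def predict(request, candidate_gen, ranker, fallback_popular, k=3):
    """Two-stage serving pipeline.

    candidate_gen: cheap retrieval over the full corpus (e.g. ANN lookup).
    ranker: expensive model scoring only the retrieved candidates,
            returning (item, score) pairs.
    fallback_popular: precomputed popular items used if the ranker fails.
    """
    candidates = candidate_gen(request)
    try:
        scored = ranker(request, candidates)
    except Exception:
        # Graceful degradation: serve popular items rather than an error.
        return fallback_popular[:k]
    return [item for item, _ in sorted(scored, key=lambda x: -x[1])[:k]]
```

The design point to call out: the ranker's latency budget depends only on the candidate count, not the corpus size, and the fallback keeps the product usable during model outages.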
Step 4: Trade-Offs & Extensions (5–7 minutes)
Wrap up by discussing trade-offs you made and potential extensions. This demonstrates senior-level thinking.
- Accuracy vs. latency: “We could use a transformer-based model for better accuracy, but the serving latency would increase from 10ms to 200ms. A distilled model offers a good middle ground.”
- Freshness vs. cost: “Real-time feature computation gives us the freshest signals but costs 10x more than hourly batch updates. We can use a hybrid approach.”
- Complexity vs. maintainability: “A multi-task learning setup could improve performance by 3%, but it makes debugging and iteration significantly harder.”
- Fairness: “We need to audit the model for bias across demographics and implement fairness constraints in the loss function.”
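The distilled-model middle ground from the accuracy-vs-latency trade-off usually means training a small student to match the large teacher's soft outputs rather than the hard labels. A minimal sketch for the binary click case, assuming the distillation target is simply the teacher's predicted probability:

```python
import math

def distill_loss(p_teacher, p_student, eps=1e-7):
    """Cross-entropy of the student's click probability against the
    teacher's soft target (binary case). Minimized when the student
    matches the teacher's output."""
    ps = min(max(p_student, eps), 1 - eps)  # clip to avoid log(0)
    return -(p_teacher * math.log(ps) + (1 - p_teacher) * math.log(1 - ps))
```

The soft targets carry more information per example than 0/1 labels (e.g. "this impression was a near-click"), which is why a much smaller student can recover most of the teacher's accuracy at a fraction of the serving latency.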
Time Management: The 45-Minute Blueprint
| Phase | Time | What to Cover | Common Mistakes |
|---|---|---|---|
| Clarify | 5–8 min | Requirements, constraints, ML formulation | Skipping this and jumping to architecture |
| Architecture | 8–10 min | Data flow, components, high-level diagram | Spending too long on infrastructure details |
| Deep Dive | 15–20 min | Features, model, metrics, serving | Staying too shallow across all areas |
| Trade-Offs | 5–7 min | Alternatives, extensions, limitations | Running out of time before reaching this |
The Scoring Rubric
Most FAANG companies use a rubric similar to this. Understanding it helps you allocate effort to the highest-impact areas.
| Criterion | Weight | Strong Signal | Weak Signal |
|---|---|---|---|
| Problem Formulation | 20% | Clear ML objective, correct loss function, identifies edge cases | Vague problem statement, wrong optimization target |
| System Architecture | 20% | Complete data flow, addresses scale, mentions monitoring | Missing components, no consideration of scale |
| ML Depth | 25% | Thoughtful feature engineering, model justification, training strategy | Black-box model choice, no feature discussion |
| Metrics & Evaluation | 15% | Both offline and online metrics, A/B test design | Only mentions accuracy, no online metrics |
| Trade-Off Analysis | 10% | Multiple alternatives discussed with pros/cons | Presents only one approach as the answer |
| Communication | 10% | Structured, responds to hints, manages time well | Rambles, ignores interviewer signals |
What Interviewers Look For at Each Level
Junior / New Grad (L3–L4)
- Can formulate the problem as an ML task
- Proposes reasonable features and model choice
- Mentions basic offline metrics
- Aware of training/serving split
Mid-Level (L4–L5)
- Everything above, plus:
- Designs complete data and serving pipelines
- Discusses online metrics and A/B testing
- Addresses cold start, data freshness, and model retraining
- Proposes multiple approaches with trade-offs
Senior / Staff (L5–L6+)
- Everything above, plus:
- Considers organizational and cross-team implications
- Designs for iteration speed: experiment framework, feature platform
- Addresses fairness, privacy, and regulatory requirements
- Proposes a phased rollout plan (v1 simple, v2 advanced)
Practice Checklist
Before your interview, make sure you can answer “yes” to each of these:
- ☐ Can I formulate any business problem as an ML optimization objective?
- ☐ Can I draw a complete system architecture in under 10 minutes?
- ☐ Can I list 15+ features for any ML system off the top of my head?
- ☐ Can I explain why I chose this model over 3 alternatives?
- ☐ Can I define both offline and online metrics and explain how to A/B test?
- ☐ Can I discuss at least 3 trade-offs for any design decision?
- ☐ Can I explain how the system handles failure modes and edge cases?
Lilly Tech Systems