Beginner

Framework for ML System Design

A structured 4-step approach that works for any ML system design question. Learn how to manage your 45 minutes, what interviewers evaluate at each stage, and the scoring rubric that determines your hire/no-hire decision.

Why ML System Design Is Different

Unlike traditional system design interviews that focus on scaling web services, ML system design interviews evaluate your ability to reason about data, models, metrics, and the unique challenges of deploying machine learning in production. The interviewer wants to see that you can:

  • Translate a vague business problem into a concrete ML formulation
  • Design data pipelines, feature stores, and training infrastructure
  • Choose appropriate model architectures and justify your decisions
  • Define meaningful offline and online metrics
  • Address serving latency, model freshness, and feedback loops
  • Discuss trade-offs rather than presenting a single “right answer”

The 4-Step Framework

This framework works for every ML system design question. The key is allocating your time correctly and going deep in the areas where the interviewer signals interest.

Step 1: Clarify Requirements & Define the Problem (5–8 minutes)

Never jump into architecture. Start by asking clarifying questions to nail down what the system needs to do.

💡 Questions to always ask:
  • Who are the users? What is the scale (DAU, QPS, data volume)?
  • What is the business objective? How does ML improve it?
  • What are the latency requirements? Real-time vs. batch?
  • What data is available? Labels? User interactions?
  • Are there fairness, privacy, or regulatory constraints?

Then formulate the ML problem clearly:

# Problem formulation template
# Business goal:    Increase user engagement on news feed
# ML formulation:   Predict P(user clicks post | user, post, context)
# Input:            User features, post features, context features
# Output:           Click probability score [0, 1]
# Training data:    Historical click/impression logs
# Optimization:     Binary cross-entropy loss
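As a sanity check on the optimization target, the binary cross-entropy loss from the template can be computed in a few lines of pure Python. This is a minimal sketch with hypothetical labels and predictions, not a production loss implementation:

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Average binary cross-entropy over (label, probability) pairs."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical click labels (1 = clicked) and predicted probabilities
labels = [1, 0, 0, 1]
preds = [0.9, 0.2, 0.4, 0.7]
loss = binary_cross_entropy(labels, preds)  # lower is better
```

Confident, correct predictions drive the loss toward zero; confident mistakes are penalized heavily, which is exactly the behavior you want for a click-probability model.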

Step 2: High-Level Architecture (8–10 minutes)

Draw the system architecture showing how data flows from raw inputs to model predictions served to users. Cover these components:

Component | Purpose | Key Questions
Data Pipeline | Collect, clean, and store training data | Batch vs. streaming? Schema evolution? Data quality checks?
Feature Store | Compute and serve features consistently | Online vs. offline features? Feature freshness? Point-in-time correctness?
Training Pipeline | Train and validate models | Training frequency? Distributed training? Hyperparameter tuning?
Model Registry | Version, validate, and promote models | A/B test integration? Rollback strategy? Approval workflow?
Serving Infrastructure | Serve predictions at scale | Latency budget? Batching? Caching? Fallback models?
Monitoring | Track model health and business metrics | Data drift? Model decay? Alert thresholds?
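Point-in-time correctness from the feature store row trips up many candidates, so it is worth being able to sketch it. The example below (the log format and function name are hypothetical) looks up a feature value as of a training label's timestamp, so examples never leak information from the future:

```python
def point_in_time_feature(feature_log, entity_id, as_of_ts):
    """Return the latest feature value logged at or before as_of_ts,
    so a training example never sees information from the future."""
    candidates = [
        (ts, value) for (eid, ts, value) in feature_log
        if eid == entity_id and ts <= as_of_ts
    ]
    if not candidates:
        return None  # fall back to a default / cold-start value
    return max(candidates)[1]  # latest timestamp wins

# Hypothetical log of (user_id, timestamp, 7-day click count)
log = [("u1", 100, 3), ("u1", 200, 5), ("u1", 300, 9)]

# A label observed at t=250 must use the value from t=200,
# not the later t=300 value (which would leak the future).
value = point_in_time_feature(log, "u1", 250)
```

Joining features by "latest value overall" instead of "latest value as of the label" is a classic source of offline/online metric gaps.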

Step 3: Deep Dive into Key Components (15–20 minutes)

This is where you spend the most time. The interviewer will usually signal which area to go deep on. Common deep dives include:

📊 Feature Engineering

User features (demographics, history, engagement patterns), item features (content, metadata, popularity), context features (time, device, location), and cross features.
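Cross features are simply combinations of single features that let a linear or shallow model capture interactions. A hypothetical sketch (feature names invented for illustration):

```python
def make_cross_features(user, item, context):
    """Combine single features into cross features by string concatenation."""
    return {
        "user_country_x_item_topic": f"{user['country']}_x_{item['topic']}",
        "device_x_hour": f"{context['device']}_x_{context['hour']}",
    }

# Hypothetical single-feature dictionaries
user = {"country": "US", "age_bucket": "25-34"}
item = {"topic": "sports", "popularity": 0.8}
context = {"device": "mobile", "hour": 21}
crosses = make_cross_features(user, item, context)
```

Each cross becomes a new sparse categorical feature, typically hashed or embedded downstream.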

🧠 Model Architecture

Why this model over alternatives? Two-tower vs. cross-network? How to handle sparse categorical features? Cold start problem?
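The two-tower architecture mentioned above scores a user and an item via a dot product of independently computed vectors, which is what makes retrieval over millions of items tractable (item vectors can be precomputed and indexed). The sketch below stands in hashing for learned embeddings; everything here is hypothetical:

```python
import hashlib

DIM = 4

def hash_embed(token, dim=DIM):
    """Stand-in for a learned embedding: hash a sparse categorical
    token to a small dense vector with components in [0, 1]."""
    digest = hashlib.md5(token.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]

def tower(tokens):
    """Average token embeddings into one tower vector."""
    vecs = [hash_embed(t) for t in tokens]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]

def score(user_tokens, item_tokens):
    """Two-tower score: dot product of user and item vectors."""
    u, v = tower(user_tokens), tower(item_tokens)
    return sum(a * b for a, b in zip(u, v))

s = score(["country=US", "likes=sports"], ["topic=sports"])
```

The key property to call out in an interview: because the towers never see each other's inputs, item vectors can be built offline and served from an approximate-nearest-neighbor index. A cross-network trades that property away for richer user-item interactions.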

📈 Metrics & Evaluation

Offline metrics (AUC, NDCG, precision@k), online metrics (CTR, engagement time, revenue), and how to run A/B tests correctly.
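Two of the offline metrics above are simple enough to define from scratch, which is a good way to show you understand what they measure. A minimal sketch of precision@k and pairwise AUC (example labels and scores are hypothetical):

```python
def precision_at_k(ranked_labels, k):
    """Fraction of the top-k ranked items that are relevant (label 1)."""
    return sum(ranked_labels[:k]) / k

def auc(labels, scores):
    """AUC as the probability that a random positive outranks
    a random negative, counting ties as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical results: labels ordered by model score, descending
p = precision_at_k([1, 0, 1, 0, 0], k=3)      # 2 of top-3 relevant
a = auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2])   # 3 of 4 pairs ordered correctly
```

Being able to say "AUC is a pairwise ranking probability, so it ignores calibration" is exactly the kind of depth the rubric rewards.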

Serving & Scaling

Latency optimization, model distillation, feature caching, candidate generation + ranking pipeline, and graceful degradation.
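The candidate generation + ranking split can be sketched in a few lines: a cheap first stage narrows millions of items to hundreds, then an expensive model ranks only that small set. The catalog and scoring function below are hypothetical:

```python
def generate_candidates(query_topic, catalog, limit=100):
    """Cheap first stage: filter the catalog by topic (a stand-in for
    retrieval from an index) and cap the candidate set size."""
    return [item for item in catalog if item["topic"] == query_topic][:limit]

def rank(candidates, score_fn, k=3):
    """Expensive second stage: score a small candidate set, keep top-k."""
    return sorted(candidates, key=score_fn, reverse=True)[:k]

catalog = [
    {"id": 1, "topic": "sports", "popularity": 0.9},
    {"id": 2, "topic": "news",   "popularity": 0.8},
    {"id": 3, "topic": "sports", "popularity": 0.5},
    {"id": 4, "topic": "sports", "popularity": 0.7},
]
cands = generate_candidates("sports", catalog)
top = rank(cands, score_fn=lambda it: it["popularity"], k=2)
```

In a real system the heavy ranking model's latency budget only has to cover the candidate set, not the full catalog, and graceful degradation can mean falling back to the first stage's ordering if the ranker times out.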

Step 4: Trade-Offs & Extensions (5–7 minutes)

Wrap up by discussing trade-offs you made and potential extensions. This demonstrates senior-level thinking.

  • Accuracy vs. latency: “We could use a transformer-based model for better accuracy, but the serving latency would increase from 10ms to 200ms. A distilled model offers a good middle ground.”
  • Freshness vs. cost: “Real-time feature computation gives us the freshest signals but costs 10x more than hourly batch updates. We can use a hybrid approach.”
  • Complexity vs. maintainability: “A multi-task learning setup could improve performance by 3%, but it makes debugging and iteration significantly harder.”
  • Fairness: “We need to audit the model for bias across demographics and implement fairness constraints in the loss function.”

Time Management: The 45-Minute Blueprint

Phase | Time | What to Cover | Common Mistakes
Clarify | 5–8 min | Requirements, constraints, ML formulation | Skipping this and jumping to architecture
Architecture | 8–10 min | Data flow, components, high-level diagram | Spending too long on infrastructure details
Deep Dive | 15–20 min | Features, model, metrics, serving | Staying too shallow across all areas
Trade-Offs | 5–7 min | Alternatives, extensions, limitations | Running out of time before reaching this

Critical mistake: Many candidates spend 25+ minutes on architecture and never get to the deep dive. Interviewers consistently report that candidates who go deep on features, model design, and metrics score significantly higher than those who draw comprehensive but shallow diagrams.

The Scoring Rubric

Most FAANG companies use a rubric similar to this. Understanding it helps you allocate effort to the highest-impact areas.

Criterion | Weight | Strong Signal | Weak Signal
Problem Formulation | 20% | Clear ML objective, correct loss function, identifies edge cases | Vague problem statement, wrong optimization target
System Architecture | 20% | Complete data flow, addresses scale, mentions monitoring | Missing components, no consideration of scale
ML Depth | 25% | Thoughtful feature engineering, model justification, training strategy | Black-box model choice, no feature discussion
Metrics & Evaluation | 15% | Both offline and online metrics, A/B test design | Only mentions accuracy, no online metrics
Trade-Off Analysis | 10% | Multiple alternatives discussed with pros/cons | Presents only one approach as the answer
Communication | 10% | Structured, responds to hints, manages time well | Rambles, ignores interviewer signals

What Interviewers Look For at Each Level

Junior / New Grad (L3–L4)

  • Can formulate the problem as an ML task
  • Proposes reasonable features and model choice
  • Mentions basic offline metrics
  • Aware of training/serving split

Mid-Level (L4–L5)

  • Everything above, plus:
  • Designs complete data and serving pipelines
  • Discusses online metrics and A/B testing
  • Addresses cold start, data freshness, and model retraining
  • Proposes multiple approaches with trade-offs

Senior / Staff (L5–L6+)

  • Everything above, plus:
  • Considers organizational and cross-team implications
  • Designs for iteration speed: experiment framework, feature platform
  • Addresses fairness, privacy, and regulatory requirements
  • Proposes a phased rollout plan (v1 simple, v2 advanced)

Practice Checklist

Before your interview, make sure you can answer “yes” to each of these:

  • ☐ Can I formulate any business problem as an ML optimization objective?
  • ☐ Can I draw a complete system architecture in under 10 minutes?
  • ☐ Can I list 15+ features for any ML system off the top of my head?
  • ☐ Can I explain why I chose this model over 3 alternatives?
  • ☐ Can I define both offline and online metrics and explain how to A/B test?
  • ☐ Can I discuss at least 3 trade-offs for any design decision?
  • ☐ Can I explain how the system handles failure modes and edge cases?