Design a News Feed Ranking System
A complete ML system design walkthrough for one of the most commonly asked interview questions. Learn how to design a personalized news feed ranking system like those used at Facebook, LinkedIn, and Twitter.
Step 1: Clarify Requirements
Before designing anything, ask the interviewer these questions:
- “What types of content appear in the feed?” — Posts, photos, videos, ads, stories
- “What is the scale?” — 500M DAU, each user follows ~200 accounts
- “What are we optimizing for?” — User engagement (clicks, likes, comments, shares, time spent)
- “Latency requirements?” — Feed must render in under 200ms
- “Any content policy constraints?” — Misinformation demotion, diversity requirements
ML Problem Formulation
# Problem formulation
# Business goal: Maximize user engagement (composite metric)
# ML task: Multi-objective ranking
# Predict: P(click), P(like), P(comment), P(share), P(hide)
# Final score: weighted_sum(predictions) = w1*P(click) + w2*P(like) + ...
# Training data: Historical user-post interaction logs
# Label: Binary labels for each action type
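To make the formulation concrete, here is a minimal sketch of turning one interaction log record into a multi-label training example. The field names (`user_id`, `post_id`, `actions`) are illustrative, not from a real logging schema:

```python
# Sketch: one impression log record -> one multi-label training row.
# Field names are assumptions for illustration, not a real schema.

ACTION_LABELS = ["click", "like", "comment", "share", "hide"]

def make_training_example(log_record: dict) -> dict:
    """Build one binary label per action type from an impression log record."""
    actions = set(log_record.get("actions", []))
    return {
        "user_id": log_record["user_id"],
        "post_id": log_record["post_id"],
        # One binary label per task head in the multi-objective model.
        "labels": {a: int(a in actions) for a in ACTION_LABELS},
    }

example = make_training_example(
    {"user_id": "u1", "post_id": "p9", "actions": ["click", "like"]}
)
print(example["labels"])  # {'click': 1, 'like': 1, 'comment': 0, 'share': 0, 'hide': 0}
```

Each impression becomes one row with five binary labels, so the same example feeds all task heads during training.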
Step 2: High-Level Architecture
The system uses a two-stage architecture (lightweight candidate generation followed by a heavyweight ML ranker, with a final re-ranking pass) so it never has to score millions of posts per user request.
# Architecture overview (draw this on whiteboard)
#
# [User Request] --> [Candidate Generation] --> [Ranking] --> [Re-ranking] --> [Feed]
#                                |                   |                  |
#                          ~10,000 posts         ~500 posts         ~50 posts
#                        (fast retrieval)       (ML scoring)   (policy + diversity)
#
# Supporting systems:
# [Feature Store] - serves user/post features to ranking
# [Training Pipeline] - daily model retraining on interaction logs
# [Monitoring] - tracks CTR, engagement, model drift
Why Two Stages?
| Stage | Purpose | Model Complexity | Latency Budget |
|---|---|---|---|
| Candidate Generation | Narrow from millions to ~10K candidates | Simple (ANN, collaborative filtering) | ~20ms |
| Ranking | Score ~10K candidates with rich features, keep top ~500 | Complex (deep neural network) | ~50ms |
| Re-ranking | Apply business rules, diversity, freshness | Rule-based + lightweight ML | ~10ms |
Step 3: Deep Dive — Feature Engineering
Feature quality is often more impactful than model choice in ranking systems. Group features into categories:
User Features
| Feature | Type | Description |
|---|---|---|
| user_age_bucket | Categorical | Age group: 18–24, 25–34, 35–44, etc. |
| user_country | Categorical | User’s country (hashed embedding) |
| avg_session_duration | Numerical | Average session length in last 7 days |
| posts_liked_7d | Numerical | Number of posts liked in the last 7 days |
| content_type_affinity | Embedding | Vector representing preference for photos/videos/text |
| topic_interests | Embedding | Learned topic interest vector from interaction history |
Post Features
| Feature | Type | Description |
|---|---|---|
| post_age_hours | Numerical | Hours since post was created |
| content_type | Categorical | Photo, video, text, link, poll |
| author_follower_count | Numerical | Log-transformed follower count of author |
| historical_ctr | Numerical | Click-through rate of this post so far |
| text_embedding | Embedding | BERT embedding of post text |
| has_media | Binary | Whether post contains image/video |
Context Features
| Feature | Type | Description |
|---|---|---|
| time_of_day | Cyclical | sin/cos encoded hour of day |
| day_of_week | Categorical | Monday through Sunday |
| device_type | Categorical | Mobile, tablet, desktop |
| connection_type | Categorical | WiFi, 4G, 5G (affects video auto-play) |
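The cyclical encoding in the table above can be sketched as follows; the point is that hour 23 and hour 0 should be close in feature space, which a raw integer hour fails to capture:

```python
import math

def encode_hour(hour: float) -> tuple[float, float]:
    """sin/cos encoding of hour-of-day so 23:00 and 00:00 end up adjacent."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)

# Hour 23 and hour 0 are 1 hour apart on the clock face, and the encoding
# preserves that: their Euclidean distance is small, while 0 vs 12 is large.
s23, c23 = encode_hour(23)
s0, c0 = encode_hour(0)
s12, c12 = encode_hour(12)
print(math.hypot(s23 - s0, c23 - c0) < math.hypot(s0 - s12, c0 - c12))  # True
```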
Cross Features (User × Post Interactions)
| Feature | Type | Description |
|---|---|---|
| user_author_interaction_count | Numerical | How many times user interacted with this author |
| user_topic_affinity_score | Numerical | Dot product of user topic vector and post topic vector |
| friend_interaction_count | Numerical | How many of user’s friends interacted with this post |
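The `user_topic_affinity_score` cross feature is just a dot product of the two topic vectors, computed at request time. A minimal sketch:

```python
def topic_affinity(user_vec: list[float], post_vec: list[float]) -> float:
    """Dot product of the user's topic-interest vector and the post's topic vector.

    Higher values mean the post's topics overlap more with what the user
    has historically engaged with.
    """
    assert len(user_vec) == len(post_vec), "vectors must share the topic space"
    return sum(u * p for u, p in zip(user_vec, post_vec))

print(topic_affinity([1, 2, 3], [4, 5, 6]))  # 32
```

In production this is typically done as a batched matrix multiply over all candidates rather than per-post Python loops.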
Deep Dive — Ranking Model Architecture
The ranking model is a multi-task deep neural network that predicts multiple engagement types simultaneously.
# Model architecture (simplified)
#
# Input features
# |
# [Embedding Layer] -- converts sparse categoricals to dense vectors
# |
# [Feature Interaction Layer] -- DCN (Deep & Cross Network) or DeepFM
# |
# [Shared Hidden Layers] -- 3 layers of 512, 256, 128 units (ReLU)
# |
# [Task-Specific Towers]
# |-- P(click) -- sigmoid output
# |-- P(like) -- sigmoid output
# |-- P(comment) -- sigmoid output
# |-- P(share) -- sigmoid output
# |-- P(hide) -- sigmoid output (negative signal)
#
# Final score = w1*P(click) + w2*P(like) + w3*P(comment)
# + w4*P(share) - w5*P(hide)
Model Alternatives and Trade-Offs
| Model | Pros | Cons | When to Use |
|---|---|---|---|
| Logistic Regression | Fast, interpretable, easy to debug | Limited feature interactions | V1 baseline, high-QPS systems |
| GBDT (XGBoost) | Handles non-linear interactions, robust | Hard to serve at low latency | Offline scoring, re-ranking |
| Deep & Cross Network | Automatic feature crosses, good accuracy | More complex to train | Production ranking at scale |
| Transformer Ranker | Captures sequence patterns, state-of-the-art | High latency, expensive | When accuracy matters most |
Deep Dive — Serving Infrastructure
Candidate Generation
Use multiple candidate generators in parallel for coverage:
- Friends’ posts: Fetch recent posts from followed accounts (simple database query)
- Collaborative filtering: “Users similar to you also engaged with these posts” (ANN index like FAISS)
- Content-based: Posts similar to what user engaged with recently (embedding similarity)
- Trending/viral: Posts with high engagement velocity (popularity-based)
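Because the generators run independently, their outputs overlap and must be merged with deduplication before ranking. A minimal sketch (post IDs and the `limit` parameter are illustrative):

```python
def merge_candidates(*generator_outputs: list[str], limit: int = 10_000) -> list[str]:
    """Merge candidate lists from several generators, deduplicating by post id
    while preserving first-seen order, capped at the ranking stage's budget."""
    seen: set[str] = set()
    merged: list[str] = []
    for posts in generator_outputs:
        for post_id in posts:
            if post_id not in seen:
                seen.add(post_id)
                merged.append(post_id)
                if len(merged) >= limit:
                    return merged
    return merged

# "p2" appears in both the friends and CF sources but is kept only once.
print(merge_candidates(["p1", "p2"], ["p2", "p3"]))  # ['p1', 'p2', 'p3']
```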
Serving Flow
# Serving flow (latency breakdown)
#
# 1. User opens app [0ms]
# 2. Fetch user features from Feature Store [5ms]
# 3. Candidate generation (parallel) [20ms]
# - Friends' posts: 15ms
# - CF candidates: 18ms
# - Content-based: 12ms
# 4. Fetch post features for candidates [10ms]
# 5. Compute cross features [5ms]
# 6. Run ranking model (batch inference) [30ms]
# 7. Re-ranking (diversity, freshness) [10ms]
# 8. Return ranked feed [Total: ~80ms]
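Step 3 above is parallel: the candidate-generation cost is roughly the slowest generator (~20ms), not the sum of all three. A sketch of that fan-out using `concurrent.futures` (the generator functions are stand-ins for the real database/ANN/embedding lookups):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for the real retrieval calls (DB query, FAISS lookup, embedding search).
def friends_posts() -> list[str]:   return ["f1", "f2"]
def cf_candidates() -> list[str]:   return ["c1", "f2"]
def content_based() -> list[str]:   return ["t1"]

def generate_candidates() -> list[str]:
    """Fan out to all generators concurrently, then merge with deduplication.

    Wall-clock latency is bounded by the slowest generator, not the sum.
    """
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(f) for f in (friends_posts, cf_candidates, content_based)]
        results = [f.result() for f in futures]  # collected in submit order
    seen, merged = set(), []
    for batch in results:
        for post_id in batch:
            if post_id not in seen:
                seen.add(post_id)
                merged.append(post_id)
    return merged

print(generate_candidates())  # ['f1', 'f2', 'c1', 't1']
```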
Deep Dive — Metrics & Evaluation
Offline Metrics
| Metric | What It Measures | Target |
|---|---|---|
| AUC-ROC | Ranking quality of click prediction | > 0.80 |
| NDCG@10 | Quality of top-10 ranked items | > 0.65 |
| Log Loss | Calibration of probability predictions | < 0.45 |
| Precision@5 | Fraction of top-5 that user engages with | > 0.30 |
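NDCG@10 from the table is worth being able to derive from scratch in an interview. A minimal pure-Python version, using the standard `log2(i + 1)` position discount:

```python
import math

def dcg(relevances: list[float]) -> float:
    """Discounted cumulative gain: item at 1-based position i is discounted by log2(i + 1)."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))

def ndcg_at_k(relevances: list[float], k: int = 10) -> float:
    """NDCG@k: DCG of the model's ordering divided by DCG of the ideal ordering."""
    ideal = sorted(relevances, reverse=True)
    idcg = dcg(ideal[:k])
    return dcg(relevances[:k]) / idcg if idcg > 0 else 0.0

print(ndcg_at_k([3, 2, 1]))  # 1.0 (already ideally ordered)
print(ndcg_at_k([0, 1, 2]) < 1.0)  # True (relevant items ranked too low)
```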
Online Metrics (A/B Test)
| Metric | What It Measures | Guardrail |
|---|---|---|
| CTR | Fraction of impressions that get clicks | Must not decrease |
| Session Duration | Average time spent per session | Primary success metric |
| Daily Active Users | Long-term retention signal | Must not decrease |
| Content Diversity | Variety of topics/authors in feed | Must not decrease by >5% |
| Negative Feedback Rate | Hide/report/unfollow actions | Must not increase |
Step 4: Trade-Offs & Extensions
Engagement vs. Well-Being
Optimizing purely for clicks can promote clickbait and outrage content. Use a composite score that penalizes regretful clicks (user hides post after clicking).
Freshness vs. Relevance
Highly relevant older posts compete with newer but less relevant posts. Apply a time decay multiplier to balance freshness with relevance score.
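One common form of that multiplier is exponential decay parameterized by a half-life; the 24-hour half-life below is an illustrative choice, not a recommendation:

```python
def decayed_score(relevance: float, age_hours: float, half_life_hours: float = 24.0) -> float:
    """Exponential time decay: the relevance score halves every `half_life_hours`.

    Tuning the half-life trades freshness against relevance: shorter values
    push newer posts up, longer values let evergreen content persist.
    """
    return relevance * 0.5 ** (age_hours / half_life_hours)

print(decayed_score(1.0, 24))  # 0.5 (one half-life old)
print(decayed_score(0.8, 0))   # 0.8 (brand new, no decay)
```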
Personalization vs. Filter Bubble
Strong personalization can create echo chambers. Inject a diversity bonus to surface content outside the user’s typical interests.
Cold Start Problem
New users and new posts lack interaction data. Use content-based features and popularity signals until enough interaction data is collected.
Lilly Tech Systems