Design a News Feed Ranking System
A complete ML system design walkthrough for one of the most commonly asked interview questions. Learn how to design a personalized news feed ranking system like those used at Facebook, LinkedIn, and Twitter.
Step 1: Clarify Requirements
Before designing anything, ask the interviewer these questions:
- “What types of content appear in the feed?” — Posts, photos, videos, ads, stories
- “What is the scale?” — 500M DAU, each user follows ~200 accounts
- “What are we optimizing for?” — User engagement (clicks, likes, comments, shares, time spent)
- “Latency requirements?” — Feed must render in under 200ms
- “Any content policy constraints?” — Misinformation demotion, diversity requirements
ML Problem Formulation
# Problem formulation
# Business goal: Maximize user engagement (composite metric)
# ML task: Multi-objective ranking
# Predict: P(click), P(like), P(comment), P(share), P(hide)
# Final score: weighted_sum(predictions) = w1*P(click) + w2*P(like) + ...
# Training data: Historical user-post interaction logs
# Label: Binary labels for each action type
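To make the formulation concrete, here is a minimal sketch of turning one interaction log record into a multi-label training example. The field names (`user_id`, `post_id`, `actions`) are illustrative, not from a real logging schema:

```python
# Sketch: one impression log record -> one multi-label training row.
# Field names are assumptions for illustration, not a real schema.

ACTION_LABELS = ["click", "like", "comment", "share", "hide"]

def make_training_example(log_record: dict) -> dict:
    """Build one binary label per action type from an impression log record."""
    actions = set(log_record.get("actions", []))
    return {
        "user_id": log_record["user_id"],
        "post_id": log_record["post_id"],
        # One binary label per task head in the multi-objective model.
        "labels": {a: int(a in actions) for a in ACTION_LABELS},
    }

example = make_training_example(
    {"user_id": "u1", "post_id": "p9", "actions": ["click", "like"]}
)
print(example["labels"])  # {'click': 1, 'like': 1, 'comment': 0, 'share': 0, 'hide': 0}
```

Each impression becomes one row with five binary labels, so the same example feeds all task heads during training.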
Step 2: High-Level Architecture
The system uses a two-stage architecture (lightweight candidate generation followed by a heavyweight ML ranker, with a final re-ranking pass) so it never has to score millions of posts per user request.
# Architecture overview (draw this on whiteboard)
#
# [User Request] --> [Candidate Generation] --> [Ranking] --> [Re-ranking] --> [Feed]
#                                |                   |                  |
#                          ~10,000 posts         ~500 posts         ~50 posts
#                        (fast retrieval)       (ML scoring)   (policy + diversity)
#
# Supporting systems:
# [Feature Store] - serves user/post features to ranking
# [Training Pipeline] - daily model retraining on interaction logs
# [Monitoring] - tracks CTR, engagement, model drift
Why Two Stages?
| Stage | Purpose | Model Complexity | Latency Budget |
|---|---|---|---|
| Candidate Generation | Narrow from millions to ~10K candidates | Simple (ANN, collaborative filtering) | ~20ms |
| Ranking | Score ~10K candidates with rich features, keep top ~500 | Complex (deep neural network) | ~50ms |
| Re-ranking | Apply business rules, diversity, freshness | Rule-based + lightweight ML | ~10ms |
Step 3: Deep Dive — Feature Engineering
Feature quality is often more impactful than model choice in ranking systems. Group features into categories:
User Features
| Feature | Type | Description |
|---|---|---|
| user_age_bucket | Categorical | Age group: 18–24, 25–34, 35–44, etc. |
| user_country | Categorical | User’s country (hashed embedding) |
| avg_session_duration | Numerical | Average session length in last 7 days |
| posts_liked_7d | Numerical | Number of posts liked in the last 7 days |
| content_type_affinity | Embedding | Vector representing preference for photos/videos/text |
| topic_interests | Embedding | Learned topic interest vector from interaction history |
Post Features
| Feature | Type | Description |
|---|---|---|
| post_age_hours | Numerical | Hours since post was created |
| content_type | Categorical | Photo, video, text, link, poll |
| author_follower_count | Numerical | Log-transformed follower count of author |
| historical_ctr | Numerical | Click-through rate of this post so far |
| text_embedding | Embedding | BERT embedding of post text |
| has_media | Binary | Whether post contains image/video |
Context Features
| Feature | Type | Description |
|---|---|---|
| time_of_day | Cyclical | sin/cos encoded hour of day |
| day_of_week | Categorical | Monday through Sunday |
| device_type | Categorical | Mobile, tablet, desktop |
| connection_type | Categorical | WiFi, 4G, 5G (affects video auto-play) |
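The cyclical encoding in the table above can be sketched as follows; the point is that hour 23 and hour 0 should be close in feature space, which a raw integer hour fails to capture:

```python
import math

def encode_hour(hour: float) -> tuple[float, float]:
    """sin/cos encoding of hour-of-day so 23:00 and 00:00 end up adjacent."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)

# Hour 23 and hour 0 are 1 hour apart on the clock face, and the encoding
# preserves that: their Euclidean distance is small, while 0 vs 12 is large.
s23, c23 = encode_hour(23)
s0, c0 = encode_hour(0)
s12, c12 = encode_hour(12)
print(math.hypot(s23 - s0, c23 - c0) < math.hypot(s0 - s12, c0 - c12))  # True
```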
Cross Features (User × Post Interactions)
| Feature | Type | Description |
|---|---|---|
| user_author_interaction_count | Numerical | How many times user interacted with this author |
| user_topic_affinity_score | Numerical | Dot product of user topic vector and post topic vector |
| friend_interaction_count | Numerical | How many of user’s friends interacted with this post |
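The `user_topic_affinity_score` cross feature is just a dot product of the two topic vectors, computed at request time. A minimal sketch:

```python
def topic_affinity(user_vec: list[float], post_vec: list[float]) -> float:
    """Dot product of the user's topic-interest vector and the post's topic vector.

    Higher values mean the post's topics overlap more with what the user
    has historically engaged with.
    """
    assert len(user_vec) == len(post_vec), "vectors must share the topic space"
    return sum(u * p for u, p in zip(user_vec, post_vec))

print(topic_affinity([1, 2, 3], [4, 5, 6]))  # 32
```

In production this is typically done as a batched matrix multiply over all candidates rather than per-post Python loops.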
Deep Dive — Ranking Model Architecture
The ranking model is a multi-task deep neural network that predicts multiple engagement types simultaneously.
# Model architecture (simplified)
#
# Input features
# |
# [Embedding Layer] -- converts sparse categoricals to dense vectors
# |
# [Feature Interaction Layer] -- DCN (Deep & Cross Network) or DeepFM
# |
# [Shared Hidden Layers] -- 3 layers of 512, 256, 128 units (ReLU)
# |
# [Task-Specific Towers]
# |-- P(click) -- sigmoid output
# |-- P(like) -- sigmoid output
# |-- P(comment) -- sigmoid output
# |-- P(share) -- sigmoid output
# |-- P(hide) -- sigmoid output (negative signal)
#
# Final score = w1*P(click) + w2*P(like) + w3*P(comment)
# + w4*P(share) - w5*P(hide)
Model Alternatives and Trade-Offs
| Model | Pros | Cons | When to Use |
|---|---|---|---|
| Logistic Regression | Fast, interpretable, easy to debug | Limited feature interactions | V1 baseline, high-QPS systems |
| GBDT (XGBoost) | Handles non-linear interactions, robust | Hard to serve at low latency | Offline scoring, re-ranking |
| Deep & Cross Network | Automatic feature crosses, good accuracy | More complex to train | Production ranking at scale |
| Transformer Ranker | Captures sequence patterns, state-of-the-art | High latency, expensive | When accuracy matters most |
Deep Dive — Serving Infrastructure
Candidate Generation
Use multiple candidate generators in parallel for coverage:
- Friends’ posts: Fetch recent posts from followed accounts (simple database query)
- Collaborative filtering: “Users similar to you also engaged with these posts” (ANN index like FAISS)
- Content-based: Posts similar to what user engaged with recently (embedding similarity)
- Trending/viral: Posts with high engagement velocity (popularity-based)
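Because the generators run independently, their outputs overlap and must be merged with deduplication before ranking. A minimal sketch (post IDs and the `limit` parameter are illustrative):

```python
def merge_candidates(*generator_outputs: list[str], limit: int = 10_000) -> list[str]:
    """Merge candidate lists from several generators, deduplicating by post id
    while preserving first-seen order, capped at the ranking stage's budget."""
    seen: set[str] = set()
    merged: list[str] = []
    for posts in generator_outputs:
        for post_id in posts:
            if post_id not in seen:
                seen.add(post_id)
                merged.append(post_id)
                if len(merged) >= limit:
                    return merged
    return merged

# "p2" appears in both the friends and CF sources but is kept only once.
print(merge_candidates(["p1", "p2"], ["p2", "p3"]))  # ['p1', 'p2', 'p3']
```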
Serving Flow
# Serving flow (latency breakdown)
#
# 1. User opens app [0ms]
# 2. Fetch user features from Feature Store [5ms]
# 3. Candidate generation (parallel) [20ms]
# - Friends' posts: 15ms
# - CF candidates: 18ms
# - Content-based: 12ms
# 4. Fetch post features for candidates [10ms]
# 5. Compute cross features [5ms]
# 6. Run ranking model (batch inference) [30ms]
# 7. Re-ranking (diversity, freshness) [10ms]
# 8. Return ranked feed [Total: ~80ms]
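Step 3 above is parallel: the candidate-generation cost is roughly the slowest generator (~20ms), not the sum of all three. A sketch of that fan-out using `concurrent.futures` (the generator functions are stand-ins for the real database/ANN/embedding lookups):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for the real retrieval calls (DB query, FAISS lookup, embedding search).
def friends_posts() -> list[str]:   return ["f1", "f2"]
def cf_candidates() -> list[str]:   return ["c1", "f2"]
def content_based() -> list[str]:   return ["t1"]

def generate_candidates() -> list[str]:
    """Fan out to all generators concurrently, then merge with deduplication.

    Wall-clock latency is bounded by the slowest generator, not the sum.
    """
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(f) for f in (friends_posts, cf_candidates, content_based)]
        results = [f.result() for f in futures]  # collected in submit order
    seen, merged = set(), []
    for batch in results:
        for post_id in batch:
            if post_id not in seen:
                seen.add(post_id)
                merged.append(post_id)
    return merged

print(generate_candidates())  # ['f1', 'f2', 'c1', 't1']
```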
Deep Dive — Metrics & Evaluation
Offline Metrics
| Metric | What It Measures | Target |
|---|---|---|
| AUC-ROC | Ranking quality of click prediction | > 0.80 |
| NDCG@10 | Quality of top-10 ranked items | > 0.65 |
| Log Loss | Calibration of probability predictions | < 0.45 |
| Precision@5 | Fraction of top-5 that user engages with | > 0.30 |
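NDCG@10 from the table is worth being able to derive from scratch in an interview. A minimal pure-Python version, using the standard `log2(i + 1)` position discount:

```python
import math

def dcg(relevances: list[float]) -> float:
    """Discounted cumulative gain: item at 1-based position i is discounted by log2(i + 1)."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))

def ndcg_at_k(relevances: list[float], k: int = 10) -> float:
    """NDCG@k: DCG of the model's ordering divided by DCG of the ideal ordering."""
    ideal = sorted(relevances, reverse=True)
    idcg = dcg(ideal[:k])
    return dcg(relevances[:k]) / idcg if idcg > 0 else 0.0

print(ndcg_at_k([3, 2, 1]))  # 1.0 (already ideally ordered)
print(ndcg_at_k([0, 1, 2]) < 1.0)  # True (relevant items ranked too low)
```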
Online Metrics (A/B Test)
| Metric | What It Measures | Guardrail |
|---|---|---|
| CTR | Fraction of impressions that get clicks | Must not decrease |
| Session Duration | Average time spent per session | Primary success metric |
| Daily Active Users | Long-term retention signal | Must not decrease |
| Content Diversity | Variety of topics/authors in feed | Must not decrease by >5% |
| Negative Feedback Rate | Hide/report/unfollow actions | Must not increase |
Step 4: Trade-Offs & Extensions
Engagement vs. Well-Being
Optimizing purely for clicks can promote clickbait and outrage content. Use a composite score that penalizes regretful clicks (user hides post after clicking).
Freshness vs. Relevance
Highly relevant older posts compete with newer but less relevant posts. Apply a time decay multiplier to balance freshness with relevance score.
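One common form of that multiplier is exponential decay parameterized by a half-life; the 24-hour half-life below is an illustrative choice, not a recommendation:

```python
def decayed_score(relevance: float, age_hours: float, half_life_hours: float = 24.0) -> float:
    """Exponential time decay: the relevance score halves every `half_life_hours`.

    Tuning the half-life trades freshness against relevance: shorter values
    push newer posts up, longer values let evergreen content persist.
    """
    return relevance * 0.5 ** (age_hours / half_life_hours)

print(decayed_score(1.0, 24))  # 0.5 (one half-life old)
print(decayed_score(0.8, 0))   # 0.8 (brand new, no decay)
```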
Personalization vs. Filter Bubble
Strong personalization can create echo chambers. Inject a diversity bonus to surface content outside the user’s typical interests.
Cold Start Problem
New users and new posts lack interaction data. Use content-based features and popularity signals until enough interaction data is collected.
Lilly Tech Systems