Design Email Spam Detection
A complete walkthrough of designing an ML-powered email spam detection system. Learn how to handle adversarial attacks, build feedback loops, and balance precision and recall in a system where false positives cost real money.
Step 1: Clarify Requirements
- “Scale?” — 1B emails/day, 50K emails/second peak
- “What counts as spam?” — Unsolicited commercial email, phishing, malware, scams
- “Latency requirements?” — Must classify before delivery, under 200ms
- “What matters more: precision or recall?” — High precision is critical (false positives = lost legitimate email)
- “Do we handle images and attachments?” — Yes, multi-modal analysis
ML Problem Formulation
# Problem formulation
# Business goal: Protect users from unwanted and dangerous email
# ML task: Binary classification (spam vs. not spam)
# Input: Email content + metadata + sender reputation
# Output: Spam probability [0, 1] + category (promo, phishing, scam)
# Training data: Historical emails with spam/ham labels + user feedback
# Loss function: Weighted binary cross-entropy (higher weight on FP)
# Key constraint: False positive rate must be < 0.01%
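The weighted loss named above can be sketched as follows; the 10:1 false-positive weight is illustrative, not a production value:

```python
import math

def weighted_bce(y_true, p_pred, fp_weight=10.0, fn_weight=1.0, eps=1e-7):
    """Weighted binary cross-entropy: ham misclassified as spam (a false
    positive) is penalized fp_weight times harder than missed spam."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        if y == 1:   # true spam: penalize low predicted probability
            total += -fn_weight * math.log(p)
        else:        # true ham: penalize high predicted probability, weighted up
            total += -fp_weight * math.log(1 - p)
    return total / len(y_true)
```

With these weights, a confident false positive costs ten times more than an equally confident false negative, which pushes the trained model toward the high-precision operating point the constraint demands.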
Step 2: High-Level Architecture
# Architecture: Multi-Layer Defense
#
# [Incoming Email]
# |
# [Layer 1: IP/Domain Reputation] --> Block known bad senders (rule-based, ~5ms)
# |
# [Layer 2: Content Analysis] --> ML classification (features + model, ~50ms)
# |
# [Layer 3: Link/Attachment Scan] --> URL reputation + malware scan (~100ms)
# |
# [Decision Engine] --> Combine scores, apply thresholds
# |
# [Inbox / Spam Folder / Quarantine / Block]
#
# Feedback loop:
# [User marks spam/not spam] --> [Label Pipeline] --> [Model Retraining]
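The layered flow above can be sketched as an early-exit pipeline: cheap checks run first, and the expensive content model only sees the undecided middle. Function names and thresholds here are illustrative, not a fixed API:

```python
def classify_email(email, reputation_check, metadata_model, content_model,
                   block_threshold=0.999, spam_threshold=0.95):
    """Early-exit pipeline mirroring the three defense layers.
    Each model argument is a callable returning a spam probability;
    slower layers run only when earlier ones are inconclusive."""
    # Layer 1: cheap rule-based reputation check can block outright
    if reputation_check(email):
        return "block"
    # Layer 2: lightweight metadata model
    p = metadata_model(email)
    if p >= block_threshold:
        return "block"
    # Layer 3: expensive content model, only for borderline cases
    if p >= 0.5:
        p = max(p, content_model(email))
    if p >= block_threshold:
        return "block"
    if p >= spam_threshold:
        return "spam_folder"
    return "inbox"
```

Because most mail exits at layer 1 or 2, the ~40ms deep model's cost is paid only on the small borderline fraction, which is what keeps the p99 latency under the 200ms budget.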
Step 3: Deep Dive — Feature Engineering
Email Header Features
| Feature | Type | Why It Matters |
|---|---|---|
| sender_domain_age | Numerical | Newly registered domains are often used for spam campaigns |
| spf_dkim_dmarc_pass | Binary × 3 (one flag each) | Email authentication failure is a strong spam signal |
| sender_reputation_score | Numerical | Aggregate score based on historical spam rate from this sender |
| num_recipients | Numerical | Mass emails sent to many recipients suggest spam |
| reply_to_mismatch | Binary | Reply-To different from From header is suspicious |
| received_hop_count | Numerical | Unusual routing can indicate spam relay |
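Several of these header features can be parsed with Python's standard `email` module; reputation scores and domain age would come from external lookups, so this sketch covers only what the message itself carries:

```python
from email import message_from_string
from email.utils import parseaddr

def header_features(raw_email: str) -> dict:
    """Extract a few of the header features from the table above."""
    msg = message_from_string(raw_email)
    from_addr = parseaddr(msg.get("From", ""))[1].lower()
    reply_to = parseaddr(msg.get("Reply-To", ""))[1].lower()
    return {
        "num_recipients": len((msg.get("To") or "").split(",")),
        # A Reply-To differing from the From address is suspicious
        "reply_to_mismatch": int(bool(reply_to) and reply_to != from_addr),
        # Each relay prepends a Received header, so the count approximates hops
        "received_hop_count": len(msg.get_all("Received") or []),
    }
```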
Content Features
| Feature | Type | Why It Matters |
|---|---|---|
| text_embedding | Dense vector | Semantic representation of email body (BERT or distilled model) |
| subject_embedding | Dense vector | Subject lines like “You won!” or “Act now” are spam signals |
| url_count | Numerical | Excessive links suggest promotional or phishing email |
| url_domain_reputation | Numerical | Links to known malicious domains |
| html_to_text_ratio | Numerical | High HTML ratio with hidden text is a spam technique |
| image_to_text_ratio | Numerical | Image-only emails bypass text-based filters |
| urgency_keywords | Numerical | Count of urgency words: “urgent,” “immediate,” “limited time” |
| attachment_type | Categorical | Executable attachments (.exe, .js) are high-risk |
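The simple count features above reduce to string matching. The keyword list below is illustrative; a production system would learn such signals rather than hard-code them:

```python
import re

# Illustrative urgency phrases; real systems learn these weights
URGENCY_WORDS = ("urgent", "immediate", "act now", "limited time")
URL_RE = re.compile(r"https?://([^/\s]+)", re.IGNORECASE)

def content_features(subject: str, body: str) -> dict:
    """Count-based content features; embeddings and domain reputation
    come from separate models and lookup services."""
    text = f"{subject} {body}".lower()
    url_domains = [d.lower() for d in URL_RE.findall(body)]
    return {
        "url_count": len(url_domains),
        "unique_url_domains": sorted(set(url_domains)),  # fed to reputation lookup
        "urgency_keywords": sum(text.count(w) for w in URGENCY_WORDS),
    }
```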
Behavioral Features
| Feature | Type | Why It Matters |
|---|---|---|
| sender_in_contacts | Binary | Emails from contacts are almost never spam |
| user_opened_from_sender | Binary | User has previously engaged with this sender |
| similar_emails_spam_rate | Numerical | What fraction of users marked similar emails as spam |
| sender_first_email | Binary | First email from unknown sender is higher risk |
Deep Dive — Model Architecture
Multi-Layer Model Strategy
# Layer 1: Rule-based filters (fast, high precision)
# - Blocklist matching (IP, domain, URL)
# - SPF/DKIM/DMARC failure rules
# - Known spam template fingerprints
# Catches: ~60% of spam at near-zero FP rate
#
# Layer 2: Lightweight model (medium speed, broad coverage)
# - Gradient Boosted Trees (LightGBM)
# - Header + metadata features only (no content parsing)
# - Latency: ~5ms
# Catches: ~25% of remaining spam
#
# Layer 3: Deep content model (slower, highest accuracy)
# - Fine-tuned DistilBERT for email text
# - Multi-modal: text + image features (if attachments)
# - Latency: ~40ms
# Catches: ~10% of remaining spam (sophisticated attacks)
#
# Final decision: Ensemble of all layers with calibrated thresholds
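The text above leaves the ensemble unspecified; one common option is a noisy-OR over calibrated layer probabilities, which treats each layer as an independent detector. The weights here are an illustrative assumption:

```python
def combine_scores(layer_probs, weights=None):
    """Noisy-OR ensemble: each layer's calibrated probability is an
    independent chance the email is spam. Weights < 1 temper
    less-trusted layers toward 0."""
    if weights is None:
        weights = [1.0] * len(layer_probs)
    p_ham = 1.0
    for p, w in zip(layer_probs, weights):
        p_ham *= 1.0 - w * p  # probability every layer is wrong
    return 1.0 - p_ham
```

Noisy-OR has the useful property that any single confident layer can drive the combined score high, while several weakly suspicious layers also compound.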
Model Selection Trade-Offs
| Model | Precision | Recall | Latency | Best For |
|---|---|---|---|---|
| Logistic Regression | High | Medium | 0.1ms | V1 baseline, feature validation |
| LightGBM | High | High | 1ms | Production metadata classifier |
| DistilBERT | Very High | Very High | 40ms | Content deep analysis |
| CNN for images | High | Medium | 30ms | Image-based spam detection |
Deep Dive — Feedback Loops
Spam detection is an adversarial problem. Spammers constantly adapt, so your model must too.
The Feedback Loop Pipeline
# Feedback signals (ordered by reliability)
#
# 1. User clicks "Report Spam" -> Strong spam label
# 2. User clicks "Not Spam" -> Strong ham label
# 3. User never opens email -> Weak spam signal
# 4. User opens and clicks links -> Weak ham signal
# 5. Multiple users report same email -> Very strong spam signal
#
# Pipeline:
# [User Action] --> [Label Aggregator] --> [Quality Filter]
#                                               |
#                                      remove noisy labels
#                                               |
#                                    [Add to training set]
#                                               |
#                                   [Retrain Model (daily)]
#                                               |
#                [A/B Test new model] --> [Promote if better]
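The label aggregator and quality filter can be sketched as a weighted vote over the signals above, dropping emails whose evidence is weak or contradictory. Weights and the confidence threshold are illustrative:

```python
# Illustrative reliability weights for the feedback signals above
SIGNAL_WEIGHTS = {
    "report_spam": 1.0,    # strong spam label
    "not_spam": -1.0,      # strong ham label
    "never_opened": 0.1,   # weak spam signal
    "clicked_link": -0.2,  # weak ham signal
}

def aggregate_label(events, min_confidence=0.8):
    """Turn raw user actions on one email into a training label,
    or None if the evidence is too weak or contradictory."""
    score = sum(SIGNAL_WEIGHTS.get(e, 0.0) for e in events)
    # Multiple independent spam reports compound into high confidence
    if score >= min_confidence:
        return 1   # spam
    if score <= -min_confidence:
        return 0   # ham
    return None    # noisy: exclude from the training set
```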
Deep Dive — Adversarial Robustness
Spammers actively try to evade your classifier. Discuss these attack vectors and defenses:
| Attack | Technique | Defense |
|---|---|---|
| Obfuscation | “V1@gra” instead of “Viagra,” invisible Unicode characters | Text normalization, character-level models |
| Image spam | Embed spam text in images to bypass text classifiers | OCR + image classification model |
| Snowshoe | Distribute spam across many IPs/domains to avoid blocklists | Cluster analysis on email templates, not individual senders |
| Compromised accounts | Send spam from legitimate hacked accounts | Behavioral anomaly detection: sudden change in sending patterns |
| Adversarial text | Add benign text to fool classifiers (“padding attack”) | Focus on structural features, not just content; ensemble methods |
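The text-normalization defense against obfuscation can be sketched with Unicode folding plus a substitution map; the map below is deliberately tiny and illustrative:

```python
import unicodedata

# Illustrative leetspeak/symbol swaps spammers use to dodge keyword filters
SUBSTITUTIONS = str.maketrans({"1": "i", "0": "o", "3": "e", "@": "a", "$": "s"})

def normalize_text(text: str) -> str:
    """Fold Unicode lookalikes, strip invisible characters, and undo
    digit/symbol swaps before keyword or model-based matching."""
    # NFKC folds many visually confusable Unicode variants to ASCII
    text = unicodedata.normalize("NFKC", text)
    # Remove zero-width and other invisible "format" characters (category Cf)
    text = "".join(c for c in text if unicodedata.category(c) != "Cf")
    return text.lower().translate(SUBSTITUTIONS)
```

After normalization, "V1@gra" and "fr&#8203;ee" (with a zero-width space) match the plain keywords a downstream filter expects.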
Metrics & Evaluation
Offline Metrics
| Metric | Target | Why This Target |
|---|---|---|
| Precision | > 99.9% | False positives must be extremely rare |
| Recall | > 98% | Most spam should be caught |
| F2-Score | > 0.99 | Weighted toward recall but precision-constrained |
| AUC-ROC | > 0.995 | Overall discrimination ability |
| FPR at 98% recall | < 0.01% | Precision constraint at operating point |
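The operating-point metric in the last row can be computed by sweeping thresholds from the highest score down; a minimal sketch, assuming binary labels and one score per email:

```python
def fpr_at_recall(labels, scores, target_recall=0.98):
    """Walk thresholds from highest score down; at the first point where
    recall reaches the target, report the false-positive rate."""
    pairs = sorted(zip(scores, labels), reverse=True)
    n_spam = sum(labels)
    n_ham = len(labels) - n_spam
    tp = fp = 0
    for _score, y in pairs:
        if y == 1:
            tp += 1
        else:
            fp += 1
        if tp / n_spam >= target_recall:
            return fp / n_ham
    return 1.0  # target recall unreachable
```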
Online Metrics
| Metric | Description | Alert Threshold |
|---|---|---|
| User spam reports/day | Users marking inbox emails as spam | > 5% increase triggers investigation |
| “Not spam” rescues/day | Users moving emails out of spam folder | > 3% increase triggers rollback |
| Spam folder volume | Total emails going to spam | Sudden spike may indicate false positives |
| Model latency p99 | 99th percentile classification time | > 200ms triggers alert |
Step 4: Trade-Offs & Extensions
Precision vs. Recall
Use a high threshold (e.g., 0.95) for sending to spam folder and a very high threshold (0.999) for blocking entirely. Two thresholds give you a “maybe spam” zone for user review.
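The two-threshold routing described above is a few lines, using the quoted thresholds:

```python
def route(p_spam, spam_threshold=0.95, block_threshold=0.999):
    """Block only on near-certainty, divert likely spam to the spam
    folder, and leave everything below the lower threshold in the inbox."""
    if p_spam >= block_threshold:
        return "block"
    if p_spam >= spam_threshold:
        return "spam_folder"
    return "inbox"
```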
Global vs. Personal Models
A global model catches universal spam. A per-user model learns individual preferences (e.g., newsletters). Combine both with a blending layer.
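One simple blending scheme (an assumption, since the blending layer is not specified above) shifts weight toward the personal model as the user accumulates feedback labels:

```python
def blended_score(p_global, p_personal, n_user_labels, k=50):
    """Convex blend of global and per-user spam probabilities.
    k (illustrative) controls how much feedback it takes before the
    personal model gets equal weight."""
    alpha = n_user_labels / (n_user_labels + k)  # 0 for new users -> 1 with history
    return (1 - alpha) * p_global + alpha * p_personal
```

New users fall back entirely on the global model, so per-user preferences never degrade cold-start accuracy.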
Real-Time vs. Batch Retraining
Daily retraining is sufficient for most spam. But for zero-day phishing campaigns, you need an online learning component that updates within hours.
Privacy-Preserving Classification
End-to-end encrypted email services cannot inspect content. Use header-only features and sender reputation, or run classification on-device.
Lilly Tech Systems