Intermediate

Design Email Spam Detection

A complete walkthrough of designing an ML-powered email spam detection system. Learn how to handle adversarial attacks, build feedback loops, and balance precision and recall in a system where false positives cost real money.

Step 1: Clarify Requirements

Key clarifications:
  • “Scale?” — 1B emails/day, 50K emails/second peak
  • “What counts as spam?” — Unsolicited commercial email, phishing, malware, scams
  • “Latency requirements?” — Must classify before delivery, under 200ms
  • “What matters more: precision or recall?” — High precision is critical (false positives = lost legitimate email)
  • “Do we handle images and attachments?” — Yes, multi-modal analysis

ML Problem Formulation

# Problem formulation
# Business goal:    Protect users from unwanted and dangerous email
# ML task:          Binary classification (spam vs. not spam)
# Input:            Email content + metadata + sender reputation
# Output:           Spam probability [0, 1] + category (promo, phishing, scam)
# Training data:    Historical emails with spam/ham labels + user feedback
# Loss function:    Weighted binary cross-entropy (higher weight on FP)
# Key constraint:   False positive rate must be < 0.01%

Critical insight: In spam detection, a false positive (legitimate email marked as spam) is much worse than a false negative (spam reaching the inbox). A false positive can cause a user to miss an important business email or job offer. Discuss this asymmetry explicitly with the interviewer.
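The asymmetric loss above can be sketched directly. Here is a minimal NumPy version of a weighted binary cross-entropy, where `fp_weight` is a hypothetical tuning knob that scales the penalty on ham examples (the ones where a high score would produce a false positive):

```python
import numpy as np

def weighted_bce(y_true, y_prob, fp_weight=10.0, eps=1e-12):
    """Binary cross-entropy that penalizes false positives more heavily.

    fp_weight (hypothetical knob) scales the loss on ham examples
    (y_true == 0), where a high predicted probability would be a
    false positive.
    """
    y_prob = np.clip(y_prob, eps, 1 - eps)
    ham_term = -fp_weight * (1 - y_true) * np.log(1 - y_prob)  # FP penalty
    spam_term = -y_true * np.log(y_prob)                       # FN penalty
    return float(np.mean(ham_term + spam_term))

# A confident wrong score on ham costs far more than the same error on spam:
loss_fp = weighted_bce(np.array([0.0]), np.array([0.9]))  # ham scored as spam
loss_fn = weighted_bce(np.array([1.0]), np.array([0.1]))  # spam scored as ham
```

In practice the weight would be tuned so the validation false-positive rate stays under the 0.01% budget.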

Step 2: High-Level Architecture

# Architecture: Multi-Layer Defense
#
# [Incoming Email]
#   |
# [Layer 1: IP/Domain Reputation] --> Block known bad senders (rule-based, ~5ms)
#   |
# [Layer 2: Content Analysis]     --> ML classification (features + model, ~50ms)
#   |
# [Layer 3: Link/Attachment Scan] --> URL reputation + malware scan (~100ms)
#   |
# [Decision Engine]               --> Combine scores, apply thresholds
#   |
# [Inbox / Spam Folder / Quarantine / Block]
#
# Feedback loop:
# [User marks spam/not spam] --> [Label Pipeline] --> [Model Retraining]
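The decision engine at the bottom of the diagram can be sketched as a simple score router. The thresholds and the `max` combine rule here are illustrative placeholders, not calibrated production values:

```python
def route_email(reputation_blocked, metadata_score, content_score,
                spam_threshold=0.95, block_threshold=0.999):
    """Combine layer outputs into a delivery decision.

    Thresholds are illustrative; in production they would be
    calibrated against the false-positive budget.
    """
    if reputation_blocked:                      # Layer 1: known bad sender
        return "block"
    score = max(metadata_score, content_score)  # simple conservative combine
    if score >= block_threshold:
        return "block"
    if score >= spam_threshold:
        return "spam_folder"
    if score >= 0.5:
        return "quarantine"                     # "maybe spam" zone for review
    return "inbox"

decision = route_email(False, 0.2, 0.97)  # -> "spam_folder"
```

A real ensemble would replace `max` with a calibrated combiner (e.g. a small logistic layer over the per-layer scores).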

Step 3: Deep Dive — Feature Engineering

Email Header Features

| Feature | Type | Why It Matters |
| --- | --- | --- |
| sender_domain_age | Numerical | Newly registered domains are often used for spam campaigns |
| spf_dkim_dmarc_pass | Binary (3 flags) | Email authentication failure is a strong spam signal |
| sender_reputation_score | Numerical | Aggregate score based on historical spam rate from this sender |
| num_recipients | Numerical | Mass emails sent to many recipients suggest spam |
| reply_to_mismatch | Binary | Reply-To different from From header is suspicious |
| received_hop_count | Numerical | Unusual routing can indicate spam relay |

Content Features

| Feature | Type | Why It Matters |
| --- | --- | --- |
| text_embedding | Dense vector | Semantic representation of email body (BERT or distilled model) |
| subject_embedding | Dense vector | Subject lines like “You won!” or “Act now” are spam signals |
| url_count | Numerical | Excessive links suggest promotional or phishing email |
| url_domain_reputation | Numerical | Links to known malicious domains |
| html_to_text_ratio | Numerical | High HTML ratio with hidden text is a spam technique |
| image_to_text_ratio | Numerical | Image-only emails bypass text-based filters |
| urgency_keywords | Numerical | Count of urgency words: “urgent,” “immediate,” “limited time” |
| attachment_type | Categorical | Executable attachments (.exe, .js) are high-risk |

Behavioral Features

| Feature | Type | Why It Matters |
| --- | --- | --- |
| sender_in_contacts | Binary | Emails from contacts are almost never spam |
| user_opened_from_sender | Binary | User has previously engaged with this sender |
| similar_emails_spam_rate | Numerical | What fraction of users marked similar emails as spam |
| sender_first_email | Binary | First email from an unknown sender is higher risk |
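Several of the tabulated features are cheap to derive once the message is parsed. Here is a toy extractor, assuming a hypothetical pre-parsed email dict (`from`, `reply_to`, `body`, `urls`) rather than real MIME handling:

```python
def extract_features(email):
    """Derive a few of the features above from a parsed email dict.

    `email` is a hypothetical structure; a real pipeline would parse
    MIME headers and resolve URL reputations from a separate service.
    """
    urgency_words = {"urgent", "immediate", "limited time", "act now"}
    body_lower = email["body"].lower()
    return {
        # Reply-To present but different from From is suspicious
        "reply_to_mismatch": int(email.get("reply_to") not in (None, email["from"])),
        "url_count": len(email.get("urls", [])),
        "urgency_keywords": sum(body_lower.count(w) for w in urgency_words),
        "num_exclamations": body_lower.count("!"),
    }

features = extract_features({
    "from": "promo@example.com",
    "reply_to": "other@example.net",
    "body": "URGENT! Act now for a limited time offer!",
    "urls": ["http://example.com/win"],
})
```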

Deep Dive — Model Architecture

Multi-Layer Model Strategy

# Layer 1: Rule-based filters (fast, high precision)
#   - Blocklist matching (IP, domain, URL)
#   - SPF/DKIM/DMARC failure rules
#   - Known spam template fingerprints
#   Catches: ~60% of spam at 0% FP rate
#
# Layer 2: Lightweight model (medium speed, broad coverage)
#   - Gradient Boosted Trees (LightGBM)
#   - Header + metadata features only (no content parsing)
#   - Latency: ~5ms
#   Catches: ~25% of remaining spam
#
# Layer 3: Deep content model (slower, highest accuracy)
#   - Fine-tuned DistilBERT for email text
#   - Multi-modal: text + image features (if attachments)
#   - Latency: ~40ms
#   Catches: ~10% of remaining spam (sophisticated attacks)
#
# Final decision: Ensemble of all layers with calibrated thresholds
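The layered strategy is naturally expressed as a cascade that exits early when a cheap layer is confident. In this sketch, `rules`, `fast_model`, and `deep_model` are hypothetical callables standing in for the three layers:

```python
def classify_cascade(email, rules, fast_model, deep_model,
                     fast_confident=0.99):
    """Run layers in order, stopping early when a cheap layer is confident.

    Each callable returns a spam probability (rules returns a boolean
    "definitely spam"). The cascade keeps average latency low because
    most emails never reach the expensive content model.
    """
    if rules(email):                      # Layer 1: blocklists, auth failures
        return 1.0, "rules"
    p = fast_model(email)                 # Layer 2: metadata GBT, ~5ms
    if p >= fast_confident or p <= 1 - fast_confident:
        return p, "fast"                  # confident either way: stop here
    return deep_model(email), "deep"      # Layer 3: content model, ~40ms

probability, source = classify_cascade(
    {"subject": "hello"},
    rules=lambda e: False,
    fast_model=lambda e: 0.995,
    deep_model=lambda e: 0.5,
)  # -> (0.995, "fast")
```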

Model Selection Trade-Offs

| Model | Precision | Recall | Latency | Best For |
| --- | --- | --- | --- | --- |
| Logistic Regression | High | Medium | 0.1 ms | V1 baseline, feature validation |
| LightGBM | High | High | 1 ms | Production metadata classifier |
| DistilBERT | Very High | Very High | 40 ms | Deep content analysis |
| CNN for images | High | Medium | 30 ms | Image-based spam detection |

Deep Dive — Feedback Loops

Spam detection is an adversarial problem. Spammers constantly adapt, so your model must too.

The Feedback Loop Pipeline

# Feedback signals (ordered by reliability)
#
# 1. User clicks "Report Spam"       -> Strong spam label
# 2. User clicks "Not Spam"          -> Strong ham label
# 3. User never opens email          -> Weak spam signal
# 4. User opens and clicks links     -> Weak ham signal
# 5. Multiple users report same email -> Very strong spam signal
#
# Pipeline:
# [User Action] --> [Label Aggregator] --> [Quality Filter]
#                                              |
#                   [Remove noisy labels]  <----+
#                                              |
#                   [Add to training set] --> [Retrain Model (daily)]
#                                              |
#                   [A/B Test new model] --> [Promote if better]
Label quality matters: Some users mark promotional emails (newsletters they subscribed to) as spam. This is a noisy label. Use consensus: only apply the spam label if multiple users report the same email template as spam, or if the user had no prior relationship with the sender.
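The consensus rule can be sketched in a few lines. Here `template_id` stands for a hypothetical hash of the normalized email body, so reports against the same campaign aggregate together:

```python
from collections import defaultdict

def consensus_labels(reports, min_reporters=3):
    """Keep a spam label only when enough distinct users reported
    the same email template.

    reports: iterable of (template_id, user_id) pairs, where
    template_id is a hypothetical hash of the normalized body.
    """
    reporters = defaultdict(set)
    for template_id, user_id in reports:
        reporters[template_id].add(user_id)   # dedupe repeat reports per user
    return {t for t, users in reporters.items() if len(users) >= min_reporters}

labeled = consensus_labels([
    ("tmpl_a", "u1"), ("tmpl_a", "u2"), ("tmpl_a", "u3"),
    ("tmpl_b", "u1"), ("tmpl_b", "u1"),   # one noisy user, below consensus
])
```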

Deep Dive — Adversarial Robustness

Spammers actively try to evade your classifier. Discuss these attack vectors and defenses:

| Attack | Technique | Defense |
| --- | --- | --- |
| Obfuscation | “V1@gra” instead of “Viagra,” invisible Unicode characters | Text normalization, character-level models |
| Image spam | Embed spam text in images to bypass text classifiers | OCR + image classification model |
| Snowshoe | Distribute spam across many IPs/domains to avoid blocklists | Cluster analysis on email templates, not individual senders |
| Compromised accounts | Send spam from legitimate hacked accounts | Behavioral anomaly detection: sudden change in sending patterns |
| Adversarial text | Add benign text to fool classifiers (“padding attack”) | Focus on structural features, not just content; ensemble methods |
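A minimal normalization pass against the obfuscation row might look like this. The leetspeak table is a toy stand-in; real systems use Unicode confusable data (e.g. UTS #39) plus character-level models:

```python
import unicodedata

LEET_MAP = str.maketrans("@01!$", "aoils")  # toy substitution table

def normalize_text(text):
    """Undo common obfuscation before tokenization: strip invisible
    format characters, fold Unicode compatibility forms, and map a
    few leetspeak substitutions.
    """
    # Remove zero-width / invisible format characters (category Cf)
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Fold compatibility forms (fullwidth letters, ligatures, etc.)
    text = unicodedata.normalize("NFKC", text)
    return text.casefold().translate(LEET_MAP)

normalized = normalize_text("V1@gra\u200b FREE")  # -> "viagra free"
```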

Metrics & Evaluation

Offline Metrics

| Metric | Target | Why This Target |
| --- | --- | --- |
| Precision | > 99.9% | False positives must be extremely rare |
| Recall | > 98% | Most spam should be caught |
| F2-score | > 0.99 | Weighted toward recall but precision-constrained |
| AUC-ROC | > 0.995 | Overall discrimination ability |
| FPR at 98% recall | < 0.01% | Precision constraint at the operating point |
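The “FPR at 98% recall” metric fixes the operating point first and measures false positives there. An illustrative computation from labeled validation scores (assuming `y_true` in {0, 1}):

```python
import numpy as np

def fpr_at_recall(y_true, scores, target_recall=0.98):
    """Find the highest threshold that still achieves target_recall,
    then report the false-positive rate at that threshold.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    spam_scores = np.sort(scores[y_true == 1])
    # We may miss at most (1 - recall) of spam below the threshold
    k = int(np.floor(len(spam_scores) * (1 - target_recall)))
    threshold = spam_scores[k]
    ham_scores = scores[y_true == 0]
    fpr = float(np.mean(ham_scores >= threshold))
    return threshold, fpr

threshold, fpr = fpr_at_recall(
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0.2, 0.6, 0.7, 0.9, 0.1, 0.3, 0.5, 0.65],
    target_recall=0.75,
)
```

On real data you would compute this from the full precision-recall curve (e.g. scikit-learn's `precision_recall_curve`) rather than sorting scores by hand.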

Online Metrics

| Metric | Description | Alert Threshold |
| --- | --- | --- |
| User spam reports/day | Users marking inbox emails as spam | > 5% increase triggers investigation |
| “Not spam” rescues/day | Users moving emails out of the spam folder | > 3% increase triggers rollback |
| Spam folder volume | Total emails going to spam | Sudden spike may indicate false positives |
| Model latency p99 | 99th percentile classification time | > 200 ms triggers alert |

Step 4: Trade-Offs & Extensions

Precision vs. Recall

Use a high threshold (e.g., 0.95) for sending to spam folder and a very high threshold (0.999) for blocking entirely. Two thresholds give you a “maybe spam” zone for user review.


Global vs. Personal Models

A global model catches universal spam. A per-user model learns individual preferences (e.g., newsletters). Combine both with a blending layer.
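One simple blending layer is a shrinkage weight on the personal model that grows with the user's label count, so sparse users fall back to the global model. Here `k` is a hypothetical smoothing constant:

```python
def blended_score(global_p, personal_p, n_user_labels, k=50):
    """Shrink toward the global model when the user has few labels.

    k is a hypothetical smoothing constant: when n_user_labels == k,
    the two models get equal weight.
    """
    w = n_user_labels / (n_user_labels + k)
    return w * personal_p + (1 - w) * global_p

score = blended_score(global_p=0.8, personal_p=0.2, n_user_labels=10)
```

A learned blending layer (a small model over both scores plus user features) would replace this fixed formula in production.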

Real-Time vs. Batch Retraining

Daily retraining is sufficient for most spam. But for zero-day phishing campaigns, you need an online learning component that updates within hours.


Privacy-Preserving Classification

End-to-end encrypted email services cannot inspect content. Use header-only features and sender reputation, or run classification on-device.