Design Email Spam Detection
A complete walkthrough of designing an ML-powered email spam detection system. Learn how to handle adversarial attacks, build feedback loops, and balance precision and recall in a system where false positives cost real money.
Step 1: Clarify Requirements
- “Scale?” — 1B emails/day, 50K emails/second peak
- “What counts as spam?” — Unsolicited commercial email, phishing, malware, scams
- “Latency requirements?” — Must classify before delivery, under 200ms
- “What matters more: precision or recall?” — High precision is critical (false positives = lost legitimate email)
- “Do we handle images and attachments?” — Yes, multi-modal analysis
ML Problem Formulation
# Problem formulation
# Business goal: Protect users from unwanted and dangerous email
# ML task: Binary classification (spam vs. not spam)
# Input: Email content + metadata + sender reputation
# Output: Spam probability [0, 1] + category (promo, phishing, scam)
# Training data: Historical emails with spam/ham labels + user feedback
# Loss function: Weighted binary cross-entropy (higher weight on FP)
# Key constraint: False positive rate must be < 0.01%
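The weighted loss named above can be sketched as follows; the 10:1 false-positive weight is illustrative, not a production value:

```python
import math

def weighted_bce(y_true, p_pred, fp_weight=10.0, fn_weight=1.0, eps=1e-7):
    """Weighted binary cross-entropy: ham misclassified as spam (a false
    positive) is penalized fp_weight times harder than missed spam."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        if y == 1:   # true spam: penalize low predicted probability
            total += -fn_weight * math.log(p)
        else:        # true ham: penalize high predicted probability, weighted up
            total += -fp_weight * math.log(1 - p)
    return total / len(y_true)
```

With these weights, a confident false positive costs ten times more than an equally confident false negative, which pushes the trained model toward the high-precision operating point the constraint demands.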
Step 2: High-Level Architecture
# Architecture: Multi-Layer Defense
#
# [Incoming Email]
# |
# [Layer 1: IP/Domain Reputation] --> Block known bad senders (rule-based, ~5ms)
# |
# [Layer 2: Content Analysis] --> ML classification (features + model, ~50ms)
# |
# [Layer 3: Link/Attachment Scan] --> URL reputation + malware scan (~100ms)
# |
# [Decision Engine] --> Combine scores, apply thresholds
# |
# [Inbox / Spam Folder / Quarantine / Block]
#
# Feedback loop:
# [User marks spam/not spam] --> [Label Pipeline] --> [Model Retraining]
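The layered flow above can be sketched as an early-exit pipeline: cheap checks run first, and the expensive content model only sees the undecided middle. Function names and thresholds here are illustrative, not a fixed API:

```python
def classify_email(email, reputation_check, metadata_model, content_model,
                   block_threshold=0.999, spam_threshold=0.95):
    """Early-exit pipeline mirroring the three defense layers.
    Each model argument is a callable returning a spam probability;
    slower layers run only when earlier ones are inconclusive."""
    # Layer 1: cheap rule-based reputation check can block outright
    if reputation_check(email):
        return "block"
    # Layer 2: lightweight metadata model
    p = metadata_model(email)
    if p >= block_threshold:
        return "block"
    # Layer 3: expensive content model, only for borderline cases
    if p >= 0.5:
        p = max(p, content_model(email))
    if p >= block_threshold:
        return "block"
    if p >= spam_threshold:
        return "spam_folder"
    return "inbox"
```

Because most mail exits at layer 1 or 2, the ~40ms deep model's cost is paid only on the small borderline fraction, which is what keeps the p99 latency under the 200ms budget.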
Step 3: Deep Dive — Feature Engineering
Email Header Features
| Feature | Type | Why It Matters |
|---|---|---|
| sender_domain_age | Numerical | Newly registered domains are often used for spam campaigns |
| spf_dkim_dmarc_pass | Binary × 3 (one flag each) | Email authentication failure is a strong spam signal |
| sender_reputation_score | Numerical | Aggregate score based on historical spam rate from this sender |
| num_recipients | Numerical | Mass emails sent to many recipients suggest spam |
| reply_to_mismatch | Binary | Reply-To different from From header is suspicious |
| received_hop_count | Numerical | Unusual routing can indicate spam relay |
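Several of these header features can be parsed with Python's standard `email` module; reputation scores and domain age would come from external lookups, so this sketch covers only what the message itself carries:

```python
from email import message_from_string
from email.utils import parseaddr

def header_features(raw_email: str) -> dict:
    """Extract a few of the header features from the table above."""
    msg = message_from_string(raw_email)
    from_addr = parseaddr(msg.get("From", ""))[1].lower()
    reply_to = parseaddr(msg.get("Reply-To", ""))[1].lower()
    return {
        "num_recipients": len((msg.get("To") or "").split(",")),
        # A Reply-To differing from the From address is suspicious
        "reply_to_mismatch": int(bool(reply_to) and reply_to != from_addr),
        # Each relay prepends a Received header, so the count approximates hops
        "received_hop_count": len(msg.get_all("Received") or []),
    }
```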
Content Features
| Feature | Type | Why It Matters |
|---|---|---|
| text_embedding | Dense vector | Semantic representation of email body (BERT or distilled model) |
| subject_embedding | Dense vector | Subject lines like “You won!” or “Act now” are spam signals |
| url_count | Numerical | Excessive links suggest promotional or phishing email |
| url_domain_reputation | Numerical | Links to known malicious domains |
| html_to_text_ratio | Numerical | High HTML ratio with hidden text is a spam technique |
| image_to_text_ratio | Numerical | Image-only emails bypass text-based filters |
| urgency_keywords | Numerical | Count of urgency words: “urgent,” “immediate,” “limited time” |
| attachment_type | Categorical | Executable attachments (.exe, .js) are high-risk |
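The simple count features above reduce to string matching. The keyword list below is illustrative; a production system would learn such signals rather than hard-code them:

```python
import re

# Illustrative urgency phrases; real systems learn these weights
URGENCY_WORDS = ("urgent", "immediate", "act now", "limited time")
URL_RE = re.compile(r"https?://([^/\s]+)", re.IGNORECASE)

def content_features(subject: str, body: str) -> dict:
    """Count-based content features; embeddings and domain reputation
    come from separate models and lookup services."""
    text = f"{subject} {body}".lower()
    url_domains = [d.lower() for d in URL_RE.findall(body)]
    return {
        "url_count": len(url_domains),
        "unique_url_domains": sorted(set(url_domains)),  # fed to reputation lookup
        "urgency_keywords": sum(text.count(w) for w in URGENCY_WORDS),
    }
```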
Behavioral Features
| Feature | Type | Why It Matters |
|---|---|---|
| sender_in_contacts | Binary | Emails from contacts are almost never spam |
| user_opened_from_sender | Binary | User has previously engaged with this sender |
| similar_emails_spam_rate | Numerical | What fraction of users marked similar emails as spam |
| sender_first_email | Binary | First email from unknown sender is higher risk |
Deep Dive — Model Architecture
Multi-Layer Model Strategy
# Layer 1: Rule-based filters (fast, high precision)
# - Blocklist matching (IP, domain, URL)
# - SPF/DKIM/DMARC failure rules
# - Known spam template fingerprints
# Catches: ~60% of spam at near-zero FP rate
#
# Layer 2: Lightweight model (medium speed, broad coverage)
# - Gradient Boosted Trees (LightGBM)
# - Header + metadata features only (no content parsing)
# - Latency: ~5ms
# Catches: ~25% of remaining spam
#
# Layer 3: Deep content model (slower, highest accuracy)
# - Fine-tuned DistilBERT for email text
# - Multi-modal: text + image features (if attachments)
# - Latency: ~40ms
# Catches: ~10% of remaining spam (sophisticated attacks)
#
# Final decision: Ensemble of all layers with calibrated thresholds
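The text above leaves the ensemble unspecified; one common option is a noisy-OR over calibrated layer probabilities, which treats each layer as an independent detector. The weights here are an illustrative assumption:

```python
def combine_scores(layer_probs, weights=None):
    """Noisy-OR ensemble: each layer's calibrated probability is an
    independent chance the email is spam. Weights < 1 temper
    less-trusted layers toward 0."""
    if weights is None:
        weights = [1.0] * len(layer_probs)
    p_ham = 1.0
    for p, w in zip(layer_probs, weights):
        p_ham *= 1.0 - w * p  # probability every layer is wrong
    return 1.0 - p_ham
```

Noisy-OR has the useful property that any single confident layer can drive the combined score high, while several weakly suspicious layers also compound.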
Model Selection Trade-Offs
| Model | Precision | Recall | Latency | Best For |
|---|---|---|---|---|
| Logistic Regression | High | Medium | 0.1ms | V1 baseline, feature validation |
| LightGBM | High | High | 1ms | Production metadata classifier |
| DistilBERT | Very High | Very High | 40ms | Content deep analysis |
| CNN for images | High | Medium | 30ms | Image-based spam detection |
Deep Dive — Feedback Loops
Spam detection is an adversarial problem. Spammers constantly adapt, so your model must too.
The Feedback Loop Pipeline
# Feedback signals (ordered by reliability)
#
# 1. User clicks "Report Spam" -> Strong spam label
# 2. User clicks "Not Spam" -> Strong ham label
# 3. User never opens email -> Weak spam signal
# 4. User opens and clicks links -> Weak ham signal
# 5. Multiple users report same email -> Very strong spam signal
#
# Pipeline:
# [User Action] --> [Label Aggregator] --> [Quality Filter]
#                                               |
#                                      remove noisy labels
#                                               |
#                                    [Add to training set]
#                                               |
#                                   [Retrain Model (daily)]
#                                               |
#                [A/B Test new model] --> [Promote if better]
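The label aggregator and quality filter can be sketched as a weighted vote over the signals above, dropping emails whose evidence is weak or contradictory. Weights and the confidence threshold are illustrative:

```python
# Illustrative reliability weights for the feedback signals above
SIGNAL_WEIGHTS = {
    "report_spam": 1.0,    # strong spam label
    "not_spam": -1.0,      # strong ham label
    "never_opened": 0.1,   # weak spam signal
    "clicked_link": -0.2,  # weak ham signal
}

def aggregate_label(events, min_confidence=0.8):
    """Turn raw user actions on one email into a training label,
    or None if the evidence is too weak or contradictory."""
    score = sum(SIGNAL_WEIGHTS.get(e, 0.0) for e in events)
    # Multiple independent spam reports compound into high confidence
    if score >= min_confidence:
        return 1   # spam
    if score <= -min_confidence:
        return 0   # ham
    return None    # noisy: exclude from the training set
```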
Deep Dive — Adversarial Robustness
Spammers actively try to evade your classifier. Discuss these attack vectors and defenses:
| Attack | Technique | Defense |
|---|---|---|
| Obfuscation | “V1@gra” instead of “Viagra,” invisible Unicode characters | Text normalization, character-level models |
| Image spam | Embed spam text in images to bypass text classifiers | OCR + image classification model |
| Snowshoe | Distribute spam across many IPs/domains to avoid blocklists | Cluster analysis on email templates, not individual senders |
| Compromised accounts | Send spam from legitimate hacked accounts | Behavioral anomaly detection: sudden change in sending patterns |
| Adversarial text | Add benign text to fool classifiers (“padding attack”) | Focus on structural features, not just content; ensemble methods |
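The text-normalization defense against obfuscation can be sketched with Unicode folding plus a substitution map; the map below is deliberately tiny and illustrative:

```python
import unicodedata

# Illustrative leetspeak/symbol swaps spammers use to dodge keyword filters
SUBSTITUTIONS = str.maketrans({"1": "i", "0": "o", "3": "e", "@": "a", "$": "s"})

def normalize_text(text: str) -> str:
    """Fold Unicode lookalikes, strip invisible characters, and undo
    digit/symbol swaps before keyword or model-based matching."""
    # NFKC folds many visually confusable Unicode variants to ASCII
    text = unicodedata.normalize("NFKC", text)
    # Remove zero-width and other invisible "format" characters (category Cf)
    text = "".join(c for c in text if unicodedata.category(c) != "Cf")
    return text.lower().translate(SUBSTITUTIONS)
```

After normalization, "V1@gra" and "fr&#8203;ee" (with a zero-width space) match the plain keywords a downstream filter expects.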
Metrics & Evaluation
Offline Metrics
| Metric | Target | Why This Target |
|---|---|---|
| Precision | > 99.9% | False positives must be extremely rare |
| Recall | > 98% | Most spam should be caught |
| F2-Score | > 0.99 | Weighted toward recall but precision-constrained |
| AUC-ROC | > 0.995 | Overall discrimination ability |
| FPR at 98% recall | < 0.01% | Precision constraint at operating point |
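The operating-point metric in the last row can be computed by sweeping thresholds from the highest score down; a minimal sketch, assuming binary labels and one score per email:

```python
def fpr_at_recall(labels, scores, target_recall=0.98):
    """Walk thresholds from highest score down; at the first point where
    recall reaches the target, report the false-positive rate."""
    pairs = sorted(zip(scores, labels), reverse=True)
    n_spam = sum(labels)
    n_ham = len(labels) - n_spam
    tp = fp = 0
    for _score, y in pairs:
        if y == 1:
            tp += 1
        else:
            fp += 1
        if tp / n_spam >= target_recall:
            return fp / n_ham
    return 1.0  # target recall unreachable
```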
Online Metrics
| Metric | Description | Alert Threshold |
|---|---|---|
| User spam reports/day | Users marking inbox emails as spam | > 5% increase triggers investigation |
| “Not spam” rescues/day | Users moving emails out of spam folder | > 3% increase triggers rollback |
| Spam folder volume | Total emails going to spam | Sudden spike may indicate false positives |
| Model latency p99 | 99th percentile classification time | > 200ms triggers alert |
Step 4: Trade-Offs & Extensions
Precision vs. Recall
Use a high threshold (e.g., 0.95) for sending to spam folder and a very high threshold (0.999) for blocking entirely. Two thresholds give you a “maybe spam” zone for user review.
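The two-threshold routing described above is a few lines, using the quoted thresholds:

```python
def route(p_spam, spam_threshold=0.95, block_threshold=0.999):
    """Block only on near-certainty, divert likely spam to the spam
    folder, and leave everything below the lower threshold in the inbox."""
    if p_spam >= block_threshold:
        return "block"
    if p_spam >= spam_threshold:
        return "spam_folder"
    return "inbox"
```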
Global vs. Personal Models
A global model catches universal spam. A per-user model learns individual preferences (e.g., newsletters). Combine both with a blending layer.
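One simple blending scheme (an assumption, since the blending layer is not specified above) shifts weight toward the personal model as the user accumulates feedback labels:

```python
def blended_score(p_global, p_personal, n_user_labels, k=50):
    """Convex blend of global and per-user spam probabilities.
    k (illustrative) controls how much feedback it takes before the
    personal model gets equal weight."""
    alpha = n_user_labels / (n_user_labels + k)  # 0 for new users -> 1 with history
    return (1 - alpha) * p_global + alpha * p_personal
```

New users fall back entirely on the global model, so per-user preferences never degrade cold-start accuracy.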
Real-Time vs. Batch Retraining
Daily retraining is sufficient for most spam. But for zero-day phishing campaigns, you need an online learning component that updates within hours.
Privacy-Preserving Classification
End-to-end encrypted email services cannot inspect content. Use header-only features and sender reputation, or run classification on-device.
Lilly Tech Systems