AI Content Moderation Policy

Master AI content moderation policy as a first-class discipline: 50 deep dives across 300 lessons, organised into eight categories.

- Content Moderation Foundations: policy vs law, speech trade-offs, actors, history, the moderation stack
- Policy Design: writing standards, taxonomy, edge cases, versioning, contextual policy, localization
- Harmful Content Categories: CSAM, violence and violent extremism, hate speech, harassment, mis- and disinformation, NSFW, self-harm
- AI / ML Moderation Systems: the classifier stack, LLM-based moderation, multimodal moderation, hashing and fingerprinting, proactive detection, adversarial evasion, moderation evaluation
- Human Review & Operations: workflow, reviewer wellness, queue management, inter-rater agreement, escalation, vendor management
- User-Facing Mechanisms: reporting, appeals, notifications, transparency, counter-speech, creator channels
- Trust & Safety Operations: org structure, metrics and SLAs, crisis response, T&S incident response, coordinated harm, enforcement actions
- Governance, Law & Transparency: the EU DSA, Section 230 of the CDA, NetzDG, the UK Online Safety Act, transparency reports, law-enforcement requests, oversight boards, regulator interactions

50 Topics · 300 Lessons · 8 Categories · 100% Free

AI content moderation policy is the discipline of deciding what stays up on a platform, what comes down, what gets labelled, and how those decisions are explained — and proving that the system is consistent, lawful, and humane. It sits at the intersection of platform-policy writing, applied ethics, intermediary-liability law (Section 230, the EU DSA, the UK Online Safety Act, NetzDG, the India IT Rules), the trust & safety operations stack (queues, reviewers, escalations, appeals), the AI / ML moderation stack (toxicity / hate / NSFW classifiers, LLM moderation, hashing, multimodal), and the transparency machinery that runs in production (statements of reasons, transparency reports, oversight boards, regulator inspections). Over the last five years it has stopped being a back-office function and has become an operating commitment subject to direct regulatory duties, multi-million-euro fines, civil-society scrutiny, and journalistic investigation.

This track is written for the practitioners doing this work day to day: trust & safety policy leads, T&S operations leaders, ML engineers building moderation systems, integrity / civic teams, appeals and oversight-liaison teams, lawyers translating intermediary-liability law into operational controls, and program leads stitching the whole program together. Every topic explains the underlying moderation discipline (drawing on Gillespie, Roberts, Klonick, Suzor, Douek, and the canonical T&S literature, the Santa Clara Principles, the Christchurch Call, the DSA, and hard-won production experience), the practical artefacts and rituals that operationalise it (policy specs, reviewer guidance, runbooks, transparency reports, audit packets), and the failure modes where moderation work quietly breaks down in practice. The aim is that a reader can stand up a credible content-moderation function, integrate it with engineering and governance, and defend it to boards, regulators, journalists, oversight boards, and the users the system actually affects.

All Topics

50 AI content moderation topics organized into 8 categories. Each has 6 detailed lessons with frameworks, templates, and operational patterns.

Content Moderation Foundations

Policy Design

Harmful Content Categories

🛡

CSAM & Child-Safety Content Policy

Engineer for child-safety content policy. Learn the legal landscape, NCMEC reporting, hashing-based detection (PhotoDNA), grooming detection, and the strict-liability operating posture.

6 Lessons

Violence & Violent Extremism

Engineer for violent-extremism content policy. Learn the legal landscape, GIFCT hash-sharing, glorification vs documentation, and the crisis-response posture for live-streamed attacks.

6 Lessons
🗡

Hate Speech & Slurs

Write hate-speech policy that holds. Learn the protected-characteristic list, slurs vs reclamation, contextual irony, the bias problem in hate-speech classifiers, and reviewer guidance.

6 Lessons
🚫

Harassment & Bullying

Write harassment and bullying policy. Learn the targeting requirement, repeated-conduct vs single-incident, doxxing, dogpiling, public-figure carve-outs, and the victim-reporter UX.

6 Lessons
📢

Mis & Disinformation

Engineer for mis & disinformation policy. Learn the misinformation-vs-disinformation split, election integrity, health misinformation, AI-generated content, fact-checking, and labels.

6 Lessons
🔒

NSFW & Adult Content Policy

Engineer for adult-content policy. Learn the legal landscape, age-gating, consent and non-consensual intimate imagery (NCII), creator-side adult content, and the surface-level policy split.

6 Lessons
💐

Self-Harm & Suicide Content Policy

Engineer for self-harm and suicide content policy. Learn safe-messaging guidelines, crisis-response interstitials, recovery-content carve-outs, and reviewer-wellness specifics.

6 Lessons

AI / ML Moderation Systems

🧠

Moderation Classifier Stack

Operate the moderation classifier stack. Learn toxicity, hate, NSFW, violence and self-harm classifiers, calibration per surface, threshold tuning, and the eval-vs-policy alignment.

6 Lessons
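As a flavour of the threshold-tuning work this topic covers, here is a minimal sketch of picking a per-surface action threshold from a labelled sample: scan cutoffs from low to high and take the first (highest-recall) one that meets the surface's precision target. The function name and scheme are illustrative, not a specific platform's implementation.

```python
def pick_threshold(scores, labels, target_precision, step=0.05):
    """Return the lowest score cutoff whose precision on the labelled
    sample meets the target for this surface, or None if none does.
    scores: classifier scores in [0, 1]; labels: 1 = violating, 0 = benign."""
    t = 0.0
    while t <= 1.0:
        # Items the classifier would action at this cutoff.
        flagged = [y for s, y in zip(scores, labels) if s >= t]
        if flagged and sum(flagged) / len(flagged) >= target_precision:
            return round(t, 2)  # lowest passing cutoff = most recall
        t += step
    return None  # no cutoff reaches the precision bar on this sample
```

In practice a comments surface and a DM surface would each get their own labelled sample and precision target, which is the "calibration per surface" the blurb refers to.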
🧠

LLM-Based Moderation

Use LLMs for moderation. Learn the OpenAI Moderation API and friends, prompt-based classification, in-context policy delivery, the cost / latency profile, and the failure modes.

6 Lessons
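A minimal sketch of prompt-based classification with in-context policy delivery. Everything here is illustrative: `call_llm` is a hypothetical stand-in for whatever chat-completion client the platform uses (assumed to return a plain string), and the policy snippet and label set are toy examples, not any platform's real taxonomy.

```python
# Toy in-context policy; real deployments ship much longer policy text.
POLICY_SNIPPET = """\
HATE: attacks a person or group on the basis of a protected characteristic.
HARASSMENT: targets a specific individual with abuse or threats.
NONE: violates neither policy above."""

def classify(text, call_llm):
    prompt = (
        "You are a content moderator. Using ONLY the policy below, "
        "answer with exactly one label: HATE, HARASSMENT, or NONE.\n\n"
        f"Policy:\n{POLICY_SNIPPET}\n\nContent:\n{text}\n\nLabel:"
    )
    raw = call_llm(prompt).strip().upper()
    # Fail closed: route any unparseable model output to human review.
    return raw if raw in {"HATE", "HARASSMENT", "NONE"} else "ESCALATE"
```

The escalation fallback is the key failure-mode guard: LLMs sometimes return prose instead of a label, and silently coercing that to "NONE" would under-enforce.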
📷

Multimodal Moderation (Image, Video, Audio)

Moderate multimodal content. Learn image classifiers, video sampling and live-stream moderation, audio transcription and classification, OCR-based moderation, and synthetic-media detection.

6 Lessons
🔐

Hashing & Fingerprinting

Use hashing for known-bad content. Learn perceptual hashing (PhotoDNA, PDQ, TMK), audio fingerprinting, hash-sharing programs (NCMEC, GIFCT, Tech Coalition), and false-positive handling.

6 Lessons
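The core matching operation behind perceptual hashing can be sketched in a few lines: two hashes match if their Hamming distance (count of differing bits) is under a threshold tuned per hash family. The linear scan and the distance value below are illustrative; production systems use indexed nearest-neighbour lookup and thresholds calibrated against false-positive budgets.

```python
def hamming(a: int, b: int) -> int:
    # Number of differing bits between two same-length perceptual hashes.
    return bin(a ^ b).count("1")

def match_known_bad(query_hash, hash_db, max_distance=8):
    # hash_db: iterable of (hash, metadata) pairs from a hash-sharing
    # program. Returns candidate matches within the distance threshold;
    # these still go to the false-positive-handling flow, not straight
    # to enforcement.
    return [(h, meta) for h, meta in hash_db
            if hamming(query_hash, h) <= max_distance]
```

This is also why perceptual hashes differ from cryptographic ones: a small visual edit moves only a few bits, so near matches survive recompression and cropping.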
🔍

Proactive Detection at Scale

Detect violations before users report them. Learn the proactive-detection pipeline, sampling strategies, recall vs precision trade-offs at scale, and the proactive-vs-reactive ratio metric.

6 Lessons
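The proactive-vs-reactive ratio metric mentioned above reduces to a simple computation over actioned items; the record schema below (a `source` field of `"proactive"` or `"report"`) is an assumption for illustration.

```python
def proactive_rate(actioned_items):
    """Share of actioned items first surfaced by the platform's own
    detection rather than by a user report. Illustrative schema:
    each item is a dict with source = 'proactive' or 'report'."""
    items = list(actioned_items)
    if not items:
        return 0.0
    proactive = sum(1 for i in items if i["source"] == "proactive")
    return proactive / len(items)
```

A high rate on its own is not sufficient: paired with low prevalence and stable precision it indicates healthy proactive coverage, but it can also be inflated by over-broad automated sweeps, which is why the topic treats it alongside precision and recall.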
🤖

Adversarial Evasion & Cat-and-Mouse

Defend moderation against adversarial evasion. Learn obfuscation techniques (leetspeak, image edits, splice, shadow accounts), red-team programs, and the adversarial-cycle metric.

6 Lessons
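A taste of the defence side: a minimal text normaliser that undoes common leetspeak substitutions and strips separator characters before content reaches keyword rules or classifiers. The substitution table is a deliberately tiny illustration; real pipelines use far larger confusable tables (including Unicode homoglyphs) and treat normalisation as one move in the ongoing cat-and-mouse cycle.

```python
# Tiny illustrative leetspeak table; production tables are much larger.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                      "5": "s", "7": "t", "@": "a", "$": "s"})

def normalize(text: str) -> str:
    lowered = text.lower().translate(LEET)
    # Drop characters inserted to split a keyword, e.g. "h.a.t.e".
    return "".join(ch for ch in lowered if ch.isalnum() or ch.isspace())
```

Note the built-in trade-off this topic examines: every normalisation rule that catches evasion also risks mangling benign text, so each rule needs its own false-positive evaluation.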
📊

Moderation Evaluation & Benchmarks

Evaluate moderation systems credibly. Learn precision / recall / F1 per policy area, slice-eval by language and demographic, golden-set design, drift detection, and external benchmarks.

6 Lessons
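The slice-eval idea above can be sketched directly: compute precision / recall / F1 for the violating class, then group the golden set by a slice key (language, demographic, surface) so that disparities between slices become visible. The triple-based row format and slice-key callable are illustrative choices, not a standard schema.

```python
def prf(preds, golds):
    # Precision, recall, F1 for the positive (violating) class.
    tp = sum(p == g == 1 for p, g in zip(preds, golds))
    fp = sum(p == 1 and g == 0 for p, g in zip(preds, golds))
    fn = sum(p == 0 and g == 1 for p, g in zip(preds, golds))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

def slice_eval(rows, key):
    # rows: (pred, gold, metadata) triples from a golden set; key picks
    # the slice, e.g. lambda meta: meta["language"].
    slices = {}
    for pred, gold, meta in rows:
        slices.setdefault(key(meta), []).append((pred, gold))
    return {k: prf([p for p, _ in v], [g for _, g in v])
            for k, v in slices.items()}
```

An aggregate F1 can look healthy while one language slice is badly under-served, which is exactly the bias failure that per-slice reporting is meant to surface.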

Human Review & Operations

User-Facing Mechanisms

Trust & Safety Operations

Governance, Law & Transparency