LLM-Based Moderation

Use LLMs for moderation. Learn the OpenAI Moderation API, Perspective API, Llama Guard, Anthropic policy classifiers and similar, prompt-based classification with policy embedded in context, in-context policy delivery for hard cases, the cost / latency profile vs purpose-built classifiers, and the failure modes (jailbreak, drift, overconfident hallucination).