Jailbreak Taxonomy

Learn the jailbreak taxonomy as a defender so that eval coverage matches the threat. This topic covers seven categories: persona / role-play, hypothetical / fictional framings, encoding-based, multi-turn priming, indirect / context-based, latent-space (universal adversarial suffix style), and image / multimodal jailbreaks. For each category, learn its conceptual signature, why models are susceptible to it, and the defence layer that addresses it.
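One way to make "eval coverage matches the threat" concrete is to tag every eval case with a taxonomy category and then list the categories that have too few cases. The sketch below is a minimal illustration, not part of the course material: the `JailbreakCategory` enum, the `coverage_gaps` helper, the `min_cases` threshold, and the sample `suite_tags` list are all hypothetical names and data introduced for this example.

```python
from collections import Counter
from enum import Enum


class JailbreakCategory(Enum):
    """The seven categories named in the topic description."""
    PERSONA_ROLE_PLAY = "persona / role-play"
    HYPOTHETICAL_FICTIONAL = "hypothetical / fictional framing"
    ENCODING_BASED = "encoding-based"
    MULTI_TURN_PRIMING = "multi-turn priming"
    INDIRECT_CONTEXT = "indirect / context-based"
    LATENT_SPACE = "latent-space (universal adversarial suffix style)"
    IMAGE_MULTIMODAL = "image / multimodal"


def coverage_gaps(case_categories, min_cases=1):
    """Return categories with fewer than `min_cases` tagged eval cases.

    `case_categories` holds one JailbreakCategory per eval case.
    Results follow the enum's definition order.
    """
    counts = Counter(case_categories)
    return [c for c in JailbreakCategory if counts[c] < min_cases]


# Hypothetical eval suite: each entry is the category tag of one test case.
suite_tags = [
    JailbreakCategory.PERSONA_ROLE_PLAY,
    JailbreakCategory.PERSONA_ROLE_PLAY,
    JailbreakCategory.ENCODING_BASED,
    JailbreakCategory.MULTI_TURN_PRIMING,
]

for gap in coverage_gaps(suite_tags):
    print(f"no eval cases for: {gap.value}")
```

With this sample suite, the check reports four uncovered categories: hypothetical / fictional, indirect / context-based, latent-space, and image / multimodal. Raising `min_cases` turns the same check into a minimum-depth requirement per category rather than a simple presence test.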

6 Lessons · Templates · Practitioner-Ready · 100% Free

Lessons in This Topic

Work through these 6 lessons in order, or jump to whichever is most relevant.