Jailbreak Taxonomy

Learn the jailbreak taxonomy as a defender so that eval coverage matches the threat. Cover persona / role-play, hypothetical / fictional framing, encoding-based, multi-turn priming, indirect / context-based, latent-space (universal-adversarial-suffix style), and image / multimodal jailbreaks. For each, learn the conceptual signature, why models are susceptible to it, and the defence layer that addresses it.
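Checking whether an eval suite actually spans every category can be automated. The following is a minimal sketch that assumes each eval case carries exactly one category label. The label strings, `TAXONOMY`, and `coverage_report` are hypothetical names chosen for illustration and do not come from any particular eval framework. The sketch counts cases per category and flags categories that fall below a chosen minimum. It also reports labels that match no category, since a mistyped label can silently hide a gap.

```python
from collections import Counter

# Hypothetical label set mirroring the seven categories listed above.
TAXONOMY = (
    "persona_roleplay",
    "hypothetical_fictional",
    "encoding",
    "multi_turn_priming",
    "indirect_context",
    "latent_space_suffix",
    "multimodal",
)


def coverage_report(case_tags, min_cases=1):
    """Summarise how well a suite of tagged eval cases covers TAXONOMY.

    case_tags: iterable of category labels, one per eval case (assumed schema).
    min_cases: categories with fewer cases than this are reported as gaps.
    """
    tags = list(case_tags)  # materialise so generators can be scanned twice
    counts = Counter(tag for tag in tags if tag in TAXONOMY)
    # Labels outside the taxonomy are often typos that mask a real gap.
    unknown = sorted({tag for tag in tags if tag not in TAXONOMY})
    gaps = [cat for cat in TAXONOMY if counts[cat] < min_cases]
    return {
        "counts": {cat: counts[cat] for cat in TAXONOMY},
        "gaps": gaps,
        "unknown": unknown,
    }


# Usage: a tiny illustrative suite with one mislabelled case.
suite = ["persona_roleplay", "encoding", "encoding", "multi_turn_priming", "typo_category"]
report = coverage_report(suite)
print(report["gaps"])
print(report["unknown"])
```

Raising `min_cases` turns the check from "is anything there?" into "is there enough?". That is usually the more useful question once every category has at least a token case.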