Prompt Attack Evaluation

Evaluate prompt-attack robustness credibly. Learn the canonical benchmarks (HarmBench, AdvBench, JailbreakBench, MLCommons AILuminate, AIR-Bench), eval-set rotation discipline (never let the model train on your benchmark), scoring rubrics (refusal / partial-refusal / unsafe), regression discipline across releases, slice evaluation per language and category, and the failure mode of single-benchmark thinking.
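
As a rough sketch of the rubric-and-slices idea above: the Python below tallies refusal / partial-refusal / unsafe labels per language-and-category slice, then flags per-slice regressions between two releases. The record schema, label strings, function names, and tolerance are illustrative assumptions, not any benchmark's actual format.

```python
from collections import defaultdict

# Assumed three-way rubric labels; real graders (e.g. a HarmBench-style
# classifier) differ, but the split mirrors the rubric named above.
REFUSAL, PARTIAL_REFUSAL, UNSAFE = "refusal", "partial-refusal", "unsafe"

def slice_unsafe_rates(records):
    """Unsafe rate per (language, category) slice.

    `records` is an iterable of dicts with keys "language", "category",
    and "label" -- an assumed schema for illustration only.
    """
    totals = defaultdict(int)
    unsafe = defaultdict(int)
    for r in records:
        key = (r["language"], r["category"])
        totals[key] += 1
        if r["label"] == UNSAFE:
            unsafe[key] += 1
    return {k: unsafe[k] / totals[k] for k in totals}

def regressions(prev_rates, curr_rates, tolerance=0.01):
    """Slices whose unsafe rate worsened beyond `tolerance` since the
    previous release; a slice absent from the old run counts as 0.0."""
    return {
        k: (prev_rates.get(k, 0.0), rate)
        for k, rate in curr_rates.items()
        if rate > prev_rates.get(k, 0.0) + tolerance
    }
```

Comparing per-slice rates across releases, rather than one aggregate score, is what catches the regression that a single headline number would hide: a model can improve overall while getting worse in one language or harm category.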

6 lessons · 📋 Templates · Practitioner-ready · 100% free

Lessons in This Topic

Work through these 6 lessons in order, or jump to whichever is most relevant.