AI Safety Engineering

Master AI safety engineering as a first-class discipline. 50 deep dives across 300 lessons covering safety foundations (hazards, risks, safety cases, functional safety, failure modes, hazard analysis), safety by design (requirements, safe-by-default architectures, defense in depth, kill switches, graceful degradation, containment), alignment & specification (reward design, RLHF safety, constitutional AI, goal misgeneralization, deception risk), robustness & reliability (adversarial robustness, distribution shift, OOD detection, uncertainty, redundancy), safety evaluation & testing (red teaming, dangerous capabilities, jailbreak testing, frontier evals, agentic evals), runtime safety & monitoring (runtime monitors, anomaly detection, circuit breakers, rollback, incident detection), safety governance & ops (safety committee, policies, incident response, post-incident review, dashboards), and frontier AI safety (RSPs, capability thresholds, evals-based commitments, compute governance, dual-use).

50Topics
300Lessons
8Categories
100%Free

AI safety engineering is the discipline of making AI systems behave as intended, fail safely when they do not, and stay within operating envelopes that humans can actually supervise. It sits at the intersection of classical safety engineering (hazard analysis, safety cases, defense in depth, functional safety), machine-learning engineering (robustness, uncertainty, evaluation, monitoring), and the newer alignment literature (specification, reward modeling, goal misgeneralization, deception, frontier-model evaluation). Over the last three years the field has stopped being a research side-project and has become an operational commitment for any organisation running AI at scale. Responsible scaling policies, pre-deployment safety evaluations, runtime safety monitors, and safety cases are now standard fare in frontier-lab system cards, regulator guidance, and customer contracts.

This track is written for the practitioners doing this work day to day: AI safety engineers, ML platform engineers integrating safety controls into pipelines, reliability engineers running AI in production, red-team leads, safety eval authors, safety policy owners, incident-response commanders, and program leads stitching the program together. Every topic explains the underlying safety-engineering discipline (drawing on IEC 61508, ISO 26262, SOTIF, STPA, the NIST AI RMF, the frontier-lab safety literature, and hard-won production experience), the practical artefacts and rituals that operationalise it (safety requirements, safety cases, runbooks, evaluation harnesses, dashboards, incident reviews), and the failure modes where safety engineering breaks down in practice. The goal is that a reader can stand up a credible AI safety-engineering function, integrate it with engineering and governance, and defend it to boards, regulators, and customers.

All Topics

50 AI safety engineering topics organized into 8 categories. Each has 6 detailed lessons with frameworks, templates, and operational patterns.

AI Safety Foundations

Safety by Design

Alignment & Specification

Robustness & Reliability

Safety Evaluation & Testing

📊

Safety Evaluation Frameworks

Pick and run safety-evaluation frameworks. Learn the eval taxonomy, framework survey (HELM, Eleuther, Inspect, UK AISI patterns), and the custom-eval design ritual.

6 Lessons
🤖

Safety Red Teaming

Run an AI safety red-team program. Learn recruitment, campaign structure, attack library maintenance, external red-team vendors, and the path from finding to fix.

6 Lessons

Dangerous Capability Evals

Run dangerous-capability evaluations. Learn the canonical categories (CBRN uplift, cyber, autonomy, persuasion), eval design rules, elicitation discipline, and result disclosure.

6 Lessons
🔐

Jailbreak & Prompt Injection Testing

Test for jailbreaks and prompt injection. Learn direct vs indirect injection, universal adversarial suffixes, tooling (Garak, PyRIT), defence verification, and regression discipline.

6 Lessons
🌟

Frontier-Model Safety Evals

Run the safety-eval suite frontier labs publish. Learn the canonical suites (Anthropic, OpenAI, DeepMind, UK AISI), capability elicitation, and how to read frontier safety reports.

6 Lessons
🤖

Agentic & Long-Horizon Evals

Evaluate agentic and long-horizon AI. Learn task-based evals (METR, SWE-bench), autonomy benchmarks, long-horizon reliability tests, and the agent-eval harness pattern.

6 Lessons
📚

Safety Benchmarks Landscape

Navigate the public safety-benchmark landscape. Learn the canonical benchmarks (HarmBench, ToxicChat, TruthfulQA, MLCommons AILuminate, AIR-Bench), their limits, and how to pick a credible basket.

6 Lessons

Runtime Safety & Monitoring

Safety Governance & Ops

Frontier AI Safety