AI Safety Engineering
Master AI safety engineering as a first-class discipline. 50 deep dives across 300 lessons covering: safety foundations (hazards, risks, safety cases, functional safety, failure modes, hazard analysis); safety by design (requirements, safe-by-default architectures, defense in depth, kill switches, graceful degradation, containment); alignment & specification (reward design, RLHF safety, constitutional AI, goal misgeneralization, deception risk); robustness & reliability (adversarial robustness, distribution shift, OOD detection, uncertainty, redundancy); safety evaluation & testing (red teaming, dangerous capabilities, jailbreak testing, frontier evals, agentic evals); runtime safety & monitoring (runtime monitors, anomaly detection, circuit breakers, rollback, incident detection); safety governance & ops (safety committee, policies, incident response, post-incident review, dashboards); and frontier AI safety (RSPs, capability thresholds, evals-based commitments, compute governance, dual-use).
AI safety engineering is the discipline of making AI systems behave as intended, fail safely when they do not, and stay within operating envelopes that humans can actually supervise. It sits at the intersection of classical safety engineering (hazard analysis, safety cases, defense in depth, functional safety), machine-learning engineering (robustness, uncertainty, evaluation, monitoring), and the newer alignment literature (specification, reward modeling, goal misgeneralization, deception, frontier-model evaluation). Over the last three years the field has stopped being a research side-project and has become an operational commitment for any organisation running AI at scale. Responsible scaling policies, pre-deployment safety evaluations, runtime safety monitors, and safety cases are now standard fare in frontier-lab system cards, regulator guidance, and customer contracts.
This track is written for the practitioners doing this work day to day: AI safety engineers, ML platform engineers integrating safety controls into pipelines, reliability engineers running AI in production, red-team leads, safety eval authors, safety policy owners, incident-response commanders, and program leads stitching the whole function together. Every topic explains the underlying safety-engineering discipline (drawing on IEC 61508, ISO 26262, SOTIF, STPA, the NIST AI RMF, the frontier-lab safety literature, and hard-won production experience), the practical artefacts and rituals that operationalise it (safety requirements, safety cases, runbooks, evaluation harnesses, dashboards, incident reviews), and the failure modes where safety engineering breaks down in practice. The goal is that a reader can stand up a credible AI safety-engineering function, integrate it with engineering and governance, and defend it to boards, regulators, and customers.
All Topics
50 AI safety engineering topics organised into 8 categories. Each has 6 detailed lessons with frameworks, templates, and operational patterns.
AI Safety Foundations
Safety Engineering Overview
Master what AI safety engineering actually is. Learn the scope, the lineage from classical safety engineering, the deliverables, and the operating model most mature teams end up with.
6 Lessons
AI Hazards & Risks Taxonomy
Build a working taxonomy of AI hazards and risks. Learn the axes (capability-driven, misuse, specification, robustness, systemic) and how to map each to concrete harm scenarios.
6 Lessons
Safety Cases for AI
Write a safety case for an AI system. Learn GSN/CAE structure, claims-arguments-evidence discipline, how to handle defeaters, and how to keep the case living.
6 Lessons
Functional Safety & AI
Apply functional-safety standards to AI systems. Learn IEC 61508, ISO 26262, ISO/PAS 21448 SOTIF, and the ISO/IEC TR 5469 bridge for AI in safety-related systems.
6 Lessons
AI Failure Modes & Effects
Catalogue AI failure modes and their effects. Learn ML-specific FMEA, the common taxonomies (Raji et al., Hendrycks), and how to attach mitigations to each mode.
6 Lessons
Hazard Analysis (STPA/HAZOP)
Run systems-theoretic hazard analysis for AI. Learn STPA for AI, HAZOP-style deviations, control-structure diagrams, and how to extract safety requirements from the analysis.
6 Lessons
Safety by Design
Safety Requirements Engineering
Write safety requirements that engineers can actually implement. Learn SMART safety requirements, allocation to components, verification method per requirement, and traceability.
6 Lessons
Safe-by-Default Architectures
Design AI systems that are safe when they misbehave. Learn safe-default patterns (deny, minimum capability, bounded autonomy), scope shrinking, and the hazardous-default anti-pattern.
6 Lessons
Defense in Depth for AI
Layer safety controls so no single failure causes harm. Learn the defense-in-depth lattice, control independence, Swiss-cheese model, and how to avoid correlated-failure traps.
6 Lessons
Kill Switches & Emergency Stop
Build credible kill switches. Learn the kill-switch hierarchy (per-request, per-session, per-feature, per-model, per-region), authority to press, verification, and drill cadence.
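The scope hierarchy described above reduces to an ordered check: a request proceeds only if no active switch covers it at any enclosing scope. The sketch below assumes a hypothetical `(scope, value)` switch registry and illustrative scope names; it is not any particular platform's API.

```python
# Hypothetical kill-switch hierarchy check: a broader switch (region, model)
# overrides everything below it. All names here are illustrative.
from dataclasses import dataclass

# Most-specific to least-specific scopes a switch can cover.
SCOPES = ("request", "session", "feature", "model", "region")

@dataclass(frozen=True)
class Request:
    request_id: str
    session_id: str
    feature: str
    model: str
    region: str

def is_blocked(req: Request, active_switches: set[tuple[str, str]]) -> bool:
    """Return True if any active kill switch covers this request.

    active_switches holds (scope, value) pairs, e.g. ("feature", "summarize")
    or ("region", "eu-west").
    """
    scope_values = {
        "request": req.request_id,
        "session": req.session_id,
        "feature": req.feature,
        "model": req.model,
        "region": req.region,
    }
    return any((scope, scope_values[scope]) in active_switches for scope in SCOPES)

req = Request("r1", "s9", "summarize", "model-a", "eu-west")
assert not is_blocked(req, set())
assert is_blocked(req, {("feature", "summarize")})   # per-feature switch
assert is_blocked(req, {("region", "eu-west")})      # broader region switch
```

The point of the hierarchy is operational: a per-feature switch limits blast radius, while a per-region or per-model switch gives the on-call a single credible lever when finer controls are not trusted.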
6 Lessons
Graceful Degradation
Design AI systems that degrade gracefully under stress. Learn fallback ladders, confidence-triggered fallback, capability stepping-down, and user-visible degradation messaging.
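A confidence-triggered fallback ladder can be sketched in a few lines. The tier names and thresholds below are illustrative assumptions, not prescribed values; real systems calibrate the bars against measured error rates.

```python
# Illustrative fallback ladder: step capability down as confidence drops.
# Ordered from most capable to safest; each entry is (tier, minimum confidence).
FALLBACK_LADDER = [
    ("full_model", 0.80),        # full generative answer
    ("template_answer", 0.50),   # constrained, templated response
    ("handoff_to_human", 0.0),   # always-available safe floor
]

def pick_tier(confidence: float) -> str:
    """Walk the ladder top-down and return the first tier whose bar is met."""
    for tier, min_conf in FALLBACK_LADDER:
        if confidence >= min_conf:
            return tier
    return FALLBACK_LADDER[-1][0]

assert pick_tier(0.95) == "full_model"
assert pick_tier(0.60) == "template_answer"
assert pick_tier(0.10) == "handoff_to_human"
```

Making the safest tier unconditionally reachable (threshold 0.0) is the design choice that distinguishes a ladder from a mere threshold: there is always a rung to land on.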
6 Lessons
Shutdown & Containment
Contain and shut down AI systems safely. Learn capability containment (sandbox, tool restrictions, egress control), shutdown-ability as a design constraint, and corrigibility-in-deployment.
6 Lessons
Alignment & Specification
Alignment Problem Overview
Understand the alignment problem as a safety-engineering concern. Learn the specification, robustness, and assurance framing, and the practitioner-relevant subset of the literature.
6 Lessons
Specification & Reward Design
Write specifications and reward signals that capture what you actually want. Learn specification gaming, reward hacking, proxy failures, and the specification-review ritual.
6 Lessons
Reward Modeling
Train and operate reward models as safety artefacts. Learn preference data collection, reward-model calibration, distribution-shift in the reward, and reward-model auditing.
6 Lessons
RLHF Safety
Run RLHF as a safety-aware process. Learn the RLHF pipeline safety controls, refusal behaviours, sycophancy mitigation, and the RLHF eval battery.
6 Lessons
Constitutional AI & Safety
Apply Constitutional AI (CAI) and policy-driven alignment. Learn constitution authoring, self-critique pipelines, RLAIF trade-offs, and constitution maintenance.
6 Lessons
Goal Misgeneralization
Detect and mitigate goal misgeneralization. Learn the difference from capability misgeneralization, diagnostic eval design, out-of-distribution goal probing, and remediation.
6 Lessons
Deception & Scheming Risk
Take deception and scheming seriously as an engineering concern. Learn the deception taxonomy, evals for deceptive behaviour, interpretability probes, and mitigation patterns.
6 Lessons
Robustness & Reliability
Adversarial Robustness
Build adversarial robustness into AI systems. Learn the threat models, empirical vs certified robustness, adversarial training, and the robustness-eval cadence.
6 Lessons
Distribution Shift Handling
Handle distribution shift in production AI. Learn the shift taxonomy (covariate, label, concept), shift detection, domain-adaptation patterns, and shift-triggered retraining.
6 Lessons
Out-of-Distribution Detection
Detect out-of-distribution inputs at inference time. Learn score-based methods, generative methods, evaluation protocols (near-OOD vs far-OOD), and the OOD-to-action path.
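As one concrete instance of the score-based methods mentioned, here is a minimal energy-score detector over raw logits. The threshold is an assumption for the sketch; in practice it is calibrated on held-out in-distribution data.

```python
# Energy-score OOD detector sketch: flat, low-magnitude logits (uncertain
# predictions) score higher-energy than a confident in-distribution hit.
import math

def energy_score(logits: list[float], temperature: float = 1.0) -> float:
    """-T * logsumexp(logits / T); larger values suggest more OOD-like inputs."""
    m = max(x / temperature for x in logits)
    lse = m + math.log(sum(math.exp(x / temperature - m) for x in logits))
    return -temperature * lse

def is_ood(logits: list[float], threshold: float = -2.0) -> bool:
    # threshold is an illustrative assumption, not a recommended value
    return energy_score(logits) > threshold

# One dominant logit: confident, in-distribution under this threshold...
assert not is_ood([9.0, 0.1, 0.2])
# ...while near-uniform logits cross the bar and are flagged.
assert is_ood([0.1, 0.0, 0.2])
```

The "OOD-to-action path" the lesson names is the part the score does not give you: whether a flag routes to a fallback tier, a human, or a refusal is a separate policy decision.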
6 Lessons
Uncertainty Quantification
Quantify model uncertainty so downstream systems can act on it. Learn aleatoric vs epistemic, calibration, conformal prediction, and the uncertainty-to-policy mapping.
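Split conformal prediction, one of the techniques named above, fits in a few lines for classification. The calibration scores and label probabilities below are made-up illustrations; real pipelines calibrate on a held-out split of genuine model outputs.

```python
# Split-conformal sketch: calibrate a quantile of nonconformity scores,
# then emit a prediction *set* rather than a single label.
import math

def conformal_qhat(cal_scores: list[float], alpha: float) -> float:
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    return sorted(cal_scores)[min(k, n) - 1]

def prediction_set(probs: dict[str, float], qhat: float) -> set[str]:
    """Include every label whose nonconformity score 1 - p(label) <= qhat."""
    return {label for label, p in probs.items() if 1.0 - p <= qhat}

# Calibration scores 1 - p(true label) from ten held-out examples (made up).
cal = [0.1, 0.2, 0.15, 0.3, 0.05, 0.25, 0.4, 0.1, 0.2, 0.35]
qhat = conformal_qhat(cal, alpha=0.1)
assert qhat == 0.4  # rank ceil(11 * 0.9) = 10 -> largest calibration score
assert prediction_set({"cat": 0.7, "dog": 0.25, "fox": 0.05}, qhat) == {"cat"}
```

The set size is itself the uncertainty signal the lesson's "uncertainty-to-policy mapping" consumes: a large or empty set is a natural trigger for escalation.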
6 Lessons
Redundancy & Fault Tolerance
Design AI systems with redundancy and fault tolerance. Learn ensemble-as-redundancy, disagreement-triggered escalation, N-version strategies, and ensemble correlated-failure traps.
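Disagreement-triggered escalation reduces to a short voting rule. The labels and the agreement threshold below are assumptions for the sketch; the "escalate" outcome stands in for whatever human-review path the system defines.

```python
# Ensemble-as-redundancy sketch: act only on sufficient agreement,
# escalate to a human otherwise.
from collections import Counter

def ensemble_decision(votes: list[str], min_agreement: float = 1.0) -> str:
    """Return the agreed label, or "escalate" on disagreement.

    min_agreement is the fraction of members that must concur before the
    ensemble output is trusted (1.0 = require unanimity).
    """
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label
    return "escalate"

assert ensemble_decision(["allow", "allow", "allow"]) == "allow"
assert ensemble_decision(["allow", "allow", "deny"]) == "escalate"
assert ensemble_decision(["allow", "allow", "deny"], min_agreement=2 / 3) == "allow"
```

The correlated-failure trap the lesson warns about is exactly what this sketch cannot see: if all members share training data or a base model, unanimity is weak evidence.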
6 Lessons
Reliability Engineering for AI
Apply SRE principles to AI systems. Learn AI-specific SLIs/SLOs/SLAs, error budgets that include quality and safety, AI-aware post-incident reviews, and AI reliability runbooks.
6 Lessons
Safety Evaluation & Testing
Safety Evaluation Frameworks
Pick and run safety-evaluation frameworks. Learn the eval taxonomy, framework survey (HELM, Eleuther, Inspect, UK AISI patterns), and the custom-eval design ritual.
6 Lessons
Safety Red Teaming
Run an AI safety red-team program. Learn recruitment, campaign structure, attack library maintenance, external red-team vendors, and the path from finding to fix.
6 Lessons
Dangerous Capability Evals
Run dangerous-capability evaluations. Learn the canonical categories (CBRN uplift, cyber, autonomy, persuasion), eval design rules, elicitation discipline, and result disclosure.
6 Lessons
Jailbreak & Prompt Injection Testing
Test for jailbreaks and prompt injection. Learn direct vs indirect injection, universal adversarial suffixes, tooling (Garak, PyRIT), defence verification, and regression discipline.
6 Lessons
Frontier-Model Safety Evals
Run the safety-eval suite frontier labs publish. Learn the canonical suites (Anthropic, OpenAI, DeepMind, UK AISI), capability elicitation, and how to read frontier safety reports.
6 Lessons
Agentic & Long-Horizon Evals
Evaluate agentic and long-horizon AI. Learn task-based evals (METR, SWE-bench), autonomy benchmarks, long-horizon reliability tests, and the agent-eval harness pattern.
6 Lessons
Safety Benchmarks Landscape
Navigate the public safety-benchmark landscape. Learn the canonical benchmarks (HarmBench, ToxicChat, TruthfulQA, MLCommons AILuminate, AIR-Bench), their limits, and how to pick a credible basket.
6 Lessons
Runtime Safety & Monitoring
Runtime Safety Monitors
Build runtime safety monitors for AI. Learn the monitor taxonomy, policy-check monitors, output-filter monitors, statistical monitors, and the monitor-of-monitors pattern.
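The policy-check and monitor-of-monitors ideas can be illustrated with a minimal harness. The individual checks here are hypothetical examples; the key design point is that a crashing or silent monitor is itself a reportable failure, never a silent pass.

```python
# Runtime-monitor harness sketch: each monitor returns a verdict on an
# output; the harness surfaces broken monitors instead of swallowing them.
from typing import Callable

Monitor = Callable[[str], bool]  # True = output passes this check

def no_secrets(output: str) -> bool:        # hypothetical policy check
    return "BEGIN PRIVATE KEY" not in output

def bounded_length(output: str) -> bool:    # hypothetical statistical check
    return len(output) <= 10_000

def run_monitors(output: str, monitors: dict[str, Monitor]) -> dict[str, str]:
    """Run every monitor; an exception yields "monitor_error", not a pass."""
    verdicts = {}
    for name, check in monitors.items():
        try:
            verdicts[name] = "pass" if check(output) else "fail"
        except Exception:
            verdicts[name] = "monitor_error"  # the monitor-of-monitors signal
    return verdicts

monitors = {"no_secrets": no_secrets, "bounded_length": bounded_length}
assert run_monitors("hello", monitors) == {"no_secrets": "pass", "bounded_length": "pass"}
assert run_monitors("-----BEGIN PRIVATE KEY-----", monitors)["no_secrets"] == "fail"
```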
6 Lessons
Anomaly Detection in Production
Detect anomalous AI behaviour in production. Learn baselines, multivariate anomaly detectors, seasonality handling, alert tuning, and the anomaly-to-investigation path.
6 Lessons
Circuit Breakers & Safe Fallbacks
Break the circuit before harm spreads. Learn per-feature circuit breakers, automatic vs human-triggered breakers, the half-open retry pattern, and breaker-drill cadence.
6 Lessons
Rollback & Canary Patterns
Ship AI changes with safe rollback and canary. Learn canary slicing, quality + safety canary gates, rollback triggers, and the rollback-versus-roll-forward decision.
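A combined quality + safety canary gate might reduce to a decision function like the sketch below. Metric names and bars are assumptions; the structural point is that missing data never promotes, and the safety bar is checked before the quality bar.

```python
# Canary-gate sketch: promote only when the canary slice clears both a
# quality bar and a safety bar; anything else rolls back or holds.

def canary_gate(metrics: dict[str, float],
                min_quality: float = 0.95,
                max_safety_violation_rate: float = 0.001) -> str:
    """Return "promote", "rollback", or "hold" for a canary deployment."""
    quality = metrics.get("quality_vs_baseline")       # ratio vs. control arm
    violations = metrics.get("safety_violation_rate")  # per-request rate
    if quality is None or violations is None:
        return "hold"                                  # missing data never promotes
    if violations > max_safety_violation_rate:
        return "rollback"                              # safety gate is absolute
    if quality < min_quality:
        return "rollback"
    return "promote"

assert canary_gate({"quality_vs_baseline": 1.02, "safety_violation_rate": 0.0}) == "promote"
assert canary_gate({"quality_vs_baseline": 1.02, "safety_violation_rate": 0.01}) == "rollback"
assert canary_gate({"quality_vs_baseline": 1.02}) == "hold"
```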
6 Lessons
Incident Detection & Triage
Detect and triage AI incidents quickly. Learn AI incident definitions, severity ladders, intake pathways, on-call rotations, and the link to user comms and regulator reporting.
6 Lessons
Safety Telemetry & Observability
Instrument AI systems for safety observability. Learn the telemetry schema, PII-safe logging, sampling, the safety-data lake, and dashboards for different audiences.
6 Lessons
Safety Governance & Ops
Safety Committee & Board
Set up a safety committee that actually governs. Learn membership, authority, decision rights, cadence, escalation from working groups, and the link to the board.
6 Lessons
AI Safety Policies
Write AI safety policies engineers can follow. Learn the policy hierarchy (principles → policy → standard → procedure), exception handling, review cadence, and policy-to-control mapping.
6 Lessons
AI Incident Response
Run AI incident response. Learn the IR phases, command structure, tech track, comms track, user and regulator notification, and the handoff to post-incident review.
6 Lessons
Post-Incident Review
Run blameless post-incident reviews that actually produce fixes. Learn the PIR template, contributing-factor analysis, action-item discipline, tracking to closure, and annual pattern review.
6 Lessons
Safety Dashboards & Reporting
Build safety dashboards for every audience. Learn the KPI set, the engineering / product / committee / board dashboards, and the reporting cadence that survives budget cycles.
6 Lessons
Safety-Focused Model Cards
Author safety-focused model and system cards. Learn the safety-section template, known-limitations discipline, deployment-condition statements, and versioning across releases.
6 Lessons
Frontier AI Safety
Frontier AI Risk Overview
Get up to speed on frontier-AI risk. Learn the risk categories regulators and labs take seriously, the lineage from Bostrom / Russell, and the engineering-relevant subset.
6 Lessons
Responsible Scaling Policies
Read and (if you work at a lab) author a Responsible Scaling Policy. Learn the RSP structure, capability tiers, commitments, evaluation discipline, and external assurance.
6 Lessons
Capability Thresholds & Red Lines
Define capability thresholds and red lines. Learn threshold authoring, operationalising thresholds with evals, the red-line discipline, and the link to pause commitments.
6 Lessons
Evals-Based Safety Commitments
Make and keep evals-based safety commitments. Learn commitment design, the eval-credibility problem, external evaluation, and commitment reporting to regulators and the public.
6 Lessons
Compute Governance & Safety
Understand compute governance as a safety lever. Learn compute-thresholds in regulation, chip export controls, the compute-provider value chain, and KYC for compute.
6 Lessons
Dual-Use & Misuse Prevention
Prevent and respond to misuse of dual-use AI. Learn the misuse taxonomy, pre-release misuse evals, deployment controls (rate-limits, KYC, monitoring), and the misuse-response playbook.
6 Lessons
Lilly Tech Systems