Deception & Scheming Risk

Treat deception and scheming as an engineering concern rather than a thought experiment. Learn the deception taxonomy (sandbagging, strategic deception, scheming), the evaluation approaches that try to detect these behaviors (hidden-scratchpad probes, blue-team / red-team evals, mechanistic interpretability probes), and the mitigation and escalation patterns to apply when a signal appears.

6 Lessons | Templates | Practitioner-Ready | 100% Free

Lessons in This Topic

Work through these 6 lessons in order, or jump to whichever is most relevant.