Deception & Scheming Risk
Treat deception and scheming as an engineering concern rather than a thought experiment. Learn the deception taxonomy (sandbagging, strategic deception, scheming), the evaluation approaches that try to detect these behaviors (hidden-scratchpad probes, red-team / blue-team evals, mechanistic interpretability probes), and the mitigation and escalation patterns to apply when a warning signal appears.
6 Lessons · 📋 Templates · ✅ Practitioner-Ready · 100% Free
Lessons in This Topic
Work through these 6 lessons in order, or jump to whichever is most relevant to you.
Lilly Tech Systems