Deception & Scheming Evaluation
Evaluate deception and scheming behaviour as practised by frontier-lab safety teams and in AISI work. Learn the conceptual taxonomy (sycophancy, strategic deception, alignment faking, sandbagging), the evaluation methodologies (sandboxed setups with hidden observers, behavioural probes), interpretability probes (still a research direction), the disclosure norm in system cards, and the engineering implications.
6 Lessons · Templates · Practitioner-Ready · 100% Free
Lessons in This Topic
Work through these 6 lessons in order, or jump to whichever is most relevant.
Lilly Tech Systems