Deception & Scheming Evaluation

Learn how deception and scheming behaviour is evaluated in frontier-lab safety teams and AISI work. This topic covers the conceptual taxonomy (sycophancy, strategic deception, alignment faking, sandbagging), evaluation methodologies (sandboxed setups with hidden observers, behavioural probes), interpretability probes (still a research direction), the disclosure norm in system cards, and the engineering implications.

6 Lessons · Templates · Practitioner-Ready · 100% Free

Lessons in This Topic

Work through these 6 lessons in order, or jump to whichever is most relevant.