Deception & Scheming Evaluation

Evaluate models for deception and scheming behaviour, following the practice of frontier-lab safety teams and AISI work. Learn the conceptual taxonomy (sycophancy, strategic deception, alignment faking, sandbagging), the evaluation methodologies (sandboxed setups with hidden observers, behavioural probes), interpretability probes (still a research direction), the norm of disclosure in system cards, and the engineering implications.