Autonomy & Self-Replication Evaluation

Learn how to evaluate autonomy and self-replication capability. This topic covers the canonical task families (METR-style autonomy tasks, self-replication and resource-acquisition tasks, and long-horizon coding and operations tasks), how success rates are measured, why evaluations must run inside a contained sandbox, how results connect to Responsible Scaling Policy (RSP) thresholds, and the public reporting expected of frontier labs.

6 Lessons · Templates · Practitioner-Ready · 100% Free

Lessons in This Topic

Work through these 6 lessons in order, or jump to whichever is most relevant.