Autonomy & Self-Replication Evaluation
Learn how to evaluate autonomy and self-replication capabilities. This topic covers the canonical task families (METR-style autonomy tasks, self-replication and resource-acquisition tasks, long-horizon coding and operations), how to measure success rates, the sandbox-containment requirement, how results link to responsible-scaling policy (RSP) thresholds, and the public reporting expected of frontier labs.
6 Lessons | Templates | Practitioner-Ready | 100% Free
Lessons in This Topic
Work through these 6 lessons in order, or jump to whichever is most relevant.
Lilly Tech Systems