
AWS Backup & Recovery Safety Nets

Even with the best guardrails, accidents happen. Build automated backup strategies and recovery procedures so you can restore an accidentally deleted resource quickly: within seconds for versioned S3 objects, and within minutes to tens of minutes for instances and databases.

AWS Backup: Centralized Backup Management

AWS Backup is a fully managed service that centralizes and automates backups across AWS services. It supports EC2, EBS, RDS, DynamoDB, EFS, S3, and more. For AI agent safety, AWS Backup is your last line of defense.

Bash — Create an AWS Backup vault and plan
# Create a backup vault (encrypted storage for backups)
aws backup create-backup-vault \
  --backup-vault-name production-vault \
  --encryption-key-arn arn:aws:kms:us-east-1:123456789012:key/mrk-xxxx

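# Create a backup plan: daily backups kept for 35 days (copied to us-west-2 and kept 90 days there)
# plus hourly backups kept for 7 days. The dr-vault destination must already exist in us-west-2.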
aws backup create-backup-plan --backup-plan '{
  "BackupPlanName": "production-daily",
  "Rules": [
    {
      "RuleName": "DailyBackup",
      "TargetBackupVaultName": "production-vault",
      "ScheduleExpression": "cron(0 3 * * ? *)",
      "StartWindowMinutes": 60,
      "CompletionWindowMinutes": 180,
      "Lifecycle": {
        "DeleteAfterDays": 35
      },
      "CopyActions": [
        {
          "DestinationBackupVaultArn": "arn:aws:backup:us-west-2:123456789012:backup-vault:dr-vault",
          "Lifecycle": {
            "DeleteAfterDays": 90
          }
        }
      ]
    },
    {
      "RuleName": "HourlyBackup",
      "TargetBackupVaultName": "production-vault",
      "ScheduleExpression": "cron(0 * * * ? *)",
      "StartWindowMinutes": 60,
      "CompletionWindowMinutes": 120,
      "Lifecycle": {
        "DeleteAfterDays": 7
      }
    }
  ]
}'

Assign Resources to the Backup Plan

Bash — Assign resources by tag
# Assign all resources tagged Lifecycle=production to the backup plan
# (replace the placeholder below with the BackupPlanId returned by create-backup-plan)
aws backup create-backup-selection \
  --backup-plan-id "PLAN-ID-FROM-PREVIOUS-COMMAND" \
  --backup-selection '{
    "SelectionName": "production-resources",
    "IamRoleArn": "arn:aws:iam::123456789012:role/AWSBackupDefaultServiceRole",
    "ListOfTags": [
      {
        "ConditionType": "STRINGEQUALS",
        "ConditionKey": "Lifecycle",
        "ConditionValue": "production"
      }
    ]
  }'
💡 Tag-based backup: With tag-based resource selection, every new resource tagged Lifecycle=production is automatically included in the backup plan, provided its resource type is supported by AWS Backup and enabled for the account. No manual configuration is needed when you add new resources.
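
The selection command above needs the ID of the plan created earlier. As a minimal sketch, the ID can be looked up by plan name rather than copied by hand. The helper name backup_plan_id is hypothetical, and the sketch assumes the account has few enough plans to fit in one page of results.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the BackupPlanId for a given plan name.
# Returns non-zero if no plan with that name is found.
backup_plan_id() {
  local plan_name="$1" plan_id
  plan_id="$(aws backup list-backup-plans \
    --query "BackupPlansList[?BackupPlanName=='${plan_name}'].BackupPlanId | [0]" \
    --output text)" || return 1
  # The CLI prints "None" when the JMESPath query matches nothing.
  if [ -z "$plan_id" ] || [ "$plan_id" = "None" ]; then
    echo "backup plan not found: ${plan_name}" >&2
    return 1
  fi
  printf '%s\n' "$plan_id"
}

# Usage (requires AWS credentials):
#   PLAN_ID="$(backup_plan_id production-daily)"
#   aws backup create-backup-selection --backup-plan-id "$PLAN_ID" --backup-selection file://selection.json
```

Here selection.json stands in for the selection document shown above; the file name is only an illustration.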

Cross-Region Backup for Disaster Recovery

If an AI agent destroys resources in one region, cross-region backups ensure you still have copies in a separate region. The DailyBackup rule in the AWS Backup plan above includes a CopyActions entry that copies each daily backup to the dr-vault vault in us-west-2, where it is retained for 90 days.

HCL — Terraform: AWS Backup with cross-region copy
resource "aws_backup_vault" "primary" {
  name        = "production-vault"
  kms_key_arn = aws_kms_key.backup.arn
}

resource "aws_backup_vault" "dr" {
  provider    = aws.us_west_2
  name        = "dr-vault"
  kms_key_arn = aws_kms_key.backup_dr.arn
}

resource "aws_backup_plan" "production" {
  name = "production-daily"

  rule {
    rule_name         = "DailyBackup"
    target_vault_name = aws_backup_vault.primary.name
    schedule          = "cron(0 3 * * ? *)"

    lifecycle {
      delete_after = 35
    }

    copy_action {
      destination_vault_arn = aws_backup_vault.dr.arn
      lifecycle {
        delete_after = 90
      }
    }
  }

  rule {
    rule_name         = "HourlyBackup"
    target_vault_name = aws_backup_vault.primary.name
    schedule          = "cron(0 * * * ? *)"

    lifecycle {
      delete_after = 7
    }
  }
}

resource "aws_backup_selection" "production" {
  name         = "production-resources"
  iam_role_arn = aws_iam_role.backup.arn
  plan_id      = aws_backup_plan.production.id

  selection_tag {
    type  = "STRINGEQUALS"
    key   = "Lifecycle"
    value = "production"
  }
}
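
A cross-region copy only helps if the copies actually arrive. The sketch below counts completed recovery points in the DR vault so the check can run from a scheduled job. The function names are hypothetical, and it assumes the vault's recovery points fit in one page of results.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count COMPLETED recovery points in a vault.
completed_recovery_points() {
  local vault="$1" region="$2"
  aws backup list-recovery-points-by-backup-vault \
    --backup-vault-name "$vault" \
    --region "$region" \
    --query "length(RecoveryPoints[?Status=='COMPLETED'])" \
    --output text
}

# Succeeds only when the vault holds at least one completed recovery point.
dr_vault_has_copies() {
  local count
  count="$(completed_recovery_points "$1" "$2")" || return 1
  # Treat empty or non-numeric output as "no copies".
  case "$count" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$count" -gt 0 ]
}

# Usage (requires AWS credentials):
#   dr_vault_has_copies dr-vault us-west-2 || echo "WARNING: no completed copies in dr-vault"
```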

S3 Versioning and Cross-Region Replication

S3 versioning preserves every version of every object. Cross-region replication copies objects to a bucket in another region automatically.

Bash — Enable S3 cross-region replication
# Enable versioning on source bucket (required for replication)
aws s3api put-bucket-versioning \
  --bucket production-assets \
  --versioning-configuration Status=Enabled

# Enable versioning on destination bucket
aws s3api put-bucket-versioning \
  --bucket production-assets-replica \
  --versioning-configuration Status=Enabled

# Set up replication rule
aws s3api put-bucket-replication \
  --bucket production-assets \
  --replication-configuration '{
    "Role": "arn:aws:iam::123456789012:role/S3ReplicationRole",
    "Rules": [
      {
        "ID": "ReplicateAll",
        "Status": "Enabled",
        "Priority": 1,
        "Filter": {},
        "Destination": {
          "Bucket": "arn:aws:s3:::production-assets-replica",
          "StorageClass": "STANDARD_IA"
        },
        "DeleteMarkerReplication": {
          "Status": "Disabled"
        }
      }
    ]
  }'
Delete marker replication: DeleteMarkerReplication is set to Disabled, so when an AI agent deletes objects in the source bucket, the resulting delete markers are not replicated to the destination. Deletions that target a specific object version are never replicated either, so the replica bucket retains its objects even if the source is wiped.
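
To confirm that replication is keeping up, the replication status of an individual source object can be read from its metadata. This is a small sketch; the function name is hypothetical and the object key in the usage line is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the replication status of one source object
# (for example PENDING, COMPLETED or FAILED).
replication_status() {
  local bucket="$1" key="$2"
  aws s3api head-object \
    --bucket "$bucket" \
    --key "$key" \
    --query ReplicationStatus \
    --output text
}

# Usage (requires AWS credentials; the key is a placeholder):
#   replication_status production-assets path/to/object
```

Objects that were uploaded before the replication rule existed are not replicated automatically, so a missing status on older objects is expected.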

RDS Automated Backups and Point-in-Time Recovery

RDS automated backups take a daily snapshot and capture transaction logs continuously. This lets you restore to any point within the retention period, up to the latest restorable time, which is typically within the last five minutes.

Bash — Configure RDS automated backups
# Set backup retention to 35 days (maximum)
aws rds modify-db-instance \
  --db-instance-identifier production-postgres \
  --backup-retention-period 35 \
  --preferred-backup-window "03:00-04:00" \
  --apply-immediately

# Restore to a specific point in time
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier production-postgres \
  --target-db-instance-identifier production-postgres-restored \
  --restore-time "2026-03-20T14:30:00Z" \
  --db-instance-class db.t3.medium \
  --tags Key=Name,Value=production-postgres-restored Key=Lifecycle,Value=recovery

# Alternatively, restore to the latest restorable time
# (use a different target identifier if you have already run the command above)
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier production-postgres \
  --target-db-instance-identifier production-postgres-restored \
  --use-latest-restorable-time \
  --db-instance-class db.t3.medium
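
Before choosing a value for --restore-time, it helps to know how recent a restore can be. A minimal sketch, with a hypothetical function name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recent time a DB instance can be restored to.
latest_restorable_time() {
  local db_id="$1"
  aws rds describe-db-instances \
    --db-instance-identifier "$db_id" \
    --query 'DBInstances[0].LatestRestorableTime' \
    --output text
}

# Usage (requires AWS credentials):
#   latest_restorable_time production-postgres
```

Any --restore-time later than the printed timestamp is rejected, so pick a time at or before it.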

EC2 AMI Automation

Regular AMI backups of your EC2 instances let you restore the entire instance (OS, applications, data) quickly.

Bash — Create and automate EC2 AMIs
# Create an AMI from a running instance
# (--no-reboot avoids downtime, but file system consistency of the image is not guaranteed)
aws ec2 create-image \
  --instance-id i-0abc123def456789 \
  --name "production-web-$(date +%Y-%m-%d)" \
  --description "Daily backup of production web server" \
  --no-reboot \
  --tag-specifications 'ResourceType=image,Tags=[{Key=Name,Value=production-web-backup},{Key=Lifecycle,Value=backup}]'

# AWS Backup is the preferred way to automate AMI creation.
# Alternatively, use Amazon Data Lifecycle Manager:
aws dlm create-lifecycle-policy \
  --description "Daily AMI backup for production instances" \
  --state ENABLED \
  --execution-role-arn arn:aws:iam::123456789012:role/AWSDataLifecycleManagerDefaultRole \
  --policy-details '{
    "PolicyType": "IMAGE_MANAGEMENT",
    "ResourceTypes": ["INSTANCE"],
    "TargetTags": [{"Key": "Lifecycle", "Value": "production"}],
    "Schedules": [{
      "Name": "DailyAMI",
      "CreateRule": {
        "Interval": 24,
        "IntervalUnit": "HOURS",
        "Times": ["03:00"]
      },
      "RetainRule": {
        "Count": 14
      },
      "CopyTags": true
    }]
  }'

Recovery Procedures When an AI Agent Deletes Resources

Despite all guardrails, an incident may occur. Here are step-by-step recovery procedures for each service:

EC2 Instance Recovery

1. Check CloudTrail to confirm what was deleted and when.
2. Find the latest AMI: aws ec2 describe-images --owners self --filters "Name=tag:Name,Values=production-web-backup" --query 'sort_by(Images, &CreationDate)[-1]'
3. Launch from AMI: aws ec2 run-instances --image-id ami-xxxx --instance-type t3.medium --disable-api-termination
4. Restore EBS volumes from snapshots if the instance had additional volumes.
5. Update DNS/Load Balancer to point to the new instance.
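
The first two steps above can be sketched as two small helpers. The function names are hypothetical. CloudTrail event history only covers roughly the last 90 days of management events, so older deletions need a trail delivered to S3.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list TerminateInstances events since a start time
# as tab-separated lines: event time, user name, event ID.
recent_terminations() {
  local start_time="$1"
  aws cloudtrail lookup-events \
    --lookup-attributes AttributeKey=EventName,AttributeValue=TerminateInstances \
    --start-time "$start_time" \
    --query 'Events[].[EventTime,Username,EventId]' \
    --output text
}

# Hypothetical helper: print the ID of the newest available AMI with a given Name tag.
latest_backup_ami() {
  local name_tag="$1"
  aws ec2 describe-images \
    --owners self \
    --filters "Name=tag:Name,Values=${name_tag}" "Name=state,Values=available" \
    --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
    --output text
}

# Usage (requires AWS credentials):
#   recent_terminations 2026-03-20T00:00:00Z
#   AMI_ID="$(latest_backup_ami production-web-backup)"
#   aws ec2 run-instances --image-id "$AMI_ID" --instance-type t3.medium --disable-api-termination
```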

RDS Database Recovery

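1. Restore from point-in-time: aws rds restore-db-instance-to-point-in-time --source-db-instance-identifier production-postgres --target-db-instance-identifier production-postgres-restored --use-latest-restorable-time. This form works only while the source instance still exists. If the instance itself was deleted, its automated backups are removed by default unless they were explicitly retained; in that case pass --source-db-instance-automated-backups-arn with the ARN of the retained backup instead of --source-db-instance-identifier, or restore a final or manual snapshot with aws rds restore-db-instance-from-db-snapshot.
2. Wait for the new instance to become available (typically 10-30 minutes).
3. Update application connection strings to point to the new endpoint.
4. Enable deletion protection on the restored instance immediately: aws rds modify-db-instance --db-instance-identifier production-postgres-restored --deletion-protection --apply-immediately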
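
When the source instance itself has been deleted, the restore has to start from retained automated backups rather than from the instance identifier. The sketch below covers that case under the assumption that automated backups were retained at deletion time; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ARN of the automated backups kept for a DB instance.
retained_backup_arn() {
  local db_id="$1"
  aws rds describe-db-instance-automated-backups \
    --db-instance-identifier "$db_id" \
    --query 'DBInstanceAutomatedBackups[0].DBInstanceAutomatedBackupsArn' \
    --output text
}

# Hypothetical helper: restore a deleted instance to its latest restorable time,
# with deletion protection enabled from the start, then wait until it is available.
restore_deleted_db() {
  local db_id="$1" target_id="$2" arn
  arn="$(retained_backup_arn "$db_id")" || return 1
  if [ -z "$arn" ] || [ "$arn" = "None" ]; then
    echo "no retained automated backups for ${db_id}; restore from a snapshot instead" >&2
    return 1
  fi
  aws rds restore-db-instance-to-point-in-time \
    --source-db-instance-automated-backups-arn "$arn" \
    --target-db-instance-identifier "$target_id" \
    --use-latest-restorable-time \
    --deletion-protection || return 1
  aws rds wait db-instance-available --db-instance-identifier "$target_id"
}

# Usage (requires AWS credentials):
#   restore_deleted_db production-postgres production-postgres-restored
```

Passing --deletion-protection at restore time closes the window in which the freshly restored instance could be deleted again before step 4 is carried out.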

S3 Object Recovery

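1. If versioning was enabled: List versions with aws s3api list-object-versions --bucket production-assets, then restore each object by deleting its delete marker: aws s3api delete-object --bucket production-assets --key <key> --version-id <delete-marker-version-id>. Removing the delete marker makes the previous version current again.
2. If cross-region replication was enabled: Copy from the replica bucket.
3. If AWS Backup was configured: Restore from the backup vault.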
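
For more than a handful of objects, removing delete markers one by one is tedious. The following sketch removes every current delete marker under an optional prefix. The function name is hypothetical, and it assumes object keys contain no tabs or newlines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restore deleted objects in a versioned bucket by
# removing the delete markers that are currently the latest version.
undelete_objects() {
  local bucket="$1" prefix="${2:-}"
  local args=(--bucket "$bucket")
  if [ -n "$prefix" ]; then
    args+=(--prefix "$prefix")
  fi
  aws s3api list-object-versions "${args[@]}" \
    --query 'DeleteMarkers[?IsLatest==`true`].[Key,VersionId]' \
    --output text |
  while IFS=$'\t' read -r key version_id; do
    # The CLI prints "None" when there are no delete markers.
    if [ -z "$key" ] || [ "$key" = "None" ] || [ -z "$version_id" ]; then
      continue
    fi
    # Deleting the marker by version ID makes the previous version current again.
    if aws s3api delete-object --bucket "$bucket" \
         --key "$key" --version-id "$version_id" </dev/null >/dev/null; then
      echo "restored: $key"
    else
      echo "failed: $key" >&2
    fi
  done
}

# Usage (requires AWS credentials):
#   undelete_objects production-assets
#   undelete_objects production-assets uploads/
```

This only helps for deletions that created delete markers. If an agent deleted specific object versions, those versions are gone from the source bucket and must come from the replica or from AWS Backup.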

Cost Considerations for Backup Strategies

| Strategy | Monthly Cost Estimate | Recovery Speed | Data Loss Risk |
| --- | --- | --- | --- |
| S3 Versioning | Storage cost for all versions (~1.5-3x base) | Instant (seconds) | Zero (all versions preserved) |
| S3 Cross-Region Replication | 2x storage + data transfer fees | Instant from replica | Near-zero (seconds to minutes of lag) |
| RDS Automated Backups | Free up to provisioned DB storage size; ~$0.095/GB-month beyond | 10-30 minutes | Near-zero (transaction log coverage) |
| EC2 AMI (daily) | EBS snapshot cost (~$0.05/GB/month) | 5-15 minutes | Up to 24 hours of changes |
| AWS Backup (hourly) | Snapshot cost + cross-region transfer | 5-30 minutes | Up to 1 hour of changes |
| DynamoDB PITR | ~$0.20/GB-month of table size (us-east-1) | Minutes to hours (depends on size) | Near-zero (continuous backup) |
💡 Cost vs risk: The cost of backups is almost always less than the cost of data loss. A single production database deletion could cost your company hours of downtime and lost revenue. Backup strategies that cost $50-200/month can save you from $10,000+ incidents.
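
As a rough worked example of the snapshot figure used above (about $0.05 per GB per month), the sketch below multiplies stored snapshot data by a per-GB price. The function name and the sample sizes are illustrative assumptions. EBS snapshots are incremental, so the stored size is usually far less than volume size times the number of snapshots.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate monthly snapshot storage cost in dollars.
#   $1 = stored snapshot data in GB
#   $2 = price per GB-month (defaults to the 0.05 estimate used in this section)
snapshot_monthly_cost() {
  local stored_gb="$1" price_per_gb="${2:-0.05}"
  awk -v gb="$stored_gb" -v price="$price_per_gb" \
    'BEGIN { printf "%.2f\n", gb * price }'
}

# Usage:
#   snapshot_monthly_cost 2000        # prints 100.00
#   snapshot_monthly_cost 500 0.095   # a different per-GB price
```

At these rates, 2 TB of stored snapshot data falls inside the $50-200/month range mentioned above.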