Advanced

Monitoring Agent Activity

Detection is as important as prevention. This lesson covers how to set up real-time monitoring and alerting so you know immediately when an AI agent attempts or executes destructive operations in your cloud environment.

Real-Time Alerting on Destructive API Calls

The goal is simple: when any API call matching a destructive pattern is made by an AI agent's service account, trigger an immediate alert to the operations team. Here is how to set this up on each major cloud.

AWS: CloudTrail + EventBridge

AWS CloudTrail logs every API call. EventBridge can match specific patterns and trigger alerts in real time:

Terraform - EventBridge rule for destructive API calls
resource "aws_cloudwatch_event_rule" "destructive_api_alert" {
  name        = "ai-agent-destructive-api-calls"
  description = "Alert on delete/terminate operations by AI agent roles"

  event_pattern = jsonencode({
    source      = ["aws.ec2", "aws.rds", "aws.s3", "aws.cloudformation"]
    detail-type = ["AWS API Call via CloudTrail"]
    detail = {
      eventName = [
        { prefix = "Delete" },
        { prefix = "Terminate" },
        { prefix = "Remove" }
      ]
      userIdentity = {
        sessionContext = {
          sessionIssuer = {
            userName = [{ prefix = "ai-agent-" }]
          }
        }
      }
    }
  })
}

resource "aws_cloudwatch_event_target" "sns_alert" {
  rule      = aws_cloudwatch_event_rule.destructive_api_alert.name
  target_id = "send-to-sns"
  arn       = aws_sns_topic.security_alerts.arn
}

resource "aws_sns_topic" "security_alerts" {
  name = "ai-agent-security-alerts"
}

resource "aws_sns_topic_subscription" "ops_team_email" {
  topic_arn = aws_sns_topic.security_alerts.arn
  protocol  = "email"
  endpoint  = "ops-team@company.com"
}

Azure: Activity Log Alerts

Azure CLI - Create activity log alert for delete operations
# Create action group for notifications
az monitor action-group create \
  --name "AIAgentAlerts" \
  --resource-group monitoring-rg \
  --short-name "AgentAlert" \
  --email-receiver name="OpsTeam" email="ops@company.com" \
  --sms-receiver name="OnCall" country-code="1" phone="5551234567"

# Create alert for resource deletion events
az monitor activity-log alert create \
  --name "AI Agent Destructive Operation" \
  --resource-group monitoring-rg \
  --condition category=Administrative \
  --condition operationName="Microsoft.Compute/virtualMachines/delete" \
  --action-group "/subscriptions/{sub-id}/resourceGroups/monitoring-rg/providers/Microsoft.Insights/actionGroups/AIAgentAlerts" \
  --description "Alert when AI agent deletes VMs"

# Alert for resource group deletion
az monitor activity-log alert create \
  --name "Resource Group Deletion Alert" \
  --resource-group monitoring-rg \
  --condition category=Administrative \
  --condition operationName="Microsoft.Resources/subscriptions/resourceGroups/delete" \
  --action-group "/subscriptions/{sub-id}/resourceGroups/monitoring-rg/providers/Microsoft.Insights/actionGroups/AIAgentAlerts"

GCP: Cloud Audit Logs with Alerting

GCP - Log-based alert for destructive operations
# Create a log-based metric for delete operations by agent accounts
gcloud logging metrics create ai-agent-destructive-ops \
  --project=my-project \
  --description="Count of destructive operations by AI agent service accounts" \
  --log-filter='
    protoPayload.methodName=~"delete|destroy|remove"
    protoPayload.authenticationInfo.principalEmail=~"ai-agent"
    severity>=WARNING
  '

# Create an alerting policy based on the metric
gcloud alpha monitoring policies create \
  --display-name="AI Agent Destructive Operations" \
  --condition-display-name="Destructive ops detected" \
  --condition-filter='metric.type="logging.googleapis.com/user/ai-agent-destructive-ops"' \
  --condition-threshold-value=0 \
  --condition-threshold-comparison=COMPARISON_GT \
  --duration=60s \
  --notification-channels="projects/my-project/notificationChannels/123456"

Building Dashboards for AI Agent Activity

Create dedicated dashboards that give your operations team visibility into all AI agent activity:

API Call Volume

Track the total number of API calls made by AI agent service accounts over time. Sudden spikes indicate unusual agent behavior that warrants investigation.

Destructive vs Read-Only Ratio

Monitor the ratio of write/delete operations to read operations. AI agents performing mostly read operations are behaving normally; a shift toward writes signals risk.

Failed Authorization Attempts

Track AccessDenied and Forbidden errors from agent accounts. These indicate the agent tried to do something it was not permitted to do — your least-privilege policies are working.

Resource Change Timeline

Display a chronological timeline of all resources created, modified, and deleted by agent accounts. This provides a clear audit trail for incident investigation.

Anomaly Detection

Set up anomaly detection to catch unusual patterns that static rules might miss:

Anomaly Pattern What It Indicates Detection Method
Spike in API calls Agent stuck in a loop or executing bulk operations Statistical threshold on 5-minute rolling window
New API types called Agent accessing services it normally does not use Baseline comparison of API call types
Cross-region activity Agent operating outside expected regions Geo-based alert on API source region
Off-hours activity Agent running without developer oversight Time-based alert outside business hours
Rapid resource creation/deletion Agent creating then immediately destroying resources (thrashing) Correlation of create/delete events within time window

Incident Response Playbook

When monitoring detects a potentially dangerous AI agent action, follow this playbook:

  1. Immediately Revoke Agent Credentials

    Disable the AI agent's service account or IAM user. On AWS, use aws iam update-access-key --status Inactive. On Azure, disable the service principal. On GCP, disable the service account key.

  2. Assess the Blast Radius

    Query CloudTrail/Activity Logs/Audit Logs to identify every action the agent performed in the current session. Determine which resources were affected.

  3. Initiate Recovery

    For deleted resources, begin recovery from snapshots, backups, or Terraform state. Prioritize production-facing resources and databases.

  4. Root Cause Analysis

    Review the agent's conversation log, the commands it executed, and the user prompts that triggered the destructive behavior. Identify whether the issue was permissions, prompt misinterpretation, or missing safety controls.

  5. Implement Prevention

    Based on the RCA, add the missing guardrail: tighter IAM policies, additional resource protection, or updated agent configuration to prevent recurrence.

Pro Tip: Run fire drills. Periodically simulate an AI agent attempting destructive operations in a sandbox environment and verify that your alerts fire, your team responds within SLA, and your recovery procedures work as expected.
💡
Next Up: The final lesson brings everything together with a comprehensive best practices checklist, organization-level policies, and a FAQ section covering common questions about AI agent cloud safety.