Monitoring Agent Activity
Detection is as important as prevention. This lesson covers how to set up real-time monitoring and alerting so you know immediately when an AI agent attempts or executes destructive operations in your cloud environment.
Real-Time Alerting on Destructive API Calls
The goal is simple: when any API call matching a destructive pattern is made by an AI agent's service account, trigger an immediate alert to the operations team. Here is how to set this up on each major cloud.
AWS: CloudTrail + EventBridge
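AWS CloudTrail records management API calls by default (data events, such as S3 object-level deletes, must be enabled separately). EventBridge can match specific patterns in those events and trigger alerts in near real time: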
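resource "aws_cloudwatch_event_rule" "destructive_api_alert" {
  name        = "ai-agent-destructive-api-calls"
  description = "Alert on delete/terminate operations by AI agent roles"

  # Match CloudTrail-recorded calls whose name starts with a destructive verb
  # and whose caller assumed a role named "ai-agent-*".
  event_pattern = jsonencode({
    source      = ["aws.ec2", "aws.rds", "aws.s3", "aws.cloudformation"]
    detail-type = ["AWS API Call via CloudTrail"]
    detail = {
      eventName = [
        { prefix = "Delete" },
        { prefix = "Terminate" },
        { prefix = "Remove" }
      ]
      userIdentity = {
        sessionContext = {
          sessionIssuer = {
            userName = [{ prefix = "ai-agent-" }]
          }
        }
      }
    }
  })
}

resource "aws_cloudwatch_event_target" "sns_alert" {
  rule      = aws_cloudwatch_event_rule.destructive_api_alert.name
  target_id = "send-to-sns"
  arn       = aws_sns_topic.security_alerts.arn
}

resource "aws_sns_topic" "security_alerts" {
  name = "ai-agent-security-alerts"
}

# EventBridge can only deliver to the topic if the topic policy allows it to publish.
resource "aws_sns_topic_policy" "allow_eventbridge" {
  arn = aws_sns_topic.security_alerts.arn
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "events.amazonaws.com" }
      Action    = "sns:Publish"
      Resource  = aws_sns_topic.security_alerts.arn
    }]
  })
}

# Email subscriptions stay pending until the recipient confirms them.
resource "aws_sns_topic_subscription" "ops_team_email" {
  topic_arn = aws_sns_topic.security_alerts.arn
  protocol  = "email"
  endpoint  = "ops-team@company.com"
}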
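Before deploying, it can help to sanity-check which events the pattern above would match. The following is a minimal Python sketch that mimics only the prefix and equality checks used in that rule. It is not EventBridge's matching engine, and the function name `matches_destructive_rule` and the sample event are illustrative assumptions.

```python
# Local approximation of the EventBridge rule above (illustrative only).
DESTRUCTIVE_PREFIXES = ("Delete", "Terminate", "Remove")
SOURCES = {"aws.ec2", "aws.rds", "aws.s3", "aws.cloudformation"}
AGENT_ROLE_PREFIX = "ai-agent-"


def matches_destructive_rule(event: dict) -> bool:
    """Check source, detail-type, eventName prefix, and role-name prefix."""
    detail = event.get("detail", {})
    issuer = (
        detail.get("userIdentity", {})
        .get("sessionContext", {})
        .get("sessionIssuer", {})
    )
    return (
        event.get("source") in SOURCES
        and event.get("detail-type") == "AWS API Call via CloudTrail"
        and detail.get("eventName", "").startswith(DESTRUCTIVE_PREFIXES)
        and issuer.get("userName", "").startswith(AGENT_ROLE_PREFIX)
    )


# Hypothetical event, trimmed to the fields the rule inspects.
sample_event = {
    "source": "aws.ec2",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {
        "eventName": "TerminateInstances",
        "userIdentity": {
            "sessionContext": {"sessionIssuer": {"userName": "ai-agent-deployer"}}
        },
    },
}
```

Calling `matches_destructive_rule(sample_event)` returns `True`, while the same event with `eventName` set to `DescribeInstances`, or issued by a role that does not start with `ai-agent-`, is ignored.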
Azure: Activity Log Alerts
# Create an action group for notifications.
# Each receiver is passed as: --action TYPE NAME ARGS...
az monitor action-group create \
  --name "AIAgentAlerts" \
  --resource-group monitoring-rg \
  --short-name "AgentAlert" \
  --action email OpsTeam ops@company.com \
  --action sms OnCall 1 5551234567
# Create an alert for VM deletion events.
# All conditions go in a single --condition argument, joined with "and".
# As written, this matches deletions by any caller; to scope it to the agent,
# append "and caller=<agent-identity>" using the agent's principal.
az monitor activity-log alert create \
  --name "AI Agent Destructive Operation" \
  --resource-group monitoring-rg \
  --condition category=Administrative and operationName=Microsoft.Compute/virtualMachines/delete \
  --action-group "/subscriptions/{sub-id}/resourceGroups/monitoring-rg/providers/Microsoft.Insights/actionGroups/AIAgentAlerts" \
  --description "Alert when AI agent deletes VMs"

# Alert for resource group deletion
az monitor activity-log alert create \
  --name "Resource Group Deletion Alert" \
  --resource-group monitoring-rg \
  --condition category=Administrative and operationName=Microsoft.Resources/subscriptions/resourceGroups/delete \
  --action-group "/subscriptions/{sub-id}/resourceGroups/monitoring-rg/providers/Microsoft.Insights/actionGroups/AIAgentAlerts"
GCP: Cloud Audit Logs with Alerting
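# Create a log-based metric for delete operations by agent accounts.
# (?i) makes the match case-insensitive, so both "instances.delete" and
# "DeleteServiceAccount" style method names are counted.
# No severity filter is applied: successful Admin Activity audit entries are
# typically logged at NOTICE, which severity>=WARNING would exclude.
gcloud logging metrics create ai-agent-destructive-ops \
  --project=my-project \
  --description="Count of destructive operations by AI agent service accounts" \
  --log-filter='
    protoPayload.methodName=~"(?i)delete|destroy|remove"
    protoPayload.authenticationInfo.principalEmail=~"ai-agent"
  '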
# Create an alerting policy based on the metric.
# The threshold is expressed with --if (comparison and value in one argument).
gcloud alpha monitoring policies create \
  --display-name="AI Agent Destructive Operations" \
  --condition-display-name="Destructive ops detected" \
  --condition-filter='metric.type="logging.googleapis.com/user/ai-agent-destructive-ops"' \
  --if="> 0" \
  --duration=60s \
  --notification-channels="projects/my-project/notificationChannels/123456"
Building Dashboards for AI Agent Activity
Create dedicated dashboards that give your operations team visibility into all AI agent activity:
API Call Volume
Track the total number of API calls made by AI agent service accounts over time. Sudden spikes indicate unusual agent behavior that warrants investigation.
Destructive vs Read-Only Ratio
Monitor the ratio of write/delete operations to read operations. AI agents performing mostly read operations are behaving normally; a shift toward writes signals risk.
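As a rough illustration of how this panel's metric could be computed, the sketch below classifies API call names by prefix and reports the share of calls that mutate state. The prefix lists and the function names `classify` and `mutation_ratio` are assumptions for the example. It reports the mutating share of all calls rather than a write-to-read quotient, which avoids dividing by zero when an agent makes no reads.

```python
from collections import Counter

# Assumed naming conventions; adjust to the APIs your agents actually call.
READ_PREFIXES = ("Get", "List", "Describe", "Head")
DESTRUCTIVE_PREFIXES = ("Delete", "Terminate", "Remove")


def classify(event_name: str) -> str:
    """Bucket an API call name as 'read', 'destructive', or 'write'."""
    if event_name.startswith(READ_PREFIXES):
        return "read"
    if event_name.startswith(DESTRUCTIVE_PREFIXES):
        return "destructive"
    return "write"


def mutation_ratio(event_names) -> float:
    """Fraction of calls that are write or destructive (0.0 if there are no calls)."""
    counts = Counter(classify(name) for name in event_names)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["write"] + counts["destructive"]) / total
```

For example, `mutation_ratio(["DescribeInstances", "ListBuckets", "GetObject", "DeleteBucket"])` evaluates to `0.25`.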
Failed Authorization Attempts
Track AccessDenied and Forbidden errors from agent accounts. These indicate the agent attempted something it was not permitted to do, which means your least-privilege policies are working.
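One way to feed this panel is to count denied calls per API name. The sketch below assumes each log record is a dict with `eventName` and an optional `errorCode` field, loosely modeled on CloudTrail records. The set of error codes and the function name `denied_calls` are illustrative and should be adapted to each provider's actual values.

```python
from collections import Counter

# Assumed error codes; extend with whatever your providers actually emit.
DENIED_CODES = {"AccessDenied", "Forbidden", "Client.UnauthorizedOperation"}


def denied_calls(records) -> Counter:
    """Count denied calls per API name, ignoring successful calls."""
    return Counter(
        record["eventName"]
        for record in records
        if record.get("errorCode") in DENIED_CODES
    )


# Hypothetical records for illustration.
sample_records = [
    {"eventName": "DeleteBucket", "errorCode": "AccessDenied"},
    {"eventName": "DeleteBucket", "errorCode": "AccessDenied"},
    {"eventName": "TerminateInstances", "errorCode": "Client.UnauthorizedOperation"},
    {"eventName": "DescribeInstances"},
]
```

Here `denied_calls(sample_records)` counts two denied `DeleteBucket` calls and one denied `TerminateInstances` call, and skips the successful `DescribeInstances`.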
Resource Change Timeline
Display a chronological timeline of all resources created, modified, and deleted by agent accounts. This provides a clear audit trail for incident investigation.
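A timeline like this can be built by sorting change events by timestamp. The following is a minimal sketch under the assumption that each record carries an ISO 8601 `eventTime`, an `eventName`, and a flattened `resource` identifier. The `resource` field and the function name `build_timeline` are hypothetical, since real audit logs nest resource details differently on each cloud.

```python
from datetime import datetime


def build_timeline(records):
    """Return (time, action, resource) tuples sorted oldest-first."""
    rows = []
    for record in records:
        # Older Python versions do not parse a trailing "Z", so normalize it.
        when = datetime.fromisoformat(record["eventTime"].replace("Z", "+00:00"))
        rows.append((when, record["eventName"], record["resource"]))
    return sorted(rows, key=lambda row: row[0])


# Hypothetical, out-of-order records for illustration.
sample_changes = [
    {"eventTime": "2024-01-03T10:05:00Z", "eventName": "DeleteBucket", "resource": "logs-bucket"},
    {"eventTime": "2024-01-03T10:00:00Z", "eventName": "CreateBucket", "resource": "logs-bucket"},
]
```

The first entry of `build_timeline(sample_changes)` is the `CreateBucket` event, followed by the `DeleteBucket` event five minutes later.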
Anomaly Detection
Set up anomaly detection to catch unusual patterns that static rules might miss:
| Anomaly Pattern | What It Indicates | Detection Method |
|---|---|---|
| Spike in API calls | Agent stuck in a loop or executing bulk operations | Statistical threshold on 5-minute rolling window |
| New API types called | Agent accessing services it normally does not use | Baseline comparison of API call types |
| Cross-region activity | Agent operating outside expected regions | Geo-based alert on API source region |
| Off-hours activity | Agent running without developer oversight | Time-based alert outside business hours |
| Rapid resource creation/deletion | Agent creating then immediately destroying resources (thrashing) | Correlation of create/delete events within time window |
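Three of the detection methods in the table can be sketched in a few lines of Python. This is an illustration under simple assumptions, not a production detector: the spike check uses a mean-plus-k-standard-deviations threshold over previous 5-minute call counts, the off-hours check assumes a 09:00 to 18:00 weekday schedule in the timestamp's own time zone, and the thrashing check expects time-sorted `(timestamp_seconds, action, resource_id)` tuples. All function names and defaults are hypothetical.

```python
from datetime import datetime
from statistics import mean, pstdev


def is_spike(history, current, k=3.0):
    """True when `current` exceeds mean(history) + k * stdev(history)."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    return current > mean(history) + k * pstdev(history)


def is_off_hours(when: datetime, start_hour=9, end_hour=18):
    """True on weekends or outside the [start_hour, end_hour) window."""
    return when.weekday() >= 5 or not (start_hour <= when.hour < end_hour)


def find_thrashing(events, window_seconds=300):
    """Return IDs of resources deleted within `window_seconds` of creation.

    `events` must be sorted by time; action is "create" or "delete".
    """
    created_at = {}
    thrashed = []
    for timestamp, action, resource_id in events:
        if action == "create":
            created_at[resource_id] = timestamp
        elif action == "delete" and resource_id in created_at:
            if timestamp - created_at.pop(resource_id) <= window_seconds:
                thrashed.append(resource_id)
    return thrashed
```

With a baseline of `[10, 12, 11, 9, 10]` calls per window, a window of 50 calls is flagged by `is_spike` while a window of 12 is not. A fixed multiplier `k` is a blunt instrument, so in practice it would need tuning against each agent's normal workload.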
Incident Response Playbook
When monitoring detects a potentially dangerous AI agent action, follow this playbook:
- Immediately Revoke Agent Credentials: Disable the AI agent's service account or IAM user. On AWS, run aws iam update-access-key --access-key-id <key-id> --status Inactive to deactivate its access key. On Azure, disable the service principal. On GCP, disable the service account or its keys.
- Assess the Blast Radius: Query CloudTrail/Activity Logs/Audit Logs to identify every action the agent performed in the current session. Determine which resources were affected.
- Initiate Recovery: For deleted resources, begin recovery from snapshots, backups, or Terraform state. Prioritize production-facing resources and databases.
- Root Cause Analysis: Review the agent's conversation log, the commands it executed, and the user prompts that triggered the destructive behavior. Identify whether the issue was permissions, prompt misinterpretation, or missing safety controls.
- Implement Prevention: Based on the RCA, add the missing guardrail: tighter IAM policies, additional resource protection, or updated agent configuration to prevent recurrence.
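The blast-radius step of the playbook above can be partly automated once the relevant log records have been exported. The sketch below assumes records have already been flattened into dicts with `sessionId`, `eventName`, and `resource` keys. These field names and the functions `blast_radius` and `destroyed_resources` are hypothetical, and real CloudTrail, Activity Log, and Audit Log schemas each need their own mapping.

```python
from collections import defaultdict

DESTRUCTIVE_PREFIXES = ("Delete", "Terminate", "Remove")


def blast_radius(records, session_id):
    """Map each resource touched in the session to the API calls made on it."""
    touched = defaultdict(list)
    for record in records:
        if record["sessionId"] == session_id:
            touched[record["resource"]].append(record["eventName"])
    return dict(touched)


def destroyed_resources(touched):
    """Return the resources that received at least one destructive call, sorted."""
    return sorted(
        resource
        for resource, calls in touched.items()
        if any(call.startswith(DESTRUCTIVE_PREFIXES) for call in calls)
    )


# Hypothetical flattened records for illustration.
incident_records = [
    {"sessionId": "s-1", "eventName": "DescribeInstances", "resource": "i-0abc"},
    {"sessionId": "s-1", "eventName": "TerminateInstances", "resource": "i-0abc"},
    {"sessionId": "s-1", "eventName": "GetObject", "resource": "logs-bucket"},
    {"sessionId": "s-2", "eventName": "DeleteBucket", "resource": "other-bucket"},
]
```

For session `s-1`, `destroyed_resources(blast_radius(incident_records, "s-1"))` yields `["i-0abc"]`, which gives the recovery step a prioritized starting list.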
Lilly Tech Systems