Advanced

Domain 4: ML Implementation & Operations (20%)

Deploying, monitoring, and operationalizing ML models on AWS — SageMaker endpoints, A/B testing, model monitoring, security best practices, and Auto Scaling.

💡
Exam weight: This domain accounts for 20% of your score (~13 questions). Focus on SageMaker deployment patterns, endpoint configurations, and security.

SageMaker Deployment Options

Understanding the different deployment options and when to use each is critical for the exam.

Real-Time Endpoints

  • Persistent HTTPS endpoints for synchronous predictions
  • Low-latency, always-on inference
  • Support Auto Scaling based on invocations or custom CloudWatch metrics
  • Use when: Application needs immediate predictions (web apps, APIs, chatbots)
  • Cost: Pay per instance-hour while endpoint is running (even with zero traffic)

Batch Transform

  • Process an entire dataset at once without persistent endpoint
  • Input from S3, output to S3
  • Use when: Predictions on a full dataset (nightly scoring, bulk processing)
  • Cost-effective for large batch jobs since instances shut down after completion
  • Supports data joining (associating predictions with original data IDs)
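
As a rough illustration of the data-joining bullet, the sketch below builds the keyword arguments for a CreateTransformJob request as a plain dictionary, so it can be inspected without AWS credentials. The job name, model name, S3 paths, instance type, and the assumption that column 0 of each CSV row holds the record ID are all hypothetical.

```python
def build_transform_job_request(job_name, model_name, input_s3_uri, output_s3_uri):
    """Return keyword arguments for the SageMaker CreateTransformJob API."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "text/csv",
            "SplitType": "Line",  # treat each line as one record
        },
        "TransformOutput": {
            "S3OutputPath": output_s3_uri,
            "Accept": "text/csv",
            "AssembleWith": "Line",
        },
        # Hypothetical sizing; instances are released when the job finishes.
        "TransformResources": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 2},
        "DataProcessing": {
            "InputFilter": "$[1:]",     # assumption: column 0 is an ID the model must not see
            "JoinSource": "Input",      # append each prediction to its input record
            "OutputFilter": "$[0,-1]",  # keep only the ID and the prediction
        },
    }


request = build_transform_job_request(
    "nightly-scoring-example",           # hypothetical names and paths
    "churn-model-example",
    "s3://example-bucket/input/",
    "s3://example-bucket/output/",
)
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("sagemaker").create_transform_job(**request)`.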

Serverless Inference

  • Automatically provisions and scales compute capacity
  • Scales to zero when not in use (no charge for idle time)
  • Use when: Unpredictable or intermittent traffic patterns
  • Cold start latency (seconds) — not suitable for latency-sensitive applications

Asynchronous Inference

  • Queues requests and processes them asynchronously
  • Scales to zero instances during idle periods
  • Use when: Large payloads (up to 1 GB), long processing times, tolerance for latency
  • Notifications via SNS when predictions complete
💡
Decision tree: Need immediate response? Real-time endpoint. Processing a whole dataset? Batch transform. Intermittent traffic, cost-sensitive? Serverless. Large payloads, can wait? Async inference.
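
The decision tree above can be written down as a small function, which may help when drilling scenario questions. This is only a study aid: the function name and its boolean parameters are made up here, the checks run in the same order as the callout, and the final fallback is an assumption.

```python
def choose_deployment_option(needs_immediate_response: bool,
                             whole_dataset: bool,
                             intermittent_traffic: bool,
                             large_payload_can_wait: bool) -> str:
    """Walk the decision tree in the order the callout lists it."""
    if needs_immediate_response:
        return "Real-time endpoint"
    if whole_dataset:
        return "Batch Transform"
    if intermittent_traffic:
        return "Serverless Inference"
    if large_payload_can_wait:
        return "Asynchronous Inference"
    # Assumption: default to a real-time endpoint when nothing else applies.
    return "Real-time endpoint"


# Nightly scoring of a full customer table, no one waiting on the result.
nightly = choose_deployment_option(False, True, False, False)
```

Real scenarios often combine several of these signals, so treat the ordering as a first pass and then check cost and latency constraints in the question.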

A/B Testing and Production Variants

SageMaker supports running multiple model versions behind a single endpoint:

Production Variants

  • Deploy multiple models to a single endpoint with traffic splitting
  • Example: 90% traffic to Model A (current), 10% to Model B (new)
  • Each variant can use different instance types and counts
  • Gradually shift traffic as you gain confidence in the new model
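
A minimal sketch of the 90/10 split, expressed as the request dictionaries for CreateEndpointConfig and UpdateEndpointWeightsAndCapacities. Each variant receives traffic in proportion to its weight divided by the sum of all weights. The variant names, model names, and instance types are hypothetical.

```python
def build_ab_endpoint_config(config_name, current_model, new_model, new_weight=0.1):
    """Keyword arguments for CreateEndpointConfig with two production variants."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "ModelA",
                "ModelName": current_model,
                "InstanceType": "ml.m5.large",   # hypothetical
                "InitialInstanceCount": 2,
                "InitialVariantWeight": 1.0 - new_weight,
            },
            {
                "VariantName": "ModelB",
                "ModelName": new_model,
                "InstanceType": "ml.c5.xlarge",  # variants may differ in type and count
                "InitialInstanceCount": 1,
                "InitialVariantWeight": new_weight,
            },
        ],
    }


def traffic_shares(variants):
    """Share of requests per variant: its weight over the sum of all weights."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


def build_weight_update(endpoint_name, desired_weights):
    """Keyword arguments for UpdateEndpointWeightsAndCapacities."""
    return {
        "EndpointName": endpoint_name,
        "DesiredWeightsAndCapacities": [
            {"VariantName": name, "DesiredWeight": weight}
            for name, weight in desired_weights.items()
        ],
    }


config = build_ab_endpoint_config("fraud-ab-config", "fraud-v1", "fraud-v2")
shares = traffic_shares(config["ProductionVariants"])
# Later, shift to a 50/50 split on the running endpoint.
update = build_weight_update("fraud-endpoint", {"ModelA": 0.5, "ModelB": 0.5})
```

The dictionaries map onto `create_endpoint_config(**config)` and `update_endpoint_weights_and_capacities(**update)` on a boto3 SageMaker client.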

Shadow Testing

  • Route a copy of production traffic to a shadow variant
  • Shadow variant makes predictions but results are NOT returned to users
  • Compare shadow predictions against production without risk
  • Use when: You want to validate a new model on real traffic before any users see its predictions
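
Once shadow predictions have been logged next to production predictions for the same requests, the comparison itself can be simple. The sketch below assumes numeric scores and a hypothetical tolerance of 0.05; it is not SageMaker's built-in comparison, only an illustration of the idea.

```python
def compare_shadow(production_preds, shadow_preds, tolerance=0.05):
    """Summarize how closely shadow scores track production scores."""
    if not production_preds or len(production_preds) != len(shadow_preds):
        raise ValueError("need two non-empty prediction lists of equal length")
    diffs = [abs(p - s) for p, s in zip(production_preds, shadow_preds)]
    return {
        # Fraction of requests where the two models agree within the tolerance.
        "agreement_rate": sum(d <= tolerance for d in diffs) / len(diffs),
        "mean_abs_diff": sum(diffs) / len(diffs),
        "max_abs_diff": max(diffs),
    }


# Made-up scores for four requests served by both variants.
report = compare_shadow([0.10, 0.80, 0.50, 0.30], [0.12, 0.78, 0.90, 0.30])
```

A low agreement rate does not say which model is right; it only flags requests worth inspecting before any traffic is shifted.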

SageMaker Model Monitor

Monitors deployed models for quality degradation over time:

Monitoring Types

  • Data quality monitoring — Detects drift in input data distributions. Compares current data against a baseline (training data statistics).
  • Model quality monitoring — Tracks model accuracy over time. Requires ground truth labels (may have a delay).
  • Bias drift monitoring — Uses SageMaker Clarify to detect changes in bias metrics over time.
  • Feature attribution drift — Detects changes in feature importance using SHAP values.

How It Works

  • Create a baseline from training data (statistics, constraints)
  • Schedule monitoring jobs (hourly, daily) via Processing jobs
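  • Violations are reported as CloudWatch metrics; alarms on those metrics can notify the team or trigger retraining
  • Data capture records endpoint inputs and outputs to S3, for all requests or for a configurable sampling percentage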
Exam concept: "Data drift" = the input data distribution changes over time (e.g., customer demographics shift). "Concept drift" = the relationship between input and output changes (e.g., customer behavior changes). Both require model retraining, but you detect data drift first (no labels needed) and concept drift later (requires ground truth).
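
To make "data drift needs no labels" concrete, here is a minimal sketch that compares one numeric feature against its baseline using the Population Stability Index (PSI). PSI is swapped in here as a simple, well-known drift score; it is not claimed to be the statistic Model Monitor computes. The bin count and the smoothing floor are assumptions.

```python
import math


def population_stability_index(baseline, current, bins=10):
    """PSI between two samples of one numeric feature (0.0 means identical histograms)."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # constant baseline: everything falls into one bin

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values to edge bins
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    base = bin_fractions(baseline)
    cur = bin_fractions(current)
    return sum((c - b) * math.log(c / b) for b, c in zip(base, cur))


# Baseline from training data; "shifted" simulates inputs that moved upward.
training_values = [i / 100 for i in range(1000)]
shifted_values = [v + 5 for v in training_values]
psi_same = population_stability_index(training_values, training_values)
psi_shifted = population_stability_index(training_values, shifted_values)
```

Only input values are needed, which is why this kind of check can run long before ground truth labels arrive.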

SageMaker Pipelines

CI/CD for machine learning workflows:

  • Define end-to-end ML workflows as directed acyclic graphs (DAGs)
  • Steps: Processing, Training, Tuning, Model Registration, Transform, Condition
  • Integrates with SageMaker Model Registry for model versioning and approval
  • Tracks lineage: which data, code, and parameters produced each model
  • Supports parameterized execution (different datasets, hyperparameters per run)
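
The DAG idea can be illustrated without the SageMaker SDK. The sketch below uses Python's standard `graphlib` to order a hypothetical set of steps by their dependencies; the step names and the dependency map are made up, and real pipelines are defined with the SageMaker Python SDK or a JSON pipeline definition instead.

```python
from graphlib import CycleError, TopologicalSorter

# Each step maps to the set of steps that must finish before it can start.
PIPELINE = {
    "Processing": set(),
    "Training": {"Processing"},
    "Evaluation": {"Training"},
    "Condition": {"Evaluation"},     # e.g. "is accuracy above the threshold?"
    "RegisterModel": {"Condition"},  # both branches run only after the condition
    "Transform": {"Condition"},
}


def execution_order(dag):
    """Return one valid execution order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(PIPELINE)
```

Because the graph is acyclic, every step has a well-defined set of upstream inputs, which is what makes lineage tracking possible.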

SageMaker Model Registry

  • Catalog of trained models with versioning
  • Approval workflow: each model version has an approval status of PendingManualApproval, Approved, or Rejected; moving to Approved can trigger deployment
  • Tracks model lineage (training data, algorithm, metrics)
  • Integrates with CI/CD for automated deployment upon approval
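
A minimal sketch of the approval step as a request for the UpdateModelPackage API, which accepts the statuses PendingManualApproval, Approved, and Rejected. The ARN and description below are placeholders.

```python
VALID_STATUSES = ("PendingManualApproval", "Approved", "Rejected")


def build_approval_update(model_package_arn, status, description=""):
    """Keyword arguments for UpdateModelPackage that change the approval status."""
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {VALID_STATUSES}, got {status!r}")
    request = {"ModelPackageArn": model_package_arn, "ModelApprovalStatus": status}
    if description:
        request["ApprovalDescription"] = description
    return request


approval = build_approval_update(
    # Placeholder ARN for illustration only.
    "arn:aws:sagemaker:us-east-1:111122223333:model-package/example-group/1",
    "Approved",
    "Passed offline evaluation",
)
```

A CI/CD system can react to the status change (for example through an EventBridge rule) and start the deployment.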

Auto Scaling

SageMaker endpoints support Auto Scaling with Application Auto Scaling:

  • Target tracking scaling — Maintain a target value for a metric (e.g., the predefined metric SageMakerVariantInvocationsPerInstance = 100). Application Auto Scaling adjusts the instance count automatically. Most common on exam.
  • Step scaling — Add/remove instances based on CloudWatch alarm thresholds.
  • Scheduled scaling — Pre-scale for predictable traffic patterns (e.g., scale up before business hours).
  • Set minimum and maximum instance counts to control costs
  • Cool-down period prevents rapid scaling oscillations
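
The bullets above translate into three Application Auto Scaling requests: register the variant as a scalable target, attach a target tracking policy, and optionally add a scheduled action. The sketch builds them as plain dictionaries; the endpoint name, capacities, target value, cooldowns, and cron schedule are hypothetical.

```python
def variant_resource_id(endpoint_name, variant_name):
    """Application Auto Scaling identifies a variant by this resource ID format."""
    return f"endpoint/{endpoint_name}/variant/{variant_name}"


def build_scaling_requests(endpoint_name, variant_name, min_count, max_count):
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": variant_resource_id(endpoint_name, variant_name),
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    register = {**common, "MinCapacity": min_count, "MaxCapacity": max_count}
    target_tracking = {
        **common,
        "PolicyName": "invocations-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": 100.0,  # invocations per instance per minute
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleOutCooldown": 60,   # seconds; hypothetical values
            "ScaleInCooldown": 300,
        },
    }
    scheduled = {
        **common,
        "ScheduledActionName": "pre-scale-before-business-hours",
        "Schedule": "cron(45 8 ? * MON-FRI *)",  # 08:45 UTC on weekdays
        "ScalableTargetAction": {"MinCapacity": max(min_count, 4), "MaxCapacity": max_count},
    }
    return register, target_tracking, scheduled


register, policy, scheduled = build_scaling_requests("fraud-endpoint", "ModelA", 1, 8)
```

On a `boto3.client("application-autoscaling")` client these correspond to `register_scalable_target`, `put_scaling_policy`, and `put_scheduled_action`.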

Security for ML on AWS

Security is tested throughout the exam, not just in this domain:

Encryption

  • At rest: S3 server-side encryption (SSE-S3, SSE-KMS, SSE-C). SageMaker training volumes encrypted with KMS.
  • In transit: All SageMaker API calls go over HTTPS/TLS. Inter-container traffic in distributed training is not encrypted by default; enable inter-container traffic encryption on the training job when it is required.
  • KMS customer managed keys for sensitive data. Specify in training job and endpoint configurations.
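
As a sketch of where the keys go, the fragment below shows the encryption-related fields of a CreateTrainingJob request and a CreateEndpointConfig request. It is deliberately partial (a real training job also needs an algorithm, role, and input channels), and the key ARN, bucket, and instance settings are placeholders.

```python
def training_encryption_fields(kms_key_arn, output_s3_uri):
    """Encryption-related fields of a CreateTrainingJob request (partial request)."""
    return {
        "OutputDataConfig": {
            "S3OutputPath": output_s3_uri,
            "KmsKeyId": kms_key_arn,        # encrypts model artifacts written to S3
        },
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",  # hypothetical sizing
            "InstanceCount": 2,
            "VolumeSizeInGB": 50,
            "VolumeKmsKeyId": kms_key_arn,  # encrypts the attached training volume
        },
        # Encrypts traffic between containers in a distributed training job.
        "EnableInterContainerTrafficEncryption": True,
    }


def endpoint_encryption_fields(kms_key_arn):
    """Encryption-related field of a CreateEndpointConfig request (partial request)."""
    return {"KmsKeyId": kms_key_arn}  # encrypts the endpoint instances' storage volume


KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/example-key-id"  # placeholder
training_fields = training_encryption_fields(KEY_ARN, "s3://example-bucket/artifacts/")
endpoint_fields = endpoint_encryption_fields(KEY_ARN)
```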

Network Isolation

  • VPC mode: Run SageMaker training and inference inside your VPC (no internet access by default)
  • VPC endpoints (PrivateLink): Access SageMaker API without going through the internet
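  • Network isolation flag: Completely isolates the container (it can make no outbound network calls, not even to other AWS services). SageMaker still moves training data and model artifacts to and from S3 on the container's behalf.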
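
The three controls above map onto different API fields. The sketch below shows the network-related part of a CreateTrainingJob request plus two EC2 CreateVpcEndpoint requests (a gateway endpoint for S3 and an interface endpoint for the SageMaker API). All IDs and the region are placeholders.

```python
def training_network_fields(subnet_ids, security_group_ids, isolate_container=False):
    """Network-related fields of a CreateTrainingJob request (partial request)."""
    return {
        "VpcConfig": {"Subnets": subnet_ids, "SecurityGroupIds": security_group_ids},
        # When True, the container itself can make no outbound network calls.
        "EnableNetworkIsolation": isolate_container,
    }


def vpc_endpoint_requests(vpc_id, region, subnet_ids, security_group_ids, route_table_ids):
    """Keyword arguments for two ec2.create_vpc_endpoint calls."""
    s3_gateway = {
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "VpcEndpointType": "Gateway",
        "RouteTableIds": route_table_ids,
    }
    sagemaker_api = {
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.sagemaker.api",
        "VpcEndpointType": "Interface",  # interface endpoints are backed by PrivateLink
        "SubnetIds": subnet_ids,
        "SecurityGroupIds": security_group_ids,
        "PrivateDnsEnabled": True,
    }
    return s3_gateway, sagemaker_api


# Placeholder identifiers for illustration only.
subnets, groups = ["subnet-0example"], ["sg-0example"]
net_fields = training_network_fields(subnets, groups, isolate_container=True)
s3_ep, api_ep = vpc_endpoint_requests("vpc-0example", "us-east-1", subnets, groups, ["rtb-0example"])
```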

IAM for ML

  • SageMaker execution role: IAM role that SageMaker assumes for training and inference. Must have permissions for S3, ECR, CloudWatch, KMS.
  • Least privilege: Grant only the S3 buckets, KMS keys, and ECR repositories needed.
  • Resource-based policies: S3 bucket policies to restrict which SageMaker roles can access training data.
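
A least-privilege execution role policy might look roughly like the sketch below, which scopes each statement to one bucket, one KMS key, and one ECR repository. It is illustrative only: the ARNs are placeholders and the action lists are not a complete or audited policy for any particular workload.

```python
import json


def build_execution_policy(bucket, kms_key_arn, ecr_repo_arn):
    """Identity policy document scoped to specific resources."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "TrainingDataObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Sid": "TrainingDataBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Sid": "DataKey",
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
                "Resource": kms_key_arn,
            },
            {
                "Sid": "PullImage",
                "Effect": "Allow",
                "Action": [
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:BatchGetImage",
                    "ecr:GetDownloadUrlForLayer",
                ],
                "Resource": ecr_repo_arn,
            },
        ],
    }


policy_doc = build_execution_policy(
    "example-training-bucket",  # placeholders
    "arn:aws:kms:us-east-1:111122223333:key/example-key-id",
    "arn:aws:ecr:us-east-1:111122223333:repository/example-repo",
)
policy_json = json.dumps(policy_doc, indent=2)
```

A real role typically also needs `ecr:GetAuthorizationToken` (which only accepts `"Resource": "*"`) and CloudWatch Logs permissions; they are left out here to keep the sketch short.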
💡
Exam pattern: When a question says "data must not traverse the internet," the answer involves VPC configuration + VPC endpoints (PrivateLink). When it says "container must not access any external resources," the answer is the network isolation flag.

Edge Deployment

  • SageMaker Neo: Compiles models for specific hardware (ARM, x86, GPU). Reduces model size and improves inference speed.
  • SageMaker Edge Manager: Deploy and manage models on edge devices (IoT). Monitors model performance on edge.
  • AWS IoT Greengrass: Run ML inference locally on IoT devices using Lambda or containers.
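
For SageMaker Neo, compilation is requested per target device. The sketch below builds a CreateCompilationJob request as a dictionary; the job name, role ARN, S3 paths, input shape, and the choice of a PyTorch model targeting a Jetson Nano are assumptions made for illustration.

```python
import json


def build_compilation_job_request(job_name, role_arn, model_s3_uri, output_s3_uri,
                                  framework="PYTORCH", target_device="jetson_nano"):
    """Keyword arguments for the SageMaker CreateCompilationJob API."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            # Input tensor name and shape; hypothetical image model input.
            "DataInputConfig": json.dumps({"input0": [1, 3, 224, 224]}),
            "Framework": framework,
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": target_device,  # hardware the model is compiled for
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }


neo_request = build_compilation_job_request(
    "neo-compile-example",
    "arn:aws:iam::111122223333:role/example-sagemaker-role",  # placeholder
    "s3://example-bucket/model/model.tar.gz",
    "s3://example-bucket/compiled/",
)
```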

Containers and Frameworks

  • Pre-built containers: AWS provides Docker containers for TensorFlow, PyTorch, MXNet, scikit-learn, XGBoost. Use these as the default.
  • Bring Your Own Container (BYOC): Custom Docker image when pre-built containers do not support your framework or dependencies. Must follow SageMaker container contract (specific directory structure, entry points).
  • Script Mode: Use a pre-built container but supply your own training/inference script. Easier than BYOC.
  • ECR (Elastic Container Registry): Store custom containers. SageMaker pulls from ECR during training and deployment.
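
For inference, the BYOC contract comes down to a web server on port 8080 that answers `GET /ping` (health check) and `POST /invocations` (prediction). The sketch below shows that shape using only the standard library. The `predict` function and the JSON request format are hypothetical stand-ins; a real container would load its model from `/opt/ml/model` and would normally use a production-grade server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a real model: sums the input features."""
    return sum(features)


def route(method, path, body=b""):
    """Map a request to (status_code, response_bytes) per the hosting contract."""
    if method == "GET" and path == "/ping":
        return 200, b""  # 200 tells SageMaker the container is healthy
    if method == "POST" and path == "/invocations":
        try:
            features = json.loads(body)["features"]
            payload = {"prediction": predict(features)}
        except (ValueError, KeyError, TypeError):
            return 400, b'{"error": "invalid request body"}'
        return 200, json.dumps(payload).encode()
    return 404, b""


class Handler(BaseHTTPRequestHandler):
    def _reply(self, status, payload):
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self):
        self._reply(*route("GET", self.path))

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self._reply(*route("POST", self.path, self.rfile.read(length)))


def serve(port=8080):
    """Start the server; the container's entry point would call this."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Keeping the routing in a plain function makes the contract easy to unit test before the image is pushed to ECR.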

Practice Questions

Q1
A company needs to generate predictions for 10 million customer records every night for a marketing campaign. The predictions are not time-sensitive. Which SageMaker deployment option is most cost-effective?

A) Real-time endpoint with Auto Scaling
B) Batch Transform
C) Serverless Inference
D) Asynchronous Inference

Answer: B — Batch Transform is designed for exactly this scenario: processing a large dataset at once without maintaining a persistent endpoint. Instances spin up, process all 10 million records, and shut down. A real-time endpoint (A) would run 24/7, wasting cost during the ~23 hours of no traffic. Serverless (C) has cold starts and per-request costs that add up for 10 million records. Async (D) is for individual large payloads, not bulk processing.
Q2
A team wants to deploy a new version of their fraud detection model alongside the current production model, sending 5% of traffic to the new model. If the new model performs well, they will gradually increase its traffic share. Which SageMaker feature should they use?

A) Multi-model endpoints
B) Production variants with traffic splitting
C) SageMaker Pipelines
D) Batch Transform with two models

Answer: B — Production variants allow deploying multiple model versions behind a single endpoint with configurable traffic splitting (e.g., 95%/5%). Traffic can be gradually shifted using UpdateEndpointWeightsAndCapacities without downtime. Multi-model endpoints (A) host multiple models but do not split traffic. Pipelines (C) orchestrate training, not deployment. Batch Transform (D) is for offline processing.
Q3
A deployed ML model starts producing inaccurate predictions after several months. Investigation reveals that the distribution of incoming customer data has shifted significantly from the training data. Which SageMaker feature would have detected this issue earliest?

A) SageMaker Model Monitor — Data Quality monitoring
B) SageMaker Model Monitor — Model Quality monitoring
C) SageMaker Debugger
D) CloudWatch inference latency metrics

Answer: A — Data Quality monitoring detects data drift by comparing incoming data distributions against the training data baseline. It would flag the distribution shift at the next scheduled monitoring run, without waiting for ground truth labels. Model Quality monitoring (B) requires ground truth labels, which often arrive with a delay. Debugger (C) is for training, not production. CloudWatch latency (D) does not measure data quality.
Q4
A financial services company requires that ML training data and model artifacts never traverse the public internet, and all data must be encrypted with customer-managed KMS keys. What configuration is needed?

A) Enable SSL on the SageMaker endpoint
B) Configure SageMaker to run in a VPC with no internet gateway, use VPC endpoints for S3 and SageMaker, and specify KMS key IDs in training and endpoint configurations
C) Use S3 bucket policies to restrict access
D) Enable network isolation on the training job

Answer: B — This is the complete solution: a VPC without an internet gateway ensures no public internet traversal, VPC endpoints for S3 and the SageMaker API keep all traffic on the AWS private network, and KMS key IDs in the training job and endpoint configurations ensure customer-managed encryption. A only covers TLS in transit. C does not prevent internet traversal. D only blocks the container's own outbound network calls; it does not set up private connectivity or KMS encryption, so it is not sufficient by itself.
Q5
A SageMaker real-time endpoint experiences traffic spikes every weekday at 9 AM. The current scaling policy reacts too slowly, causing timeouts during the spike. What should the team do?

A) Increase the maximum instance count
B) Add a scheduled scaling action to pre-scale before 9 AM combined with target tracking for unexpected spikes
C) Switch to Batch Transform
D) Use a larger instance type

Answer: B — Scheduled scaling proactively adds capacity before the predictable 9 AM spike, eliminating the reaction delay. Combined with target tracking scaling for unexpected variations throughout the day, this provides both proactive and reactive scaling. A alone does not help if scaling is too slow. C changes the architecture unnecessarily. D might help but does not address the scaling speed issue.

Key Takeaways for the Exam

  • Real-time endpoints for immediate predictions. Batch Transform for bulk processing. Serverless for intermittent traffic.
  • Production variants enable A/B testing with traffic splitting on a single endpoint.
  • Model Monitor: Data Quality (detects data drift without labels), Model Quality (needs ground truth labels).
  • VPC + VPC endpoints (PrivateLink) = no internet traversal. Network isolation = the container has no network access at all.
  • SageMaker Neo for edge deployment optimization. Edge Manager for device fleet management.
  • Scheduled + target tracking scaling for predictable traffic patterns with unexpected spikes.
  • KMS customer-managed keys for data encryption at rest. SageMaker API calls are TLS-encrypted in transit; inter-container traffic encryption for distributed training must be enabled explicitly.