ML Pipeline Automation

MLOps and pipeline automation represent one of the highest-weighted exam domains (~18%). This lesson covers Vertex Pipelines, Kubeflow, CI/CD for ML, and experiment tracking — the backbone of production ML on GCP.

MLOps Maturity Levels

Google defines three MLOps maturity levels. The exam tests your ability to recommend the right level for a given organization:

| Level | Description | Characteristics |
| --- | --- | --- |
| Level 0: Manual | Manual, script-driven, interactive process | Jupyter notebooks, manual deployment, no CI/CD, no monitoring |
| Level 1: ML Pipeline Automation | Automated ML pipeline, continuous training | Orchestrated pipeline, automated retraining on new data, feature store |
| Level 2: CI/CD Pipeline Automation | Automated CI/CD for pipeline code | Source control, automated testing, automated pipeline deployment, monitoring |

💡 Exam Tip: When a question describes a team doing manual deployments with notebooks, the answer typically involves moving to Level 1 (Vertex Pipelines). When a team already has pipelines but needs faster iteration and quality gates, the answer is Level 2 (CI/CD with Cloud Build).

Vertex AI Pipelines

Vertex AI Pipelines is the managed pipeline orchestration service on GCP. It is the primary pipeline service tested on the exam.

Key Features

  • Serverless execution: No cluster management required
  • KFP v2 SDK: Define pipelines using the Kubeflow Pipelines SDK v2 in Python
  • TFX support: Run TFX pipelines on Vertex AI Pipelines
  • Artifact lineage: Track inputs, outputs, and metadata for every pipeline run
  • Scheduling: Run pipelines on a schedule using Cloud Scheduler
  • Caching: Reuse outputs from previously completed steps to save time and cost
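The caching feature in particular shows up in exam scenarios about slow pipelines. The idea can be sketched in plain Python (a simplified, hypothetical model of content-based caching, not the actual Vertex AI implementation): a step is re-executed only when the hash of its component spec plus its inputs changes.

```python
import hashlib
import json

# Hypothetical sketch of content-based step caching, in the spirit of
# Vertex AI Pipelines' execution caching (not the real implementation).
_cache: dict = {}

def cache_key(component_spec: str, inputs: dict) -> str:
    """Key a step by its component definition plus its exact inputs."""
    payload = json.dumps({"spec": component_spec, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def run_step(component_spec: str, inputs: dict, execute):
    """Reuse a prior output when spec and inputs are unchanged."""
    key = cache_key(component_spec, inputs)
    if key in _cache:
        return _cache[key]      # cache hit: the step is skipped entirely
    result = execute(inputs)    # cache miss: the step actually runs
    _cache[key] = result
    return result
```

Under this model, a two-hour data-prep step whose inputs have not changed returns instantly on the next pipeline run, which is exactly the behavior the exam expects you to reach for.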

Pipeline Components

A Vertex Pipeline is composed of components. Each component is a self-contained unit of work:

📊 Google Cloud Pipeline Components

Pre-built components for common GCP operations: BigQuery queries, Vertex AI training, model upload, endpoint deployment, batch prediction. Use these to minimize custom code.

📌 Custom Components

Write your own components using the @component decorator. Each component runs in its own container. Input/output types are validated at compile time.

🛠 Container Components

Wrap any Docker image as a pipeline component. Useful for non-Python workloads, legacy code, or specialized tools that require specific environments.
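To make the component model concrete, here is a deliberately simplified stand-in for the KFP v2 @component decorator (a hypothetical helper, not the KFP SDK itself). It captures the core idea: each component declares a base container image and a typed Python function whose signature can be validated before the pipeline ever runs.

```python
import inspect

# Hypothetical, simplified stand-in for KFP v2's @component decorator.
# The real SDK additionally packages the function into a container spec;
# here we only model "image + typed function + compile-time validation".
def component(base_image: str):
    def wrap(fn):
        fn.base_image = base_image
        fn.signature = inspect.signature(fn)
        return fn
    return wrap

@component(base_image="python:3.10")
def evaluate_model(auc: float, threshold: float) -> bool:
    """Return True when the model clears the quality bar."""
    return auc >= threshold

# "Compile-time" check: every parameter must carry a type annotation,
# mirroring KFP's input/output type validation at pipeline compile time.
def validate(comp) -> bool:
    return all(
        p.annotation is not inspect.Parameter.empty
        for p in comp.signature.parameters.values()
    )
```

In the real SDK you would import the decorator from `kfp.dsl` and the compiler would reject a component whose inputs or outputs are untyped; this sketch shows why that check can happen before any container is launched.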

Kubeflow Pipelines vs. Vertex Pipelines

The exam may ask you to compare these two. Know the key differences:

| Feature | Vertex AI Pipelines | Kubeflow Pipelines (on GKE) |
| --- | --- | --- |
| Infrastructure | Fully managed, serverless | Self-managed GKE cluster |
| Setup complexity | Low (API call to run) | High (install KFP on GKE) |
| Cost model | Pay per pipeline step execution | Pay for GKE cluster (always on) |
| Customization | Limited to supported component types | Full Kubernetes flexibility |
| GCP integration | Native (IAM, logging, monitoring) | Requires manual configuration |
| Best for | Most production ML workloads | Highly customized pipelines, multi-cloud |

💡 Exam Answer Pattern: Unless the question specifically mentions "existing Kubeflow deployment," "multi-cloud," or "Kubernetes customization," the correct answer is Vertex AI Pipelines. Google strongly favors managed services in exam answers.

CI/CD for ML on GCP

CI/CD for ML extends traditional CI/CD to handle data, models, and pipelines. The GCP stack for ML CI/CD:

| Stage | GCP Service | Purpose |
| --- | --- | --- |
| Source control | Cloud Source Repositories / GitHub | Version pipeline code, training scripts, configs |
| Build & test | Cloud Build | Run unit tests, build custom containers, validate pipeline configs |
| Container registry | Artifact Registry | Store and version custom training/serving containers |
| Pipeline deployment | Cloud Build triggers | Automatically submit pipeline runs when code changes |
| Model validation | Vertex AI Evaluator / custom | Gate deployments based on model quality metrics |
| Model deployment | Vertex AI Model Registry + Endpoints | Deploy validated models to production endpoints |
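The "model validation" stage above usually amounts to a small conditional check placed between evaluation and deployment. A minimal sketch of such a gate (function names, metric keys, and thresholds are illustrative examples, not a GCP API):

```python
# Illustrative quality gate between evaluation and deployment.
# Metric names and thresholds are examples, not a GCP API.
def passes_quality_gate(metrics: dict,
                        min_auc: float = 0.85,
                        max_latency_ms: float = 100.0) -> bool:
    """Deploy only if every required metric clears its threshold."""
    return (
        metrics.get("auc", 0.0) > min_auc
        and metrics.get("latency_ms", float("inf")) < max_latency_ms
    )

def deploy_decision(metrics: dict) -> str:
    """Missing metrics fail closed: no numbers, no deployment."""
    return "deploy" if passes_quality_gate(metrics) else "block"
```

In a real Vertex AI Pipeline this logic would live in a conditional step (e.g. a `dsl.Condition` block in the KFP SDK) that consumes the evaluation step's metrics artifact and gates the deployment step.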

Experiment Tracking with Vertex AI Experiments

Vertex AI Experiments provides centralized tracking for ML experiments. Key capabilities:

  • Run tracking: Log parameters, metrics, and artifacts for each training run
  • Comparison: Compare metrics across runs in a tabular or visual format
  • Lineage: Track which data, code, and parameters produced each model
  • Integration: Works with Vertex AI Training, custom training, and notebooks
  • TensorBoard: Vertex AI TensorBoard provides managed TensorBoard instances for visualization
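The run-tracking and comparison workflow can be sketched with a toy in-memory tracker (a hypothetical stand-in; in practice the `google-cloud-aiplatform` SDK records this server-side via calls such as `aiplatform.start_run`, `aiplatform.log_params`, and `aiplatform.log_metrics`):

```python
# Toy in-memory experiment tracker, mirroring the shape of run tracking.
# Hypothetical stand-in: the managed service stores this in Vertex AI
# Experiments and renders the comparison table in the console.
runs: dict = {}

def log_run(name: str, params: dict, metrics: dict) -> None:
    """Record one training run's parameters and final metrics."""
    runs[name] = {"params": params, "metrics": metrics}

def best_run(metric: str, higher_is_better: bool = True) -> str:
    """Compare runs on one metric, as the Experiments UI does."""
    return (max if higher_is_better else min)(
        runs, key=lambda name: runs[name]["metrics"][metric]
    )
```

The value of the managed version is that the lineage (data, code, parameters) is attached to each run automatically, so "which run produced the deployed model?" has an auditable answer.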

Cloud Composer (Apache Airflow)

Cloud Composer is GCP's managed Apache Airflow service. It orchestrates broader data workflows (not just ML). Know when to use it vs. Vertex Pipelines:

  • Use Cloud Composer when: You need to orchestrate a mix of data engineering and ML tasks (e.g., trigger Dataflow, then BigQuery, then Vertex Pipeline)
  • Use Vertex Pipelines when: The workflow is primarily ML (data prep, train, evaluate, deploy)
  • Cloud Composer can trigger Vertex Pipelines as a step in a larger DAG
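The mixed workflow described above is just a DAG of heterogeneous tasks. A minimal sketch of its dependency structure and resulting execution order (task names are illustrative; in Cloud Composer each node would be an Airflow operator wired together with `>>`):

```python
from graphlib import TopologicalSorter

# Illustrative DAG for the mixed workflow:
# Dataflow -> BigQuery load -> Vertex AI Pipeline -> Slack notification.
# Node names are examples only; Composer would use Airflow operators.
dag = {
    "load_bigquery": {"run_dataflow"},
    "trigger_vertex_pipeline": {"load_bigquery"},
    "notify_slack": {"trigger_vertex_pipeline"},
}

# Topological order = the order Airflow's scheduler would run the tasks.
execution_order = list(TopologicalSorter(dag).static_order())
```

The point of the comparison: Vertex AI Pipelines orchestrates the ML-only subgraph well, but the surrounding Dataflow and notification steps are what push this workflow into Composer territory.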

Practice Questions

📝 Question 1: Your ML team currently uses Jupyter notebooks for model development. They manually copy trained models to Cloud Storage and update endpoints by hand. Models are retrained monthly. You want to automate this process with minimal infrastructure overhead. What should you recommend?

A. Set up Kubeflow Pipelines on a GKE cluster
B. Create a Vertex AI Pipeline that automates data prep, training, evaluation, and deployment
C. Write a bash script that runs on a Compute Engine VM with a cron job
D. Use Cloud Composer with custom Airflow operators
Answer: B. Vertex AI Pipelines is the managed, serverless solution that moves the team from MLOps Level 0 to Level 1 with minimal infrastructure overhead. Kubeflow (A) requires managing a GKE cluster. Bash scripts (C) are fragile and not scalable. Cloud Composer (D) adds unnecessary complexity for a pure ML workflow.
📝 Question 2: Your ML pipeline takes 6 hours to run. The data preparation step (2 hours) rarely changes, but the training step is updated frequently. How can you reduce pipeline execution time?

A. Use a faster machine type for data preparation
B. Enable pipeline step caching in Vertex AI Pipelines
C. Move data preparation to a separate Cloud Function
D. Run the pipeline more frequently with smaller data batches
Answer: B. Vertex AI Pipelines supports step caching: if a step's inputs have not changed, it reuses the previous output. Since the data prep step rarely changes, caching would skip the 2-hour step most of the time, reducing total pipeline time to ~4 hours. Faster machines (A) only marginally help. Cloud Functions (C) have a 9-minute timeout, far too short for a 2-hour step. Smaller batches (D) change the training setup rather than reducing the time of any single run.
📝 Question 3: Your organization requires that every deployed model must pass a quality gate: AUC must be above 0.85 and latency must be below 100ms. Where in the ML CI/CD pipeline should this gate be implemented?

A. In the Cloud Build pipeline, before pushing the container image
B. In the Vertex AI Pipeline, after the evaluation step and before the deployment step
C. In the model monitoring configuration, after deployment
D. In the Jupyter notebook, during development
Answer: B. The quality gate belongs in the ML pipeline between evaluation and deployment. The evaluation step produces metrics (AUC), and a conditional step checks whether they meet the threshold before proceeding to deployment. Latency can be tested with a shadow deployment step. Cloud Build (A) runs before training, so model metrics are not yet available. Post-deployment monitoring (C) is too late to gate the deployment. Notebooks (D) are not part of CI/CD.
📝 Question 4: You need to orchestrate a workflow that: (1) runs a Dataflow job to process raw data, (2) loads results into BigQuery, (3) triggers a Vertex AI Pipeline for model training, and (4) sends a Slack notification on completion. Which orchestration service should you use?

A. Vertex AI Pipelines
B. Cloud Composer (Apache Airflow)
C. Cloud Workflows
D. Cloud Scheduler
Answer: B. Cloud Composer (Airflow) is the right choice because this workflow spans multiple GCP services beyond just ML. Airflow has native operators for Dataflow, BigQuery, Vertex AI, and Slack. Vertex Pipelines (A) is ML-focused and does not have native Dataflow or Slack integration. Cloud Workflows (C) is for lightweight API orchestration. Cloud Scheduler (D) is for simple cron triggers, not complex DAGs.