ML Pipeline Automation

MLOps and pipeline automation represent one of the highest-weighted exam domains (~18%). This lesson covers Vertex Pipelines, Kubeflow, CI/CD for ML, and experiment tracking — the backbone of production ML on GCP.

MLOps Maturity Levels

Google defines three MLOps maturity levels. The exam tests your ability to recommend the right level for a given organization:

| Level | Description | Characteristics |
| --- | --- | --- |
| Level 0: Manual | Manual, script-driven, interactive process | Jupyter notebooks, manual deployment, no CI/CD, no monitoring |
| Level 1: ML Pipeline Automation | Automated ML pipeline, continuous training | Orchestrated pipeline, automated retraining on new data, feature store |
| Level 2: CI/CD Pipeline Automation | Automated CI/CD for pipeline code | Source control, automated testing, automated pipeline deployment, monitoring |

💡 Exam Tip: When a question describes a team doing manual deployments with notebooks, the answer typically involves moving to Level 1 (Vertex Pipelines). When a team already has pipelines but needs faster iteration and quality gates, the answer is Level 2 (CI/CD with Cloud Build).

Vertex AI Pipelines

Vertex AI Pipelines is the managed pipeline orchestration service on GCP. It is the primary pipeline service tested on the exam.

Key Features

  • Serverless execution: No cluster management required
  • KFP v2 SDK: Define pipelines using the Kubeflow Pipelines SDK v2 in Python
  • TFX support: Run TFX pipelines on Vertex AI Pipelines
  • Artifact lineage: Track inputs, outputs, and metadata for every pipeline run
  • Scheduling: Run pipelines on a schedule using Cloud Scheduler
  • Caching: Reuse outputs from previously completed steps to save time and cost
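The caching feature in particular shows up in exam scenarios about slow pipelines. The idea can be sketched in plain Python (a simplified, hypothetical model of content-based caching, not the actual Vertex AI implementation): a step is re-executed only when the hash of its component spec plus its inputs changes.

```python
import hashlib
import json

# Hypothetical sketch of content-based step caching, in the spirit of
# Vertex AI Pipelines' execution caching (not the real implementation).
_cache: dict = {}

def cache_key(component_spec: str, inputs: dict) -> str:
    """Key a step by its component definition plus its exact inputs."""
    payload = json.dumps({"spec": component_spec, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def run_step(component_spec: str, inputs: dict, execute):
    """Reuse a prior output when spec and inputs are unchanged."""
    key = cache_key(component_spec, inputs)
    if key in _cache:
        return _cache[key]      # cache hit: the step is skipped entirely
    result = execute(inputs)    # cache miss: the step actually runs
    _cache[key] = result
    return result
```

Under this model, a two-hour data-prep step whose inputs have not changed returns instantly on the next pipeline run, which is exactly the behavior the exam expects you to reach for.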

Pipeline Components

A Vertex Pipeline is composed of components. Each component is a self-contained unit of work:

📊 Google Cloud Pipeline Components

Pre-built components for common GCP operations: BigQuery queries, Vertex AI training, model upload, endpoint deployment, batch prediction. Use these to minimize custom code.

📌 Custom Components

Write your own components using the @component decorator. Each component runs in its own container. Input/output types are validated at compile time.

🛠 Container Components

Wrap any Docker image as a pipeline component. Useful for non-Python workloads, legacy code, or specialized tools that require specific environments.
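To make the component model concrete, here is a deliberately simplified stand-in for the KFP v2 @component decorator (a hypothetical helper, not the KFP SDK itself). It captures the core idea: each component declares a base container image and a typed Python function whose signature can be validated before the pipeline ever runs.

```python
import inspect

# Hypothetical, simplified stand-in for KFP v2's @component decorator.
# The real SDK additionally packages the function into a container spec;
# here we only model "image + typed function + compile-time validation".
def component(base_image: str):
    def wrap(fn):
        fn.base_image = base_image
        fn.signature = inspect.signature(fn)
        return fn
    return wrap

@component(base_image="python:3.10")
def evaluate_model(auc: float, threshold: float) -> bool:
    """Return True when the model clears the quality bar."""
    return auc >= threshold

# "Compile-time" check: every parameter must carry a type annotation,
# mirroring KFP's input/output type validation at pipeline compile time.
def validate(comp) -> bool:
    return all(
        p.annotation is not inspect.Parameter.empty
        for p in comp.signature.parameters.values()
    )
```

In the real SDK you would import the decorator from `kfp.dsl` and the compiler would reject a component whose inputs or outputs are untyped; this sketch shows why that check can happen before any container is launched.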

Kubeflow Pipelines vs. Vertex Pipelines

The exam may ask you to compare these two. Know the key differences:

| Feature | Vertex AI Pipelines | Kubeflow Pipelines (on GKE) |
| --- | --- | --- |
| Infrastructure | Fully managed, serverless | Self-managed GKE cluster |
| Setup complexity | Low (API call to run) | High (install KFP on GKE) |
| Cost model | Pay per pipeline step execution | Pay for GKE cluster (always on) |
| Customization | Limited to supported component types | Full Kubernetes flexibility |
| GCP integration | Native (IAM, logging, monitoring) | Requires manual configuration |
| Best for | Most production ML workloads | Highly customized pipelines, multi-cloud |

💡 Exam Answer Pattern: Unless the question specifically mentions "existing Kubeflow deployment," "multi-cloud," or "Kubernetes customization," the correct answer is Vertex AI Pipelines. Google strongly favors managed services in exam answers.

CI/CD for ML on GCP

CI/CD for ML extends traditional CI/CD to handle data, models, and pipelines. The GCP stack for ML CI/CD:

| Stage | GCP Service | Purpose |
| --- | --- | --- |
| Source control | Cloud Source Repositories / GitHub | Version pipeline code, training scripts, configs |
| Build & test | Cloud Build | Run unit tests, build custom containers, validate pipeline configs |
| Container registry | Artifact Registry | Store and version custom training/serving containers |
| Pipeline deployment | Cloud Build triggers | Automatically submit pipeline runs when code changes |
| Model validation | Vertex AI Evaluator / custom | Gate deployments based on model quality metrics |
| Model deployment | Vertex AI Model Registry + Endpoints | Deploy validated models to production endpoints |
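The "model validation" stage above usually amounts to a small conditional check placed between evaluation and deployment. A minimal sketch of such a gate (function names, metric keys, and thresholds are illustrative examples, not a GCP API):

```python
# Illustrative quality gate between evaluation and deployment.
# Metric names and thresholds are examples, not a GCP API.
def passes_quality_gate(metrics: dict,
                        min_auc: float = 0.85,
                        max_latency_ms: float = 100.0) -> bool:
    """Deploy only if every required metric clears its threshold."""
    return (
        metrics.get("auc", 0.0) > min_auc
        and metrics.get("latency_ms", float("inf")) < max_latency_ms
    )

def deploy_decision(metrics: dict) -> str:
    """Missing metrics fail closed: no numbers, no deployment."""
    return "deploy" if passes_quality_gate(metrics) else "block"
```

In a real Vertex AI Pipeline this logic would live in a conditional step (e.g. a `dsl.Condition` block in the KFP SDK) that consumes the evaluation step's metrics artifact and gates the deployment step.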

Experiment Tracking with Vertex AI Experiments

Vertex AI Experiments provides centralized tracking for ML experiments. Key capabilities:

  • Run tracking: Log parameters, metrics, and artifacts for each training run
  • Comparison: Compare metrics across runs in a tabular or visual format
  • Lineage: Track which data, code, and parameters produced each model
  • Integration: Works with Vertex AI Training, custom training, and notebooks
  • TensorBoard: Vertex AI TensorBoard provides managed TensorBoard instances for visualization
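The run-tracking and comparison workflow can be sketched with a toy in-memory tracker (a hypothetical stand-in; in practice the `google-cloud-aiplatform` SDK records this server-side via calls such as `aiplatform.start_run`, `aiplatform.log_params`, and `aiplatform.log_metrics`):

```python
# Toy in-memory experiment tracker, mirroring the shape of run tracking.
# Hypothetical stand-in: the managed service stores this in Vertex AI
# Experiments and renders the comparison table in the console.
runs: dict = {}

def log_run(name: str, params: dict, metrics: dict) -> None:
    """Record one training run's parameters and final metrics."""
    runs[name] = {"params": params, "metrics": metrics}

def best_run(metric: str, higher_is_better: bool = True) -> str:
    """Compare runs on one metric, as the Experiments UI does."""
    return (max if higher_is_better else min)(
        runs, key=lambda name: runs[name]["metrics"][metric]
    )
```

The value of the managed version is that the lineage (data, code, parameters) is attached to each run automatically, so "which run produced the deployed model?" has an auditable answer.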

Cloud Composer (Apache Airflow)

Cloud Composer is GCP's managed Apache Airflow service. It orchestrates broader data workflows (not just ML). Know when to use it vs. Vertex Pipelines:

  • Use Cloud Composer when: You need to orchestrate a mix of data engineering and ML tasks (e.g., trigger Dataflow, then BigQuery, then Vertex Pipeline)
  • Use Vertex Pipelines when: The workflow is primarily ML (data prep, train, evaluate, deploy)
  • Cloud Composer can trigger Vertex Pipelines as a step in a larger DAG
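The mixed workflow described above is just a DAG of heterogeneous tasks. A minimal sketch of its dependency structure and resulting execution order (task names are illustrative; in Cloud Composer each node would be an Airflow operator wired together with `>>`):

```python
from graphlib import TopologicalSorter

# Illustrative DAG for the mixed workflow:
# Dataflow -> BigQuery load -> Vertex AI Pipeline -> Slack notification.
# Node names are examples only; Composer would use Airflow operators.
dag = {
    "load_bigquery": {"run_dataflow"},
    "trigger_vertex_pipeline": {"load_bigquery"},
    "notify_slack": {"trigger_vertex_pipeline"},
}

# Topological order = the order Airflow's scheduler would run the tasks.
execution_order = list(TopologicalSorter(dag).static_order())
```

The point of the comparison: Vertex AI Pipelines orchestrates the ML-only subgraph well, but the surrounding Dataflow and notification steps are what push this workflow into Composer territory.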

Practice Questions

📝 Question 1: Your ML team currently uses Jupyter notebooks for model development. They manually copy trained models to Cloud Storage and update endpoints by hand. Models are retrained monthly. You want to automate this process with minimal infrastructure overhead. What should you recommend?

A. Set up Kubeflow Pipelines on a GKE cluster
B. Create a Vertex AI Pipeline that automates data prep, training, evaluation, and deployment
C. Write a bash script that runs on a Compute Engine VM with a cron job
D. Use Cloud Composer with custom Airflow operators
Answer: B. Vertex AI Pipelines is the managed, serverless solution that moves the team from MLOps Level 0 to Level 1 with minimal infrastructure overhead. Kubeflow (A) requires managing a GKE cluster. Bash scripts (C) are fragile and not scalable. Cloud Composer (D) adds unnecessary complexity for a pure ML workflow.
📝 Question 2: Your ML pipeline takes 6 hours to run. The data preparation step (2 hours) rarely changes, but the training step is updated frequently. How can you reduce pipeline execution time?

A. Use a faster machine type for data preparation
B. Enable pipeline step caching in Vertex AI Pipelines
C. Move data preparation to a separate Cloud Function
D. Run the pipeline more frequently with smaller data batches
Answer: B. Vertex AI Pipelines supports step caching: if a step's inputs have not changed, it reuses the previous output. Since the data prep step rarely changes, caching would skip the 2-hour step most of the time, reducing total pipeline time to ~4 hours. Faster machines (A) only marginally help. Cloud Functions (C) have a 9-minute timeout, far too short for a 2-hour step. Smaller batches (D) change the training setup rather than reducing the time of any single run.
📝 Question 3: Your organization requires that every deployed model must pass a quality gate: AUC must be above 0.85 and latency must be below 100ms. Where in the ML CI/CD pipeline should this gate be implemented?

A. In the Cloud Build pipeline, before pushing the container image
B. In the Vertex AI Pipeline, after the evaluation step and before the deployment step
C. In the model monitoring configuration, after deployment
D. In the Jupyter notebook, during development
Answer: B. The quality gate belongs in the ML pipeline between evaluation and deployment. The evaluation step produces metrics (AUC), and a conditional step checks whether they meet the threshold before proceeding to deployment. Latency can be tested with a shadow deployment step. Cloud Build (A) runs before training, so model metrics are not yet available. Post-deployment monitoring (C) is too late to gate the deployment. Notebooks (D) are not part of CI/CD.
📝 Question 4: You need to orchestrate a workflow that: (1) runs a Dataflow job to process raw data, (2) loads results into BigQuery, (3) triggers a Vertex AI Pipeline for model training, and (4) sends a Slack notification on completion. Which orchestration service should you use?

A. Vertex AI Pipelines
B. Cloud Composer (Apache Airflow)
C. Cloud Workflows
D. Cloud Scheduler
Answer: B. Cloud Composer (Airflow) is the right choice because this workflow spans multiple GCP services beyond just ML. Airflow has native operators for Dataflow, BigQuery, Vertex AI, and Slack. Vertex Pipelines (A) is ML-focused and does not have native Dataflow or Slack integration. Cloud Workflows (C) is for lightweight API orchestration. Cloud Scheduler (D) is for simple cron triggers, not complex DAGs.