# Design ML Solutions (30-35%)

*Difficulty: Intermediate*
This is the highest-weighted domain on the DP-100 exam. You must understand how to design the Azure ML workspace, choose compute targets, manage data assets, and build pipeline architectures. Master this domain and you are a third of the way to passing.
## Azure Machine Learning Workspace
The workspace is the top-level resource for Azure ML. Everything — compute, data, models, endpoints, experiments — lives inside a workspace.
### Key Workspace Components
| Component | Purpose | Exam Focus |
|---|---|---|
| Workspace | Top-level container for all ML assets | Creation, RBAC, networking (private endpoints) |
| Resource Group | Azure container for related resources | Organizing workspaces by team/project |
| Storage Account | Default datastore (Blob Storage) | Automatically created with workspace |
| Key Vault | Secrets and credentials management | Connection strings, API keys |
| Application Insights | Monitoring and telemetry | Endpoint monitoring, logging |
| Container Registry | Docker images for environments | Custom environments, deployment images |
```python
# Creating a workspace with Python SDK v2
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Workspace
from azure.identity import DefaultAzureCredential

# Authenticate
credential = DefaultAzureCredential()

# Define the workspace
ws = Workspace(
    name="dp100-exam-workspace",
    location="eastus",
    display_name="DP-100 Study Workspace",
    description="Workspace for DP-100 exam preparation",
    tags={"purpose": "certification-prep"},
)

# Create the workspace (also creates Storage, Key Vault, App Insights)
ml_client = MLClient(
    credential=credential,
    subscription_id="your-subscription-id",
    resource_group_name="dp100-rg",
)
ml_client.workspaces.begin_create_or_update(ws)
```
## Compute Targets
Choosing the right compute is critical for both the exam and real projects. Know when to use each type.
| Compute Type | Use Case | Key Features | Exam Tips |
|---|---|---|---|
| Compute Instance | Development, notebooks | Single VM, Jupyter, VS Code, SSH | One per user, auto-shutdown, not for production |
| Compute Cluster | Training jobs, pipelines | Auto-scaling, multi-node, spot VMs | Min nodes=0 saves cost, max nodes limits spend |
| Serverless Compute | On-demand training | No cluster management, pay per job | New option — no need to pre-provision |
| Kubernetes (AKS) | Production inference | Scalable, GPU support, custom networking | Attach existing AKS cluster to workspace |
| Managed Online Endpoint | Real-time inference | Built-in load balancing, blue-green | Preferred for real-time scoring scenarios |
| Managed Batch Endpoint | Batch scoring | Process large datasets, parallel jobs | Best for offline/scheduled predictions |
```python
# Creating a compute cluster with SDK v2
from azure.ai.ml.entities import AmlCompute

# Define the compute cluster
cluster = AmlCompute(
    name="dp100-cluster",
    type="amlcompute",
    size="Standard_DS3_v2",            # VM size
    min_instances=0,                   # Scale to zero when idle
    max_instances=4,                   # Max parallel nodes
    idle_time_before_scale_down=120,   # Seconds before scale-down
    tier="Dedicated",                  # or "LowPriority" for spot VMs
)
ml_client.compute.begin_create_or_update(cluster)
```
## Data Assets and Datastores
Azure ML uses datastores to connect to storage services and data assets to reference specific datasets.
### Datastore Types
- Azure Blob Storage — Default datastore, best for unstructured data (images, text files)
- Azure Data Lake Storage Gen2 — Hierarchical namespace, best for large-scale analytics
- Azure SQL Database — Structured data, direct SQL queries
- Azure Files — File shares, mountable as network drive
### Data Asset Types
| Type | Description | When to Use |
|---|---|---|
| URI File | Points to a single file | Single CSV, Parquet, or model file |
| URI Folder | Points to a folder | Image datasets, multiple files in a directory |
| MLTable | Tabular data with schema | Structured data with column types, transformations |
```python
# Register a data asset (URI File)
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# CSV file in blob storage
data_asset = Data(
    name="customer-churn-data",
    description="Customer churn dataset for training",
    path="azureml://datastores/workspaceblobstore/paths/data/churn.csv",
    type=AssetTypes.URI_FILE,
    version="1",
)
ml_client.data.create_or_update(data_asset)

# Register an MLTable asset
mltable_asset = Data(
    name="customer-churn-mltable",
    description="Customer churn as MLTable with schema",
    path="azureml://datastores/workspaceblobstore/paths/data/churn-mltable/",
    type=AssetTypes.MLTABLE,
    version="1",
)
ml_client.data.create_or_update(mltable_asset)
```
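An MLTable asset expects a file literally named `MLTable` inside the referenced folder, describing how to read the data. A minimal sketch for the churn folder above; the delimiter and header settings are assumptions about the CSV:

```yaml
# MLTable file stored inside the churn-mltable/ folder
paths:
  - file: ./churn.csv
transformations:
  - read_delimited:
      delimiter: ","
      header: all_files_same_headers
```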
## Environments
Environments define the software dependencies for training and deployment. Azure ML supports curated environments (pre-built) and custom environments.
```python
# Working with environments in SDK v2
from azure.ai.ml.entities import Environment

# List environments registered in the workspace
# (curated environments live in the shared "azureml" registry)
envs = ml_client.environments.list()
for env in envs:
    if "sklearn" in env.name.lower():
        print(f"{env.name}: {env.version}")

# Create a custom environment from a conda file
custom_env = Environment(
    name="dp100-custom-env",
    description="Custom environment for DP-100 training",
    conda_file="conda.yml",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
    version="1",
)
ml_client.environments.create_or_update(custom_env)
```
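The custom environment above references a `conda.yml` that is not shown. An illustrative file; the package choices are assumptions about a typical training setup:

```yaml
name: dp100-custom-env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - azure-ai-ml
      - mlflow
      - scikit-learn
      - pandas
```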
## Pipeline Architecture
Pipelines orchestrate multi-step ML workflows. For the exam, understand pipeline components, data passing, and scheduling.
```python
# Building a pipeline with SDK v2
from azure.ai.ml import dsl, Input, load_component

# Load the step components from their YAML definitions
# (the YAML files are assumed to exist alongside this script)
prep_component = load_component(source="components/prep.yml")
train_component = load_component(source="components/train.yml")
eval_component = load_component(source="components/eval.yml")

# Define the pipeline by wiring component inputs and outputs together
@dsl.pipeline(
    description="DP-100 training pipeline",
    compute="dp100-cluster",
)
def training_pipeline(input_data):
    # Step 1: Data preparation
    prep_step = prep_component(raw_data=input_data)

    # Step 2: Train model
    train_step = train_component(
        training_data=prep_step.outputs.prepared_data,
        learning_rate=0.01,
        n_estimators=100,
    )

    # Step 3: Evaluate model
    eval_step = eval_component(
        model=train_step.outputs.model,
        test_data=prep_step.outputs.test_data,
    )

    return {
        "trained_model": train_step.outputs.model,
        "metrics": eval_step.outputs.metrics,
    }

# Submit the pipeline
pipeline_job = training_pipeline(
    input_data=Input(
        type="uri_file",
        path="azureml://datastores/workspaceblobstore/paths/data/churn.csv",
    )
)
ml_client.jobs.create_or_update(pipeline_job)
```
## Practice Questions
**Question 1.** A data scientist needs an environment for interactive Jupyter notebook development and wants the VM to stop costing money when it is not in use. Which compute target should they use?

- A. Compute Cluster with min_instances=0
- B. Compute Instance with auto-shutdown enabled
- C. Managed Online Endpoint
- D. Azure Databricks cluster

**Answer:** B. Compute Instance with auto-shutdown enabled. Compute Instances are designed for individual development work, including Jupyter notebooks, and auto-shutdown stops the VM after a period of inactivity. Compute Clusters are for training jobs, not interactive notebook use. Managed Online Endpoints are for inference. Databricks is a separate service not directly managed within Azure ML Studio.
**Question 2.** You run fault-tolerant GPU training jobs that checkpoint regularly, and you need to minimize compute cost. Which compute should you choose?

- A. Compute Cluster with Dedicated tier and Standard_NC6 VMs
- B. Compute Cluster with Low Priority tier and Standard_NC6 VMs
- C. Compute Instance with Standard_NC6 VM
- D. Serverless Compute with GPU

**Answer:** B. Compute Cluster with Low Priority tier and Standard_NC6 VMs. Low Priority (Spot) VMs cost up to 80% less than Dedicated VMs. Since the jobs are fault-tolerant with checkpointing, preemption is acceptable. Compute Instances are for development, not batch training. Serverless Compute is an option, but Low Priority clusters give more cost control for predictable GPU workloads.
**Question 3.** You need to register an image dataset consisting of many files stored in a single directory. Which data asset type should you use?

- A. URI File
- B. URI Folder
- C. MLTable
- D. Azure Open Dataset

**Answer:** B. URI Folder. A URI Folder references a directory of files, making it ideal for image datasets with many files. URI File is for a single file. MLTable is for tabular/structured data with schema definitions. Azure Open Datasets are pre-existing public datasets, not a type for registering your own data.
**Question 4.** Which two of the following resources are automatically created when you provision an Azure ML workspace?

- A. Azure Key Vault
- B. Azure Kubernetes Service
- C. Storage Account
- D. Azure Container Registry
- E. Virtual Network

**Answer:** A (Azure Key Vault) and C (Storage Account). When you create an Azure ML workspace, Azure automatically provisions a Storage Account (default datastore), Key Vault (secrets), and Application Insights (monitoring). Container Registry is created on demand when you first build a custom environment or deploy a model. AKS and VNet must be created separately.
**Question 5.** You need to run a training pipeline automatically every night. What is the simplest way to do this?

- A. Create a cron schedule on the pipeline job
- B. Use Azure Logic Apps to trigger the pipeline
- C. Set up a recurring compute instance
- D. Use Azure Data Factory to orchestrate

**Answer:** A. Create a cron schedule on the pipeline job. Azure ML pipelines natively support cron-based scheduling using the SDK v2 or CLI v2. While Logic Apps and Data Factory can also trigger pipelines, the simplest and most direct approach is the built-in schedule. Compute instances do not provide recurring job scheduling.
Lilly Tech Systems