Model Deployment
Deploy trained models to production using Watson Machine Learning — online endpoints, batch scoring, REST APIs, model versioning, and monitoring.
Watson Machine Learning (WML)
Watson Machine Learning is IBM's service for deploying, managing, and monitoring AI models in production. It supports models built with scikit-learn, TensorFlow, Keras, PyTorch, and other frameworks.
Deployment Workflow
- Save the model — Serialize the trained model (joblib for scikit-learn, SavedModel for TensorFlow)
- Store in WML — Upload the model to the Watson ML repository with metadata
- Create a deployment — Deploy as an online endpoint or batch job
- Test the endpoint — Send test data to verify predictions
- Integrate — Connect the endpoint to your application via REST API
# Deploy a model using the IBM Watson ML Python client
from ibm_watson_machine_learning import APIClient

# Connect to WML
wml_credentials = {
    "url": "https://us-south.ml.cloud.ibm.com",
    "apikey": "your-api-key"
}
client = APIClient(wml_credentials)
# Select the deployment space that will hold the model ("your-space-id" is a placeholder)
client.set.default_space("your-space-id")

# Store the model (trained_model is the fitted scikit-learn estimator from training)
model_details = client.repository.store_model(
    model=trained_model,
    meta_props={
        client.repository.ModelMetaNames.NAME: "churn-predictor",
        client.repository.ModelMetaNames.TYPE: "scikit-learn_1.1",
        client.repository.ModelMetaNames.SOFTWARE_SPEC_UID:
            client.software_specifications.get_id_by_name("runtime-23.1-py3.10")
    }
)
# Look up the ID of the stored model; the deployment call below needs it
model_id = client.repository.get_model_id(model_details)

# Deploy as online endpoint
deployment = client.deployments.create(
    artifact_uid=model_id,
    meta_props={
        client.deployments.ConfigurationMetaNames.NAME: "churn-deployment",
        client.deployments.ConfigurationMetaNames.ONLINE: {}
    }
)
Deployment Types
Online Deployment
Creates a REST API endpoint for real-time predictions. Each request receives an immediate response.
- Use case: Real-time scoring (fraud detection, recommendations, chatbots)
- Latency: Milliseconds to seconds
- Scaling: IBM manages autoscaling based on request volume
Batch Deployment
Processes large datasets in bulk. Submit a job with input data and retrieve results when complete.
- Use case: Scoring entire customer databases, nightly reports, bulk predictions
- Cost: More cost-effective for large volumes than individual API calls
- Data sources: Can read from Cloud Object Storage, databases, or inline data
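The "submit a job, then retrieve results when complete" pattern for batch deployments usually comes down to polling the job state until it reaches a terminal value. The sketch below shows one way to do that. The helper name `wait_for_job`, the `get_state` callable, and the state names in `TERMINAL_STATES` are assumptions made for illustration; check the states your WML version actually reports before relying on them.

```python
import time
from typing import Callable

# Assumed terminal job states; verify against the states your service returns.
TERMINAL_STATES = {"completed", "failed", "canceled"}


def wait_for_job(
    get_state: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until the job reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {state!r} after {timeout_seconds}s")
        sleep(poll_seconds)


# Simulated job that is queued, then running, then done.
simulated_states = iter(["queued", "running", "completed"])
final_state = wait_for_job(lambda: next(simulated_states), poll_seconds=0)
print(final_state)
```

In a real integration, `get_state` would be a small wrapper that asks the service for the current status of the submitted batch job and returns its state string.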
REST API Integration
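# Score records against the deployed model via the REST API
import requests

# IAM bearer token for the WML service (placeholder; obtain a real token from IBM Cloud IAM)
token = "your-iam-token"
scoring_url = "https://us-south.ml.cloud.ibm.com/..."
headers = {
    "Authorization": "Bearer " + token,
    "Content-Type": "application/json"
}
# Each row in "values" follows the column order given in "fields"
payload = {
    "input_data": [{
        "fields": ["age", "tenure", "monthly_charges"],
        "values": [[35, 12, 65.50], [55, 3, 85.00]]
    }]
}
response = requests.post(scoring_url, json=payload, headers=headers, timeout=30)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error body
predictions = response.json()["predictions"]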
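Building the request body by hand makes it easy to send rows whose length does not match the field list. The sketch below wraps the payload shape used above in a small validating helper and flattens a response into one dictionary per scored row. Both function names are hypothetical, and the response layout (a list of blocks, each with "fields" and "values") is assumed to mirror the request format; confirm it against a real response from your deployment.

```python
from typing import Any


def build_scoring_payload(fields: list[str], rows: list[list[Any]]) -> dict:
    """Build an {"input_data": [...]} body, checking that every row matches the field list."""
    for i, row in enumerate(rows):
        if len(row) != len(fields):
            raise ValueError(f"row {i} has {len(row)} values, expected {len(fields)}")
    return {"input_data": [{"fields": list(fields), "values": [list(r) for r in rows]}]}


def predictions_to_records(predictions: list[dict]) -> list[dict]:
    """Turn [{"fields": [...], "values": [[...], ...]}] into one dict per scored row."""
    records = []
    for block in predictions:
        names = block["fields"]
        for row in block["values"]:
            records.append(dict(zip(names, row)))
    return records


payload = build_scoring_payload(
    ["age", "tenure", "monthly_charges"],
    [[35, 12, 65.50], [55, 3, 85.00]],
)

# Made-up response content, used only to show the reshaping step.
sample_predictions = [{
    "fields": ["prediction", "probability"],
    "values": [[0, [0.9, 0.1]], [1, [0.2, 0.8]]],
}]
records = predictions_to_records(sample_predictions)
```

The resulting `payload` can be passed as the `json=` argument of the request shown earlier, and `records` is easier to log or join back to the input rows than the nested response.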
Model Versioning and Lifecycle
- Model versioning — Store multiple versions of the same model in the WML repository
- A/B deployment — Route traffic between model versions to compare performance
- Rollback — Quickly revert to a previous model version if the new one underperforms
- Model lineage — Track which data and code produced each model version
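One way to picture A/B deployment and rollback is a small application-side router that sends a fixed share of users to the candidate version. The sketch below is an illustration under stated assumptions, not a WML feature: the function names and the two endpoint URLs are hypothetical, and the split is done by hashing a user ID so that each user consistently reaches the same version.

```python
import hashlib

# Hypothetical scoring URLs for two deployed versions of the same model.
ENDPOINTS = {
    "A": "https://example.invalid/deployments/current/predictions",
    "B": "https://example.invalid/deployments/candidate/predictions",
}


def assign_variant(user_id: str, b_fraction: float = 0.1) -> str:
    """Return "A" or "B" for a user; the same user always gets the same answer."""
    if not 0.0 <= b_fraction <= 1.0:
        raise ValueError("b_fraction must be between 0 and 1")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform integer in [0, 2**64)
    return "B" if bucket < b_fraction * 2**64 else "A"


def scoring_url_for(user_id: str, b_fraction: float = 0.1) -> str:
    """Pick the scoring URL for this user's assigned model version."""
    return ENDPOINTS[assign_variant(user_id, b_fraction)]


# Roughly 10% of users are routed to the candidate version.
share_b = sum(assign_variant(f"user-{i}") == "B" for i in range(10_000)) / 10_000
```

Under this scheme, rollback amounts to setting `b_fraction` to 0 so that all traffic returns to version A, and promotion amounts to raising it to 1.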
Model Monitoring
IBM Watson OpenScale (now part of watsonx.governance) monitors deployed models:
- Quality monitoring — Track accuracy, precision, recall over time with feedback data
- Drift monitoring — Detect when input data distributions shift from training data
- Fairness monitoring — Check for bias across protected attributes (age, gender, race)
- Explainability — Generate explanations for individual predictions (SHAP-based)
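To make the drift and fairness ideas concrete, the sketch below computes two widely used stand-in metrics in plain Python: the Population Stability Index (PSI) for a shift in one numeric feature, and the disparate impact ratio for a difference in favorable-outcome rates between two groups. These are illustrative choices and are not claimed to be the calculations the monitoring service performs; the function names, bin count, and sample data are assumptions.

```python
import math


def psi(expected: list[float], actual: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `actual` against `expected`, using equal-width bins."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal

    def shares(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        return [max(c / len(values), eps) for c in counts]  # eps avoids log(0)

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def disparate_impact(outcomes, groups, monitored, reference, favorable=1) -> float:
    """Favorable-outcome rate of the monitored group divided by that of the reference group."""
    def rate(group):
        selected = [o for o, g in zip(outcomes, groups) if g == group]
        if not selected:
            raise ValueError(f"no records for group {group!r}")
        return sum(1 for o in selected if o == favorable) / len(selected)

    reference_rate = rate(reference)
    if reference_rate == 0:
        raise ValueError("reference group has no favorable outcomes")
    return rate(monitored) / reference_rate


# Made-up data: a feature at training time versus the same feature shifted upward.
training_values = [float(v) for v in range(100)]
production_values = [v + 50.0 for v in training_values]
drift_score = psi(training_values, production_values)

# Made-up predictions: 3 of 4 favorable for group "m", 1 of 4 for group "f".
di_ratio = disparate_impact([1, 1, 1, 0, 1, 0, 0, 0], ["m"] * 4 + ["f"] * 4, "f", "m")
```

As common rules of thumb, a PSI above about 0.25 is read as a substantial shift, and a disparate impact ratio below 0.8 (the "four-fifths rule") is treated as a signal worth investigating.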
Practice Questions
A) Online deployment
B) Batch deployment
C) Edge deployment
D) Streaming deployment
Answer:
B) Batch deployment. Batch deployment is designed for processing large volumes of data in bulk. Scoring 5 million records individually through an online endpoint would be slow and expensive. Batch jobs process the entire dataset efficiently in a single run.
A) The REST API is failing
B) Data drift
C) The model is overfitting
D) The server ran out of memory
Answer:
B) Data drift. When a model's accuracy degrades over time without any changes to the model itself, the most likely cause is that the input data distribution has shifted away from the data the model was trained on. Retraining on recent data typically restores accuracy.
A) Create a REST endpoint
B) Monitor for drift
C) Store the trained model in the WML repository
D) Send scoring requests
Answer:
C) Store the trained model in the WML repository. Before creating a deployment, you must first store the trained model in the Watson ML repository with metadata (name, type, software spec). Then you create a deployment from the stored model, and finally you can send scoring requests.
A) Batch deployment
B) Online deployment
C) Scheduled job
D) Manual scoring
Answer:
B) Online deployment. Real-time fraud detection requires immediate responses. An online deployment creates a REST API endpoint that returns predictions in milliseconds, meeting the 200ms latency requirement. Batch deployment processes data in bulk and is not suitable for real-time use cases.
A) Quality monitoring
B) Drift monitoring
C) Fairness monitoring
D) Performance monitoring
Answer:
C) Fairness monitoring. Fairness monitoring in watsonx.governance compares model predictions across protected attributes (age, gender, race, etc.) to detect disparate impact or bias. It raises an alert when predictions show unfair patterns and suggests debiasing actions.