MLflow Deployment
Deploy MLflow models to production — as REST APIs, Docker containers, on Kubernetes, or in the cloud.
Serving Models Locally
Shell — MLflow models serve
# Serve a model from a run, using the local environment
# (MLflow 2.x; on MLflow 1.x use --no-conda instead)
mlflow models serve -m "runs:/abc123/model" -p 5001 --env-manager local
# Serve a model from the registry
mlflow models serve -m "models:/churn-predictor/Production" -p 5001
# Serve with specific workers
mlflow models serve -m "models:/churn-predictor/1" -p 5001 --workers 4
REST API Endpoint
MLflow serves models with a standard REST API:
Shell — Making predictions via REST API
# JSON input format
curl -X POST http://localhost:5001/invocations \
  -H "Content-Type: application/json" \
  -d '{
    "dataframe_split": {
      "columns": ["age", "income", "tenure", "num_products"],
      "data": [[35, 75000, 24, 3], [28, 45000, 6, 1]]
    }
  }'
# CSV input format
curl -X POST http://localhost:5001/invocations \
  -H "Content-Type: text/csv" \
  -d 'age,income,tenure,num_products
35,75000,24,3
28,45000,6,1'
# Health check
curl http://localhost:5001/health
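The same `dataframe_split` payload can be built programmatically. A minimal sketch using only the standard library (the URL and column names match the curl example above; the actual HTTP call is left as a comment because it requires a running server and the `requests` package):

```python
import json

# Build the dataframe_split payload that MLflow's scoring server expects
columns = ["age", "income", "tenure", "num_products"]
rows = [[35, 75000, 24, 3], [28, 45000, 6, 1]]
payload = json.dumps({"dataframe_split": {"columns": columns, "data": rows}})

print(payload)
# To send it against a running server:
# requests.post("http://localhost:5001/invocations",
#               headers={"Content-Type": "application/json"}, data=payload)
```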
Docker Deployment
Shell — Build and run Docker container
# Build a Docker image from a logged model
mlflow models build-docker \
  -m "models:/churn-predictor/Production" \
  -n "churn-predictor" \
  --enable-mlserver  # Use MLServer for better performance
# Run the container
docker run -p 5001:8080 churn-predictor
# With environment variables
docker run -p 5001:8080 \
  -e MLFLOW_TRACKING_URI=http://tracking-server:5000 \
  churn-predictor
Kubernetes Deployment
YAML — Kubernetes deployment for MLflow model
apiVersion: apps/v1
kind: Deployment
metadata:
  name: churn-predictor
  labels:
    app: churn-predictor
spec:
  replicas: 3
  selector:
    matchLabels:
      app: churn-predictor
  template:
    metadata:
      labels:
        app: churn-predictor
    spec:
      containers:
        - name: model
          image: churn-predictor:latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 15
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: churn-predictor-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: churn-predictor
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
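The liveness probe above only requires `/health` to respond; the same check is useful in a deploy script before routing traffic to a new pod. A generic polling helper as a sketch (the probe callable, timeout, and interval are illustrative; in practice the probe would issue an HTTP GET against the pod's `/health` endpoint):

```python
import time

def wait_for_healthy(probe, timeout=60.0, interval=2.0):
    """Call probe() until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False

# Demonstrate with a stub probe that becomes healthy on the third attempt
attempts = {"n": 0}
def stub_probe():
    attempts["n"] += 1
    return attempts["n"] >= 3

print(wait_for_healthy(stub_probe, timeout=5.0, interval=0.01))  # True
```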
Cloud Deployment
AWS SageMaker
Python — Deploy to SageMaker
import mlflow.sagemaker
# Deploy model to SageMaker
mlflow.sagemaker.deploy(
    app_name="churn-predictor",
    model_uri="models:/churn-predictor/Production",
    region_name="us-east-1",
    mode="create",
    instance_type="ml.m5.large",
    instance_count=2,
)
# Note: on MLflow 2.x this API is replaced by
# mlflow.deployments.get_deploy_client("sagemaker")
Azure ML
Python — Deploy to Azure ML
import mlflow.azureml
# Build an Azure ML container image for the model
# (build_image returns the image first, then the registered model;
# this API targets the Azure ML SDK v1 / older MLflow releases)
azure_image, azure_model = mlflow.azureml.build_image(
    model_uri="models:/churn-predictor/Production",
    workspace=workspace,
    model_name="churn-predictor",
)
# Deploy the image to Azure Container Instances (or AKS)
from azureml.core.webservice import AciWebservice, Webservice
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
service = Webservice.deploy_from_image(
    workspace=workspace,
    image=azure_image,
    name="churn-service",
    deployment_config=aci_config,
)
Batch Inference
Python — Batch predictions with MLflow
import mlflow
import pandas as pd
# Load production model
model = mlflow.pyfunc.load_model("models:/churn-predictor/Production")
# Load batch data
batch_data = pd.read_parquet("s3://data/daily_customers.parquet")
# Generate predictions
predictions = model.predict(batch_data)
# Save results
results = batch_data.assign(
    churn_prediction=predictions,
    prediction_date=pd.Timestamp.now(),
    model_version="Production",
)
results.to_parquet("s3://data/predictions/daily_churn.parquet")
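For very large batches, scoring in fixed-size chunks bounds memory instead of materializing everything at once. A minimal chunking sketch (the chunk size and list-based data are illustrative stand-ins; with pandas you would slice the DataFrame with `iloc` the same way):

```python
def iter_chunks(rows, chunk_size=10_000):
    """Yield successive slices of at most chunk_size rows."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]

# Score a toy batch in chunks with a stand-in "model"
batch = list(range(25))
predict = lambda chunk: [x * 2 for x in chunk]  # placeholder for model.predict
predictions = []
for chunk in iter_chunks(batch, chunk_size=10):
    predictions.extend(predict(chunk))

print(len(predictions))  # 25
```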
Monitoring Deployed Models
After deployment: Monitor prediction latency, throughput, error rates, and prediction distributions. Set up alerts for anomalies. Log all predictions for later analysis and drift detection. See the MLOps Monitoring lesson for detailed guidance.
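These metrics can start as a thin wrapper around the loaded model. A hypothetical sketch (the `MonitoredModel` class and its fields are not an MLflow API; a real deployment would export such counters to a metrics backend like Prometheus):

```python
import time

class MonitoredModel:
    """Wrap a model's predict() and record latency, call count, and errors."""

    def __init__(self, model):
        self.model = model
        self.latencies = []  # seconds per call
        self.calls = 0
        self.errors = 0

    def predict(self, data):
        self.calls += 1
        start = time.perf_counter()
        try:
            return self.model.predict(data)
        except Exception:
            self.errors += 1
            raise
        finally:
            self.latencies.append(time.perf_counter() - start)

# Usage with a stand-in model
class DummyModel:
    def predict(self, data):
        return [0] * len(data)

monitored = MonitoredModel(DummyModel())
monitored.predict([1, 2, 3])
print(monitored.calls, monitored.errors, len(monitored.latencies))  # 1 0 1
```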