Intermediate
MLflow on Databricks
Master experiment tracking, model registry, and model serving using Databricks' fully managed MLflow integration for end-to-end ML lifecycle management.
Managed MLflow on Databricks
MLflow is an open-source platform for managing the ML lifecycle, created by Databricks. On the Databricks platform, MLflow is fully managed with zero setup — experiment tracking, the model registry, and model serving are pre-configured and integrated with Unity Catalog.
Good to know: MLflow on Databricks automatically logs experiments from notebook runs, provides a visual UI for comparing models, and integrates with Unity Catalog for governed model management across workspaces.
Experiment Tracking
Track every training run with parameters, metrics, and artifacts:
MLflow Experiment Tracking
```python
import mlflow
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Auto-logging captures parameters, metrics, and the trained model
mlflow.autolog()

with mlflow.start_run(run_name="rf_experiment"):
    # Train model (X_train, y_train, X_test, y_test are assumed to be defined)
    model = RandomForestClassifier(n_estimators=100, max_depth=10)
    model.fit(X_train, y_train)

    # Log custom metrics
    predictions = model.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)
    mlflow.log_metric("test_accuracy", accuracy)

    # Log artifacts (plots, data samples, etc.)
    mlflow.log_artifact("feature_importance.png")
```
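Runs logged this way can also be compared programmatically rather than in the UI. A minimal sketch, assuming an experiment already populated with runs that log `test_accuracy` (the experiment name and the pure helper are illustrative, not part of the MLflow API):

```python
def pick_best(accuracy_by_run):
    """Return the run ID with the highest test accuracy (pure helper)."""
    return max(accuracy_by_run, key=accuracy_by_run.get)


def best_run_id(experiment_name):
    """Query the tracking server for the top run by test_accuracy."""
    import mlflow  # imported lazily; requires a Databricks (or local) tracking server

    runs = mlflow.search_runs(
        experiment_names=[experiment_name],
        order_by=["metrics.test_accuracy DESC"],
        max_results=1,
    )
    return runs.loc[0, "run_id"]
```

`mlflow.search_runs` returns a pandas DataFrame, so the best run can be fed directly into model registration.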
Model Registry with Unity Catalog
The Unity Catalog model registry provides governed model management:
| Feature | Description |
|---|---|
| Model versioning | Track every model version with lineage back to training data and code |
| Aliases | Point aliases such as @champion or @challenger at versions to drive deployments (the Unity Catalog registry replaces the legacy None → Staging → Production stages) |
| Access control | Unity Catalog permissions govern who can read, write, or deploy models |
| Cross-workspace | Share models across workspaces using Unity Catalog's three-level namespace |
| Lineage | Automatic lineage from model to training run, datasets, and notebooks |
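Registering a model into Unity Catalog and promoting it with an alias might look like the sketch below. The catalog, schema, and model names are hypothetical, and the registry calls require an authenticated Databricks workspace:

```python
def uc_model_name(catalog, schema, model):
    """Build Unity Catalog's three-level model name."""
    return f"{catalog}.{schema}.{model}"


def register_and_promote(run_id, full_name, alias="champion"):
    """Register a run's logged model in Unity Catalog and point an alias at it."""
    import mlflow  # imported lazily; requires a Databricks workspace
    from mlflow import MlflowClient

    mlflow.set_registry_uri("databricks-uc")  # target the Unity Catalog registry
    version = mlflow.register_model(f"runs:/{run_id}/model", full_name)
    MlflowClient().set_registered_model_alias(full_name, alias, version.version)
    return version.version
```

Downstream code can then load the model by alias (for example `models:/main.ml.churn_model@champion`) so deployments follow the alias rather than a hard-coded version number.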
Model Serving
Deploy models as production REST API endpoints directly from the registry:
- Serverless serving: Auto-scaling endpoints with pay-per-request pricing and zero infrastructure management
- GPU serving: Serve large models and LLMs on GPU instances with optimized inference
- A/B testing: Route traffic between model versions for safe production rollouts
- Feature serving: Low-latency feature lookup integrated with Feature Store
- Monitoring: Built-in payload logging, drift detection, and performance dashboards
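A served model is invoked over plain HTTPS. The sketch below uses only the standard library; the workspace URL, endpoint name, and token are placeholders you would supply:

```python
import json
import urllib.request


def build_invocation(records):
    """Build the JSON payload for a serving endpoint's /invocations route."""
    return json.dumps({"dataframe_records": records}).encode("utf-8")


def score(workspace_url, endpoint, token, records):
    """POST feature records to a Databricks model serving endpoint."""
    req = urllib.request.Request(
        f"{workspace_url}/serving-endpoints/{endpoint}/invocations",
        data=build_invocation(records),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The `dataframe_records` payload shape mirrors a pandas DataFrame row-by-row; endpoints also accept a `dataframe_split` form for column-oriented input.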
Feature Store
Databricks Feature Store provides centralized feature management:
- Define features as Delta tables governed by Unity Catalog
- Automatic feature lookup at training and inference time
- Point-in-time lookups to prevent data leakage in time-series problems
- Online feature serving with low-latency lookups for real-time models
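The point-in-time guarantee can be illustrated without the Feature Store API at all: for each training example, only the latest feature value observed at or before the label's timestamp may be joined in. A pure-Python sketch of that rule:

```python
def point_in_time_lookup(feature_history, label_ts):
    """Return the most recent feature value recorded at or before label_ts.

    feature_history: list of (timestamp, value) pairs, sorted ascending.
    Values recorded after label_ts are future data and must be excluded --
    exactly the leakage a point-in-time join prevents.
    """
    eligible = [value for ts, value in feature_history if ts <= label_ts]
    return eligible[-1] if eligible else None
```

For example, with history `[(1, 10.0), (5, 12.5), (9, 99.0)]` and a label at time 6, the lookup returns 12.5; the value recorded at time 9 is ignored because it did not yet exist when the label was observed.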
Key takeaway: MLflow on Databricks provides a complete ML lifecycle platform. Combined with Unity Catalog governance, it enables enterprises to track experiments, manage model versions, deploy to production, and maintain full auditability of their ML assets.