MLflow Best Practices
Organize experiments effectively, collaborate with your team, integrate with CI/CD, and avoid common mistakes.
Experiment Organization
Structure your experiments for discoverability and clarity:
- One experiment per ML task: e.g., "churn-prediction", "fraud-detection", "recommendation-ranking".
- Use nested runs: Group related runs (hyperparameter sweeps, cross-validation folds) under a parent run.
- Separate dev and prod: Use different experiments for exploratory work vs. production training.
```python
import mlflow

# Experiment naming convention: team/project/task
mlflow.set_experiment("payments/fraud-detection/v2")

# Use nested runs for hyperparameter sweeps
with mlflow.start_run(run_name="hyperparam-sweep-2025-03") as parent:
    for lr in [0.001, 0.01, 0.1]:
        with mlflow.start_run(run_name=f"lr-{lr}", nested=True):
            mlflow.log_param("learning_rate", lr)
            model = train(lr=lr)
            mlflow.log_metric("accuracy", evaluate(model))
```
Naming Conventions
| Entity | Convention | Example |
|---|---|---|
| Experiments | team/project/task | payments/fraud-detection/v2 |
| Runs | descriptive-name | xgboost-tuned-v3, baseline-logistic |
| Registered Models | kebab-case | fraud-detector, churn-predictor |
| Tags | snake_case keys | data_version, feature_set, team |
Tagging Strategy
```python
with mlflow.start_run():
    # Standard tags to always set
    mlflow.set_tags({
        "team": "fraud-detection",
        "data_version": "v2.3",
        "feature_set": "v3",
        "environment": "development",
        "git_commit": get_git_commit(),
        "developer": "alice",
        "purpose": "hyperparameter_tuning",
    })
```
Storage Optimization
- Don't log huge artifacts: Avoid logging entire datasets. Log data checksums and paths instead.
- Clean up failed runs: Delete runs that failed or were abandoned to save storage.
- Use artifact TTLs: Set up lifecycle policies on your artifact storage (S3, GCS) to archive old artifacts.
- Compress artifacts: Log compressed versions of large files.
Team Collaboration
CI/CD Integration
```yaml
name: ML Training Pipeline

on:
  push:
    branches: [main]
    paths: ['src/**', 'configs/**']

jobs:
  train-and-register:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Train and evaluate
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
        run: |
          python train.py --config configs/production.yaml
          python evaluate.py --min-accuracy 0.90
      - name: Register model if improved
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
        run: python register_if_better.py
```
Migration Guide
Moving to MLflow from another tool? Key steps:
1. Install and configure: Set up the tracking server and choose your backend store and artifact location.
2. Start with new experiments: Don't try to migrate old data. Begin logging new experiments to MLflow.
3. Adopt autologging: Add `mlflow.autolog()` to existing training scripts for instant tracking with zero code changes.
4. Standardize gradually: Introduce naming conventions, tagging standards, and model registry workflows over time.
Common Pitfalls
- Not setting signatures: Without signatures, serving errors are hard to debug. Always infer and log the model signature.
- Logging too much: Don't log every intermediate computation. Focus on parameters, key metrics, and the final model.
- Ignoring the registry: Using run IDs in production code instead of model registry URIs makes updates difficult.
- No cleanup: Old experiments and artifacts accumulate. Set up regular cleanup jobs.
- Single-user mindset: Not setting up a shared tracking server limits collaboration and reproducibility.
Frequently Asked Questions
Can I use MLflow in Jupyter notebooks?
Absolutely. MLflow works seamlessly in Jupyter notebooks. Just import mlflow, set your tracking URI, and use the same API. Autologging works in notebooks too. The MLflow UI runs in a separate browser tab for viewing results.

How should I handle secrets and credentials?
Never log sensitive values as parameters. Use environment variables for secrets and log only non-sensitive configuration. If you need to track which credentials were used, log a reference (e.g., "secret_version=v3") rather than the actual value.

Can I use MLflow from languages other than Python?
Yes. MLflow has an R client (install.packages("mlflow")) and a Java/Scala client. The REST API can also be used from any language. However, the Python client has the most features and the best framework integration.

How do I back up MLflow data?
Back up two things: (1) the backend store (database dump of PostgreSQL/MySQL) and (2) the artifact store (S3 bucket replication, GCS versioning, etc.). For SQLite, simply copy the mlruns directory. Set up automated backups on a schedule.

What is the difference between log_model and register_model?
log_model saves the model as an artifact within a run. register_model adds it to the Model Registry with a name and version. You can do both at once by passing registered_model_name to log_model. Think of logging as saving, and registering as publishing.