MLflow Best Practices

Organize experiments effectively, collaborate with your team, integrate with CI/CD, and avoid common mistakes.

Experiment Organization

Structure your experiments for discoverability and clarity:

  • One experiment per ML task: e.g., "churn-prediction", "fraud-detection", "recommendation-ranking".
  • Use nested runs: Group related runs (hyperparameter sweeps, cross-validation folds) under a parent run.
  • Separate dev and prod: Use different experiments for exploratory work vs. production training.
Python — Organized experiment structure
import mlflow

# Experiment naming convention: team/project/task
mlflow.set_experiment("payments/fraud-detection/v2")

# Use nested runs for hyperparameter sweeps
with mlflow.start_run(run_name="hyperparam-sweep-2025-03") as parent:
    for lr in [0.001, 0.01, 0.1]:
        with mlflow.start_run(run_name=f"lr-{lr}", nested=True):
            mlflow.log_param("learning_rate", lr)
            model = train(lr=lr)
            mlflow.log_metric("accuracy", evaluate(model))

Naming Conventions

Entity             Convention          Example
Experiments        team/project/task   payments/fraud-detection/v2
Runs               descriptive-name    xgboost-tuned-v3, baseline-logistic
Registered Models  kebab-case          fraud-detector, churn-predictor
Tags               snake_case keys     data_version, feature_set, team
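Conventions like these are easy to drift from over time. One way to keep them honest is a small helper that validates names before logging; a sketch (the helper and regex are illustrative, not part of MLflow):

```python
import re

KEBAB = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")

def experiment_name(team: str, project: str, task: str) -> str:
    """Build a team/project/task experiment name from kebab-case segments."""
    for part in (team, project, task):
        if not KEBAB.fullmatch(part):
            raise ValueError(f"segment is not kebab-case: {part!r}")
    return f"{team}/{project}/{task}"

# mlflow.set_experiment(experiment_name("payments", "fraud-detection", "v2"))
```

Calling the helper everywhere makes a malformed name fail fast at logging time instead of surfacing later as an undiscoverable experiment.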

Tagging Strategy

Python — Recommended tags
with mlflow.start_run():
    # Standard tags to always set
    mlflow.set_tags({
        "team": "fraud-detection",
        "data_version": "v2.3",
        "feature_set": "v3",
        "environment": "development",
        "git_commit": get_git_commit(),
        "developer": "alice",
        "purpose": "hyperparameter_tuning",
    })

Storage Optimization

  • Don't log huge artifacts: Avoid logging entire datasets. Log data checksums and paths instead.
  • Clean up failed runs: Delete runs that failed or were abandoned to save storage.
  • Use artifact TTLs: Set up lifecycle policies on your artifact storage (S3, GCS) to archive old artifacts.
  • Compress artifacts: Log compressed versions of large files.
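The first point in practice: hash the dataset and log the digest plus its location, not the bytes. A sketch using only the standard library (the parameter names and paths are illustrative):

```python
import hashlib

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large datasets never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Inside a run, log the reference instead of the data:
# mlflow.log_params({
#     "train_data_path": "s3://my-bucket/datasets/train-v2.3.parquet",
#     "train_data_sha256": file_sha256("/local/cache/train-v2.3.parquet"),
# })
```

The checksum gives you reproducibility (you can verify you are training on the same bytes) at a few dozen characters of storage instead of gigabytes.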

Team Collaboration

Shared tracking server: Set up a central tracking server so all team members log to the same place. Use experiment-level permissions to control access. Standardize tags so everyone can find and compare relevant runs.
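"Standardize tags" is easiest to enforce with a shared validation helper that every training script calls before logging. A sketch (the required-tag set is an illustrative team choice, not an MLflow feature):

```python
import re

REQUIRED_TAGS = {"team", "data_version", "environment"}  # illustrative team standard
SNAKE_CASE = re.compile(r"[a-z][a-z0-9_]*")

def validate_tags(tags: dict) -> dict:
    """Reject runs whose tags would be undiscoverable by teammates."""
    missing = REQUIRED_TAGS - tags.keys()
    if missing:
        raise ValueError(f"missing required tags: {sorted(missing)}")
    bad = [k for k in tags if not SNAKE_CASE.fullmatch(k)]
    if bad:
        raise ValueError(f"tag keys must be snake_case: {bad}")
    return tags

# with mlflow.start_run():
#     mlflow.set_tags(validate_tags({"team": "fraud-detection",
#                                    "data_version": "v2.3",
#                                    "environment": "development"}))
```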

CI/CD Integration

YAML — GitHub Actions with MLflow
name: ML Training Pipeline

on:
  push:
    branches: [main]
    paths: ['src/**', 'configs/**']

jobs:
  train-and-register:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Train and evaluate
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
        run: |
          python train.py --config configs/production.yaml
          python evaluate.py --min-accuracy 0.90

      - name: Register model if improved
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
        run: python register_if_better.py

Migration Guide

Moving to MLflow from another tool? Key steps:

  1. Install and configure

    Set up the tracking server, choose your backend store and artifact location.

  2. Start with new experiments

    Don't try to migrate old data. Begin logging new experiments to MLflow.

  3. Adopt autologging

    Add mlflow.autolog() to existing training scripts for instant tracking with a single added line — no other code changes needed.

  4. Standardize gradually

    Introduce naming conventions, tagging standards, and model registry workflows over time.

Common Pitfalls

  • Not setting signatures: Without signatures, serving errors are hard to debug. Always infer and log the model signature.
  • Logging too much: Don't log every intermediate computation. Focus on parameters, key metrics, and the final model.
  • Ignoring the registry: Using run IDs in production code instead of model registry URIs makes updates difficult.
  • No cleanup: Old experiments and artifacts accumulate. Set up regular cleanup jobs.
  • Single-user mindset: Not setting up a shared tracking server limits collaboration and reproducibility.

Frequently Asked Questions

Can I use MLflow in Jupyter notebooks?

Absolutely. MLflow works seamlessly in Jupyter notebooks. Just import mlflow, set your tracking URI, and use the same API. Autologging works in notebooks too. The MLflow UI runs in a separate browser tab for viewing results.

How should I handle secrets and credentials?

Never log sensitive values as parameters. Use environment variables for secrets and log only non-sensitive configuration. If you need to track which credentials were used, log a reference (e.g., "secret_version=v3") rather than the actual value.

Can I use MLflow from languages other than Python?

Yes. MLflow has an R client (install.packages("mlflow")) and a Java/Scala client. The REST API can also be used from any language. However, the Python client has the most features and the best framework integration.

How do I back up my MLflow data?

Back up two things: (1) the backend store (database dump of PostgreSQL/MySQL) and (2) the artifact store (S3 bucket replication, GCS versioning, etc.). For SQLite, simply copy the mlruns directory. Set up automated backups on a schedule.

What is the difference between log_model and register_model?

log_model saves the model as an artifact within a run. register_model adds it to the Model Registry with a name and version. You can do both at once by passing registered_model_name to log_model. Think of logging as saving, and registering as publishing.