MLflow Tracking
MLflow Tracking is the most heavily tested component on the certification exam (~30% of questions). This lesson covers experiments and runs; logging parameters, metrics, and artifacts; autologging; the search API; and the tracking server architecture, and closes with practice questions that mirror the exam format.
Experiments and Runs
The fundamental units of MLflow Tracking are experiments and runs. An experiment is a named group of runs (like a project), and a run is a single execution of your ML code where you log data.
import mlflow

# Set or create an experiment
mlflow.set_experiment("my-classification-experiment")

# Start a run (context manager ensures run ends properly)
with mlflow.start_run(run_name="random-forest-v1"):
    # Your ML code goes here
    # Everything logged inside this block belongs to this run
    pass

# You can also nest runs for hyperparameter sweeps
with mlflow.start_run(run_name="parent-sweep"):
    for n_estimators in [50, 100, 200]:
        with mlflow.start_run(run_name=f"rf-{n_estimators}", nested=True):
            # Train model with this config
            pass
Know the difference between mlflow.set_experiment(), which sets the active experiment by name and creates it if it does not exist, and mlflow.create_experiment(), which creates a new experiment, returns its ID, and raises an error if the experiment already exists.
Logging Parameters
Parameters are key-value pairs that describe the configuration of your run. They are typically hyperparameters or settings that do not change during training.
import mlflow

with mlflow.start_run():
    # Log a single parameter
    mlflow.log_param("n_estimators", 100)
    mlflow.log_param("max_depth", 5)
    mlflow.log_param("learning_rate", 0.01)

    # Log multiple parameters at once (more efficient)
    mlflow.log_params({
        "n_estimators": 100,
        "max_depth": 5,
        "learning_rate": 0.01,
        "model_type": "random_forest"
    })

# IMPORTANT: Parameter values are stored as strings
# mlflow.log_param("epochs", 50) stores "50" not 50
# Parameters are immutable - you cannot update them after logging
Logging Metrics
Metrics are numeric values that you want to track over time. Unlike parameters, metrics can be logged multiple times (e.g., loss at each epoch) to create a time series.
import mlflow

with mlflow.start_run():
    # Log a single metric
    mlflow.log_metric("accuracy", 0.95)
    mlflow.log_metric("f1_score", 0.92)

    # Log multiple metrics at once
    mlflow.log_metrics({
        "accuracy": 0.95,
        "f1_score": 0.92,
        "precision": 0.94,
        "recall": 0.90
    })

    # Log metric with step (for time series - e.g., loss per epoch)
    for epoch in range(10):
        train_loss = 1.0 / (epoch + 1)  # Example loss
        mlflow.log_metric("train_loss", train_loss, step=epoch)

# IMPORTANT: Metrics are numeric (int or float)
# Metrics CAN be updated - logging same key adds a new step
# Use step parameter for epoch-level tracking
Logging Artifacts
Artifacts are output files like models, plots, data files, or any other files you want to associate with a run.
import mlflow
import matplotlib.pyplot as plt

with mlflow.start_run():
    # Log a single file as an artifact
    mlflow.log_artifact("model.pkl")

    # Log a file into a specific subdirectory
    mlflow.log_artifact("confusion_matrix.png", artifact_path="plots")

    # Log an entire directory of artifacts
    mlflow.log_artifacts("./output_dir", artifact_path="results")

    # Log a dict as a JSON artifact
    mlflow.log_dict({"threshold": 0.5, "classes": ["cat", "dog"]}, "config.json")

    # Log a figure directly (matplotlib)
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [1, 4, 9])
    mlflow.log_figure(fig, "training_curve.png")

    # Log text content
    mlflow.log_text("Model trained on 2026-03-21", "notes.txt")

# IMPORTANT: Artifacts are stored in the artifact store (local or remote)
# log_artifact() logs a single file
# log_artifacts() logs all files in a directory
Autologging
MLflow autologging automatically captures parameters, metrics, and models for supported ML frameworks. This is a major exam topic.
import mlflow
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Enable autologging for all supported frameworks
mlflow.autolog()

# Or enable for a specific framework
mlflow.sklearn.autolog()
mlflow.tensorflow.autolog()
mlflow.pytorch.autolog()
mlflow.xgboost.autolog()
mlflow.lightgbm.autolog()
mlflow.spark.autolog()

# Example: sklearn autologging
mlflow.sklearn.autolog()
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Autologging captures everything automatically:
# - Parameters: n_estimators, max_depth, etc.
# - Metrics: accuracy, f1, precision, recall
# - Artifacts: model, confusion matrix, feature importances
# - Model signature and input example

with mlflow.start_run():
    clf = RandomForestClassifier(n_estimators=100)
    clf.fit(X_train, y_train)  # Autolog triggers on .fit()

# Disable autologging
mlflow.sklearn.autolog(disable=True)

# EXAM TIP: Autologging is triggered by model.fit()
# It does NOT require mlflow.start_run() - it creates one if needed
# Know which frameworks support autologging (sklearn, tf, pytorch, xgboost, etc.)
Searching and Querying Runs
The MLflow search API lets you query runs across experiments using a SQL-like filter syntax. This is frequently tested on the exam.
import mlflow

# Search runs in an experiment
runs = mlflow.search_runs(
    experiment_ids=["1"],
    filter_string="metrics.accuracy > 0.9 AND params.model_type = 'random_forest'",
    order_by=["metrics.accuracy DESC"],
    max_results=10
)

# Filter string syntax examples
filters = [
    "metrics.accuracy > 0.9",                       # Metric filter
    "params.n_estimators = '100'",                  # Param filter (string value!)
    "tags.mlflow.runName = 'best-model'",           # Tag filter
    "attributes.status = 'FINISHED'",               # Status filter
    "metrics.f1 > 0.8 AND params.max_depth = '5'",  # Combined filters
    "params.model_type LIKE 'random%'",             # LIKE operator
    "params.model_type IN ('rf', 'xgb')",           # IN operator
]

# The result is a pandas DataFrame with all run data
print(runs[["run_id", "params.n_estimators", "metrics.accuracy"]])
# Using MlflowClient for more control
from mlflow.tracking import MlflowClient
from mlflow.entities import ViewType

client = MlflowClient()
runs = client.search_runs(
    experiment_ids=["1"],
    filter_string="metrics.accuracy > 0.9",
    run_view_type=ViewType.ACTIVE_ONLY
)

# EXAM TIP: Parameter values in filters are ALWAYS strings (quoted)
# Metric values are numeric (not quoted)
# Know the difference: mlflow.search_runs() returns a pandas DataFrame,
# while client.search_runs() returns a list of Run objects
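The quoting rule in the exam tip (metrics numeric, params quoted) can be encoded in a small helper. This is plain Python, not part of the MLflow API; build_filter is a hypothetical name used only for illustration:

```python
def build_filter(metrics=None, params=None):
    """Build an MLflow-style filter string (hypothetical helper).

    Metric values stay numeric and unquoted; param values are always
    rendered as quoted strings, matching MLflow's filter syntax.
    """
    clauses = []
    for key, (op, value) in (metrics or {}).items():
        clauses.append(f"metrics.{key} {op} {value}")  # unquoted numeric
    for key, value in (params or {}).items():
        clauses.append(f"params.{key} = '{value}'")    # quoted string
    return " AND ".join(clauses)

f = build_filter(metrics={"accuracy": (">", 0.9)}, params={"model_type": "xgboost"})
print(f)  # metrics.accuracy > 0.9 AND params.model_type = 'xgboost'
```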
Tracking Server Architecture
Understanding the MLflow Tracking Server components is essential for the exam. The server uses two storage components: a backend store for metadata and an artifact store for files.
# MLflow Tracking Server - Two storage components
tracking_architecture = {
    "Backend Store": {
        "purpose": "Stores experiment/run metadata, parameters, metrics, tags",
        "options": [
            "File store (local ./mlruns directory - default)",
            "Database store (SQLite, MySQL, PostgreSQL)",
        ],
        "configured_with": "--backend-store-uri",
        "examples": [
            "sqlite:///mlflow.db",
            "postgresql://user:pass@host:5432/mlflow",
            "mysql://user:pass@host:3306/mlflow"
        ]
    },
    "Artifact Store": {
        "purpose": "Stores artifacts (models, plots, data files)",
        "options": [
            "Local filesystem (default)",
            "Amazon S3",
            "Azure Blob Storage",
            "Google Cloud Storage",
            "HDFS"
        ],
        "configured_with": "--default-artifact-root",
        "examples": [
            "s3://my-bucket/mlflow-artifacts",
            "wasbs://container@account.blob.core.windows.net/mlflow",
            "gs://my-bucket/mlflow-artifacts"
        ]
    }
}
# Start a tracking server with remote storage
# mlflow server \
# --backend-store-uri postgresql://user:pass@host/mlflow \
# --default-artifact-root s3://my-bucket/artifacts \
# --host 0.0.0.0 \
# --port 5000
# Connect client to remote tracking server
# mlflow.set_tracking_uri("http://my-server:5000")
# EXAM TIP: Know the difference between backend store and artifact store
# Backend store = metadata (params, metrics, tags)
# Artifact store = files (models, plots, data)
Practice Questions
Test your understanding of MLflow Tracking with these exam-style questions.
Question 1
Which function would you use to log multiple hyperparameters at once?
A) mlflow.log_param()
B) mlflow.log_params()
C) mlflow.log_metrics()
D) mlflow.set_params()
Show Answer
B) mlflow.log_params() — This accepts a dictionary of key-value pairs and logs them all as parameters. log_param() logs a single parameter. log_metrics() is for metrics, not parameters. set_params() does not exist in the MLflow API.
Question 2
What is the correct filter string to find runs where accuracy is greater than 0.9 and model_type parameter is "xgboost"?
A) "accuracy > 0.9 AND model_type = 'xgboost'"
B) "metrics.accuracy > 0.9 AND params.model_type = 'xgboost'"
C) "metrics.accuracy > '0.9' AND params.model_type = xgboost"
D) "run.accuracy > 0.9 AND run.model_type = 'xgboost'"
Show Answer
B) — Metric filters use metrics. prefix with unquoted numeric values. Parameter filters use params. prefix with quoted string values. Both prefixes are required in the filter string.
Question 3
When using mlflow.sklearn.autolog(), what triggers the automatic logging?
A) Calling mlflow.start_run()
B) Calling model.fit()
C) Calling mlflow.end_run()
D) Calling model.predict()
Show Answer
B) model.fit() — Autologging is triggered when .fit() is called on the model. If no active run exists, autologging creates one automatically. It does not require an explicit mlflow.start_run().
Question 4
What does the backend store in MLflow Tracking Server store?
A) Model files and plot images
B) Parameters, metrics, tags, and run metadata
C) Docker images and conda environments
D) Source code and Git commits
Show Answer
B) — The backend store holds experiment/run metadata including parameters, metrics, and tags. Model files, plots, and other output files go in the artifact store. This is a key distinction for the exam.
Question 5
Which statement about MLflow parameters is TRUE?
A) Parameters can be updated after logging
B) Parameters are stored as their original Python types
C) Parameters are immutable and stored as strings
D) Parameters can be logged with a step value like metrics
Show Answer
C) — Parameters are immutable (cannot be changed once logged) and are always stored as strings regardless of the Python type passed in. Metrics, not parameters, support step values for time series logging.
Question 6
You want to log all files in a directory called "output" as artifacts under the path "results". Which call is correct?
A) mlflow.log_artifact("output", "results")
B) mlflow.log_artifacts("output", "results")
C) mlflow.log_artifact("output/", artifact_path="results")
D) mlflow.log_dir("output", "results")
Show Answer
B) mlflow.log_artifacts("output", "results") — log_artifacts() (plural) logs all files in a directory. log_artifact() (singular) logs a single file. log_dir() does not exist in the MLflow API.
Key Takeaways
- MLflow Tracking has two core concepts: experiments (groups of runs) and runs (single executions)
- Parameters are immutable strings; metrics are numeric and support step-based logging
- Artifacts are files (models, plots, data) stored in the artifact store, separate from the backend store
- Autologging is triggered by model.fit() and works with sklearn, TensorFlow, PyTorch, XGBoost, and more
- Search runs using mlflow.search_runs() with SQL-like filter strings; parameter values are quoted, metric values are not
- The tracking server has two storage components: backend store (metadata) and artifact store (files)
Lilly Tech Systems