Advanced

Exam Tips & Practice

This lesson is your final review before taking the Databricks MLflow certification. It includes a quick reference cheat sheet, additional practice questions covering all topics, common mistakes that cause failures, and a comprehensive FAQ section.

Quick Reference Cheat Sheet

Review this the night before your exam. These are the most important API calls and concepts organized by topic.

# MLflow Certification - Quick Reference

# ═══════════════════════════════════════════
# TRACKING
# ═══════════════════════════════════════════
import mlflow

mlflow.set_experiment("experiment-name")     # Set active experiment
mlflow.set_tracking_uri("http://host:5000")  # Connect to remote server

mlflow.autolog()                             # Enable autologging (call before training)

with mlflow.start_run(run_name="my-run"):    # Start a tracked run
    mlflow.log_param("key", "value")         # Log single parameter (immutable, stored as string)
    mlflow.log_params({"k1": "v1"})          # Log multiple parameters
    mlflow.log_metric("accuracy", 0.95)      # Log single metric (numeric)
    mlflow.log_metrics({"acc": 0.95})        # Log multiple metrics
    mlflow.log_metric("loss", 0.1, step=1)   # Log metric with step
    mlflow.log_artifact("file.pkl")          # Log single file
    mlflow.log_artifacts("dir/")             # Log all files in directory

# Search runs
runs = mlflow.search_runs(
    filter_string="metrics.accuracy > 0.9 AND params.model = 'rf'"
)

# ═══════════════════════════════════════════
# MODELS & REGISTRY
# ═══════════════════════════════════════════
from mlflow.models import infer_signature

sig = infer_signature(X_train, predictions)  # Create model signature
mlflow.sklearn.log_model(model, "model",     # Log with registration
    signature=sig,
    registered_model_name="my-model")

# Load models
model = mlflow.pyfunc.load_model("runs:/<id>/model")     # By run
model = mlflow.pyfunc.load_model("models:/name/Production") # By stage
model = mlflow.pyfunc.load_model("models:/name/2")          # By version

# Stage transitions
from mlflow.tracking import MlflowClient
client = MlflowClient()
client.transition_model_version_stage("name", 2, "Production",
    archive_existing_versions=True)

# ═══════════════════════════════════════════
# PROJECTS
# ═══════════════════════════════════════════
# MLproject file: name, conda_env/docker_env/python_env, entry_points
# Run: mlflow run . -P param=value -e entry_point
mlflow.projects.run(".", parameters={"lr": 0.01})  # version= (Git commit or branch) applies only to Git-based projects

# ═══════════════════════════════════════════
# DEPLOYMENT
# ═══════════════════════════════════════════
# Serve:  mlflow models serve --model-uri models:/name/Production --port 5001
# Docker: mlflow models build-docker --model-uri models:/name/1 --name img
# Batch:  model = mlflow.pyfunc.load_model(uri); model.predict(df)
# Spark:  udf = mlflow.pyfunc.spark_udf(spark, uri)
# Endpoints: POST /invocations (predictions), GET /ping (health)

Common Mistakes That Cause Failures

# Top 10 mistakes on the Databricks MLflow Certification exam

common_mistakes = [
    {
        "mistake": "Confusing log_param() with log_params()",
        "topic": "Tracking",
        "fix": "log_param(key, value) = single, log_params(dict) = multiple",
        "severity": "HIGH"
    },
    {
        "mistake": "Forgetting that parameters are stored as strings",
        "topic": "Tracking",
        "fix": "In search filters, param values must be quoted: params.x = '100'",
        "severity": "HIGH"
    },
    {
        "mistake": "Confusing log_artifact() with log_artifacts()",
        "topic": "Tracking",
        "fix": "log_artifact(file) = single file, log_artifacts(dir) = directory",
        "severity": "MEDIUM"
    },
    {
        "mistake": "Not knowing the pyfunc universal flavor",
        "topic": "Models",
        "fix": "mlflow.pyfunc.load_model() loads any model that has the pyfunc flavor (most built-in flavors add it)",
        "severity": "HIGH"
    },
    {
        "mistake": "Confusing runs:/ URI with models:/ URI",
        "topic": "Models & Registry",
        "fix": "runs:/ = specific run, models:/ = Model Registry (stage or version)",
        "severity": "HIGH"
    },
    {
        "mistake": "Not knowing Model Registry stages",
        "topic": "Registry",
        "fix": "Four stages: None, Staging, Production, Archived",
        "severity": "MEDIUM"
    },
    {
        "mistake": "Thinking MLproject file has a .yaml extension",
        "topic": "Projects",
        "fix": "File must be named exactly 'MLproject' (no extension)",
        "severity": "MEDIUM"
    },
    {
        "mistake": "Confusing mlflow models serve with mlflow server",
        "topic": "Deployment",
        "fix": "serve = model REST API, server = tracking server",
        "severity": "HIGH"
    },
    {
        "mistake": "Not knowing the /invocations input formats",
        "topic": "Deployment",
        "fix": "JSON keys dataframe_split (recommended), dataframe_records, instances, inputs; or CSV",
        "severity": "MEDIUM"
    },
    {
        "mistake": "Forgetting archive_existing_versions in stage transitions",
        "topic": "Registry",
        "fix": "Set archive_existing_versions=True when promoting to Production",
        "severity": "MEDIUM"
    }
]
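
The second mistake in the list (parameters are stored as strings) is easiest to remember by building a filter by hand. The following is a minimal sketch: `build_filter` is a hypothetical helper, not an MLflow API, and it only assembles a `filter_string` of the kind passed to `mlflow.search_runs`. Metric thresholds stay numeric, parameter values are always single-quoted.

```python
def build_filter(metric_conditions, param_conditions):
    """Assemble a search filter string (hypothetical helper, not an MLflow API).

    metric_conditions: list of (name, operator, number) tuples
    param_conditions:  dict mapping param name -> value
    """
    clauses = []
    for name, op, threshold in metric_conditions:
        # Metrics are numeric, so the threshold is written unquoted.
        clauses.append(f"metrics.{name} {op} {threshold}")
    for name, value in param_conditions.items():
        # Params are stored as strings, so the value is always single-quoted.
        clauses.append(f"params.{name} = '{value}'")
    return " AND ".join(clauses)


filter_string = build_filter(
    [("accuracy", ">", 0.9)],
    {"model": "rf", "n_estimators": 100},
)
print(filter_string)
```

Note that the integer `100` ends up as `'100'` in the filter, which matches how the parameter was stored when it was logged.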

Additional Practice Questions

These questions cover topics across all exam areas. Try to answer each one before reading the solution.

Question 1

What is the difference between mlflow.set_experiment() and mlflow.create_experiment()?

Answer

set_experiment() sets the active experiment by name, creating it if it does not exist. create_experiment() creates a new experiment and returns its ID, but raises an error if the experiment already exists. Use set_experiment() for idempotent workflows.

Question 2

You have a model logged in run "abc123". Write the code to register it in the Model Registry as "churn-predictor".

Answer

mlflow.register_model(model_uri="runs:/abc123/model", name="churn-predictor") — This registers the model from the specified run. Alternatively, you can pass registered_model_name="churn-predictor" when calling log_model().
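
To make the two URI schemes concrete, here is a small illustrative parser. It is a sketch only: `describe_model_uri` is a hypothetical helper, not an MLflow function, and it covers just the `runs:/` and `models:/` forms used in this lesson.

```python
def describe_model_uri(uri):
    """Classify a model URI by scheme (illustrative helper, not an MLflow API)."""
    scheme, _, rest = uri.partition(":/")
    parts = rest.split("/")
    if scheme == "runs":
        # runs:/<run_id>/<artifact_path> points at an artifact logged in one run.
        return {
            "source": "run",
            "run_id": parts[0],
            "artifact_path": "/".join(parts[1:]),
        }
    if scheme == "models" and len(parts) == 2:
        # models:/<name>/<version or stage> points at the Model Registry.
        name, selector = parts
        kind = "version" if selector.isdigit() else "stage"
        return {"source": "registry", "name": name, kind: selector}
    raise ValueError(f"Unsupported model URI: {uri!r}")


print(describe_model_uri("runs:/abc123/model"))
print(describe_model_uri("models:/churn-predictor/2"))
print(describe_model_uri("models:/churn-predictor/Production"))
```

The run URI identifies where the model was logged; the registry URI only becomes valid after the model has been registered under that name.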

Question 3

What happens when you call mlflow.autolog() without an active run and then call model.fit()?

Answer

MLflow automatically creates a new run when model.fit() is called. Autologging does not require an explicit mlflow.start_run(). The run is auto-created and auto-ended after .fit() completes.

Question 4

What is the difference between the backend store and the artifact store in MLflow Tracking Server?

Answer

The backend store holds experiment and run metadata (parameters, metrics, tags) and is configured with --backend-store-uri (a database URI or a local file path). The artifact store holds files (models, plots, data) and is configured with --default-artifact-root (S3, Azure Blob Storage, etc.).
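
As a rough illustration of how the two stores appear on the command line, the sketch below assembles an `mlflow server` invocation as an argument list without running it. The SQLite file name and the S3 bucket are assumptions made up for this example.

```python
import shlex

backend_store_uri = "sqlite:///mlflow.db"      # assumed: run metadata in a local SQLite file
artifact_root = "s3://example-bucket/mlflow"   # assumed: hypothetical bucket for artifact files

server_command = [
    "mlflow", "server",
    "--backend-store-uri", backend_store_uri,    # where params, metrics, and tags are stored
    "--default-artifact-root", artifact_root,    # where models, plots, and data files are stored
    "--host", "0.0.0.0",
    "--port", "5000",
]

# Print the command as it would be typed in a shell.
print(shlex.join(server_command))
```

Keeping the command as a list makes it easy to pass to `subprocess.run(server_command)` later if you want to launch the server from a script.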

Question 5

Write the MLflow CLI command to build a Docker image named "fraud-model" from a Production model called "fraud-detector".

Answer

mlflow models build-docker --model-uri models:/fraud-detector/Production --name fraud-model — This creates a self-contained Docker image that serves on port 8080 with /invocations and /ping endpoints.
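
A request to the /invocations endpoint carries a JSON body in one of a few shapes. The sketch below builds three of them with the standard library only; the column names and values are hypothetical, and the key names follow the MLflow 2.x scoring format as described in this lesson.

```python
import json

columns = ["age", "balance"]          # hypothetical feature names
rows = [[34, 1200.5], [51, 310.0]]    # hypothetical feature values

# dataframe_split: column names listed once, then rows of values.
split_payload = {"dataframe_split": {"columns": columns, "data": rows}}

# dataframe_records: one {column: value} dict per row.
records_payload = {"dataframe_records": [dict(zip(columns, row)) for row in rows]}

# instances: tensor-style input, here one list per row.
instances_payload = {"instances": rows}

# The body is sent as JSON with a matching Content-Type header.
body = json.dumps(split_payload)
headers = {"Content-Type": "application/json"}
print(body)
```

The same `body` and `headers` could then be sent with any HTTP client as a POST to `/invocations` on the serving host.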

Question 6

In an MLproject file, what is the difference between conda_env and python_env?

Answer

conda_env references a conda.yaml file and creates a full conda environment (supports non-Python dependencies). python_env references a python_env.yaml file and uses virtualenv, which is lighter weight but only supports pip packages. Use conda for complex environments, python_env for simple ones.
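
For reference, here is a minimal sketch that writes an MLproject file using python_env. The project name, the `train.py` script, and the `lr` parameter are placeholders invented for the example; swapping the `python_env` line for `conda_env: conda.yaml` would select a conda environment instead.

```python
import tempfile
from pathlib import Path

# Placeholder project definition: name, environment file, and one entry point.
MLPROJECT_TEXT = """\
name: example_project

python_env: python_env.yaml

entry_points:
  main:
    parameters:
      lr: {type: float, default: 0.01}
    command: "python train.py --lr {lr}"
"""

project_dir = Path(tempfile.mkdtemp())
mlproject_path = project_dir / "MLproject"   # exact file name, no extension
mlproject_path.write_text(MLPROJECT_TEXT)

print(mlproject_path.name, repr(mlproject_path.suffix))
```

The file is YAML in content even though its name carries no `.yaml` extension.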

Question 7

What are the standard steps in an MLflow Recipe, and what does each one do?

Answer

ingest (load data), split (train/validation/test split), transform (feature engineering), train (model training), evaluate (compute metrics and check thresholds), register (register model in Registry if it passes evaluation).

Question 8

How do you create a Spark UDF from a registered model for distributed batch scoring?

Answer

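predict_udf = mlflow.pyfunc.spark_udf(spark, "models:/my-model/Production"). Then apply it to the feature columns: df.withColumn("prediction", predict_udf(*feature_cols)), where feature_cols is the list of input column names. Calling predict_udf() with no arguments works only when the model signature includes named input columns. The SparkSession must be the first argument, and the model URI is the second.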

Frequently Asked Questions

Is the MLflow certification exam open-book?

No, the Databricks MLflow certification is a proctored exam. You cannot access external resources, documentation, or code editors during the exam. You must know the key API calls, concepts, and patterns from memory. This is why hands-on practice before the exam is essential.

How many questions are on the exam?

The exam typically has 45 multiple-choice questions to be completed in 90 minutes. This gives you roughly 2 minutes per question. Some questions include code snippets that require careful reading. Flag difficult questions and come back to them rather than spending too long on any single question.

What happens if I fail the exam?

If you fail, you can retake the exam after a 14-day waiting period. Each attempt requires the full $200 fee. Use the waiting period to review the topics you struggled with. Focus on hands-on practice rather than just reading documentation. Most candidates who fail once pass on their second attempt with targeted study.

Do I need Databricks experience to pass?

While some questions reference Databricks-specific features (like Databricks Model Serving and Unity Catalog), the majority of the exam focuses on open-source MLflow concepts. You can pass without deep Databricks experience, but knowing the basics of how MLflow integrates with Databricks (workspace, experiments, model serving) will help with a few questions.

Which MLflow version is the exam based on?

The exam is based on MLflow 2.x. Check the Databricks exam guide for the exact version. APIs to know include mlflow.models.infer_signature(), the python_env project environment type, and MLflow Recipes (renamed from MLflow Pipelines in MLflow 2.0). Make sure your study materials cover MLflow 2.x APIs.

How long is the certification valid?

The Databricks MLflow certification is valid for 2 years from the date you pass. After that, you would need to recertify by taking the current version of the exam. Given how quickly MLflow evolves, recertification ensures your knowledge stays current.

What is the best way to prepare for code snippet questions?

Code snippet questions test your ability to read MLflow code and identify correct or incorrect usage. The best preparation is hands-on practice. Write real MLflow code daily for at least 2 weeks before the exam. Focus on: logging params/metrics/artifacts, registering models, transitioning stages, and using search filters. If you have written the code yourself, you will recognize correct syntax instantly on the exam.

Should I study MLflow Recipes or is it a small part of the exam?

MLflow Recipes (formerly Pipelines) makes up a smaller portion of the exam compared to Tracking and Models/Registry. However, you should still know the key concepts: the 6 standard steps (ingest, split, transform, train, evaluate, register), what profiles are and how they work, and the basic recipe.yaml structure. A few questions on Recipes can make the difference between passing and failing.

Exam Day Strategy

First Pass (60 min)

Answer all easy questions first. Go through all 45 questions and answer the ones you are confident about. Flag anything you are unsure of and move on. Do not spend more than 1 minute on any question during the first pass.

Second Pass (25 min)

Tackle flagged questions. Return to the questions you flagged. Read each one carefully, eliminate wrong answers, and make your best choice. Most questions have one obviously wrong answer you can eliminate immediately.

Final Review (5 min)

Check for mistakes. Review any questions you are still uncertain about. Make sure you have answered every question — there is no penalty for guessing. Double-check that you did not misread any code snippets.
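
The time budget behind this three-phase plan can be checked with a few lines of arithmetic. This is just a sketch using the numbers quoted in this lesson (45 questions; 60, 25, and 5 minutes).

```python
total_questions = 45
passes = {"first pass": 60, "second pass": 25, "final review": 5}  # minutes per phase

total_minutes = sum(passes.values())
avg_seconds = total_minutes * 60 / total_questions
first_pass_seconds = passes["first pass"] * 60 / total_questions
# Buffer left in the first pass if every question takes the full 1-minute limit.
first_pass_buffer = passes["first pass"] - total_questions * 1

print(f"total: {total_minutes} min")
print(f"overall average: {avg_seconds:.0f} s per question")
print(f"first-pass ceiling: {first_pass_seconds:.0f} s per question")
print(f"first-pass buffer at 1 min each: {first_pass_buffer} min")
```

Holding to one minute per question in the first pass leaves some slack inside the 60-minute window for reading longer code snippets.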

Key Takeaways

The exam is 45 questions in 90 minutes — about 2 minutes per question
Tracking and Models/Registry make up ~60% of the exam — master these first
Know the exact API call names: log_param vs log_params, log_artifact vs log_artifacts
Understand the runs:/ vs models:/ URI schemes and when to use each
Practice writing MLflow code hands-on — the exam tests practical knowledge, not theory
Use the two-pass strategy: easy questions first, flagged questions second, then review
Review this cheat sheet the night before your exam for last-minute reinforcement