Advanced

Practice Exam

25 exam-style questions covering all domains of the dbt Analytics Engineering certification. Try to answer each question before reading the explanation. Target time: 35 minutes (matching the real exam pace of ~1.4 min/question).

💡
Exam simulation: Cover the answer explanations as you go. Write down your answers first, then check. A passing score would be roughly 18/25 correct (72%).

Question 1 — Models

Q1
What SQL statement does a dbt model file contain?

A) CREATE TABLE AS SELECT
B) A SELECT statement only
C) INSERT INTO ... SELECT
D) DDL statements like CREATE TABLE and ALTER TABLE

Answer: B — dbt models contain only a SELECT statement. dbt handles all DDL/DML (CREATE TABLE, INSERT, MERGE, etc.) automatically based on the model's materialization configuration. This separation of business logic from warehouse operations is a core dbt principle.

Question 2 — ref()

Q2
Why should you always use {{ ref('model_name') }} instead of hardcoding table names in your dbt models?

A) ref() runs queries faster
B) ref() creates dependencies in the DAG, enables environment-aware schema resolution, and ensures correct build order
C) ref() provides automatic data encryption
D) ref() is required for SQL syntax validation

Answer: B — ref() serves three critical purposes: (1) it creates model dependencies in the DAG so dbt builds models in the correct order, (2) it resolves table names dynamically based on the target environment (dev, staging, prod), and (3) it enables lineage tracking for documentation and impact analysis.
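For example, a minimal mart model (model and column names hypothetical) referencing a staging model:

```sql
-- models/marts/fct_orders.sql
-- ref() resolves to the right schema for the current target (dev/prod)
-- and registers stg_orders as an upstream dependency in the DAG.
select
    order_id,
    customer_id,
    order_total
from {{ ref('stg_orders') }}
```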

Question 3 — Materializations

Q3
Which materialization creates NO object in the warehouse and instead injects the model's SQL as a CTE in downstream models?

A) view
B) table
C) incremental
D) ephemeral

Answer: D — Ephemeral models are not created in the warehouse. Their SQL is injected as a Common Table Expression (CTE) wherever they are referenced by downstream models. This reduces warehouse objects but means ephemeral models cannot be queried directly.
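A sketch of an ephemeral model (names hypothetical):

```sql
-- models/intermediate/int_recent_orders.sql
{{ config(materialized='ephemeral') }}

select *
from {{ ref('stg_orders') }}
where order_date >= current_date - 30
```

Any downstream model that calls {{ ref('int_recent_orders') }} gets this SQL inlined as a CTE in its compiled query; no view or table named int_recent_orders is created in the warehouse.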

Question 4 — Sources

Q4
What does dbt source freshness check?

A) Whether the dbt project code is up to date
B) Whether the source tables have been updated recently based on the loaded_at_field configuration
C) Whether the dbt packages need updating
D) Whether the warehouse connection is active

Answer: B — Source freshness checks the loaded_at_field column in source tables to determine when data was last loaded. It compares this against the warn_after and error_after thresholds defined in the source YAML to alert you when data is stale.
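A minimal source definition with freshness thresholds might look like this (source, schema, and column names are illustrative):

```yaml
# models/staging/sources.yml
version: 2
sources:
  - name: jaffle_shop
    schema: raw
    loaded_at_field: _etl_loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders
```

Running dbt source freshness then compares max(_etl_loaded_at) against these thresholds.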

Question 5 — Testing

Q5
A singular data test SQL file returns 5 rows when executed. What does this mean?

A) The test passed with 5 successful checks
B) The test failed because 5 rows violated the test condition
C) The test needs 5 more rows to be valid
D) The test is inconclusive

Answer: B — In dbt, a singular data test passes only when the query returns zero rows. Any rows returned represent violations of the test condition. In this case, 5 rows failed the test.
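A sketch of such a singular test (file and column names hypothetical):

```sql
-- tests/assert_no_negative_totals.sql
-- The test passes only if this query returns zero rows;
-- every returned row is a violation.
select order_id, order_total
from {{ ref('fct_orders') }}
where order_total < 0
```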

Question 6 — Incremental Models

Q6
What does the is_incremental() macro return on the FIRST run of an incremental model?

A) TRUE
B) FALSE
C) NULL
D) An error

Answer: B — On the first run, the target table does not exist yet, so is_incremental() returns FALSE. This means the WHERE clause inside the {% if is_incremental() %} block is skipped, and dbt processes ALL data to build the initial table. On subsequent runs, it returns TRUE and only processes new data.
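The canonical pattern, with hypothetical model and column names:

```sql
{{ config(materialized='incremental', unique_key='event_id') }}

select event_id, event_timestamp, payload
from {{ ref('stg_events') }}

{% if is_incremental() %}
  -- Skipped on the first run (target table doesn't exist yet),
  -- so the initial build processes all rows.
  where event_timestamp > (select max(event_timestamp) from {{ this }})
{% endif %}
```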

Question 7 — Snapshots

Q7
What type of Slowly Changing Dimension do dbt snapshots implement?

A) SCD Type 1 (overwrite)
B) SCD Type 2 (historical tracking with valid_from/valid_to)
C) SCD Type 3 (current and previous value columns)
D) SCD Type 0 (never change)

Answer: B — dbt snapshots implement SCD Type 2, which maintains a complete history of changes. Each row gets dbt_valid_from and dbt_valid_to timestamps. When a record changes, the old row's dbt_valid_to is set, and a new row is inserted with the current timestamp as dbt_valid_from.
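A sketch of a timestamp-strategy snapshot (names illustrative; exact config keys vary slightly across dbt versions):

```sql
-- snapshots/customers_snapshot.sql
{% snapshot customers_snapshot %}
{{
    config(
      target_schema='snapshots',
      unique_key='customer_id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}
select * from {{ source('jaffle_shop', 'customers') }}
{% endsnapshot %}
```

Each time dbt snapshot runs, changed records get their dbt_valid_to closed out and a fresh row opened.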

Question 8 — Jinja

Q8
Which Jinja delimiters are used for control flow statements like if/for loops in dbt?

A) {{ }}
B) {% %}
C) {# #}
D) {[ ]}

Answer: B — {% %} is used for control flow statements (if, for, set, macro). {{ }} outputs expressions. {# #} is for comments. These are standard Jinja2 delimiters.
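All three delimiter types in one hypothetical model:

```sql
{# This comment is stripped at compile time #}
{% set statuses = ['placed', 'shipped', 'returned'] %}

select
    order_date,
    {% for s in statuses %}
    count(case when status = '{{ s }}' then 1 end) as {{ s }}_orders
    {%- if not loop.last %},{%- endif %}
    {% endfor %}
from {{ ref('stg_orders') }}
group by 1
```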

Question 9 — Macros

Q9
Where should reusable Jinja macros be stored in a dbt project?

A) In the models/ directory
B) In the macros/ directory
C) In the tests/ directory
D) In the seeds/ directory

Answer: B — Macros are stored as .sql files in the macros/ directory. They are available to all models, tests, and other macros in the project. Macros defined in installed packages are also accessible.
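A small hypothetical macro, stored as its own .sql file:

```sql
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name, precision=2) %}
    round({{ column_name }} / 100.0, {{ precision }})
{% endmacro %}
```

Any model could then call {{ cents_to_dollars('amount_cents') }} in its select list.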

Question 10 — Packages

Q10
Which command installs dbt packages defined in packages.yml?

A) dbt install
B) dbt deps
C) dbt packages
D) dbt init

Answer: B — dbt deps (short for dependencies) reads packages.yml and downloads all specified packages into the dbt_packages/ directory. This must be run before dbt run if you use any external packages.
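A minimal packages.yml (version range is illustrative):

```yaml
# packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
```

After editing this file, run dbt deps to install.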

Question 11 — Documentation

Q11
Which two commands generate and serve the dbt documentation website?

A) dbt docs build and dbt docs deploy
B) dbt docs generate and dbt docs serve
C) dbt build docs and dbt serve docs
D) dbt compile and dbt serve

Answer: B — dbt docs generate creates a catalog.json file with metadata about all models, sources, and tests. dbt docs serve launches a local web server to browse the documentation and DAG visualization.

Question 12 — Seeds

Q12
Which dbt command loads CSV files from the seeds/ directory into the warehouse?

A) dbt run
B) dbt seed
C) dbt load
D) dbt import

Answer: B — dbt seed reads CSV files from the seeds/ directory and loads them into the warehouse as tables. Seeds are version-controlled in Git and are designed for small, static reference data.

Question 13 — Project Structure

Q13
In a well-structured dbt project, where should raw source table references appear?

A) In any model throughout the project
B) Only in staging models, using the source() function
C) Only in mart models
D) In the dbt_project.yml file

Answer: B — Best practice dictates that raw source references (via the source() function) should only appear in staging models. All other models should reference staging or intermediate models using ref(). This creates a clean abstraction layer between raw data and transformed models.

Question 14 — Testing

Q14
Which built-in dbt test checks referential integrity between two models?

A) unique
B) not_null
C) accepted_values
D) relationships

Answer: D — The relationships test verifies that every value in a column exists in a specified column of another model, similar to a foreign key constraint. It is configured with to: ref('other_model') and field: column_name.
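An example configuration (model and column names hypothetical):

```yaml
# models/schema.yml
version: 2
models:
  - name: fct_orders
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
```

The test fails if any customer_id in fct_orders has no match in dim_customers.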

Question 15 — Incremental Strategies

Q15
Which incremental strategy should be used when the source data can have updates to existing rows AND new rows?

A) append
B) merge
C) delete+insert
D) insert_overwrite

Answer: B — The merge strategy uses a MERGE (or UPSERT) statement that handles both updates to existing rows (matched by unique_key) and inserts of new rows. Append (A) only inserts and would create duplicates. Delete+insert (C) works but is less efficient. Insert_overwrite (D) replaces entire partitions.
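A sketch of a merge-strategy model (names hypothetical):

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='order_id'
) }}

-- Rows whose order_id already exists in the target are updated;
-- unmatched rows are inserted.
select order_id, status, order_total, updated_at
from {{ ref('stg_orders') }}
```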

Question 16 — Hooks

Q16
When does a post-hook execute?

A) Before the model is compiled
B) After the model is successfully built in the warehouse
C) After dbt deps is run
D) Before tests are executed

Answer: B — A post-hook runs SQL immediately after a model is successfully materialized in the warehouse. Common uses include granting permissions, creating indexes, and running ANALYZE statements.
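A sketch of a post-hook granting permissions (the grant syntax shown is Snowflake-flavored and the role name is hypothetical):

```sql
{{ config(
    materialized='table',
    post_hook="grant select on {{ this }} to role reporting"
) }}

select * from {{ ref('stg_orders') }}
```

{{ this }} resolves to the relation the model just built, so the hook works in any target schema.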

Question 17 — dbt Commands

Q17
Which dbt command both runs models AND executes tests in a single invocation?

A) dbt run --test
B) dbt build
C) dbt test --run
D) dbt run && dbt test

Answer: B — dbt build runs models, tests, snapshots, and seeds in DAG order. It runs tests immediately after each model is built, stopping early if critical tests fail. This is the recommended command for CI/CD pipelines.

Question 18 — Environments

Q18
How does dbt ensure that development runs do not affect production tables?

A) By locking production tables during development
B) By using target-specific schemas configured in profiles.yml so dev models are built in a separate schema
C) By running development queries as read-only
D) By automatically backing up production data before each run

Answer: B — The profiles.yml file defines different targets (dev, prod) with different schemas. When a developer runs dbt locally, models are built in the dev schema (e.g., dev_analyst_name). Production runs use the production schema. ref() handles the schema resolution automatically.

Question 19 — DAG

Q19
What does the dbt DAG represent?

A) A database access group for security
B) A Directed Acyclic Graph showing the dependencies and execution order of all models
C) A data analysis guidebook
D) A deployment automation gateway

Answer: B — The DAG (Directed Acyclic Graph) visualizes all model dependencies based on ref() and source() calls. It determines the execution order of models and enables features like model selection (e.g., dbt run --select model_name+ to run a model and all its downstream dependencies).

Question 20 — dbt for ML

Q20
A data scientist wants to document that their churn prediction ML model depends on the fct_customer_features dbt model. Which dbt feature should they use?

A) A custom test
B) An exposure
C) A snapshot
D) A seed

Answer: B — Exposures document downstream consumers of dbt models that exist outside of dbt, such as ML models, dashboards, and applications. They create lineage visibility in the DAG and identify ownership.
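A sketch of how that exposure could be declared (names and email are hypothetical):

```yaml
# models/exposures.yml
version: 2
exposures:
  - name: churn_prediction_model
    type: ml
    owner:
      name: Data Science Team
      email: ds@example.com
    depends_on:
      - ref('fct_customer_features')
```

The exposure then appears as a terminal node in the DAG downstream of fct_customer_features.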

Question 21 — Tags

Q21
An analytics engineer wants to run only the ML-related models in their dbt project. They have tagged all ML models with tags: ['ml']. Which command runs only those models?

A) dbt run --models ml
B) dbt run --select tag:ml
C) dbt run --tag ml
D) dbt run --filter ml

Answer: B — The --select tag:ml syntax selects all models with the specified tag. Tags can be applied at the model, directory, or project level. This enables selective execution of model subsets.
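Tags can be set in dbt_project.yml or directly in a model's config block, for example (model name hypothetical):

```sql
-- models/ml/customer_features.sql
{{ config(tags=['ml']) }}

select * from {{ ref('stg_customers') }}
```

dbt run --select tag:ml then builds this model, and dbt test --select tag:ml runs its tests.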

Question 22 — Materializations

Q22
A staging model is queried infrequently but must always show the latest data. Which materialization is most appropriate?

A) table
B) view
C) incremental
D) ephemeral

Answer: B — Views always reflect the latest source data because they are virtual: the view's SQL executes each time it is queried. For staging models that are queried infrequently, a view avoids the cost of rebuilding a physical table on every dbt run while always providing current data.

Question 23 — Python Models

Q23
Where do dbt Python models execute?

A) On the developer's local machine
B) On the dbt Cloud server
C) Inside the data warehouse's Python runtime (e.g., Snowpark, Databricks)
D) In a separate Docker container

Answer: C — dbt Python models execute inside the warehouse's native Python runtime, such as Snowpark (Snowflake), PySpark (Databricks), or Dataproc (BigQuery). The code runs on the warehouse's compute resources, not locally.

Question 24 — CI/CD

Q24
In dbt Cloud, what happens when a CI job detects a pull request?

A) It runs all models in the project
B) It runs only the models that have been modified in the pull request and their downstream dependencies
C) It runs only the tests without building models
D) It sends a notification without running anything

Answer: B — dbt Cloud CI jobs use "Slim CI": state comparison against a previous run's manifest identifies which models were modified in the pull request, and the job builds only those models and their downstream dependencies, saving time and compute costs.

Question 25 — Feature Engineering

Q25
Why is point-in-time feature generation important for ML training data in dbt?

A) It makes queries run faster
B) It prevents data leakage by ensuring features are computed using only data available before each prediction date
C) It reduces storage costs
D) It is required by dbt for incremental models

Answer: B — Point-in-time feature generation prevents data leakage by ensuring that training features only use data that would have been available at the time of prediction. Without this, models train on "future" information and appear more accurate than they really are, leading to poor production performance.

Score Interpretation

20-25 Correct

Ready for the exam. You have strong command of all domains. Focus your remaining study time on any questions you got wrong.

15-19 Correct

Almost ready. Review the domains where you missed questions. Re-read the relevant lesson and try again in a few days.

Below 15

More study needed. Go back through the course lessons, focusing on hands-on practice with dbt models, tests, and incremental models. Retake this exam when you feel more confident.