Advanced
Practice Exam
25 exam-style questions covering all domains of the dbt Analytics Engineering certification. Try to answer each question before reading the explanation. Target time: 35 minutes (matching the real exam pace of ~1.4 min/question).
Exam simulation: Cover the answer explanations as you go. Write down your answers first, then check. A passing score would be roughly 18/25 correct (72%).
Question 1 — Models
What SQL statement does a dbt model file contain?
A) CREATE TABLE AS SELECT
B) A SELECT statement only
C) INSERT INTO ... SELECT
D) DDL statements like CREATE TABLE and ALTER TABLE
Answer: B — dbt models contain only a SELECT statement. dbt handles all DDL/DML (CREATE TABLE, INSERT, MERGE, etc.) automatically based on the model's materialization configuration. This separation of business logic from warehouse operations is a core dbt principle.
Question 2 — ref()
Why should you always use {{ ref('model_name') }} instead of hardcoding table names in your dbt models?
A) ref() runs queries faster
B) ref() creates dependencies in the DAG, enables environment-aware schema resolution, and ensures correct build order
C) ref() provides automatic data encryption
D) ref() is required for SQL syntax validation
Answer: B — ref() serves three critical purposes: (1) it creates model dependencies in the DAG so dbt builds models in the correct order, (2) it resolves table names dynamically based on the target environment (dev, staging, prod), and (3) it enables lineage tracking for documentation and impact analysis.
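For reference, a minimal sketch of a downstream model using ref() (the model names stg_orders and fct_orders are illustrative):

```sql
-- models/marts/fct_orders.sql
-- ref() records the dependency in the DAG and resolves the schema
-- for whichever target (dev, staging, prod) the run uses.
select
    order_id,
    customer_id,
    order_total
from {{ ref('stg_orders') }}
```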
Question 3 — Materializations
Which materialization creates NO object in the warehouse and instead injects the model's SQL as a CTE in downstream models?
A) view
B) table
C) incremental
D) ephemeral
Answer: D — Ephemeral models are not created in the warehouse. Their SQL is injected as a Common Table Expression (CTE) wherever they are referenced by downstream models. This reduces warehouse objects but means ephemeral models cannot be queried directly.
Question 4 — Sources
What does dbt source freshness check?
A) Whether the dbt project code is up to date
B) Whether the source tables have been updated recently based on the loaded_at_field configuration
C) Whether the dbt packages need updating
D) Whether the warehouse connection is active
Answer: B — Source freshness checks the loaded_at_field column in source tables to determine when data was last loaded. It compares this against the warn_after and error_after thresholds defined in the source YAML to alert you when data is stale.
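A sketch of the corresponding source YAML (source name, table, column, and thresholds are illustrative):

```yaml
# models/staging/src_shop.yml
sources:
  - name: shop
    tables:
      - name: orders
        loaded_at_field: _loaded_at
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}
```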
Question 5 — Testing
A singular data test SQL file returns 5 rows when executed. What does this mean?
A) The test passed with 5 successful checks
B) The test failed because 5 rows violated the test condition
C) The test needs 5 more rows to be valid
D) The test is inconclusive
Answer: B — In dbt, a singular data test passes only when the query returns zero rows. Any rows returned represent violations of the test condition. In this case, 5 rows failed the test.
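As an illustration, a singular test is just a SELECT that returns the violating rows (file and model names are hypothetical):

```sql
-- tests/assert_no_negative_order_totals.sql
-- The test passes only if this query returns zero rows;
-- every returned row is a violation.
select
    order_id,
    order_total
from {{ ref('fct_orders') }}
where order_total < 0
```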
Question 6 — Incremental Models
What does the is_incremental() macro return on the FIRST run of an incremental model?
A) TRUE
B) FALSE
C) NULL
D) An error
Answer: B — On the first run, the target table does not exist yet, so is_incremental() returns FALSE. This means the WHERE clause inside the {% if is_incremental() %} block is skipped, and dbt processes ALL data to build the initial table. On subsequent runs, it returns TRUE and only processes new data.
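A minimal incremental model showing the pattern (model and column names are illustrative):

```sql
-- models/marts/fct_events.sql
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    user_id,
    event_at
from {{ ref('stg_events') }}

{% if is_incremental() %}
-- Skipped on the first run (full build); on later runs,
-- only rows newer than what is already in the target table.
where event_at > (select max(event_at) from {{ this }})
{% endif %}
```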
Question 7 — Snapshots
What type of Slowly Changing Dimension do dbt snapshots implement?
A) SCD Type 1 (overwrite)
B) SCD Type 2 (historical tracking with valid_from/valid_to)
C) SCD Type 3 (current and previous value columns)
D) SCD Type 0 (never change)
Answer: B — dbt snapshots implement SCD Type 2, which maintains a complete history of changes. Each row gets dbt_valid_from and dbt_valid_to timestamps. When a record changes, the old row's dbt_valid_to is set, and a new row is inserted with the current timestamp as dbt_valid_from.
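A sketch of a snapshot definition using the timestamp strategy (source, schema, and column names are illustrative):

```sql
-- snapshots/customers_snapshot.sql
{% snapshot customers_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

select * from {{ source('shop', 'customers') }}

{% endsnapshot %}
```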
Question 8 — Jinja
Which Jinja delimiters are used for control flow statements like if/for loops in dbt?
A) {{ }}
B) {% %}
C) {# #}
D) {[ ]}
Answer: B — {% %} is used for control flow statements (if, for, set, macro). {{ }} outputs expressions. {# #} is for comments. These are standard Jinja2 delimiters.
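All three delimiter types in one illustrative model (table and column names are hypothetical):

```sql
{# This comment is stripped and never appears in compiled SQL #}
{% set payment_methods = ['card', 'cash'] %}

select
    order_id
    {% for method in payment_methods %}
    , sum(case when payment_method = '{{ method }}' then amount end)
        as {{ method }}_amount
    {% endfor %}
from {{ ref('stg_payments') }}
group by order_id
```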
Question 9 — Macros
Where should reusable Jinja macros be stored in a dbt project?
A) In the models/ directory
B) In the macros/ directory
C) In the tests/ directory
D) In the seeds/ directory
Answer: B — Macros are stored as .sql files in the macros/ directory. They are available to all models, tests, and other macros in the project. Macros defined in installed packages are also accessible.
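A hypothetical macro to illustrate the shape (the macro name and logic are invented for this example):

```sql
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name) %}
    round({{ column_name }} / 100.0, 2)
{% endmacro %}
```

Any model in the project could then call it, e.g. {{ cents_to_dollars('amount_cents') }} as amount.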
Question 10 — Packages
Which command installs dbt packages defined in packages.yml?
A) dbt install
B) dbt deps
C) dbt packages
D) dbt init
Answer: B — dbt deps (short for dependencies) reads packages.yml and downloads all specified packages into the dbt_packages/ directory. This must be run before dbt run if you use any external packages.
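A minimal packages.yml example (the version range shown is illustrative):

```yaml
# packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
```

Running dbt deps after adding an entry downloads the package into dbt_packages/.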
Question 11 — Documentation
Which two commands generate and serve the dbt documentation website?
A) dbt docs build and dbt docs deploy
B) dbt docs generate and dbt docs serve
C) dbt build docs and dbt serve docs
D) dbt compile and dbt serve
Answer: B — dbt docs generate creates a catalog.json file with metadata about all models, sources, and tests. dbt docs serve launches a local web server to browse the documentation and DAG visualization.
Question 12 — Seeds
Which dbt command loads CSV files from the seeds/ directory into the warehouse?
A) dbt run
B) dbt seed
C) dbt load
D) dbt import
Answer: B — dbt seed reads CSV files from the seeds/ directory and loads them into the warehouse as tables. Seeds are version-controlled in Git and are designed for small, static reference data.
Question 13 — Project Structure
In a well-structured dbt project, where should raw source table references appear?
A) In any model throughout the project
B) Only in staging models, using the source() function
C) Only in mart models
D) In the dbt_project.yml file
Answer: B — Best practice dictates that raw source references (via the source() function) should only appear in staging models. All other models should reference staging or intermediate models using ref(). This creates a clean abstraction layer between raw data and transformed models.
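For example, a staging model is the only place a raw table is read (source, table, and column names are illustrative):

```sql
-- models/staging/stg_orders.sql
-- Staging is the only layer that touches raw data via source();
-- it renames and lightly cleans columns for everything downstream.
select
    id      as order_id,
    user_id as customer_id,
    total   as order_total
from {{ source('shop', 'orders') }}
```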
Question 14 — Testing
Which built-in dbt test checks referential integrity between two models?
A) unique
B) not_null
C) accepted_values
D) relationships
Answer: D — The relationships test verifies that every value in a column exists in a specified column of another model, similar to a foreign key constraint. It is configured with to: ref('other_model') and field: column_name.
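An illustrative YAML configuration of the relationships test (model and column names are hypothetical):

```yaml
# models/marts/fct_orders.yml
models:
  - name: fct_orders
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
```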
Question 15 — Incremental Strategies
Which incremental strategy should be used when the source data can have updates to existing rows AND new rows?
A) append
B) merge
C) delete+insert
D) insert_overwrite
Answer: B — The merge strategy uses a MERGE (or UPSERT) statement that handles both updates to existing rows (matched by unique_key) and inserts of new rows. Append (A) only inserts and would create duplicates. Delete+insert (C) works but is less efficient. Insert_overwrite (D) replaces entire partitions.
Question 16 — Hooks
When does a post-hook execute?
A) Before the model is compiled
B) After the model is successfully built in the warehouse
C) After dbt deps is run
D) Before tests are executed
Answer: B — A post-hook runs SQL immediately after a model is successfully materialized in the warehouse. Common uses include granting permissions, creating indexes, and running ANALYZE statements.
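A sketch of a post-hook that grants permissions after the build (the role name and model are illustrative; grant syntax varies by warehouse):

```sql
-- {{ this }} resolves to the model's fully qualified relation name,
-- so the grant targets whatever schema the run built into.
{{ config(
    materialized='table',
    post_hook="grant select on {{ this }} to role reporter"
) }}

select
    order_id,
    order_total
from {{ ref('stg_orders') }}
```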
Question 17 — dbt Commands
Which dbt command both runs models AND executes tests in a single invocation?
A) dbt run --test
B) dbt build
C) dbt test --run
D) dbt run && dbt test
Answer: B — dbt build runs models, tests, snapshots, and seeds in DAG order. It runs tests immediately after each model is built, stopping early if critical tests fail. This is the recommended command for CI/CD pipelines.
Question 18 — Environments
How does dbt ensure that development runs do not affect production tables?
A) By locking production tables during development
B) By using target-specific schemas configured in profiles.yml so dev models are built in a separate schema
C) By running development queries as read-only
D) By automatically backing up production data before each run
Answer: B — The profiles.yml file defines different targets (dev, prod) with different schemas. When a developer runs dbt locally, models are built in the dev schema (e.g., dev_analyst_name). Production runs use the production schema. ref() handles the schema resolution automatically.
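A profiles.yml sketch with two targets (Snowflake is shown only as an example adapter; profile name, schemas, and credentials are all illustrative):

```yaml
# profiles.yml
my_project:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: my_account
      user: jane
      password: "{{ env_var('DBT_PASSWORD') }}"
      database: analytics
      schema: dev_jane        # developer-specific schema
      warehouse: transforming
    prod:
      type: snowflake
      account: my_account
      user: dbt_prod
      password: "{{ env_var('DBT_PASSWORD') }}"
      database: analytics
      schema: analytics       # production schema
      warehouse: transforming
```

Switching targets (e.g., dbt run --target prod) changes only where models are built; the model code itself is unchanged.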
Question 19 — DAG
What does the dbt DAG represent?
A) A database access group for security
B) A Directed Acyclic Graph showing the dependencies and execution order of all models
C) A data analysis guidebook
D) A deployment automation gateway
Answer: B — The DAG (Directed Acyclic Graph) visualizes all model dependencies based on ref() and source() calls. It determines the execution order of models and enables features like model selection (e.g., dbt run --select model_name+ to run a model and all its downstream dependencies).
Question 20 — dbt for ML
A data scientist wants to document that their churn prediction ML model depends on the fct_customer_features dbt model. Which dbt feature should they use?
A) A custom test
B) An exposure
C) A snapshot
D) A seed
Answer: B — Exposures document downstream consumers of dbt models that exist outside of dbt, such as ML models, dashboards, and applications. They create lineage visibility in the DAG and identify ownership.
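An illustrative exposure definition for this scenario (the exposure name, owner, and email are invented):

```yaml
# models/exposures.yml
exposures:
  - name: churn_prediction_model
    type: ml
    maturity: medium
    owner:
      name: Data Science Team
      email: ds@example.com
    depends_on:
      - ref('fct_customer_features')
```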
Question 21 — Tags
An analytics engineer wants to run only the ML-related models in their dbt project. They have tagged all ML models with tags: ['ml']. Which command runs only those models?
A) dbt run --models ml
B) dbt run --select tag:ml
C) dbt run --tag ml
D) dbt run --filter ml
Answer: B — The --select tag:ml syntax selects all models with the specified tag. Tags can be applied at the model, directory, or project level. This enables selective execution of model subsets.
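Tags can be set per model with {{ config(tags=['ml']) }}, or for a whole directory in dbt_project.yml (project and directory names below are illustrative):

```yaml
# dbt_project.yml
models:
  my_project:
    ml_features:
      +tags: ['ml']
```

After tagging, dbt run --select tag:ml builds only the tagged subset.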
Question 22 — Materializations
A staging model is queried infrequently but must always show the latest data. Which materialization is most appropriate?
A) table
B) view
C) incremental
D) ephemeral
Answer: B — Views always reflect the latest source data because they are virtual tables that execute the SQL on query. For staging models that are queried infrequently, a view avoids the cost of rebuilding a physical table on every dbt run while always providing current data.
Question 23 — Python Models
Where do dbt Python models execute?
A) On the developer's local machine
B) On the dbt Cloud server
C) Inside the data warehouse's Python runtime (e.g., Snowpark, Databricks)
D) In a separate Docker container
Answer: C — dbt Python models execute inside the warehouse's native Python runtime, such as Snowpark (Snowflake), PySpark (Databricks), or BigQuery's Python runtime. The code runs on the warehouse's compute resources, not locally.
Question 24 — CI/CD
In dbt Cloud, what happens when a CI job detects a pull request?
A) It runs all models in the project
B) It runs only the models that have been modified in the pull request and their downstream dependencies
C) It runs only the tests without building models
D) It sends a notification without running anything
Answer: B — dbt Cloud CI jobs use "slim CI" to detect which models were modified in the pull request (using state comparison with the manifest). It runs only those models and their downstream dependencies, saving time and compute costs.
Question 25 — Feature Engineering
Why is point-in-time feature generation important for ML training data in dbt?
A) It makes queries run faster
B) It prevents data leakage by ensuring features are computed using only data available before each prediction date
C) It reduces storage costs
D) It is required by dbt for incremental models
Answer: B — Point-in-time feature generation prevents data leakage by ensuring that training features only use data that would have been available at the time of prediction. Without this, models train on "future" information and appear more accurate than they really are, leading to poor production performance.
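One way to sketch the idea in a dbt model: join orders to training labels only where the order happened strictly before the prediction date (the models training_labels and fct_orders and their columns are hypothetical):

```sql
-- models/ml/customer_features_pit.sql
-- Point-in-time aggregation: each label row only sees orders
-- placed before its own prediction_date, preventing leakage.
select
    l.customer_id,
    l.prediction_date,
    count(o.order_id)  as orders_before_prediction,
    sum(o.order_total) as revenue_before_prediction
from {{ ref('training_labels') }} l
left join {{ ref('fct_orders') }} o
    on  o.customer_id = l.customer_id
    and o.ordered_at  < l.prediction_date
group by
    l.customer_id,
    l.prediction_date
```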
Score Interpretation
20-25 Correct
Ready for the exam. You have strong command of all domains. Focus your remaining study time on any questions you got wrong.
15-19 Correct
Almost ready. Review the domains where you missed questions. Re-read the relevant lesson and try again in a few days.
Below 15
More study needed. Go back through the course lessons, focusing on hands-on practice with dbt models, tests, and incremental models. Retake this exam when you feel more confident.
Lilly Tech Systems