Intermediate

Unity Catalog

Master Databricks' unified governance solution for data, ML models, and AI assets with fine-grained access control, automated lineage, and cross-workspace data sharing.

What is Unity Catalog?

Unity Catalog is Databricks' centralized governance layer that provides a single place to manage access control, auditing, lineage, and data discovery across all Databricks workspaces and cloud environments.

It introduces a three-level namespace, catalog.schema.object (most often written catalog.schema.table), so that data assets can be organized in a way that mirrors your business structure.

💡 Open source: Databricks open-sourced Unity Catalog in 2024 and describes it as the industry's first open governance solution for data and AI. It integrates with the broader lakehouse ecosystem, including Apache Iceberg and Delta Lake.

Three-Level Namespace

Level     Purpose                                              Example
Catalog   Top-level container (business unit or environment)   production, development, finance
Schema    Logical grouping of related assets                   raw_data, curated, ml_features
Object    Tables, views, volumes, models, functions            customer_transactions

Unity Catalog Namespace
-- Access a table using three-level namespace
SELECT * FROM production.curated.customer_transactions;

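-- Grant read access to a schema
-- (USE CATALOG on the parent catalog is also required to reach the schema)
GRANT USE CATALOG ON CATALOG production TO `data-analysts`;
GRANT USE SCHEMA ON SCHEMA production.curated TO `data-analysts`;
GRANT SELECT ON SCHEMA production.curated TO `data-analysts`;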

-- Create a managed table
CREATE TABLE production.curated.daily_metrics (
  date DATE,
  metric_name STRING,
  value DOUBLE
) USING DELTA;
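
The statements above assume that the production catalog and the curated schema already exist. As a minimal sketch, assuming you hold the privileges needed to create catalogs and that the metastore has a default managed storage location, the containers can be created first, and a session default lets you use shorter names:

```sql
-- Create the containers that the three-level names refer to
CREATE CATALOG IF NOT EXISTS production;
CREATE SCHEMA IF NOT EXISTS production.curated;

-- Set session defaults so one-part names resolve inside production.curated
USE CATALOG production;
USE SCHEMA curated;

-- Resolves to production.curated.daily_metrics
SELECT * FROM daily_metrics;
```

Fully qualified names are usually the safer choice in shared notebooks and jobs, because they do not depend on whichever default catalog the session happens to have.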

Access Control

Unity Catalog provides fine-grained, SQL-standard access control:

  • Privilege inheritance: Privileges granted on a catalog or schema are inherited by the objects it contains, including objects created later
  • Row-level security: Filter data based on user attributes using row filters
  • Column masking: Dynamically mask sensitive columns based on group membership
  • Identity federation: Manage users, groups, and service principals once at the account level (typically synced from your identity provider) and assign them to workspaces
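
The first three mechanisms in this list can be sketched in Databricks SQL. This is an illustration under stated assumptions, not a recommended policy: the region and email columns, the function names, and the admins and pii-readers groups are hypothetical and do not come from the examples above.

```sql
-- Privilege inheritance: a grant on the catalog applies to every schema
-- and table inside it, including ones created later.
GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG production TO `data-analysts`;

-- Row-level security: non-admins only see rows where region = 'US'.
-- (assumes a hypothetical region column on the table)
CREATE OR REPLACE FUNCTION production.curated.region_filter(region STRING)
RETURN is_account_group_member('admins') OR region = 'US';

ALTER TABLE production.curated.customer_transactions
  SET ROW FILTER production.curated.region_filter ON (region);

-- Column masking: only members of pii-readers see the real value.
-- (assumes a hypothetical email column on the table)
CREATE OR REPLACE FUNCTION production.curated.email_mask(email STRING)
RETURN CASE
  WHEN is_account_group_member('pii-readers') THEN email
  ELSE '***REDACTED***'
END;

ALTER TABLE production.curated.customer_transactions
  ALTER COLUMN email SET MASK production.curated.email_mask;
```

Because the filter and the mask are ordinary SQL functions attached to the table, they apply to every query against it, regardless of which notebook, job, or BI tool issues the query.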

Data Lineage

Unity Catalog automatically captures lineage for queries and jobs that run on Databricks against Unity Catalog objects:

  • Table-to-table lineage: Track how data flows between tables through ETL pipelines
  • Column-level lineage: See which source columns contribute to each downstream column
  • Notebook and job lineage: Identify which code and jobs produced each dataset
  • ML model lineage: Trace models back to training data and feature tables
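
Besides the lineage graph in the UI, captured lineage can be queried with SQL. The sketch below assumes that the lineage system tables (system.access.table_lineage and system.access.column_lineage) are enabled in your account and that you have been granted access to them; the column names are given as best recalled and should be checked against your workspace's schema.

```sql
-- Which tables feed production.curated.daily_metrics, and what kind of
-- entity (notebook, job, pipeline, ...) performed the write?
SELECT
  source_table_full_name,
  entity_type,
  MAX(event_time) AS last_seen
FROM system.access.table_lineage
WHERE target_table_full_name = 'production.curated.daily_metrics'
  AND source_table_full_name IS NOT NULL
GROUP BY source_table_full_name, entity_type
ORDER BY last_seen DESC;

-- Column-level view: which source columns contribute to the value column?
SELECT DISTINCT
  source_table_full_name,
  source_column_name
FROM system.access.column_lineage
WHERE target_table_full_name = 'production.curated.daily_metrics'
  AND target_column_name = 'value';
```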

Delta Sharing

Delta Sharing is an open protocol for secure data sharing, integrated with Unity Catalog:

  • Share data across Databricks workspaces and with external organizations
  • Recipients can access shared data using any Delta Sharing client (Spark, pandas, Power BI)
  • Data providers maintain full control over access and can revoke shares at any time
  • No data copying: recipients read the shared data in place from the provider's storage
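
On the provider side, the sharing workflow described in this list can be sketched as follows. The share name analytics_share and the recipient name partner_org are hypothetical, and the sketch assumes you hold the metastore privileges needed to create shares and recipients.

```sql
-- Create a share and add a table to it
CREATE SHARE IF NOT EXISTS analytics_share
  COMMENT 'Curated metrics for external partners';

ALTER SHARE analytics_share
  ADD TABLE production.curated.daily_metrics;

-- Create a recipient and give it read access to the share
CREATE RECIPIENT IF NOT EXISTS partner_org;

GRANT SELECT ON SHARE analytics_share TO RECIPIENT partner_org;

-- Access can be withdrawn at any time
REVOKE SELECT ON SHARE analytics_share FROM RECIPIENT partner_org;
```

A recipient outside Databricks would typically connect with a credential file issued for that recipient and one of the open Delta Sharing clients.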

Key takeaway: Unity Catalog is essential for any enterprise Databricks deployment. It provides the governance foundation needed for compliance, data democratization, and secure collaboration across teams and organizations.