Intermediate
Unity Catalog
Master Databricks' unified governance solution for data, ML models, and AI assets with fine-grained access control, automated lineage, and cross-workspace data sharing.
What is Unity Catalog?
Unity Catalog is Databricks' centralized governance layer that provides a single place to manage access control, auditing, lineage, and data discovery across all Databricks workspaces and cloud environments.
It introduces a three-level namespace, catalog.schema.object (most often written catalog.schema.table), which lets you organize data assets in a way that mirrors your business structure.
Open source: Databricks open-sourced Unity Catalog in 2024 and describes it as the industry's first open governance solution for data and AI. It interoperates with open table formats in the broader lakehouse ecosystem, including Apache Iceberg and Delta Lake.
Three-Level Namespace
| Level | Purpose | Example |
|---|---|---|
| Catalog | Top-level container (business unit or environment) | production, development, finance |
| Schema | Logical grouping of related assets | raw_data, curated, ml_features |
| Object | Tables, views, volumes, models, functions | customer_transactions |
Unity Catalog Namespace
-- Access a table using three-level namespace
SELECT * FROM production.curated.customer_transactions;
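-- Grant read access to every table in a schema.
-- The group also needs USE CATALOG on the parent catalog, or the schema-level grants have no effect.
GRANT USE CATALOG ON CATALOG production TO `data-analysts`;
GRANT USE SCHEMA ON SCHEMA production.curated TO `data-analysts`;
GRANT SELECT ON SCHEMA production.curated TO `data-analysts`;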
-- Create a managed table
CREATE TABLE production.curated.daily_metrics (
  date DATE,
  metric_name STRING,
  value DOUBLE
) USING DELTA;
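The containers themselves are created with ordinary SQL too. The following is a minimal sketch, assuming you hold the CREATE CATALOG privilege on the metastore and that the metastore has a default managed storage location. The `development` catalog and `raw_data` schema reuse example names from the table above, and `landing_files` is a hypothetical volume name:

```sql
-- Create the top two levels of the namespace (names are illustrative)
CREATE CATALOG IF NOT EXISTS development;
CREATE SCHEMA IF NOT EXISTS development.raw_data;

-- Set session defaults so later statements can use shorter names
USE CATALOG development;
USE SCHEMA raw_data;

-- Confirm which catalog and schema unqualified names now resolve against
SELECT current_catalog(), current_schema();

-- A volume is a third-level object that governs non-tabular files
CREATE VOLUME IF NOT EXISTS development.raw_data.landing_files;
```

With the defaults set, a query against `some_table` resolves to `development.raw_data.some_table`. Fully qualified names remain the safer choice in shared notebooks and jobs.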
Access Control
Unity Catalog provides fine-grained, SQL-standard access control:
- Privilege inheritance: Privileges granted on a catalog or schema are inherited by all current and future objects inside it
- Row-level security: Filter data based on user attributes using row filters
- Column masking: Dynamically mask sensitive columns based on group membership
- Identity federation: Manage users, groups, and service principals once at the account level (typically synced from your identity provider) and assign them to workspaces
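Row filters and column masks are both implemented as SQL functions attached to a table. The sketch below assumes that `customer_transactions` has `region` and `email` columns and that account groups named `admins` and `pii-readers` exist. Those columns and groups, and the function names `us_only` and `mask_email`, are hypothetical and do not come from the text above:

```sql
-- Row filter: admins see every row, everyone else sees only US rows
CREATE OR REPLACE FUNCTION production.curated.us_only(region STRING)
RETURN IF(is_account_group_member('admins'), true, region = 'US');

ALTER TABLE production.curated.customer_transactions
  SET ROW FILTER production.curated.us_only ON (region);

-- Column mask: only members of pii-readers see the real email value
CREATE OR REPLACE FUNCTION production.curated.mask_email(email STRING)
RETURN CASE
  WHEN is_account_group_member('pii-readers') THEN email
  ELSE '***REDACTED***'
END;

ALTER TABLE production.curated.customer_transactions
  ALTER COLUMN email SET MASK production.curated.mask_email;

-- Remove the policies again if needed
ALTER TABLE production.curated.customer_transactions DROP ROW FILTER;
ALTER TABLE production.curated.customer_transactions
  ALTER COLUMN email DROP MASK;
```

Because the functions are evaluated at query time against the caller's group membership, the same `SELECT` statement can return different rows and values for different users without maintaining separate views.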
Data Lineage
Unity Catalog automatically captures runtime lineage for queries that run on Unity Catalog-enabled compute, regardless of the language used:
- Table-to-table lineage: Track how data flows between tables through ETL pipelines
- Column-level lineage: See which source columns contribute to each downstream column
- Notebook and job lineage: Identify which code and jobs produced each dataset
- ML model lineage: Trace models back to training data and feature tables
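Besides the lineage graph in the UI, captured lineage can be queried with SQL through the lineage system tables. This is a rough sketch, assuming system tables are enabled for your account, that you have SELECT on the `system.access` schema, and that the column names below match your release (verify them against the system tables reference). The target table `production.curated.daily_metrics` is the example table from earlier:

```sql
-- Which upstream tables fed daily_metrics in the last 30 days, and via what kind of workload?
SELECT DISTINCT
  source_table_full_name,
  entity_type            -- e.g. notebook, job, pipeline
FROM system.access.table_lineage
WHERE target_table_full_name = 'production.curated.daily_metrics'
  AND source_table_full_name IS NOT NULL
  AND event_time >= current_timestamp() - INTERVAL 30 DAYS;

-- Which source columns contributed to the `value` column?
SELECT DISTINCT
  source_table_full_name,
  source_column_name
FROM system.access.column_lineage
WHERE target_table_full_name = 'production.curated.daily_metrics'
  AND target_column_name = 'value';
```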
Delta Sharing
Delta Sharing is an open protocol for secure data sharing, integrated with Unity Catalog:
- Share data across Databricks workspaces and with external organizations
- Recipients can access shared data using any Delta Sharing client (Spark, pandas, Power BI)
- Data providers maintain full control over access and can revoke shares at any time
- No data copying: recipients read the shared data in place from the provider's cloud storage
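On the provider side, the workflow described above can be sketched in a few SQL statements. The share name `metrics_share` and recipient name `partner_org` are hypothetical, and the example assumes you have the privileges needed to create shares and recipients on the metastore:

```sql
-- Create a share and add a table to it
CREATE SHARE IF NOT EXISTS metrics_share
  COMMENT 'Daily metrics shared with partners';
ALTER SHARE metrics_share
  ADD TABLE production.curated.daily_metrics;

-- Create a recipient and give it read access to the share
CREATE RECIPIENT IF NOT EXISTS partner_org;
GRANT SELECT ON SHARE metrics_share TO RECIPIENT partner_org;

-- Access can be withdrawn at any time
REVOKE SELECT ON SHARE metrics_share FROM RECIPIENT partner_org;
```

A recipient created this way, without a Databricks sharing identifier, uses token-based open sharing, which is what lets non-Databricks clients such as pandas or Power BI connect.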
Key takeaway: Unity Catalog is essential for any enterprise Databricks deployment. It provides the governance foundation needed for compliance, data democratization, and secure collaboration across teams and organizations.