Intermediate

Data Products

A data product is a curated, documented, quality-guaranteed dataset published by a domain for consumption by other teams. Treating data as a product shifts the mindset from "dump data in a lake" to "serve consumers with reliable, discoverable data."

Characteristics of a Data Product

A well-designed data product exhibits these qualities:

  • Discoverable: Listed in a data catalog with descriptions, tags, and sample data
  • Addressable: Has a unique, stable identifier and access endpoint
  • Trustworthy: Publishes quality metrics, SLAs, and data contracts
  • Self-describing: Includes schema, semantic descriptions, and usage documentation
  • Interoperable: Follows global standards for formats, identifiers, and access patterns
  • Secure: Enforces built-in access controls aligned with governance policies
  • Valuable on its own: Provides standalone utility without requiring deep domain knowledge to use
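
To make these qualities concrete, here is a minimal sketch of a catalog entry for a data product, with a helper that reports which qualities the entry does not yet evidence. The class DataProductDescriptor, its field names, the example identifier, and the warehouse:// endpoint are all hypothetical, chosen only for illustration; a real catalog will define its own descriptor format.

```python
from dataclasses import dataclass, field


@dataclass
class DataProductDescriptor:
    """Hypothetical catalog entry; every field name here is an assumption."""
    product_id: str        # Addressable: unique, stable identifier
    endpoint: str          # Addressable: where consumers read the data
    description: str       # Discoverable: human-readable summary
    owner_domain: str      # The domain that publishes the product
    tags: list[str] = field(default_factory=list)                    # Discoverable
    schema: dict[str, str] = field(default_factory=dict)             # Self-describing
    quality_metrics: dict[str, float] = field(default_factory=dict)  # Trustworthy
    allowed_roles: list[str] = field(default_factory=list)           # Secure


def missing_qualities(d: DataProductDescriptor) -> list[str]:
    """Return the qualities the descriptor does not yet provide evidence for."""
    checks = {
        "discoverable": bool(d.description and d.tags),
        "addressable": bool(d.product_id and d.endpoint),
        "trustworthy": bool(d.quality_metrics),
        "self-describing": bool(d.schema),
        "secure": bool(d.allowed_roles),
    }
    return [name for name, ok in checks.items() if not ok]


# Illustrative entry: no quality metrics or access roles have been published yet.
orders = DataProductDescriptor(
    product_id="sales.orders.v1",
    endpoint="warehouse://sales/orders_v1",
    description="One row per confirmed customer order.",
    owner_domain="sales",
    tags=["orders", "source-aligned"],
    schema={"order_id": "string", "amount": "decimal"},
)
print(missing_qualities(orders))  # ['trustworthy', 'secure']
```

A check like this could run when a product is registered, so that gaps are visible to the producing domain before consumers find them. Interoperability and standalone value are left out because they are hard to verify mechanically.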

Data Product Types for AI

Type           | Description                                  | AI Use Case
Source-aligned | Clean, modeled data from operational systems | Training data for supervised learning
Aggregate      | Pre-computed metrics and summaries           | Feature engineering, analytics
ML Features    | Ready-to-use features with versioning        | Direct consumption by ML pipelines
Embeddings     | Vector representations of entities           | RAG, semantic search, recommendations
Event streams  | Real-time data feeds                         | Online learning, real-time inference

Data Contracts

Data contracts formalize the agreement between producers and consumers:

  • Schema definition: Exact field names, types, and constraints
  • Quality expectations: Completeness thresholds, accuracy targets, uniqueness guarantees
  • Freshness SLA: Maximum acceptable delay between source event and data availability
  • Availability SLA: Uptime guarantees and maintenance windows
  • Versioning policy: How schema changes are communicated and backward compatibility is maintained
  • Access policy: Who can read, what approvals are needed, and what restrictions apply

Start simple: Your first data products do not need all qualities from day one. Start with discoverable, addressable, and trustworthy. Add sophistication iteratively based on consumer feedback.
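
In the same spirit of starting simple, a first data contract can be a small object checked against each published batch. The sketch below covers only the schema, quality, freshness, and versioning elements from the list above; availability and access policy are omitted. The DataContract class, the check_batch function, and the thresholds are hypothetical, not an established contract format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataContract:
    """Hypothetical minimal contract; names and thresholds are illustrative."""
    version: str               # Versioning policy, e.g. a semantic version
    schema: dict[str, type]    # Schema definition: field name -> Python type
    required: frozenset[str]   # Fields that must be present and non-null
    min_completeness: float    # Quality expectation: minimum share of valid rows
    max_staleness: timedelta   # Freshness SLA


def check_batch(contract, rows, newest_event_time, now):
    """Return a list of human-readable contract violations for one batch."""
    valid = 0
    for row in rows:
        has_required = all(row.get(f) is not None for f in contract.required)
        types_ok = all(
            isinstance(row[f], t)
            for f, t in contract.schema.items()
            if row.get(f) is not None
        )
        valid += has_required and types_ok
    completeness = valid / len(rows) if rows else 0.0

    violations = []
    if completeness < contract.min_completeness:
        violations.append(
            f"completeness {completeness:.2f} < {contract.min_completeness:.2f}"
        )
    if now - newest_event_time > contract.max_staleness:
        violations.append("freshness SLA exceeded")
    return violations


contract = DataContract(
    version="1.0.0",
    schema={"order_id": str, "amount": float},
    required=frozenset({"order_id"}),
    min_completeness=0.95,
    max_staleness=timedelta(hours=1),
)
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [{"order_id": "a1", "amount": 10.0}, {"order_id": None, "amount": 5.0}]
violations = check_batch(contract, rows, now - timedelta(hours=2), now)
print(violations)  # ['completeness 0.50 < 0.95', 'freshness SLA exceeded']
```

Keeping the contract as versioned code next to the pipeline means a schema change shows up as a reviewable diff, which is one way to support the versioning policy.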

Measuring Data Product Success

  • Consumer count: Number of teams and systems consuming the data product
  • Usage frequency: How often the data product is queried or accessed
  • Consumer satisfaction: NPS or feedback scores from data consumers
  • SLA compliance: Percentage of time freshness and quality targets are met
  • Time to consume: How long it takes a new consumer to start using the data product
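
Two of these measures, SLA compliance and consumer count, are simple ratios and counts. The sketch below shows one way to compute them from daily check results and an access log. The DailyCheck record and the (consumer, date) log format are assumptions made for illustration; real monitoring systems will expose this data differently.

```python
from dataclasses import dataclass


@dataclass
class DailyCheck:
    """One day's SLA check result for a data product (hypothetical record shape)."""
    freshness_met: bool
    quality_met: bool


def sla_compliance(checks: list[DailyCheck]) -> float:
    """Percentage of days on which both freshness and quality targets were met."""
    if not checks:
        return 0.0
    met = sum(c.freshness_met and c.quality_met for c in checks)
    return 100.0 * met / len(checks)


def consumer_count(access_log: list[tuple[str, str]]) -> int:
    """Number of distinct consumers in a log of (consumer, date) entries."""
    return len({consumer for consumer, _ in access_log})


checks = [
    DailyCheck(True, True),
    DailyCheck(True, False),
    DailyCheck(True, True),
    DailyCheck(False, True),
]
log = [
    ("marketing", "2024-01-01"),
    ("fraud-ml", "2024-01-01"),
    ("marketing", "2024-01-02"),
]
print(sla_compliance(checks))  # 50.0
print(consumer_count(log))     # 2 distinct consumers
print(len(log))                # 3 accesses in total (usage frequency)
```

Counting a day as compliant only when both targets are met is a deliberately strict choice; a team could also track freshness and quality compliance separately.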