Intermediate
Data Products
A data product is a curated, documented, quality-guaranteed dataset published by a domain for consumption by other teams. Treating data as a product shifts the mindset from "dump data in a lake" to "serve consumers with reliable, discoverable data."
Characteristics of a Data Product
A well-designed data product exhibits these qualities:
- Discoverable: Listed in a data catalog with descriptions, tags, and sample data
- Addressable: Has a unique, stable identifier and access endpoint
- Trustworthy: Published quality metrics, SLAs, and data contracts
- Self-describing: Includes schema, semantic descriptions, and usage documentation
- Interoperable: Follows global standards for formats, identifiers, and access patterns
- Secure: Built-in access controls aligned with governance policies
- Valuable on its own: Provides standalone utility without requiring deep domain knowledge to use
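These qualities are easiest to enforce when each product ships a machine-readable descriptor that the catalog can index. The Python sketch below is illustrative only. It assumes a hypothetical `DataProductDescriptor` dataclass and a hypothetical `urn:dataproduct:<domain>:<name>:v<major>` identifier scheme. It covers discoverability, addressability, self-description, and access control rather than the full list, and real catalogs define their own fields.

```python
from dataclasses import dataclass, field


@dataclass
class DataProductDescriptor:
    """Hypothetical catalog entry covering several of the qualities above."""
    domain: str                  # owning domain team
    name: str                    # product name within the domain
    major_version: int           # bumped only on breaking changes
    description: str             # self-describing: what one row means
    schema: dict[str, str]       # self-describing: column -> logical type
    tags: list[str] = field(default_factory=list)    # discoverable: catalog search terms
    allowed_roles: frozenset[str] = frozenset()       # secure: roles permitted to read

    @property
    def urn(self) -> str:
        # Addressable: a stable identifier that survives minor, non-breaking releases.
        return f"urn:dataproduct:{self.domain}:{self.name}:v{self.major_version}"

    def can_read(self, role: str) -> bool:
        # Secure: the product carries its own access policy.
        return role in self.allowed_roles


def search_catalog(catalog: list[DataProductDescriptor], tag: str) -> list[str]:
    """Discoverable: return the URNs of all products carrying a given tag."""
    return [p.urn for p in catalog if tag in p.tags]


# Usage with a hypothetical "orders" product owned by the sales domain.
orders = DataProductDescriptor(
    domain="sales",
    name="orders",
    major_version=2,
    description="One row per confirmed customer order.",
    schema={"order_id": "string", "amount": "decimal", "created_at": "timestamp"},
    tags=["sales", "orders", "training-data"],
    allowed_roles=frozenset({"analyst", "ml-engineer"}),
)
print(orders.urn)                          # urn:dataproduct:sales:orders:v2
print(search_catalog([orders], "orders"))  # ['urn:dataproduct:sales:orders:v2']
print(orders.can_read("marketing"))        # False
```

Keeping only the major version in the identifier means consumers' references stay valid across compatible releases, and only a breaking change forces them to move to a new address.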
Data Product Types for AI
| Type | Description | AI Use Case |
|---|---|---|
| Source-aligned | Clean, modeled data from operational systems | Training data for supervised learning |
| Aggregate | Pre-computed metrics and summaries | Feature engineering, analytics |
| ML Features | Ready-to-use features with versioning | Direct consumption by ML pipelines |
| Embeddings | Vector representations of entities | RAG, semantic search, recommendations |
| Event streams | Real-time data feeds | Online learning, real-time inference |
Data Contracts
Data contracts formalize the agreement between producers and consumers:
- Schema definition: Exact field names, types, and constraints
- Quality expectations: Completeness thresholds, accuracy targets, uniqueness guarantees
- Freshness SLA: Maximum acceptable delay between source event and data availability
- Availability SLA: Uptime guarantees and maintenance windows
- Versioning policy: How schema changes are communicated and backward compatibility is maintained
- Access policy: Who can read, what approvals are needed, and what restrictions apply
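A contract is only useful if it is checked automatically before data is published. The sketch below encodes the schema, completeness, uniqueness, and freshness terms as a hypothetical `DataContract` dataclass and validates a batch of rows against it in plain Python. Availability and access terms are left out. In production, teams typically rely on a dedicated validation or contract tool rather than hand-written checks. Freshness is approximated here as the time since the newest source event present in the product, which is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DataContract:
    """Hypothetical contract: schema, quality expectations, and freshness SLA."""
    version: str              # communicated per the versioning policy (e.g. semver)
    schema: dict[str, type]   # schema definition: field name -> expected Python type
    required: set[str]        # fields subject to the completeness threshold
    unique_key: str           # field covered by the uniqueness guarantee
    min_completeness: float   # e.g. 0.99 = at most 1% nulls per required field
    max_staleness: timedelta  # freshness SLA


def check_contract(contract: DataContract, rows: list[dict],
                   last_event_time: datetime, now: datetime) -> list[str]:
    """Return human-readable violations; an empty list means the batch passes."""
    violations = []

    # Schema: exact field names and types (None is tolerated here; completeness covers nulls).
    for i, row in enumerate(rows):
        if set(row) != set(contract.schema):
            violations.append(f"row {i}: fields {sorted(row)} do not match the schema")
            continue
        for name, expected in contract.schema.items():
            value = row[name]
            if value is not None and not isinstance(value, expected):
                violations.append(
                    f"row {i}: {name} is {type(value).__name__}, expected {expected.__name__}"
                )

    if rows:
        # Completeness: share of non-null values for each required field.
        for name in sorted(contract.required):
            ratio = sum(r.get(name) is not None for r in rows) / len(rows)
            if ratio < contract.min_completeness:
                violations.append(
                    f"{name}: completeness {ratio:.0%} is below {contract.min_completeness:.0%}"
                )
        # Uniqueness of the key field.
        keys = [r.get(contract.unique_key) for r in rows]
        if len(keys) != len(set(keys)):
            violations.append(f"{contract.unique_key}: duplicate values")

    # Freshness: time since the newest source event present in the product.
    lag = now - last_event_time
    if lag > contract.max_staleness:
        violations.append(f"freshness: lag {lag} exceeds SLA {contract.max_staleness}")

    return violations


contract = DataContract(
    version="1.2.0",
    schema={"order_id": str, "amount": float, "country": str},
    required={"order_id", "amount"},
    unique_key="order_id",
    min_completeness=0.99,
    max_staleness=timedelta(hours=1),
)
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    {"order_id": "a1", "amount": 10.0, "country": "US"},
    {"order_id": "a1", "amount": None, "country": "DE"},
]
violations = check_contract(contract, rows, last_event_time=now - timedelta(hours=3), now=now)
for v in violations:
    print(v)
# amount: completeness 50% is below 99%
# order_id: duplicate values
# freshness: lag 3:00:00 exceeds SLA 1:00:00
```

Collecting every violation instead of raising on the first one lets the producer publish a complete report alongside the data, which is what makes the "Trustworthy" quality visible to consumers.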
Start simple: Your first data products do not need every quality listed above on day one. Begin with discoverable, addressable, and trustworthy, then add sophistication iteratively based on consumer feedback.
Measuring Data Product Success
- Consumer count: Number of teams and systems consuming the data product
- Usage frequency: How often the data product is queried or accessed
- Consumer satisfaction: Net Promoter Score (NPS) or other feedback scores from data consumers
- SLA compliance: Percentage of time freshness and quality targets are met
- Time to consume: How long it takes a new consumer to start using the data product
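Most of these metrics reduce to simple aggregations over records the platform already keeps. The Python sketch below is a minimal illustration under several assumptions. It expects a hypothetical access log of `AccessEvent` records and one pass/fail boolean per scheduled SLA check. It also expects 0-10 survey answers for satisfaction, scored with the standard NPS formula (percentage of promoters minus percentage of detractors). Finally, it needs the date each new consumer requested access and the date of their first successful query. All names are illustrative and not taken from a specific platform.

```python
from dataclasses import dataclass
from datetime import date, datetime
from statistics import median


@dataclass
class AccessEvent:
    consumer: str   # team or system that queried the data product
    at: datetime    # when the query happened


def consumer_count(events: list[AccessEvent]) -> int:
    """Consumer count: distinct teams/systems seen in the access log."""
    return len({e.consumer for e in events})


def usage_per_day(events: list[AccessEvent], days: int) -> float:
    """Usage frequency: average accesses per day over the observation window."""
    return len(events) / days


def sla_compliance(checks: list[bool]) -> float:
    """SLA compliance: % of scheduled checks where freshness and quality targets were met."""
    if not checks:
        raise ValueError("no SLA checks recorded")
    return 100.0 * sum(checks) / len(checks)


def nps(scores: list[int]) -> float:
    """Net Promoter Score from 0-10 answers: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("no survey responses")
    promoters = sum(s >= 9 for s in scores)
    detractors = sum(s <= 6 for s in scores)
    return 100.0 * (promoters - detractors) / len(scores)


def median_days_to_consume(onboarding: list[tuple[date, date]]) -> float:
    """Time to consume: median days from access request to first successful query."""
    return median([(first - requested).days for requested, first in onboarding])


events = [
    AccessEvent("fraud-ml", datetime(2024, 3, 1, 9)),
    AccessEvent("fraud-ml", datetime(2024, 3, 2, 9)),
    AccessEvent("bi-dashboards", datetime(2024, 3, 2, 10)),
]
print(consumer_count(events))                      # 2
print(usage_per_day(events, days=2))               # 1.5
print(sla_compliance([True] * 95 + [False] * 5))   # 95.0
print(nps([10, 9, 8, 6, 10]))                      # 40.0
print(median_days_to_consume([
    (date(2024, 3, 1), date(2024, 3, 4)),
    (date(2024, 3, 10), date(2024, 3, 11)),
    (date(2024, 3, 12), date(2024, 3, 19)),
]))                                                # 3
```

The median is used for time to consume because a single consumer stuck in a long approval process would otherwise dominate an average.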
Lilly Tech Systems