Beginner
MDM Framework
Choose and implement the right master data management (MDM) architecture for your machine learning needs. This lesson covers the major implementation styles and how each supports AI workloads.
MDM Implementation Styles
| Style | Description | ML Suitability |
|---|---|---|
| Registry | Central index linking records across systems without moving data | Good for read-heavy ML feature lookups |
| Consolidation | Copy and merge data into a golden record hub | Best for training data preparation |
| Coexistence | Hub and sources stay in sync bidirectionally | Best for real-time inference with consistent data |
| Centralized | Hub is the single authoring point for master data | Highest quality but slowest to implement |
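To make the registry row concrete, here is a minimal in-memory sketch of a cross-reference index. The class name `Registry`, its methods, and the sample IDs are hypothetical and chosen only for illustration. A production registry would persist these links and sit in front of the real source systems.

```python
from collections import defaultdict


class Registry:
    """Registry-style index: stores pointers to source records, not the data itself."""

    def __init__(self):
        # master_id -> set of (source_system, source_record_id)
        self._links = defaultdict(set)
        # (source_system, source_record_id) -> master_id
        self._reverse = {}

    def link(self, master_id, system, record_id):
        """Record that a source record belongs to a master entity."""
        self._links[master_id].add((system, record_id))
        self._reverse[(system, record_id)] = master_id

    def resolve(self, system, record_id):
        """Return the master ID for a source record, or None if it is unlinked."""
        return self._reverse.get((system, record_id))

    def sources(self, master_id):
        """Return all source records linked to a master ID, sorted for stable output."""
        return sorted(self._links.get(master_id, set()))


registry = Registry()
registry.link("CUST-001", "crm", "a17")
registry.link("CUST-001", "billing", "9923")
```

A feature lookup would call `registry.resolve("billing", "9923")` to find the master ID, then fetch attributes from the source systems listed by `registry.sources("CUST-001")`. The data never moves into the hub.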
MDM Architecture Components
Data Model
Define canonical schemas for each master data domain (customer, product, etc.) with all attributes, relationships, and hierarchies.
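As a rough illustration, a canonical customer schema might be sketched with Python dataclasses. The field names, the `Address` relationship, and the `parent_id` hierarchy link are assumptions for this example, not a prescribed model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Address:
    line1: str
    city: str
    country: str  # assumed convention: ISO 3166-1 alpha-2 code such as "US"


@dataclass
class Customer:
    master_id: str
    full_name: str
    email: Optional[str] = None
    addresses: List[Address] = field(default_factory=list)  # relationship
    parent_id: Optional[str] = None  # hierarchy: parent account, if any

    def __post_init__(self):
        # Normalize at the schema boundary so every consumer sees the same form.
        self.full_name = " ".join(self.full_name.split())
        if self.email is not None:
            self.email = self.email.strip().lower()


customer = Customer("CUST-001", "  Ada   Lovelace ", email=" Ada@Example.COM ")
```

Normalizing inside the schema means that training pipelines and inference services read identical values for the same entity.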
Match and Merge Engine
Identify and resolve duplicate records across source systems using deterministic and probabilistic matching rules.
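The two kinds of rules can be sketched as follows. The deterministic rule compares a normalized email. The second rule uses string similarity from the standard library's `difflib` as a simple stand-in for a true probabilistic model, which would weigh several attributes statistically. The record layout and the 0.85 threshold are assumptions.

```python
from difflib import SequenceMatcher


def normalize(value):
    """Lowercase and collapse whitespace so formatting differences do not block a match."""
    return " ".join(value.lower().split())


def deterministic_match(a, b):
    """Exact match on a trusted identifier (here: normalized email)."""
    email_a = a.get("email") or ""
    email_b = b.get("email") or ""
    return bool(email_a) and normalize(email_a) == normalize(email_b)


def fuzzy_score(a, b):
    """Name similarity in [0, 1]; a stand-in for a probabilistic match score."""
    return SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()


def match(a, b, threshold=0.85):
    """Try the deterministic rule first, then fall back to the similarity score."""
    if deterministic_match(a, b):
        return True
    return fuzzy_score(a, b) >= threshold
```

For example, `match({"name": "Jon Smith"}, {"name": "John Smith"})` passes on similarity alone, while two records that share an email match regardless of how the names are spelled.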
Data Stewardship
Workflows for human review of uncertain matches, data corrections, and exception handling.
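One common way to decide which matches need a human is to route candidate pairs by score band. The sketch below assumes two illustrative thresholds and an in-memory queue. Real stewardship tools add assignment, escalation, and persistence.

```python
from collections import deque

AUTO_MERGE = 0.90    # assumed: at or above this, merge without review
NEEDS_REVIEW = 0.60  # assumed: between the thresholds, ask a steward

review_queue = deque()


def route(pair_id, score):
    """Classify a candidate pair as 'merge', 'review', or 'distinct'."""
    if score >= AUTO_MERGE:
        return "merge"
    if score >= NEEDS_REVIEW:
        review_queue.append({"pair": pair_id, "score": score, "status": "pending"})
        return "review"
    return "distinct"


def steward_decide(approve):
    """Take the oldest pending task and record the steward's decision."""
    task = review_queue.popleft()
    task["status"] = "merged" if approve else "rejected"
    return task
```

Steward decisions are also useful as labeled examples for tuning the thresholds or training a better matcher later.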
Synchronization
Mechanisms to keep master data consistent across source systems: event-driven, batch, or API-based.
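The event-driven option can be illustrated with a small in-process publish/subscribe sketch. The `MasterDataBus` class and the `update_crm` handler are hypothetical. A real deployment would typically use a message broker or change data capture rather than direct function calls.

```python
from collections import defaultdict


class MasterDataBus:
    """Minimal in-process pub/sub: the hub publishes changes, sources subscribe."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, domain, handler):
        self._subscribers[domain].append(handler)

    def publish(self, domain, master_id, changes):
        event = {"domain": domain, "master_id": master_id, "changes": changes}
        for handler in self._subscribers[domain]:
            handler(event)


# A dict standing in for a source system's local copy of customer data.
crm_copy = {}


def update_crm(event):
    """Apply the changed attributes to the local copy."""
    crm_copy.setdefault(event["master_id"], {}).update(event["changes"])


bus = MasterDataBus()
bus.subscribe("customer", update_crm)
bus.publish("customer", "CUST-001", {"email": "ada@example.com"})
```

Batch synchronization would replace `publish` with a scheduled job that pushes all changes since the last run, and API-based synchronization would have each source pull from the hub on demand.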
Governance Layer
Policies, roles, and audit trails governing who can create, modify, and approve master data changes.
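A governance check can be as simple as a role-to-permission table plus an append-only audit log. The roles, actions, and log fields below are assumptions for illustration only.

```python
from datetime import datetime, timezone

# Assumed role model: which actions each role may perform on master data.
PERMISSIONS = {
    "steward": {"create", "modify"},
    "approver": {"create", "modify", "approve"},
    "analyst": set(),
}

audit_log = []


def authorize(user, role, action, master_id):
    """Check the action against the role's permissions and log the attempt either way."""
    allowed = action in PERMISSIONS.get(role, set())
    audit_log.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "action": action,
        "master_id": master_id,
        "allowed": allowed,
    })
    return allowed
```

Logging denied attempts as well as allowed ones keeps the trail complete for later audits.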
Choosing an Approach for ML
For ML-focused MDM, consider these factors:
- If you primarily need consistent training data: Consolidation style gives you a clean golden record for feature engineering
- If you need real-time entity resolution: Registry or coexistence style supports low-latency lookups during inference
- If you have strong governance requirements: Centralized style provides the highest data quality and auditability
- If you need to start quickly: Registry style requires the least change to existing systems
Practical advice: Most organizations implementing MDM for ML start with a consolidation approach for their most critical entity (usually customers). This provides immediate value for training data quality while you plan the broader MDM program.
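A consolidation hub ultimately needs a survivorship rule to build each golden record. The sketch below assumes one simple rule, "the most recently updated non-empty value wins", and a hypothetical record layout with `source` and `updated_at` fields. Real programs often rank sources by trust per attribute instead.

```python
def golden_record(records):
    """Merge source records; for each attribute the newest non-empty value wins."""
    golden = {}
    # ISO 8601 date strings sort chronologically, so later records overwrite earlier ones.
    for record in sorted(records, key=lambda r: r["updated_at"]):
        for key, value in record.items():
            if key in ("source", "updated_at"):
                continue  # bookkeeping fields are not part of the golden record
            if value not in (None, ""):
                golden[key] = value
    return golden


records = [
    {"source": "crm", "updated_at": "2024-01-10", "name": "Ada Lovelace",
     "email": "ada@old.example", "phone": None},
    {"source": "billing", "updated_at": "2024-03-02", "name": "Ada Lovelace",
     "email": "ada@example.com", "phone": ""},
    {"source": "support", "updated_at": "2023-11-20", "name": "A. Lovelace",
     "email": None, "phone": "555-0100"},
]

golden = golden_record(records)
```

Here the email comes from the newest record (billing), while the phone number survives from the oldest record (support) because no later source supplies one.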
MDM Tools Landscape
| Category | Tools |
|---|---|
| Enterprise MDM | Informatica MDM, IBM InfoSphere MDM, SAP Master Data Governance, Reltio |
| Cloud-Native | Tamr, Ataccama, Profisee |
| Open Source | Apache Atlas (metadata), Zingg (entity resolution), dedupe (Python entity resolution library) |
Lilly Tech Systems