Beginner

MDM Framework

Choose and implement the right MDM architecture for your machine learning needs. This lesson covers the major implementation styles and how each supports AI workloads.

MDM Implementation Styles

Style | Description | ML Suitability
Registry | Central index linking records across systems without moving data | Good for read-heavy ML feature lookups
Consolidation | Copy and merge data into a golden record hub | Best for training data preparation
Coexistence | Hub and sources stay in sync bidirectionally | Best for real-time inference with consistent data
Centralized | Hub is the single authoring point for master data | Highest quality but slowest to implement
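
To make the registry style concrete, here is a minimal sketch of a cross-reference index that links source records to a shared master ID without copying any attributes. The class name, method names, and IDs are hypothetical and chosen only for illustration; a real registry would also need persistence and access control.

```python
from collections import defaultdict


class Registry:
    """Hypothetical registry-style index: stores links only, never attribute data."""

    def __init__(self):
        self._to_master = {}              # (source, local_id) -> master_id
        self._members = defaultdict(set)  # master_id -> {(source, local_id), ...}

    def link(self, master_id, source, local_id):
        """Record that a source-system record belongs to a master entity."""
        key = (source, local_id)
        self._to_master[key] = master_id
        self._members[master_id].add(key)

    def master_of(self, source, local_id):
        """Return the master ID for a source record, or None if it is unlinked."""
        return self._to_master.get((source, local_id))

    def linked_records(self, master_id):
        """Return every source record linked to a master ID, sorted for stable output."""
        return sorted(self._members.get(master_id, set()))


# Illustrative usage with made-up system names and IDs.
registry = Registry()
registry.link("M-1", "crm", "c-17")
registry.link("M-1", "billing", "9001")

master_id = registry.master_of("billing", "9001")
sources = registry.linked_records(master_id)
```

An ML feature lookup would call master_of with whatever ID the calling system knows, then fetch attributes from the source systems returned by linked_records. The data itself stays where it is, which is why this style suits read-heavy lookups.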

MDM Architecture Components

  1. Data Model

    Define canonical schemas for each master data domain (customer, product, etc.) with all attributes, relationships, and hierarchies.

  2. Match and Merge Engine

    Identify and resolve duplicate records across source systems using deterministic and probabilistic matching rules.

  3. Data Stewardship

    Workflows for human review of uncertain matches, data corrections, and exception handling.

  4. Synchronization

    Mechanisms to keep master data consistent across source systems. These can be event-driven, batch, or API-based.

  5. Governance Layer

    Policies, roles, and audit trails governing who can create, modify, and approve master data changes.
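
As a rough illustration of how components 2 and 3 fit together, the sketch below applies a deterministic rule first (identical email), falls back to a fuzzy name comparison, and routes uncertain pairs to human review. It makes several assumptions: the field names, the two thresholds, and the use of difflib string similarity as a simple stand-in for true probabilistic matching are not from any specific MDM product.

```python
from difflib import SequenceMatcher

# Assumed thresholds for illustration; real values are tuned per domain.
AUTO_MERGE_THRESHOLD = 0.90
REVIEW_THRESHOLD = 0.70


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join((text or "").lower().split())


def match_score(a, b):
    """Score a pair of records between 0.0 and 1.0."""
    # Deterministic rule: identical non-empty email is treated as a certain match.
    email_a, email_b = normalize(a.get("email")), normalize(b.get("email"))
    if email_a and email_a == email_b:
        return 1.0
    # Fuzzy fallback: string similarity on names (a stand-in for probabilistic matching).
    return SequenceMatcher(None, normalize(a.get("name")), normalize(b.get("name"))).ratio()


def classify(a, b):
    """Decide whether to merge automatically, send to a steward, or keep separate."""
    score = match_score(a, b)
    if score >= AUTO_MERGE_THRESHOLD:
        return "merge"
    if score >= REVIEW_THRESHOLD:
        return "review"  # goes to the data stewardship queue
    return "distinct"


# Illustrative records with made-up values.
crm = {"name": "Alice Smith", "email": "alice@example.com"}
billing = {"name": "A. Smith", "email": "Alice@Example.com "}
other = {"name": "Bob Zhang", "email": "bob@example.com"}

same_email_decision = classify(crm, billing)
different_person_decision = classify(crm, other)
```

The middle band between the two thresholds is what feeds the stewardship workflow: widening it sends more pairs to human review, while narrowing it automates more decisions at the cost of more wrong merges or missed duplicates.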

Choosing an Approach for ML

For ML-focused MDM, consider these factors:

  • If you primarily need consistent training data: Consolidation style gives you a clean golden record for feature engineering
  • If you need real-time entity resolution: Registry or coexistence style supports low-latency lookups during inference
  • If you have strong governance requirements: Centralized style provides the highest data quality and auditability
  • If you need to start quickly: Registry style requires the least change to existing systems

Practical advice: Most organizations implementing MDM for ML start with a consolidation approach for their most critical entity (usually customers). This provides immediate value for training data quality while you plan the broader MDM program.
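
The consolidation approach can be sketched as merging already-matched source records into one golden record. The example below assumes a simple survivorship rule (the most recently updated non-empty value wins) and a hypothetical record layout; real hubs often combine recency with per-source trust rankings.

```python
from datetime import date


def build_golden_record(records):
    """Merge matched source records; for each field, the newest non-empty value survives."""
    golden = {}
    # Process oldest first so that later (newer) values overwrite earlier ones.
    for record in sorted(records, key=lambda r: r["updated"]):
        for field, value in record["attrs"].items():
            if value not in (None, ""):
                golden[field] = value
    return golden


# Two source records already matched to the same customer (made-up values).
matched_records = [
    {
        "source": "crm",
        "updated": date(2023, 1, 5),
        "attrs": {"name": "Ana Diaz", "email": "ana@example.com", "phone": ""},
    },
    {
        "source": "billing",
        "updated": date(2024, 3, 1),
        "attrs": {"name": "Ana M. Diaz", "email": None, "phone": "555-0100"},
    },
]

golden = build_golden_record(matched_records)
```

Here the newer billing record supplies the name and phone, while the email survives from the older CRM record because billing has none. The resulting single row per customer is what feature engineering would read from.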

MDM Tools Landscape

Category | Tools
Enterprise MDM | Informatica MDM, IBM InfoSphere, SAP Master Data Governance, Reltio
Cloud-Native | Tamr, Ataccama, Profisee
Open Source | Apache Atlas (metadata), Zingg (entity resolution), dedupe.io