Intermediate

Data Integration

Learn how to connect to enterprise data sources, then ingest and transform their data using Foundry's data connection framework and pipeline tools.

Data Connection Framework

Foundry provides connectors for a wide range of data sources, each with built-in sync management and health monitoring. The main categories are:

Source Type	Examples	Sync Modes
Databases	PostgreSQL, Oracle, SQL Server, MySQL	Full, incremental, CDC
Cloud storage	S3, Azure Blob, GCS	File-based, streaming
APIs	REST, GraphQL, SOAP	Polling, webhook
SaaS platforms	Salesforce, SAP, Workday	Pre-built connectors
Streaming	Kafka, Kinesis, Event Hubs	Real-time ingestion
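To make the difference between a full and an incremental sync concrete, here is a minimal sketch in plain Python. It is not Foundry's connector API: the `IncrementalSync` class, its `cursor_column` parameter, and the sample rows are all hypothetical, chosen only to illustrate the watermark idea behind incremental ingestion.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalSync:
    """Toy incremental sync (hypothetical): copy only rows past the last watermark."""
    cursor_column: str
    watermark: int = 0                      # highest cursor value ingested so far
    target: list = field(default_factory=list)

    def run(self, source_rows):
        # Select only rows whose cursor value is greater than the stored watermark.
        new_rows = [r for r in source_rows if r[self.cursor_column] > self.watermark]
        self.target.extend(new_rows)
        if new_rows:
            # Advance the watermark so the next run skips these rows.
            self.watermark = max(r[self.cursor_column] for r in new_rows)
        return len(new_rows)


source = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
sync = IncrementalSync(cursor_column="id")
first = sync.run(source)      # first run ingests both existing rows
source.append({"id": 3, "name": "c"})
second = sync.run(source)     # second run ingests only the newly added row
```

A full sync would instead re-copy every row on each run. Note that a watermark like this only sees appended rows; picking up updates and deletes is what change data capture (CDC) is for.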

Key concept: Every dataset in Foundry is versioned, much as Git versions code. You can view the full history of changes, revert to a previous version, and trace how data flows through the system via automatic lineage tracking.
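The versioning idea can be sketched as an append-only history of snapshots. This is a toy model under stated assumptions, not how Foundry stores datasets internally: the `VersionedDataset` class and its `commit`, `read`, `revert`, and `log` methods are hypothetical names used only for illustration.

```python
class VersionedDataset:
    """Toy append-only history of dataset snapshots (hypothetical model)."""

    def __init__(self):
        self._history = []  # list of (message, rows) tuples, oldest first

    def commit(self, rows, message):
        # Store an immutable copy so later edits cannot alter past versions.
        self._history.append((message, tuple(rows)))
        return len(self._history) - 1  # version number of the new snapshot

    def read(self, version=None):
        # With no version given, return the latest snapshot.
        if not self._history:
            return []
        idx = len(self._history) - 1 if version is None else version
        return list(self._history[idx][1])

    def revert(self, version):
        # Reverting adds a new version rather than erasing history.
        return self.commit(self.read(version), f"revert to v{version}")

    def log(self):
        return [message for message, _ in self._history]


ds = VersionedDataset()
v0 = ds.commit([1, 2], "initial load")
v1 = ds.commit([1, 2, 3], "append row")
v2 = ds.revert(v0)
```

The design choice worth noticing is that `revert` never deletes anything: the earlier versions stay readable, which is what makes a full history and audit trail possible.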

Pipeline Builder

Pipeline Builder is one of several approaches Foundry offers for data transformation:

Contour: Visual, no-code data analysis and transformation tool for business users
Pipeline Builder: Low-code visual pipeline design with drag-and-drop transforms
Code Repositories: Full Python/PySpark transforms for complex logic with Git versioning
Code Workbook: Interactive notebook-style environment for exploration and prototyping
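Whichever tool is used, a transform is essentially a function from input datasets to an output dataset. The sketch below shows that shape in plain Python so it runs anywhere. It does not use Foundry's transform API or PySpark, and the function names and sample rows are hypothetical.

```python
from collections import defaultdict


def clean_orders(raw_orders):
    """Drop rows with a missing amount and normalise the region code."""
    return [
        {**row, "region": row["region"].strip().upper()}
        for row in raw_orders
        if row.get("amount") is not None
    ]


def revenue_by_region(orders):
    """Aggregate cleaned orders into total revenue per region."""
    totals = defaultdict(float)
    for row in orders:
        totals[row["region"]] += row["amount"]
    return dict(totals)


# Hypothetical raw input with inconsistent region codes and a missing amount.
raw = [
    {"region": " emea ", "amount": 100.0},
    {"region": "EMEA", "amount": 50.0},
    {"region": "apac", "amount": None},
    {"region": "apac", "amount": 25.0},
]
result = revenue_by_region(clean_orders(raw))
```

Keeping each step a pure function of its inputs is what lets a platform rebuild outputs deterministically and record which datasets each output was derived from.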

Data Lineage

Foundry automatically tracks data lineage across the entire platform:

End-to-end visibility: Trace any value from raw source through every transformation to the final output
Impact analysis: Understand downstream effects before modifying a dataset or pipeline
Health monitoring: Automatic alerts when data quality degrades or pipelines fail
Branching: Test pipeline changes on a branch before merging them to production, as you would with code
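Impact analysis amounts to a graph traversal over the lineage: start at the dataset you plan to change and collect everything reachable downstream. The following is a minimal sketch assuming the lineage is available as a simple adjacency map. The `LINEAGE` dictionary, its dataset names, and the `downstream` function are hypothetical and do not reflect a Foundry API.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets built directly from it.
LINEAGE = {
    "raw_orders": ["clean_orders"],
    "raw_customers": ["clean_customers"],
    "clean_orders": ["order_facts"],
    "clean_customers": ["order_facts", "customer_360"],
    "order_facts": ["revenue_dashboard"],
}


def downstream(dataset, lineage):
    """Return every dataset affected by a change to `dataset` (breadth-first)."""
    seen = set()
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:  # visit each dataset once, even with shared paths
                seen.add(child)
                queue.append(child)
    return sorted(seen)


impacted = downstream("clean_customers", LINEAGE)
```

Here a change to `clean_customers` touches `customer_360`, `order_facts`, and, transitively, `revenue_dashboard`. Running the same traversal over reversed edges would answer the provenance question instead: which sources fed a given output.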

Data Governance

Foundry's governance model is built into every layer of the platform:

Marking system: Tag data with classification markings that automatically propagate through pipelines
Project-based access: Organize datasets into projects with role-based permissions
Provenance tracking: Full audit trail of who accessed, modified, or derived data
Data health: Automated expectations and checks that validate data quality on every pipeline run
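Marking propagation can be sketched as a set union: a derived dataset carries every marking found on any of its inputs, and a user needs all of those markings to read it. This is an illustrative model only. The function names and the marking labels `FINANCE`, `PII`, and `HR` are hypothetical, and the all-markings access rule is stated here as an assumption of the sketch.

```python
def propagate_markings(*input_markings):
    """An output inherits every marking carried by any of its inputs."""
    result = set()
    for markings in input_markings:
        result |= set(markings)
    return frozenset(result)


def can_access(user_markings, dataset_markings):
    # Assumption: access requires holding every marking on the dataset.
    return set(dataset_markings) <= set(user_markings)


orders = {"FINANCE"}      # hypothetical markings on an orders dataset
customers = {"PII"}       # hypothetical markings on a customers dataset

# Joining the two datasets yields an output that carries both markings.
joined = propagate_markings(orders, customers)

analyst_ok = can_access({"FINANCE", "PII", "HR"}, joined)
analyst_blocked = can_access({"FINANCE"}, joined)
```

Because the union is computed from the inputs rather than set by hand, a sensitive classification cannot be dropped accidentally when data is joined or reshaped.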

Key takeaway: Foundry's data integration is distinguished by its versioning model (every dataset is versioned like code), automatic lineage tracking, and marking-based governance that propagates security classifications through transformations.
