Intermediate

Data Integration

Learn how to connect to enterprise data sources, then ingest and transform their data using Foundry's data connection framework and pipeline tools.

Data Connection Framework

Foundry provides connectors for virtually any data source, with built-in sync management and health monitoring:

Source Type    | Examples                              | Sync Modes
---------------|---------------------------------------|------------------------
Databases      | PostgreSQL, Oracle, SQL Server, MySQL | Full, incremental, CDC
Cloud storage  | S3, Azure Blob, GCS                   | File-based, streaming
APIs           | REST, GraphQL, SOAP                   | Polling, webhook
SaaS platforms | Salesforce, SAP, Workday              | Pre-built connectors
Streaming      | Kafka, Kinesis, Event Hubs            | Real-time ingestion
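To make the difference between full and incremental sync modes concrete, here is a minimal plain-Python sketch. It is not Foundry's connector API: `SyncState`, `full_sync`, `incremental_sync`, and the `updated_at` cursor column are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SyncState:
    """Hypothetical sync target: stored rows plus the highest cursor value seen."""
    watermark: int = 0
    rows: list = field(default_factory=list)


def full_sync(source_rows, state):
    """Replace the target with a complete snapshot of the source."""
    state.rows = list(source_rows)
    state.watermark = max((r["updated_at"] for r in source_rows), default=0)
    return len(state.rows)


def incremental_sync(source_rows, state):
    """Append only rows whose cursor column is newer than the watermark."""
    new_rows = [r for r in source_rows if r["updated_at"] > state.watermark]
    state.rows.extend(new_rows)
    if new_rows:
        state.watermark = max(r["updated_at"] for r in new_rows)
    return len(new_rows)


source = [{"id": 1, "updated_at": 10}, {"id": 2, "updated_at": 20}]
state = SyncState()
first = incremental_sync(source, state)   # both rows are newer than watermark 0
source.append({"id": 3, "updated_at": 30})
second = incremental_sync(source, state)  # only the row with id 3 is new
```

The second run copies a single row instead of re-reading the whole table, which is the usual reason to prefer incremental sync for large, append-mostly sources. Change data capture (CDC) goes further by reading inserts, updates, and deletes from the database's change log; that is not modeled here.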

Key concept: Every dataset in Foundry is versioned like Git. You can see the full history of changes, revert to previous versions, and trace how data flows through the system via automatic lineage tracking.
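The Git-like versioning idea can be sketched as an append-only list of snapshots. This is a conceptual model only, not how Foundry stores datasets; `VersionedDataset` and its methods are hypothetical names.

```python
class VersionedDataset:
    """Hypothetical append-only dataset: each commit stores an immutable snapshot."""

    def __init__(self):
        self._versions = []

    def commit(self, rows):
        """Store a new snapshot and return its version number."""
        self._versions.append(tuple(rows))
        return len(self._versions) - 1

    def read(self, version=None):
        """Read a specific version, or the latest one when version is None."""
        if not self._versions:
            return []
        index = len(self._versions) - 1 if version is None else version
        return list(self._versions[index])

    def revert(self, version):
        """Revert by re-committing an old snapshot, so history is never rewritten."""
        return self.commit(self._versions[version])

    def history_length(self):
        return len(self._versions)


ds = VersionedDataset()
v0 = ds.commit(["alice", "bob"])
v1 = ds.commit(["alice", "bob", "carol"])
v2 = ds.revert(v0)  # the latest version now matches v0; v1 is still readable
```

Reverting adds a new version rather than deleting one, so the intermediate state stays available for auditing.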

Pipeline Builder

Foundry offers multiple approaches to data transformation:

  • Contour: Visual, no-code data analysis and transformation tool for business users
  • Pipeline Builder: Low-code visual pipeline design with drag-and-drop transforms
  • Code Repositories: Full Python/PySpark transforms for complex logic with Git versioning
  • Code Workbook: Interactive notebook-style environment for exploration and prototyping
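A code-based transform is, at its core, a function that declares which datasets it reads and which dataset it writes. The sketch below shows that shape in plain Python so it runs anywhere. The `transform` decorator, the `TRANSFORMS` registry, and `build` are hypothetical stand-ins, not Foundry's transforms library, and lists of dicts stand in for Spark DataFrames.

```python
TRANSFORMS = {}  # output dataset name -> (input dataset names, function)


def transform(output, inputs):
    """Hypothetical decorator: register a function with its declared inputs and output."""
    def register(fn):
        TRANSFORMS[output] = (tuple(inputs), fn)
        return fn
    return register


@transform(output="clean_orders", inputs=["raw_orders"])
def clean_orders(raw_orders):
    """Drop rows without an amount and upper-case the currency code."""
    return [
        {**row, "currency": row["currency"].upper()}
        for row in raw_orders
        if row.get("amount") is not None
    ]


def build(output, datasets):
    """Run the registered transform for `output` against the given datasets."""
    inputs, fn = TRANSFORMS[output]
    datasets[output] = fn(*(datasets[name] for name in inputs))
    return datasets[output]


datasets = {
    "raw_orders": [
        {"id": 1, "amount": 5.0, "currency": "usd"},
        {"id": 2, "amount": None, "currency": "eur"},
    ]
}
result = build("clean_orders", datasets)
```

Because inputs and outputs are declared rather than discovered at run time, a platform can derive the dependency graph between datasets from the registry alone.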

Data Lineage

Foundry automatically tracks data lineage across the entire platform:

  • End-to-end visibility: Trace any value from raw source through every transformation to the final output
  • Impact analysis: Understand downstream effects before modifying a dataset or pipeline
  • Health monitoring: Automatic alerts when data quality degrades or pipelines fail
  • Branching: Test pipeline changes on branches before merging to production, like code
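End-to-end visibility and impact analysis both reduce to walking a dependency graph: backwards to find sources, forwards to find everything a change could affect. A minimal sketch, assuming lineage is available as a mapping from each dataset to its direct children (the dataset names and the `LINEAGE` dict are made up for illustration):

```python
from collections import deque

# Hypothetical lineage: dataset -> datasets derived directly from it.
LINEAGE = {
    "raw_orders": ["clean_orders"],
    "raw_customers": ["clean_customers"],
    "clean_orders": ["orders_enriched"],
    "clean_customers": ["orders_enriched"],
    "orders_enriched": ["revenue_report"],
    "revenue_report": [],
}


def downstream(dataset, lineage):
    """Impact analysis: every dataset reachable from `dataset` (breadth-first)."""
    seen = set()
    queue = deque(lineage.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(lineage.get(current, []))
    return seen


def upstream(dataset, lineage):
    """Provenance: every dataset that `dataset` was derived from."""
    parents = {}
    for parent, children in lineage.items():
        for child in children:
            parents.setdefault(child, []).append(parent)
    return downstream(dataset, parents)


affected = downstream("raw_orders", LINEAGE)
sources = upstream("orders_enriched", LINEAGE)
```

Here, changing `raw_orders` would affect `clean_orders`, `orders_enriched`, and `revenue_report`, while `raw_customers` and `clean_customers` are untouched.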

Data Governance

Foundry's governance model is built into every layer:

  • Marking system: Tag data with classification markings that automatically propagate through pipelines
  • Project-based access: Organize datasets into projects with role-based permissions
  • Provenance tracking: Full audit trail of who accessed, modified, or derived data
  • Data health: Automated expectations and checks that validate data quality on every pipeline run
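Two of the ideas above lend themselves to a short sketch: markings that propagate to derived data, and expectations that run against each build. The model below assumes a derived dataset inherits the union of its inputs' markings and that a user needs every marking on a dataset to read it. All function names and marking labels are hypothetical, not Foundry's API.

```python
def propagate_markings(*input_markings):
    """A derived dataset carries the union of all its inputs' markings."""
    result = frozenset()
    for markings in input_markings:
        result |= frozenset(markings)
    return result


def can_access(user_markings, dataset_markings):
    """Assumed rule: the user must hold every marking on the dataset."""
    return set(dataset_markings) <= set(user_markings)


def check_expectations(rows, expectations):
    """Return the names of expectations that at least one row violates."""
    return [
        name
        for name, predicate in expectations.items()
        if not all(predicate(row) for row in rows)
    ]


# Joining a PII-marked dataset with a finance-marked one yields both markings.
joined_markings = propagate_markings({"PII"}, {"FINANCE"})
analyst_ok = can_access({"FINANCE"}, joined_markings)
officer_ok = can_access({"FINANCE", "PII"}, joined_markings)

rows = [{"id": 1, "amount": 12.5}, {"id": 2, "amount": -3.0}]
expectations = {
    "id_present": lambda row: row.get("id") is not None,
    "amount_non_negative": lambda row: row["amount"] >= 0,
}
failed = check_expectations(rows, expectations)
```

The analyst could read the finance dataset on its own but not the join, because the join inherited the PII marking without anyone tagging it by hand. The failed expectation list is the kind of result a pipeline run could use to block a build or raise an alert.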

Key takeaway: Foundry's data integration is distinguished by its versioning model (every dataset is versioned like code), automatic lineage tracking, and marking-based governance that propagates security classifications through transformations.