Advanced

CI Integration for ML Tests

Running ML tests in continuous integration pipelines. Part of the Unit Testing for ML Pipelines course at AI School by Lilly Tech Systems.

Why CI Matters for ML Projects

Continuous Integration (CI) automatically runs your test suite whenever code is pushed, ensuring that new changes do not break existing functionality. CI matters even more for ML projects than for traditional software because ML bugs are often silent. A broken feature engineering function does not throw an error; it just produces wrong features that quietly degrade the model. Automated tests running in CI are often the only safeguard that catches these issues before they reach production.
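As a concrete illustration of a silent ML bug (the function and test names here are illustrative, not from the course code), a unit test can pin a function's statistical contract, so a regression such as dividing by the variance instead of the standard deviation fails the build instead of degrading the model:

```python
import numpy as np

def standardize(x: np.ndarray) -> np.ndarray:
    """Scale features to zero mean and unit variance."""
    return (x - x.mean()) / x.std()

def test_standardize_unit_variance():
    # A buggy variant that divides by x.var() would run without
    # error but produce wrongly scaled features; this test would
    # fail loudly in CI instead.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    z = standardize(x)
    assert abs(z.mean()) < 1e-9
    assert abs(z.std() - 1.0) < 1e-9
```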

GitHub Actions for ML Testing

GitHub Actions is a popular choice for ML CI because of its free tier for public repositories and its tight integration with GitHub-hosted code (note that GPU runners are a paid feature, not part of the free tier). Here is a representative workflow configuration:

# .github/workflows/ml-tests.yml
name: ML Pipeline Tests
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.10', '3.11']

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Cache pip packages
        uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov pytest-xdist

      - name: Run unit tests
        run: pytest tests/ -m "not slow and not integration" -n auto --cov=src --cov-report=xml

      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          files: coverage.xml

  data-validation:
    runs-on: ubuntu-latest
    needs: unit-tests
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run data validation tests
        run: pytest tests/ -m "data" -v

  integration-tests:
    runs-on: ubuntu-latest
    needs: [unit-tests, data-validation]
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run integration tests
        run: pytest tests/ -m "integration" -v --timeout=300
💡 CI optimization: Split your tests into stages that run in sequence: fast unit tests first, then data validation, then slower integration tests. If unit tests fail, there is no point in running expensive integration tests. This saves CI minutes and gives faster feedback.

Handling Large Test Data in CI

ML test suites often need sample datasets that are too large for Git. Strategies for managing test data in CI:

  • Small synthetic fixtures — Generate test data programmatically in conftest.py (preferred for unit tests)
  • DVC (Data Version Control) — Track test data separately from code, pull it in CI
  • Git LFS — Store large test files with Git Large File Storage
  • Cloud storage — Download test data from S3 or GCS during CI setup
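For the first strategy, here is a minimal sketch of a programmatic fixture in conftest.py. Column names, sizes, and the helper name are illustrative assumptions, not from the course code:

```python
# conftest.py -- small synthetic dataset generated at test time,
# so no data files need to be committed or downloaded in CI.
import numpy as np
import pandas as pd
import pytest

def make_sample_training_data(n: int = 200) -> pd.DataFrame:
    """Deterministic synthetic dataset: identical rows on every run."""
    rng = np.random.default_rng(seed=42)  # fixed seed for reproducibility
    return pd.DataFrame({
        "age": rng.integers(18, 90, size=n),
        "income": rng.normal(50_000, 15_000, size=n),
        "label": rng.integers(0, 2, size=n),
    })

@pytest.fixture
def sample_training_data() -> pd.DataFrame:
    return make_sample_training_data()
```

Because the generator is seeded, every CI run and every developer machine sees exactly the same data, which keeps test failures reproducible.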

Managing ML Dependencies in CI

ML dependencies (TensorFlow, PyTorch, scikit-learn) are large and slow to install. Speed up CI with these techniques:

  1. Use pip caching to avoid re-downloading packages
  2. Create a minimal requirements-test.txt with only what tests need
  3. Use Docker images with pre-installed ML libraries
  4. Pin exact versions to avoid surprise breaking changes
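Technique 3 can be sketched as a job that runs inside a container with the heavy libraries pre-installed; the image name below is a placeholder assumption, not a course-mandated image:

```yaml
# Run tests inside a pre-built image so TensorFlow/PyTorch
# are never installed during the CI run itself.
unit-tests-docker:
  runs-on: ubuntu-latest
  container:
    image: ghcr.io/your-org/ml-test-base:latest  # hypothetical pre-baked image
  steps:
    - uses: actions/checkout@v4
    - name: Install test-only dependencies
      run: pip install -r requirements-test.txt
    - name: Run unit tests
      run: pytest tests/ -m "not slow and not integration"
```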

CI Quality Gates for ML

Configure your CI pipeline to enforce quality standards:

# Add to your CI workflow
- name: Check test coverage
  run: pytest tests/ --cov=src --cov-fail-under=80

- name: Check for test warnings
  run: pytest tests/ -W error::UserWarning -W error::DeprecationWarning

Pre-commit Hooks for ML Code

Add pre-commit hooks that run fast checks before code is committed: linting with ruff or flake8, type checking with mypy, and running the fastest subset of tests. This catches issues before they even reach CI, saving pipeline time.
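A minimal .pre-commit-config.yaml along those lines (the hook versions shown are illustrative; pin whatever you actually use):

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.4          # illustrative version -- pin your own
    hooks:
      - id: ruff
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.10.0         # illustrative version
    hooks:
      - id: mypy
  - repo: local
    hooks:
      - id: fast-tests
        name: fast unit tests
        entry: pytest tests/ -m "not slow and not integration" -x -q
        language: system
        pass_filenames: false
```

The local hook keeps only the fastest test subset in the commit path; the full suite still runs in CI.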

Common CI pitfall: Tests that pass locally but fail in CI due to different random seeds, missing data files, or environment differences. Always run your tests with PYTHONHASHSEED=0 and fixed random seeds to ensure reproducibility across environments.
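One way to enforce that consistently is an autouse fixture that re-seeds before every test. This is a sketch; if you use torch or tensorflow, extend seed_everything with their framework-specific seeding calls:

```python
# conftest.py -- pin common sources of randomness so tests behave
# identically on a laptop and on a CI runner.
import os
import random

import numpy as np
import pytest

SEED = 0

def seed_everything(seed: int = SEED) -> None:
    """Fix the stdlib and NumPy random generators."""
    # Setting PYTHONHASHSEED here only affects subprocesses; set it
    # in the CI environment for the main process as well.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)

@pytest.fixture(autouse=True)
def fixed_seeds():
    # Re-seed before every test so test order cannot change results.
    seed_everything()
    yield
```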