Intermediate

Differential Privacy Tools & Libraries

A practical guide to the open-source libraries that make differential privacy accessible for production ML systems. From data analysis to model training, these tools handle the complex math so you can focus on your application.
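To see what these libraries abstract away, here is a minimal hand-rolled sketch of the Laplace mechanism, the building block behind several of the examples below. The function names are illustrative, not from any of these libraries; production code should use a vetted library, since naive floating-point noise sampling has known vulnerabilities.

```python
import random

def laplace_noise(scale: float) -> float:
    """Sample from Laplace(0, scale) as the difference of two exponentials."""
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def private_count(data, epsilon: float) -> float:
    """A counting query has sensitivity 1: adding or removing one person
    changes the count by at most 1, so the noise scale is 1 / epsilon."""
    sensitivity = 1.0
    return len(data) + laplace_noise(sensitivity / epsilon)

random.seed(0)
print(private_count(range(1000), epsilon=1.0))  # roughly 1000, plus noise
```

Smaller epsilon means a larger noise scale and stronger privacy; the libraries below automate exactly this calibration, plus the bookkeeping around it.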

Library Comparison

Library          | Maintainer        | Use Case                   | Language
OpenDP           | Harvard/Microsoft | General DP data analysis   | Rust + Python bindings
Google DP        | Google            | Aggregate analytics        | C++, Java, Go, Python
Opacus           | Meta              | PyTorch DP-SGD training    | Python (PyTorch)
TF Privacy      | Google            | TensorFlow DP-SGD training | Python (TensorFlow)
PipelineDP       | OpenMined/Google  | DP on Spark/Beam pipelines | Python
Tumult Analytics | Tumult Labs       | DP SQL-like queries        | Python (PySpark)

OpenDP

OpenDP is a community effort to build trustworthy, open-source software tools for statistical analysis with differential privacy:

Python - OpenDP Example
from opendp.mod import enable_features
enable_features("contrib")

import opendp.prelude as dp

# Build a measurement: private mean of ages
input_space = dp.vector_domain(dp.atom_domain(T=float)), \
              dp.symmetric_distance()

# Chain transformations
mean_measurement = (
    input_space >>
    dp.t.then_clamp(bounds=(0.0, 120.0)) >>
    dp.t.then_resize(size=1000, constant=50.0) >>
    dp.t.then_mean() >>
    dp.m.then_laplace(scale=0.12)  # Calibrated noise
)

# Check the privacy guarantee
print(f"Privacy loss: ε = {mean_measurement.map(1)}")

# Apply to data
ages = [25.0, 34.0, 42.0, 56.0, ...]  # 1000 ages
private_mean = mean_measurement(ages)
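The `scale=0.12` above is not arbitrary: for a mean over exactly 1,000 values clamped to [0, 120], one record can shift the result by at most (120 − 0)/1000 = 0.12, and the Laplace mechanism's privacy loss is sensitivity divided by scale. A back-of-the-envelope check in plain Python (independent of OpenDP, which accounts for exact stability constants in `map`, so its reported value may differ by a small factor):

```python
# Sensitivity of a mean over a fixed-size, clamped dataset:
# one changed record moves the sum by at most (upper - lower),
# hence the mean by (upper - lower) / n.
lower, upper, n = 0.0, 120.0, 1000
sensitivity = (upper - lower) / n   # 0.12

# Laplace mechanism: epsilon = sensitivity / scale
scale = 0.12
epsilon = sensitivity / scale
print(epsilon)  # 1.0 -- so the measurement above gives roughly epsilon = 1
```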

Google DP Library

Google's differential privacy library provides battle-tested implementations of DP aggregation functions:

Python - Google DP Library
from pydp.algorithms.laplacian import BoundedMean, Count

data = [12, 44, 8, 63, 27]  # toy example values
# Private count
count = Count(epsilon=1.0, dtype="int")
for val in data:
    count.add_entry(val)
private_count = count.result()

# Private bounded mean
mean = BoundedMean(
    epsilon=1.0,
    lower_bound=0,
    upper_bound=100,
    dtype="float"
)
for val in data:
    mean.add_entry(val)
private_mean = mean.result()
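Note that running both aggregations above on the same data spends privacy budget twice: under basic sequential composition the epsilons simply add, for a total of ε = 2.0. A toy sketch of that bookkeeping (plain Python, not part of PyDP, which does not enforce a total budget for you):

```python
class BudgetTracker:
    """Toy sequential-composition accountant (illustrative only)."""

    def __init__(self, total_epsilon: float):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        # Basic composition: privacy losses of sequential queries add up.
        if self.spent + epsilon > self.total:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

tracker = BudgetTracker(total_epsilon=2.0)
tracker.charge(1.0)  # the Count query
tracker.charge(1.0)  # the BoundedMean query
print(tracker.spent)  # 2.0 -- budget fully spent; a third query would raise
```

Advanced composition and RDP accounting give tighter totals for many queries, but the additive rule is the safe upper bound.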

Opacus for PyTorch

Opacus is Meta's library for training PyTorch models with differential privacy. It wraps your existing training loop with minimal code changes. See the DP-SGD lesson for a full training example.
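Under the hood, Opacus replaces the usual batch gradient with per-example gradients that are clipped to a fixed L2 norm and then noised before averaging. A dependency-free sketch of that core step, using plain Python lists instead of the PyTorch tensors Opacus actually operates on:

```python
import math
import random

def dp_sgd_step(per_example_grads, l2_norm_clip, noise_multiplier):
    """Clip each example's gradient to L2 norm <= l2_norm_clip, sum,
    add Gaussian noise with std = noise_multiplier * l2_norm_clip,
    then average over the batch."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    summed = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        # Scale down only gradients whose norm exceeds the clip bound.
        factor = min(1.0, l2_norm_clip / norm) if norm > 0 else 1.0
        for i in range(dim):
            summed[i] += g[i] * factor
    std = noise_multiplier * l2_norm_clip
    return [(s + random.gauss(0.0, std)) / n for s in summed]

random.seed(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
noisy = dp_sgd_step(grads, l2_norm_clip=1.0, noise_multiplier=0.0)
# With noise_multiplier=0 only clipping applies: the first gradient
# is scaled to norm 1, the second passes through unchanged.
```

Clipping bounds each example's influence on the update (the sensitivity), and the Gaussian noise then hides any single example's contribution; Opacus does the equivalent efficiently on tensors and tracks the resulting privacy loss with its accountant.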

TensorFlow Privacy

TensorFlow Privacy provides DP-SGD optimizers that drop into standard TensorFlow/Keras training:

Python - TensorFlow Privacy
import tensorflow as tf
from tensorflow_privacy.privacy.optimizers.dp_optimizer_keras \
    import DPKerasSGDOptimizer
from tensorflow_privacy.privacy.analysis \
    import compute_dp_sgd_privacy

optimizer = DPKerasSGDOptimizer(
    l2_norm_clip=1.0,
    noise_multiplier=1.1,
    num_microbatches=256,
    learning_rate=0.01
)

# DP-SGD with microbatches requires a per-example (vector) loss,
# so disable the default mean reduction
loss = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.losses.Reduction.NONE)
model.compile(optimizer=optimizer, loss=loss)
model.fit(x_train, y_train, epochs=10, batch_size=256)

# Compute the achieved privacy guarantee
eps, _ = compute_dp_sgd_privacy.compute_dp_sgd_privacy(
    n=len(x_train), batch_size=256,
    noise_multiplier=1.1, epochs=10, delta=1e-5
)
print(f"Achieved ε = {eps:.2f}")

Choosing a library: For ML model training, use Opacus (PyTorch) or TF Privacy (TensorFlow). For data analytics and queries, use OpenDP or Google's DP library. For large-scale pipeline processing, use PipelineDP or Tumult Analytics.