Intermediate

Local vs Global Differential Privacy

The trust model determines where noise is added: at the data source (local DP) or by a trusted aggregator (global/central DP). This choice fundamentally affects both the privacy guarantee and the utility of results.

Two Trust Models

Model | Trust Assumption | Where Noise Is Added | Utility | Use Case
--- | --- | --- | --- | ---
Global (Central) DP | Trusted curator holds raw data | After aggregation | Higher accuracy | Internal analytics, Census
Local DP | No trusted party; users do not share raw data | Before data leaves user device | Lower accuracy | Telemetry, keyboard data

Global (Central) Differential Privacy

In the central model, a trusted data curator collects raw data and applies noise to the outputs of queries or analyses:

  • Advantage: Much better accuracy for the same privacy level. Noise is added once to the aggregate, not to each individual record.
  • Disadvantage: Requires trusting the curator with raw data. If the curator is compromised, all data is exposed.
  • Examples: US Census Bureau's 2020 Census, internal company analytics, DP-SGD model training.
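As a concrete sketch of "noise added once, after aggregation," here is a central-model count using the Laplace mechanism (an illustration only: `dp_count` is our name, not a standard API, and it assumes a counting query, which has sensitivity 1):

```python
import random

def dp_count(records, predicate, epsilon: float) -> float:
    """Central-model DP count: the trusted curator computes the exact
    count over raw records, then adds Laplace(1/epsilon) noise once.
    (Counting queries have sensitivity 1.)"""
    true_count = sum(1 for r in records if predicate(r))
    # The difference of two Exp(epsilon) samples is Laplace(0, 1/epsilon).
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise
```

Note that the noise scale is independent of the number of records, which is why the central model keeps its accuracy as datasets grow.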

Local Differential Privacy

In the local model, each user perturbs their own data before sending it. The server never sees raw data:

  • Advantage: Strongest trust model. Even if the server is compromised, individual data remains private.
  • Disadvantage: Requires significantly more users to achieve the same accuracy. Because every report carries its own noise, the error of the aggregate is roughly a factor of √n larger than in central DP with the same n users.
  • Examples: Apple's emoji/Safari data, Google's RAPPOR for Chrome.
💡
Accuracy trade-off: For the same privacy guarantee (same ε) and the same number of users n, local DP's estimation error is roughly √n times larger than central DP's. Equivalently, matching central DP's accuracy requires on the order of n times more users. This makes local DP practical only for very large user populations.
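The gap can be made concrete with rough worst-case error formulas (a back-of-the-envelope sketch: the function names are ours, and the local bound uses the worst-case variance of the debiased randomized-response estimator covered in the next section):

```python
import math

def rr_epsilon(p: float) -> float:
    """Privacy parameter of randomized response with truth probability p."""
    return math.log((1 + p) / (1 - p))

def local_std_error(n: int, p: float) -> float:
    """Worst-case standard error of the debiased randomized-response
    estimate over n users: sqrt(0.25 / n) / p."""
    return 0.5 / (p * math.sqrt(n))

def central_std_error(n: int, epsilon: float) -> float:
    """Standard error of the noise a central curator adds to a
    proportion: Laplace(1/epsilon) on the count, scaled by 1/n."""
    return math.sqrt(2) / (epsilon * n)

# Same epsilon, one million users: local error is roughly sqrt(n) times larger.
n, p = 1_000_000, 0.75
print(local_std_error(n, p) / central_std_error(n, rr_epsilon(p)))
```

With a million users the ratio comes out near 1000, i.e. about √n, matching the callout above.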

Randomized Response

The simplest local DP mechanism, invented by Stanley Warner in 1965 for survey research:

Python - Randomized Response
import random

def randomized_response(true_answer: bool, p: float = 0.75) -> bool:
    """Local DP via randomized response.
    With probability p, report truthfully.
    With probability 1-p, report randomly.
    Satisfies ln((1+p)/(1-p))-DP when p > 0.5."""
    if random.random() < p:
        return true_answer        # Truth
    else:
        return random.choice([True, False])  # Random

def estimate_true_proportion(responses, p=0.75):
    """Recover the true proportion from noisy responses."""
    observed = sum(responses) / len(responses)
    # Invert E[observed] = p*true + 0.5*(1-p):
    estimated = (observed - 0.5 * (1 - p)) / p
    return max(0, min(1, estimated))
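Putting the two pieces together, an end-to-end run might look like this (a self-contained sketch with the debiasing inlined; the 30% true rate and the user count are made up for the simulation):

```python
import random

def simulate_survey(n: int, true_rate: float, p: float = 0.75) -> float:
    """End-to-end sketch: n users each apply randomized response to a
    sensitive yes/no answer; the server debiases the noisy average."""
    yes_reports = 0
    for _ in range(n):
        truth = random.random() < true_rate
        if random.random() < p:
            report = truth                         # truthful with prob p
        else:
            report = random.choice([True, False])  # random with prob 1-p
        yes_reports += report
    observed = yes_reports / n
    # E[observed] = p * true_rate + 0.5 * (1 - p), so invert:
    return (observed - 0.5 * (1 - p)) / p

random.seed(1)
print(round(simulate_survey(200_000, true_rate=0.30), 2))
```

With 200,000 simulated users the estimate lands close to the true 30%, while any single user's report reveals almost nothing.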

RAPPOR

RAPPOR (Randomized Aggregatable Privacy-Preserving Ordinal Response) is Google's local DP system for collecting statistics from Chrome browsers. It uses a combination of Bloom filters and randomized response to collect frequency data on categorical values while protecting individual users.
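A heavily simplified sketch of the core idea follows. Real RAPPOR applies both a permanent and an instantaneous randomization step and uses per-client cohorts; the filter size, hash count, and f value here are purely illustrative:

```python
import hashlib
import random

NUM_BITS = 64   # Bloom filter size (illustrative)
NUM_HASHES = 2  # hash functions per value (illustrative)

def bloom_bits(value: str) -> set:
    """Indices of the Bloom-filter bits set for this value."""
    return {
        int.from_bytes(
            hashlib.sha256(f"{i}:{value}".encode()).digest()[:4], "big"
        ) % NUM_BITS
        for i in range(NUM_HASHES)
    }

def rappor_report(value: str, f: float = 0.5) -> list:
    """Perturb each Bloom bit: with probability f, replace it with a fair
    coin flip; otherwise report it truthfully (randomized response per bit)."""
    bits = bloom_bits(value)
    report = []
    for i in range(NUM_BITS):
        true_bit = 1 if i in bits else 0
        if random.random() < f:
            report.append(random.randint(0, 1))  # randomized bit
        else:
            report.append(true_bit)              # truthful bit
    return report
```

The server aggregates millions of such bit vectors and statistically decodes which candidate values are frequent, without being able to decode any single user's report.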

The Shuffle Model

A middle ground between local and central DP. Users apply local randomization, then a trusted shuffler permutes the messages before the server sees them:

  • The shuffling amplifies the privacy guarantee beyond what local DP alone provides
  • Achieves accuracy close to central DP with local DP's trust model
  • Can be implemented with secure shuffling protocols or anonymous channels
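The shuffler itself is simple to state in code (a sketch of the trust boundary only; the privacy-amplification analysis is the hard part and is omitted here):

```python
import random

def shuffle_reports(reports: list) -> list:
    """Trusted shuffler: drop sender identities from (user_id, message)
    pairs and emit the messages in a uniformly random order, so the
    server cannot link any report back to its sender."""
    messages = [msg for _, msg in reports]
    random.shuffle(messages)
    return messages
```

Because the server only ever sees an anonymous, permuted batch, each user can get away with adding less local noise for the same end-to-end guarantee.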

Choosing the Right Model

  • Use Central DP when: You control the data infrastructure, can secure it, and need high accuracy. Typical for internal analytics and ML training.
  • Use Local DP when: Users do not trust the data collector, you have millions of users, and you need basic aggregate statistics.
  • Use Shuffle DP when: You want the trust model of local DP but need better accuracy than pure local DP.
For ML training: Central DP (via DP-SGD) is almost always the right choice. Local DP adds too much noise for gradient-based optimization to work well. If you cannot trust the training infrastructure, consider federated learning with secure aggregation plus central DP.