Local vs Global Differential Privacy
The trust model determines where noise is added: at the data source (local DP) or by a trusted aggregator (global/central DP). This choice fundamentally affects both the privacy guarantee and the utility of results.
Two Trust Models
| Model | Trust Assumption | Where Noise Is Added | Utility | Use Case |
|---|---|---|---|---|
| Global (Central) DP | Trusted curator holds raw data | After aggregation | Higher accuracy | Internal analytics, Census |
| Local DP | No trusted party; users do not share raw data | Before data leaves user device | Lower accuracy | Telemetry, keyboard data |
Global (Central) Differential Privacy
In the central model, a trusted data curator collects raw data and applies noise to the outputs of queries or analyses:
- Advantage: Much better accuracy for the same privacy level. Noise is added once to the aggregate, not to each individual record.
- Disadvantage: Requires trusting the curator with raw data. If the curator is compromised, all data is exposed.
- Examples: US Census Bureau's 2020 Census, internal company analytics, DP-SGD model training.
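The central model can be illustrated with the classic Laplace mechanism on a count query. The sketch below is minimal and the function names are illustrative; a counting query has sensitivity 1, so noise with scale 1/ε suffices, and it is added once, after aggregation:

```python
import random

def dp_count(values, epsilon: float) -> float:
    """Central DP sketch: the trusted curator sees the raw records,
    computes the exact count, then adds Laplace noise once.
    A count has sensitivity 1, so Laplace scale = 1 / epsilon."""
    true_count = sum(values)
    scale = 1.0 / epsilon
    # Difference of two iid Exponential(rate=1/scale) draws is Laplace(0, scale).
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return true_count + noise
```

Because the noise scale is independent of the number of records, the relative error shrinks as the dataset grows, which is the source of the central model's accuracy advantage.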
Local Differential Privacy
In the local model, each user perturbs their own data before sending it. The server never sees raw data:
- Advantage: Strongest trust model. Even if the server is compromised, individual data remains private.
- Disadvantage: Requires significantly more users to achieve the same accuracy. For the same privacy budget, the error in an aggregate is larger by a factor of roughly √n (n = number of users) compared to central DP.
- Examples: Apple's emoji/Safari data, Google's RAPPOR for Chrome.
Randomized Response
The simplest local DP mechanism, invented by Stanley Warner in 1965 for survey research:
```python
import random

def randomized_response(true_answer: bool, p: float = 0.75) -> bool:
    """Local DP via randomized response.

    With probability p, report truthfully; with probability 1-p,
    report a uniformly random answer. Satisfies ln((1+p)/(1-p))-DP.
    """
    if random.random() < p:
        return true_answer                    # Truth
    else:
        return random.choice([True, False])   # Random

def estimate_true_proportion(responses, p: float = 0.75) -> float:
    """Recover the true proportion from noisy responses."""
    observed = sum(responses) / len(responses)
    # E[observed] = p * true + 0.5 * (1 - p), so invert the bias:
    estimated = (observed - 0.5 * (1 - p)) / p
    return max(0.0, min(1.0, estimated))
```
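A quick end-to-end simulation makes the bias correction concrete. This is a self-contained sketch (the function name `simulate_randomized_response` is illustrative): n simulated users each randomize their answer locally, and the server debiases the aggregate.

```python
import random

def simulate_randomized_response(true_rate: float, n: int,
                                 p: float = 0.75, seed: int = 0) -> float:
    """Simulate n users, each holding True with probability true_rate,
    applying randomized response (truth w.p. p, random answer otherwise),
    then invert the expected bias to recover the population rate."""
    rng = random.Random(seed)
    reports = []
    for _ in range(n):
        truth = rng.random() < true_rate
        if rng.random() < p:
            reports.append(truth)
        else:
            reports.append(rng.random() < 0.5)
    observed = sum(reports) / n
    # E[observed] = p * true_rate + 0.5 * (1 - p)
    return (observed - 0.5 * (1 - p)) / p
```

With 100,000 users the estimate lands close to the true rate; with only a few hundred users the per-user noise dominates, which is exactly the utility cost of the local model.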
RAPPOR
RAPPOR (Randomized Aggregatable Privacy-Preserving Ordinal Response) is Google's local DP system for collecting statistics from Chrome browsers. It uses a combination of Bloom filters and randomized response to collect frequency data on categorical values while protecting individual users.
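The core idea can be sketched in a few lines. Note this is a heavily simplified, illustrative version (one-round reporting only), not Google's full protocol, which also distinguishes permanent from instantaneous randomization; the parameter names are assumptions:

```python
import hashlib
import random

def basic_rappor_report(value: str, num_bits: int = 16, num_hashes: int = 2,
                        f: float = 0.5, seed=None) -> list:
    """Simplified RAPPOR sketch: set Bloom-filter bits for the value,
    then flip each bit via randomized response with noise parameter f."""
    rng = random.Random(seed)
    # Bloom filter: set num_hashes bit positions derived from the value.
    bits = [0] * num_bits
    for i in range(num_hashes):
        h = hashlib.sha256(f"{i}:{value}".encode()).digest()
        bits[h[0] % num_bits] = 1
    # Randomized response on each bit: with probability f, replace the
    # bit with a fair coin; otherwise keep it.
    report = []
    for b in bits:
        if rng.random() < f:
            report.append(int(rng.random() < 0.5))
        else:
            report.append(b)
    return report
```

The server aggregates many such noisy bit vectors and statistically decodes which candidate values were frequent, without being able to attribute any bit vector to a specific user's true value.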
The Shuffle Model
A middle ground between local and central DP. Users apply local randomization, then a trusted shuffler permutes the messages before the server sees them:
- The shuffling amplifies the privacy guarantee beyond what local DP alone provides
- Achieves accuracy close to central DP with local DP's trust model
- Can be implemented with secure shuffling protocols or anonymous channels
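The pipeline above can be sketched end to end. This is a toy illustration (the shuffler here is just an in-process permutation; a real deployment would use a secure shuffling protocol or anonymous channel, and the function name is illustrative):

```python
import random

def shuffle_model_round(true_bits, p: float = 0.75, seed: int = 0) -> float:
    """Illustrative shuffle-model pipeline: each user randomizes locally,
    a shuffler permutes the reports (breaking the link between user and
    message), and the server sees only the shuffled collection."""
    rng = random.Random(seed)
    # 1. Local randomization on each user's device (randomized response).
    reports = [b if rng.random() < p else (rng.random() < 0.5)
               for b in true_bits]
    # 2. Shuffler: uniformly permute reports before the server sees them.
    rng.shuffle(reports)
    # 3. Server-side debiased estimate of the true proportion.
    observed = sum(reports) / len(reports)
    return (observed - 0.5 * (1 - p)) / p
```

Shuffling does not change the aggregate, but because the server cannot tell which report came from which user, each user's effective privacy guarantee is stronger than the local randomization alone provides.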
Choosing the Right Model
- Use Central DP when: You control the data infrastructure, can secure it, and need high accuracy. Typical for internal analytics and ML training.
- Use Local DP when: Users do not trust the data collector, you have millions of users, and you need basic aggregate statistics.
- Use Shuffle DP when: You want the trust model of local DP but need better accuracy than pure local DP.
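The accuracy gap driving this choice can be made concrete with a back-of-the-envelope comparison of standard errors for estimating a proportion. These are rough asymptotic rates with approximate constants, not tight bounds:

```python
import math

def approx_standard_errors(n: int, epsilon: float):
    """Order-of-magnitude error comparison for a proportion estimate.
    Central DP: Laplace noise of scale 1/epsilon on the count, so the
    mean's noise is ~1/(n * epsilon). Local DP: per-user randomization
    inflates sampling error to roughly 1/(epsilon * sqrt(n))."""
    central = math.sqrt(2) / (n * epsilon)
    local = 2.0 / (epsilon * math.sqrt(n))
    return central, local
```

For a million users at ε = 1, the central-model error is on the order of 10⁻⁶ while the local-model error is on the order of 10⁻³: the √n gap that makes local DP practical only at very large scale.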