Differential Privacy
without
Sensitivity
Kentaro Minami (The University of Tokyo, Graduate School of Information Science and Technology, 1st-year PhD student)
2017/1/19 @ NIPS 2016 paper-reading meetup (NIPS2016読み会)
Overview
Differential privacy (DP)
• A notion quantifying the degree of privacy protection [Dwork+06]
Gibbs posterior
• A generalization of the Bayesian posterior
Contribution
We prove (𝜀, 𝛿)-DP of the Gibbs posterior without boundedness
of the loss
Outline
1. Differential privacy
2. Differentially private learning
1. Background
2. Main result Differential privacy of Gibbs posterior [Minami+16]
3. Applications
1. Logistic regression
2. Posterior approximation method
Privacy constraint in ML & statistics

[Figure: users’ data 𝐷 = {𝑋1, … , 𝑋𝑛} is collected by a curator, who releases a statistic 𝜃]

In many applications of ML & statistics, the data 𝐷 =
{𝑋1, … , 𝑋𝑛} contains users’ personal information
Problem: Calculate a statistic of interest 𝜃 privately
(in a sense made precise below)
Adversarial formulation of privacy
Example: Mean of a binary-valued query (Yes: 1, No: 0)

[Figure: the curator adds noise to the released mean; the adversary holds auxiliary information 𝐷′ = {𝑋1′, 𝑋2, … , 𝑋𝑛}, differing from 𝐷 only in 𝑋1]

Small noise for 𝜃
→ adding noise need not deteriorate the accuracy
Large noise for 𝑋𝑖
→ privacy preservation
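The noise-addition idea above can be sketched with the Laplace mechanism, the standard instantiation for numeric queries. This is a generic illustration, not code from the talk; the function names are mine.

```python
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sampling of Laplace(0, scale).
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_mean(data, epsilon):
    """Release the mean of bits x_i in {0, 1} with (epsilon, 0)-DP.

    Changing one record moves the sum by at most 1, so the sensitivity of
    the mean is 1/n and Laplace noise of scale 1/(n * epsilon) suffices.
    """
    n = len(data)
    return sum(data) / n + laplace_noise(1.0 / (n * epsilon))
```

With a large dataset the noise scale 1/(𝑛𝜀) is tiny, so accuracy barely suffers, while each individual 𝑋𝑖 remains hidden: exactly the small-noise-for-𝜃, large-noise-for-𝑋𝑖 trade-off on this slide.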
Differential privacy
Idea:
1. Generate a random 𝜃 from a data-dependent distribution 𝜌𝐷

[Figure: the data 𝑋1, … , 𝑋𝑛 is mapped to a distribution over 𝜃]
Differential privacy
Idea:
2. Two “adjacent” datasets differing in a single individual
should be statistically indistinguishable

[Figure: the output distributions induced by 𝐷 and 𝐷′ = {𝑋1′, 𝑋2, … , 𝑋𝑛} are close in the sense of
a “statistical distance”]
Differential privacy
Def: Differential Privacy [Dwork+06]
• 𝜀 > 0, 𝛿 ∈ [0, 1): privacy parameters
• 𝜌𝐷 satisfies (𝜀, 𝛿)-differential privacy if,
1. for any adjacent datasets 𝐷, 𝐷′, and
2. for any set 𝐴 ⊂ Θ of outputs,
the following inequality holds:
𝜌𝐷(𝐴) ≤ e^𝜀 𝜌𝐷′(𝐴) + 𝛿
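The (𝜀, 𝛿)-DP inequality 𝜌𝐷(𝐴) ≤ e^𝜀 𝜌𝐷′(𝐴) + 𝛿 can be verified exactly on the simplest mechanism, randomized response on a single bit. This is a standard textbook example, not from the slides; the names are mine.

```python
import math

def randomized_response(x, epsilon):
    """Output distribution of epsilon-randomized response on a bit x in {0, 1}:
    keep x with probability e^eps / (1 + e^eps), flip it otherwise."""
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {x: p_keep, 1 - x: 1.0 - p_keep}

def satisfies_dp(epsilon):
    """Check rho_D(A) <= e^eps * rho_D'(A) + delta with delta = 0 for every
    output set A, where the adjacent "datasets" are the two bit values."""
    p0, p1 = randomized_response(0, epsilon), randomized_response(1, epsilon)
    bound = math.exp(epsilon)
    for A in [set(), {0}, {1}, {0, 1}]:
        m0 = sum(p0[b] for b in A)
        m1 = sum(p1[b] for b in A)
        if m0 > bound * m1 + 1e-12 or m1 > bound * m0 + 1e-12:
            return False
    return True
```

The bound is tight for 𝐴 = {0}: the ratio of the two output probabilities equals e^𝜀 exactly.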
Interpretation of DP
• DP prevents identification with statistical significance
• e.g. the adversary cannot construct a test of power 𝛾 for
𝐻0: 𝑋𝑖 = 𝑋 vs. 𝐻1: 𝑋𝑖 ≠ 𝑋
at the 5% significance level
DP and statistical learning
Example: Linear classification
• Find an (𝜀, 𝛿)-DP distribution of hyperplanes
that minimizes the expected classification error
Differentially private learning
Question: What kind of random estimators should we use?
1. Noise addition to a deterministic estimator
• e.g. maximum likelihood estimator + noise
2. Modification of the Bayesian posterior (this work)
Gibbs posterior
• Bayesian posterior:
𝜋(𝜃 ∣ 𝐷) ∝ 𝜋(𝜃) ∏𝑖 𝑝(𝑥𝑖 ∣ 𝜃)
• Introduce a “scale parameter” 𝛽 > 0:
𝐺𝛽(𝜃 ∣ 𝐷) ∝ 𝜋(𝜃) exp(−𝛽 ∑𝑖 ℓ(𝜃, 𝑥𝑖))
Gibbs posterior
A natural data-dependent distribution in statistics & ML
• Contains the Bayesian posterior:
ℓ(𝜃, 𝑥) = − log 𝑝(𝑥 ∣ 𝜃), 𝛽 = 1
• Important in PAC-Bayes theory [Catoni07][Zhang06]

Ingredients: a loss function ℓ(𝜃, 𝑥), a prior distribution 𝜋,
and an inverse temperature 𝛽 > 0
[Figure: the Gibbs posterior 𝐺𝛽(𝜃 ∣ 𝐷) flattens toward the prior as 𝛽 → 0]
Gibbs posterior
Problem
• If 𝛽 ↓ 0, 𝐺𝛽(𝜃 ∣ 𝐷) flattens and gets close to the prior
• Is DP satisfied if we choose 𝛽 > 0 sufficiently small?

Answer
Yes, if…
• ℓ is bounded (previously known)
• 𝛻ℓ is bounded (this work)
The exponential mechanism
Theorem [MT07]
An algorithm that draws 𝜃 from a distribution
𝑝(𝜃) ∝ 𝜋(𝜃) exp(−𝛽 ℒ(𝜃, 𝐷))
satisfies (𝜀, 0)-DP
• This is the Gibbs posterior if ℒ(𝜃, 𝐷) = ∑𝑖 ℓ(𝜃, 𝑥𝑖)
• 𝛽 has to satisfy 𝛽 ≤ 𝜀 / (2Δℒ)
• Δℒ: sensitivity (defined on the next slide)
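For a finite candidate set (and a uniform prior over it), the exponential mechanism can be sketched directly. This is an illustrative implementation under the 𝛽 = 𝜀/(2Δℒ) calibration above; the function and parameter names are mine.

```python
import math
import random

def exponential_mechanism(candidates, loss, data, epsilon, sensitivity):
    """Draw one candidate theta with probability proportional to
    exp(-beta * L(theta, D)), where beta = epsilon / (2 * sensitivity)
    and L(theta, D) = sum_i loss(theta, x_i).  (epsilon, 0)-DP by [MT07]."""
    beta = epsilon / (2.0 * sensitivity)
    scores = [-beta * sum(loss(theta, x) for x in data) for theta in candidates]
    m = max(scores)  # shift by the max score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    r = random.random() * sum(weights)
    for theta, w in zip(candidates, weights):
        r -= w
        if r <= 0:
            return theta
    return candidates[-1]
```

With a clear loss gap and a large 𝜀, the lowest-loss candidate is returned almost surely; as 𝜀 → 0 the output approaches a uniform draw, mirroring the flattening of the Gibbs posterior as 𝛽 ↓ 0.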
Sensitivity
Definition: Sensitivity of ℒ: Θ × 𝒳ⁿ → ℝ
Δℒ := sup over adjacent 𝐷, 𝐷′ of ∥ℒ(⋅, 𝐷) − ℒ(⋅, 𝐷′)∥∞
(𝐿∞-norm in 𝜃; the supremum is taken over adjacent datasets)
• The exponential mechanism works if Δℒ < ∞!
Sensitivity
Theorem [Wang+15]
(A) ℓ(𝜃, 𝑥) ≤ 𝐴 for all 𝜃, 𝑥 ⟹ Δℒ ≤ 2𝐴
(B) |ℓ(𝜃, 𝑥) − ℓ(𝜃, 𝑥′)| ≤ 𝐴 for all 𝜃, 𝑥, 𝑥′ ⟹ Δℒ ≤ 𝐴

[Figure: left, a loss bounded by 𝐴; right, a loss whose pointwise differences are bounded by 𝐴]
A loss function that does not satisfy (𝜀, 0)-DP
• Logistic loss:
ℓ(𝜃, (𝑧, 𝑦)) = log(1 + exp(−𝑦⟨𝜃, 𝑧⟩))
• The max difference of the loss (≈ 𝑀) grows toward +∞
as Diam Θ → ∞

[Figure: ℓ(𝜃, (𝑧, +1)) and ℓ(𝜃, (𝑧, −1)); their gap 𝑀 → +∞ as 𝜃 moves away from the origin]

We need differential privacy
without sensitivity!
From bounded to Lipschitz
• In the example of the logistic loss, the first derivative is
bounded
• The Lipschitz constant 𝐿 is not influenced by
the size of the parameter space Diam Θ
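This can be checked directly: ∥𝛻𝜃 ℓ∥₂ = 𝜎(−𝑦⟨𝜃, 𝑧⟩) · ∥𝑧∥₂ ≤ ∥𝑧∥₂, so 𝐿 = 𝑅 whenever ∥𝑧∥₂ ≤ 𝑅, no matter how large ∥𝜃∥ gets. A small numerical sketch (illustrative only; the names are mine):

```python
import math

def sigmoid(t):
    # Numerically stable logistic sigmoid.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def logistic_grad_norm(theta, z, y):
    """Norm of the gradient of log(1 + exp(-y * <theta, z>)) w.r.t. theta.

    The gradient is -y * sigmoid(-y * <theta, z>) * z, so its norm is
    sigmoid(-y * <theta, z>) * ||z||, which never exceeds ||z||.
    """
    s = sum(t * zi for t, zi in zip(theta, z))
    norm_z = math.sqrt(sum(zi * zi for zi in z))
    return sigmoid(-y * s) * norm_z
```

Even for ∥𝜃∥ on the order of 10⁶ the gradient norm stays below ∥𝑧∥₂ = 𝑅, which is why the Lipschitz constant, unlike the sensitivity, remains finite on Θ = ℝ^𝑑.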
Main theorem

Theorem [Minami+16]
Assumptions:
1. For all 𝑥 ∈ 𝒳, ℓ(⋅, 𝑥) is 𝐿-Lipschitz and convex
2. The prior is log-strongly-concave, i.e. − log 𝜋(⋅) is 𝑚𝜋-strongly convex
3. Θ = ℝ^𝑑
⟹ The Gibbs posterior 𝐺𝛽,𝐷 satisfies (𝜀, 𝛿)-DP if 𝛽 > 0 is chosen as in (1)
Independent of the sensitivity!
Example: Logistic loss
Logistic loss:
ℓ(𝜃, (𝑧, 𝑦)) = log(1 + exp(−𝑦(⟨𝑎, 𝑧⟩ + 𝑏)))

𝒵 = {𝑧 ∈ ℝ^𝑑 : ∥𝑧∥₂ ≤ 𝑅}
𝒳 = {(𝑧, 𝑦) ∣ 𝑧 ∈ 𝒵, 𝑦 ∈ {−1, +1}}
𝜃 = (𝑎, 𝑏)
Example: Logistic loss
• Gaussian prior:
𝜋(𝜃) = 𝑁(𝜃 ∣ 0, (𝑛𝜆)⁻¹ 𝐼)
• The Gibbs posterior is given by:
𝐺𝛽(𝜃 ∣ 𝐷) ∝ exp(−𝛽 ∑𝑖 log(1 + exp(−𝑦𝑖(⟨𝑎, 𝑧𝑖⟩ + 𝑏))) − (𝑛𝜆/2)∥𝜃∥₂²)
• 𝐺𝛽 satisfies (𝜀, 𝛿)-DP if 𝛽 is chosen as in (1)
Langevin Monte Carlo method
• In practice, sampling from the Gibbs posterior can be a
computationally hard problem
• Some approximate sampling methods are used
(e.g. MCMC, VB)
Langevin Monte Carlo method
• Langevin Monte Carlo (LMC) is gradient descent on
𝑈(𝜃) = 𝛽 ℒ(𝜃, 𝐷) − log 𝜋(𝜃) with Gaussian noise injected at each step:
𝜃𝑡+1 = 𝜃𝑡 − ℎ 𝛻𝑈(𝜃𝑡) + √(2ℎ) 𝜉𝑡, 𝜉𝑡 ∼ 𝑁(0, 𝐼)

[Figure: gradient descent (GD) vs. LMC trajectories]
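A minimal sketch of this update rule (the unadjusted Langevin algorithm in its generic form, not the exact procedure analyzed in the paper; the names are mine):

```python
import math
import random

def lmc_step(theta, grad_U, step):
    """One LMC update: theta <- theta - h * grad_U(theta) + sqrt(2h) * xi,
    with xi ~ N(0, I).  The chain targets the density proportional to exp(-U)."""
    g = grad_U(theta)
    return [t - step * gi + math.sqrt(2.0 * step) * random.gauss(0.0, 1.0)
            for t, gi in zip(theta, g)]

# Demo: target the standard Gaussian, U(theta) = ||theta||^2 / 2, grad_U = identity.
random.seed(0)
theta, samples = [0.0], []
for i in range(5000):
    theta = lmc_step(theta, lambda th: th, 0.1)
    if i >= 1000:  # discard burn-in
        samples.append(theta[0])
```

For a Gibbs-posterior target, grad_U would be 𝛽 ∑𝑖 𝛻ℓ(𝜃, 𝑥𝑖) − 𝛻 log 𝜋(𝜃); the log-strong-concavity assumed in the main theorem is exactly what makes the chain mix fast.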
Langevin Monte Carlo method
• “Mixing-time” results have been derived for log-concave
distributions [Dalalyan14][Durmus & Moulines15]
• LMC attains a 𝛾-approximation after a finite number 𝑇 of iterations
• Polynomial time in 𝑛 and 𝛾⁻¹:
𝑇 ∼ 𝑂((𝑛/𝛾²) log(𝑛/𝛾²))
• I have a Privacy Preservation guarantee
• I have an Approximate Posterior
• (Ah…)
Privacy Preserving Approximate Posterior (PPAP)
• We can prove (𝜀, 𝛿′)-DP of the LMC approximation to the Gibbs posterior

Proposition [Minami+16]
• Assume that ℓ and 𝜋 satisfy the assumptions of the Main Theorem.
• Assume also that ℓ(⋅, 𝑥) is 𝑀-smooth for every 𝑥 ∈ 𝒳.
• After 𝑂((𝑛/𝛾²) log(𝑛/𝛾²)) iterations, the output of the LMC satisfies
(𝜀, 𝛿 + (e^𝜀 + 1)𝛾)-DP.
Summary
1. Differentially private learning
= Differential privacy + Statistical learning
2. We developed a new method to prove (𝜀, 𝛿)-DP
for Gibbs posteriors without “sensitivity”
• Applicable to Lipschitz & convex losses
• (+) Guarantee for an approximate sampling method
Thank you!
Editor's Notes
  • #7: In practical data analysis and machine-learning settings, the dataset 𝐷 contains users’ personal information, so we want to protect users’ data with DP.
  • #15: This is the formal definition of differential privacy for data-dependent distributions (differential privacy quantifies the robustness of randomized statistics). 𝜌𝐷 is a randomized statistic, or equivalently a data-dependent probability measure on a parameter space; we say 𝜌𝐷 satisfies (𝜀, 𝛿)-DP if the inequality holds, where “adjacent” means Hamming distance 1.
  • #17: The figure is an example of linear classification: the dataset 𝐷 consists of binary-labeled points, and the classifier 𝜃 is a hyperplane. In the differentially private setting, we release a random hyperplane instead of the usual deterministic one.
  • #18: So our problem is stated in general as
\inf_{\rho_D:\; (\varepsilon, \delta)\text{-DP}} \mathbb{E}_{\theta \sim \rho_D} R(\theta)