Differential Privacy
without
Sensitivity
Kentaro Minami (The University of Tokyo, Graduate School of Information Science and Technology, 1st-year PhD student)
2017/1/19 @ NIPS 2016 paper-reading meetup (NIPS2016読み会)
Overview
Differential privacy (DP)
• A notion quantifying the degree of privacy protection [Dwork+06]
Gibbs posterior
• A generalization of the Bayesian posterior
Contribution
We prove (𝜀, 𝛿)-DP of the Gibbs posterior without boundedness
of the loss
Outline
1. Differential privacy
2. Differentially private learning
1. Background
2. Main result Differential privacy of Gibbs posterior [Minami+16]
3. Applications
1. Logistic regression
2. Posterior approximation method
Privacy constraint in ML & statistics

[Figure: users’ data 𝐷 = {𝑋1, … , 𝑋𝑛} is collected by a curator, who releases a statistic 𝜃]

In many applications of ML & statistics, the data 𝐷 =
{𝑋1, … , 𝑋𝑛} contains users’ personal information
Problem: Calculate a statistic of interest 𝜃 privately
(in a sense made precise below)
Adversarial formulation of privacy
Example: Mean of a binary-valued query (Yes: 1, No: 0)

[Figure: the curator adds noise to the released mean; the adversary holds auxiliary information 𝐷′ = {𝑋1′, 𝑋2, … , 𝑋𝑛}, differing from 𝐷 only in 𝑋1]

Small noise for 𝜃
→ adding noise need not deteriorate the accuracy
Large noise for 𝑋𝑖
→ privacy preservation
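The noise-addition idea above can be sketched with the Laplace mechanism, the standard instantiation for numeric queries. This is a generic illustration, not code from the talk; the function names are mine.

```python
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sampling of Laplace(0, scale).
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_mean(data, epsilon):
    """Release the mean of bits x_i in {0, 1} with (epsilon, 0)-DP.

    Changing one record moves the sum by at most 1, so the sensitivity of
    the mean is 1/n and Laplace noise of scale 1/(n * epsilon) suffices.
    """
    n = len(data)
    return sum(data) / n + laplace_noise(1.0 / (n * epsilon))
```

With a large dataset the noise scale 1/(𝑛𝜀) is tiny, so accuracy barely suffers, while each individual 𝑋𝑖 remains hidden: exactly the small-noise-for-𝜃, large-noise-for-𝑋𝑖 trade-off on this slide.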
Differential privacy
Idea:
1. Generate a random 𝜃 from a data-dependent distribution 𝜌𝐷

[Figure: the data 𝑋1, … , 𝑋𝑛 is mapped to a distribution over 𝜃]
Differential privacy
Idea:
2. Two “adjacent” datasets differing in a single individual
should be statistically indistinguishable

[Figure: the output distributions induced by 𝐷 and 𝐷′ = {𝑋1′, 𝑋2, … , 𝑋𝑛} are close in the sense of
a “statistical distance”]
Differential privacy
Def: Differential Privacy [Dwork+06]
• 𝜀 > 0, 𝛿 ∈ [0, 1): privacy parameters
• 𝜌𝐷 satisfies (𝜀, 𝛿)-differential privacy if,
1. for any adjacent datasets 𝐷, 𝐷′, and
2. for any set 𝐴 ⊂ Θ of outputs,
the following inequality holds:
𝜌𝐷(𝐴) ≤ e^𝜀 𝜌𝐷′(𝐴) + 𝛿
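The (𝜀, 𝛿)-DP inequality 𝜌𝐷(𝐴) ≤ e^𝜀 𝜌𝐷′(𝐴) + 𝛿 can be verified exactly on the simplest mechanism, randomized response on a single bit. This is a standard textbook example, not from the slides; the names are mine.

```python
import math

def randomized_response(x, epsilon):
    """Output distribution of epsilon-randomized response on a bit x in {0, 1}:
    keep x with probability e^eps / (1 + e^eps), flip it otherwise."""
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {x: p_keep, 1 - x: 1.0 - p_keep}

def satisfies_dp(epsilon):
    """Check rho_D(A) <= e^eps * rho_D'(A) + delta with delta = 0 for every
    output set A, where the adjacent "datasets" are the two bit values."""
    p0, p1 = randomized_response(0, epsilon), randomized_response(1, epsilon)
    bound = math.exp(epsilon)
    for A in [set(), {0}, {1}, {0, 1}]:
        m0 = sum(p0[b] for b in A)
        m1 = sum(p1[b] for b in A)
        if m0 > bound * m1 + 1e-12 or m1 > bound * m0 + 1e-12:
            return False
    return True
```

The bound is tight for 𝐴 = {0}: the ratio of the two output probabilities equals e^𝜀 exactly.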
Interpretation of DP
• DP prevents identification with statistical significance
• e.g. the adversary cannot construct a test of power 𝛾 for
𝐻0: 𝑋𝑖 = 𝑋 vs. 𝐻1: 𝑋𝑖 ≠ 𝑋
at the 5% significance level
DP and statistical learning
Example: Linear classification
• Find an (𝜀, 𝛿)-DP distribution of hyperplanes
that minimizes the expected classification error
Differentially private learning
Question: What kind of random estimators should we use?
1. Noise addition to a deterministic estimator
• e.g. maximum likelihood estimator + noise
2. Modification of the Bayesian posterior (this work)
Gibbs posterior
• Bayesian posterior:
𝜋(𝜃 ∣ 𝐷) ∝ 𝜋(𝜃) ∏𝑖 𝑝(𝑥𝑖 ∣ 𝜃)
• Introduce a “scale parameter” 𝛽 > 0:
𝐺𝛽(𝜃 ∣ 𝐷) ∝ 𝜋(𝜃) exp(−𝛽 ∑𝑖 ℓ(𝜃, 𝑥𝑖))
Gibbs posterior
A natural data-dependent distribution in statistics & ML
• Contains the Bayesian posterior:
ℓ(𝜃, 𝑥) = − log 𝑝(𝑥 ∣ 𝜃), 𝛽 = 1
• Important in PAC-Bayes theory [Catoni07][Zhang06]

Ingredients: a loss function ℓ(𝜃, 𝑥), a prior distribution 𝜋,
and an inverse temperature 𝛽 > 0
[Figure: the Gibbs posterior 𝐺𝛽(𝜃 ∣ 𝐷) flattens toward the prior as 𝛽 → 0]
Gibbs posterior
Problem
• If 𝛽 ↓ 0, 𝐺𝛽(𝜃 ∣ 𝐷) flattens and gets close to the prior
• Is DP satisfied if we choose 𝛽 > 0 sufficiently small?

Answer
Yes, if…
• ℓ is bounded (previously known)
• 𝛻ℓ is bounded (this work)
The exponential mechanism
Theorem [MT07]
An algorithm that draws 𝜃 from a distribution
𝑝(𝜃) ∝ 𝜋(𝜃) exp(−𝛽 ℒ(𝜃, 𝐷))
satisfies (𝜀, 0)-DP
• This is the Gibbs posterior if ℒ(𝜃, 𝐷) = ∑𝑖 ℓ(𝜃, 𝑥𝑖)
• 𝛽 has to satisfy 𝛽 ≤ 𝜀 / (2Δℒ)
• Δℒ: sensitivity (defined on the next slide)
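For a finite candidate set (and a uniform prior over it), the exponential mechanism can be sketched directly. This is an illustrative implementation under the 𝛽 = 𝜀/(2Δℒ) calibration above; the function and parameter names are mine.

```python
import math
import random

def exponential_mechanism(candidates, loss, data, epsilon, sensitivity):
    """Draw one candidate theta with probability proportional to
    exp(-beta * L(theta, D)), where beta = epsilon / (2 * sensitivity)
    and L(theta, D) = sum_i loss(theta, x_i).  (epsilon, 0)-DP by [MT07]."""
    beta = epsilon / (2.0 * sensitivity)
    scores = [-beta * sum(loss(theta, x) for x in data) for theta in candidates]
    m = max(scores)  # shift by the max score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    r = random.random() * sum(weights)
    for theta, w in zip(candidates, weights):
        r -= w
        if r <= 0:
            return theta
    return candidates[-1]
```

With a clear loss gap and a large 𝜀, the lowest-loss candidate is returned almost surely; as 𝜀 → 0 the output approaches a uniform draw, mirroring the flattening of the Gibbs posterior as 𝛽 ↓ 0.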
Sensitivity
Definition: Sensitivity of ℒ: Θ × 𝒳ⁿ → ℝ
Δℒ := sup over adjacent 𝐷, 𝐷′ of ∥ℒ(⋅, 𝐷) − ℒ(⋅, 𝐷′)∥∞
(𝐿∞-norm in 𝜃; the supremum is taken over adjacent datasets)
• The exponential mechanism works if Δℒ < ∞!
Sensitivity
Theorem [Wang+15]
(A) ℓ(𝜃, 𝑥) ≤ 𝐴 for all 𝜃, 𝑥 ⟹ Δℒ ≤ 2𝐴
(B) |ℓ(𝜃, 𝑥) − ℓ(𝜃, 𝑥′)| ≤ 𝐴 for all 𝜃, 𝑥, 𝑥′ ⟹ Δℒ ≤ 𝐴

[Figure: left, a loss bounded by 𝐴; right, a loss whose pointwise differences are bounded by 𝐴]
A loss function that does not satisfy (𝜀, 0)-DP
• Logistic loss:
ℓ(𝜃, (𝑧, 𝑦)) = log(1 + exp(−𝑦⟨𝜃, 𝑧⟩))
• The max difference of the loss (≈ 𝑀) grows toward +∞
as Diam Θ → ∞

[Figure: ℓ(𝜃, (𝑧, +1)) and ℓ(𝜃, (𝑧, −1)); their gap 𝑀 → +∞ as 𝜃 moves away from the origin]

We need differential privacy
without sensitivity!
From bounded to Lipschitz
• In the example of the logistic loss, the first derivative is
bounded
• The Lipschitz constant 𝐿 is not influenced by
the size of the parameter space Diam Θ
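This can be checked directly: ∥𝛻𝜃 ℓ∥₂ = 𝜎(−𝑦⟨𝜃, 𝑧⟩) · ∥𝑧∥₂ ≤ ∥𝑧∥₂, so 𝐿 = 𝑅 whenever ∥𝑧∥₂ ≤ 𝑅, no matter how large ∥𝜃∥ gets. A small numerical sketch (illustrative only; the names are mine):

```python
import math

def sigmoid(t):
    # Numerically stable logistic sigmoid.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def logistic_grad_norm(theta, z, y):
    """Norm of the gradient of log(1 + exp(-y * <theta, z>)) w.r.t. theta.

    The gradient is -y * sigmoid(-y * <theta, z>) * z, so its norm is
    sigmoid(-y * <theta, z>) * ||z||, which never exceeds ||z||.
    """
    s = sum(t * zi for t, zi in zip(theta, z))
    norm_z = math.sqrt(sum(zi * zi for zi in z))
    return sigmoid(-y * s) * norm_z
```

Even for ∥𝜃∥ on the order of 10⁶ the gradient norm stays below ∥𝑧∥₂ = 𝑅, which is why the Lipschitz constant, unlike the sensitivity, remains finite on Θ = ℝ^𝑑.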
Main theorem

Theorem [Minami+16]
Assumptions:
1. For all 𝑥 ∈ 𝒳, ℓ(⋅, 𝑥) is 𝐿-Lipschitz and convex
2. The prior is log-strongly-concave, i.e. − log 𝜋(⋅) is 𝑚𝜋-strongly convex
3. Θ = ℝ^𝑑
⟹ The Gibbs posterior 𝐺𝛽,𝐷 satisfies (𝜀, 𝛿)-DP if 𝛽 > 0 is chosen as in (1)
Independent of the sensitivity!
Example: Logistic loss
Logistic loss:
ℓ(𝜃, (𝑧, 𝑦)) = log(1 + exp(−𝑦(⟨𝑎, 𝑧⟩ + 𝑏)))

𝒵 = {𝑧 ∈ ℝ^𝑑 : ∥𝑧∥₂ ≤ 𝑅}
𝒳 = {(𝑧, 𝑦) ∣ 𝑧 ∈ 𝒵, 𝑦 ∈ {−1, +1}}
𝜃 = (𝑎, 𝑏)
Example: Logistic loss
• Gaussian prior:
𝜋(𝜃) = 𝑁(𝜃 ∣ 0, (𝑛𝜆)⁻¹ 𝐼)
• The Gibbs posterior is given by:
𝐺𝛽(𝜃 ∣ 𝐷) ∝ exp(−𝛽 ∑𝑖 log(1 + exp(−𝑦𝑖(⟨𝑎, 𝑧𝑖⟩ + 𝑏))) − (𝑛𝜆/2)∥𝜃∥₂²)
• 𝐺𝛽 satisfies (𝜀, 𝛿)-DP if 𝛽 is chosen as in (1)
Langevin Monte Carlo method
• In practice, sampling from the Gibbs posterior can be a
computationally hard problem
• Some approximate sampling methods are used
(e.g. MCMC, VB)
Langevin Monte Carlo method
• Langevin Monte Carlo (LMC) is gradient descent on
𝑈(𝜃) = 𝛽 ℒ(𝜃, 𝐷) − log 𝜋(𝜃) with Gaussian noise injected at each step:
𝜃𝑡+1 = 𝜃𝑡 − ℎ 𝛻𝑈(𝜃𝑡) + √(2ℎ) 𝜉𝑡, 𝜉𝑡 ∼ 𝑁(0, 𝐼)

[Figure: gradient descent (GD) vs. LMC trajectories]
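A minimal sketch of this update rule (the unadjusted Langevin algorithm in its generic form, not the exact procedure analyzed in the paper; the names are mine):

```python
import math
import random

def lmc_step(theta, grad_U, step):
    """One LMC update: theta <- theta - h * grad_U(theta) + sqrt(2h) * xi,
    with xi ~ N(0, I).  The chain targets the density proportional to exp(-U)."""
    g = grad_U(theta)
    return [t - step * gi + math.sqrt(2.0 * step) * random.gauss(0.0, 1.0)
            for t, gi in zip(theta, g)]

# Demo: target the standard Gaussian, U(theta) = ||theta||^2 / 2, grad_U = identity.
random.seed(0)
theta, samples = [0.0], []
for i in range(5000):
    theta = lmc_step(theta, lambda th: th, 0.1)
    if i >= 1000:  # discard burn-in
        samples.append(theta[0])
```

For a Gibbs-posterior target, grad_U would be 𝛽 ∑𝑖 𝛻ℓ(𝜃, 𝑥𝑖) − 𝛻 log 𝜋(𝜃); the log-strong-concavity assumed in the main theorem is exactly what makes the chain mix fast.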
Langevin Monte Carlo method
• “Mixing-time” results have been derived for log-concave
distributions [Dalalyan14][Durmus & Moulines15]
• LMC attains a 𝛾-approximation after a finite number 𝑇 of iterations
• Polynomial time in 𝑛 and 𝛾⁻¹:
𝑇 ∼ 𝑂((𝑛/𝛾²) log(𝑛/𝛾²))
• I have a Privacy Preservation guarantee
• I have an Approximate Posterior
• (Ah…)
Privacy Preserving Approximate Posterior (PPAP)
• We can prove (𝜀, 𝛿′)-DP of the LMC approximation to the Gibbs posterior

Proposition [Minami+16]
• Assume that ℓ and 𝜋 satisfy the assumptions of the Main Theorem.
• Assume also that ℓ(⋅, 𝑥) is 𝑀-smooth for every 𝑥 ∈ 𝒳.
• After 𝑂((𝑛/𝛾²) log(𝑛/𝛾²)) iterations, the output of the LMC satisfies
(𝜀, 𝛿 + (e^𝜀 + 1)𝛾)-DP.
Summary
1. Differentially private learning
= Differential privacy + Statistical learning
2. We developed a new method to prove (𝜀, 𝛿)-DP
for Gibbs posteriors without “sensitivity”
• Applicable to Lipschitz & convex losses
• (+) Guarantee for an approximate sampling method
Thank you!
Editor's Notes
  • #7: In practical data analysis and machine-learning settings, the dataset 𝐷 contains users’ personal information, so we want to protect users’ data with DP.
  • #15: This is the formal definition of differential privacy for data-dependent distributions (differential privacy quantifies the robustness of randomized statistics). 𝜌𝐷 is a randomized statistic, or equivalently a data-dependent probability measure on a parameter space; we say 𝜌𝐷 satisfies (𝜀, 𝛿)-DP if the inequality holds, where “adjacent” means Hamming distance 1.
  • #17: The figure is an example of linear classification: the dataset 𝐷 consists of binary-labeled points, and the classifier 𝜃 is a hyperplane. In the differentially private setting, we release a random hyperplane instead of the usual deterministic one.
  • #18: So our problem is stated in general as
\inf_{\rho_D:\; (\varepsilon, \delta)\text{-DP}} \mathbb{E}_{\theta \sim \rho_D} R(\theta)