Privacy-Preserving Data Analysis
Adria Gascon
The Alan Turing Institute & Warwick University
Based on joint work with Borja Balle, Phillipp Schoppmann,
Mariana Raykova, Jack Doerner, Samee Zahur, David
Evans, Age Chapman, Alan Davoust, Peter Buneman
What analysis on what data?

Fine-grained private data, e.g. tracking for targeted
advertising, credit scoring...

Data held by several organisations, e.g. hospitals?

Data held by individuals, e.g. on their phones?
Who cares?

Data owners (of course)

Data controllers
Adria Gascon Phillipp Schoppmann Borja Balle
Mariana Raykova Jack Doerner Samee Zahur David Evans
Privacy Preserving
Distributed Linear Regression on
High-Dimensional Data
Motivation
Treatment
Outcome
Medical Data
Census Data
Financial Data
Atr. 1 Atr. 2 … Atr. 4 Atr. 5 … Atr. 7 Atr. 8 …
-1.0 0 54.3 … North 34 … 5 1 …
1.5 1 0.6 … South 12 … 10 0 …
-0.3 1 16.0 … East 56 … 2 0 …
0.7 0 35.0 … Centre 67 … 15 1 …
3.1 1 20.2 … West 29 … 7 1 …
Note: This is vertically-partitioned data; similar problems arise with horizontally-partitioned data
Private Multi-Party Machine Learning
Assumptions
• Parameters of the model will be received by all parties
• Parties can engage in on-line secure communications
• External parties might be used to outsource
computation or initialize cryptographic primitives
Problem
• Two or more parties want to jointly learn a model of
their data
• But they can’t share their private data with other parties
The Trusted Party “Solution”
Trusted Party (linked to each party by a secure channel)
Receives plain-text data, runs
algorithm, returns result to parties
The Trusted Party assumption:
• Introduces a single point of failure
• Relies on weak incentives
• Requires agreement between all data providers
=> Useful but unrealistic. Maybe it can be simulated?
Secure Multi-Party Computation (MPC)
Public: function f
Private: input x_i (party i)
Goal:
Compute y = f(x_1, …, x_n) in a way that each party
learns y (and nothing else!)
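A minimal sketch of the goal above for one very simple f, the sum of the parties' inputs, using additive secret sharing modulo 2^32. This is an illustration under assumptions, not the protocol from the talk: the function names, MAX_PARTIES, and the toy random generator are hypothetical, the generator is not cryptographically secure, and all parties are simulated inside one process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PARTIES 16  /* hypothetical limit for this sketch */

/* Toy xorshift generator: NOT cryptographically secure, illustration only. */
static uint32_t toy_rand(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Split `secret` into n additive shares that sum to it modulo 2^32.
 * Any n-1 of the shares look uniformly random on their own. */
void share_secret(uint32_t secret, size_t n, uint32_t *shares, uint32_t *rng)
{
    assert(n >= 2);
    uint32_t sum = 0;
    for (size_t j = 0; j + 1 < n; j++) {
        shares[j] = toy_rand(rng);
        sum += shares[j];
    }
    shares[n - 1] = secret - sum;  /* unsigned arithmetic wraps mod 2^32 */
}

/* y = x_1 + ... + x_n, computed from shares only.
 * received[j] is the only thing party j would see about the other inputs. */
uint32_t secure_sum(const uint32_t *inputs, size_t n, uint32_t seed)
{
    assert(n >= 2 && n <= MAX_PARTIES && seed != 0);
    uint32_t received[MAX_PARTIES] = {0};
    uint32_t shares[MAX_PARTIES];
    uint32_t rng = seed;

    for (size_t i = 0; i < n; i++) {        /* party i shares its input */
        share_secret(inputs[i], n, shares, &rng);
        for (size_t j = 0; j < n; j++)
            received[j] += shares[j];       /* share j goes to party j */
    }

    uint32_t y = 0;                         /* parties publish partial sums */
    for (size_t j = 0; j < n; j++)
        y += received[j];
    return y;
}
```

Calling secure_sum on the inputs 10, 20 and 12 returns 42 for any non-zero seed. Each published partial sum mixes one share of every input, which is why it reveals nothing beyond the final total in this semi-honest toy setting.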
Our Contribution
A PMPML system for vertically partitioned linear regression
Features:
• Scalable to millions of records and hundreds of dimensions
• Formal privacy guarantees (semi-honest security)
• Open source implementation
Tools:
• Combine standard MPC constructions (GC, OT, TI, …)
• Efficient private inner product protocols
• Conjugate gradient descent robust to fixed-point encodings
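To make the "private inner product" item concrete, here is a sketch of one standard construction, reading TI as a trusted initializer that deals correlated randomness before the inputs are known. It is not claimed to be the protocol of the paper. All names (ti_material, ti_deal, private_inner_product, MAX_DIM) are hypothetical, the random generator is a non-cryptographic toy, and both parties are simulated in one function. Arithmetic is over integers modulo 2^64; real-valued data would first be encoded as fixed-point integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DIM 64  /* hypothetical limit for this sketch */

/* Correlated randomness dealt by the trusted initializer (TI). */
typedef struct {
    uint64_t a[MAX_DIM];  /* mask for Alice's vector x */
    uint64_t b[MAX_DIM];  /* mask for Bob's vector y   */
    uint64_t c_alice;     /* c_alice + c_bob = <a, b> (mod 2^64) */
    uint64_t c_bob;
} ti_material;

/* Toy xorshift64 generator: NOT cryptographically secure. */
static uint64_t toy_rand64(uint64_t *s)
{
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *s = x;
    return x;
}

/* Inner product modulo 2^64 (unsigned overflow wraps). */
static uint64_t dot(const uint64_t *u, const uint64_t *v, size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += u[i] * v[i];
    return acc;
}

/* Offline phase: the TI samples masks and shares of their inner product. */
void ti_deal(ti_material *m, size_t n, uint64_t seed)
{
    assert(n <= MAX_DIM && seed != 0);
    uint64_t s = seed;
    for (size_t i = 0; i < n; i++) {
        m->a[i] = toy_rand64(&s);
        m->b[i] = toy_rand64(&s);
    }
    m->c_alice = toy_rand64(&s);
    m->c_bob = dot(m->a, m->b, n) - m->c_alice;
}

/* Online phase: Alice holds x, Bob holds y.
 * Outputs additive shares with z_alice + z_bob = <x, y> (mod 2^64). */
void private_inner_product(const uint64_t *x, const uint64_t *y, size_t n,
                           const ti_material *m,
                           uint64_t *z_alice, uint64_t *z_bob)
{
    assert(n <= MAX_DIM);
    uint64_t d[MAX_DIM], e[MAX_DIM];
    for (size_t i = 0; i < n; i++) {
        d[i] = x[i] - m->a[i];  /* Alice sends d to Bob: x masked by a */
        e[i] = y[i] - m->b[i];  /* Bob sends e to Alice: y masked by b */
    }
    /* <x,y> = <d,e> + <a,e> + <d,b> + <a,b>, split between the parties. */
    *z_alice = dot(d, e, n) + dot(m->a, e, n) + m->c_alice;
    *z_bob   = dot(d, m->b, n) + m->c_bob;
}
```

With x = (1, 2, 3) and y = (4, 5, 6), the two output shares add up to 32 modulo 2^64, while each share alone is masked by the TI's randomness. Keeping the result as shares rather than revealing it is what lets a later step (for example a solver running inside a garbled circuit) consume it without exposing intermediate values.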
FAQ: Why is PMPML…
Exciting?
Can provide access to previously “locked” data
Hard?
Privacy is tricky to formalize, hard to implement,
and inherently interdisciplinary
Worth it?
Better models while avoiding legal risks and bad
PR
Read It, Use It
https://github.com/schoppmp/linreg-mpc
http://eprint.iacr.org/2016/892 (PETS’17)
Adria Gascon Phillipp Schoppmann Borja Balle
Private Document Classification in
Federated Databases
Secure document classification
Adria Gascon James Bell Tejas Kulkarni
Privacy-Preserving Distributed
Hypothesis Testing
● Drop off in Manhattan?
● Tip over 25%?
● Was it a short journey?
● Was payment method
credit card?
Drop-off in Manhattan and tip over 25%
are significantly correlated events.
But this result is differentially private, so I cannot easily tell
if a given journey was included in the training dataset or not.
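As a rough illustration of what "significantly correlated" can mean for two yes/no attributes, the sketch below computes the classical chi-squared statistic of independence for a 2x2 contingency table and compares it with 3.841, the 5% critical value for one degree of freedom. This is the plain, non-private test, shown under assumptions: the slide does not say which test or which privacy mechanism was used, and the function names are hypothetical. A differentially private variant would add calibrated noise before releasing the result.

```c
#include <assert.h>

/* 5% critical value of the chi-squared distribution, 1 degree of freedom. */
#define CHI2_CRIT_DF1_P05 3.841

/* Chi-squared statistic for the 2x2 table
 *
 *                     tip > 25%   tip <= 25%
 *   Manhattan             a           b
 *   not Manhattan         c           d
 *
 * chi2 = N (ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)), with N = a+b+c+d.
 * Returns 0 when a row or column is empty (statistic undefined). */
double chi2_2x2(double a, double b, double c, double d)
{
    assert(a >= 0 && b >= 0 && c >= 0 && d >= 0);
    double n = a + b + c + d;
    double denom = (a + b) * (c + d) * (a + c) * (b + d);
    if (denom == 0.0)
        return 0.0;
    double diff = a * d - b * c;
    return n * diff * diff / denom;
}

/* Non-zero if independence is rejected at the 5% level. */
int significantly_associated(double a, double b, double c, double d)
{
    return chi2_2x2(a, b, c, d) > CHI2_CRIT_DF1_P05;
}
```

For made-up counts a=30, b=10, c=20, d=40 the statistic is about 16.67, well above the threshold, so independence would be rejected. A perfectly balanced table such as 10, 10, 10, 10 gives 0.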
Problem: model-check security properties on
private source code.
Privacy-Preserving Model Checking
● Problem: Check security properties on (private)
source code.
● “Public” equivalent: MOPS [1], and some others.
– Security property expressed as regular expression over
sequences of instructions
– Find all paths in control flow graph that match the expression
● Application of Private Regular Path Queries
[1] Hao Chen and David Wagner. 2002. MOPS: an infrastructure for examining security properties of software.
In Proceedings of the 9th ACM conference on Computer and communications security (CCS '02), Vijay Atluri
(Ed.). ACM, New York, NY, USA, 235-244. DOI=http://dx.doi.org/10.1145/586110.586142
Privacy-Preserving Model Checking
Secure queries on graph data
Simple Example
/* hello.c */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <pwd.h>

/* Drop the effective user ID back to the real user ID. */
void drop_priv()
{
    struct passwd *passwd;

    if ((passwd = getpwuid(getuid())) == NULL)
    {
        /* Note: this branch returns without dropping privileges. */
        printf("getpwuid() failed");
        return;
    }
    printf("Drop user %s's privilege\n", passwd->pw_name);
    seteuid(getuid());
}

int main(int argc, char *argv[])
{
    drop_priv();
    printf("About to exec\n");
    /* The slide is cut off here; the exec call itself is not shown. */
    return 0;
}
Simple Example
Control flow graph
Security property FSA (system call with root privilege)
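The check described above can be sketched, in its public (non-private) form, as a search over the product of the control flow graph and the property automaton: a violation exists if some path drives the automaton into its error state. Everything below is an assumption made for illustration: the three edge labels, the automaton, the hand-made graph loosely modelled on hello.c (including an exec edge that the truncated slide does not show), and all identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 64  /* hypothetical limit for this sketch */

/* Edge labels: what the instruction on a CFG edge does. */
enum label { OTHER, DROP, EXEC, N_LABELS };
/* Property automaton states. */
enum state { PRIV, UNPRIV, ERROR, N_STATES };

typedef struct { int from, to; enum label lab; } edge;

/* Transition table: exec while still privileged is an error. */
static const enum state delta[N_STATES][N_LABELS] = {
    /* PRIV   */ { PRIV,   UNPRIV, ERROR  },
    /* UNPRIV */ { UNPRIV, UNPRIV, UNPRIV },
    /* ERROR  */ { ERROR,  ERROR,  ERROR  },
};

/* Hand-made toy CFG, loosely following hello.c (hypothetical). */
static const edge toy_cfg[] = {
    {0, 1, OTHER},  /* main -> getpwuid() check            */
    {1, 2, OTHER},  /* lookup failed: print, then return   */
    {1, 3, OTHER},  /* lookup succeeded: print             */
    {3, 4, DROP},   /* seteuid(getuid())                   */
    {2, 5, OTHER},  /* back in main, privileges NOT dropped */
    {4, 5, OTHER},  /* back in main, privileges dropped    */
    {5, 6, EXEC},   /* assumed exec call after the print   */
};

/* True if some path from `start` (in state PRIV) reaches ERROR.
 * Depth-first search over (node, automaton state) pairs. */
bool reaches_error(const edge *edges, size_t n_edges, int n_nodes, int start)
{
    assert(n_nodes > 0 && n_nodes <= MAX_NODES);
    assert(start >= 0 && start < n_nodes);

    bool seen[MAX_NODES][N_STATES] = {{false}};
    int stack_node[MAX_NODES * N_STATES];
    enum state stack_state[MAX_NODES * N_STATES];
    size_t top = 0;

    seen[start][PRIV] = true;
    stack_node[top] = start;
    stack_state[top] = PRIV;
    top++;

    while (top > 0) {
        top--;
        int u = stack_node[top];
        enum state q = stack_state[top];
        if (q == ERROR)
            return true;
        for (size_t i = 0; i < n_edges; i++) {
            if (edges[i].from != u)
                continue;
            int v = edges[i].to;
            assert(v >= 0 && v < n_nodes);
            enum state r = delta[q][edges[i].lab];
            if (!seen[v][r]) {          /* each pair is pushed at most once */
                seen[v][r] = true;
                stack_node[top] = v;
                stack_state[top] = r;
                top++;
            }
        }
    }
    return false;
}
```

On toy_cfg (7 edges, 7 nodes, start node 0) the search reports a violation through the failure branch 0 → 1 → 2 → 5 → 6, where exec is reached without a preceding drop. Removing that branch makes the same call return false. The private version discussed in the talk would run an equivalent query without revealing the graph to the party holding the property, which this sketch does not attempt.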
Interesting case:
distributed private graph (code)
main.c library.c
Related Work
Verification Across Intellectual Property Boundaries [2]:
[2] Chaki, Sagar, Christian Schallhart, and Helmut Veith. "Verification across intellectual property boundaries."
ACM Transactions on Software Engineering and Methodology (TOSEM) 22.2 (2013): 15.
Related Work
Verification Across Intellectual Property Boundaries [2]
They also say...
“While we are aware of advanced methods such as secure multiparty computation
[Goldreich 2002] and zero-knowledge proofs [Ben-Or et al. 1988], we believe that they are
impracticable for our problem, as such methods cannot be easily wrapped over given
validation tools. Finally, we believe that any advanced method without an intuitive proof for
its secrecy will be heavily opposed by the supplier—and might therefore be hard to
establish in practice.”
Case study: thttpd
● Tiny HTTP server
● 2 main modules (thttp.c and libhttp.c)
– thttp.c (2k loc)
– libhttp.c (4k loc)
thttpd control flow graph...
● 2 main modules only
● functions are disconnected
thttpd: next steps
● Adapt private Regular Path Queries work for
pushdown automata
● Find some bugs.
● Write paper.
● Voila!
Thanks!
