Privacy, Security and Ethics in Data Science
Nikolaos Vasiloglou
1
Summary
● From Public to Private datasets
● Anonymizing data
● Anonymizing computations
● Seeking security for my data
● The unethical surprise of a data scientist
● Data science, the opportunity to build a more equal world
2
When are public data useful to a data scientist?
● Public data are by default anonymized (census data)
● By their nature, they raise no privacy concern (ImageNet)
● Public data come with an identifier that lets users join them with private data (census)
● Public data can be joined semantically without an ID (ImageNet)
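The identifier-based join can be sketched as follows. This is a minimal illustration, assuming a hypothetical ZIP-code key; both tables, their field names, and all values are invented.

```python
# Hypothetical public table keyed by ZIP code (values are invented).
public_census = {
    "30301": {"median_income": 52000},
    "30302": {"median_income": 61000},
}

# Hypothetical private records that carry the same identifier.
private_customers = [
    {"customer_id": 1, "zip": "30301"},
    {"customer_id": 2, "zip": "30302"},
    {"customer_id": 3, "zip": "99999"},  # no match in the public table
]

def enrich(customers, census):
    """Left-join private rows with public attributes on the ZIP code."""
    enriched = []
    for row in customers:
        public_attrs = census.get(row["zip"], {})
        enriched.append({**row, **public_attrs})
    return enriched

enriched_customers = enrich(private_customers, public_census)
```

Rows without a match keep their private fields only, which is the behavior of a left join.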
3
Public datasets that are not so useful
● Netflix, MovieLens, and other recommender datasets
● There is no way to join them with real users
● They are only good for testing your algorithm
● They give you an expectation of the accuracy range
4
Not so public, not so private data
● Twitter data
● Facebook data
● Yelp data
● Amazon data
5
What is wrong with this data?
● Some of them are public to your friends but not to everybody
● Even when they are public, they might not be personally identifiable
● The fact that they are there does not mean you can use them without consent
● The Cambridge Analytica case
6
How to respect people’s privacy
● The minimum you can do is store them in a safe place
● Is the cloud safe?
● How safe can it be?
● Is encryption enough?
7
Major Failures
8
How many layers of protection should I add?
● Two-factor authentication
● VPN
● Encryption
● ….
● What is wrong with it?
● Secure -> Difficult -> People become creative in exfiltrating the data
9
Is my laptop safer than my company’s servers?
● Let’s discuss it!
● Who is the best target?
10
Use the usual trick
● Distribute your data
● Why is this helpful?
11
12
An example: Distributed Addition
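A minimal sketch of distributed addition, assuming additive secret sharing over a prime modulus (the slide's own figure is not available, so the scheme, the modulus, and the salary values are assumptions). Each party holds one random-looking share per secret and adds its shares locally; only the combined result reveals the sum.

```python
import secrets

PRIME = 2**61 - 1  # modulus for the shares (an assumption of this sketch)

def share(secret, n_parties):
    """Split `secret` into n random shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine all shares; any strict subset looks uniformly random."""
    return sum(shares) % PRIME

salary_a, salary_b = 52_000, 61_000  # made-up secrets
shares_a = share(salary_a, 3)
shares_b = share(salary_b, 3)

# Each party adds the two shares it holds, locally, without seeing the secrets.
sum_shares = [(a + b) % PRIME for a, b in zip(shares_a, shares_b)]
total = reconstruct(sum_shares)
```

No single party learns either salary, yet the parties together can open the total.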
13
14
Another example: Distributed multiplication
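Multiplication is harder than addition because shares cannot simply be multiplied locally. One common approach, shown here as an assumption since the slide's figure is not available, is a Beaver triple: a precomputed random triple with c = a * b that masks the inputs. The trusted dealer and all names below are hypothetical.

```python
import secrets

PRIME = 2**61 - 1  # modulus for the shares (an assumption of this sketch)

def share(secret, n_parties):
    """Split `secret` into n random shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

def beaver_multiply(x_shares, y_shares):
    """Return shares of x * y given shares of x and y."""
    n = len(x_shares)
    # A trusted dealer (assumed here) prepares a random triple with c = a * b.
    a, b = secrets.randbelow(PRIME), secrets.randbelow(PRIME)
    a_sh, b_sh, c_sh = share(a, n), share(b, n), share(a * b % PRIME, n)

    # Parties open d = x - a and e = y - b; a and b act as one-time masks.
    d = reconstruct([(x - ai) % PRIME for x, ai in zip(x_shares, a_sh)])
    e = reconstruct([(y - bi) % PRIME for y, bi in zip(y_shares, b_sh)])

    # Local step: z_i = c_i + d*b_i + e*a_i; party 0 also adds d*e.
    z_shares = [(ci + d * bi + e * ai) % PRIME
                for ai, bi, ci in zip(a_sh, b_sh, c_sh)]
    z_shares[0] = (z_shares[0] + d * e) % PRIME
    return z_shares

product = reconstruct(beaver_multiply(share(6, 3), share(7, 3)))
```

Summing the shares gives c + d*b + e*a + d*e, which expands to x * y.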
15
Homomorphic encryption: another direction
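As an illustration of the idea, here is a toy version of the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The choice of Paillier is an assumption (the slides do not name a scheme), and the primes are far too small to be secure.

```python
import math
import secrets

# Toy primes: far too small to be secure, chosen only for readability.
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAMBDA, -1, N)  # valid because the generator is g = N + 1

def encrypt(m):
    """Encrypt 0 <= m < N with fresh randomness r coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ

def decrypt(c):
    u = pow(c, LAMBDA, N_SQ)
    return ((u - 1) // N * MU) % N

def add_encrypted(c1, c2):
    """Homomorphic addition: multiply the ciphertexts modulo N^2."""
    return (c1 * c2) % N_SQ

encrypted_sum = add_encrypted(encrypt(1200), encrypt(34))
decrypted_sum = decrypt(encrypted_sum)
```

The party that adds the ciphertexts never sees 1200 or 34; only the key holder can decrypt the result.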
16
17
18
19
20
Does encryption really respect privacy?
● What if I train a classifier and then throw away the data?
● Can the classifier leak the data it was trained on?
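A toy illustration of why the answer can be yes. The sketch assumes a hypothetical 1-nearest-neighbor model that exposes a confidence score; because such a model memorizes its training points, an attacker can guess membership from a near-perfect confidence. Real attacks on real models are subtler, and all names and data here are invented.

```python
import math

class NearestNeighborModel:
    """Toy 1-NN classifier that exposes a confidence score (hypothetical)."""

    def __init__(self, points, labels):
        self._points = list(points)   # the model *is* the training data
        self._labels = list(labels)

    def predict_with_confidence(self, query):
        distances = [math.dist(query, p) for p in self._points]
        best = min(range(len(distances)), key=distances.__getitem__)
        return self._labels[best], 1.0 / (1.0 + distances[best])

def looks_like_training_member(model, record, threshold=0.999):
    """Membership guess: near-perfect confidence suggests memorization."""
    _, confidence = model.predict_with_confidence(record)
    return confidence >= threshold

training_points = [(1.0, 2.0), (3.0, 1.5), (5.0, 4.0)]
model = NearestNeighborModel(training_points, ["a", "b", "a"])
del training_points  # the raw data is "thrown away", the model remains

member_guess = looks_like_training_member(model, (3.0, 1.5))
outsider_guess = looks_like_training_member(model, (3.2, 1.9))
```

Deleting the dataset does not help here: the information still lives inside the model.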
21
Adversarial Attacks
● Reconstructing training datasets
22
Reconstructing images
23
Reconstructing text
24
Differential Privacy, the remedy
25
What is differential privacy?
● Practically speaking, a clever way to add noise to your model without hurting performance
● More or less the same trick as gradient descent
● Bayesian models give you differential privacy for free!!
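The bullets above concern noise added to a model. As a simpler illustration of the same principle, the sketch below applies the Laplace mechanism to a counting query; the data, the function names, and the epsilon value are assumptions made for the example.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(values, predicate, epsilon, rng=random):
    """Counting query released under epsilon-differential privacy.

    A count changes by at most 1 when one person is added or removed,
    so its sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)

ages = [23, 35, 41, 52, 29, 64, 38]  # made-up data
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5)
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon means a more accurate but less private answer.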
26
What if you are not allowed to see the data?
● Sensitive and personal data like email
● The European Union restricts storing Personally Identifiable Information (GDPR)
● What representation should you use?
27
Generative Models
● GANs
● LSTMs
28
Are they safe?
● Almost!
● It is possible to leak information
● You have to be careful
29
What if we redact the sensitive information?
● Netflix 2
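Redaction often fails because the remaining fields still single people out. A minimal sketch of such a linkage attack, assuming made-up records and a hypothetical public list that shares the same quasi-identifiers:

```python
from collections import Counter

# Made-up "redacted" records: names removed, quasi-identifiers kept.
redacted = [
    {"zip": "30301", "birth_year": 1980, "gender": "F", "diagnosis": "flu"},
    {"zip": "30301", "birth_year": 1980, "gender": "F", "diagnosis": "asthma"},
    {"zip": "30302", "birth_year": 1975, "gender": "M", "diagnosis": "diabetes"},
]

# Hypothetical public list (for example a voter roll) with names attached.
public = [
    {"name": "Alice", "zip": "30301", "birth_year": 1980, "gender": "F"},
    {"name": "Bob", "zip": "30302", "birth_year": 1975, "gender": "M"},
]

QUASI_IDS = ("zip", "birth_year", "gender")

def key(row):
    return tuple(row[field] for field in QUASI_IDS)

group_sizes = Counter(key(r) for r in redacted)
k_anonymity = min(group_sizes.values())  # smallest group of look-alikes

# Any record that is unique on the quasi-identifiers can be re-identified.
names_by_key = {key(p): p["name"] for p in public}
reidentified = {
    names_by_key[key(r)]: r["diagnosis"]
    for r in redacted
    if group_sizes[key(r)] == 1 and key(r) in names_by_key
}
```

Here the third record is unique on (ZIP, birth year, gender), so joining with the public list attaches a name to the diagnosis even though the name column was removed.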
30
Identifying people from location data
31
Life after the model: ethical responsibility
32
Some Facts about ML algorithms
● Garbage In -> Garbage Out
● Racism In -> Racism Out
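One way to check whether bias made it into a model's output is to compare decision rates across groups. The sketch below computes a demographic parity gap; the metric choice, the group labels, and the decisions are assumptions made for illustration.

```python
def selection_rates(decisions, groups):
    """Share of positive decisions (1) per group."""
    totals, positives = {}, {}
    for decision, group in zip(decisions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + decision
    return {g: positives[g] / totals[g] for g in totals}

# Made-up model outputs: 1 = approved, 0 = rejected.
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = selection_rates(decisions, groups)
parity_gap = max(rates.values()) - min(rates.values())
```

A gap near zero means both groups are approved at similar rates; a large gap is a signal to inspect the training data and the model.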
33
A never ending list of failures
34
35
36
37
Social Bias
38
More on social bias
39
More sad failures
40
Debiasing is possible
41
You can fix it
42
Use legitimate sources of information
43
NIPS devoted a keynote and a workshop
44
Conclusion
45
