Adjusting to the GDPR: The Impact on Data Scientists and Behavioral Researchers
Travis Greene, Galit Shmueli, Soumya Ray
National Tsing Hua University, Taiwan
INFORMS 2nd Data Science Workshop, Phoenix, Nov 4, 2018
Roadmap
1. Personal Data: USA vs. EU
2. GDPR in a Nutshell
3. Processing GDPR through the InfoQ Framework
4. How will GDPR impact data scientists?
Personal Data: Any information that could be used to ‘single out’ a person
USA (opt-out): Commercial commodity
“Collecting and processing [personal data] is allowed unless it causes harm or is expressly limited by U.S. law.”
EU (opt-in): Fundamental right (Article 8 EU Charter of Fundamental Rights)
“Processing of personal data is prohibited unless there is an explicit legal basis that allows it.”
(Potentially) global reach
● Fines of up to 20M Euro or 4% of global annual turnover, whichever is higher
● Affects both industry and research practices
● Similar privacy laws in USA, China, India, Brazil...
Evolution of 1995 Data Protection Directive into EU-wide Regulation
Defines three key entities:
● Data Controller
● Data Processor
● Data Subjects
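The fine ceiling mentioned on this slide can be read as simple arithmetic. The sketch below assumes the higher fine tier of GDPR Article 83(5), where the cap is the greater of the flat amount and the turnover share. The function name and the example turnover figures are hypothetical.

```python
def max_gdpr_fine_eur(global_annual_turnover_eur: float) -> float:
    """Upper bound of the higher fine tier: the greater of EUR 20M or 4% of turnover."""
    flat_cap_eur = 20_000_000
    turnover_share = 0.04
    return max(flat_cap_eur, turnover_share * global_annual_turnover_eur)


# Hypothetical firms: for the smaller one, 4% of turnover is only 2M, so the 20M cap applies.
small_firm_cap = max_gdpr_fine_eur(50_000_000)
# For the larger one, 4% of turnover (400M) exceeds the flat 20M.
large_firm_cap = max_gdpr_fine_eur(10_000_000_000)
```

These are ceilings only. Actual fines are set case by case by the supervisory authority.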
If you’re a data science researcher, it is difficult to synthesize a coherent understanding of the new GDPR changes
→ We need a structured framework!
Our three-step approach to analyzing GDPR
1. Identify: Identify key GDPR concepts, definitions, principles relevant to data science research
2. Categorize: Categorize key GDPR concepts in a meaningful way for data scientists
3. Analyze: Use categorization to analyze the impact of GDPR on data science workflow
The Information Quality (InfoQ) Framework (Kenett & Shmueli, 2014)
InfoQ provides a coherent, systematic framework for assessing the impact of GDPR on data scientists
InfoQ: Potential of a dataset to achieve a goal, given analysis method and utility
InfoQ depends on 4 components: goal, data, analysis, utility
Assess InfoQ? 8 Dimensions:
1. Data resolution
2. Data structure
3. Data integration
4. Temporal relevance
5. Chronology of data & goal
6. Generalizability
7. Operationalization
8. Communication
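The verbal definition of InfoQ on this slide is usually written compactly in Kenett & Shmueli's framework. The notation below is taken from that framework rather than from the slide text: g is the goal, X the data, f the analysis method, and U the utility measure.

```latex
\mathrm{InfoQ}(f, X, g) = U\bigl(f(X \mid g)\bigr)
```

Read from the inside out: the data X are considered in light of goal g, analyzed with f, and the result is scored by U.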
GDPR Concepts, Definitions, Principles
1. Identify
Privacy by Design, Special Category Data, Purpose Limitation, Automated Profiling Systems, Pre-GDPR Data, Pseudonymized Data, Legitimate Interests, Structured and Unstructured Data, Statistical Research, Statistical Aggregations, Consent, Principle of Proportionality, Data Controllers, Contractual Necessity
2. Categorize (InfoQ components)
Goal: Scientific Research, Statistical Research, Public Interest Research, Historical Research, Archival Research
Data: Personal Data, Special Category Data, Pseudonymized Data, Statistical Data, Publicly Available Data, Pre-GDPR Personal Data
Utility: Principle of Proportionality, Purpose Limitation, Contractual Necessity, Legitimate Interests, Privacy by Design, Consent
Analysis: Statistical Aggregation, Automated Profiling, Filing Systems, Structured vs. Unstructured
Documentation
Serve Mankind
3. Analyze
Examine Typical Data Analysis Workflow Using InfoQ Framework
Beginning of Research → 1. Collect Data → 2. Use Data → 3. Share Data → 4. Generalize → 5. Communicate → Complete Analysis
InfoQ 8 Dimensions: 1. Resolution, 2. Structure, 3. Integration, 4. Temporal relevance, 5. Chronology, 6. Generalizability, 7. Operationalization, 8. Communication
InfoQ provides us with ‘x-ray’ vision for analyzing each step of the process
A Modern Data Science Workflow
1. Collect Data
Data Minimization: What kinds of data can we legally collect?
Purpose Limitation: On which legal grounds can we collect users’ data?
Pseudonymization: How should collected data be stored and secured?
2. Use Data
Pre-GDPR Data: If subjects consented prior to GDPR, can we continue to use their data?
Heterogeneity: Will these data be available at the time of prediction?
3. Share Data
Collaboration: How can academics make use of the vast stores of behavioral big data (BBD) collected and processed by major internet companies?
Liability: GDPR imposes large potential fines
4. Generalize
Consent bias: How do we know our results will generalize to the population of interest?
Replication: Can our results be replicated?
5. Communicate
Data Subjects: How do we explain our results to concerned data subjects?
Data Protection Authorities: How can we prove our compliance with GDPR principles?
8 InfoQ Dimensions
1. Resolution
2. Structure
3. Integration
4. Temporal relevance
5. Chronology
6. Generalizability
7. Operationalization
8. Communication
1. Gathering Data (Pre-collection)
Data Minimization & Purpose Limitation
Collect only for specific purposes clearly explained
→ Must justify “Why do you need my ethnicity?”
Can’t arbitrarily repurpose personal data
→ Need legal basis
Data minimization & privacy preservation paradox
→ Power calculations may indirectly lead to re-identification
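One way to picture data minimization and purpose limitation in a pipeline is a per-purpose whitelist of fields. This is a minimal sketch, not a compliance mechanism: the purpose names, field names, and the `minimize` helper are all hypothetical.

```python
# Hypothetical registry: each declared purpose lists the only fields it may use.
ALLOWED_FIELDS = {
    "churn_model": {"user_id", "signup_date", "sessions_last_30d"},
    "billing": {"user_id", "billing_address"},
}


def minimize(record: dict, purpose: str) -> dict:
    """Return only the fields declared for `purpose`; refuse undeclared purposes."""
    if purpose not in ALLOWED_FIELDS:
        # Repurposing without a declared purpose is rejected rather than silently allowed.
        raise ValueError(f"No declared purpose on file: {purpose!r}")
    allowed = ALLOWED_FIELDS[purpose]
    return {key: value for key, value in record.items() if key in allowed}


# Illustrative record; "ethnicity" is not declared for any purpose, so it never passes through.
raw = {
    "user_id": 7,
    "signup_date": "2018-05-25",
    "sessions_last_30d": 12,
    "ethnicity": "undisclosed",
    "billing_address": "Hsinchu",
}
model_input = minimize(raw, "churn_model")
```

Calling `minimize(raw, "ad_targeting")` raises `ValueError`, which mirrors the point that personal data cannot be arbitrarily repurposed.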
1. Gathering Data (Post-collection)
Pseudonymization
Data features that might (reasonably) be used to ID a specific person are stored separately and securely from other data
IP: 192.18.8.1
Name: Travis Green
Pseudonymization is just a suggestion
→ Spur research on ‘privacy protective data mining’
Different implications for different researchers
→ Personalized vs. aggregate-level models
Pseudonymized data is contextual
→ Know incentives & data environment
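The idea of storing identifying features separately from the rest of the data can be sketched as follows. This is an illustrative sketch under stated assumptions: the set of identifying fields, the `pid` column, and the `pseudonymize` helper are hypothetical, and which fields count as identifying depends on context.

```python
import secrets

# Assumed set of identifying fields; in practice this is a contextual judgment.
IDENTIFYING_FIELDS = {"name", "ip"}


def pseudonymize(records):
    """Split records into an analysis table and a separately stored key table."""
    analysis_rows, key_table = [], {}
    for rec in records:
        # Random token, not derived from the identity, so it cannot be reversed by hashing guesses.
        pseudonym = secrets.token_hex(8)
        key_table[pseudonym] = {f: rec[f] for f in IDENTIFYING_FIELDS if f in rec}
        row = {f: v for f, v in rec.items() if f not in IDENTIFYING_FIELDS}
        row["pid"] = pseudonym
        analysis_rows.append(row)
    return analysis_rows, key_table


# Example values taken from the slide, plus a made-up behavioral feature.
records = [{"name": "Travis Green", "ip": "192.18.8.1", "clicks": 42}]
analysis_rows, key_table = pseudonymize(records)
```

Whoever holds `key_table` can re-identify the rows, so it would be stored and access-controlled separately from `analysis_rows`.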
2. Using Data
Reconsent, Data Availability & Heterogeneity
Pre-GDPR user data reconsent
→ Fewer rows but more accuracy
Data availability for future prediction
→ Must expect opt-outs
More user privacy options
→ Larger heterogeneity in completeness
Models built using de-consented data
→ Still not clear, but Article 7(3) seems to allow it (withdrawing consent does not affect the lawfulness of processing done before the withdrawal)
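The effect of reconsent and finer-grained privacy options on a dataset can be illustrated with a toy table. The records, field names, and `completeness` helper below are made up for illustration; `None` stands for a field the user chose to withhold.

```python
# Toy user table: one user did not reconsent, two withheld individual fields.
users = [
    {"id": 1, "reconsented": True, "age": 34, "location": "Berlin"},
    {"id": 2, "reconsented": False, "age": 51, "location": "Paris"},
    {"id": 3, "reconsented": True, "age": None, "location": "Madrid"},
    {"id": 4, "reconsented": True, "age": 28, "location": None},
]

# Reconsent: only rows from users who consented again remain usable.
usable = [u for u in users if u["reconsented"]]


def completeness(rows, field):
    """Share of rows in which `field` is present (not withheld)."""
    return sum(r[field] is not None for r in rows) / len(rows)


rows_kept = len(usable) / len(users)          # fewer rows after reconsent
age_completeness = completeness(usable, "age")  # uneven completeness across features
location_completeness = completeness(usable, "location")
```

A model that relies on `age` at prediction time would have to handle the users for whom that field is no longer available.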
3. Sharing Data
Increased Legal Liability
Companies dropping 3rd party sharing
→ Less rich data
Data subject re-identification and intellectual property
→ “Data access divide”: trusted researchers from elite universities
New legal instruments of compliance
→ Binding Corporate Rules (BCRs), standard contractual clauses, certification schemes
4. Generalizing
Consent Bias, Guinea Pigs, & Reproducibility
Privacy-savvy users may opt out
→ Limits inferential power
Lower standards of consent & processing
→ Non-EU users become behavioral big data guinea pigs
Reproducibility of results vs. legal liability
→ Is it worth it for firms?
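Consent bias can be shown with a deliberately simple, deterministic toy population. All numbers below are invented for illustration: privacy-savvy users are assumed to opt out and to differ systematically on the outcome being measured.

```python
from statistics import mean

# Invented population: 40 privacy-savvy users (30 min/day) and 60 others (90 min/day).
population = (
    [{"privacy_savvy": True, "daily_minutes": 30} for _ in range(40)]
    + [{"privacy_savvy": False, "daily_minutes": 90} for _ in range(60)]
)

# Assumption for this sketch: every privacy-savvy user opts out of data collection.
consenting = [p for p in population if not p["privacy_savvy"]]

true_mean = mean(p["daily_minutes"] for p in population)
observed_mean = mean(p["daily_minutes"] for p in consenting)
bias = observed_mean - true_mean
```

Here the consenting sample overstates average usage (90 observed against a population mean of 66), because opting out is correlated with the outcome. Collecting more consenting users would not remove this gap.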
5. Communicating
Two Audiences: Data Subjects and Data Authorities
Data Subjects
→ Rights to access/information in simple, clear language
→ Right to explanation (why & how) of automated profiling
Authorities
→ Compliance documentation, data protection impact assessments (DPIAs), data breach reporting
Summary & Final Thoughts
- Rethink & justify how and why we collect, store, and analyze personal data
- Tradeoffs between economic development and fundamental rights to privacy

