Real-World Data Engineering Practice for GDPR
Ching-Yu Wu and Jeff Hung, SPN Data Team, Trend Micro
2019/09/06 @DataCon
© 2019 Trend Micro Inc.
⚠️ Disclaimer
• Please treat this talk as a reference only
– Detailed implementation varies with business requirements
– It may not be suitable for every company
– You MUST reach a consensus with your legal department before implementing your data pipeline
What is GDPR?
General Data Protection Regulation
In effect since 2018/5/25
Protects the personal data of individuals in the EU
Strengthens the privacy rights of individuals in the EU
Key Changes
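Increased Territorial Scope
• Applies to all businesses collecting personal data on individuals in the EU
• Regardless of the company’s location
Breach Notification
• Report a breach to the supervisory authority within 72 hours of becoming aware of it
Penalties
• Up to €20M or 4% of global annual turnover, whichever is higher
• Google was fined €50M by the French regulator CNIL on 2019/1/21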
Highlighted Individual’s Rights
• Right to Access
• Right to Erasure
• Data Portability
• Privacy by Design
Simple Data Pipeline for GDPR
Topics
• Legal & Compliance
– Data Collection Declaration
– Data Categorization
• Security
– Anonymization
– Permission Control
– Data Encryption
• User's Rights
– Right to Access and Erasure
• Role & Responsibility
– Data Abuse Prevention
Data Collection Declaration
• Clearly declare the purposes in the Terms of Use
– What data will be sent?
• List all the categories
– Why is the data collected?
• Is it essential for the service?
– Obtain clear consent
• Checkbox for opt-in or opt-out
Data Categorization
• Definition of personal data
– Personally Identifiable Information (PII)
– Three categories: Non-PII, PII, and Sensitive-PII
• PII: name, account ID, email address, date of birth, gender, etc.
• Sensitive-PII: health data, sexual orientation, race, etc.
– Collecting Sensitive-PII is prohibited in most cases
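A minimal sketch of how the three categories above could be encoded so that a pipeline can check a declared schema before any data is collected. The field names and the `FIELD_CATEGORIES` table are hypothetical; the real classification has to come from Legal.

```python
from enum import Enum


class Category(Enum):
    NON_PII = "non-pii"
    PII = "pii"
    SENSITIVE_PII = "sensitive-pii"


# Hypothetical classification table; in practice Legal defines and approves it.
FIELD_CATEGORIES = {
    "detection_type": Category.NON_PII,
    "email": Category.PII,
    "date_of_birth": Category.PII,
    "health_data": Category.SENSITIVE_PII,
}


def check_schema(fields):
    """Return the PII fields in a schema; reject unknown or Sensitive-PII fields."""
    pii_fields = []
    for name in fields:
        category = FIELD_CATEGORIES.get(name)
        if category is None:
            raise ValueError(f"unclassified field: {name}")
        if category is Category.SENSITIVE_PII:
            raise ValueError(f"Sensitive-PII field must not be collected: {name}")
        if category is Category.PII:
            pii_fields.append(name)
    return pii_fields
```

Rejecting unclassified fields by default means a new field cannot enter the pipeline until someone has categorized it.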
It’s All About Compliance
• The definition MUST be established by the Legal Department
• Build a review process into the development cycle
– The product team provides a clear description of the data being collected
– Legal reviews, approves, and archives it
– The clearer the document, the better the communication
Separated Databases
• De-identification in analytical data
– Have a clear separation between user and analytical data
• No one can access both
– User data (user’s behavior and personal information)
• Purchase records, login records, etc.
– Analytical data (neutral logs)
• Detection logs, activity data, etc.
Anonymization
• GDPR suggests having a unified anonymous ID across all systems
– Stop using e-mail addresses or other personal information as the unique ID
– Avoid storing personal information in each service/application
• Use a foreign key or a similar concept
Anonymization (cont’d)
• How to de-identify an identifiable field?
– Use an irreversible encoding
– Simplest way: a one-way hash
• With or without salt?
• Refresh the salt or not?
– Ways to avoid re-counting the same user (e.g., in DAU and MAU)
• Synchronize the salt between client and server
• Use an unsalted one-way hash (or a fixed salt)
• Change the definition of “active”
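To illustrate the salt trade-off on this slide, here is a small sketch. It uses HMAC-SHA256 as the salted one-way hash, which is our choice and not something the slides specify; the salt values and the e-mail address are made up for the example.

```python
import hashlib
import hmac


def anonymize(identifier: str, salt: bytes) -> str:
    """Return an irreversible token for an identifier using HMAC-SHA256."""
    return hmac.new(salt, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


fixed_salt = b"example-fixed-salt"      # hypothetical value; keep real salts secret
rotated_salt = b"example-rotated-salt"  # what a refreshed salt would look like

day1 = anonymize("user@example.com", fixed_salt)
day2 = anonymize("user@example.com", fixed_salt)
after_refresh = anonymize("user@example.com", rotated_salt)

# Fixed salt: the same user maps to the same token, so DAU/MAU count the user once.
# Refreshed salt: the token changes, so the same user would be counted again.
```

A refreshed salt limits how long activity can be linked to one token, at the cost of the re-counting problem described above.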
Anonymization (cont’d)
• Where to de-identify a field?
– Ideally on the client side (before the data is sent)
– At the latest, in the very first step of the server-side ETL process
• The mapping table of identifiable data is treated as user data
• The operation MUST be isolated
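One possible shape for that first ETL step is sketched below: it splits each raw record into an analytical record (tokens only) and mapping-table entries that stay with the user data. The `IDENTIFIABLE_FIELDS` tuple, the `_token` suffix, and the use of HMAC-SHA256 are assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical list; the real one comes from the data categorization.
IDENTIFIABLE_FIELDS = ("email",)


def deidentify(record: dict, salt: bytes):
    """Split a raw record into an analytical record and mapping-table entries."""
    analytical = dict(record)
    mapping = {}
    for field in IDENTIFIABLE_FIELDS:
        if field in analytical:
            raw = analytical.pop(field)
            token = hmac.new(salt, raw.encode("utf-8"), hashlib.sha256).hexdigest()
            analytical[field + "_token"] = token
            mapping[token] = raw  # belongs with user data, never with analytical data
    return analytical, mapping
```

Only the first return value should flow on to the analytical store; the second would be written to the isolated user-data side.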
Permission Control
• ACL on bucket
– Few users/service accounts can read
– Even fewer service accounts can write
• User cannot have write permission
– Principle of analytical data permission control
Limited Data Retention
• Data shouldn’t be kept “just in case”
• Periodically remove outdated data
– The retention period is set according to…
• Business value (application’s need)
• Data volume (cost)
• Other legal issues
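The periodic clean-up could look roughly like the sketch below. It assumes each record carries a timezone-aware `created_at` timestamp, which is a hypothetical field name; with a cloud bucket, a storage lifecycle rule would typically do the same job.

```python
from datetime import datetime, timedelta, timezone


def purge_outdated(records, retention_days, now=None):
    """Keep only records whose 'created_at' falls within the retention period."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [r for r in records if r["created_at"] >= cutoff]
```

The retention period is a plain parameter here so that each dataset can use the value agreed for it.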
Data Encryption
• All data should be encrypted at rest and in transit
– Bucket-level encryption
– SSL/TLS connections
– Audit logs
Rights to Access and Erasure
• If the user and analytical databases are separated
– Just dump/delete the related records in the user database
• Otherwise
– It’s a big project…
The Design of User Database
• Dumping or deleting data from the user database is challenging
– Try not to put historical data in the user database (if you can)
– Try to concentrate personal data in a few tables
– Use a foreign key or a similar concept for storing “key information”
• Then erasure only requires marking the record in the main table as “removed”
– Consider the data export and deletion processes at the design phase
• Minimize the number of actions to take
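A small sketch of this design using an in-memory SQLite database: personal data sits in one `users` table and other tables refer to it only by foreign key. The table and column names are hypothetical, and blanking the personal columns in addition to setting the `removed` flag is our assumption about what erasure should do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        email   TEXT,
        name    TEXT,
        removed INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE purchases (
        purchase_id INTEGER PRIMARY KEY,
        user_id     INTEGER NOT NULL REFERENCES users(user_id),
        product     TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO users (user_id, email, name) VALUES (1, 'a@example.com', 'Alice')")
conn.execute("INSERT INTO purchases (user_id, product) VALUES (1, 'edition-2019')")


def export_user(conn, user_id):
    """Right to access: personal data lives in one table, so one query dumps it."""
    row = conn.execute(
        "SELECT email, name FROM users WHERE user_id = ? AND removed = 0", (user_id,)
    ).fetchone()
    return None if row is None else {"email": row[0], "name": row[1]}


def erase_user(conn, user_id):
    """Right to erasure: blank the personal columns and mark the row as removed."""
    conn.execute(
        "UPDATE users SET email = NULL, name = NULL, removed = 1 WHERE user_id = ?",
        (user_id,),
    )
    conn.commit()
```

Because `purchases` holds only the `user_id` key, erasure touches a single row and the purchase history no longer points to any personal data.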
Data Abuse Prevention
• Fulfilling marketing’s requirements
– Sometimes you have to associate user data with analytical data
• To send promotional e-mails to inactive users
• To offer active users a discount when they purchase a new edition
– Do the association at the last step
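"Association at the last step" might be sketched as two separate functions: the analysis works on anonymous tokens only, and a final step, run where user data is allowed, turns the selected tokens into e-mail addresses. The function names and the dictionary shapes are hypothetical.

```python
def inactive_tokens(last_seen_by_token, cutoff_day):
    """Analytical side: work only with anonymous tokens."""
    return sorted(t for t, day in last_seen_by_token.items() if day < cutoff_day)


def resolve_recipients(tokens, token_to_email):
    """Final step, run in the isolated user-data environment: tokens to e-mails."""
    return [token_to_email[t] for t in tokens if t in token_to_email]
```

Keeping the two functions in different environments means the mapping table is only touched once, for the short list of tokens that marketing actually needs.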
Role & Responsibility
• There MUST be a Data Protection Officer (DPO) in each company
– Organize a task force to review incoming inquiries
– Audit data usage
• Audit log parser for monitoring data access
– Monitor for data breaches
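As a rough idea of what an audit log parser could check, the sketch below flags any principal that touched both the user dataset and the analytical dataset, which the separation rule is meant to prevent. The four-column log format (`timestamp principal dataset action`) and the dataset labels are assumptions; real audit logs from a cloud provider look different.

```python
from collections import defaultdict


def principals_touching_both(log_lines):
    """Return principals that accessed both the user and the analytical dataset."""
    seen = defaultdict(set)
    for line in log_lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip malformed lines
        _timestamp, principal, dataset, _action = parts
        seen[principal].add(dataset)
    return sorted(p for p, datasets in seen.items() if {"user", "analytical"} <= datasets)
```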
Summary
• Recommended practices for engineers
– Good communication with Legal
• Documentation
– Separate user data and analytical data
• De-identify all analytical data
• Permission control
• Data retention period
Q & A
Automated hybrid cloud workload protection via calls to
Trend Micro APIs. Created with real data by Trend Micro
threat researcher and artist Jindrich Karasek.
