1© 2018 All rights reserved.
Distributed Database
Architecture for GDPR
Karthik Ranganathan
PostgresConf Silicon Valley
Oct 15, 2018
About Us
Founders
Kannan Muthukkaruppan, CEO
Nutanix ♩ Facebook ♩ Oracle
IIT-Madras, University of California-Berkeley
Karthik Ranganathan, CTO
Nutanix ♩ Facebook ♩ Microsoft
IIT-Madras, University of Texas-Austin
Mikhail Bautin, Software Architect
ClearStory Data ♩ Facebook ♩ D.E.Shaw
Nizhny Novgorod State University, Stony Brook
‱ Founded Feb 2016
‱ Apache HBase committers and early engineers on Apache Cassandra
‱ Built Facebook’s NoSQL platform powered by Apache HBase
‱ Scaled the platform to serve many mission-critical use cases
  ‱ Facebook Messages (Messenger)
  ‱ Operational Data Store (time series data)
‱ Reassembled the same Facebook team at YugaByte along with engineers from Oracle, Google, Nutanix and LinkedIn
WHAT IS YUGABYTE DB?
A transactional, planet-scale database
for building high-performance cloud services.
NoSQL + SQL Cloud Native
Design Principles
TRANSACTIONAL
‱ Single Shard & Distributed ACID Txns
‱ Document-Based, Strongly Consistent Storage
HIGH PERFORMANCE
‱ Low Latency, Tunable Reads
‱ High Throughput
PLANET-SCALE
‱ Auto Sharding & Rebalancing
‱ Global Data Distribution
CLOUD NATIVE
‱ Built For The Container Era
‱ Self-Healing, Fault-Tolerant
OPEN SOURCE
‱ Apache 2.0
‱ Popular APIs Extended: Apache Cassandra, Redis and PostgreSQL (BETA)
WHAT IS GDPR?
GDPR: General Data Protection Regulation
Citizens of the EU can control how businesses share and protect their personal data.
Personal Data, also called
PII (Personally Identifiable Information)
‱ User name
‱ Email address
‱ Date of birth
‱ Bank details
‱ Location details
‱ Computer IP address
Control over personal data
Database concerns
‱ Consent & data location
‱ Data privacy and safety
‱ Right to be forgotten
‱ Data access on demand
Application concerns
‱ Notify on data breach
‱ Data portability
‱ Ability to fix errors in data
‱ Restrict processing
#1 USER CONSENT AND DATA LOCATION
Data must be stored in the EU by default. Businesses need explicit user consent to move it outside.
Why is this hard?
‱ EU user data lives in that region
‱ Other countries have their own compliance regulations – more geos
‱ Public clouds may not have coverage – hybrid deployments
‱ Architecture depends on the data – a single service may need more than one
Think Global Deployments first!
Example – online ecommerce site
‱ Products table needs global replication – not PII data
[Diagram: Global Replication with YugaByte DB. Labels: Non-PII Data, Global Replication, Read Replicas]
Example – online ecommerce site
‱ Users, orders and shipments need locality – PII data
‱ Product locations table needs scale – may be PII
[Diagram: Geo-Partitioning with YugaByte DB. Labels: Primary Data in EU (PII Data), Non-EU Data, Non-EU Data]
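Geo-partitioning can be pictured as a routing decision made per row: the user's country determines which region holds the primary copy. The following is a minimal sketch under stated assumptions, not YugaByte DB's API. The region names, the `EU_COUNTRIES` subset, and the `home_region` and `partition_key` helpers are all hypothetical.

```python
# Hypothetical sketch: choose the region that holds a user's primary data.
# Country codes and region names are illustrative, not a YugaByte DB API.

EU_COUNTRIES = {"DE", "FR", "IE", "NL", "ES", "IT"}  # illustrative subset only

REGION_BY_COUNTRY = {"US": "us-west", "IN": "ap-south"}  # assumed non-EU regions
DEFAULT_REGION = "us-west"


def home_region(country_code: str) -> str:
    """Return the region where PII rows for this user should live."""
    code = country_code.upper()
    if code in EU_COUNTRIES:
        return "eu-west"  # EU user data stays in the EU by default
    return REGION_BY_COUNTRY.get(code, DEFAULT_REGION)


def partition_key(country_code: str, user_id: str) -> tuple[str, str]:
    """Prefix the row key with the region so rows can be placed by region."""
    return (home_region(country_code), user_id)
```

Putting the region first in the key is one possible design: it lets a placement policy pin every row with a given prefix to nodes in that region.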
Replicate data on demand to other geos
‱ User may be OK with replicating their data
‱ Read replicas on demand (for remote, low-latency reads)
‱ Change data capture (for analytics)
[Diagram: Read Replicas with YugaByte DB. Labels: Primary Data in EU (PII Data), Read Replicas]
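Replicating on demand implies a per-user check before any data leaves its home region. Here is a rough sketch of that check, assuming consent is recorded as a set of regions the user has explicitly opted into. The `UserConsent` record and `replica_regions` function are hypothetical names for illustration.

```python
# Hypothetical sketch: decide where a user's data may be replicated.
from dataclasses import dataclass, field


@dataclass
class UserConsent:
    user_id: str
    home_region: str = "eu-west"  # primary copy stays here
    allowed_regions: set[str] = field(default_factory=set)  # explicit opt-ins


def replica_regions(consent: UserConsent, candidate_regions: list[str]) -> list[str]:
    """Return the candidate regions that may hold a read replica of this user's data."""
    return [
        region
        for region in candidate_regions
        if region != consent.home_region and region in consent.allowed_regions
    ]
```

With no recorded opt-ins the function returns an empty list, so the default is that the data stays in its home region.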
#2 DATA PRIVACY AND SAFETY
Data must be secured by default using best practices. Users must be notified of a breach.
Implement end-to-end encryption on day #1
Encrypt All Network Communication
‱ Use TLS Encryption
  ‱ Between client and server for app interaction
  ‱ Between database servers for replication
[Diagram: TLS Encryption. Labels: User, Database Cluster, Server to server communication]
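On the client side, TLS usually comes down to building a context that verifies the server certificate and refuses old protocol versions. Below is a minimal sketch using Python's standard `ssl` module. The `make_client_tls_context` helper is a hypothetical name, and how the context is handed to a database driver depends on the driver and is not shown.

```python
import ssl
from typing import Optional


def make_client_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the server certificate."""
    # create_default_context enables certificate verification and hostname checks.
    context = ssl.create_default_context(cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

Passing `ca_file` is useful when the database cluster uses certificates issued by a private certificate authority. Without it, the system trust store is used.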
Encrypt All Storage
‱ Encryption at rest
‱ Integrate with external Key Management Systems
‱ Ability to rotate keys on demand
Have a key-value table that maps each id to a cipher key. Encrypt PII data with the cipher key for fine-grained control. More in the next section.
[Diagram: Encryption at Rest. Labels: User, Database Cluster, Encryption on disk, Key Management Service]
#3 RIGHT TO BE FORGOTTEN
Data must be erased on explicit request or when it is no longer relevant to its original purpose.
Use Encryption of Data Attributes
‱ Have a key-value table that maps each id to a cipher key
‱ Encrypt PII data with the cipher key on write
‱ Decrypt PII data on access
‱ Delete cipher key to forget PII data
Example - Storing User Profile Data
SET email=foo@bar.com FOR USER ID=XXX
Get encryption key for user → Encrypt PII data → Store encrypted data
SET email=ENCRYPTED FOR USER ID=XXX
‱ Reads require decryption
‱ Data not accessible without key
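The write, read and forget paths for per-user cipher keys can be sketched in a few functions. This is an illustration under loud assumptions. The two dictionaries stand in for the key table and the profile table, all function names are hypothetical, and the cipher is a toy SHA-256 keystream chosen only to keep the example dependency-free. A real system should use a vetted authenticated cipher such as AES-GCM from an established library.

```python
# Hypothetical sketch of per-user encryption keys ("crypto-shredding").
# The cipher below is a toy SHA-256 keystream used only to keep the example
# dependency-free; a real system should use a vetted AEAD such as AES-GCM.
import hashlib
import secrets

user_keys: dict[str, bytes] = {}   # stands in for the id -> cipher key table
profiles: dict[str, bytes] = {}    # stands in for the user profile table


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter-mode keystream (toy cipher, see above)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def set_email(user_id: str, email: str) -> None:
    """Encrypt the email with the user's key on write."""
    key = user_keys.setdefault(user_id, secrets.token_bytes(32))
    nonce = secrets.token_bytes(16)
    profiles[user_id] = nonce + _keystream_xor(key, nonce, email.encode("utf-8"))


def get_email(user_id: str) -> str:
    """Decrypt on access; raises KeyError once the user's key is deleted."""
    key = user_keys[user_id]
    blob = profiles[user_id]
    return _keystream_xor(key, blob[:16], blob[16:]).decode("utf-8")


def forget_user(user_id: str) -> None:
    """Delete the cipher key so the stored ciphertext can no longer be read."""
    user_keys.pop(user_id, None)
```

After `forget_user` runs, the ciphertext may still exist in the profile table, in replicas and in backups, but nothing can decrypt it. That is why deleting one small key is enough to forget the data everywhere it was copied.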
Use Anonymization of Data Attributes
‱ In many cases the actual value is not needed
‱ Anonymize PII data with one-way hash functions
‱ Use hashed ids in the data warehouse
‱ There is no PII data if hashed ids are used!
Example – Website Analytics
USER=foo@bar.com CHECKED OUT PRODUCT=X, CATEGORY=Gadget
One-way hash user id → Analytics
USER=HASHED_VAL CHECKED OUT PRODUCT=X, CATEGORY=Gadget
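The hashing step can be sketched as follows. One deliberate substitution: instead of a plain one-way hash, this sketch uses a keyed hash (HMAC-SHA256), because a plain hash of an email address can be reversed by hashing candidate addresses and comparing. The function names, the event layout, and the way the secret key is supplied are assumptions for illustration.

```python
import hashlib
import hmac


def anonymize_user_id(user_id: str, secret_key: bytes) -> str:
    """Return a one-way, keyed hash of the user id (HMAC-SHA256, hex-encoded)."""
    normalized = user_id.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


def anonymize_event(event: dict, secret_key: bytes) -> dict:
    """Copy an analytics event with the user field replaced by its hash."""
    hashed = dict(event)
    hashed["user"] = anonymize_user_id(event["user"], secret_key)
    return hashed
```

The same user always maps to the same hashed value under a given key, so per-user counts and joins in the warehouse still work. The secret key should be kept outside the warehouse.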
Example – Website Analytics
‱ User no longer identifiable
‱ Hashed data still useful!
#4 DATA ACCESS ON DEMAND
Ability to inform a user about what data is being used,
for what purpose and where it is stored.
Tag Tables and Columns with PII
‱ Store the tags in a separate information architecture table
‱ Make tagging a part of the process
‱ Makes it easy to find, on demand, what PII data is stored
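A tag catalog of this kind might look like the sketch below. The `ColumnTag` fields, the sample entries and the `pii_report` function are all hypothetical. The point is only that once each column records whether it is PII, why it is collected and where it is stored, answering an access request becomes a lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnTag:
    table: str
    column: str
    is_pii: bool
    purpose: str   # why the data is collected
    region: str    # where the data is stored


# Hypothetical catalog entries for the ecommerce example.
CATALOG = [
    ColumnTag("users", "email", True, "account login and order notifications", "eu-west"),
    ColumnTag("users", "date_of_birth", True, "age verification", "eu-west"),
    ColumnTag("products", "name", False, "product listing", "global"),
]


def pii_report(catalog: list[ColumnTag]) -> list[str]:
    """List every tagged PII column with its purpose and storage region."""
    return [
        f"{tag.table}.{tag.column}: {tag.purpose} (stored in {tag.region})"
        for tag in catalog
        if tag.is_pii
    ]
```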
Run Continuous Compliance Checks
‱ Ensure PII columns are encrypted
‱ Ensure non-PII columns do not have sensitive data
‱ Use Spark/Presto to run the scan periodically
‱ Run the scan on a read replica to avoid impacting production
[Diagram: Tag PII Columns. Labels: Ensure PII columns are encrypted, Ensure no PII data in other columns]
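The two checks can be expressed as a scan over rows. This sketch uses plain Python in place of Spark or Presto to stay self-contained. It assumes encrypted values carry a hypothetical `enc:` prefix and uses an email pattern as the only example of sensitive data. A real scan would cover more patterns and would run against a read replica.

```python
import re

EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
ENCRYPTED_PREFIX = "enc:"  # hypothetical marker written by the encryption layer


def scan_rows(rows: list[dict], pii_columns: set[str]) -> list[str]:
    """Return human-readable violations found in the given rows."""
    violations = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            text = str(value)
            if column in pii_columns:
                # Tagged PII columns must hold encrypted values only.
                if not text.startswith(ENCRYPTED_PREFIX):
                    violations.append(f"row {index}: PII column '{column}' is not encrypted")
            elif EMAIL_PATTERN.search(text):
                # Untagged columns must not leak PII such as email addresses.
                violations.append(f"row {index}: column '{column}' contains an email address")
    return violations
```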
PUTTING IT ALL TOGETHER
GDPR Reference Architecture
‱ Primary Cluster (in EU), encrypted
‱ Read Replica Clusters (anywhere in the world), encrypted
‱ Encrypted async replication
‱ App clients: reads & writes, encrypted
‱ Analytics clients: read only, encrypted
‱ At-rest encryption for all nodes
‱ PII columns encrypted with cipher key
‱ Tag PII columns
‱ Ensure PII columns are encrypted
‱ Ensure no PII data in other columns
Questions?
Try it at
docs.yugabyte.com/latest/quick-start
