1
Privacy Protected Data Management
with Kafka and Elasticsearch
Naveen Nandan <naveen.nandan@confluent.io>
Solutions Engineering @ Confluent
Singapore
Apr 11, 2025
2
Regulated industries typically rely on techniques such as
encryption, masking, and tokenization to ensure that customer PII and
other sensitive information are classified and protected as data
moves across multiple systems and lines of business (LoBs). In this talk, let's explore
how some of these methods can be applied early, at ingestion,
to make it easier for teams to manage and govern datasets as they
flow through multiple systems within and outside their
organisation.
3
Privacy Protected Data Management - What? Why? How?
4
Authentication
5
Authorization
6
Masking
Advantages:
- O(1) cost per value: the masked value is computed directly, with no key or lookup table
- Can be applied on the producer side before data is sent to Kafka, in flight with a Kafka Connect Single Message Transform (SMT), or in real time by stream processing data already stored in Kafka topics with ksqlDB/Flink SQL
- Useful when some fields of a payload must be obfuscated before the data is shared with another party
Disadvantages:
- Information loss: masked values cannot be transformed back to the original
- Not suitable when aggregations or reports need to be built on the masked fields
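As a minimal sketch, masking can be a pure function applied to a record before it is produced. This example assumes string-valued fields and a "keep the last four characters" policy. Both assumptions are illustrative and do not come from the talk. `mask_value` and `mask_fields` are hypothetical helpers, not the Kafka Connect MaskField SMT. The same logic could live in a producer, a custom SMT, or a ksqlDB/Flink SQL function.

```python
from typing import Any, Iterable


def mask_value(value: str, keep_last: int = 4, mask_char: str = "*") -> str:
    """Replace all but the last `keep_last` characters with `mask_char`.

    Values no longer than `keep_last` are masked completely so nothing leaks.
    The original characters are discarded, so the result cannot be reversed.
    """
    if keep_last <= 0 or len(value) <= keep_last:
        return mask_char * len(value)
    return mask_char * (len(value) - keep_last) + value[-keep_last:]


def mask_fields(
    record: dict[str, Any], fields: Iterable[str], keep_last: int = 4
) -> dict[str, Any]:
    """Return a copy of `record` with the named string fields masked."""
    masked = dict(record)
    for name in fields:
        value = masked.get(name)
        if isinstance(value, str):  # skip missing and non-string fields
            masked[name] = mask_value(value, keep_last)
    return masked


if __name__ == "__main__":
    # Hypothetical record; the field names are illustrative only.
    record = {"id": 1, "name": "Alice Tan", "card": "4111111111111111"}
    print(mask_fields(record, ["card"]))
    print(mask_fields(record, ["name"], keep_last=0))
```

Because the masked characters are discarded, two different card numbers that end in the same four digits produce the same masked string. This is one reason masked fields are a poor basis for aggregations.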
7
Tokenization
Advantages:
- Tokenization maps each value in a specific set of sensitive values to a substitute value (a token)
- Useful when authorized consumers need to recover the original value
- Useful if aggregations or reports need to be built on tokenized fields, even by users who cannot recover the original value
Disadvantages:
- Needs an external lookup store to maintain the value-to-token mapping, which can grow over time
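The sketch below illustrates the trade-offs listed above. An in-memory dictionary stands in for the external lookup store. `TokenVault` is a hypothetical class, and the random `tok_` token format is an assumption.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault standing in for an external lookup store.

    Each distinct value gets one random token, so equal values always map to
    equal tokens (group-bys still work on tokens), while the token itself
    carries no information about the value. Only code with access to the
    vault can map a token back to the original value.
    """

    def __init__(self) -> None:
        self._value_to_token: dict[str, str] = {}
        self._token_to_value: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        token = self._value_to_token.get(value)
        if token is None:
            # Draw random tokens until an unused one is found (collisions are
            # astronomically unlikely with 64 random bits, but cheap to rule out).
            token = "tok_" + secrets.token_hex(8)
            while token in self._token_to_value:
                token = "tok_" + secrets.token_hex(8)
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # A real deployment would restrict this call to authorized consumers;
        # here it simply raises KeyError for unknown tokens.
        return self._token_to_value[token]

    def __len__(self) -> int:
        # The mapping grows with every new distinct value that is tokenized.
        return len(self._value_to_token)


if __name__ == "__main__":
    vault = TokenVault()
    emails = ["alice@example.com", "bob@example.com", "alice@example.com"]
    tokens = [vault.tokenize(e) for e in emails]
    print(tokens[0] == tokens[2], len(vault), vault.detokenize(tokens[1]))
```

Because the tokens are random rather than derived from the value (for example, by hashing), nobody can reverse a token without the vault. The cost is that every mapping has to be stored.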
8
Encryption
Advantages:
- Encryption uses keys to transform specific values in the payload into ciphertext
- Can be applied at the field level or to the entire payload
- Additional protection: only a consumer that is authorized and holds the relevant keys can decrypt and deserialize the messages to read the original values
- Higher level of data protection
Disadvantages:
- Encryption/decryption may add some latency
9
Encryption at Rest
10
Encryption in Motion
11
Server Side Encryption
Producer: payload (unencrypted)
--> Broker: message received (unencrypted)
--> Broker disk: message stored (encrypted)
--> Consumer: message (unencrypted)
12
In-Transit Encryption
13
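Client Side Encryption
Components: Producer (Confluent serializer), Schema Registry, KMS, Consumer (Confluent deserializer)
1. Schema & policy defined in Schema Registry
2. Producer gets schema & rules from Schema Registry
3. Schema Registry returns schema & rules & encrypted data encryption key (DEK)
4. Producer uses master key(s) in KMS to decrypt the DEK
5. Producer encrypts fields
6. Producer produces message with encrypted fields
7. Consumer gets schema & rules from Schema Registry
8. Schema Registry returns schema & rules & encrypted DEK
9. Consumer consumes message with encrypted fields
10. Consumer uses master key(s) in KMS to decrypt the DEK
11. Consumer decrypts fields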
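The eleven steps follow the envelope pattern. A master key (KEK) held in the KMS only ever encrypts and decrypts the data encryption key (DEK), and the DEK encrypts the tagged fields. The Python sketch below walks through that pattern under several assumptions:

- `MockKMS`, `MockRegistry`, `produce`, and `consume` are hypothetical stand-ins, not Confluent's serializer or Schema Registry APIs.
- The HMAC-based `seal`/`unseal` pair is a toy substitute for AES-GCM, used only so that the example runs on the standard library.
- For simplicity, the DEK is created when the policy is defined.

```python
import base64
import hashlib
import hmac
import os

# --- Toy authenticated cipher: HMAC-SHA256 keystream + encrypt-then-MAC. ---
# It only stands in for AES-GCM so this sketch runs on the standard library.
# Do NOT use it to protect real data.


def _subkeys(key: bytes) -> tuple[bytes, bytes]:
    # Derive separate encryption and MAC keys from one key.
    enc_key = hmac.new(key, b"enc", hashlib.sha256).digest()
    mac_key = hmac.new(key, b"mac", hashlib.sha256).digest()
    return enc_key, mac_key


def _xor_keystream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Counter-mode keystream: HMAC(key, nonce || counter), 32 bytes per block.
    stream = b"".join(
        hmac.new(key, nonce + i.to_bytes(4, "big"), hashlib.sha256).digest()
        for i in range(len(data) // 32 + 1)
    )
    return bytes(d ^ s for d, s in zip(data, stream))


def seal(key: bytes, plaintext: bytes) -> bytes:
    enc_key, mac_key = _subkeys(key)
    nonce = os.urandom(16)  # random nonce: output differs on every call
    ct = _xor_keystream(enc_key, nonce, plaintext)
    return nonce + ct + hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()


def unseal(key: bytes, blob: bytes) -> bytes:
    enc_key, mac_key = _subkeys(key)
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ciphertext failed authentication")
    return _xor_keystream(enc_key, nonce, ct)


# --- Envelope encryption flow (hypothetical stand-ins, not Confluent APIs). ---
class MockKMS:
    """Holds the master key (KEK); it only ever wraps and unwraps DEKs."""

    def __init__(self) -> None:
        self._kek = os.urandom(32)

    def wrap(self, dek: bytes) -> bytes:
        return seal(self._kek, dek)

    def unwrap(self, encrypted_dek: bytes) -> bytes:
        return unseal(self._kek, encrypted_dek)


class MockRegistry:
    """Holds the rule (names of PII fields) and the encrypted DEK."""

    def __init__(self, kms: MockKMS, pii_fields: set[str]) -> None:
        # Step 1 (simplified): the policy is defined and a DEK is created,
        # but only its KEK-encrypted form is stored.
        self.pii_fields = pii_fields
        self.encrypted_dek = kms.wrap(os.urandom(32))


def produce(record: dict, registry: MockRegistry, kms: MockKMS) -> dict:
    dek = kms.unwrap(registry.encrypted_dek)  # steps 2-4
    message = dict(record)
    for name in registry.pii_fields.intersection(message):  # step 5
        message[name] = base64.b64encode(seal(dek, str(message[name]).encode())).decode()
    return message  # step 6: what would be written to the Kafka topic


def consume(message: dict, registry: MockRegistry, kms: MockKMS) -> dict:
    dek = kms.unwrap(registry.encrypted_dek)  # steps 7, 8 and 10
    record = dict(message)
    for name in registry.pii_fields.intersection(record):  # step 11
        record[name] = unseal(dek, base64.b64decode(record[name])).decode()
    return record
```

With `registry = MockRegistry(kms, {"name"})`, `produce({"id": 1, "value": 10, "name": "abc"}, registry, kms)` leaves `id` and `value` readable and replaces `name` with base64 ciphertext. Only code that can reach the KMS can turn that back into `"abc"` via `consume`.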
14
Client Side Encryption - Symmetric Encryption
15
Client Side Encryption - Asymmetric Encryption
16
Client Side Encryption - Envelope Encryption
17
Client Side Encryption - Full Payload
"fields": [
  {
    "default": null,
    "name": "id",
    "type": ["null", "int"]
  },
  {
    "default": null,
    "name": "value",
    "type": ["null", "int"]
  },
  {
    "confluent:tags": ["PII"],
    "default": null,
    "name": "name",
    "type": ["null", "string"]
  }
]
Encrypted message (entire payload): "A@#%GW#@$H@#$@#SRSH129DG#wsdfe@"
18
Client Side Encryption - Field Level
"fields": [
  {
    "default": null,
    "name": "id",
    "type": ["null", "int"]
  },
  {
    "default": null,
    "name": "value",
    "type": ["null", "int"]
  },
  {
    "confluent:tags": ["PII"],
    "default": null,
    "name": "name",
    "type": ["null", "string"]
  }
]
Encrypted message (only the PII-tagged field): {"ID": 1, "VALUE": 10, "NAME": "#()2323ahuf"}
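The full-payload and field-level slides share the same schema and differ only in what gets encrypted. A minimal sketch of both modes follows, assuming the cipher is passed in as a byte-level function (for example, AES-GCM bound to a DEK from a vetted library). `pii_fields`, `encrypt_field_level`, and `encrypt_full_payload` are hypothetical helpers, not Confluent serializer APIs. Record keys are lowercase to match the schema; the slide's output shows uppercase column names.

```python
import base64
import json
from typing import Any, Callable

# Any byte-level cipher can be plugged in, e.g. AES-GCM bound to a DEK.
Encrypt = Callable[[bytes], bytes]

# The "fields" list from the schema above, as Python data.
SCHEMA_FIELDS: list[dict[str, Any]] = [
    {"default": None, "name": "id", "type": ["null", "int"]},
    {"default": None, "name": "value", "type": ["null", "int"]},
    {"confluent:tags": ["PII"], "default": None, "name": "name", "type": ["null", "string"]},
]


def pii_fields(fields: list[dict[str, Any]]) -> set[str]:
    """Names of the fields whose "confluent:tags" list contains "PII"."""
    return {f["name"] for f in fields if "PII" in f.get("confluent:tags", [])}


def _encrypt_b64(encrypt: Encrypt, value: Any) -> str:
    # JSON-encode first so the original type survives a round trip.
    return base64.b64encode(encrypt(json.dumps(value).encode())).decode()


def encrypt_field_level(
    record: dict[str, Any], fields: list[dict[str, Any]], encrypt: Encrypt
) -> dict[str, Any]:
    """Encrypt only the PII-tagged fields; the other fields stay readable."""
    tagged = pii_fields(fields)
    return {
        k: _encrypt_b64(encrypt, v) if k in tagged and v is not None else v
        for k, v in record.items()
    }


def encrypt_full_payload(record: dict[str, Any], encrypt: Encrypt) -> str:
    """Encrypt the whole serialized record; no field is readable without the key."""
    return _encrypt_b64(encrypt, record)
```

Field-level encryption keeps `id` and `value` usable for routing, filtering, and indexing downstream. Full-payload encryption hides everything, including which fields exist.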
19
Client Side Encryption - Deterministic
{"ID": 1, "VALUE": 10, "NAME": "#()2323ahuf"}
{"ID": 6, "VALUE": 60, "NAME": "#()2323ahuf"} --> DETERMINISTIC (the same value 'abc' produces the same encrypted string)
{"ID": 7, "VALUE": 70, "NAME": "#()2323ahuf"} --> DETERMINISTIC (the same value 'abc' produces the same encrypted string)
20
Client Side Encryption - Non-Deterministic
{"ID": 1, "VALUE": 10, "NAME": "#()2323ahuf"}
{"ID": 6, "VALUE": 60, "NAME": "#()2323ahuf"} --> DETERMINISTIC (the same value 'abc' produces the same encrypted string)
{"ID": 7, "VALUE": 70, "NAME": "$%FS%Ggg88j"} --> NON-DETERMINISTIC (the same value 'abc' produces a different encrypted string)
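The difference between deterministic and non-deterministic encryption comes down to how the IV is chosen:

- **Deterministic:** derive a synthetic IV from the plaintext (the idea behind AES-SIV), so the same value always encrypts to the same string.
- **Non-deterministic:** draw a fresh random IV (as with AES-GCM), so the same value encrypts differently each time.

The sketch below shows both modes. It is a toy HMAC-SHA256 construction that stands in for those algorithms only so that it runs on the standard library. The `encrypt`/`decrypt` names and the byte layout are illustrative assumptions.

```python
import hashlib
import hmac
import os

# Toy construction on HMAC-SHA256; illustration only, not for real data.


def _prf(key: bytes, label: bytes, data: bytes) -> bytes:
    # HMAC-SHA256 with a fixed 3-byte label for domain separation.
    return hmac.new(key, label + data, hashlib.sha256).digest()


def _xor_stream(key: bytes, iv: bytes, data: bytes) -> bytes:
    # Counter-mode keystream: PRF(iv || counter), 32 bytes per block.
    stream = b"".join(
        _prf(key, b"ctr", iv + i.to_bytes(4, "big")) for i in range(len(data) // 32 + 1)
    )
    return bytes(d ^ s for d, s in zip(data, stream))


def encrypt(key: bytes, plaintext: bytes, deterministic: bool) -> bytes:
    if deterministic:
        # Synthetic IV computed from the plaintext: the same plaintext under
        # the same key always yields the same ciphertext.
        iv = _prf(key, b"siv", plaintext)[:16]
    else:
        # Fresh random IV: the same plaintext yields a different ciphertext each time.
        iv = os.urandom(16)
    ct = _xor_stream(key, iv, plaintext)
    return iv + ct + _prf(key, b"mac", iv + ct)  # encrypt-then-MAC tag


def decrypt(key: bytes, blob: bytes) -> bytes:
    iv, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, _prf(key, b"mac", iv + ct)):
        raise ValueError("ciphertext failed authentication")
    return _xor_stream(key, iv, ct)


if __name__ == "__main__":
    key = os.urandom(32)
    for mode in (True, False):
        a, b = encrypt(key, b"abc", mode), encrypt(key, b"abc", mode)
        print("deterministic" if mode else "non-deterministic", a == b)
```

The trade-off is that deterministic ciphertexts reveal which records share a value. That leak is also what makes grouping on the encrypted field possible.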
21
Payload/Field Level Encryption for Search
22
Client Side Field Level Encryption (CSFLE) in Action with Kafka and Elasticsearch
23
CSFLE for Anonymised Stats/Aggregates

Can aggregate on the encrypted field when using deterministic encryption methods

ID  NAME      REGION  VALUE
C1  %sdf@121  APAC      500
C2  !@#Dadf   AMER     1250
C3  AG@32199  EU       2000
C4  G23g!@4   APAC      600
C1  %sdf@121  APAC     5050

Sum of VALUE by REGION:
REGION  VALUE
APAC     6150
AMER     1250
EU       2000

Sum of VALUE by ID:
ID  VALUE
C1   5550
C2   1250
C3   2000
C4    600

Sum of VALUE by NAME (encrypted):
NAME      VALUE
%sdf@121   5550
!@#Dadf    1250
AG@32199   2000
G23g!@4     600

Information loss when using non-deterministic encryption methods (C1's two rows get different ciphertexts, so its total is split)

ID  NAME        REGION  VALUE
C1  %sdf@121    APAC      500
C2  !@#Dadf     AMER     1250
C3  AG@32199    EU       2000
C4  G23g!@4     APAC      600
C1  54S35S6#12  APAC     5050

Sum of VALUE by NAME (encrypted):
NAME        VALUE
%sdf@121      500
!@#Dadf      1250
AG@32199     2000
G23g!@4       600
54S35S6#12   5050
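The tables can be reproduced with a plain group-by over the rows shown on the slide. The NAME values are the slide's ciphertexts, used as opaque strings, and `sum_by` is a hypothetical helper. Grouping by the encrypted NAME merges C1's two rows only in the deterministic case.

```python
from collections import defaultdict

# Rows from the slide; NAME holds the ciphertext of the customer name.
DETERMINISTIC_ROWS = [
    ("C1", "%sdf@121", "APAC", 500),
    ("C2", "!@#Dadf", "AMER", 1250),
    ("C3", "AG@32199", "EU", 2000),
    ("C4", "G23g!@4", "APAC", 600),
    ("C1", "%sdf@121", "APAC", 5050),
]
# Same data, but the non-deterministic scheme gave C1's second row a new ciphertext.
NON_DETERMINISTIC_ROWS = DETERMINISTIC_ROWS[:4] + [("C1", "54S35S6#12", "APAC", 5050)]

COLUMNS = {"ID": 0, "NAME": 1, "REGION": 2}


def sum_by(rows: list[tuple[str, str, str, int]], column: str) -> dict[str, int]:
    """Sum VALUE (the last column) grouped by the given column."""
    totals: dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row[COLUMNS[column]]] += row[3]
    return dict(totals)


if __name__ == "__main__":
    print(sum_by(DETERMINISTIC_ROWS, "REGION"))
    print(sum_by(DETERMINISTIC_ROWS, "NAME"))
    print(sum_by(NON_DETERMINISTIC_ROWS, "NAME"))
```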
24
CSFLE for Privacy Protected AI Applications
Desensitize Customer Specific Info from Context
User prompt (identical in both cases):
I am 32, male and single, and I would like to find out about an
insurance policy where I pay a monthly premium of not
more than $1000. I would like part of it to be invested so
that I can build my savings from the returns.
Context with customer-specific info:
{You have subscribed to the policy XYZ Vantage Achiever
Prime Series with a single premium of $50000.0 covering
$5000000.0, You have subscribed to the policy ABC
LinkGuard with a monthly premium of $1500.0 covering
$1000000.0}
Desensitized context:
{Others in your age group having a similar income profile
have subscribed to the policy XYZ Vantage Achiever
Prime Series with a single premium of $xyz covering
$abc, You have subscribed to the policy ABC LinkGuard
with a monthly premium of $yxz covering $bac}
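The slide replaces customer-specific amounts with placeholders before the context reaches the model. A rough stand-in for that one step is regex-based redaction. This sketch assumes amounts appear as `$` followed by digits, and `desensitize` is a hypothetical helper. Note that this is plain redaction, not CSFLE itself. In the setup described in this talk, the application would presumably receive encrypted values that it cannot decrypt. Rewriting "You have subscribed" into a peer-group statement, as the slide also does, is out of scope here.

```python
import re

# Dollar amounts such as "$50000.0" or "$1000" (assumed format).
AMOUNT = re.compile(r"\$\d[\d,]*(?:\.\d+)?")


def desensitize(context: str, placeholder: str = "$<redacted>") -> str:
    """Replace every dollar amount in the retrieved context with a placeholder."""
    return AMOUNT.sub(placeholder, context)


if __name__ == "__main__":
    context = (
        "You have subscribed to the policy ABC LinkGuard "
        "with a monthly premium of $1500.0 covering $1000000.0"
    )
    print(desensitize(context))
```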
25
Best Practices
26
Get started with Confluent, for free
New signups receive $400 to spend during their first 30 days.
27
cnfl.io/ask-the-community
Ask questions, share knowledge
and chat with your fellow
community members!
meetup.com/Singapore-Kafka-Meetup/
Join your local Kafka User Group!
Learn Apache Kafka®
with Confluent
28
THANK YOU

Elastic Kafka Meetup Singapore: Privacy Protected Data Management
