Five Steps to Secure Big Data
Ulf Mattsson, CTO
Protegrity
ulf.mattsson AT protegrity.com
Ulf Mattsson, CTO Protegrity
20 years with IBM
• Research & Development & Global Services

Inventor
• Encryption, Tokenization & Intrusion Prevention

Involvement
• PCI Security Standards Council (PCI SSC)
• American National Standards Institute (ANSI) X9
• Encryption & Tokenization

• International Federation for Information Processing
• IFIP WG 11.3 Data and Application Security

• ISACA New York Metro chapter

Big Data
What is Big Data?
Hadoop
• Designed to handle the emerging “4 V’s”
• Massively Parallel Processing (MPP)
• Elastic scale
• Usually Read-Only
• Allows for data insights on massive, heterogeneous
data sets
• Includes an ecosystem of components:
Hive

Pig

Other

Application Layers
MapReduce
HDFS
Storage Layers
Physical Storage

Has Your Organization Already Invested in Big Data?

Source: Gartner
http://www.ey.com/Publication/vwLUAssets/EY_-_2013_Global_Information_Security_Survey/$FILE/EY-GISS-Under-cyber-attack.pdf

Holes in Big Data…

Source: Gartner
Many Ways to Hack Big Data
[Diagram: attack paths into the Hadoop stack. Threat sources: Hackers, Privileged Users, Unvetted Applications or Ad Hoc Processes. Components: ETL Tools, BI Reporting, RDBMS, Pig (Data Flow), Hive (SQL), Sqoop, MapReduce (Job Scheduling/Execution System), HBase (Column DB), HDFS (Hadoop Distributed File System), Avro (Serialization), ZooKeeper (Coordination).]
Source: http://nosql.mypopescu.com/post/1473423255/apache-hadoop-and-hbase
Current Data Security for Big Data
Authentication
• Who am I and how do I prove it?
• Ensure the identity of the users, services and hosts that make up and use the system is authoritatively known

Authorization
• What am I allowed to see and do?
• Ensure services and data are accessed only by entitled identities

Data Protection
• How is my data being protected?
• Ensure data cannot be usefully stolen or undetectably tampered with

Auditing
• What have I attempted to do or done?
• Ensure a permanent record of who did what, when
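
The Data Protection goal above, that data cannot be "undetectably tampered with", can be illustrated with a keyed hash. This is a minimal sketch using Python's standard `hmac` module, not a description of any product discussed in the deck; the key, the record layout and the function names are assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real deployment would fetch it
# from a key management system rather than hard-code it.
SECRET_KEY = b"demo-key-not-for-production"


def sign(record: bytes, key: bytes = SECRET_KEY) -> str:
    """Return a hex HMAC-SHA256 tag for the record."""
    return hmac.new(key, record, hashlib.sha256).hexdigest()


def is_untampered(record: bytes, tag: str, key: bytes = SECRET_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(record, key), tag)


record = b"name=Joe Smith;address=100 Main Street"
tag = sign(record)
print(is_untampered(record, tag))                           # True
print(is_untampered(record.replace(b"100", b"999"), tag))   # False
```

Anyone who changes the record without knowing the key cannot produce a matching tag, so the change is detectable. The tag does nothing to hide the data; confidentiality still needs encryption or tokenization.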
Data Security
Taking Data Security to the Next Level
Achieving Best Data Security for Big Data
Massively Scalable Data Security
Maximum Transparency
Maximum Performance
Easy to Use
Heterogeneous System Compatibility
Enterprise Ready
Many Layers of Defense
Corporate Enterprise

Kerberos Authentication
Encrypted Communications

Big Data

Corporate Firewall

Authorization through ACLs

Fine Grained
Big Data Cluster


Data Security Policy

Protegrity

Coarse Grained
Protecting the Big Data Ecosystem
• BI Applications: BI applications are authorized to access sensitive data through the policy.
• Data Access Framework (Pig, Hive): User Defined Functions (UDFs) enable field-level data protection with policy-based access controls and monitoring.
• Data Processing Framework (MapReduce): Java API enables field-level data protection with policy-based access controls and monitoring.
• Data Storage Framework (HDFS): File-level data protection with policy-based access controls for existing and new data.
• OS file system: Volume or file encryption with policy-based access controls at the OS file system level.
Coarse
Grained


Policy Based
File and Disk
Encryption
File Based Encryption Example
Files with personally identifiable information
Stored in Hadoop cluster
Root user logged in to one of the nodes
Search for sensitive information on disk
Fine
Grained


Policy Based
Field Level Data
Protection
Fine Grained Protection: Field Protection

Production Systems

Encryption
• Reversible
• Policy Control (Authorized / Unauthorized Access)
• Lacks Integration Transparency
• Complex Key Management
• Example: !@#$%a^.,mhu7///&*B()_+!@

Tokenization / Pseudonymization
• Reversible
• Policy Control (Authorized / Unauthorized Access)
• Integrates Transparently
• No Complex Key Management
• Business Intelligence Credit Card: 0389 3778 3652 0038

Non-Production Systems

Masking
• Not reversible
• No Policy, Everyone Can Access the Data
• Integrates Transparently
• No Complex Key Management
• Example: 0389 3778 3652 0038
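
The reversible versus not-reversible distinction above can be made concrete with a toy example. The sketch below is an assumption-laden illustration, not the product's algorithm: `mask` simply overwrites digits with `X` (a simpler form of masking than the format-preserving example on the slide), and `TokenVault` is a hypothetical in-memory lookup table that hands out random tokens of the same shape as the input.

```python
import secrets


def mask(card_number: str, visible: int = 4) -> str:
    """Irreversibly replace all but the last `visible` digits with 'X'."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    seen = 0
    out = []
    for ch in card_number:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total_digits - visible else "X")
        else:
            out.append(ch)  # keep separators so the layout is preserved
    return "".join(out)


class TokenVault:
    """Toy in-memory vault: maps each value to a random same-format token."""

    def __init__(self):
        self._to_token = {}
        self._to_value = {}

    def tokenize(self, value: str) -> str:
        if value in self._to_token:
            return self._to_token[value]  # same input always gets same token
        if not any(ch.isdigit() for ch in value):
            raise ValueError("value contains no digits to tokenize")
        while True:
            token = "".join(
                secrets.choice("0123456789") if ch.isdigit() else ch
                for ch in value
            )
            if token != value and token not in self._to_value:
                break
        self._to_token[value] = token
        self._to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        """Reverse lookup; only callers with vault access can do this."""
        return self._to_value[token]


vault = TokenVault()
card = "4111 1111 1111 1111"  # well-known test number, not real data
token = vault.tokenize(card)
print(mask(card))                        # XXXX XXXX XXXX 1111
print(vault.detokenize(token) == card)   # True
```

The token keeps the length and spacing of the original, which is why tokenized fields tend to pass through existing schemas and reports unchanged, while the masked value can never be turned back into the card number.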
Field Level Protection Example
Files with personally identifiable information
Loaded into a Hive table
Select data from that table
Root user logged in to one of the nodes
Search for sensitive information on disk
Security
Policy


Take Control Of Data
Security
Policy Based Access Control

Combination of what data needs to be protected and who has access to that data is the key to creating a meaningful policy

What

What is the sensitive data that needs to be protected. Data Element.

Who

Who should have access to sensitive data and who should not. Security access control. Roles & Members.
Protegrity Data Security Policy

What

What is the sensitive data that needs to be protected. Data
Element.

How

How you want to protect and present sensitive data. There are
several methods for protecting sensitive data. Encryption,
tokenization, monitoring, etc.

Who

Who should have access to sensitive data and who
should not. Security access control. Roles &
Members.

When

When should sensitive data access be granted to those
who have access. Day of week, time of day.

Where

Where is the sensitive data stored? This will be
where the policy is enforced. At the protector.

Audit

Audit authorized or un-authorized access to sensitive
data. Optional audit of protect/unprotect.
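
The six policy dimensions above (What, How, Who, When, Where, Audit) map naturally onto a small record type. The following is a sketch under stated assumptions: the class name, field names, default working hours and the `hive_udf` protector label are all hypothetical and chosen only to mirror the slide, not taken from any real policy format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DataSecurityPolicy:
    what: str                                    # data element to protect
    how: str                                     # protection method
    who: frozenset                               # roles allowed to see clear text
    when_days: frozenset = frozenset(range(5))   # Monday=0 .. Friday=4
    when_hours: tuple = (8, 18)                  # from 08:00 up to, not including, 18:00
    where: str = "hive_udf"                      # hypothetical protector name
    audit: bool = True                           # whether decisions should be logged

    def allows(self, role: str, at: datetime) -> bool:
        """True when the role is entitled and the request falls in the window."""
        start, end = self.when_hours
        return (
            role in self.who
            and at.weekday() in self.when_days
            and start <= at.hour < end
        )


policy = DataSecurityPolicy(
    what="credit_card",
    how="tokenization",
    who=frozenset({"data_scientist"}),
)
print(policy.allows("data_scientist", datetime(2024, 1, 8, 10)))  # Monday 10:00 -> True
print(policy.allows("data_scientist", datetime(2024, 1, 6, 10)))  # Saturday -> False
print(policy.allows("dba", datetime(2024, 1, 8, 10)))             # role not listed -> False
```

Keeping Who and When in the same object means a single check answers both "is this role entitled" and "is this the right time", which is the combination the slide describes.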
Policy Based Field Protection Example
Files with personally identifiable information
Loaded into a Hive table
Create a view on that table
Select data as authorized user
Select data as privileged user
Enterprise Strength

Enterprise


Protection platforms must
protect sensitive data end to
end – at rest, in transit and on
any technology platform
End to End Data Security Across the Enterprise

Enterprise Heterogeneous Coverage
• File Protectors: AIX, HP-UX, Linux, Solaris, Windows
• Database Protectors: DB2, SQL Server, Oracle, Teradata, Informix, Netezza, Greenplum
• Big Data Protectors: BigInsights, Cloudera, Greenplum, MapR, Aster, Apache Hadoop, Hortonworks
• Big Iron Platform: zSeries, HP NonStop

Best Practices for Protecting Big Data
Start Early
Fine Grained protection
Select the optimal protection for the future
Enterprise coverage
Protection against insider threat
Transparent protection to the analysis process
Policy based protection and audit

Five Point Data Protection Methodology

1. Classify
2. Discovery
3. Protect
4. Enforce
5. Monitor
Classify
Determine what data is
sensitive to your organization.

Select US Regulations for Security and Privacy
Financial Services
Healthcare and Pharmaceuticals
Infrastructure and Energy
Federal Government

1. Classify: Examples of Sensitive Data

Sensitive Information             | Compliance Regulation / Laws
Credit Card Numbers               | PCI DSS
Names                             | HIPAA, State Privacy Laws
Address                           | HIPAA, State Privacy Laws
Dates                             | HIPAA, State Privacy Laws
Phone Numbers                     | HIPAA, State Privacy Laws
Personal ID Numbers               | HIPAA, State Privacy Laws
Personally owned property numbers | HIPAA, State Privacy Laws
Personal Characteristics          | HIPAA, State Privacy Laws
Asset Information                 | HIPAA, State Privacy Laws
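
The classification on this slide can be kept as a simple lookup so that later steps can ask which regulations apply to a given field. This is only a sketch: the snake_case keys are hypothetical column names invented for the example, while the regulation pairings are the ones listed on the slide.

```python
# Pairings come from the slide; the key names are assumptions.
HIPAA_AND_STATE = ("HIPAA", "State Privacy Laws")

REGULATIONS = {
    "credit_card_number": ("PCI DSS",),
    "name": HIPAA_AND_STATE,
    "address": HIPAA_AND_STATE,
    "date": HIPAA_AND_STATE,
    "phone_number": HIPAA_AND_STATE,
    "personal_id_number": HIPAA_AND_STATE,
    "property_number": HIPAA_AND_STATE,
    "personal_characteristic": HIPAA_AND_STATE,
    "asset_information": HIPAA_AND_STATE,
}


def classify(columns):
    """Return {column: regulations} for the columns that are sensitive."""
    return {c: REGULATIONS[c] for c in columns if c in REGULATIONS}


print(classify(["name", "order_total", "credit_card_number"]))
```

Columns that are not in the lookup, such as `order_total` here, are treated as non-sensitive and left out of the result.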
Discovery
Discover where the sensitive
data is located and how it flows
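
One common way to discover where card numbers sit is to scan text for digit runs and confirm candidates with the Luhn checksum. The sketch below assumes 16-digit numbers with optional space or hyphen separators and uses a well-known test number; it illustrates the idea and is not the discovery tooling referred to in this deck.

```python
import re

# Assumption: 16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){15}\d\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) > 0 and total % 10 == 0


def find_card_numbers(text: str) -> list:
    """Return substrings that look like card numbers and pass the Luhn check."""
    return [
        m.group() for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group())
    ]


sample = "order 42 paid with 4111 1111 1111 1111; ref 1234 5678 9012 3456"
print(find_card_numbers(sample))  # ['4111 1111 1111 1111']
```

The checksum step filters out digit runs such as reference numbers that merely have the right length, which keeps false positives down when scanning large files.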

2. Discovery in a large enterprise with many systems
[Diagram: many unlabeled systems, with a corporate firewall boundary]
2. Discovery: Determine the context to the Business
[Diagram: systems grouped by business context (Retail, Employees, Corporate IP, Healthcare), with a corporate firewall boundary]
2. Discover: Context to the Business and to Security
[Diagram: Stores & Ecommerce collecting transactions; Databases; File Server; Hadoop; Applications; File Server containing IP; Research Databases; Corporate Firewall. Callout: Data Protection Solution Requirements]
Protect
Protect the sensitive data at
rest and in transit.

Balancing Security and Data Insight
Tug of war between security and data insight
Big Data is designed for access
Privacy regulations require de-identification
Granular data-level protection
Traditional security doesn’t allow for seamless
data use

Protection Beyond Kerberos
[Diagram: Hadoop stack (ETL Tools, BI Reporting, RDBMS, Pig (Data Flow), Hive (SQL), Sqoop, MapReduce (Job Scheduling/Execution System), HBase (Column DB), HDFS (Hadoop Distributed File System)) annotated with protection layers: API enabled field level data protection (shown twice), field level data protection for existing and new data, and Volume Encryption]
Volume Encryption

Entire file is in the
clear when analyzed

MapReduce

HDFS

Protected with
Volume Encryption

File Encryption – Authorized User

Entire file is in the
clear when analyzed

MapReduce

HDFS

Protected with
File Encryption

File Encryption – Non Authorized User

Entire file is unreadable
when analyzed

MapReduce

HDFS

Protected with
File Encryption

Volume Encryption + Gateway Field Protection

Granular Field
Level Protection

MapReduce

HDFS

Data Protection File
Gateway


Kerberos
Access
Control

Protected with
Volume Encryption
Volume Encryption + Internal MapReduce Field Protection

Analytics
Granular Field
Level Protection

MapReduce
Hadoop
Staging

HDFS

MapReduce


Kerberos
Access Control

Protected with
Volume Encryption
Enforce
Policies are used to enforce
rules about how sensitive data
should be treated in the
enterprise.

A Data Security Policy
What

What is the sensitive data that needs to be protected. Data
Element.

How

How you want to protect and present sensitive data. There are
several methods for protecting sensitive data. Encryption,
tokenization, monitoring, etc.

Who

Who should have access to sensitive data and who should not.
Security access control. Roles & Members.

When

When should sensitive data access be granted to those who
have access. Day of week, time of day.

Where

Where is the sensitive data stored? This will be where the policy
is enforced. At the protector.

Audit

Audit authorized or un-authorized access to sensitive data.
Optional audit of protect/unprotect.
Volume Encryption + Field Protection + Policy Enforcement

MapReduce

HDFS
Protected with
Volume Encryption

Data Protection Policy

4. Authorized User Example
Presentation to requestor
Name: Joe Smith
Address: 100 Main Street, Pleasantville, CA

Data Scientist,
Business Analyst

Selected data displayed (least privilege)

Response

Request

Policy
Enforcement

Authorized

Does the requestor have the authority to
access the protected data?

Protection at rest
Name: csu wusoj
Address: 476 srta coetse, cysieondusbak, CA

4. Un-Authorized User Example
Presentation to requestor
Name: csu wusoj
Address: 476 srta coetse, cysieondusbak, CA

Privileged User,
DBA, System
Administrators,
Bad Guy

Response

Request

Policy
Enforcement

Not
Authorized

Does the requestor have the authority to
access the protected data?

Protection at rest
Name: csu wusoj
Address: 476 srta coetse, cysieondusbak, CA
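
The two requestor examples above reduce to one decision at the enforcement point: authorized roles get the clear values, everyone else gets the protected values exactly as stored. The sketch below reuses the sample record from the slides; the role names, the `present` function and the lookup table standing in for the unprotect step are assumptions made for illustration.

```python
# Values as stored at rest (protected form, taken from the slides).
PROTECTED_RECORD = {
    "name": "csu wusoj",
    "address": "476 srta coetse, cysieondusbak, CA",
}

# Hypothetical lookup standing in for the real unprotect operation.
_UNPROTECT = {
    "csu wusoj": "Joe Smith",
    "476 srta coetse, cysieondusbak, CA": "100 Main Street, Pleasantville, CA",
}

# Hypothetical role names for the authorized requestors on the slide.
AUTHORIZED_ROLES = {"data_scientist", "business_analyst"}


def present(record: dict, role: str) -> dict:
    """Return what the requestor is allowed to see."""
    if role not in AUTHORIZED_ROLES:
        return dict(record)  # unchanged: still in protected form
    return {field: _UNPROTECT.get(value, value) for field, value in record.items()}


print(present(PROTECTED_RECORD, "data_scientist")["name"])  # Joe Smith
print(present(PROTECTED_RECORD, "dba")["name"])             # csu wusoj
```

Because the data stays protected at rest and the check happens on every request, a privileged account such as a DBA or system administrator sees only the protected form unless the policy names its role.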

Monitor
A critically important part of a
security solution is the ongoing
monitoring of any activity on
sensitive data.
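
Ongoing monitoring usually starts from a log of access events and a rule for what counts as abnormal. The sketch below is one very simple rule, flagging users whose denied requests exceed a threshold; the `AccessEvent` type, its fields and the threshold of three are assumptions for the example rather than anything prescribed in the deck.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    user: str
    data_element: str
    authorized: bool


def flag_abnormal(events, max_denied: int = 3) -> list:
    """Return users with more than `max_denied` unauthorized attempts."""
    denied = Counter(e.user for e in events if not e.authorized)
    return sorted(user for user, count in denied.items() if count > max_denied)


events = [AccessEvent("dba", "credit_card", False)] * 4 + [
    AccessEvent("analyst", "credit_card", True),
    AccessEvent("analyst", "name", False),
]
print(flag_abnormal(events))  # ['dba']
```

A production rule set would look at more than denial counts, for example time of day or volume of records read, but the shape is the same: record every attempt, then evaluate the record against expectations.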

Best Practices for Protecting Big Data
Start early
Granular protection
Select the optimal protection
Enterprise coverage
Protection against insider threat
Protect highly sensitive data in a way that is mostly
transparent to the analysis process
Policy based protection
Record data access events

How Protegrity Can Help

1. We can help you Classify the sensitive data.
2. We can help you Discover where the sensitive data sits.
3. We can help you Protect your sensitive data in a flexible way.
4. We can help you Enforce policies that enable business functions and keep sensitive data out of the wrong hands.
5. We can help you Monitor sensitive data to gain insights into abnormal behaviors.
Protegrity Summary
Proven enterprise data security software and innovation leader
• Sole focus on the protection of data
• Patented Technology, Continuing to Drive Innovation

Cross-industry applicability
• Retail, Hospitality, Travel and Transportation
• Financial Services, Insurance, Banking
• Healthcare
• Telecommunications, Media and Entertainment
• Manufacturing and Government
Please contact us for more information
Ulf.Mattsson@protegrity.com
Info@protegrity.com
