SlideShare a Scribd company logo
+
Hadoop Security Landscape
Sujee Maniyam
Founder / Principal
http://elephantscale.com/
sujee@ElephantScale.com
+
Approach to Security in Hadoop
Until Recently…
+
But Security Picture Has Improved
Rapidly…
n  Lot of work going on in the eco system
n  Hadoop vendors (Cloudera / HortonWorks ..) have been
very actively working on security features
n  ‘the core’ features are in
n  Ease of use improving as well
+
What Does It Mean to be ‘Secure’?
n  1) Control who can get in?
n  2) Verify the person’s identity
n  3) safeguard communications with user
n  4) What is allowed for this user
n  5) Audit / log access
n  6) Secure NOSQL
n  7) And finally…
Protect data at rest
+
1) Who can get in
n  Control which machines can connect to NoSQL cluster
n  Don’t expose the cluster to public
n  Too many open ports
n  Too vulnerable
n  Solutions:
n  Run cluster behind firewall
n  Restrict which machines
can connect to cluster
n  Linux / Network level security
n  Outside the actual NoSQL
+
Trusted Environment
+
Apache Knox Gateway
+
2) User Authentication
n  How can we verify the user?
n  Username / password (gmail)
n  Or use a third person (referee)
n  Kerberos
Source : http://1.bp.blogspot.com/
Wolf : Knock
Knock…
Wolf : It’s me,
little piggy
Pig :who is it?
+
Kerberos : Quick Primer
n  Kerberos is a authentication protocol for networked
machines
n  Validates client to server and vice-versa
n  Strong crypto algorithms (AES, 3DES…)
+
Kerberos Protocol for Getting a
Beer in a Carnival / Fair J_
+
Kerberos Protocol Explained :
Getting Beer @ Fair / Party
n  Prove your age (identity) to wrist-band issuer
n  Ticket Granting Ticket
n  Get a wristband à qualifies you to get beer
n  Service Ticket
n  Go to bartender and ask for beer using your wrist-band
n  Service Request
n  Get Beer ! J
n  For technically correct explanation see :
http://www.roguelynn.com/words/explain-like-im-5-
kerberos/
+
3) Secure Client Communication
n  Guard client / server communication (‘on the wire’)
n  Done by using SASL (certificates)
n  Prevents snooping by third parties
+
4) What Is Allowed For This User?
n  In unsecured environment users can read / write to any table
n  à not very secure!
n  Control which data users can see..
+
5) Audit logging
n  See what is going on…
USER : tim, resource = hdfs:/data/logs , type = read,
time=….
USER : tim, resource = hive:click_logs , type = read, query =
“select *….”
+
6) Secure NOSQL
n  NoSQL solutions :
n  On Hadoop : Hbase, Accumulo
n  Other : Cassandra, ……
n  Access control
n  Table level access : can I read / can I write-update-insert ?
n  Within a table, column level access
n  Who can read column ‘social_security_number’ ?
+
Accumulo : Quick Intro
n  Developed by the National Security Agency (NSA) !
n  Google Big Table implementation
n  Nosql store on top of HDFS
n  Security is a first grade concept
HDFS
Accumulo
+
Accumulo Data Model
Family : info
Columns à name email Last 4 ssn Ssn Gmail
password
Visibility
tokens à
Level 1 Level 1 Level 1 Level 2
OR
Top
clearance
Top
clearance
•  Every thing in HBase data model
•  Plus each row has a ‘Visibility Token’
+
Users Are Assigned ‘Visibility
Tokens’
User id Visibility levels
User 1 Level 1
User 2 Level 1 + Level 2
Edward Snowden Level 1 + Level 2 + Top
Clearance
+
Accumulo only returns cells visible
to user
family
Columns à name email Last 4 SSN Full SSN Gmail
password
person1 Joe joe@gma
il.com
6789 123-45-67
89
JoeSuper
Man!
Visibility
tokens à
Level 1 Level 1 Level 1 Level 2
OR
Top
clearance
Top
clearance
+
What Users Can See…
User Visibility Privilage Visible Cells
User 1 Level 1 Name
Email
Last 4 ssn
User 2 Level 1 +
Level 2
Name
Email
Last 4 SSN
Full SSN
Edward Snowden Level 1 +
Level 2 +
Top Clearance
Name
Email
Last 4 SSN
Full SSN
Gmail Password
+
6) Final Step : Encrypt Data At Rest
n  Eventually data ends up in disk
n  We need to protect the ‘raw data’ on disk
n  To prevent
n  Users going to disk directly
n  Theft of hardware
+
Transparent Encryption
+
OK, so where are we…
Project /
Solution
Purpose Status Vendor
kerberos Identity
management
Available neutral
Knox Secure gateway Hortonworks
CLoudera ?
Sentry Access control incubating Cloudera
Ranger
(similar to
Sentry)
Access control
+ Audit
In development
(HDP 2.2)
(originally XA
secure)
Hortonworks
Rhino Secure HDFS
data at rest
Available from
Hadoop 2.6
Neutral
(originally from
Intel)
Accumulo Secure nosql Available neutral
+
Future….
n  Really need a unified standard (no fragmentation)
n  Ease of use
n  Easy to setup policies
n  Integrate with outside systems
n  Easy audit tools
+
Thanks! & Questions?
Sujee Maniyam
Founder / Principal
http://elephantscale.com/
sujee@ElephantScale.com

More Related Content

PDF
Hadoop Security Now and Future
PDF
Launching your career in Big Data
PDF
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
PDF
Reference architecture for Internet of Things
PPTX
Hadoop security
PPTX
Risk Management for Data: Secured and Governed
PDF
Hadoop to spark_v2
PDF
Hadoop & Security - Past, Present, Future
Hadoop Security Now and Future
Launching your career in Big Data
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
Reference architecture for Internet of Things
Hadoop security
Risk Management for Data: Secured and Governed
Hadoop to spark_v2
Hadoop & Security - Past, Present, Future

Viewers also liked (14)

PDF
Apache Spark
PDF
Apache Sentry for Hadoop security
PPTX
The Future of Hadoop Security - Hadoop Summit 2014
PDF
Hadoop Security: Overview
PPTX
Hadoop Security Today & Tomorrow with Apache Knox
PPTX
Apache Hadoop Security - Ranger
PPTX
Hadoop and Data Access Security
PDF
Sentry - An Introduction
PPTX
Securing Hadoop with Apache Ranger
PDF
Big Data Security with Hadoop
PPTX
Security and Data Governance using Apache Ranger and Apache Atlas
PDF
Big Data Security and Governance
PDF
Introduction to Spark Internals
PPT
Hadoop Security Architecture
Apache Spark
Apache Sentry for Hadoop security
The Future of Hadoop Security - Hadoop Summit 2014
Hadoop Security: Overview
Hadoop Security Today & Tomorrow with Apache Knox
Apache Hadoop Security - Ranger
Hadoop and Data Access Security
Sentry - An Introduction
Securing Hadoop with Apache Ranger
Big Data Security with Hadoop
Security and Data Governance using Apache Ranger and Apache Atlas
Big Data Security and Governance
Introduction to Spark Internals
Hadoop Security Architecture
Ad

Similar to Hadoop security landscape (20)

PDF
Building secure NoSQL applications nosqlnow_conf_2014
PPTX
Big data security
PPTX
Open Source Security Tools for Big Data
PPTX
Open Source Security Tools for Big Data
PPTX
Securing Data in Hadoop at Uber
PDF
BigData Security - A Point of View
PPTX
Hadoop and Big Data Security
PDF
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
PDF
IRJET- Secured Hadoop Environment
PPTX
Securing Your Apache Spark Applications
PPTX
Securing Spark Applications by Kostas Sakellis and Marcelo Vanzin
PDF
Охота на уязвимости Hadoop
PDF
Doing hadoop securely
PPTX
Fine Grain Access Control for Big Data: ORC Column Encryption
PPTX
Securing Hadoop - MapR Technologies
PDF
2014 sept 4_hadoop_security
PDF
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
PDF
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
PDF
Nl HUG 2016 Feb Hadoop security from the trenches
PDF
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Building secure NoSQL applications nosqlnow_conf_2014
Big data security
Open Source Security Tools for Big Data
Open Source Security Tools for Big Data
Securing Data in Hadoop at Uber
BigData Security - A Point of View
Hadoop and Big Data Security
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
IRJET- Secured Hadoop Environment
Securing Your Apache Spark Applications
Securing Spark Applications by Kostas Sakellis and Marcelo Vanzin
Охота на уязвимости Hadoop
Doing hadoop securely
Fine Grain Access Control for Big Data: ORC Column Encryption
Securing Hadoop - MapR Technologies
2014 sept 4_hadoop_security
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
Protect your Private Data in your Hadoop Clusters with ORC Column Encryption
Nl HUG 2016 Feb Hadoop security from the trenches
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Ad

Recently uploaded (20)

PDF
Empathic Computing: Creating Shared Understanding
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PDF
Approach and Philosophy of On baking technology
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PPTX
Programs and apps: productivity, graphics, security and other tools
PDF
Unlocking AI with Model Context Protocol (MCP)
PPTX
Cloud computing and distributed systems.
PPTX
Spectroscopy.pptx food analysis technology
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
cuic standard and advanced reporting.pdf
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PPT
Teaching material agriculture food technology
PDF
NewMind AI Weekly Chronicles - August'25 Week I
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
DOCX
The AUB Centre for AI in Media Proposal.docx
Empathic Computing: Creating Shared Understanding
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
Approach and Philosophy of On baking technology
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
20250228 LYD VKU AI Blended-Learning.pptx
Programs and apps: productivity, graphics, security and other tools
Unlocking AI with Model Context Protocol (MCP)
Cloud computing and distributed systems.
Spectroscopy.pptx food analysis technology
Advanced methodologies resolving dimensionality complications for autism neur...
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Per capita expenditure prediction using model stacking based on satellite ima...
cuic standard and advanced reporting.pdf
MIND Revenue Release Quarter 2 2025 Press Release
Mobile App Security Testing_ A Comprehensive Guide.pdf
Teaching material agriculture food technology
NewMind AI Weekly Chronicles - August'25 Week I
“AI and Expert System Decision Support & Business Intelligence Systems”
The AUB Centre for AI in Media Proposal.docx

Hadoop security landscape

  • 1. + Hadoop Security Landscape Sujee Maniyam Founder / Principal http://elephantscale.com/ sujee@ElephantScale.com
  • 2. + Approach to Security in Hadoop Until Recently…
  • 3. + But Security Picture Has Improved Rapidly… n  Lot of work going on in the eco system n  Hadoop vendors (Cloudera / HortonWorks ..) have been very actively working on security features n  ‘the core’ features are in n  Ease of use improving as well
  • 4. + What Does It Mean to be ‘Secure’? n  1) Control who can get in? n  2) Verify the person’s identity n  3) safeguard communications with user n  4) What is allowed for this user n  5) Audit / log access n  6) Secure NOSQL n  7) And finally… Protect data at rest
  • 5. + 1) Who can get in n  Control which machines can connect to NoSQL cluster n  Don’t expose the cluster to public n  Too many open ports n  Too vulnerable n  Solutions: n  Run cluster behind firewall n  Restrict which machines can connect to cluster n  Linux / Network level security n  Outside the actual NoSQL
  • 8. + 2) User Authentication n  How can we verify the user? n  Username / password (gmail) n  Or use a third person (referee) n  Kerberos Source : http://1.bp.blogspot.com/ Wolf : Knock Knock… Wolf : It’s me, little piggy Pig :who is it?
  • 9. + Kerberos : Quick Primer n  Kerberos is a authentication protocol for networked machines n  Validates client to server and vice-versa n  Strong crypto algorithms (AES, 3DES…)
  • 10. + Kerberos Protocol for Getting a Beer in a Carnival / Fair J_
  • 11. + Kerberos Protocol Explained : Getting Beer @ Fair / Party n  Prove your age (identity) to wrist-band issuer n  Ticket Granting Ticket n  Get a wristband à qualifies you to get beer n  Service Ticket n  Go to bartender and ask for beer using your wrist-band n  Service Request n  Get Beer ! J n  For technically correct explanation see : http://www.roguelynn.com/words/explain-like-im-5- kerberos/
  • 12. + 3) Secure Client Communication n  Guard client / server communication (‘on the wire’) n  Done by using SASL (certificates) n  Prevents snooping by third parties
  • 13. + 4) What Is Allowed For This User? n  In unsecured environment users can read / write to any table n  à not very secure! n  Control which data users can see..
  • 14. + 5) Audit logging n  See what is going on… USER : tim, resource = hdfs:/data/logs , type = read, time=…. USER : tim, resource = hive:click_logs , type = read, query = “select *….”
  • 15. + 6) Secure NOSQL n  NoSQL solutions : n  On Hadoop : Hbase, Accumulo n  Other : Cassandra, …… n  Access control n  Table level access : can I read / can I write-update-insert ? n  Within a table, column level access n  Who can read column ‘social_security_number’ ?
  • 16. + Accumulo : Quick Intro n  Developed by the National Security Agency (NSA) ! n  Google Big Table implementation n  Nosql store on top of HDFS n  Security is a first grade concept HDFS Accumulo
  • 17. + Accumulo Data Model Family : info Columns à name email Last 4 ssn Ssn Gmail password Visibility tokens à Level 1 Level 1 Level 1 Level 2 OR Top clearance Top clearance •  Every thing in HBase data model •  Plus each row has a ‘Visibility Token’
  • 18. + Users Are Assigned ‘Visibility Tokens’ User id Visibility levels User 1 Level 1 User 2 Level 1 + Level 2 Edward Snowden Level 1 + Level 2 + Top Clearance
  • 19. + Accumulo only returns cells visible to user family Columns à name email Last 4 SSN Full SSN Gmail password person1 Joe joe@gma il.com 6789 123-45-67 89 JoeSuper Man! Visibility tokens à Level 1 Level 1 Level 1 Level 2 OR Top clearance Top clearance
  • 20. + What Users Can See… User Visibility Privilage Visible Cells User 1 Level 1 Name Email Last 4 ssn User 2 Level 1 + Level 2 Name Email Last 4 SSN Full SSN Edward Snowden Level 1 + Level 2 + Top Clearance Name Email Last 4 SSN Full SSN Gmail Password
  • 21. + 6) Final Step : Encrypt Data At Rest n  Eventually data ends up in disk n  We need to protect the ‘raw data’ on disk n  To prevent n  Users going to disk directly n  Theft of hardware
  • 23. + OK, so where are we… Project / Solution Purpose Status Vendor kerberos Identity management Available neutral Knox Secure gateway Hortonworks CLoudera ? Sentry Access control incubating Cloudera Ranger (similar to Sentry) Access control + Audit In development (HDP 2.2) (originally XA secure) Hortonworks Rhino Secure HDFS data at rest Available from Hadoop 2.6 Neutral (originally from Intel) Accumulo Secure nosql Available neutral
  • 24. + Future…. n  Really need a unified standard (no fragmentation) n  Ease of use n  Easy to setup policies n  Integrate with outside systems n  Easy audit tools
  • 25. + Thanks! & Questions? Sujee Maniyam Founder / Principal http://elephantscale.com/ sujee@ElephantScale.com