1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Built-in Security For The Cloud
DataWorks Summit Sydney
September 2017
Presenters
Jeff Sposetti
Senior Director of Product Management, Cloud
Hortonworks Data Cloud, Cloudbreak
Agenda
• Introduction
• Quick Demo
• Security Building Blocks: Apache Ranger and Knox
• Bringing It Together: Cloud and Data Lake Security
• Longer Demo
• Wrap Up
• Q & A
Background: Ephemeral Workloads + Cloud Storage
• Cloud is driving more ephemeral data processing use cases
• Cloud requires robust integration with cloud storage
[Figure: durable and ephemeral workload clusters running on cloud storage (S3, ADLS, WASB)]
Background: Hortonworks Data Cloud for AWS
• Focuses on business agility rather than infinite configurability and cluster management
• Addresses prescriptive, ephemeral use cases around Apache Spark + Apache Hive
• Pre-tuned and configured for use with Amazon S3
Learn more: http://hortonworks.com/products/cloud/aws/
Quick demo…
Security Building Blocks:
Apache Ranger and Knox
Protecting the Elephant in the Castle…
• Kerberos, Wire Encryption
• HDFS Encryption
• Apache Ranger
• Network Segmentation, Firewalls
• LDAP/AD
• Apache Knox
Apache Knox
Proxying Services
★ Provide access to Hadoop by proxying HTTP resources
★ Ecosystem APIs and UIs, plus Hadoop-oriented dispatching for Kerberos, doAs (impersonation), etc.
Authentication Services
★ REST API access; WebSSO flow for UIs
★ LDAP/AD, header-based pre-authentication
★ Kerberos, SAML, OAuth
Client DSL/SDK Services
★ Scripting through the DSL
★ Using Knox Shell classes directly as an SDK
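To illustrate the proxying idea, here is a minimal sketch of how a client might address a backend service through the gateway. It assumes Knox's usual URL layout, https://<host>:<port>/gateway/<topology>/<service-path>, and HTTP Basic credentials that Knox checks against LDAP/AD when that authentication provider is configured. The host name, the topology name "default", the credentials, and the use of WebHDFS are illustrative assumptions. The port defaults to 8443 (Knox's stock port), although the deployment architecture in this deck exposes the gateway on 443. The request is only constructed, not sent.

```python
import base64
import urllib.parse
import urllib.request
from typing import Optional


def knox_url(host: str, topology: str, service_path: str,
             port: int = 8443, query: Optional[dict] = None) -> str:
    """Build a gateway URL of the form https://host:port/gateway/<topology>/<service_path>."""
    path = f"/gateway/{topology}/{service_path.lstrip('/')}"
    qs = f"?{urllib.parse.urlencode(query)}" if query else ""
    return f"https://{host}:{port}{path}{qs}"


def knox_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Prepare, but do not send, a GET request carrying HTTP Basic credentials.

    With an LDAP/AD authentication provider configured, Knox validates these
    credentials before dispatching the call to the backend service.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"},
                                  method="GET")


if __name__ == "__main__":
    # Hypothetical host and topology: list /tmp via WebHDFS behind the gateway.
    url = knox_url("gateway.example.com", "default", "webhdfs/v1/tmp",
                   query={"op": "LISTSTATUS"})
    req = knox_request(url, "analyst", "analyst-password")
    print(req.full_url)
    # Sending it would be: urllib.request.urlopen(req)
```

Run as a script, this prints https://gateway.example.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS. Clients need only this one HTTPS endpoint, not the individual service ports behind it.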
Apache Ranger
Comprehensive and Extensible Security Model
• Centralized platform to define, administer, and manage security policies across Hadoop components (HDFS, Hive, HBase, YARN, Kafka, Solr, Storm, Knox, NiFi, Atlas)
• Extensible architecture with the ability to add custom policy conditions and user context enrichers
Fine-Grained Authorization
• Data access control at the database, table, and column level, for LDAP groups and specific users
Centralized Auditing
• Central audit location for all access requests
• Supports multiple audit destinations (HDFS, Solr, etc.)
• Real-time visual query interface
Advanced Security
• Dynamic security policies: prohibition, time, location, and tag (Atlas)
• Dynamic column masking and row filtering
[Figure: platform diagram with Operations, Security, Governance, and Storage beneath Batch, Interactive, Streaming, Search, and Machine Learning workloads]
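To make fine-grained authorization, masking, and row filtering concrete, the sketch below builds three Ranger policy payloads for a Hive service as plain Python dicts. It assumes the JSON layout of Ranger's public v2 REST API (policies are created with a POST to /service/public/v2/api/policy), where policyType 0 is an access policy, 1 is data masking, and 2 is row filtering. The service name "hive_prod", the database, table, column, and group names, and the filter expression are all hypothetical. Check the field names against the Ranger version you run before sending anything.

```python
import json


def _resource(*values: str) -> dict:
    """A Ranger resource matcher for the given literal values."""
    return {"values": list(values), "isExcludes": False, "isRecursive": False}


def hive_access_policy(service, name, db, table, columns, groups, accesses=("select",)):
    """Access policy (policyType 0): grant Hive accesses on columns to LDAP groups."""
    return {
        "service": service,
        "name": name,
        "policyType": 0,
        "isEnabled": True,
        "isAuditEnabled": True,
        "resources": {"database": _resource(db), "table": _resource(table),
                      "column": _resource(*columns)},
        "policyItems": [{
            "accesses": [{"type": a, "isAllowed": True} for a in accesses],
            "groups": list(groups),
            "users": [],
        }],
    }


def hive_mask_policy(service, name, db, table, column, groups, mask_type="MASK_SHOW_LAST_4"):
    """Data-masking policy (policyType 1): mask one column for the listed groups."""
    return {
        "service": service,
        "name": name,
        "policyType": 1,
        "isEnabled": True,
        "resources": {"database": _resource(db), "table": _resource(table),
                      "column": _resource(column)},
        "dataMaskPolicyItems": [{
            "accesses": [{"type": "select", "isAllowed": True}],
            "groups": list(groups),
            "dataMaskInfo": {"dataMaskType": mask_type},
        }],
    }


def hive_row_filter_policy(service, name, db, table, groups, filter_expr):
    """Row-filter policy (policyType 2): only rows matching filter_expr are visible."""
    return {
        "service": service,
        "name": name,
        "policyType": 2,
        "isEnabled": True,
        "resources": {"database": _resource(db), "table": _resource(table)},
        "rowFilterPolicyItems": [{
            "accesses": [{"type": "select", "isAllowed": True}],
            "groups": list(groups),
            "rowFilterInfo": {"filterExpr": filter_expr},
        }],
    }


if __name__ == "__main__":
    policies = [
        hive_access_policy("hive_prod", "analysts-read-orders", "sales", "orders",
                           ["order_id", "amount", "card_number"], ["analysts"]),
        hive_mask_policy("hive_prod", "mask-card-number", "sales", "orders",
                         "card_number", ["analysts"]),
        hive_row_filter_policy("hive_prod", "apac-orders-only", "sales", "orders",
                               ["apac_analysts"], "region = 'APAC'"),
    ]
    for policy in policies:
        print(json.dumps(policy, indent=2))
```

Because the policies live in Ranger rather than on any one cluster, the same payloads apply unchanged to every workload cluster that shares the Ranger service.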
Bringing It Together:
Cloud and Data Lake Services
[Figure: Cloud, Data Lake, and Security]
Key Components for Enterprise Security

SCHEMA
• What: Provides the Hive schema (tables, views, etc.).
• Why: If two or more workloads access the same data, they need to share the schema.
• How: Externalize the Hive Metastore into an external database (Amazon RDS) for schema definition.

POLICY
• What: Defines security policies around the Hive schema.
• Why: If two or more users access the same data, policies must be consistently available and enforced.
• How: Externalize and share Ranger across workloads, and store policies externally.

AUDIT
• What: Audits user access.
• Why: Captures data access activity.
• How: Externalize and share Ranger across workloads, and use cloud storage for audit data.

GATEWAY
• What: Provides a single endpoint to cluster resources that can be protected with SSL and enabled for authentication.
• Why: Avoids opening many ports, some potentially without authentication or SSL protection.
• How: Automatically deploy a centralized, protected gateway.

DIRECTORY
• What: Users and groups.
• Why: Provides the authentication source for users and the authorization source for groups.
• How: Use an external LDAP or Active Directory.
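For the AUDIT component, one way to keep audit data outside an ephemeral cluster is to point the Ranger plugin's HDFS audit destination at an s3a:// URI through Hadoop's S3A connector. The sketch below renders such settings in the Hadoop-style XML layout used by the plugin's audit configuration file. The property names are the Ranger plugin's xasecure.audit.* keys as we understand them; the bucket, prefix, and local spool directory are placeholders. Whether a given Ranger/HDP version supports this exact setup should be confirmed against its documentation.

```python
import xml.etree.ElementTree as ET


def hadoop_conf_xml(props: dict) -> str:
    """Render properties as <configuration><property><name/><value/></property>...</configuration>."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def ranger_audit_to_cloud_storage(bucket: str, prefix: str = "ranger/audit") -> dict:
    """Assumed Ranger plugin audit settings that write audit events under an
    s3a:// path instead of cluster-local HDFS, so they outlive the cluster."""
    return {
        "xasecure.audit.is.enabled": "true",
        "xasecure.audit.destination.hdfs": "true",
        "xasecure.audit.destination.hdfs.dir": f"s3a://{bucket}/{prefix}",
        # Local spool used to buffer events while the destination is unreachable
        # (placeholder path).
        "xasecure.audit.destination.hdfs.batch.filespool.dir": "/var/log/hive/audit/hdfs/spool",
    }


if __name__ == "__main__":
    print(hadoop_conf_xml(ranger_audit_to_cloud_storage("example-datalake-bucket")))
```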
Ephemeral Workloads: With Enterprise Security
• Tuned and Optimized Infrastructure
• Simplified, Automated Operations
• S3 Integration
• Protected Network Access

                  Ephemeral                  Enterprise Security
Schema            Shared (Hive Metastore)    Shared (Hive Metastore)
Authentication    Single-user                Multi-user (LDAP/AD)
Authorization     -                          Security Policies (Ranger)
Audit             -                          Audit (Ranger)
Ephemeral Workloads + Cloud Storage + Shared “Data Lake” Services
[Figure: durable and ephemeral workload clusters on cloud storage (S3, ADLS, WASB), backed by long-running Shared Data Lake Services: Metastore (schema) and Ranger (policy)]
• Define your data schema and security policies once for your ephemeral and always-on workloads.
• Secure access to workload clusters via a protected gateway enabled for authentication (AuthN) and HTTPS.
Shared Schema: Hive Metastore
• Register external Amazon RDS instances for use with the Hive Metastore
• Preserve the Hive schema across multiple ephemeral clusters
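As a rough illustration of what registering an external database means for Hive, the sketch below builds hive-site.xml properties that point the metastore at a PostgreSQL database on Amazon RDS and place the warehouse on S3. The javax.jdo.option.* and hive.metastore.warehouse.dir property names are standard Hive settings; the RDS endpoint, database name, credentials, bucket, and warehouse path are hypothetical, and the choice of PostgreSQL (default port 5432) is an assumption.

```python
import xml.etree.ElementTree as ET


def metastore_site(rds_endpoint: str, db_name: str, user: str, password: str,
                   warehouse_bucket: str, port: int = 5432) -> dict:
    """hive-site.xml properties for a metastore backed by an external
    PostgreSQL database (e.g. Amazon RDS) with the warehouse on S3."""
    return {
        "javax.jdo.option.ConnectionURL": f"jdbc:postgresql://{rds_endpoint}:{port}/{db_name}",
        "javax.jdo.option.ConnectionDriverName": "org.postgresql.Driver",
        "javax.jdo.option.ConnectionUserName": user,
        # In practice, keep the password out of plain-text config,
        # e.g. via a Hadoop credential provider.
        "javax.jdo.option.ConnectionPassword": password,
        "hive.metastore.warehouse.dir": f"s3a://{warehouse_bucket}/apps/hive/warehouse",
    }


def to_site_xml(props: dict) -> str:
    """Render properties in the <configuration><property> layout of *-site.xml files."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    props = metastore_site("hive-meta.example.ap-southeast-2.rds.amazonaws.com",
                           "hive", "hive", "change-me", "example-datalake-bucket")
    print(to_site_xml(props))
```

Because the schema lives in RDS rather than on any single cluster, every ephemeral cluster configured this way sees the same tables.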
Protected Network Access: Knox
Shared Security Policies: Ranger
• Create a set of “Shared Data Lake Services”
• Preserve Ranger security policies across multiple ephemeral clusters
Deployment Architecture
Access your cluster components through the protected gateway over SSL on port 443, which is open on the controller security group.
[Figure: controller with the protected gateway, through which users reach the Hive LLAP / Spark workload clusters (Zeppelin, Hive LLAP, Spark); Shared Data Lake Services: Ranger (policy in RDS, audit in S3), Hive Metastore (schema in RDS), and directory (LDAP/AD)]
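The "only port 443" rule can be expressed and checked in code. This sketch builds an ingress rule in the IpPermissions shape used by the EC2 API (for example, the IpPermissions argument of boto3's authorize_security_group_ingress), opening only TCP 443 to a given IPv4 client range. It also flags any rule in a list that opens something other than 443. The CIDR is from the documentation-only range, and restricting to IPv4 is a simplification of this sketch.

```python
import ipaddress


def gateway_ingress(client_cidr: str) -> list:
    """EC2-style IpPermissions opening only HTTPS (TCP 443) to the protected
    gateway, restricted to one IPv4 client range."""
    # ip_network() raises ValueError for malformed CIDRs or ones with host bits set.
    network = ipaddress.ip_network(client_cidr)
    if network.version != 4:
        raise ValueError("IPv6 ranges belong in Ipv6Ranges; this sketch handles IPv4 only")
    return [{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": str(network), "Description": "HTTPS to protected gateway"}],
    }]


def rules_beyond_gateway(ip_permissions: list) -> list:
    """Return every rule that opens anything other than exactly TCP 443."""
    return [
        perm for perm in ip_permissions
        if not (perm.get("IpProtocol") == "tcp"
                and perm.get("FromPort") == 443
                and perm.get("ToPort") == 443)
    ]


if __name__ == "__main__":
    rules = gateway_ingress("203.0.113.0/24")
    ssh = {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
           "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}
    print(rules_beyond_gateway(rules))          # []
    print(rules_beyond_gateway(rules + [ssh]))  # the SSH rule is flagged
```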
Hortonworks Data Cloud + Shared Data Lake Services
1. Register an authentication source (e.g., LDAP/AD).
2. Create a “Shared Data Lake”, specifying an S3 bucket and an RDS instance.
3. When you create a cluster, “attach” it to the Shared Data Lake Services:
   • for multi-user authentication (LDAP/AD)
   • for authorization and audit (Ranger)
   • for schema (Hive Metastore)
Prerequisites:
• LDAP/AD
• S3 bucket
• RDS instance
Longer demo…
General Guidelines
• Think ephemeral. Keep all of your data in S3 and your metadata in RDS; do not create tables or files in the local HDFS.
• For data lakes, the Hive warehouse is set up on S3. Create tables in this location rather than in individual S3 buckets; this makes them easier to manage.
• Use Hive external tables for tables outside this warehouse, typically when the data is ingested through a path outside of Hadoop.
• Create S3 bucket policies that exactly match usage so that you can spin up clusters with least privilege.
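The last two guidelines can be sketched together. The first function below builds an S3 bucket policy that lets a single cluster role list and use objects only under one prefix, with an optional read-only mode. The second generates HiveQL for an external table over data that lands outside the warehouse. The bucket, prefix, role ARN, table, and column names are hypothetical. A bucket policy only grants access, so true least privilege also depends on the role's own IAM permissions.

```python
import json


def least_privilege_bucket_policy(bucket: str, prefix: str, role_arn: str,
                                  read_only: bool = False) -> dict:
    """S3 bucket policy letting one cluster role use only objects under bucket/prefix."""
    object_actions = ["s3:GetObject"]
    if not read_only:
        object_actions += ["s3:PutObject", "s3:DeleteObject"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListOnlyThePrefix",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [prefix, f"{prefix}/*"]}},
            },
            {
                "Sid": "ObjectAccessUnderPrefix",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": object_actions,
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


def external_table_ddl(db: str, table: str, columns: list, location: str,
                       stored_as: str = "PARQUET") -> str:
    """HiveQL for an external table: dropping it leaves the data at `location` intact."""
    cols = ",\n  ".join(f"`{name}` {col_type}" for name, col_type in columns)
    return (f"CREATE EXTERNAL TABLE IF NOT EXISTS `{db}`.`{table}` (\n  {cols}\n)\n"
            f"STORED AS {stored_as}\n"
            f"LOCATION '{location}'")


if __name__ == "__main__":
    role = "arn:aws:iam::123456789012:role/example-cluster-role"
    print(json.dumps(least_privilege_bucket_policy("example-datalake-bucket",
                                                   "apps/hive/warehouse", role), indent=2))
    print(external_table_ddl("raw", "events", [("id", "BIGINT"), ("ts", "TIMESTAMP")],
                             "s3a://example-ingest-bucket/events/"))
```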
Wrap Up
Takeaways
• Cloud is driving more ephemeral data processing use cases
• Ephemeral workloads leverage cloud storage
• This pattern is driving an architectural approach for “Shared Data Lake Services”
• The building blocks are Apache Ranger and Apache Knox
Resources:
• Hortonworks Data Cloud: https://hortonworks.com/products/cloud/aws/
• Apache Ranger: https://hortonworks.com/apache/ranger/
• Apache Knox: https://hortonworks.com/apache/knox-gateway/
Learn More
• Breakout Session: Enterprise-ready security and governance for the Hadoop ecosystem
  Thursday, September 21 @ 3:10 PM
  https://dataworkssummit.com/sydney-2017/sessions/treat-your-enterprise-data-lake-indigestion-enterprise-ready-security-and-governance-for-hadoop-ecosystem
• Birds of a Feather: Security, Governance and Cybersecurity
  Thursday, September 21 @ 6:00 PM
  https://dataworkssummit.com/sydney-2017/birds-of-a-feather/security-governance-cybersecurity/
Thank You
https://hortonworks.com/products/cloud/aws/
https://hortonworks.com/apache/ranger/
https://hortonworks.com/apache/atlas/
