© Hortonworks Inc. 2011 – 2017. All Rights Reserved
Partner Ecosystem Showcase for Apache Ranger and Apache Atlas
DataWorks Summit - San Jose
Hortonworks Inc.: Ali Bajwa, Partner Solutions; Srikanth Venkat, Product Management
Talend Inc.: Laurent Bride, CTO
Arcadia Data: Shant Hovsepian, CTO
Protegrity: Sunil Sabat, Director, Partner Solutions
Agenda
• Apache Ranger & Apache Atlas: Journey, Ecosystem & Partners
• Hortonworks Partner Certification Program: SEC Ready & GOV Ready program
• Partner Technology Showcase
Apache Ranger Community Snapshot
Timeline:
• May 2014: XASecure acquisition
• July 2014: Enters Apache Incubation
• Nov 2014: Ranger 0.4.0 release
• July 2015: Ranger 0.5 / HDP 2.3
• Aug 2016: Ranger 0.6 / HDP 2.5
• Nov 2016: Ranger 0.6.2 / HDP 2.5.3
• Jan 2017: Ranger TLP graduation
• Apr 2017: Ranger 0.7 / HDP 2.6
• Jun 2017: Ranger 0.7.1 / HDP 2.6.1
• TBD: 1.0.0 target release date
Community:
• Committers: 22
• Contributors from: eBay, MSFT, Huawei, Pandora, Accenture, ING, Talend
Ranger 0.7 / HDP 2.6
• Export/import of policies
• $User and macros
• Plugin status tab
• Support for “show columns” and “describe extended”
• Incremental LDAP sync
• SmartSense metrics
Ranger 0.6 / HDP 2.5
• Classification (tag) based security (ABAC)
• Dynamic column masking & row filtering
• KMS HSM integration (Safenet)
• Dynamic policies & deny conditions
• LDAP improvements & audit scalability
Apache Ranger: Ecosystem
[Diagram: Apache Ranger at the center of its ecosystem. Labels: Partner Integrations; Native Hadoop Service Authorizers; Apache Kafka; Authorizer Extensions for Non-Hadoop Filesystems & Stores; Azure Data Lake Store (ADLS)* (Future).]
Background: DGI Community becomes Apache Atlas
Timeline:
• Dec 2014: DGI* group kickoff
• May 2015: Apache Atlas incubation
• Aug 2016: HDP 2.5 / Apache Atlas 0.7 foundation release
• Apr 2017: HDP 2.6 / Apache Atlas 0.8 release
* DGI: Data Governance Initiative
[Logo strip: includes “Global Financial Company”]
Community:
• Committers: 35
• Code contributors from: IBM, Aetna, Merck, Target, JPMC
Apache Atlas 0.8 / HDP 2.6
• Simplified Search UI
• Simplified APIs
• Classification-based security for HDFS, Kafka, HBase
• Knox SSO
• Performance/scalability improvements
Apache Atlas 0.7.1 / HDP 2.5.3
• High availability support
• LDAP Authentication/Authorization
• Classification-based security for Hive
• UI Redesign
Apache Atlas: Ecosystem
[Diagram: Apache Atlas at the center of its ecosystem. Labels: Custom Integration; RDBMS; Apache Kafka; Partner; Pending.]
© 2017 Talend Inc
Talend Studio Jobs lineage with Apache Atlas
Laurent Bride, CTO Talend
Agenda
• Integration Goals
• Design
• Technical Details
• Demo
Integration Goals
• Support lineage of Talend Studio jobs on Apache Atlas / Hortonworks HDP.
• Similar (or improved) functionality to what we offer for other lineage providers.
• Lineage for Talend Big Data jobs on both Spark and Hadoop.
• Authentication with the lineage backend.
• Die-on-error option: a lineage failure need not affect job execution.
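A minimal sketch of the die-on-error goal, assuming a job and a lineage publisher that are both plain callables. The names `run_job_with_lineage`, `unreachable_backend`, and the `die_on_error` flag are hypothetical and are not taken from Talend Studio.

```python
import logging
from typing import Callable

logger = logging.getLogger("lineage")


def run_job_with_lineage(
    job: Callable[[], str],
    publish_lineage: Callable[[], None],
    die_on_error: bool = False,
) -> str:
    """Run the job first, then attempt to publish its lineage."""
    result = job()
    try:
        publish_lineage()
    except Exception as exc:
        if die_on_error:
            # Strict mode: surface the lineage failure to the caller.
            raise
        # Lenient mode: keep the job result and only log the failure.
        logger.warning("Lineage publication failed: %s", exc)
    return result


def unreachable_backend() -> None:
    """Stand-in publisher that always fails (hypothetical)."""
    raise ConnectionError("lineage backend unreachable")


# The job result survives even though lineage publication failed.
status = run_job_with_lineage(lambda: "job finished", unreachable_backend)
```

With `die_on_error=True` the same call would re-raise the `ConnectionError` instead of returning the job result.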
Design
• Goal: Support a similar generic lineage model.
• Solution:
  • Send the transformation graph representation, with each node as a HashMap of properties.
  • Translate the graph into the given model in an integration layer.
  • For the Atlas case, it uses the Atlas REST API via the atlas-client JAR.
• Leave provider-specific lineage functionality open for advanced use
  • Future roadmap items
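The design above can be sketched roughly as follows: the job only produces a generic graph of property maps, and a provider-specific translator turns it into that provider's model. Everything here (`LineageProvider`, `FlatLinkProvider`, `publish`, the node names and properties) is an illustrative assumption, not Talend's actual integration layer.

```python
from typing import Dict, List, Protocol, Tuple

Properties = Dict[str, str]      # one node = a flat map of properties
Nodes = Dict[str, Properties]    # node id -> properties
Edges = List[Tuple[str, str]]    # (source id, target id)


class LineageProvider(Protocol):
    """What the integration layer expects from any lineage backend."""

    def translate(self, nodes: Nodes, edges: Edges) -> List[dict]:
        ...


class FlatLinkProvider:
    """Hypothetical provider whose model is a flat list of links."""

    def translate(self, nodes: Nodes, edges: Edges) -> List[dict]:
        return [
            {
                "source": src,
                "target": dst,
                "source_kind": nodes[src].get("kind", "unknown"),
                "target_kind": nodes[dst].get("kind", "unknown"),
            }
            for src, dst in edges
        ]


def publish(provider: LineageProvider, nodes: Nodes, edges: Edges) -> List[dict]:
    """Integration layer: the caller only knows the generic graph."""
    return provider.translate(nodes, edges)


nodes: Nodes = {
    "tFileInput_1": {"kind": "input", "path": "/data/in.csv"},
    "tMap_1": {"kind": "transform"},
    "tFileOutput_1": {"kind": "output", "path": "/data/out.csv"},
}
edges: Edges = [("tFileInput_1", "tMap_1"), ("tMap_1", "tFileOutput_1")]

records = publish(FlatLinkProvider(), nodes, edges)
```

Supporting another backend would then mean adding another class with a `translate` method, leaving the graph-building side untouched.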
Technical Details - Talend Model for Atlas
Note that Lineage view only shows Entities that are in the “DataSet – Process – DataSet” form.
So we had to represent every Component as a DataSet (tComponent) and create artificial components (tArtificialComponent) as a Process so we can show them in the Lineage view.
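A rough sketch of that mapping: every component becomes a DataSet-like entity of type tComponent, and every link between two components becomes an artificial Process-like entity of type tArtificialComponent with one input and one output. The dictionary layout is loosely modeled on Atlas entity JSON and should be read as an assumption; the job and component names are made up.

```python
from typing import Dict, List, Tuple


def component_ref(job: str, component: str) -> Dict:
    """Reference to a component entity by its qualified name."""
    return {
        "typeName": "tComponent",
        "uniqueAttributes": {"qualifiedName": f"{job}.{component}"},
    }


def to_entities(
    job: str, components: List[str], links: List[Tuple[str, str]]
) -> List[Dict]:
    # Each component is represented as a DataSet-like entity.
    entities: List[Dict] = [
        {
            "typeName": "tComponent",
            "attributes": {"qualifiedName": f"{job}.{name}", "name": name},
        }
        for name in components
    ]
    # Each link becomes an artificial Process between two DataSets,
    # giving the "DataSet - Process - DataSet" shape the view expects.
    for src, dst in links:
        link_name = f"{src}->{dst}"
        entities.append(
            {
                "typeName": "tArtificialComponent",
                "attributes": {
                    "qualifiedName": f"{job}.{link_name}",
                    "name": link_name,
                    "inputs": [component_ref(job, src)],
                    "outputs": [component_ref(job, dst)],
                },
            }
        )
    return entities


entities = to_entities(
    "demo_job",
    ["tFileInput_1", "tMap_1", "tFileOutput_1"],
    [("tFileInput_1", "tMap_1"), ("tMap_1", "tFileOutput_1")],
)
```

Three components with two links yield five entities: three DataSets and two artificial Processes.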
Technical Details – Open Issues
• The entity connection constraint is our biggest issue.
• Breaking changes in the API (atlas-client 0.8, but compatible with 0.7 through redirect).
• Inherited properties are shown even if not assigned. This is not an issue in itself, but because we reuse DataSet it leads to cases like this:
  • DataSet has an owner, but an owner does not make sense for a Talend transform.
• The Atlas model is flexible but strict at the same time: data is constrained to evolve with the metadata, so new arguments that are not defined in the metadata model are ignored.
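The last two issues can be illustrated with a toy model of a typed entity store. This is only a sketch of the behavior described on the slide, not Atlas code; the attribute sets and the `store_entity` helper are assumptions made for the example.

```python
from typing import Dict, List, Optional, Set, Tuple

# Assumed attribute sets, for illustration only.
DATASET_ATTRIBUTES: Set[str] = {"name", "qualifiedName", "description", "owner"}
T_COMPONENT_ATTRIBUTES: Set[str] = DATASET_ATTRIBUTES | {"componentType"}


def store_entity(
    declared: Set[str], payload: Dict[str, str]
) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Keep declared attributes (inherited ones included) and drop the rest."""
    # Inherited attributes such as "owner" show up even when unassigned.
    stored = {attr: payload.get(attr) for attr in sorted(declared)}
    # Anything the type does not declare is silently ignored.
    ignored = sorted(set(payload) - declared)
    return stored, ignored


stored, ignored = store_entity(
    T_COMPONENT_ATTRIBUTES,
    {
        "name": "tMap_1",
        "qualifiedName": "demo_job.tMap_1",
        "componentType": "tMap",
        "schemaVersion": "2",  # not part of the declared model
    },
)
```

Here `owner` is present but empty, and `schemaVersion` never reaches the stored entity, which mirrors why the metadata model has to evolve before the data can.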
Demo / Talend Studio side
Demo / How it looks in Apache Atlas
Arcadia Data. Proprietary and Confidential
Securing Visual Analytics for Big Data with Apache Ranger
Shant Hovsepian – CTO & co-Founder
@superdupershant
June 14, 2017
The First Native Visual Analytics Platform for Big Data
• WEB-BASED INTERFACE: Drag & drop interface for visual analytics & app workflow
• IN-CLUSTER ANALYTICS ENGINE: Scales linearly with cluster for speed and easier management
• BIG DATA OS: Distributed execution, data storage, metadata, security
[Diagram labels: Arcadia Visualization Engine; Arcadia Analytic Platform (Smart Acceleration™); Drag-and-drop Visual Analytics & Dashboards; Custom Data Applications; Data Platform; On-Premises, Cloud, Hybrid]
The Challenge
What is Apache Ranger?
• Centralized authorization and auditing across Hadoop components
• Access authorization based on resources
• Policy based behavior such as column masking
• Extensible Architecture
The Value of a Robust Policy Engine
• It’s complicated code to get right
• I am lazy; I don’t want to implement it
• Zero Knowledge Proofs
Native Security Integration
[Diagram labels: Arcadia analytics platform; HDFS]
SINGLE COPY OF DATA TO SECURE
• Reduces footprint of data copies with the same or summarized information
• Single policy definition for access control
• Easier compliance
ENTERPRISE GRADE
• Kerberos, LDAPS/AD, PAM and SAML
• Single sign-on for business users
• Role-based access control with delegation
INTEGRATED ROLE-BASED ACCESS
• Use role definitions from Ranger for access at BI tier
• No risk of mismatching policies between data management tier and BI tier
Configuration
• Tight integration with Ranger + Ambari makes installation and configuration very easy!
Arcadia Data OLAP Engine
• To accelerate data access and reporting, we have an on-cluster engine
• Cubes are pre-computed and stored in memory and in HDFS via HCatalog
• We had to make sure all Hive catalog accesses were first authorized through Ranger
• A simple implementation just requires an Authorizer class with isAccessAllowed()
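As an analogy for the last two bullets, here is a small Python sketch of an authorizer that is consulted before any catalog lookup. It is not the Ranger plugin API (which is Java); `Policy`, `Authorizer`, `describe_table`, the catalog contents, and the user names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Policy:
    resource: str
    users: FrozenSet[str]
    access_types: FrozenSet[str]


class Authorizer:
    """Default-deny check: access needs at least one matching policy."""

    def __init__(self, policies: List[Policy]) -> None:
        self._policies = policies

    def is_access_allowed(self, user: str, resource: str, access_type: str) -> bool:
        return any(
            p.resource == resource
            and user in p.users
            and access_type in p.access_types
            for p in self._policies
        )


# Hypothetical catalog standing in for Hive metadata.
CATALOG: Dict[str, List[str]] = {"sales.orders": ["id", "amount"]}


def describe_table(authorizer: Authorizer, user: str, table: str) -> List[str]:
    """Authorize first, then touch the catalog."""
    if not authorizer.is_access_allowed(user, table, "select"):
        raise PermissionError(f"{user} may not select from {table}")
    return CATALOG[table]


authorizer = Authorizer(
    [Policy("sales.orders", frozenset({"alice"}), frozenset({"select"}))]
)
columns = describe_table(authorizer, "alice", "sales.orders")
```

Calling `describe_table(authorizer, "bob", "sales.orders")` raises `PermissionError` before the catalog is read.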
Arcadia Data Visualization Server (BETA)
• While table-level privileges like SELECT/INSERT make sense for tables, visuals tend to have a richer set of verbs
• Need to define custom “resources” in Ranger
• Define custom “privileges”: Edit / Clone / Export / Interact
• A little tricky to do if you are not Java based
• Wildcard support is awesome!
• See yesterday’s talk on Ranger + HAWQ for more details (EXTENDING APACHE RANGER AUTHORIZATION BEYOND HADOOP)
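To make the idea of custom resources, custom privileges, and wildcards concrete, here is a toy evaluator. Python's `fnmatch` stands in for Ranger's wildcard resource matching, and the resource paths, group names, and policy layout are assumptions for illustration; only the four privilege names come from the slide.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Set

# Custom privileges named on the slide.
ACCESS_TYPES = ("edit", "clone", "export", "interact")

# Hypothetical policies over a hypothetical "dashboard" resource.
POLICIES: List[Dict] = [
    {
        "resource": "dashboards/sales/*",
        "groups": {"analysts"},
        "accesses": {"interact", "export"},
    },
    {
        "resource": "dashboards/*",
        "groups": {"admins"},
        "accesses": set(ACCESS_TYPES),
    },
]


def is_access_allowed(groups: Set[str], resource: str, access: str) -> bool:
    """Allow when any policy matches resource, group, and privilege."""
    if access not in ACCESS_TYPES:
        return False
    return any(
        fnmatchcase(resource, policy["resource"])
        and bool(groups & policy["groups"])
        and access in policy["accesses"]
        for policy in POLICIES
    )


analyst_can_interact = is_access_allowed({"analysts"}, "dashboards/sales/q1", "interact")
analyst_can_edit = is_access_allowed({"analysts"}, "dashboards/sales/q1", "edit")
admin_can_edit = is_access_allowed({"admins"}, "dashboards/hr/payroll", "edit")
```

One wildcard policy covers every dashboard under a prefix, which is what keeps the policy list short as visuals are added.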
Policy Page
• The Arcadia policy shows up alongside the others
Admin Level Access
Restricted Access For The Public
In Conclusion
Thank you.
Visit us at Booth 606
Protegrity Big Data Protector and Apache Ranger
Ranger Integration
By Sunil Sabat
Copyright – Protegrity Inc.
WHAT DO WE DO?
• Deliver centralized policy enforcement across the enterprise
• Apply security as close to the data as possible
• Protect the entire data flow: at rest, in transit, in use
HOW WE DO IT
• IDENTIFIED DATA (identity is known, to authorized users): SSN (023-45-1288), Name (Jane Doe), Email (joe@yahoo.com)
• DE-IDENTIFIED DATA (identity is not known, to unauthorized users): SSN (153-51-4363), Name (Hfhe Jes), Email (fhj@jjwvw.chw)
• ASSOCIATED DATA: Spending, Healthcare, Financial
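ACROSS THE ENTERPRISE
[Diagram: ESA at the center. Clear values (Joe Smith, 12/25/1966, 076-39-2778) are shown next to protected forms (1/02/1966, xxxx2278, ysieondusbak). State labels: Tokenized, In the clear, Masked, De-identified.]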
[Diagram labels: CENTRAL MANAGEMENT; POLICY ENFORCED; TECHNOLOGY; CONSISTENT PROTECTION]
Protegrity’s Big Data Protector for Hadoop
[Diagram: a Hadoop cluster made up of a name node, data nodes, and edge nodes. Each Hadoop node stacks Hive, Pig, MapReduce, YARN, HDFS, the OS file system, and other components. Policy and audit arrows connect the nodes. Protection labels: Coarse Grained Encryption; Fine Grained Encryption; Spark (Java and Scala).]
Protegrity Big Data Protector for Hadoop delivers protection at every node and is delivered with our own cluster management capability.
All nodes are managed by the Enterprise Security Administrator, which delivers policy and accepts audit logs.
The Protegrity Data Security Policy contains information about how data is de-identified and who is authorized to have access to that data.
Policy is enforced at different levels of protection in Hadoop.
Perfect data security and governance
• Combine the best of two products: Apache Ranger and Protegrity ESA (Enterprise Security Administrator)
• Apache Ranger controls access and authorization
• Protegrity protects data at a fine-grained level using tokenization
• Modern data lakes benefit from both products
• The data lake is protected according to enterprise security policy, while Hadoop access and authorization are in the hands of Ranger
Process Flow
[Diagram: the integration points listed under “Protegrity and Ranger Integration”, laid out as a process flow.]
Protegrity and Ranger Integration
Protegrity coexists with Apache Ranger policies
• Ranger controls column access policy
• Ranger KMS coexists along with Protegrity KMS
• Protegrity protects column data based on ESA policy
• Ranger logs along with ESA logs give comprehensive security audit (access and data protection) logs for forensic analysis, fraud alerts and other benefits
• Ranger custom masking function can be a Protegrity UDF
Future Exploration
• Embed access policy in Ranger with Protegrity Data Element protection policy for better alerting and management
• Inherit access policies from Ranger into ESA policy design
• Single KMS - Best
Use Cases
• Data protection is provided by Protegrity across the enterprise, while Hadoop authorization and access are controlled by Ranger
• Enhance Apache Ranger column masking using a custom function in the form of Protegrity UDFs
• The result is Ranger in control of data access and protection
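A sketch of how a custom masking expression could wrap a column in a UDF call when a query is rewritten. The `{col}` placeholder convention, the `protect_udf` function name, and the `ccn_token` data element are assumptions made for illustration; they are not confirmed Ranger or Protegrity identifiers.

```python
from typing import Dict, List


def masked_select(
    table: str, columns: List[str], mask_templates: Dict[str, str]
) -> str:
    """Build a SELECT in which masked columns are wrapped by their template."""
    select_items = []
    for column in columns:
        template = mask_templates.get(column)
        if template is None:
            # No masking policy for this column: select it as-is.
            select_items.append(column)
        else:
            # Substitute the column name into the custom expression.
            expression = template.replace("{col}", column)
            select_items.append(f"{expression} AS {column}")
    return f"SELECT {', '.join(select_items)} FROM {table}"


query = masked_select(
    "clear_table",
    ["ccn"],
    {"ccn": "protect_udf({col}, 'ccn_token')"},  # hypothetical UDF and element
)
```

The point of the pattern is that the access policy decides which expression applies, while the protection logic itself lives in the UDF.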
Clear Data in Hive table
• Original data is present in table “clear_table”:
select * from clear_table;
+-------------------+--+
| clear_table.ccn   |
+-------------------+--+
| 5539455602750205  |
| 5464987835837424  |
| 6226540862865375  |
| 6226600538383292  |
| 376235139103947   |
+-------------------+--+
Custom masking function - Protect
Custom masking function - Unprotect
Summary of Demo
Original Data      Protected Data     Unprotected Data
5539455602750200   8295281832577430   5539455602750200
5464987835837420   8437400318738670   5464987835837420
6226540862865370   9683356798323010   6226540862865370
6226600538383290   9885536985189730   6226600538383290
376235139103947    222096775455034    376235139103947
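The protect/unprotect round trip in the table can be mimicked with a toy vault-based tokenizer that keeps length and digit format. This is a generic illustration of tokenization, not Protegrity's algorithm; `ToyTokenVault` and its methods are hypothetical.

```python
import secrets
from typing import Dict


class ToyTokenVault:
    """Maps digit strings to random same-length digit tokens and back."""

    def __init__(self) -> None:
        self._token_by_value: Dict[str, str] = {}
        self._value_by_token: Dict[str, str] = {}

    def protect(self, value: str) -> str:
        if not value.isdigit():
            raise ValueError("this toy vault only handles digit strings")
        if value in self._token_by_value:
            # Same input always yields the same token.
            return self._token_by_value[value]
        while True:
            token = "".join(secrets.choice("0123456789") for _ in value)
            if token != value and token not in self._value_by_token:
                break
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def unprotect(self, token: str, authorized: bool) -> str:
        # Unauthorized callers only ever see the token.
        if not authorized:
            return token
        return self._value_by_token[token]


vault = ToyTokenVault()
original = "5539455602750205"
token = vault.protect(original)
restored = vault.unprotect(token, authorized=True)
```

An unauthorized call, `vault.unprotect(token, authorized=False)`, returns the token unchanged, matching the de-identified view described earlier in the deck.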
THANK YOU
www.protegrity.com
HDP SEC READY & GOV READY Programs
✔ Choice: Customers choose the features they want to deploy, à la carte
✔ Curated & Fast: Partners provide rich, complementary and complete features ready to deploy
✔ Agile: Faster deployment and accelerated innovation
✔ Centralized: Open metadata/governance and security infrastructure
✔ Flexibility: Portfolio of partner reference architectures and integration patterns
✔ Safe: HDP at core to provide stability and interoperability
Hortonworks Certified Technology Program
• HDP YARN Ready: Integrates with YARN (native, Tez, Slider) or uses/runs on a YARN Ready engine
• HDP Operations Ready: Integrates with Ambari APIs, Stacks, Blueprints, or Views
• HDP Governance Ready: Integrates with Atlas
• HDP Security Ready: Integrates with Ranger, Knox, or other security features
Sign up to be a partner and request certification kit!
http://hortonworks.com/partners/product-integration-certification/
Questions

Partner Ecosystem Showcase for Apache Ranger and Apache Atlas

  • 1. 1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Hortonworks Inc. Talend Inc. Arcadia Data Protegrity Ali Bajwa, Partner Solutions Laurent Bride, CTO Shant Hovsepian, CTO Sunil Sabat, Director, Partner Solutions Srikanth Venkat, Product Management DataWorks Summit - San Jose Partner Ecosystem Showcase For Apache Ranger And Apache Atlas
  • 2. 2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Agenda Apache Ranger & Apache Atlas Journey, Ecosystem & Partners Hortonworks Partner Certification Program SEC Ready & GOV Ready program Partner Technology Showcase
  • 3. 3 © Hortonworks Inc. 2011 – 2017 All Rights Reserved Apache Ranger Community Snapshot May 2014 XASecure Acquisition July 2014 Enters Apache Incubation Nov 2014 Ranger 0.4.0 Release July 2015 Ranger 0.5/ HDP2.3 Aug 2016 Ranger 0.6/ HDP2.5 Nov 2016 Ranger 0.6.2/ HDP2.5.3 Jan 2017 Ranger TLP graduation! Apr 2017 Ranger 0.7/ HDP2.6 TBD 1.0.0 Target Release Date • Committers: 22 • Contributors from: Ebay, MSFT, Huawei, Pandora, Accenture, ING, Talend Ranger 0.7/HDP 2.6 • Export/import of Policies • $User and macros • Plugin status tab • “Show columns” and “describe extended support” • Incremental LDAP Sync • SmartSense Metrics Ranger 0.6/HDP2.5 • Classification (tag) based security (ABAC) • Dynamic Column Masking & Row Filtering • KMS HSM Integration (Safenet) • Dynamic Policies & Deny Conditions • LDAP Improvements & Audit Scalability Jun 2017 Ranger 0.7.1/ HDP2.6.1
  • 4. 4 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Apache Ranger: Ecosystem PartnerPartner Integrations Apache Ranger Apache Kafka Native Hadoop Service Authorizers Azure Data Lake Store (ADLS)* (Future) Authorizer Extensions for Non- Hadoop Filesystems & Stores
  • 5. 5 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Background: DGI Community becomes Apache Atlas May 2015 Apache Atlas Incubation DGI group Kickoff Dec 2014 Apr 2017 HDP 2.6/ Apache 0.8 Release Global Financial Company * DGI: Data Governance Initiative Aug 2016 HDP 2.5/ Apache 0.7 Foundation Release Apache 0.8/HDP 2.6 • Simplified Search UI • Simplified APIs • Classification-based security for HDFS, Kafka, HBase • Knox SSO • Performance/scalability improvements Apache 0.7.1/HDP 2.5.3 • High availability support • LDAP Authentication/Authorization • Classification based security for Hive • UI Redesign • Committers – 35 • Code contributors from - IBM, Aetna, Merck, Target, JPMC
  • 6. 6 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Apache Atlas: Ecosystem Custom Integration Apache Atlas RDBMS Apache Kafka Pending:PartnerPartner
  • 7. 7©2017 Talend Inc Talend Studio Jobs lineage with Apache Atlas Laurent Bride, CTO Talend
  • 8. 8 Agenda  Integration Goals  Design  Technical Details  Demo
  • 9. 9 Integration Goals  Support lineage of Talend Studio jobs on Apache Atlas / Hortonworks HDP  Similar (or improved) functionality to what we offer for other lineage providers.  Lineage for Talend Big Data jobs both on Spark/Hadoop.  Authentication with Lineage Backend.  Die-on-error: Lineage failure does not affect job execution.
  • 10. 10 Design  Goal: Support a similar generic lineage model.  Solution:  Send the transformation graph representation with each node as a HashMap of properties.  Translate the graph into the given model in an integration layer.  For the Atlas case it uses the Atlas REST API via atlas-client JAR.  Let the specific lineage provider functionality open for advanced functionality • Future Roadmap items
  • 11. 11 Technical Details - Talend Model for Atlas Note that Lineage view only shows Entities that are in the “DataSet – Process – DataSet” form. So we had to represent every Component as a DataSet (tComponent) and create artificial components (tArtificialComponent) as a Process so we can show them in the Lineage view.
  • 12. 12 Technical Details – Open Issues  The entity connection constraint is our biggest issue.  Breaking changes on the API (atlas-client 0.8 but compatible with 0.7 through redirect).  Inherited properties are shown even if not assigned (this is not an issue, but due to our reuse of DataSet we have issues like this:  DataSet has an owner, but an owner does not make sense for a Talend transform.  Atlas Model is flexible but strict at the same time, data is constrained to evolve with metadata, if we pass new arguments that are not defined in the metadata model they are ignored.
  • 13. 13 Demo / Talend Studio side
  • 14. 14 Demo / How it looks like in Apache Atlas
  • 15. Arcadia Data. Proprietary and Confidential Securing Visual Analytics for Big Data with Apache Ranger Shant Hovsepian – CTO & co-Founder @superdupershant June 14, 2017
  • 16. Arcadia Data. Proprietary and Confidential Arcadia Visualization Engine The First Native Visual Analytics Platform for Big Data Arcadia Analytic Platform (Smart Acceleration™) On-Premises Drag-and-drop Visual Analytics & Dashboards HybridCloud Custom Data Applications …BIG DATA OS Distributed execution, data storage, metadata, security IN-CLUSTER ANALYTICS ENGINE Scales linearly with cluster for speed and easier management WEB-BASED INTERFACE Drag & drop interface for visual analytics & app workflow DataPlatform
  • 17. Arcadia Data. Proprietary and Confidential The Challenge
  • 18. Arcadia Data. Proprietary and Confidential What is Apache Ranger? • Centralized authorization and auditing across Hadoop components • Access authorization based on resources • Policy based behavior such as column masking • Extensible Architecture 18
  • 19. Arcadia Data. Proprietary and Confidential The Value of a Robust Policy Engine • It’s complicated code to get right • I am Lazy, I don’t want to implement it • Zero Knowledge Proofs 19
  • 20. Arcadia Data. Proprietary and Confidential Native Security Integration Arcadia analytics platform HDFS SINGLE COPY OF DATA TO SECURE  Reduces footprint of data copies with the same or summarized information  Single policy definition for access control  Easier compliance ENTERPRISE GRADE  Kerberos, LDAPS/AD, PAM and SAML  Single sign on for business users  Role-based access control with delegation INTEGRATED ROLE-BASED ACCESS  Use role definitions from Ranger for access at BI tier  No risk of mismatching policies between data management tier and BI tier
  • 21. Arcadia Data. Proprietary and Confidential Configuration • Tight integration with Ranger + Ambari makes installation and configuration very easy! 21
  • 22. Arcadia Data. Proprietary and Confidential Arcadia Data OLAP Engine • In order to accelerate data access and reporting we have an on-cluster engine • Cubes are pre-computed and stored in memory and in HDFS via HCatalog. • We had to make sure all Hive catalog accesses were first authorized through Ranger • Simple implementation just requires an Authorizer class with isAccessAllowed() 22
  • 23. Arcadia Data Visualization Server (BETA) • While table-level privileges like SELECT/INSERT make sense for tables, visuals tend to have a richer set of verbs • Need to define custom “resources” in Ranger • Define custom “privileges”: Edit / Clone / Export / Interact • A little tricky to do if you are not Java-based • Wildcard support is awesome! • See yesterday’s talk on Ranger + HAWQ for more details (EXTENDING APACHE RANGER AUTHORIZATION BEYOND HADOOP)
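To illustrate how custom resources, custom privileges, and wildcards fit together, here is a small Python sketch. It assumes a made-up resource naming scheme (`folder/visual-name`) and a hypothetical `VisualPolicy` class; only the four privilege names come from the slide. Matching uses the standard-library `fnmatchcase`, which is a stand-in for whatever matcher the policy engine really applies.

```python
from fnmatch import fnmatchcase

# The four verbs listed on the slide, lower-cased for comparison.
VISUAL_PRIVILEGES = {"edit", "clone", "export", "interact"}


class VisualPolicy:
    """Grants a set of privileges on visuals whose name matches a pattern."""

    def __init__(self, resource_pattern, users, privileges):
        unknown = set(privileges) - VISUAL_PRIVILEGES
        if unknown:
            raise ValueError(f"unknown privileges: {sorted(unknown)}")
        self.resource_pattern = resource_pattern
        self.users = set(users)
        self.privileges = set(privileges)

    def allows(self, user, resource, privilege):
        return (
            user in self.users
            and privilege in self.privileges
            and fnmatchcase(resource, self.resource_pattern)
        )


def is_allowed(policies, user, resource, privilege):
    """Allow if any policy grants the privilege; deny otherwise."""
    return any(p.allows(user, resource, privilege) for p in policies)
```

With a single policy such as `VisualPolicy("sales/*", {"alice"}, {"interact", "export"})`, one wildcard covers every visual under the hypothetical `sales/` folder, which is the convenience the slide is praising.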
  • 24. Policy Page • Arcadia policy shows up alongside the others
  • 25. Admin Level Access
  • 26. Restricted Access For The Public
  • 27. In Conclusion
  • 28. Thank you. Visit us at Booth 606
  • 29. Protegrity Big Data Protector and Apache Ranger Ranger Integration By Sunil Sabat Copyright – Protegrity Inc.
  • 30. WHAT DO WE DO? • Deliver centralized policy enforcement across the enterprise • Apply security as close to the data as possible • Protect the entire data flow: at rest, in transit, in use
  • 31. HOW WE DO IT. ASSOCIATED DATA: Spending, Healthcare, Financial. IDENTIFIED DATA (identity is known; to authorized users): SSN (023-45-1288), Name (Jane Doe), Email (joe@yahoo.com). DE-IDENTIFIED DATA (identity is not known; to unauthorized users): SSN (153-51-4363), Name (Hfhe Jes), Email (fhj@jjwvw.chw).
  • 32. ACROSS THE ENTERPRISE. ESA. Protection labels: Tokenized, In the clear, Masked, De-identified. Sample values: Joe Smith, 12/25/1966, 076-39-2778; 1/02/1966, xxxx2278, ysieondusbak. CENTRAL MANAGEMENT POLICY ENFORCED TECHNOLOGY CONSISTENT PROTECTION
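Of the protection forms named on this slide, masking is the simplest to show concretely. The Python sketch below hides all but the last few letters and digits of a value while leaving separators in place. The function name `mask_keep_last` and its parameters are assumptions made for illustration; this is not Protegrity's masking implementation.

```python
def mask_keep_last(value, visible=4, mask_char="x"):
    """Mask every letter/digit except the last `visible` ones.

    Separators such as "-" or "/" are left untouched so the masked value
    keeps the shape of the original.
    """
    total = sum(ch.isalnum() for ch in value)
    to_mask = max(total - visible, 0)
    out = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)
```

Masking is one-way: unlike tokenization, there is no mapping from which the original value could be recovered.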
  • 33. Protegrity’s Big Data Protector for Hadoop. Diagram labels: Hive, MapReduce, YARN, HDFS, OS File System, Pig, Other, Spark (Java and Scala); Name Node, Data Nodes, Edge Nodes; Hadoop Cluster, Hadoop Node; Policy, Audit; Coarse Grained Encryption, Fine Grained Encryption. Protegrity Big Data Protector for Hadoop delivers protection at every node and is delivered with our own cluster management capability. All nodes are managed by the Enterprise Security Administrator, which delivers policy and accepts audit logs. The Protegrity Data Security Policy contains information about how data is de-identified and who is authorized to have access to that data. Policy is enforced at different levels of protection in Hadoop.
  • 34. Perfect data security and governance • Combine the best of two products: Apache Ranger and Protegrity ESA (Enterprise Security Administrator) • Apache Ranger controls access and authorization • Protegrity protects data at a fine-grained level using tokenization • Modern data lakes benefit from both products • The data lake is protected according to enterprise security policy, while Hadoop access and authorization is in the hands of Ranger
  • 35. Process Flow • Protegrity coexists with Apache Ranger policies • Ranger controls column access policy • Ranger KMS coexists with Protegrity KMS • Protegrity protects column data based on ESA policy • Ranger logs along with ESA logs give comprehensive security audit logs (access and data protection) for forensic analysis, fraud alerts and other benefits • Ranger custom masking function can be a Protegrity UDF
  • 36. Protegrity and Ranger Integration. Protegrity coexists with Apache Ranger policies • Ranger controls column access policy • Ranger KMS coexists with Protegrity KMS • Protegrity protects column data based on ESA policy • Ranger logs along with ESA logs give comprehensive security audit logs (access and data protection) for forensic analysis, fraud alerts and other benefits • Ranger custom masking function can be a Protegrity UDF. Future Exploration • Embed access policy in Ranger with Protegrity Data Element protection policy for better alerting and management • Inherit access policies from Ranger into ESA policy design • Single KMS - Best
  • 37. Use Cases • Data protection is provided by Protegrity across the enterprise, while Hadoop authorization and access is controlled by Ranger • Enhance Apache Ranger column masking using a custom function in the form of Protegrity UDFs • The result is Ranger in control of data access and protection
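One way to picture a column-masking policy that delegates to a custom function is as a rewrite of the projected column expression, decided per user group. The Python sketch below is an illustration under stated assumptions, not Ranger's or Protegrity's implementation: `hypothetical_unprotect`, `protected_table`, and the group names are invented, and the rule format (an ordered list of group and template pairs) is made up for the example.

```python
def column_expression(column, user_groups, rules):
    """Return the SQL expression to project for `column`.

    `rules` is an ordered list of (group, template) pairs. The first rule
    whose group the user belongs to wins; "{col}" in the template is
    replaced by the column name. With no matching rule the column is
    projected unchanged.
    """
    for group, template in rules:
        if group in user_groups:
            return template.format(col=column)
    return column


def rewrite_select(table, columns, user_groups, rules_by_column):
    """Build a SELECT statement with per-column masking expressions applied."""
    projected = []
    for column in columns:
        expr = column_expression(column, user_groups, rules_by_column.get(column, []))
        # Alias rewritten expressions so the result keeps the original column name.
        projected.append(column if expr == column else f"{expr} AS {column}")
    return f"SELECT {', '.join(projected)} FROM {table}"
```

Given the rule `{"ccn": [("fraud_team", "hypothetical_unprotect({col})")]}`, a member of `fraud_team` would query through the custom function, while any other user would read the stored, still-protected column values.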
  • 38. Clear Data in Hive table • Original data present in table “clear_table”
      select * from clear_table;
      +-------------------+--+
      | clear_table.ccn   |
      +-------------------+--+
      | 5539455602750205  |
      | 5464987835837424  |
      | 6226540862865375  |
      | 6226600538383292  |
      | 376235139103947   |
      +-------------------+--+
  • 40. Custom masking function - Unprotect
  • 41. Summary of Demo
      Original Data       Protected Data      Unprotected Data
      5539455602750200    8295281832577430    5539455602750200
      5464987835837420    8437400318738670    5464987835837420
      6226540862865370    9683356798323010    6226540862865370
      6226600538383290    9885536985189730    6226600538383290
      376235139103947     222096775455034     376235139103947
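The demo summary shows two properties: protected values keep the length and digits-only format of the originals, and unprotecting returns the original value. A toy vault-based tokenizer in Python reproduces both. `ToyTokenVault` is a hypothetical stand-in written for illustration; it is not Protegrity's tokenization scheme, and an in-memory lookup table like this would not be suitable for production use.

```python
import secrets


class ToyTokenVault:
    """Maps digit strings to random same-length digit tokens and back."""

    def __init__(self):
        self._to_token = {}
        self._to_clear = {}

    def protect(self, clear):
        if not clear.isdigit():
            raise ValueError("only non-empty digit strings are supported")
        if clear in self._to_token:
            # The same input always yields the same token.
            return self._to_token[clear]
        while True:
            # Draw a random token with the same number of digits.
            token = "".join(secrets.choice("0123456789") for _ in clear)
            if token != clear and token not in self._to_clear:
                break
        self._to_token[clear] = token
        self._to_clear[token] = clear
        return token

    def unprotect(self, token):
        # Raises KeyError for tokens this vault never issued.
        return self._to_clear[token]
```

Because the token has the same length and character class as the input, it can be stored in the same Hive column type as the clear value, which matches the shape of the table above.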
  • 43. HDP SEC READY & GOV READY Programs ✔ Choice: Customers choose the features that they want to deploy, a la carte ✔ Curated & Fast: Partners provide rich, complementary and complete features ready to deploy ✔ Agile: Faster deployment and accelerated innovation ✔ Centralized: Open metadata/governance and security infrastructure ✔ Flexibility: Portfolio of partner reference architectures and integration patterns ✔ Safe: HDP at core to provide stability and interoperability
  • 44. Hortonworks Certified Technology Program. HDP YARN Ready: integrates with YARN (native, Tez, Slider) or uses/runs on a YARN Ready engine. HDP Operations Ready: integrates with Ambari APIs, Stacks, Blueprints, or Views. HDP Governance Ready: integrates with Atlas. HDP Security Ready: integrates with Ranger, Knox, or other security features. Sign up to be a partner and request a certification kit! http://hortonworks.com/partners/product-integration-certification/
  • 45. Questions