© Cloudera, Inc. All rights reserved.
DATA PROTECTION IN
HYBRID DATA LAKE ENVIRONMENT
Murali Ramasami, Staff Software Engineer
AGENDA
Hybrid Data Lake Environment
Data Lifecycle Manager
Replicate data HDFS <-> Cloud
Replicate data Hive <-> Cloud
Demo
BIG DATA TREND
Digital content is doubling every 18 months
90% of the data in the world has been created in the last two years alone
Structured Data
- Database
- Data Warehouse
- ERPs
- CRMs
Unstructured Data
- Web blogs
- Social media
- Audio, Video
- Software file-systems
Source: Frost & Sullivan - World’s Top Global Mega Trends To 2025 and Implications to Business, Society and Cultures
HYBRID ENTERPRISE DATA LAKE ENVIRONMENT
Diagram: Dataplane Service, on-prem clusters, and cloud
On-prem clusters: HIVE, SPARK, YARN, HDFS; ATLAS, RANGER, HIVE METASTORE, KNOX; AMBARI
• LARGE
• SHARED WORKLOADS
• MULTI-TENANT CLUSTERS
• SHARED SERVICES
• MULTI-CLUSTER
• MULTI-CLOUD
Cloud (Cloudbreak): ATLAS, RANGER, HIVE, KNOX; public cloud compute; public cloud storage; HIVE LLAP, SPARK, AMBARI
CLOUDERA DATAPLANE
Cloudera DataPlane is a portfolio of data solutions that enable the enterprise to manage & discover data across hybrid environments.
Diagram: Cloudera DataPlane Service over multiple clusters and sources (MULTIHYBRID)
- Data at rest: Cloudera Data Platform
- Data in motion: Cloudera Data Flow
DATAPLANE: AT A GLANCE
Diagram: DP PLATFORM
- Core Capabilities: Data Lakes, Clusters
- Platform Services: Extensibility, Metering
- CDP, CDF
DATAPLANE APPLICATIONS
DATA LIFECYCLE MANAGER (DLM) SERVICE
A Portfolio of Services
• Replication/failback to another cloud/on-prem site for Disaster Recovery
• Backup & Restore of business-critical data, for protection against accidental deletion
• Auto Tiering of hot/warm/cold data to cloud object storage/on-prem for TCO reduction
Diagram (Disaster Recovery): production site and disaster recovery site, with offsite replication and failback
Diagram (Backup & Restore): weekly timeline from Sunday to Sunday showing a full backup, cumulative incremental backups, and an accidental deletion
Diagram (Auto Tiering): probability of reuse (100% to 0%) against time (0 days, 30 days, 90 days, forever), with access to data moving from solid state drive to hard drive to archive (S3)
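The auto-tiering idea on this slide can be sketched as a small age-based classifier. This is only an illustration, not DLM's implementation: the function name `pick_tier`, the tier labels, and the reading of the 30-day and 90-day marks on the diagram as tier boundaries are all assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds, read off the 0/30/90-day marks on the slide's diagram.
WARM_AFTER = timedelta(days=30)
COLD_AFTER = timedelta(days=90)


def pick_tier(last_access: datetime, now: datetime) -> str:
    """Return an illustrative storage tier for data last accessed at `last_access`."""
    age = now - last_access
    if age < WARM_AFTER:
        return "hot"    # e.g. solid state drive
    if age < COLD_AFTER:
        return "warm"   # e.g. hard drive
    return "cold"       # e.g. archive or cloud object storage


now = datetime(2019, 6, 1)
tiers = [pick_tier(now - timedelta(days=d), now) for d in (1, 45, 400)]
print(tiers)
```

A real policy would likely also weigh access frequency and cost, which the slide's probability-of-reuse curve hints at but does not quantify.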
DLM FEATURES
• Incremental Hive replication & Hive metadata
• HDFS snapshot-based replication between HDP clusters
• Ranger policy replication
• Atlas tag/lineage replication
• Cloud storage replication (AWS, Azure, GCP)
• Active/standby behavior on DR site using Ranger
• TDE & TLS support; support for multiple keys/KMS
DLM ARCHITECTURE
Diagram: DLM UI and DLM App in Dataplane, with a source cluster and a destination cluster
- Source Cluster: Knox, DLM Engine, Yarn, Ranger, Hive, HDFS, Atlas
- Destination Cluster: Knox, DLM Engine, Ranger, Hive, HDFS, Atlas
DLM ENGINE INTERNALS
• The DLM engine is a stateless server; its state is kept in an external DB.
• Schedules replication jobs using Quartz.
• The job manager handles job concurrency, failures, retries and recovery.
• Collects metrics for jobs and data transfer.
• Service-specific plugins handle replication of each kind of data from the source service to the target service.
Diagram (DLM Engine): Knox, REST API, Scheduler, Job manager, Data Store, Plugin Manager (HDFS, Hive, Ranger)
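The job manager and plugin roles described on this slide can be sketched roughly as follows. This is a toy model under stated assumptions, not DLM code: the class `JobManager`, its fields, and the plugin function are hypothetical, an in-memory list stands in for the external DB, and scheduling (Quartz, a Java library, in DLM) is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class JobManager:
    """Toy job manager: dispatches a job to a plugin and retries on failure."""
    plugins: Dict[str, Callable[[str], None]]        # service name -> replication function
    max_attempts: int = 3
    state: List[dict] = field(default_factory=list)  # stand-in for the external DB

    def run(self, service: str, dataset: str) -> bool:
        plugin = self.plugins[service]
        for attempt in range(1, self.max_attempts + 1):
            try:
                plugin(dataset)
                self.state.append({"dataset": dataset, "attempt": attempt,
                                   "status": "SUCCESS"})
                return True
            except Exception as exc:  # a real engine would catch narrower errors
                self.state.append({"dataset": dataset, "attempt": attempt,
                                   "status": "FAILED", "error": str(exc)})
        return False


# Hypothetical HDFS plugin that fails once, then succeeds.
calls = {"n": 0}


def flaky_hdfs_plugin(path: str) -> None:
    calls["n"] += 1
    if calls["n"] == 1:
        raise IOError("transient failure")


manager = JobManager(plugins={"hdfs": flaky_hdfs_plugin})
ok = manager.run("hdfs", "/data/sales")
print(ok, [record["status"] for record in manager.state])
```

Because every attempt is recorded outside the manager's control flow, a restarted process could in principle read the records back and resume, which is the point of keeping the server itself stateless.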
REPLICATING DATA ON-PREM <-> CLOUD
HDFS - CLOUD STORAGE REPLICATION
• Replication to/from cloud storage directly; no cluster required on the target
• Supported on AWS S3, Azure ADLS, GCP GCS
• Data is pushed/pulled from on-prem clusters using Hadoop connectors for the filesystem
• Supports cloud-native encryption
• File ACLs, Ranger policies and Atlas metadata are copied in meta files for restore
Diagram: a replication policy between an on-prem cluster (Knox, DLM Engine, Distcp on Yarn, Ranger, HDFS) and cloud storage
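Since the slide shows Distcp moving data between HDFS and cloud storage through Hadoop connectors, a push or pull can be pictured as a `hadoop distcp` invocation between an `hdfs://` URI and a connector URI. The sketch below only builds the argument list. The helper name, the NameNode host and the bucket name are hypothetical, and the exact options DLM passes to Distcp are not stated in the deck.

```python
from typing import List
from urllib.parse import urlparse

# URI schemes of Hadoop filesystem connectors for the stores named on the slide.
CLOUD_SCHEMES = {"s3a": "AWS S3", "adl": "Azure ADLS", "abfs": "Azure ADLS", "gs": "GCP GCS"}


def build_distcp_command(source: str, target: str) -> List[str]:
    """Build a `hadoop distcp` argument list; exactly one side must be cloud storage."""
    schemes = [urlparse(source).scheme, urlparse(target).scheme]
    cloud_sides = [s for s in schemes if s in CLOUD_SCHEMES]
    if len(cloud_sides) != 1 or "hdfs" not in schemes:
        raise ValueError("expected one hdfs:// URI and one cloud storage URI")
    # -update copies only files that are missing or differ on the target.
    return ["hadoop", "distcp", "-update", source, target]


# Hypothetical NameNode host and bucket name.
push = build_distcp_command("hdfs://nn1:8020/data/sales",
                            "s3a://example-bucket/backup/sales")
print(" ".join(push))
```

Swapping the two arguments gives the pull direction (cloud storage back to HDFS). Running the list with `subprocess.run` would require a host with a configured Hadoop client and cloud credentials.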
REPLICATION—SECURITY POLICIES & GOVERNANCE METADATA
• Copies resource and descendant policies for the directories/DBs replicated
• Copies row filters and column masking rules
• Adds a deny policy on the target
• Atlas metadata and tags associated with the directories/DBs created are also replicated to the target
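The policy-copy behaviour listed above can be illustrated with a toy model. The dictionary layout here is invented for the example and is not Ranger's policy schema. The assumption that the added deny policy blocks writes on the target (to keep it a standby copy) is inferred from the "active/standby" feature mentioned earlier in the deck, not stated on this slide.

```python
from typing import Any, Dict, List

Policy = Dict[str, Any]


def policies_to_replicate(policies: List[Policy], root: str) -> List[Policy]:
    """Select policies on `root` and its descendants, then append a deny policy."""
    prefix = root.rstrip("/") + "/"
    selected = [dict(p) for p in policies
                if p["resource"] == root or p["resource"].startswith(prefix)]
    # Assumption: the deny policy blocks writes so the target stays a standby copy.
    selected.append({"resource": root, "access": "write", "allow": False, "users": ["*"]})
    return selected


source_policies = [
    {"resource": "/data/sales", "access": "read", "allow": True, "users": ["analyst"]},
    {"resource": "/data/sales/2019", "access": "write", "allow": True, "users": ["etl"]},
    {"resource": "/data/salesforce", "access": "read", "allow": True, "users": ["crm"]},
    {"resource": "/data/hr", "access": "read", "allow": True, "users": ["hr"]},
]

replicated = policies_to_replicate(source_policies, "/data/sales")
print([p["resource"] for p in replicated])
```

Matching on `root + "/"` rather than on the bare prefix keeps sibling paths such as `/data/salesforce` out of the result.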
DEMO
HIVE ON-PREM TO CLOUD REPLICATION
• Minimal cluster on the cloud - shared services cluster
• Push-based replication to avoid opening data ports
• Uses cloud SDK for secure data transfer
• Supports cloud storage encryption
• Uses Hadoop connector to transfer data
Diagram: a replication policy spanning an on-prem cluster (Knox, DLM Engine, Distcp on Yarn, Ranger, HDFS), HS2, cloud storage, and a cloud cluster (Knox, DLM Engine, Ranger, HMS)
EVENT BASED HIVE REPLICATION (CLOUD)
Diagram: Master Cluster (Metastore RDBMS with Event Table, HDFS, HiveServer2), Distcp, and Cloud (Metastore RDBMS, Cloud Storage)
- REPL DUMP: serialize new events; dump (data + metadata)
- REPL LOAD
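The dump-then-load flow named on this slide can be modelled in a few lines. The sketch below is a toy, not Hive's implementation. The functions `repl_dump` and `repl_load` only borrow the names of the REPL DUMP and REPL LOAD steps, and the event tuples, operation names and in-memory "target" are hypothetical stand-ins for the metastore event table and the cloud-side warehouse.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, str, str]  # (event id, table name, operation)


def repl_dump(event_log: List[Event], last_replicated_id: int) -> List[Event]:
    """Serialize the events recorded after `last_replicated_id`, oldest first."""
    return sorted(e for e in event_log if e[0] > last_replicated_id)


def repl_load(target: Dict[str, str], dump: List[Event], last_replicated_id: int) -> int:
    """Apply dumped events to the target in order; return the new last-replicated id."""
    for event_id, table, operation in dump:
        if operation == "DROP_TABLE":
            target.pop(table, None)
        else:
            target[table] = operation  # toy state: remember the latest operation
        last_replicated_id = event_id
    return last_replicated_id


source_events: List[Event] = [
    (1, "orders", "CREATE_TABLE"),
    (2, "orders", "INSERT"),
    (3, "tmp", "CREATE_TABLE"),
    (4, "tmp", "DROP_TABLE"),
]
target: Dict[str, str] = {}

# First cycle replays everything; later cycles pick up only new events.
checkpoint = repl_load(target, repl_dump(source_events, 0), 0)
source_events.append((5, "orders", "ALTER_TABLE"))
checkpoint = repl_load(target, repl_dump(source_events, checkpoint), checkpoint)
print(checkpoint, target)
```

Tracking the last replicated event id is what makes each cycle incremental: only events newer than the checkpoint are serialized and shipped.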
DEMO
THANK YOU
