1 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
DISASTER RECOVERY AND CLOUD MIGRATION FOR YOUR APACHE HIVE WAREHOUSE
Sankar Hariappan
Senior Software Engineer, Hortonworks
2 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
Agenda
 About Apache Hive
 Disaster Recovery
 Replication Modes
 Fail Over
 Fail Back
 Replication at Hive-Scale
 Event Based Replication
 Change Management
 Bootstrapping
 REPL Commands
 Demonstration
 Cloud Migration Challenges
 Future Work
3 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
About Apache Hive
 Data warehouse tool built on top of Apache Hadoop.
 Handles data warehousing tasks such as extract/transform/load (ETL), reporting, and data analysis.
 Manages large datasets residing in distributed storage.
 SQL with Hive-specific extensions.
 Query optimization powered by Apache Calcite and execution via Apache Tez, Apache Spark, or MapReduce.
 Access to files stored either directly in Apache HDFS or in other data storage systems such as Apache HBase.
 Sub-second query retrieval via Hive LLAP, Apache YARN, and Apache Slider.
4 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
Apache Hive Architecture
[Architecture diagram: JDBC/ODBC clients connect to HiveServer2 (Hive Thrift Server); the Driver invokes the Compiler, Optimizer, and Executor; a Metastore client (MS Client) talks to the Hive Metastore, which is backed by an RDBMS; queries execute on YARN over data in HDFS.]
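For orientation, the JDBC path in the diagram is the one a SQL client takes; a minimal sketch using Beeline, with an illustrative HiveServer2 host, database, and table (the default port 10000 is assumed):

  # From a shell on an edge node (hostname and user are illustrative):
  beeline -u "jdbc:hive2://hs2-host.example.com:10000/default" -n hive_user
  -- Inside the Beeline session, HiveQL is compiled, optimized, and executed by HiveServer2:
  SELECT COUNT(*) FROM sample_db.sample_table;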
5 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
Disaster Recovery
 Deployment of clusters in more than one data center for business continuity or geo-localization.
 Hybrid cloud deployment for off-premise processing.
 Robust replication solution to achieve seamless disaster recovery.
– Prevent severe data loss.
– Eliminate single point of failure.
– Provide fault tolerance.
6 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
Replication Modes
 Master-Slave: unidirectional replication from the Master (read/write) to a read-only Slave.
 Master-Master: bidirectional replication between two Masters, each serving reads and writes.
7 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
Replication Modes
 Hub and Spoke pattern: one Master (read/write) replicates to several read-only Slaves.
 Relay pattern: the Master (read/write) replicates to a Slave, which relays the changes on to further read-only Slaves.
8 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
Fail Over
 The Slave takes over the Master's responsibilities instantaneously.
 Ensure business continuity with minimal
data loss based on Recovery Point
Objective (RPO).
 Almost zero downtime.
[Diagram: replication was unidirectional from Master to Slave; after fail over, the Slave serves both reads and writes.]
9 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
Fail Back
 The Slave cluster usually has minimal processing capacity, which makes Fail Back an important requirement.
 The original Master comes back online and is brought up to date with the latest data.
 Ensure removal of stale data on the Master that was never replicated to the Slave.
 Reverse-replicate the delta of data loaded into the Slave after Fail Over.
[Diagram: during fail back, replication runs in the reverse direction, from the Slave back to the Master.]
10 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
Replication at Hive-Scale
 Event based replication.
 First version of Hive Replication (Replv1) uses EXPORT-IMPORT semantics to replicate data (see the sketch below).
– Inefficient mechanism.
– 4X copy problem.
– Rubber-banding issue.
– Depends on external tools such as Falcon/Oozie to manage the replication state.
 Second version of Hive Replication (Replv2) uses REPL commands.
– Point-in-time replication.
– Reduced number of copies.
– Hive maintains the replicated state.
– Additional support for replication of functions and constraints.
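For context, Replv1's EXPORT-IMPORT mechanism boils down to statements like the following sketch; the database, table, and path names are illustrative:

  -- On the source cluster: write out the table's metadata and data files
  EXPORT TABLE sales_db.orders TO '/apps/hive/repl/orders_dump';
  -- Copy the exported directory to the destination cluster (e.g. with distcp), then:
  IMPORT TABLE orders FROM '/apps/hive/repl/orders_dump';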
11 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
Event Logging
[Diagram: a JDBC/ODBC client runs queries against HiveServer2; HiveServer2 manages metadata through the Hive Metastore, which records the resulting events in an Events Table in the Metastore RDBMS.]
12 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
Event Logging
 Captured events: Create/Alter/Drop on DB/Table/Partition/Function/Constraint objects.
 Events are stored in the Metastore RDBMS (see the sketch below).
 Each event is self-contained, carrying enough information (metadata + data) to recover the state of the object.
 Events are ordered by a sequence number (event id).
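As an illustration, the event log can be inspected directly in the metastore database; a minimal sketch, assuming the standard NOTIFICATION_LOG table of the metastore schema (column names vary slightly across Hive versions, and the query runs against the Metastore RDBMS, not through HiveServer2; the database name is illustrative):

  -- List captured events for one database in replication order
  SELECT EVENT_ID, EVENT_TIME, EVENT_TYPE, DB_NAME, TBL_NAME
  FROM NOTIFICATION_LOG
  WHERE DB_NAME = 'sales_db'
  ORDER BY EVENT_ID;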
13 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
Event Based Replication
[Diagram: on the Master cluster, REPL DUMP serializes a new batch of events from the Events Table in the Metastore RDBMS into a dump (metadata + data) on HDFS. On the Slave cluster, REPL LOAD reads the repl dump directory, copies the data files with Distcp into the Slave's HDFS, and writes the objects through the Metastore API into the Slave's Metastore RDBMS.]
14 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
Event Based Replication
 Read a batch of events from the Metastore RDBMS in the generated sequence.
 "repl dump <db name> from <event id>"
– gets events newer than <event id>.
– includes data file information.
– <event id> is the last replicated event id for the DB, obtained from the destination cluster.
 "repl load <db name> from <hdfs URI>"
– applies the events on the destination (see the sketch below).
 State is currently replicated in batches; this can be optimized in the future.
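A minimal incremental cycle might look like the following sketch; the database name, event id, and dump path are illustrative:

  -- On the source cluster: dump all events newer than the last replicated id
  REPL DUMP sales_db FROM 1500;
  -- The command returns a dump directory and the last dumped event id.
  -- On the destination cluster, load that dump directory:
  REPL LOAD sales_db FROM '/apps/hive/repl/dump_1501_1620';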
15 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
Change Management
 Consider replicating the following batch of events (see the sketch below):
– Insert into a table
– Drop the same table
 The inserted files are still needed for replication even though the table has since been dropped.
 A trash-like directory (the CM dir) captures such files instead of deleting them.
 A checksum is used to verify each file; if the original has changed or is missing, the file is looked up in the CM dir by its checksum.
 Necessary for ordered replication: the state of the destination DB corresponds to the state of the source some duration X in the past.
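A sketch of the scenario, with an illustrative database and table name:

  -- Events generated on the source between two REPL DUMPs:
  INSERT INTO sales_db.orders VALUES (101, 'widget');  -- event N: new data files are written
  DROP TABLE sales_db.orders;                          -- event N+1: the files move to the CM dir instead of being deleted
  -- When the destination replays event N, the files are no longer in the table directory,
  -- so they are fetched from the CM dir and verified by checksum.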
16 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
Bootstrapping
 What about data generated before event capturing was enabled?
 Bootstrapping uses the same REPL DUMP/LOAD commands, but is not event based (see the sketch below).
 Incremental replication then catches up with the events generated during the bootstrap, making the destination consistent with the state of the source at some time X in the past.
 Optimized for large databases.
 Parallel dump of a large number of partitions.
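A minimal bootstrap sketch; the database name and dump path are illustrative:

  -- On the source cluster: no FROM clause means a full (bootstrap) dump
  REPL DUMP sales_db;
  -- On the destination cluster, load the returned dump directory:
  REPL LOAD sales_db FROM '/apps/hive/repl/bootstrap_dump';
  -- Later cycles switch to incremental dumps: REPL DUMP sales_db FROM <last replicated event id>;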
17 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
REPL Commands
 REPL DUMP <db-name> [FROM <start-evid> [TO <end-evid>] [LIMIT <num-evids>] ];
– Execute this command on the source cluster.
– REPL DUMP <db-name>; bootstraps the whole database.
– REPL DUMP <db-name> FROM <start-evid>; replicates all events after start-evid.
– REPL DUMP <db-name> FROM <start-evid> TO <end-evid>; replicates a range of events.
– REPL DUMP <db-name> FROM <start-evid> LIMIT <num-evids>; replicates a limited number of events.
 REPL LOAD <db-name> FROM <dump-dir>;
– dump-dir is the HDFS URI returned by the REPL DUMP command.
– Execute this command on the destination cluster.
 REPL STATUS <db-name>;
– Execute this command on the destination cluster.
– Returns the last replicated state of the database on the destination, which should be passed to REPL DUMP as start-evid (see the sketch below).
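Putting the three commands together, one replication cycle might look like the following sketch; names, ids, and paths are illustrative:

  -- On the destination cluster: find the last replicated event id (say it returns 1620)
  REPL STATUS sales_db;
  -- On the source cluster: dump everything newer than that id
  REPL DUMP sales_db FROM 1620;
  -- On the destination cluster: load the dump directory returned by REPL DUMP
  REPL LOAD sales_db FROM '/apps/hive/repl/dump_1621_1800';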
18 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
Demonstration
19 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
Cloud Migration Challenges
 Move is expensive
– Cloud file systems implement "move" as "copy".
– REPL LOAD does an atomic move/rename from a temp directory to the warehouse location.
– ACID and micro-managed tables can potentially help avoid the move operation.
 On-Prem to Cloud
– Run distcp from the on-prem cluster to avoid resource overhead in the cloud (see the sketch below).
– Need to rely on checksums of the source files to verify the copied files.
 Cloud to Cloud
– Optimize distcp to use vendor-specific tools to copy between cloud file systems.
– Checksums are not consistent or available on all file systems.
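For the on-prem-to-cloud case, a minimal distcp sketch, assuming an S3 destination; the NameNode host, paths, and bucket name are illustrative, and -skipcrccheck is used because object stores do not expose HDFS-compatible checksums:

  # Run from the on-prem cluster to push a repl dump directory to S3
  hadoop distcp -update -skipcrccheck \
    hdfs://onprem-nn.example.com:8020/apps/hive/repl/dump_1621_1800 \
    s3a://my-dr-bucket/hive/repl/dump_1621_1800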
20 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
Future Work
 Replicate ACID/Micro-managed tables.
 Replication to/from cloud storage such as S3 or WASB etc.
 Hot Data Replication.
 Faster bootstrapping.
 Optimize Fail Back.
 Replicate column statistics, indexes, etc.
 Table level replication.
21 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
References: REPL Configurations
 hive.metastore.transactional.event.listeners=org.apache.hive.hcatalog.listener.DbNotificationListener (Capture events)
 hive.repl.rootdir (Root directory used by repl dump)
 hive.metastore.dml.events=true (Enable event generation for DML operations)
 hive.repl.cm.enabled=true (Change Manager to be enabled in source cluster)
 hive.repl.cm.retain=24hr (Expiry time for CM backed-up data files)
 hive.repl.cm.interval=3600s (Time interval to validate expired data files in CM)
 hive.repl.cmrootdir (Root directory for Change Manager)
 hive.repl.replica.functions.root.dir (Root directory to store UDFs/UDAFs jars)
 hive.repl.approx.max.load.tasks=1000 (Limit the DAG size to control the memory consumption)
 hive.repl.partitions.dump.parallelism=5 (Number of threads to concurrently dump partitions)
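As an illustration, a few of the settings above could be placed in hive-site.xml on the source cluster; a minimal sketch with example values, not recommendations:

  <!-- Capture metastore events and enable the Change Manager (source cluster) -->
  <property>
    <name>hive.metastore.transactional.event.listeners</name>
    <value>org.apache.hive.hcatalog.listener.DbNotificationListener</value>
  </property>
  <property>
    <name>hive.metastore.dml.events</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.repl.cm.enabled</name>
    <value>true</value>
  </property>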
22 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
References: Hive Doc
https://cwiki.apache.org/confluence/display/Hive/Home
https://cwiki.apache.org/confluence/display/Hive/HiveReplicationv2Development
https://cwiki.apache.org/confluence/display/Hive/HiveReplicationDevelopment
https://cwiki.apache.org/confluence/display/Hive/Replication
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ImportExport
https://issues.apache.org/jira/browse/HIVE-14841
23 © Hortonworks Inc. Confidential 2011 – 2017. All Rights Reserved
THANK YOU!