© Cloudera, Inc. All rights reserved.
HBase Replication
Wellington Chevreuil
Overview
● Replication Basics
● Requirements
● HBase Shell Commands
● Implementation Details
● Monitoring
● Extra Tools
● Hands-on labs
Replication Basics
● Source-push strategy
● Master, Source, Originator - means the cluster sending data.
● Slave, Destination, Target - means cluster receiving data.
● Can be cyclic and allows for multiple masters and slaves
○ A master can have multiple slaves
○ A slave can have multiple masters
○ A cluster can perform both master/slave roles on a given topology
● Eventual consistency
● Asynchronous
● Configurable at column family level
● Relies on WAL data
○ Changes that bypass the WAL are not replicated, such as bulk loads, the truncate command, or writes made with skip WAL enabled.
● Tracked via ZooKeeper
● Work done by RegionServers
● Adds a source cluster ID to edit's metadata
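The cluster ID attached to each edit is what keeps a cyclic topology from looping forever. A minimal Python sketch of that idea follows, assuming a toy in-memory model: the `Cluster` and `Edit` classes and their methods are hypothetical names for illustration, not HBase APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Edit:
    row: str
    value: str
    # IDs of clusters that already saw this edit (toy model of the
    # source cluster ID kept in the edit's metadata).
    cluster_ids: frozenset


@dataclass
class Cluster:
    cluster_id: str
    peers: list = field(default_factory=list)
    table: dict = field(default_factory=dict)
    shipped: int = 0  # number of edits pushed to peers

    def put(self, row, value):
        """Client write: the edit originates on this cluster."""
        self._apply(Edit(row, value, frozenset({self.cluster_id})))

    def receive(self, edit):
        """Edit pushed by a peer: add our own ID before re-shipping."""
        self._apply(Edit(edit.row, edit.value, edit.cluster_ids | {self.cluster_id}))

    def _apply(self, edit):
        self.table[edit.row] = edit.value
        for peer in self.peers:
            # Skip peers that already saw the edit; this breaks the cycle.
            if peer.cluster_id not in edit.cluster_ids:
                self.shipped += 1
                peer.receive(edit)


a, b = Cluster("A"), Cluster("B")
a.peers.append(b)
b.peers.append(a)  # cyclic (master-master) topology

a.put("row1", "v1")
```

After the single `put`, both tables hold the row, cluster A has shipped one edit, and cluster B has shipped none, because the edit already carries A's ID.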
Requirements
● Every RegionServer must be reachable from every RegionServer of the other cluster
● The ZooKeeper quorum of the slaves must be accessible by the masters
● Table structure must be the same in master and slave clusters
○ The column family targeted for replication must match on master/slave clusters
● If the same ZooKeeper quorum is used for master/slave clusters, zookeeper.znode.parent must be different for each cluster
● Clusters can have varying sizes
● Clusters can have pre-existing data on target tables
○ In this case, only data added on master after replication has been enabled will be replicated
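The shared-quorum requirement can be checked mechanically from the peer cluster keys, which this deck's log samples show in the form `hosts:port:/znode`. The sketch below is a hedged illustration: `ClusterKey`, `parse_cluster_key` and `can_be_peers` are hypothetical helpers, not part of HBase.

```python
from typing import NamedTuple


class ClusterKey(NamedTuple):
    quorum: frozenset   # ZooKeeper hosts
    port: int           # ZooKeeper client port
    znode_parent: str   # value of zookeeper.znode.parent


def parse_cluster_key(key: str) -> ClusterKey:
    """Split 'host1,host2:2181:/hbase' into its three parts."""
    hosts, port, znode_parent = key.split(":")
    return ClusterKey(frozenset(hosts.split(",")), int(port), znode_parent)


def can_be_peers(source_key: str, peer_key: str) -> bool:
    """Clusters sharing a ZooKeeper quorum need distinct znode parents."""
    src, dst = parse_cluster_key(source_key), parse_cluster_key(peer_key)
    same_quorum = src.quorum == dst.quorum and src.port == dst.port
    return not same_quorum or src.znode_parent != dst.znode_parent
```

For example, `can_be_peers("zk1,zk2:2181:/hbase", "zk1,zk2:2181:/hbase")` is rejected, while the same quorum with `/hbase-dr` as the second znode parent is accepted.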
HBase Shell Commands
● add_peer
○ Adds a new slave to the current cluster.
● list_peers
○ Shows current list of slaves "known" by this cluster.
● disable_peer
○ Pauses replication, but keeps tracking new edits to be replicated.
● enable_peer
○ Resumes replication. All edits added since disable_peer execution will now be sent to related
slaves.
● remove_peer
○ Disables replication for the given slave.
○ No edits will be sent to the slave.
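The difference between disabling and removing a peer is easiest to see in a small model. The Python sketch below is an assumption-laden toy: `PeerQueue` is a hypothetical class that only mimics the semantics described for add_peer, disable_peer, enable_peer and remove_peer; it does not call HBase.

```python
from collections import deque


class PeerQueue:
    """Toy model of one peer's replication queue on the source cluster."""

    def __init__(self):
        self.enabled = True
        self.pending = deque()  # edits tracked but not yet shipped
        self.shipped = []       # edits delivered to the slave

    def on_edit(self, edit):
        # New edits are always tracked, even while the peer is disabled.
        self.pending.append(edit)
        self._ship()

    def disable(self):  # models disable_peer
        self.enabled = False

    def enable(self):   # models enable_peer
        self.enabled = True
        self._ship()

    def _ship(self):
        while self.enabled and self.pending:
            self.shipped.append(self.pending.popleft())


peers = {"1": PeerQueue()}        # models add_peer '1'
peers["1"].on_edit("e1")
peers["1"].disable()
peers["1"].on_edit("e2")          # tracked, not shipped
backlog = list(peers["1"].pending)
peers["1"].enable()               # backlog is shipped now
shipped = list(peers["1"].shipped)
del peers["1"]                    # models remove_peer: nothing more is sent
```

While the peer is disabled the backlog holds `e2`; once it is enabled again both edits have been shipped in order.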
HBase Shell Commands
● enable_table_replication
○ Sets the replication flag to true on all column families of the specified table.
● disable_table_replication
○ The opposite of the above: sets the replication flag to false on all column families of the specified table.
● append_peer_tableCFs, remove_peer_tableCFs, set_peer_tableCFs,
show_peer_tableCFs, update_peer_config, get_peer_config, list_peer_configs,
list_replicated_tables.
○ General admin commands that allow for changing/monitoring configuration of tables currently
targeted for replication
Implementation Details - Deployment Overview
● This is a deployment diagram in the context of replication only, so only the major components relevant to the replication flow are highlighted.
● Note the absence of HMasters on both the master (source) and slave (destination) clusters.
● ZooKeeper is of vital importance, as it keeps the registry of edits to be replicated, as well as the peers to replicate to.
● RSes on the master cluster depend on ZK from the slave cluster.
Implementation Details - Setup/Maintenance commands
● Shell commands interact directly with ZooKeeper.
● Replication state is kept in the master cluster's ZooKeeper znodes.
● There is no interaction with RSes when replication shell commands are run.
Implementation Details - Setup WAL and Replication
● RS init phase where replication service classes are created.
● Once replication-related classes are properly initialized, the Replication instance is added to the list of WALActionListener.
● The WALFactory instance is created, with the list of listeners containing the Replication instance.
Implementation Details - Setup WAL and Replication
● Replication-related classes are only initialised if "hbase.replication" is set to true.
● Replication Source/Sink implementation default: org.apache.hadoop.hbase.replication.regionserver.Replication
○ This is configurable by hbase.replication.source.service and hbase.replication.sink.service
● Initialisation happens between the following messages in the RS startup logs:
INFO org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty to master=...
INFO org.apache.hadoop.hbase.replication.ReplicationPeersZKImpl: Added new peer cluster=remote_peer_host:2181:/hbase
INFO org.apache.hadoop.hbase.wal.WALFactory: Instantiating WALProvider of type class org.apache.hadoop.hbase.wal.BoundedRegionGroupingProvider
Watch out for possible customer-specific configurations
Implementation Details - Setup WAL and Replication
● While the WAL-related classes are being created, the WAL file is rolled.
● Replication was added as a WAL listener earlier, so ReplicationSourceManager is notified about the log roll.
● Using ZooKeeper, ReplicationSourceManager adds the new WAL file to the queue of logs (this will be under the replication znodes).
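The listener flow on this slide can be sketched as a plain observer pattern. The Python model below is a simplification under stated assumptions: `Wal`, `ReplicationListener` and `ReplicationQueue` are hypothetical stand-ins for the WAL, the Replication listener and the ZooKeeper-backed queue, and an in-memory list replaces the znodes.

```python
class ReplicationQueue:
    """Stand-in for the queue of WAL files kept under the replication znodes."""

    def __init__(self):
        self.wals = []

    def add_log(self, wal_name):
        self.wals.append(wal_name)


class ReplicationListener:
    """Plays the role of the Replication instance registered as a WAL listener."""

    def __init__(self, queue):
        self.queue = queue

    def post_log_roll(self, new_wal):
        # Every new WAL is enqueued so its edits are eventually shipped.
        self.queue.add_log(new_wal)


class Wal:
    def __init__(self, listeners):
        self.listeners = listeners
        self.seq = 0
        self.roll()  # creating the WAL rolls it once, as on the slide

    def roll(self):
        self.seq += 1
        name = f"wal.{self.seq}"
        for listener in self.listeners:
            listener.post_log_roll(name)
        return name


queue = ReplicationQueue()
wal = Wal([ReplicationListener(queue)])  # first roll happens at creation
wal.roll()                               # a later roll adds a second entry
```

Because the listener is registered before the WAL is created, the very first WAL file is already in the queue when the RegionServer starts serving writes.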
Implementation Details - Setup WAL and Replication
● No replication-specific log message is recorded when the WAL file is rolled.
● ReplicationSourceManager code is notified about the new WAL file creation between the messages below:
INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: WAL configuration: blocksize=128 MB, ...
...
INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: New WAL /hbase/WALs/…
….
Implementation Details - Setup WAL and Replication
● Errors involving replication in this phase are mostly related to znode access, which prevents the ZK queue from being initialized:
ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Failed init
java.io.IOException: Failed replication handler create
at org.apache.hadoop.hbase.replication.regionserver.Replication.initialize(Replication.java:130)
at org.apache.hadoop.hbase.regionserver.HRegionServer.newReplicationInstance(HRegionServer.java:2662)
at org.apache.hadoop.hbase.regionserver.HRegionServer.createNewReplicationInstance(HRegionServer.java:2632)
at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1647)
at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1388)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:918)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.replication.ReplicationException: Could not initialize replication queues.
at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.init(ReplicationQueuesZKImpl.java:85)
at org.apache.hadoop.hbase.replication.regionserver.Replication.initialize(Replication.java:122)
... 6 more
Implementation Details - Start Replication Thread
● In the HRegionServer.startServiceThreads method, the replication source and sink threads are set up and started.
● ReplicationSourceManager initialization involves several steps, to be detailed next.
● The ReplicationSink instance is used to perform the actual sink if the cluster acts as a destination cluster. To be detailed later.
Implementation Details - Start Replication Thread
● Once ReplicationSourceManager.addSource has completed properly for each peer, the following message is logged:
INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Current list of replicators: [host-1,60020,1510938412878, host1,60020,1510929825829] other RSs: [host-1,60020,1510938412878]
● Upon startup, the ReplicationSource.run method also logs the message below:
INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicating 9fa10771-97b2-48ed-b635-b0bd474a99b2 -> 5f54f936-a5f8-4726-9d09-7bf1c709eeab
○ Since this happens asynchronously, it may occur before or after the previous message.
○ It should be logged for each peer id.
Implementation Details - New Peers
● ReplicationTrackerZKImpl receives notification about changes on replication znodes.
● New peer addition triggers peer list update on ReplicationPeersZKImpl.
● With at least one peer, ReplicationQueuesZKImpl will get notified about WAL file creation.
INFO org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl: /hbase/replication/peers znode expired, triggering peerListChanged event
...
INFO org.apache.hadoop.hbase.replication.ReplicationPeersZKImpl: Added new peer cluster=peer-host:2181:/hbase
Implementation Details - Shipping Edits
● Main work done by ReplicationSourceWorkerThread instances.
○ One per WAL group
○ Every WAL group has its own queue of WAL files to be processed.
○ Runs in the background indefinitely. Sleeps for replication.source.sleepforretries if the peer is disabled.
○ On each loop iteration:
■ Reads the current WAL being written.
■ Applies edit filters (keeps only edits for CFs marked for replication whose origin cluster ID is not the same as the peer's).
■ For the edits that pass the filters, connects to an RS on the remote cluster and sends them (via RPC).
■ Edits must be read (and processed) sequentially. If a shipment fails, replication does not progress for that WAL group, and lag may be seen.
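One iteration of the worker loop described above can be sketched in Python. This is a simplified model, not the HBase implementation: it treats each WAL as a list of edit dicts and ships a whole WAL per step, and `filter_edits`, `run_once` and the `ship` callbacks are hypothetical names.

```python
from collections import deque


def filter_edits(edits, replicated_cfs, peer_cluster_id):
    """Keep edits for replicated CFs that did not originate from the peer."""
    return [
        e for e in edits
        if e["cf"] in replicated_cfs and e["origin"] != peer_cluster_id
    ]


def run_once(wal_queue, ship, replicated_cfs, peer_cluster_id):
    """Process the head of one WAL group's queue; return True on progress."""
    if not wal_queue:
        return False
    batch = filter_edits(wal_queue[0], replicated_cfs, peer_cluster_id)
    if batch and not ship(batch):
        return False       # shipment failed: stay on this WAL, lag grows
    wal_queue.popleft()    # advance only after a successful shipment
    return True


received = []


def failing_ship(batch):
    return False           # models an unreachable or failing sink


def working_ship(batch):
    received.extend(batch)
    return True


wal_queue = deque([
    [{"cf": "cf1", "origin": "A", "row": "r1"},
     {"cf": "cf2", "origin": "A", "row": "r2"},    # cf2 is not replicated
     {"cf": "cf1", "origin": "B", "row": "r3"}],   # came from the peer itself
    [{"cf": "cf1", "origin": "A", "row": "r4"}],
])

stuck = run_once(wal_queue, failing_ship, {"cf1"}, "B")
queue_len_while_stuck = len(wal_queue)
while run_once(wal_queue, working_ship, {"cf1"}, "B"):
    pass
```

With the failing sink the queue keeps both WALs, which is the lag scenario from the slide; with the working sink only rows `r1` and `r4` reach the peer, in order.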
Implementation Details - Shipping Edits (Source Side)
● HBaseInterClusterReplicationEndpoint.replicate() method detailed flow
● Uses its own thread pool for performing RPC calls
● Replicator class implements java.util.concurrent.Callable for async execution.
Implementation Details - Shipping Edits (Source Side)
● Replicator uses SinkPeer to discover the remote RS responsible for running the sink.
● ReplicationProtbufUtil is used to convert the request to protobuf and perform the RPC.
Implementation Details - Shipping Edits (Destination Side)
● ReplicationSink uses default client API to process put/delete operations.
● The RS running the sink is not necessarily the one hosting the regions where the entries will be placed.
● Coprocessors may get invoked.
Monitoring
● Some classes provide additional TRACE/DEBUG messages that can be turned on for further troubleshooting.
● It is worth enabling these through the RS UI for specific classes only, instead of turning on TRACE for the whole HBase service:
○ ReplicationSource, HBaseReplicationEndpoint, HBaseInterClusterReplicationEndpoint
● JMX metrics might also help to get the state of replication:
○ shippedBatches, AgeOfLastShippedOP, logReadInBytes.
■ Global and per WAL group id.
● ReplicationStatisticsThread also logs replication stats every 5 minutes:
INFO org.apache.hadoop.hbase.replication.regionserver.Replication: Normal source for cluster 1: Total replicated edits: 2, current progress:
walGroup [host-1%2C60020%2C1511034265841.null0]: currently replicating from: hdfs://nameservice1/hbase/WALs/host-1,60020,1511034265841/host-1%2C60020%2C1511034265841.null0.1511196279542 at position: 83
Monitoring
● HBase shell status 'replication' command:
○ On source cluster:
1 live servers
Host-10-17-101-41.coe.cloudera.com:
SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=0, TimeStampsOfLastShippedOp=Mon Nov 20 10:02:05 PST 2017, Replication Lag=0
SINK : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Sat Nov 18 11:49:29 PST 2017
○ On destination cluster:
1 live servers
Host-10-17-103-206.coe.cloudera.com:
SOURCE:
SINK : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Mon Nov 20 08:40:19 PST 2017
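When this output is collected by a script, the SOURCE and SINK lines can be turned into key/value pairs. The Python sketch below assumes the line layout is exactly as in the sample above (it may differ between HBase versions); `parse_status_line` and `is_lagging` are hypothetical helpers.

```python
def parse_status_line(line: str) -> dict:
    """Parse a 'SOURCE: k=v, k=v' or 'SINK : k=v' line into a dict."""
    role, _, rest = line.partition(":")  # split on the first colon only
    metrics = {}
    for pair in rest.split(", "):
        pair = pair.strip()
        if not pair:
            continue  # e.g. an empty 'SOURCE:' line on a destination cluster
        key, _, value = pair.partition("=")
        metrics[key.strip()] = value.strip()
    return {"role": role.strip(), "metrics": metrics}


def is_lagging(parsed: dict, max_queue: int = 1) -> bool:
    """Flag a source whose WAL queue is longer than max_queue (assumed threshold)."""
    return int(parsed["metrics"].get("SizeOfLogQueue", 0)) > max_queue


source = parse_status_line(
    "SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=0, "
    "TimeStampsOfLastShippedOp=Mon Nov 20 10:02:05 PST 2017, Replication Lag=0"
)
empty_source = parse_status_line("SOURCE:")
sink = parse_status_line(
    "SINK : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Mon Nov 20 08:40:19 PST 2017"
)
```

The `max_queue` threshold is an arbitrary illustration value; a suitable limit depends on the write rate and WAL roll settings of the cluster.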
Monitoring
● VerifyReplication
○ MR job that compares the records of a table in the source and destination clusters.
○ Prints counters with its findings:
1 test-1
...
17/11/20 10:43:12 INFO mapreduce.Job: map 0% reduce 0%
17/11/20 10:43:18 INFO mapreduce.Job: map 33% reduce 0%
17/11/20 10:43:19 INFO mapreduce.Job: map 67% reduce 0%
17/11/20 10:43:23 INFO mapreduce.Job: map 100% reduce 0%
17/11/20 10:43:24 INFO mapreduce.Job: Job job_1506585949780_0005 completed successfully
…
org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters
BADROWS=25
GOODROWS=11
ONLY_IN_SOURCE_TABLE_ROWS=25
...
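The counters at the end of the job output are easy to check from a script. The Python sketch below assumes the counters appear one per line as `NAME=VALUE`, as in the sample above; `parse_counters` and `in_sync` are hypothetical helpers, and treating BADROWS=0 as "in sync" is an interpretation of the counters rather than something the tool reports itself.

```python
def parse_counters(text: str) -> dict:
    """Collect NAME=VALUE lines with integer values from job output."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and value.isdigit():
            counters[name] = int(value)
    return counters


def in_sync(counters: dict) -> bool:
    """Assumed rule: no bad rows means the compared tables match."""
    return counters.get("BADROWS", 0) == 0


sample = """
org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters
BADROWS=25
GOODROWS=11
ONLY_IN_SOURCE_TABLE_ROWS=25
"""
counters = parse_counters(sample)
```

In the sample, all 25 bad rows are also counted as ONLY_IN_SOURCE_TABLE_ROWS, which points at rows missing on the destination rather than rows with differing content.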
Monitoring
● DumpReplicationQueues
hbase org.apache.hadoop.hbase.replication.regionserver.DumpReplicationQueues --distributed
...
Dumping replication peers and configurations:
Peer: 2
State: ENABLED
Cluster Name: clusterKey=host-10-17-103-187.coe.cloudera.com,host-10-17-103-189.coe.cloudera.com,host-10-17-103-193.coe.cloudera.com:2181:/hbase,replicationEndpointImpl=null
Peer Table CFs: null
…
Dumping replication queue info for RegionServer: [host-10-17-101-41.coe.cloudera.com,60020,1511971261591]
replication queue: 1
Replication position for host-10-17-101-41.coe.cloudera.com%2C60020%2C1511971261591.null0.1512140473468: 13227
...
Extra Tools
● If data already exists in the source or destination cluster tables, some tools can be used to sync it:
○ CopyTable
■ https://hbase.apache.org/book.html#copy.table
○ Export Snapshots
■ https://hbase.apache.org/book.html#ops.snapshots.export
○ Bulk Load
■ https://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/
○ HashTable/SyncTable
■ Now documented here.
■ Best option, can be used even after replication is already enabled.
■ Allows for syncing deleted rows.
■ Only available from CDH 5.9.0 onwards
Extra Tools
● HashTable/SyncTable:
○ Two MR jobs
■ org.apache.hadoop.hbase.mapreduce.HashTable
■ org.apache.hadoop.hbase.mapreduce.SyncTable
○ Usage:
■ First, run the HashTable MR job on the cluster whose state should be propagated to the remote peer. For example, to sync the state of table "test-1" on the destination cluster with the state from the source cluster, run the command below at the source:
$ hbase org.apache.hadoop.hbase.mapreduce.HashTable test-1 /tmp/test-1
● The first param is the table name, and the second param is an HDFS path where the HashTable job should output the table's summary
Extra Tools
● HashTable/SyncTable:
○ Usage
■ Once HashTable has finished on the source cluster, run SyncTable on the destination cluster:
$ hbase org.apache.hadoop.hbase.mapreduce.SyncTable --sourcezkcluster=source_zk:2181:/hbase hdfs://source_nn:8020/tmp/test-1 test-1 test-1
■ The first and second params are the ZK address of the source cluster and the HashTable output path on the source cluster's NN, respectively
■ The last two params are the table names on the source and destination clusters
■ This command causes the table data on the destination cluster to be in sync with the source cluster
● If the source cluster had more rows prior to the command, these additional rows are copied to the destination.
● If the destination cluster had more rows than the source, these rows are deleted from the destination.
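The hash-then-sync idea behind the two jobs can be illustrated with a small in-memory model. The Python sketch below is only a conceptual approximation: tables are plain dicts with string row keys, every function name is hypothetical, and a mismatching range is rewritten wholesale, whereas the real jobs run as MapReduce over HBase tables.

```python
import hashlib


def rows_in(table, start, stop):
    """Sorted rows with start <= key < stop (stop=None means unbounded)."""
    return sorted(
        (k, v) for k, v in table.items()
        if k >= start and (stop is None or k < stop)
    )


def digest(rows):
    return hashlib.sha256(repr(rows).encode()).hexdigest()


def hash_table(source, batch_size=2):
    """Model of HashTable: one hash per contiguous key range of the source."""
    keys = sorted(source)
    starts = [""] + keys[batch_size::batch_size]
    stops = starts[1:] + [None]
    return [(a, b, digest(rows_in(source, a, b))) for a, b in zip(starts, stops)]


def sync_table(hashes, source, dest):
    """Model of SyncTable: rewrite only the ranges whose hash differs."""
    rewritten = 0
    for start, stop, expected in hashes:
        if digest(rows_in(dest, start, stop)) == expected:
            continue                               # range already matches
        for key, _ in rows_in(dest, start, stop):
            del dest[key]                          # drops rows only in destination
        dest.update(rows_in(source, start, stop))  # copies rows from source
        rewritten += 1
    return rewritten


source = {"a": 1, "b": 2, "c": 3, "d": 4}
dest = {"a": 1, "b": 2, "c": 9, "e": 5}
hashes = hash_table(source)
ranges_rewritten = sync_table(hashes, source, dest)
```

Only the second key range differs here, so one range is rewritten: the stale value for `c` is replaced, `d` is copied over, and `e`, which exists only on the destination, is deleted.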
Lab Exercises
1. Problem 1: Replication related znodes not readable by RSes
2. Problem 2: Remote cluster not reachable by source cluster
3. Problem 3: Remote cluster is reachable, but sinks are not completing

More Related Content

PDF
HBase Advanced - Lars George
PPT
HBaseCon 2013: Apache HBase Replication
PPTX
Apache HBase Performance Tuning
PDF
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
PPTX
Performance Optimizations in Apache Impala
PDF
HBase Storage Internals
PDF
Facebook Messages & HBase
PPTX
The Impala Cookbook
HBase Advanced - Lars George
HBaseCon 2013: Apache HBase Replication
Apache HBase Performance Tuning
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
Performance Optimizations in Apache Impala
HBase Storage Internals
Facebook Messages & HBase
The Impala Cookbook

What's hot (20)

PPTX
Apache Phoenix + Apache HBase
PDF
The State of HBase Replication
PPTX
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
PPTX
HBase in Practice
PPTX
HBaseCon 2015: HBase Performance Tuning @ Salesforce
PDF
Apache HBase Improvements and Practices at Xiaomi
PPTX
HBaseCon 2013: Apache HBase Table Snapshots
PPTX
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
PDF
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
PPTX
HBase Low Latency
PPTX
Tuning Apache Phoenix/HBase
PDF
What's New in Apache Hive
PPTX
Apache hive
PPT
Hive User Meeting August 2009 Facebook
PPTX
Kudu Deep-Dive
PDF
The Top 5 Reasons to Deploy Your Applications on Oracle RAC
PPTX
Apache Kudu: Technical Deep Dive


PPTX
Scaling HBase for Big Data
PPTX
Apache hive introduction
PPT
Hadoop Security Architecture
Apache Phoenix + Apache HBase
The State of HBase Replication
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
HBase in Practice
HBaseCon 2015: HBase Performance Tuning @ Salesforce
Apache HBase Improvements and Practices at Xiaomi
HBaseCon 2013: Apache HBase Table Snapshots
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBase Low Latency
Tuning Apache Phoenix/HBase
What's New in Apache Hive
Apache hive
Hive User Meeting August 2009 Facebook
Kudu Deep-Dive
The Top 5 Reasons to Deploy Your Applications on Oracle RAC
Apache Kudu: Technical Deep Dive


Scaling HBase for Big Data
Apache hive introduction
Hadoop Security Architecture
Ad

Similar to HBase replication (20)

PDF
Highly Available Load Balanced Galera MySql Cluster
PPTX
Hbase 89 fb online configuration
PPTX
How Yelp does Service Discovery
PDF
What’s new in Galera 4
PDF
Galera Cluster 4 presentation at Percona Live Austin 2019
PDF
HBase Tales From the Trenches - Short stories about most common HBase operati...
PPTX
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
PDF
TechDay - Cambridge 2016 - OpenNebula at Harvard Univerity
PDF
MySQL Webinar 2/4 Performance tuning, hardware, optimisation
PDF
M|18 Under the Hood: Galera Cluster
PPT
Galera webinar migration to galera cluster from my sql async replication
PDF
HBase tales from the trenches
PDF
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
PPTX
MySqL Failover by Weatherly Cloud Computing USA
PPTX
My sql failover test using orchestrator
PDF
Scale Apache with Nginx
PDF
Openstack HA
PPT
Deploy Rails Application by Capistrano
PDF
Multi Source Replication With MySQL 5.7 @ Verisure
PDF
03 h base-2-installation_andshell
Highly Available Load Balanced Galera MySql Cluster
Hbase 89 fb online configuration
How Yelp does Service Discovery
What’s new in Galera 4
Galera Cluster 4 presentation at Percona Live Austin 2019
HBase Tales From the Trenches - Short stories about most common HBase operati...
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
TechDay - Cambridge 2016 - OpenNebula at Harvard Univerity
MySQL Webinar 2/4 Performance tuning, hardware, optimisation
M|18 Under the Hood: Galera Cluster
Galera webinar migration to galera cluster from my sql async replication
HBase tales from the trenches
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
MySqL Failover by Weatherly Cloud Computing USA
My sql failover test using orchestrator
Scale Apache with Nginx
Openstack HA
Deploy Rails Application by Capistrano
Multi Source Replication With MySQL 5.7 @ Verisure
03 h base-2-installation_andshell
Ad

More from wchevreuil (9)

PDF
Cloudera Enabling Native Integration of NoSQL HBase with Cloud Providers.pdf
PDF
HBase System Tables / Metadata Info
PDF
HDFS client write/read implementation details
PDF
HBase RITs
PPTX
Hbasecon2019 hbck2 (1)
PDF
Web hdfs and httpfs
PPT
Hadoop tuning
PPT
I nd t_bigdata(1)
PDF
Hadoop - TDC 2012
Cloudera Enabling Native Integration of NoSQL HBase with Cloud Providers.pdf
HBase System Tables / Metadata Info
HDFS client write/read implementation details
HBase RITs
Hbasecon2019 hbck2 (1)
Web hdfs and httpfs
Hadoop tuning
I nd t_bigdata(1)
Hadoop - TDC 2012

Recently uploaded (20)

PDF
The Dynamic Duo Transforming Financial Accounting Systems Through Modern Expe...
PPTX
Trending Python Topics for Data Visualization in 2025
PDF
Types of Token_ From Utility to Security.pdf
PPTX
"Secure File Sharing Solutions on AWS".pptx
DOCX
Modern SharePoint Intranet Templates That Boost Employee Engagement in 2025.docx
PDF
AI Guide for Business Growth - Arna Softech
PDF
Ableton Live Suite for MacOS Crack Full Download (Latest 2025)
PPTX
Tech Workshop Escape Room Tech Workshop
PDF
Multiverse AI Review 2025: Access All TOP AI Model-Versions!
PDF
MCP Security Tutorial - Beginner to Advanced
PPTX
Weekly report ppt - harsh dattuprasad patel.pptx
PDF
AI/ML Infra Meetup | LLM Agents and Implementation Challenges
PDF
Time Tracking Features That Teams and Organizations Actually Need
PDF
DuckDuckGo Private Browser Premium APK for Android Crack Latest 2025
PDF
Salesforce Agentforce AI Implementation.pdf
PPTX
WiFi Honeypot Detecscfddssdffsedfseztor.pptx
PDF
AI-Powered Threat Modeling: The Future of Cybersecurity by Arun Kumar Elengov...
PPTX
Monitoring Stack: Grafana, Loki & Promtail
PDF
Microsoft Office 365 Crack Download Free
PDF
iTop VPN Crack Latest Version Full Key 2025
The Dynamic Duo Transforming Financial Accounting Systems Through Modern Expe...
Trending Python Topics for Data Visualization in 2025
Types of Token_ From Utility to Security.pdf
"Secure File Sharing Solutions on AWS".pptx
Modern SharePoint Intranet Templates That Boost Employee Engagement in 2025.docx
AI Guide for Business Growth - Arna Softech
Ableton Live Suite for MacOS Crack Full Download (Latest 2025)
Tech Workshop Escape Room Tech Workshop
Multiverse AI Review 2025: Access All TOP AI Model-Versions!
MCP Security Tutorial - Beginner to Advanced
Weekly report ppt - harsh dattuprasad patel.pptx
AI/ML Infra Meetup | LLM Agents and Implementation Challenges
Time Tracking Features That Teams and Organizations Actually Need
DuckDuckGo Private Browser Premium APK for Android Crack Latest 2025
Salesforce Agentforce AI Implementation.pdf
WiFi Honeypot Detecscfddssdffsedfseztor.pptx
AI-Powered Threat Modeling: The Future of Cybersecurity by Arun Kumar Elengov...
Monitoring Stack: Grafana, Loki & Promtail
Microsoft Office 365 Crack Download Free
iTop VPN Crack Latest Version Full Key 2025

HBase replication

  • 1. © Cloudera, Inc. All rights reserved. HBase Replication Wellington Chevreuil
  • 2. © Cloudera, Inc. All rights reserved. Overview ● Replication Basics ● Requirements ● HBase Shell Commands ● Implementation Details ● Monitoring ● Extra Tools ● Hands-on labs
  • 3. © Cloudera, Inc. All rights reserved. Replication Basics ● Source-push strategy ● Master, Source, Originator - means the cluster sending data. ● Slave, Destination, Target - means cluster receiving data. ● Can be cyclic and allows for multiple masters and slaves ○ A master can have multiple slaves ○ A slave can have multiple masters ○ A cluster can perform both master/slave roles on a given topology ● Eventual consistency ● Asynchronous ● Configurable at column family level ● Relies on WAL data ○ Any changes that bypass WAL won't be replicated, such as bulk load, truncate command, or if skip wal has been enabled. ● Tracked via ZooKeeper ● Work done by RegionServers ● Adds a source cluster ID to edit's metadata
  • 4. © Cloudera, Inc. All rights reserved. Requirements ● All RegionServers must be accessible from all RegionServers from each cluster ● Zookeeper Quorum from slaves must be accessible by masters ● Table structure must be the same in master and slave clusters ○ The column family target for replication must match on master/slave clusters ● If same Zookeeper Quorum is used for master/slave clusters, zookeeper.znode.parent must be different ● Clusters can have varying sizes ● Clusters can have pre-existing data on target tables ○ In this case, only data added on master after replication has been enabled will be replicated
  • 5. © Cloudera, Inc. All rights reserved. HBase Shell Commands ● add_peer ○ Sets a new slave to the current cluster. ● list_peers ○ Shows current list of slaves "known" by this cluster. ● disable_peer ○ Pause replication, but stays tracking new edits to be replicated. ● enable_peer ○ Resumes replication. All edits added since disable_peer execution will now be sent to related slaves. ● remove_peer ○ Disables replication for the given slave. ○ No edits will be sent to the slave.
  • 6. © Cloudera, Inc. All rights reserved. HBase Shell Commands ● enable_table_replication ○ Sets replication flag as true on all column families from specified table. ● disable_table_replication ○ The opposite from the above. ● append_peer_tableCFs, remove_peer_tableCFs, set_peer_tableCFs, show_peer_tableCFs, update_peer_config, get_peer_config, list_peer_configs, list_replicated_tables. ○ General admin commands that allow for changing/monitoring configuration of tables currently targeted for replication
  • 7. © Cloudera, Inc. All rights reserved. Implementation Details - Deployment Overview ● This is a deployment diagram in the context of replication only, so only major replication flow relevant components are highlighted. ● Note no presence of HMasters either on master (source) or slave (destination) clusters. ● Zookeeper is of vital importance, as it keeps the registry of edits to be replicated, as well as peers to replicate to. ● RSes on Master cluster depend on ZK from Slave cluster.
  • 8. © Cloudera, Inc. All rights reserved. Implementation Details - Setup/Maintenance commands ● Shell commands interact directly with Zookeeper. ● Replication is kept on master cluster's Zookeeper znodes. ● No interaction within RSes when replication shell commands are ran.
  • 9. © Cloudera, Inc. All rights reserved. Implementation Details - Setup WAL and Replication ● RS init phase where replication service classes are created. ● Once replication related classes are properly initialized, Replication instance is added to the list of WALActionListener. ● WALFactory instance is created, with the list of listeners containing Replication instance.
  • 10. © Cloudera, Inc. All rights reserved. Implementation Details - Setup WAL and Replication ● Replication related classes are only initialised if "hbase.replication" is set to true. ● This will happen between the following log messages from RS startup logs: ● Replication Source/Sink implementation default: org.apache.hadoop.hbase.replication.regionserver.Replication ○ This is configurable by hbase.replication.source.service and hbase.replication.source.service INFO org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty to master=... INFO org.apache.hadoop.hbase.replication.ReplicationPeersZKImpl: Added new peer cluster=remote_peer_host:2181:/hbase INFO org.apache.hadoop.hbase.wal.WALFactory: Instantiating WALProvider of type class org.apache.hadoop.hbase.wal.BoundedRegionGroupingProvider Watch out for possible customer specific configurations
  • 11. © Cloudera, Inc. All rights reserved. Implementation Details - Setup WAL and Replication
  • 12. © Cloudera, Inc. All rights reserved. Implementation Details - Setup WAL and Replication ● During WAL related classes creation, WAL file is rolled. ● Replication was added as a WAL listener before, so ReplicationSourceManager will be notified about log roll. ● Using Zookeeper, ReplicationSourceManager adds the new WAL file to the queue of logs (this will be under replication znodes).
  • 13. © Cloudera, Inc. All rights reserved. Implementation Details - Setup WAL and Replication ● Over WAL file rolling, no replication specific log message is recorded. ● ReplicationSourceManager code will be notified about new WAL file creation between below messages: INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: WAL configuration: blocksize=128 MB, ... ... INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: New WAL /hbase/WALs/… ….
  • 14. © Cloudera, Inc. All rights reserved. Implementation Details - Setup WAL and Replication ● Potential errors involving replication on this phase will be mostly related to znodes access, preventing ZK queue from being initialized: ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Failed init java.io.IOException: Failed replication handler create at org.apache.hadoop.hbase.replication.regionserver.Replication.initialize(Replication.java:130) at org.apache.hadoop.hbase.regionserver.HRegionServer.newReplicationInstance(HRegionServer.java:2662) at org.apache.hadoop.hbase.regionserver.HRegionServer.createNewReplicationInstance(HRegionServer.java:2632) at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1647) at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1388) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:918) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hbase.replication.ReplicationException: Could not initialize replication queues. at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.init(ReplicationQueuesZKImpl.java:85) at org.apache.hadoop.hbase.replication.regionserver.Replication.initialize(Replication.java:122) ... 6 more
  • 15. © Cloudera, Inc. All rights reserved. Implementation Details - Start Replication Thread ● From HRegionServer.startServiceThreads method, replication source and sink threads are set and started. ● ReplicationSourceManager initialization involves several steps, to be detailed next. ● ReplicationSink instance will be used to perform the actual sink if the cluster act as a destination cluster. To be detailed later.
  • 16. © Cloudera, Inc. All rights reserved. Implementation Details - Start Replication Thread
  • 17. © Cloudera, Inc. All rights reserved. Implementation Details - Start Replication Thread ● Once ReplicationSourceManager.addSource completed properly for each peer, following message would be seen: ● Upon startup, ReplicationSource.run method will also log below message: ● Since this is asynchronously, it may occur before or after the previous message. ● It should be logged for each peer id. INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Current list of replicators: [host-1,60020,1510938412878, host1,60020,1510929825829] other RSs: [host-1,60020,1510938412878] … INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicating 9fa10771-97b2-48ed-b635-b0bd474a99b2 -> 5f54f936-a5f8-4726-9d09-7bf1c709eeab
  • 18. © Cloudera, Inc. All rights reserved. Implementation Details - New Peers ● ReplicationTrackerZKImpl receives notification about changes on replication znodes. ● New peer addition triggers peer list update on ReplicationPeersZKImpl. ● With at least one peer, ReplicationQueuesZKImpl will get notified about WAL file creation. INFO org.apache.hadoop.hbase.replication.ReplicationTrackerZKImpl: /hbase/replication/peers znode expired, triggering peerListChanged event ... INFO org.apache.hadoop.hbase.replication.ReplicationPeersZKImpl: Added new peer cluster=peer-host:2181:/hbase
  • 19. © Cloudera, Inc. All rights reserved. Implementation Details - Shipping Edits ● Main work done by ReplicationSourceWorkerThread instances. ○ One per WAL group ○ Every WAL group has its own queue of WAL files to be processed. ○ Runs in the background indefinitely. Will sleep for replication.source.sleepforretries if peer is disabled. ○ On each loop iteration: ■ Reads current WAL being written. ■ Apply editlog filters (get only edits for CFs marked for replication, whose cluster origin ID is not same as peer). ■ For editlogs filtered, connect to a RS on the remote cluster and send those (via RPC). ■ Edits must be read (and processed) sequentially. If shipment fails, replication will not progress for that WAL group, and lags may be seen
  • 20. Implementation Details - Shipping Edits (Source Side)
  • 21. Implementation Details - Shipping Edits (Source Side)
    ● Detailed flow of the HBaseInterClusterReplicationEndpoint.replicate() method.
    ● Uses its own thread pool to perform RPC calls.
    ● The Replicator class implements java.util.concurrent.Callable for asynchronous execution.
  • 22. Implementation Details - Shipping Edits (Source Side)
    ● Replicator uses SinkPeer to discover the remote RS that will run the sink.
    ● ReplicationProtbufUtil is used to convert the request to protobuf and perform the RPC.
  • 23. Implementation Details - Shipping Edits (Destination Side)
    ● ReplicationSink uses the default client API to process put/delete operations.
    ● The RS running the sink is not necessarily the one hosting the regions where the entries will be written.
    ● Coprocessors may get invoked.
  • 24. Monitoring
    ● Some classes provide additional TRACE/DEBUG messages that can be turned on for further troubleshooting.
    ● It is worth enabling them through the RS UI for specific classes only, rather than turning on TRACE for the whole HBase service:
      ○ ReplicationSource, HBaseReplicationEndpoint, HBaseInterClusterReplicationEndpoint
    ● JMX metrics can also help assess the state of replication:
      ○ shippedBatches, ageOfLastShippedOp, logReadInBytes.
        ■ Global and per WAL group id.
    ● ReplicationStatisticsThread also logs replication stats every 5 minutes:
    INFO org.apache.hadoop.hbase.replication.regionserver.Replication: Normal source for cluster 1: Total replicated edits: 2, current progress: walGroup [host-1%2C60020%2C1511034265841.null0]: currently replicating from: hdfs://nameservice1/hbase/WALs/host-1,60020,1511034265841/host-1%2C60020%2C1511034265841.null0.1511196279542 at position: 83
  • 25. Monitoring
    ● HBase shell status 'replication' command:
      ○ On source cluster:
        1 live servers
        Host-10-17-101-41.coe.cloudera.com:
        SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=0, TimeStampsOfLastShippedOp=Mon Nov 20 10:02:05 PST 2017, Replication Lag=0
        SINK : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Sat Nov 18 11:49:29 PST 2017
      ○ On destination cluster:
        1 live servers
        Host-10-17-103-206.coe.cloudera.com:
        SOURCE:
        SINK : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Mon Nov 20 08:40:19 PST 2017
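When this output is collected by a script, the SOURCE line can be split into key/value pairs for simple checks. The sketch below assumes the line format shown in the sample output above; parse_source_line, needs_attention, and the threshold parameters are hypothetical names, not part of HBase.

```python
from typing import Dict


def parse_source_line(line: str) -> Dict[str, str]:
    """Split a 'SOURCE: k1=v1, k2=v2, ...' line into a dict of strings.

    Assumes values contain no commas, which holds for the sample output.
    An empty 'SOURCE:' line (as seen on the destination) yields {}.
    """
    _, _, body = line.partition("SOURCE:")
    fields: Dict[str, str] = {}
    for part in body.split(","):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def needs_attention(fields: Dict[str, str],
                    max_lag: int = 0,
                    max_queue: int = 0) -> bool:
    """Flag a source whose lag or WAL queue exceeds illustrative thresholds."""
    lag = int(fields.get("Replication Lag", 0))
    queue = int(fields.get("SizeOfLogQueue", 0))
    return lag > max_lag or queue > max_queue


if __name__ == "__main__":
    sample = ("SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=0, "
              "TimeStampsOfLastShippedOp=Mon Nov 20 10:02:05 PST 2017, "
              "Replication Lag=0")
    parsed = parse_source_line(sample)
    print(parsed["PeerID"], needs_attention(parsed))
```

Suitable thresholds depend on the workload; the defaults here are placeholders.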
  • 26. Monitoring
    ● VerifyReplication
      ○ MR job that compares the records of a table in the source and destination clusters.
      ○ Prints counters summarizing its findings:
    1 test-1 ...
    17/11/20 10:43:12 INFO mapreduce.Job: map 0% reduce 0%
    17/11/20 10:43:18 INFO mapreduce.Job: map 33% reduce 0%
    17/11/20 10:43:19 INFO mapreduce.Job: map 67% reduce 0%
    17/11/20 10:43:23 INFO mapreduce.Job: map 100% reduce 0%
    17/11/20 10:43:24 INFO mapreduce.Job: Job job_1506585949780_0005 completed successfully
    …
    org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters
    BADROWS=25
    GOODROWS=11
    ONLY_IN_SOURCE_TABLE_ROWS=25
    ...
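The counters printed by the job can be pulled out of captured output with a small script. This is a minimal sketch that assumes the NAME=value layout shown in the sample above; parse_counters and tables_match are hypothetical helper names.

```python
import re
from typing import Dict


def parse_counters(job_output: str) -> Dict[str, int]:
    """Collect upper-case NAME=number pairs from captured job output."""
    return {name: int(value)
            for name, value in re.findall(r"\b([A-Z_]+)=(\d+)", job_output)}


def tables_match(counters: Dict[str, int]) -> bool:
    """Treat the tables as in sync only when no bad rows were reported."""
    return counters.get("BADROWS", 0) == 0


if __name__ == "__main__":
    sample = """
    17/11/20 10:43:24 INFO mapreduce.Job: Job job_1506585949780_0005 completed successfully
    BADROWS=25
    GOODROWS=11
    ONLY_IN_SOURCE_TABLE_ROWS=25
    """
    counters = parse_counters(sample)
    print(counters, tables_match(counters))
```

With the sample counters, the 25 bad rows are all rows that exist only in the source table.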
  • 27. Monitoring
    ● DumpReplicationQueues
    hbase org.apache.hadoop.hbase.replication.regionserver.DumpReplicationQueues --distributed
    ...
    Dumping replication peers and configurations:
    Peer: 2
    State: ENABLED
    Cluster Name: clusterKey=host-10-17-103-187.coe.cloudera.com,host-10-17-103-189.coe.cloudera.com,host-10-17-103-193.coe.cloudera.com:2181:/hbase,replicationEndpointImpl=null
    Peer Table CFs: null
    …
    Dumping replication queue info for RegionServer: [host-10-17-101-41.coe.cloudera.com,60020,1511971261591]
    replication queue: 1
    Replication position for host-10-17-101-41.coe.cloudera.com%2C60020%2C1511971261591.null0.1512140473468: 13227
    ...
  • 28. Extra Tools
    ● If data already exists in the source or destination cluster tables, the following tools can be used to sync it:
      ○ CopyTable
        ■ https://hbase.apache.org/book.html#copy.table
      ○ Export Snapshots
        ■ https://hbase.apache.org/book.html#ops.snapshots.export
      ○ Bulk Load
        ■ https://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/
      ○ HashTable/SyncTable
        ■ Now documented here.
        ■ Best option; can be used even after replication is already enabled.
        ■ Allows for syncing deleted rows.
        ■ Only available from CDH 5.9.0 onwards.
  • 29. Extra Tools
    ● HashTable/SyncTable:
      ○ Two MR jobs
        ■ org.apache.hadoop.hbase.mapreduce.HashTable
        ■ org.apache.hadoop.hbase.mapreduce.SyncTable
      ○ Usage:
        ■ First, run the HashTable MR job on the cluster whose state should be propagated to the remote peer. For example, to sync the state of table "test-1" on the destination cluster with its state on the source cluster, run the following on the source:
        $ hbase org.apache.hadoop.hbase.mapreduce.HashTable test-1 /tmp/test-1
          ● The first param is the table name, and the second param is an HDFS path where the HashTable job writes the table's summary.
  • 30. Extra Tools
    ● HashTable/SyncTable:
      ○ Usage
        ■ Once HashTable has finished on the source cluster, run SyncTable on the destination cluster:
        $ hbase org.apache.hadoop.hbase.mapreduce.SyncTable --sourcezkcluster=source_zk:2181:/hbase hdfs://source_nn:8020/tmp/test-1 test-1 test-1
        ■ The first and second params are the ZK address of the source cluster and the HDFS URI of the HashTable output (source NN address plus output path), respectively.
        ■ The last two params are the table names on the source and destination clusters.
        ■ This command causes the table data on the destination cluster to be in sync with the source cluster:
          ● If the source cluster had more rows prior to the command, these additional rows are copied to the destination.
          ● If the destination cluster had more rows than the source, these rows are deleted from the destination.
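The two-step idea (hash key ranges on the source, then compare and repair only the mismatching ranges on the destination) can be illustrated with a toy in-memory model. This is a sketch under simplifying assumptions, not the actual MR implementation: tables are plain dicts of row key to value, and hash_table, sync_table, and batch_size are hypothetical names.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# (start key inclusive, stop key exclusive or None for "end of table", digest)
Range = Tuple[str, Optional[str], str]


def hash_rows(rows: List[Tuple[str, str]]) -> str:
    """Digest of an ordered list of (row key, value) pairs."""
    h = hashlib.sha256()
    for key, value in rows:
        h.update(repr((key, value)).encode("utf-8"))
    return h.hexdigest()


def in_range(key: str, start: str, stop: Optional[str]) -> bool:
    return key >= start and (stop is None or key < stop)


def hash_table(source: Dict[str, str], batch_size: int = 2) -> List[Range]:
    """Step 1, run on the source: one digest per contiguous key range."""
    keys = sorted(source)
    ranges: List[Range] = []
    for i in range(0, len(keys), batch_size):
        start = keys[i] if i > 0 else ""  # first range covers the table start
        nxt = i + batch_size
        stop = keys[nxt] if nxt < len(keys) else None
        rows = [(k, source[k]) for k in keys[i:nxt]]
        ranges.append((start, stop, hash_rows(rows)))
    if not ranges:  # empty source: a single range spanning everything
        ranges.append(("", None, hash_rows([])))
    return ranges


def sync_table(ranges: List[Range], source: Dict[str, str],
               dest: Dict[str, str]) -> Dict[str, str]:
    """Step 2, run on the destination: repair only mismatching ranges."""
    for start, stop, digest in ranges:
        dest_rows = sorted((k, v) for k, v in dest.items()
                           if in_range(k, start, stop))
        if hash_rows(dest_rows) == digest:
            continue  # range already identical, nothing to transfer
        for key, _ in dest_rows:  # drop stale or extra destination rows
            del dest[key]
        for key, value in source.items():  # copy the source rows in range
            if in_range(key, start, stop):
                dest[key] = value
    return dest


if __name__ == "__main__":
    src = {"r1": "a", "r2": "b", "r3": "c"}
    dst = {"r1": "a", "r2": "x", "r4": "z"}
    print(sync_table(hash_table(src), src, dst) == src)
```

After the sync, the destination holds exactly the source rows: the differing value for r2 is overwritten, r3 is copied, and the extra row r4 is deleted.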
  • 31. Lab Exercises
    1. Problem 1: Replication related znodes not readable by RSes
    2. Problem 2: Remote cluster not reachable by source cluster
    3. Problem 3: Remote cluster is reachable, but sinks are not completing