THE COMMUNITY EVENT FOR
APACHE HBASE™
BDS: A data synchronization
platform for HBase
熊嘉男 (侧田)
Ali-HBase data pipeline lead
Requirement
• Can HBase support cross-version migration without downtime?
• Can HBase back up data to OSS or other storage?
• Can HBase replicate incremental data to MQ, ES, and Solr?
• Can incremental data be replicated from RDS to HBase?
• Can HBase data be archived to a Spark cluster for offline analysis?
• HBase high availability
…
Challenges
HBase cluster migration
Migration steps
• Table structure transformation
• Real-time data replication
  • Client double write
  • HBase Replication
• Full data migration
  • DataX
  • CopyTable
  • Create Snapshot & Export Snapshot
• Data consistency verification
Defects
• Cross-version migration compatibility issues
• Impact on business
• Lack of integrated solutions
Heterogeneous Data Transmission
• Heterogeneous full data migration
  • DataX
  • Sqoop
• Heterogeneous real-time data replication
  • HBase real-time data export
  • Custom Replication Endpoint
  • Custom Replication Sink
BDS
High-Level Architecture
• Master and slave nodes
• Stateless slaves
• Plug-in mode
• Higher scalability and better performance
Technical Detail
HBase full data migration
[Figure: full data migration diagram; labels not recoverable from the slide export]
• Avoids impact on the business
  • Only accesses HDFS
  • Dynamic migration rate
  • Decoupled from HBase
• One-click migration
  • Creates tables automatically
  • Detects region changes
  • Detects HFile compactions
• Efficient
  • 100 MB/s (single node)
  • Higher scalability
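The slides do not say how BDS adjusts its migration rate. As one possible illustration of a "dynamic migration rate", the sketch below is a token bucket whose rate can be changed while a copy is in progress. The class name `AdjustableRateLimiter`, its methods, and the injectable `clock` are all hypothetical and not part of BDS.

```python
import time


class AdjustableRateLimiter:
    """Token bucket whose refill rate can be changed while a copy is running.

    Hypothetical sketch; not the BDS implementation.
    """

    def __init__(self, bytes_per_sec, clock=time.monotonic):
        self._rate = float(bytes_per_sec)
        self._clock = clock
        self._tokens = 0.0
        self._last = clock()

    def _refill(self):
        now = self._clock()
        # Cap the bucket at one second of traffic so idle time cannot cause a burst.
        self._tokens = min(self._rate, self._tokens + (now - self._last) * self._rate)
        self._last = now

    def set_rate(self, bytes_per_sec):
        """Change the rate; time elapsed so far is credited at the old rate."""
        self._refill()
        self._rate = float(bytes_per_sec)

    def wait_time(self, nbytes):
        """Reserve nbytes and return how many seconds the caller should sleep first."""
        self._refill()
        self._tokens -= nbytes
        if self._tokens >= 0:
            return 0.0
        return -self._tokens / self._rate
```

A copy loop would call `time.sleep(limiter.wait_time(len(chunk)))` before writing each chunk, while an operator thread calls `set_rate` to slow the migration down during business peaks.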
Data localization rate
[Figure: a Region on a RegionServer reads HFiles stored on DataNode1 and DataNode2; one read is local, the other remote]
• Data migration takes the data localization rate into account
• Avoids a low localization rate after data migration
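The localization rate is commonly understood as the share of a region's HFile bytes that have a replica on the same host as the serving RegionServer. A minimal sketch of that ratio follows; the function name and the input shape (block size plus the set of hosts holding a replica) are assumptions made for illustration.

```python
def locality_rate(hfile_blocks, regionserver_host):
    """Fraction of HFile bytes with a replica on the RegionServer's own host.

    hfile_blocks: iterable of (size_in_bytes, set_of_hosts_holding_a_replica).
    Returns 0.0 when there are no blocks.
    """
    total = 0
    local = 0
    for size, hosts in hfile_blocks:
        total += size
        if regionserver_host in hosts:
            local += size  # this block can be served by a local read
    return local / total if total else 0.0


blocks = [
    (128, {"dn1", "dn2"}),
    (128, {"dn2", "dn3"}),  # no replica on dn1, so this is a remote read
    (64, {"dn1"}),
]
rate = locality_rate(blocks, "dn1")
```

A migration tool that copies HFiles without regard to where the target regions are served would tend to lower this ratio, which is the situation the slide says BDS avoids.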
File split
[Figure: HFile1 to HFile4 pass through a split step that divides HFile2 into HFile2-1 and HFile2-2; the resulting files are loaded into Region1 and Region2]
• Migration splits HFiles according to the partitions of the original and target tables
• Increases the speed of bulkload
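The split step can be illustrated with a small sketch: given the sorted start keys of the target regions, each row key in a source file is assigned to the last region whose start key is not greater than it. A file that spans a boundary ends up in two pieces, like HFile2-1 and HFile2-2 in the figure. The function name and the in-memory key lists are assumptions; real HFiles would be split on disk.

```python
from bisect import bisect_right


def split_rows_by_region(sorted_row_keys, region_start_keys):
    """Group one file's sorted row keys by the target region that owns each key.

    region_start_keys must be sorted, and the first region starts at b"".
    Returns {region_index: [row keys]}.
    """
    pieces = {}
    for key in sorted_row_keys:
        # The owning region is the last one whose start key is <= key.
        region = bisect_right(region_start_keys, key) - 1
        pieces.setdefault(region, []).append(key)
    return pieces


regions = [b"", b"m"]  # Region1 covers [b"", b"m"), Region2 covers [b"m", end)
pieces = split_rows_by_region([b"a", b"k", b"m", b"z"], regions)
```

Files already split along the target boundaries can each be handed to a single region at load time, which is presumably why the slide links splitting to faster bulkload.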
HBase Real-time Replication
Data pipeline
• Uses a RingBuffer as the queue
• AckQueue maintains the offset
• Write throughput supports dynamic configuration
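The slide names a RingBuffer and an AckQueue but does not describe their internals. The sketch below shows one common way such a pair can work: a bounded ring buffer applies back-pressure when full, and the ack queue advances the committed offset only across a contiguous run of acknowledged entries, so a restart never skips an unwritten entry. All behavior here is assumed; only the two names come from the slide.

```python
class RingBuffer:
    """Fixed-size FIFO queue backed by a preallocated list (illustrative only)."""

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._head = 0  # sequence number of the next item to read
        self._tail = 0  # sequence number of the next item to write

    def offer(self, item):
        """Append item; return False when the buffer is full (back-pressure)."""
        if self._tail - self._head == len(self._slots):
            return False
        self._slots[self._tail % len(self._slots)] = item
        self._tail += 1
        return True

    def poll(self):
        """Remove and return the oldest item, or None when the buffer is empty."""
        if self._head == self._tail:
            return None
        item = self._slots[self._head % len(self._slots)]
        self._head += 1
        return item


class AckQueue:
    """Tracks acknowledged offsets; `committed` is the first offset not yet durable."""

    def __init__(self, start_offset=0):
        self.committed = start_offset
        self._acked = set()

    def ack(self, offset):
        """Record an acknowledgement and advance past any contiguous acked run."""
        self._acked.add(offset)
        while self.committed in self._acked:
            self._acked.remove(self.committed)
            self.committed += 1
        return self.committed
```

With this design, acknowledgements may arrive out of order from concurrent writers, yet the saved offset stays safe to resume from.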
Impact on business
HBase Replication
• Reads and writes affect data replication
BDS
• Decoupled from HBase
• Only accesses HDFS
• Data replication is not affected by an HBase crash
Hotspot
HBase Replication
• Hotspot
BDS
• Round robin scheduling
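Round robin scheduling can be sketched as dealing pending log files to workers in turn, so that a burst of logs from one busy RegionServer is spread across all workers instead of piling up on one. The unit of scheduling (log files), the function name, and the sample names are assumptions; the slide only states that BDS uses round robin.

```python
from itertools import cycle


def assign_round_robin(log_files, workers):
    """Deal log files to workers in turn, regardless of which server wrote them."""
    assignment = {worker: [] for worker in workers}
    for log, worker in zip(log_files, cycle(workers)):
        assignment[worker].append(log)
    return assignment


# Three logs come from the hot server rs1, one from rs2.
logs = ["rs1-wal1", "rs1-wal2", "rs1-wal3", "rs2-wal1"]
plan = assign_round_robin(logs, ["w1", "w2"])
```

Here the hot server's three logs are shared between both workers rather than being handled by a single node.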
Replication backlog
BDS
• Add slave nodes
• Slave throughput supports dynamic configuration
• Add Worker nodes
• Increase the number of logs that Worker nodes process concurrently
• Increase AsyncWriter concurrency
Operation and maintenance
BDS
• Easy to expand
• Easy to upgrade
• Monitoring
• Alarm mechanism
HBase Replication
• Bug fix
• No alarm
• Configuration changes and system upgrades require an RS restart
BDS in Ali-Cloud
Clusters Migration
High Availability
Data Backup
Archive data to Spark
RDS
About me
Thanks!


hbaseconasia2019 BDS: A data synchronization platform for HBase
