How We Migrate PBs Data from Beijing to Shanghai
Wang Yuxi, Umeng
w@umeng.com
Agenda
● Why Migrate
● Current Infrastructure
● Environment Setup
● Data Transfer (HBase)
● Data Transfer (MongoDB)
● Data Transfer (MySQL/Redis)
● Application Provision
● Monitoring
● Benchmark and Stress Testing
● Go!
● Results
● Recap
About Me
● Before 2014: the only ops engineer at Umeng
● Now: core member of the ops team
● Technical generalist, responsible for the overall reliability and performance of Umeng
● Arch Linux user
@Jasey_Wang | http://JaseyWang.Me
About Umeng
● Founded in April 2010
● Incubated by Innovation Works
● Raised $10 million from Matrix China
● Acquired by Alibaba
● Largest mobile app analytics platform in China
● 400K+ apps
● ~1B mobile devices
Why Migrate
● CapEx/OpEx
● Unlimited resources: no more worrying about IDC space, racks, bandwidth, etc.
● Integration with the Alibaba Group's internal systems and its wide range of advanced tools
● Make our PBs of data safer
Current Infrastructure
● Data centers: 4
● Servers: 1,000
● Networking devices: 100
● Bandwidth: 4 Gbps+
● Real-time analytics: 150K QPS
● Batch processing: 4 PB used of 5 PB storage
● Want to know more? See "Umeng Operations Infrastructure & Practice"
Current Infrastructure (Cont.)
Environment Setup
● Dedicated 1 Gbps fiber link between Beijing and Shanghai in place
● For security reasons, connections (SYN) could only be initiated from Shanghai to Beijing
● Set up DNAT for the Beijing cluster
● Raw data transfer test: saturate the bandwidth with parallel streams (iperf -P / netperf)
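To put the 1 Gbps link in perspective for PB-scale data, here is the back-of-the-envelope arithmetic as a small Python sketch. The 1 PB volume and the 80% efficiency factor are illustrative assumptions, not figures from the migration.

```python
def transfer_days(data_bytes: float, link_bps: float, efficiency: float = 1.0) -> float:
    """Days needed to push data_bytes over a link of link_bps bits per second.

    efficiency (0 < efficiency <= 1) models protocol overhead and imperfect
    utilization; 1.0 means the link is fully saturated.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = data_bytes * 8 / (link_bps * efficiency)
    return seconds / 86_400


PB = 10**15   # decimal petabyte, in bytes
GBPS = 10**9  # 1 Gbps, in bits per second

print(f"{transfer_days(PB, GBPS):.1f} days")       # → 92.6 days (fully saturated)
print(f"{transfer_days(PB, GBPS, 0.8):.1f} days")  # → 115.7 days (assumed 80% efficiency)
```

At a fully saturated 1 Gbps, every petabyte costs roughly three months of wall-clock time, which is why verifying that the link can actually be saturated comes before any data is moved.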
Data Transfer (HBase)
● HBase 0.94 @ Beijing, 0.98 @ Shanghai
● Built-in import/export tools didn't work
● Wrote our own import/export tool, with integrity checks
● Task scheduler for historical and daily incremental data
● 2 months of data transfer
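The slide doesn't describe how the custom tool scheduled work or checked integrity. A minimal sketch of one common approach, assuming each task covers a one-day timestamp window and both clusters can be scanned into (row key, column, value) cells: the scheduler splits history into daily windows (a daily incremental run is just the newest window), and each window is verified by comparing a cell count plus an order-independent checksum. All names here (`daily_windows`, `window_digest`, `verify_window`) are hypothetical.

```python
import hashlib
from datetime import date, timedelta
from typing import Iterable, Iterator, Tuple

Cell = Tuple[bytes, bytes, bytes]  # (row key, column, value)


def daily_windows(start: date, end: date) -> Iterator[Tuple[date, date]]:
    """Yield [day, day + 1) windows from start (inclusive) to end (exclusive).

    A historical backfill walks the whole range; a daily incremental job
    calls this with start = yesterday and end = today.
    """
    day = start
    while day < end:
        yield day, day + timedelta(days=1)
        day += timedelta(days=1)


def window_digest(cells: Iterable[Cell]) -> Tuple[int, int]:
    """Return (cell count, order-independent checksum) for one window.

    Each cell is hashed with length prefixes so field boundaries are
    unambiguous, and digests are summed modulo 2**256, so scan order on
    either cluster does not affect the result.
    """
    count, total = 0, 0
    for cell in cells:
        h = hashlib.sha256()
        for part in cell:
            h.update(len(part).to_bytes(4, "big"))
            h.update(part)
        total = (total + int.from_bytes(h.digest(), "big")) % (1 << 256)
        count += 1
    return count, total


def verify_window(source: Iterable[Cell], target: Iterable[Cell]) -> bool:
    """True if count and checksum agree; False flags the window for re-copy."""
    return window_digest(source) == window_digest(target)


tasks = list(daily_windows(date(2014, 1, 1), date(2014, 1, 4)))
print(len(tasks))  # → 3
beijing = [(b"r1", b"cf:a", b"1"), (b"r2", b"cf:a", b"2")]
shanghai = list(reversed(beijing))  # same cells, different scan order
print(verify_window(beijing, shanghai))  # → True
```

Summing digests rather than hashing a sorted stream keeps memory constant per window, so a window can be verified while it is being scanned.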
Data Transfer (HBase) (Cont.)
Data Transfer (MongoDB)
● Master/Slave
○ 2.4.11, obsolete
○ built-in replication mechanism
● Primary/Secondary/Arbiter
○ replica set architecture
○ since Beijing can't open connections to Shanghai, we wrote tools to read the oplog and replay it into the Shanghai DB cluster
● Issues to watch: oplog, replication lag, TCP keepalive, slow queries, deadlocks
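Because only Shanghai could initiate connections, the replay tool has to pull oplog entries from Beijing and apply them locally. A simplified sketch of the apply step, assuming entries have already been fetched as dicts carrying MongoDB's oplog fields (`ts`, `op`, `ns`, `o`, `o2`), with `ts` modeled as a (seconds, increment) tuple and an in-memory dict standing in for the Shanghai cluster. Only inserts, deletes, full-document replacements and top-level `$set` updates are handled; `apply_entry`, `replay` and the dict-based store are hypothetical, and a real tool would write through a MongoDB driver and persist its checkpoint.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

Timestamp = Tuple[int, int]  # (seconds, increment), like a BSON Timestamp
Store = Dict[str, Dict[Any, Dict[str, Any]]]  # namespace -> _id -> document


def apply_entry(store: Store, entry: Dict[str, Any]) -> None:
    """Apply a single oplog entry to the in-memory target store."""
    coll = store.setdefault(entry["ns"], {})
    op, doc = entry["op"], entry["o"]
    if op == "i":  # insert: o is the full document
        coll[doc["_id"]] = dict(doc)
    elif op == "d":  # delete: o holds the _id of the removed document
        coll.pop(doc["_id"], None)
    elif op == "u":  # update: o2 selects the document, o describes the change
        _id = entry["o2"]["_id"]
        operators = [k for k in doc if k.startswith("$")]
        if not operators:  # full-document replacement
            coll[_id] = {**doc, "_id": _id}
        elif operators == ["$set"]:  # upsert-style $set of top-level fields
            coll.setdefault(_id, {"_id": _id}).update(doc["$set"])
        else:
            raise NotImplementedError(f"unsupported update operators: {operators}")
    # "n" (no-op) and "c" (command) entries are ignored in this sketch


def replay(store: Store, entries: Iterable[Dict[str, Any]],
           checkpoint: Optional[Timestamp] = None) -> Optional[Timestamp]:
    """Apply entries newer than checkpoint in order; return the new checkpoint.

    Persisting the returned timestamp lets the tool resume after a dropped
    connection without re-applying or skipping operations.
    """
    for entry in entries:
        if checkpoint is not None and entry["ts"] <= checkpoint:
            continue  # already applied before the last restart
        apply_entry(store, entry)
        checkpoint = entry["ts"]
    return checkpoint


store: Store = {}
log = [
    {"ts": (1, 1), "op": "i", "ns": "app.users", "o": {"_id": 1, "name": "a"}},
    {"ts": (1, 2), "op": "u", "ns": "app.users", "o2": {"_id": 1}, "o": {"$set": {"name": "b"}}},
    {"ts": (2, 1), "op": "i", "ns": "app.users", "o": {"_id": 2, "name": "c"}},
    {"ts": (2, 2), "op": "d", "ns": "app.users", "o": {"_id": 2}},
]
ckpt = replay(store, log)
print(store["app.users"])  # → {1: {'_id': 1, 'name': 'b'}}
print(ckpt)                # → (2, 2)
```

Tracking the checkpoint is what makes long-lived cross-city tailing tolerable: a connection killed by a missing TCP keepalive only costs a reconnect, not a resync.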
Data Transfer (MySQL/Redis)
● MySQL
○ Percona XtraBackup, quite handy
● Redis
○ single instances: SLAVEOF command
○ twemproxy clusters: exported, then imported later with tools
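SLAVEOF suits the one-way firewall because the replica (Shanghai) opens the connection to the master (Beijing). For keys behind twemproxy, one possible export/import path (an assumption; the slide only says "tools") is to read key/value pairs from each backend and emit them in the Redis protocol (RESP) for `redis-cli --pipe` mass insertion on the Shanghai side. A minimal encoder for plain string values; `encode_command` and `mass_insert_stream` are hypothetical helpers.

```python
from typing import Iterable, Tuple, Union

Arg = Union[str, bytes, int]


def encode_command(*args: Arg) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode("utf-8")
        # Bulk string length is the byte length, not the character count.
        out.append(b"$%d\r\n%b\r\n" % (len(data), data))
    return b"".join(out)


def mass_insert_stream(pairs: Iterable[Tuple[bytes, bytes]]) -> bytes:
    """Turn exported (key, value) pairs into SET commands for redis-cli --pipe."""
    return b"".join(encode_command(b"SET", k, v) for k, v in pairs)


print(encode_command("SET", "k", "v"))  # → b'*3\r\n$3\r\nSET\r\n$1\r\nk\r\n$1\r\nv\r\n'
```

Writing the stream to a file and running `redis-cli --pipe < dump.resp` against the new cluster loads it in bulk. TTLs and non-string types (hashes, lists, sets) are not covered here; those would need type-specific commands or DUMP/RESTORE.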
Application Provision
● Stateless or stateful?
● Kafka & MirrorMaker
○ consumer/producer queues
○ QPS, topics, I/O
● Storm
○ throughput, lag
● ZooKeeper
○ 5 nodes, four-letter words, log cleanup
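Lag shows up for both the Kafka mirror and Storm. A small sketch, assuming per-partition offsets have already been collected (log-end offsets on the source cluster and the mirror consumer's committed offsets): consumer lag is their difference, summed per topic. It also queries ZooKeeper with a four-letter word over a plain TCP connection and parses `mntr` output, which is one tab-separated key/value pair per line. `partition_lag`, `four_letter_word` and `parse_mntr` are hypothetical helpers; newer ZooKeeper releases only answer four-letter words enabled via `4lw.commands.whitelist`.

```python
import socket
from collections import defaultdict
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]


def partition_lag(log_end: Dict[TopicPartition, int],
                  committed: Dict[TopicPartition, int]) -> Dict[str, int]:
    """Total consumer lag per topic: log-end offset minus committed offset.

    A partition with no committed offset counts its whole log as lag
    (a simplification: a real consumer may start from the earliest
    retained offset rather than 0).
    """
    per_topic: Dict[str, int] = defaultdict(int)
    for (topic, _partition), end in log_end.items():
        per_topic[topic] += max(end - committed.get((topic, _partition), 0), 0)
    return dict(per_topic)


def four_letter_word(host: str, port: int, cmd: str, timeout: float = 3.0) -> str:
    """Send a ZooKeeper four-letter word (e.g. "ruok", "mntr") and read the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")


def parse_mntr(text: str) -> Dict[str, str]:
    """Parse `mntr` output into a dict of metric name -> raw string value."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:
            stats[key] = value.strip()
    return stats


lag = partition_lag({("events", 0): 120, ("events", 1): 80},
                    {("events", 0): 100, ("events", 1): 80})
print(lag)  # → {'events': 20}
sample = "zk_version\t3.4.6\nzk_server_state\tleader\nzk_avg_latency\t0\n"
print(parse_mntr(sample)["zk_server_state"])  # → leader
```

In practice `parse_mntr(four_letter_word(host, 2181, "mntr"))` would be polled on each of the 5 nodes, checking that exactly one reports `leader`.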
Monitoring
● Internal monitoring system backed by HBase
● Zabbix, Ganglia
● Graphite for metrics
● Monit for process monitoring
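Graphite's Carbon daemon accepts metrics over a simple plaintext protocol: one `metric.path value timestamp` line per data point, on TCP port 2003 by default. A minimal sender sketch; the metric name and the idea of reporting migration progress this way are illustrative assumptions, not Umeng's actual setup.

```python
import socket
import time
from typing import Iterable, Optional, Tuple

Metric = Tuple[str, float, Optional[int]]  # (path, value, unix timestamp or None)


def format_metrics(metrics: Iterable[Metric]) -> bytes:
    """Render metrics in Carbon's plaintext format, one line per data point."""
    now = int(time.time())
    lines = []
    for path, value, ts in metrics:
        if any(c.isspace() for c in path):
            raise ValueError(f"metric path may not contain whitespace: {path!r}")
        lines.append(f"{path} {value} {now if ts is None else ts}\n")
    return "".join(lines).encode("ascii")  # paths are assumed to be ASCII


def send_metrics(metrics: Iterable[Metric], host: str, port: int = 2003) -> None:
    """Ship a batch of metrics to a Carbon plaintext listener over TCP."""
    payload = format_metrics(metrics)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


print(format_metrics([("migration.hbase.rows_copied", 12345, 1400000000)]))
# → b'migration.hbase.rows_copied 12345 1400000000\n'
```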
Benchmark and Stress Testing
● Single component
○ system-level metrics
○ application-level metrics
● Multiple components
● A portion of live online traffic
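One common way to send only part of the online traffic to the new site (an assumption; the slides don't say how Umeng routed it) is deterministic hash bucketing on a stable key such as a device ID. The same device always lands on the same side, and raising the percentage moves traffic gradually. `bucket`, `route` and the site labels are hypothetical.

```python
import hashlib


def bucket(key: str, buckets: int = 100) -> int:
    """Map a stable key (e.g. a device ID) to a bucket in [0, buckets)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()  # for spreading, not security
    return int.from_bytes(digest[:8], "big") % buckets


def route(key: str, shanghai_percent: int) -> str:
    """Send roughly shanghai_percent% of keys to Shanghai, the rest to Beijing."""
    if not 0 <= shanghai_percent <= 100:
        raise ValueError("shanghai_percent must be between 0 and 100")
    return "shanghai" if bucket(key) < shanghai_percent else "beijing"


devices = [f"device-{i}" for i in range(10_000)]
share = sum(route(d, 10) == "shanghai" for d in devices) / len(devices)
print(f"{share:.1%} of devices routed to Shanghai")  # expect a share near 10%
```

Because a key's bucket never changes, every device already on Shanghai at 10% stays there at 20%, so ramping up never bounces a client back and forth between sites.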
Go!
● What about Plan B or C?
○ Plan B usually doesn't work
● Friday night from 00:00 to 08:00
● Smoothly routed traffic for dozens of products to Shanghai
● The site stayed fully available, with no outage
Results
● A complex project, with the whole team involved
● 6 months of work paid off
● Hugely successful, no rollback
● Some products now run on a private cloud
Recap
● Tools are the #1 productive force
● Test, test and test
● Monitoring and metrics
End
Q & A
