HDFS Optimization for HBase
at Xiaomi
01
•  Multiple replicas
•  Namenode & Datanode
r = replication factor, k = faulty datanodes, N = total datanodes
$$\text{readSLAAvailability} = 1 - \frac{k}{N}$$
$$\text{writeSLAAvailability} = 1 - \sum_{i=1}^{r} \frac{C(k, i) \times C(N-k,\ r-i)}{C(N, r)}$$
$$\text{availability} = \min(\text{namenodeAvailability},\ \text{writeSLA},\ \text{readSLA})$$
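The availability formulas can be evaluated numerically. The following is a minimal sketch, assuming C(n, m) denotes the binomial coefficient and that the sum counts replica placements touching at least one faulty datanode; the function names and the sample numbers are illustrative and do not come from the slides.

```python
from math import comb  # Python 3.8+


def read_sla_availability(n, k):
    """1 - k/N: share of datanodes that are not faulty."""
    return 1 - k / n


def write_sla_availability(n, k, r):
    """1 - sum_{i=1..r} C(k,i) * C(N-k, r-i) / C(N,r).

    The sum is the fraction of r-node placements that include
    at least one of the k faulty datanodes.
    """
    bad = sum(comb(k, i) * comb(n - k, r - i) for i in range(1, r + 1))
    return 1 - bad / comb(n, r)


def cluster_availability(namenode_availability, n, k, r):
    """Overall availability is the minimum of the three terms."""
    return min(
        namenode_availability,
        write_sla_availability(n, k, r),
        read_sla_availability(n, k),
    )


if __name__ == "__main__":
    # Hypothetical cluster: 100 datanodes, 2 faulty, replication factor 3.
    print(read_sla_availability(100, 2))
    print(write_sla_availability(100, 2, 3))
    print(cluster_availability(0.9999, 100, 2, 3))
```

By Vandermonde's identity the write term is equal to C(N-k, r) / C(N, r), which gives a quick cross-check of the sum.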
[Figure: HDFS monitoring setup. Labels: Master, Worker, Cluster Configs, Namenode, DataNode, Falcon, Metrics, HDFSMonitor, Cluster]
02
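Architecture
[Figure: short-circuit read architecture. The DFSClient holds DfsClientShm segments divided into slots; the Datanode holds the matching RegisteredShm segments, whose slots correspond to blocks. The two sides are connected through shared memory and a domain socket.]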
•  Allocate Shm
•  Request slot & FDs
•  Release Slot
•  Release Shm
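The four operations above can be sketched as a toy lifecycle model. This is only an illustration under stated assumptions: the class and method names (`ShmSegment`, `ShmManager`, `slots_per_segment`) are hypothetical, and the sketch does not model the domain socket or the file descriptor passing that the real short-circuit read path uses.

```python
class ShmSegment:
    """Toy shared-memory segment with a fixed number of slots (hypothetical)."""

    def __init__(self, segment_id, num_slots):
        self.segment_id = segment_id
        self.free_slots = list(range(num_slots))
        self.used_slots = {}  # slot index -> block id

    def request_slot(self, block_id):
        """Hand out a free slot for a block, or None if the segment is full."""
        if not self.free_slots:
            return None
        index = self.free_slots.pop()
        self.used_slots[index] = block_id
        return index

    def release_slot(self, index):
        del self.used_slots[index]
        self.free_slots.append(index)

    def is_empty(self):
        return not self.used_slots


class ShmManager:
    """Allocates segments on demand and releases them once they are empty."""

    def __init__(self, slots_per_segment=4):
        self.slots_per_segment = slots_per_segment
        self.segments = {}
        self.next_id = 0

    def allocate_shm(self):
        segment = ShmSegment(self.next_id, self.slots_per_segment)
        self.segments[segment.segment_id] = segment
        self.next_id += 1
        return segment

    def request_slot(self, block_id):
        # Reuse a segment that still has a free slot before allocating a new one.
        for segment in self.segments.values():
            index = segment.request_slot(block_id)
            if index is not None:
                return segment.segment_id, index
        segment = self.allocate_shm()
        return segment.segment_id, segment.request_slot(block_id)

    def release_slot(self, segment_id, index):
        segment = self.segments[segment_id]
        segment.release_slot(index)
        if segment.is_empty():
            del self.segments[segment_id]  # Release Shm


if __name__ == "__main__":
    manager = ShmManager(slots_per_segment=2)
    handles = [manager.request_slot(f"block-{i}") for i in range(3)]
    print(len(manager.segments))  # three slots need two segments of size 2
    for segment_id, index in handles:
        manager.release_slot(segment_id, index)
    print(len(manager.segments))
```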
•  Datanode Full GC caused by RegisteredShm
•  3000+ QPS allocation vs. 1000+ QPS release
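A rough back-of-the-envelope sketch shows why such a gap matters. It assumes the two figures are sustained allocation and release rates for Shm segments on one Datanode, which the slide does not state explicitly; the function name is hypothetical.

```python
def unreleased_after(seconds, alloc_qps=3000, release_qps=1000):
    """Objects still registered after `seconds` if both rates stay constant."""
    return max(0, alloc_qps - release_qps) * seconds


if __name__ == "__main__":
    # With the slide's lower-bound figures, the backlog grows by 2000 per second.
    print(unreleased_after(1))
    print(unreleased_after(60))
```

Under that assumption the set of registered segments grows without bound until releases catch up, which is consistent with the heap pressure the slide reports.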
•  YCSB get: 20% QPS increase
•  Preallocate the Shm:
   •  1000 files
   •  60 threads
   •  seek read
•  Listen drop on SSD cluster causes 3s delay
15:56:14.506610 IP x.x.x.x.62393 > y.y.y.y.29402: Flags [S], seq 167786998, win 14600, options [mss 1460,sackOK,TS val 1590620938 ecr 0,nop,wscale 7], length 0   <<< timeout on first try
15:56:17.506172 IP x.x.x.x.62393 > y.y.y.y.29402: Flags [S], seq 167786998, win 14600, options [mss 1460,sackOK,TS val 1590623938 ecr 0,nop,wscale 7], length 0   <<< retry
15:56:17.506211 IP y.y.y.y.29402 > x.x.x.x.62393: Flags [S.], seq 4109047318, ack 167786999, win 14480, options [mss 1460,sackOK,TS val 1589839920 ecr 1590623938,nop,wscale 7], length 0
•  HDFS-9669
•  After changing the backlog to 128, 3s delays dropped to about 1/10 of their previous level on the HBase SSD cluster
somaxconn = 128; default Datanode backlog = 50
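Two numbers on this slide can be checked with a short sketch: the gap between the first SYN and its retry in the capture, and the backlog that actually applies. The sketch relies on the Linux behaviour that the backlog passed to listen() is capped at net.core.somaxconn; the helper names are hypothetical.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S.%f"


def retry_gap_seconds(first_syn, retried_syn):
    """Seconds between two tcpdump timestamps taken on the same day."""
    first = datetime.strptime(first_syn, TIME_FORMAT)
    retry = datetime.strptime(retried_syn, TIME_FORMAT)
    return (retry - first).total_seconds()


def effective_backlog(requested, somaxconn):
    """On Linux the kernel silently caps the listen() backlog at somaxconn."""
    return min(requested, somaxconn)


if __name__ == "__main__":
    # Timestamps copied from the capture above: the dropped SYN and its retry.
    print(retry_gap_seconds("15:56:14.506610", "15:56:17.506172"))
    # Default Datanode backlog versus the raised value, with somaxconn = 128.
    print(effective_backlog(50, 128))
    print(effective_backlog(128, 128))
```

The gap of roughly three seconds matches the delay named in the slide title, and the second helper shows that raising the Datanode backlog above somaxconn would have no further effect unless somaxconn is raised too.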
•  Peer cache bucket adjustment
•  Connection/socket timeout of the DFSClient & Datanode
   dfs.client.socket-timeout
   dfs.datanode.socket.write.timeout
   •  Reduce the timeout to 15s
   •  To avoid pipeline timeouts, upgrade the DFSClient first
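As an illustration, the two properties could be rendered as an hdfs-site.xml fragment like this. It is a sketch under assumptions: both properties take milliseconds, so 15s is written as 15000, and the slide does not say whether both properties were set to the same value; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

TIMEOUT_MS = 15_000  # 15s, expressed in milliseconds

TIMEOUT_PROPERTIES = (
    "dfs.client.socket-timeout",
    "dfs.datanode.socket.write.timeout",
)


def build_timeout_configuration(timeout_ms=TIMEOUT_MS):
    """Build a <configuration> element holding the two timeout properties."""
    configuration = ET.Element("configuration")
    for name in TIMEOUT_PROPERTIES:
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(timeout_ms)
    return configuration


def render(configuration):
    """Serialize the element tree to an XML string."""
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    print(render(build_timeout_configuration()))
```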
03
•  60-second timeout if a datanode dies
•  Dead nodes are not shared
•  A "dead" node may not actually be dead
DeadNodeDetector
[Figure: DeadNodeDetector. Inside the DFSClient, each DFSInputStream keeps its own Local Dead Nodes; the DeadNodeDetector keeps Global Dead Nodes, Suspicious Nodes, and Live Nodes. On the HDFS side: Namenode and DataNodes.]
•  Node state machine
[Figure: node state machine. States: Init, Live, Suspicious, Dead, Removed. Transition labels: Open, Close, Read Failure, Read Success, RPC Failure, RPC Success.]
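The state machine can be sketched in code, with one important caveat: the diagram's arrows did not survive, so the transition table below is an assumed reading of the labels (a read failure makes a node suspicious, a probe RPC then confirms it dead or returns it to live, and closing removes it). It should not be read as the exact machine from the slide.

```python
from enum import Enum


class NodeState(Enum):
    INIT = "Init"
    LIVE = "Live"
    SUSPICIOUS = "Suspicious"
    DEAD = "Dead"
    REMOVED = "Removed"


# Assumed transitions; the state and event names come from the slide,
# but which arrow connects which pair of states is a guess.
TRANSITIONS = {
    (NodeState.INIT, "Open"): NodeState.LIVE,
    (NodeState.LIVE, "Read Success"): NodeState.LIVE,
    (NodeState.LIVE, "Read Failure"): NodeState.SUSPICIOUS,
    (NodeState.SUSPICIOUS, "RPC Success"): NodeState.LIVE,
    (NodeState.SUSPICIOUS, "RPC Failure"): NodeState.DEAD,
    (NodeState.DEAD, "RPC Failure"): NodeState.DEAD,
    (NodeState.DEAD, "RPC Success"): NodeState.LIVE,
}


def next_state(state, event):
    """Apply one event; unknown events leave the state unchanged."""
    if state is NodeState.REMOVED:
        return state
    if event == "Close":
        return NodeState.REMOVED
    return TRANSITIONS.get((state, event), state)


def replay(events, state=NodeState.INIT):
    """Feed a sequence of events through the machine and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state


if __name__ == "__main__":
    print(replay(["Open", "Read Failure", "RPC Failure"]))
    print(replay(["Open", "Read Failure", "RPC Failure", "RPC Success"]))
```

Keeping the table as data makes it easy to correct individual arrows once the original diagram is available.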
•  Keep data on the local host as much as possible, and reduce the overhead of local reads.
•  Make sure a response is returned to HBase as soon as possible, even if it is a failure.
•  Minor GC in both HBase and HDFS affects latency. Try to ease the HDFS part on the client side.
Thanks

HBaseConAsia2018 Track1-7: HDFS optimizations for HBase at Xiaomi
