Large Scale Web Apps @Pinterest
(Powered by Apache HBase)
May 5, 2014
What is Pinterest?
Pinterest is a visual discovery tool for collecting the things you love, and discovering related content along the way.
Scale
Challenges @scale
• 100s of millions of pins/repins per month
• Billions of requests per week
• Millions of daily active users
• Billions of pins
• One of the largest discovery tools on the internet
Storage stack @Pinterest
• MySQL
• Redis (persistence and cache)
• Memcache (consistent hashing)
[Diagram: the app tier contains the sharding logic; storage is sharded manually]
Why HBase?
• High write throughput
- Unlike MySQL's B-tree storage, writes never seek on disk: they are appended to the write-ahead log and buffered in memory
• Seamless integration with Hadoop
• Distributed operation
- Fault tolerance
- Load balancing
- Easily add/remove nodes
Non-Technical Reasons
• Large active community
• Large scale online use cases
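A toy sketch of why a log-structured store avoids disk seeks on the write path. The class `TinyLSM` and its methods are hypothetical and model only the idea (append to a write-ahead log, update an in-memory memstore, flush sorted immutable files); this is not HBase's implementation.

```python
import bisect


class TinyLSM:
    """Toy log-structured store: every write is a log append plus an in-memory insert."""

    def __init__(self, memstore_limit: int = 4):
        self.wal = []          # append-only write-ahead log (sequential I/O, no seeks); never truncated in this toy
        self.memstore = {}     # recent writes held in memory
        self.hfiles = []       # immutable sorted runs, oldest first
        self.memstore_limit = memstore_limit

    def put(self, key, value):
        self.wal.append((key, value))   # 1) sequential append for durability
        self.memstore[key] = value      # 2) in-memory update, no random disk write
        if len(self.memstore) >= self.memstore_limit:
            self._flush()

    def _flush(self):
        # Write the whole memstore out as one sorted, immutable file in a single pass.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for run in reversed(self.hfiles):      # newest file wins
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return run[i][1]
        return None
```

The price is paid on reads, which may have to consult several files; that is where Bloom filters, the block cache, and compactions come in.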
Outline
• Features powered by HBase
• SaaS (Storage as a Service)
- MetaStore
- HFile Service (Terrapin)
• Our HBase setup - optimizing for High availability & Low latency
Applications/Features
• Offline
- Analytics
- Search Indexing
- ETL/Hadoop workflows
• Online
- Personalized Feeds
- Rich Pins
- Recommendations
Personalized Feeds
Why HBase? Write-heavy load due to pin fan-out.
[Diagram: a personalized feed combines pins from users I follow with recommended pins]
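Pin fan-out turns one new pin into one write per follower. A minimal fan-out-on-write sketch, assuming a hypothetical row key of `user_id:reversed_timestamp:pin_id` and a plain `dict` standing in for the HBase feed table; `feed_row_key` and `fan_out_pin` are illustrative names.

```python
from typing import Dict, Iterable

MAX_TS = 2**63 - 1  # reverse timestamps against this so newer rows sort first


def feed_row_key(user_id: int, ts_millis: int, pin_id: int) -> str:
    # Zero-padded fields keep lexicographic order equal to numeric order;
    # a prefix scan over one user's feed then returns the newest pins first.
    return f"{user_id:016d}:{MAX_TS - ts_millis:019d}:{pin_id}"


def fan_out_pin(pin_id: int, author_id: int, ts_millis: int,
                followers: Dict[int, Iterable[int]], feed_table: dict) -> int:
    """Write one new pin into every follower's feed; return the number of writes."""
    writes = 0
    for follower in followers.get(author_id, ()):
        feed_table[feed_row_key(follower, ts_millis, pin_id)] = pin_id
        writes += 1
    return writes
```

An author with N followers costs N writes per pin, which is why write throughput mattered for this feature.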
Rich Pins
Why HBase? Negative hits (lookups for keys that do not exist) are cheap with Bloom filters.
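HBase can keep a Bloom filter per HFile, so a Get for a row that is not in a file can skip reading that file. A minimal Bloom filter sketch; the class name, sizing, and SHA-256-based hashing are illustrative choices, not HBase's.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, a tunable false-positive rate."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions from differently salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent", so the file does not need to be read.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))
```

`might_contain` can return a false positive, which only costs a wasted file read; it never returns a false negative.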
Recommendations
Why HBase? Seamless data transfer from Hadoop.
[Diagram: recommendations generated on Hadoop clusters (Hadoop 1.0 and Hadoop 2.0) are copied by DistCp jobs to the HBase + Hadoop 2.0 serving cluster]
SaaS
• Large number of feature requests
• One cluster per feature
• Scaling with organizational growth
• Need for “defensive” multi-tenant storage
• Previous solutions reaching their limits
MetaStore I
• Key-value store on top of HBase
• One HBase table per feature, with salted keys
• Pre-split tables
• Table-level rate limiting (online/offline reads/writes)
• No scan support
• Simple client API:
string getValue(string feature, string key, boolean online);
void setValue(string feature, string key, string value, boolean online);
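A sketch of the MetaStore API shape with salted row keys and pre-split boundaries. `salted_key`, `split_points`, `NUM_SALT_BUCKETS`, and `InMemoryMetaStore` are hypothetical; a `dict` stands in for the per-feature HBase tables, and the one-byte hash prefix is an assumed salting scheme.

```python
import hashlib

NUM_SALT_BUCKETS = 16  # hypothetical; typically equal to the number of pre-split regions


def salted_key(key: str, buckets: int = NUM_SALT_BUCKETS) -> bytes:
    # A stable, hash-derived one-byte prefix spreads sequential keys across regions.
    bucket = hashlib.sha256(key.encode()).digest()[0] % buckets
    return bytes([bucket]) + key.encode()


def split_points(buckets: int = NUM_SALT_BUCKETS) -> list:
    # Region boundaries for pre-splitting the table: one region per salt prefix.
    return [bytes([b]) for b in range(1, buckets)]


class InMemoryMetaStore:
    """Stand-in for the MetaStore API: one 'table' per feature, salted row keys.
    The `online` flag would select a rate-limit class; it is ignored here."""

    def __init__(self):
        self.tables = {}

    def set_value(self, feature: str, key: str, value: str, online: bool = True) -> None:
        self.tables.setdefault(feature, {})[salted_key(key)] = value

    def get_value(self, feature: str, key: str, online: bool = True):
        return self.tables.get(feature, {}).get(salted_key(key))
```

Because the prefix is derived from the key, a Get can always recompute it, but a range of logical keys ends up scattered across every bucket, which fits the "no scan support" trade-off.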
MetaStore II
• Clients issue gets/sets over Thrift to the MetaStore Thrift server
• The Thrift server applies salting + rate limiting
• Primary and secondary HBase clusters with master/master replication
• ZooKeeper holds the MetaStore config and sends notifications when it changes:
- Rate limits
- Primary cluster
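The slides do not say how the rate limiter works. One common technique is a token bucket per (feature, online/offline, read/write) class; this sketch assumes that technique, and `TokenBucket`, `TableRateLimiter`, and the format of the limits dict are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Token bucket: refills at `rate` tokens/sec and holds at most `capacity` tokens."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full so short bursts are allowed
        self.clock = clock
        self.last = clock()

    def try_acquire(self, n: float = 1) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


LimitKey = Tuple[str, bool, str]  # (feature, online, "read" | "write")


class TableRateLimiter:
    """One bucket per (feature, online/offline, read/write) class."""

    def __init__(self, limits: Dict[LimitKey, float],
                 clock: Callable[[], float] = time.monotonic):
        # limits maps each class to requests/sec; the slide shows such limits kept in ZooKeeper config.
        self.buckets = {k: TokenBucket(qps, qps, clock) for k, qps in limits.items()}

    def allow(self, feature: str, online: bool, op: str) -> bool:
        bucket = self.buckets.get((feature, online, op))
        # Hypothetical policy: classes without a configured limit are not throttled.
        return bucket is None or bucket.try_acquire()
```

Injecting the clock keeps the limiter deterministic under test; a stricter "defensive" policy could reject unconfigured classes instead.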
HFile Service (Terrapin)
• Solve the bulk upload problem
• HBase-backed solution
- Bulk upload + major compaction
- Major compaction to delete old data
• Design a solution from scratch using a mashup of:
- HFile
- HBase BlockCache
• Goals:
- Avoid compactions
- Low-latency key-value lookups
High Level Architecture I
[Diagram: ETL/batch jobs write HFiles to Amazon S3; HFile servers load/reload them from S3, each server holding multiple HFiles; a client library/service performs key/value lookups against the HFile servers]
High Level Architecture II
• Each HFile server runs 2 processes
- Copier: pulls HFiles from S3 to local disk
- Supershard: serves multiple HFile shards to clients
• ZooKeeper
- Detecting live servers
- Coordinating loading/swapping of new data
- Enabling clients to detect availability of new data
• Loader module (replaces DistCp)
- Trigger new data copy
- Trigger swap through ZooKeeper
- Update ZooKeeper and notify clients
• Client library understands sharding
• Old data is deleted by a background process
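A sketch of pluggable client-side sharding functions (modulus and range). `modulus_shard`, `make_range_shard`, and `ShardedClient` are hypothetical names, and CRC32 is an assumed stable hash; Terrapin's actual sharding functions may differ.

```python
import bisect
import zlib
from typing import Callable, List

ShardFn = Callable[[bytes, int], int]


def modulus_shard(key: bytes, num_shards: int) -> int:
    # Stable hash modulo shard count (Python's built-in hash() is randomized per process).
    return zlib.crc32(key) % num_shards


def make_range_shard(split_points: List[bytes]) -> ShardFn:
    # split_points must be sorted; shard i holds keys in [split[i-1], split[i]).
    def range_shard(key: bytes, num_shards: int) -> int:
        assert num_shards == len(split_points) + 1, "one more shard than split points"
        return bisect.bisect_right(split_points, key)
    return range_shard


class ShardedClient:
    """Routes each key to a server using whichever sharding function was plugged in."""

    def __init__(self, servers: List[str], shard_fn: ShardFn):
        self.servers = servers      # shard index -> server address (hypothetical)
        self.shard_fn = shard_fn

    def server_for(self, key: bytes) -> str:
        return self.servers[self.shard_fn(key, len(self.servers))]
```

Since the batch job that writes the HFiles and the client must agree on the function, keeping it a plain pluggable callable makes it easy to use the same code on both sides.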
Salient Features
• Multi-tenancy through namespacing
• Pluggable sharding functions - modulus, range & more
• HBase Block Cache
• Multiple clusters for redundancy
• Speculative execution across clusters for low latency
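Speculative execution across redundant clusters can be sketched as a hedged request: query one cluster, and if it has not answered after a short delay, send the same lookup to the next cluster and take whichever answer arrives first. `hedged_get` and the 50 ms default delay are assumptions, not Terrapin's client.

```python
import concurrent.futures as cf
import time


def hedged_get(key, lookups, hedge_after_s=0.05):
    """Ask the first cluster; each time hedge_after_s passes with no answer,
    also ask the next cluster. Return the first answer that arrives."""
    pool = cf.ThreadPoolExecutor(max_workers=len(lookups))
    try:
        futures = [pool.submit(lookups[0], key)]
        for lookup in lookups[1:]:
            done, _ = cf.wait(futures, timeout=hedge_after_s,
                              return_when=cf.FIRST_COMPLETED)
            if done:
                break
            futures.append(pool.submit(lookup, key))
        done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
        # If the first finished call raised, this re-raises; a production client
        # would fall back to the calls still in flight.
        return next(iter(done)).result()
    finally:
        pool.shutdown(wait=False)  # slower duplicate calls finish in the background


if __name__ == "__main__":
    def slow_primary(key):
        time.sleep(0.5)            # simulate a stalled cluster
        return ("primary", key)

    def secondary(key):
        return ("secondary", key)

    print(hedged_get("pin:1", [slow_primary, secondary]))
```

The hedge delay trades extra load for tail latency: a delay near the primary's normal high-percentile latency means only slow requests are duplicated.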
Setting up for Success
• Many online use cases/applications
• Optimize for:
- Low MTTR (high availability)
- Low latency (performance)
MTTR - I
[Figure: DataNode States: LIVE, STALE, DEAD; timings shown: 20 sec, 9 min 40 sec]
• Stale nodes are avoided
- As candidates for reads
- As candidate replicas for writes
- During lease recovery
• Copying of under-replicated blocks starts only when a node is marked as “Dead”
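HDFS derives DataNode states from heartbeat age. The stale threshold is `dfs.namenode.stale.datanode.interval` (default 30 s), and the NameNode declares a node dead after 2 × `dfs.namenode.heartbeat.recheck-interval` + 10 × `dfs.heartbeat.interval`, which is 10 min 30 s with the defaults (5 min and 3 s). Whether stale nodes are skipped is controlled by `dfs.namenode.avoid.read.stale.datanode` and `dfs.namenode.avoid.write.stale.datanode`. A sketch of that classification; the function names are hypothetical.

```python
from typing import Optional


def dead_interval_ms(recheck_interval_ms: int = 300_000, heartbeat_interval_s: int = 3) -> int:
    # NameNode rule: dead after 2 x recheck interval + 10 x heartbeat interval.
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


def datanode_state(ms_since_heartbeat: int,
                   stale_interval_ms: int = 30_000,
                   dead_after_ms: Optional[int] = None) -> str:
    """Classify a DataNode by heartbeat age (defaults mirror stock HDFS settings)."""
    if dead_after_ms is None:
        dead_after_ms = dead_interval_ms()
    if ms_since_heartbeat > dead_after_ms:
        return "DEAD"    # re-replication of its blocks starts here
    if ms_since_heartbeat > stale_interval_ms:
        return "STALE"   # not yet dead, but skipped for reads/writes if avoidance is enabled
    return "LIVE"
```

Lowering the stale interval (for example to the 20 sec shown on the slide) lets clients route around a failing node long before it is declared dead.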
MTTR - II
• Failure detection: 30 sec ZooKeeper session timeout
• Lease recovery: HDFS-4721
• Log split: HDFS-3703 + HDFS-3912
• Recover regions
• Total: < 2 min
Takeaways:
• Avoid stale nodes at each point of the recovery process
• Multi-minute timeouts ==> multi-second timeouts
Simulate, Simulate, Simulate
Simulate “pull the plug” failures and “tail -f” the logs
• kill -9 both the DataNode and the RegionServer: causes “connection refused” errors
• kill -STOP both the DataNode and the RegionServer: causes socket timeouts
• Blackhole hosts using iptables: connect timeouts + “No route to host”; most representative of AWS failures
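A dry-run helper that returns the commands for each failure mode rather than running them. `failure_commands` is hypothetical; the PIDs of the DataNode and RegionServer processes are inputs you would look up yourself, and the iptables rules assume they are applied on each peer host so that traffic to and from the victim is silently dropped.

```python
import shlex
from typing import Iterable, List


def failure_commands(mode: str, pids: Iterable[int] = (), host: str = "") -> List[str]:
    """Return (without running) the shell commands that simulate one failure mode."""
    if mode == "crash":
        # SIGKILL: the process dies and its ports close, so peers get "connection refused".
        return [f"kill -9 {pid}" for pid in pids]
    if mode == "freeze":
        # SIGSTOP: the process and its sockets stay up but stop responding,
        # so peers hit socket timeouts. Undo with `kill -CONT <pid>`.
        return [f"kill -STOP {pid}" for pid in pids]
    if mode == "blackhole":
        # Run on each peer: silently drop traffic to/from the victim host,
        # so new connections time out instead of being refused.
        h = shlex.quote(host)
        return [f"iptables -A INPUT -s {h} -j DROP",
                f"iptables -A OUTPUT -d {h} -j DROP"]
    raise ValueError(f"unknown failure mode: {mode!r}")
```

Generating the commands first makes it easy to review and log them before executing them on a test cluster.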
Performance
Configuration tweaks
• Small block size, 4K-16K
• Prefix compression to cache more: when data is in the key, close to 4X reduction for some data sets
• Separate RPC handler threads for reads vs. writes
• Short-circuit local reads
• HBase-level checksums (HBASE-5074)
Hardware
• SATA (m1.xl/c1.xl) and SSD (hi1.4xl)
• Choose based on the limiting factor
- Disk space: pick SATA for max GB/$
- IOPS: pick SSD for max IOPS/$, for clusters with heavy reads or heavy compaction activity
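Prefix encoding (the idea behind HBase's `PREFIX` data block encoding, set per column family with `DATA_BLOCK_ENCODING`) stores each sorted key as the length it shares with the previous key plus the remaining suffix. A toy sketch with hypothetical function names, not HBase's on-disk format:

```python
import os
from typing import List, Tuple


def prefix_encode(sorted_keys: List[str]) -> List[Tuple[int, str]]:
    """Store each key as (length of prefix shared with the previous key, remaining suffix)."""
    encoded, prev = [], ""
    for key in sorted_keys:
        shared = len(os.path.commonprefix([prev, key]))
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def prefix_decode(encoded: List[Tuple[int, str]]) -> List[str]:
    """Rebuild each key from the previous key's prefix plus the stored suffix."""
    keys, prev = [], ""
    for shared, suffix in encoded:
        prev = prev[:shared] + suffix
        keys.append(prev)
    return keys
```

Keys with long shared prefixes such as `user:00042:pin:001`, `user:00042:pin:002`, and `user:00042:pin:017` shrink from 54 key characters to 21 suffix characters plus three small integers, so more of them fit in the block cache.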
Performance (SSDs)
HFile Read Performance
• Turn off block cache for data blocks to reduce GC and heap fragmentation
• Keep block cache on for index blocks
• Increase “dfs.client.read.shortcircuit.streams.cache.size” from 100 to 10,000 (with short-circuit reads)
• Approx. 3X improvement in read throughput
Write Performance
• WAL contention when the client sets AutoFlush=true
• HBASE-8755
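With AutoFlush=true every put is sent to the server as soon as it is issued, so many small writes compete for the WAL. A client-side write buffer batches puts instead; in the HBase Java client of that era this is what `HTable.setAutoFlush(false)` plus the write buffer provides. A hypothetical Python sketch of the batching logic only:

```python
from typing import Callable, List, Tuple

Put = Tuple[str, object]  # (row key, value)


class BufferedWriter:
    """Collect puts client-side and ship them as one batch per flush."""

    def __init__(self, send_batch: Callable[[List[Put]], None], max_buffered: int = 100):
        self.send_batch = send_batch      # e.g. one batched RPC to the region server
        self.max_buffered = max_buffered
        self.buffer: List[Put] = []

    def put(self, row: str, value: object) -> None:
        self.buffer.append((row, value))
        if len(self.buffer) >= self.max_buffered:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.send_batch(self.buffer)
            self.buffer = []              # start a fresh list; the sent batch is not mutated
```

The trade-off is durability from the client's point of view: puts still sitting in the buffer are lost if the client dies before `flush()`.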
In the Pipeline...
• Building a graph database on HBase
• Disaster recovery: snapshot + incremental backup + restore
• Off-heap cache: reduce GC overhead and make better use of hardware
• Read path optimizations
And we are hiring!
