2015
High Availability and High Frequency Big Data Analytics
Esther Kundin
Bloomberg LP
10/15/2015
#GHC15
Outline
• The Problem Space
• High Availability
• High Frequency
• Takeaways
• Questions
The Problem Space
• Total data set: 2 TB, roughly 2×10^13 data points
  - “medium data”
• Average write: 4 billion data points a day
• Average read: 140 trillion data points a day
• Read/write latency: 50 ms
• Read throughput: 3 trillion points in the peak minute (2000 bulk requests)
• Allowable downtime < read latency
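A little arithmetic turns the figures above into per-second rates. The sketch below does that conversion. It assumes the daily totals are spread evenly over 86,400 seconds. Real traffic is not spread that way, which is why the peak-minute figure is listed separately. The class and method names are illustrative only.

```java
/** Back-of-the-envelope rates for the workload figures on the Problem Space slide. */
final class WorkloadRates {
    static final double SECONDS_PER_DAY = 86_400.0;

    /** Average points per second, assuming a daily total is spread evenly over the day. */
    static double perSecondFromDaily(double pointsPerDay) {
        return pointsPerDay / SECONDS_PER_DAY;
    }

    /** Average points per second within a one-minute window. */
    static double perSecondFromMinute(double pointsPerMinute) {
        return pointsPerMinute / 60.0;
    }

    /** Average number of points carried by each request. */
    static double pointsPerRequest(double points, int requests) {
        return points / requests;
    }

    public static void main(String[] args) {
        double avgRead = perSecondFromDaily(140e12);    // 140 trillion points read per day
        double avgWrite = perSecondFromDaily(4e9);      // 4 billion points written per day
        double peakRead = perSecondFromMinute(3e12);    // 3 trillion points in the peak minute
        double perBulk = pointsPerRequest(3e12, 2000);  // served by 2000 bulk requests

        System.out.printf("average read rate:  %.3g points/s%n", avgRead);
        System.out.printf("average write rate: %.3g points/s%n", avgWrite);
        System.out.printf("peak read rate:     %.3g points/s (%.1fx the daily average)%n",
                peakRead, peakRead / avgRead);
        System.out.printf("points per bulk request in the peak minute: %.3g%n", perBulk);
    }
}
```

These numbers work out as follows:
- Average read rate: about 1.6 billion points per second.
- Peak minute: 50 billion points per second, roughly 31 times the daily average.
- Each of the 2000 bulk requests carries about 1.5 billion points on average.
- Writes average only about 46,000 points per second.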
High Availability – Pain Points and Solutions
High Availability – Major Points of Failure
[Diagram: Client, HDFS, three RegionServers, and the Meta Region Server]
High Availability – Solution
HBASE-10070
[Diagram: Client, HDFS, RegionServers 1 to 3 paired with Secondary RegionServers 1 to 3, and the Meta Region Server paired with a Secondary Meta Region Server]
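HBASE-10070 adds read replicas of each region, hosted on other RegionServers. In the HBase client, a read opts in with `Get.setConsistency(Consistency.TIMELINE)`, and `Result.isStale()` reports whether the answer came from a secondary. The client first asks the primary. If the primary has not answered within a short timeout, the client also asks the secondaries and takes the first reply.

The sketch below only simulates that idea with JDK futures, so it runs without an HBase cluster. `timelineGet`, `respondsAfter`, and the 10 ms timeout are hypothetical stand-ins, not the HBase implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical simulation of a timeline-consistent read with a replica fallback. */
final class TimelineReadSketch {
    /** A read result plus whether it came from a secondary (possibly stale) replica. */
    record Read(String value, boolean stale) {}

    /**
     * Sends the read to the primary. If the primary has not answered within
     * primaryTimeoutMs, also asks the secondary and returns whichever answer arrives first.
     */
    static Read timelineGet(Supplier<String> primary, Supplier<String> secondary,
                            long primaryTimeoutMs, Executor pool) {
        CompletableFuture<Read> fromPrimary =
                CompletableFuture.supplyAsync(() -> new Read(primary.get(), false), pool);
        CompletableFuture<Read> fromSecondary = new CompletableFuture<>();
        CompletableFuture.delayedExecutor(primaryTimeoutMs, TimeUnit.MILLISECONDS, pool).execute(() -> {
            if (!fromPrimary.isDone()) { // only fall back if the primary is still slow
                try {
                    fromSecondary.complete(new Read(secondary.get(), true));
                } catch (RuntimeException e) {
                    fromSecondary.completeExceptionally(e);
                }
            }
        });
        return fromPrimary.applyToEither(fromSecondary, r -> r).join();
    }

    /** Simulates a server that takes delayMs to answer (for example, one stuck in a GC pause). */
    static Supplier<String> respondsAfter(String value, long delayMs) {
        return () -> {
            try {
                Thread.sleep(delayMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return value;
        };
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newCachedThreadPool();
        // The primary takes 500 ms; the replica answers in 2 ms after a 10 ms head start.
        Read r = timelineGet(respondsAfter("from primary", 500), respondsAfter("from replica", 2), 10, pool);
        System.out.println(r.value() + (r.stale() ? " (stale)" : ""));
        pool.shutdownNow();
    }
}
```

A secondary may lag behind the primary, so timeline reads trade strict consistency for latency. The stale flag lets the caller decide whether that trade is acceptable for a given read.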
High Availability Across Data Centers
• 3 options:
  - HBASE-12259: HydraBase integration (HBase + Raft), in progress
  - Cloudera BDR in Cloudera Enterprise 5: not open source
  - Roll your own!
Replication Across Data Centers
[Diagram: clusters HBase 1 and HBase 2 linked by replication, with clients Writer1, Writer2, Reader1, and Reader2, and a global ZooKeeper (Global ZK)]
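The speaker notes describe this setup's consistency model as last writer wins. The minimal sketch below shows why two clusters that receive the same edits in different orders still converge. It assumes every edit carries the timestamp its writer assigned, and that replication ships the edit with that timestamp intact. `Edit`, `Cluster`, and `apply` are hypothetical names for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of last-writer-wins replication between two clusters. */
final class LastWriterWins {
    /** One cell write, carrying the timestamp its writer assigned. */
    record Edit(String row, String value, long timestamp) {}

    /** A toy cluster that keeps only the highest-timestamped edit per row. */
    static final class Cluster {
        private final Map<String, Edit> cells = new HashMap<>();

        void apply(Edit incoming) {
            // Replace the stored edit only if the incoming one is strictly newer;
            // on a timestamp tie, the edit that arrived first is kept.
            cells.merge(incoming.row(), incoming,
                    (current, next) -> next.timestamp() > current.timestamp() ? next : current);
        }

        String get(String row) {
            Edit e = cells.get(row);
            return e == null ? null : e.value();
        }
    }

    public static void main(String[] args) {
        Edit fromWriter1 = new Edit("px", "101.5", 1_000L); // written locally in data center 1
        Edit fromWriter2 = new Edit("px", "101.7", 1_005L); // written locally in data center 2

        Cluster dc1 = new Cluster();
        Cluster dc2 = new Cluster();
        // Each cluster sees its own write first and the replicated one second.
        for (Edit e : List.of(fromWriter1, fromWriter2)) {
            dc1.apply(e);
        }
        for (Edit e : List.of(fromWriter2, fromWriter1)) {
            dc2.apply(e);
        }
        System.out.println("dc1: " + dc1.get("px") + ", dc2: " + dc2.get("px"));
    }
}
```

The catch is clock skew. "Last" means the largest timestamp, so a data center whose clock runs ahead can beat a write that actually happened later. With equal timestamps, this sketch keeps whichever edit arrived first, so ties would not converge. A real design needs a deterministic tie-breaker.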
High Frequency – Pain Points and Solutions
HA to remove fat tails
[Chart: Avg Latency per-Get Distribution; x-axis: Percentile (50, 60, 80, 90, 95, 99); y-axis: Latency in ms (0 to 12)]
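The chart plots per-Get latency by percentile. The sketch below shows how such a distribution can be computed and why a replica fallback trims the tail. It uses nearest-rank percentiles over hypothetical samples. Any read slower than the fallback timeout is capped at roughly the timeout plus the replica's latency. The samples, the 10 ms timeout, and the 3 ms replica latency are made-up values, not measurements from this talk.

```java
import java.util.Arrays;

/** Hypothetical tail-latency calculation, with and without a replica fallback. */
final class TailLatency {
    /** Nearest-rank percentile (p in (0, 100]) of the given latency samples. */
    static double percentile(double[] samples, double p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Latency seen by a client that also asks a replica after timeoutMs. */
    static double withReplicaFallback(double primaryMs, double timeoutMs, double replicaMs) {
        return Math.min(primaryMs, timeoutMs + replicaMs);
    }

    public static void main(String[] args) {
        // Hypothetical per-Get latencies in ms; the two large values stand in for GC pauses.
        double[] primary = {2, 3, 2, 4, 3, 2, 3, 2, 4, 3, 2, 3, 2, 3, 4, 2, 3, 2, 45, 80};
        double[] hedged = new double[primary.length];
        for (int i = 0; i < primary.length; i++) {
            hedged[i] = withReplicaFallback(primary[i], 10, 3); // assumed 10 ms timeout, 3 ms replica
        }
        for (double p : new double[] {50, 90, 95, 99}) {
            System.out.printf("p%.0f: primary only %.0f ms, with fallback %.0f ms%n",
                    p, percentile(primary, p), percentile(hedged, p));
        }
    }
}
```

With these samples, the two slow reads (45 ms and 80 ms) are capped at 13 ms. As a result, p95 and p99 drop to 13 ms while p50 and p90 are unchanged.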
High Frequency – Pain Points
• Speed bounded by the slowest-responding region server
• Garbage collection causes spikes in latency
The Art of Fine Tuning
• Use data to set your heuristics
  - Identify repeatable baseline tests
  - Identify performance parameters
  - Tweak one setting at a time
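The one-setting-at-a-time approach can be sketched as a coordinate-wise search:
1. Start from a baseline configuration.
2. Vary one parameter over its candidate values while holding the others fixed.
3. Keep the best-scoring value, then move on to the next parameter.

Here `benchmark` stands for a repeatable baseline test that returns a latency-like score (lower is better). The parameter names and the synthetic scoring function are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** Hypothetical one-parameter-at-a-time tuner driven by a repeatable benchmark. */
final class OneAtATimeTuner {
    /**
     * Starting from the baseline, tries each candidate value of one parameter while
     * holding the others fixed, keeps whichever gives the lowest score, then moves on.
     */
    static Map<String, Integer> tune(Map<String, Integer> baseline,
                                     Map<String, List<Integer>> candidates,
                                     ToDoubleFunction<Map<String, Integer>> benchmark) {
        Map<String, Integer> best = new HashMap<>(baseline);
        double bestScore = benchmark.applyAsDouble(best);
        for (Map.Entry<String, List<Integer>> param : candidates.entrySet()) {
            for (int value : param.getValue()) {
                Map<String, Integer> trial = new HashMap<>(best);
                trial.put(param.getKey(), value); // change exactly one setting
                double score = benchmark.applyAsDouble(trial);
                if (score < bestScore) {
                    best = trial;
                    bestScore = score;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> baseline = Map.of("heapGb", 8, "handlerCount", 30);
        Map<String, List<Integer>> candidates = new LinkedHashMap<>();
        candidates.put("heapGb", List.of(4, 8, 16, 32));
        candidates.put("handlerCount", List.of(30, 60, 100));
        // Synthetic stand-in for a repeatable load test; NOT measured data.
        ToDoubleFunction<Map<String, Integer>> syntheticLatencyMs = cfg ->
                5.0 + 40.0 / cfg.get("heapGb") + Math.abs(cfg.get("handlerCount") - 60) / 10.0;
        System.out.println("best settings: " + tune(baseline, candidates, syntheticLatencyMs));
    }
}
```

Because each parameter is tuned with the others frozen, this search can miss settings that only help in combination. That is the price of results that are easy to attribute to a single change.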
Tuning Your DB – Garbage Collection
• What did not work:
  - Stop-the-world collection
  - Small memory footprint (4 GB)
  - Synchronized GC via coprocessors
• What worked for us:
  - CMS (Concurrent Mark Sweep): shorter pauses
  - Very large memory footprint (28 GB)
  - Reading from a backup RegionServer (RS) while GC is in progress
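Choosing between collectors and heap sizes is easier with numbers from the JVM itself. The standard `java.lang.management` API exposes, for each collector, the number of collections and the accumulated collection time. Sampling these before and after a repeatable test gives a rough, comparable GC cost per run.

For concurrent collectors such as CMS, the accumulated time may not correspond directly to stop-the-world pause time, so pause-level detail still needs GC logs. CMS was enabled with `-XX:+UseConcMarkSweepGC` on the JVMs of that era; it was removed in JDK 14. `GcStats` is a hypothetical helper name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Reads cumulative GC counters from the running JVM (hypothetical helper). */
final class GcStats {
    /** Total collections across all collectors; a collector reporting -1 (undefined) counts as 0. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(gc.getCollectionCount(), 0L);
        }
        return total;
    }

    /** Accumulated collection time in ms across all collectors; -1 (undefined) counts as 0. */
    static long totalCollectionTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(gc.getCollectionTime(), 0L);
        }
        return total;
    }

    public static void main(String[] args) {
        long countBefore = totalCollections();
        long timeBefore = totalCollectionTimeMs();

        // Stand-in for a repeatable benchmark run: allocate short-lived garbage.
        long bytes = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] chunk = new byte[1024];
            bytes += chunk.length;
        }

        System.out.printf("allocated %d bytes; collections: +%d, collection time: +%d ms%n",
                bytes, totalCollections() - countBefore, totalCollectionTimeMs() - timeBefore);
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("  %s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```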
Takeaways
• High availability can solve most availability and latency concerns
• Multiple-data-center support needed
• Tune those settings!
Questions?
Resources:
Tuning Your DB – What to Tweak
• Key design
• Column family design
• hbase-site.xml: lots of configuration to try!
• Bloom filters
• Short-circuit reads
• Block cache
• Scheduling major compactions judiciously
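Salting is one widely used key-design technique, offered here as an illustration rather than as the approach described in this talk. Each row key gets a small bucket-number prefix derived from its hash. Sequential keys such as timestamps then spread across several regions instead of hotspotting a single RegionServer. The bucket count and the example key format are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical salted-row-key helper for spreading sequential keys across regions. */
final class SaltedKeys {
    /**
     * Returns the row key prefixed with a one-byte bucket number in [0, buckets).
     * The bucket comes from a hash of the key, so the same key always maps to the same bucket.
     */
    static byte[] salt(String rowKey, int buckets) {
        if (buckets < 1 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be in [1, 256]");
        }
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        int bucket = Math.floorMod(Arrays.hashCode(key), buckets);
        byte[] salted = new byte[key.length + 1];
        salted[0] = (byte) bucket; // HBase orders row keys as unsigned bytes
        System.arraycopy(key, 0, salted, 1, key.length);
        return salted;
    }

    public static void main(String[] args) {
        // Hypothetical "ticker|date" keys that would otherwise sort next to each other.
        for (String key : new String[] {"AAPL|20151014", "AAPL|20151015", "AAPL|20151016"}) {
            System.out.println(key + " -> bucket " + Byte.toUnsignedInt(salt(key, 16)[0]));
        }
    }
}
```

The cost is on the read side. A range scan must now query every bucket and merge the results, which reintroduces the slowest-region-server effect from the High Frequency section. The bucket count is therefore itself a setting worth tuning.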
Got Feedback?
Rate and review the session on our mobile app
Download at http://ddut.ch/ghc15
or search GHC 2015 in the app store

Editor's Notes

  • #9: Added back into major release 2.2 based on feedback from Bloomberg! HBASE-10070: MTTR, GC, throughput, all fixed! Soon: rack-aware. HBASE-7509: the same thing, but at the HDFS level. To use it, enable it in hbase-site.xml and at the table level, and update your get requests with get.setConsistency(Consistency.TIMELINE);
  • #11: Consistency will be the same as with one data center: last writer wins, just like in single-cloud HBase. Latency would be the same as with any multi-writer, multi-data-center setup.
  • #15: This is where most of the grunt work is. Exhaustive testing that tweaked one parameter at a time was needed to find the best settings. It is a very data-driven process, and still a work in progress.