HBase

Operations on EC2

Jeremy Carroll
Big Data Gurus

Pinterest Engineering
Overview

• Deployment Strategies for EC2
• Validating Design
• Production Support

Powered by HBase

Let's Deploy

• First Question Asked
• Rack Locality?
• Cloud Concepts

High Availability

Logical Separation

Cell Based

Logical Separation

Does This Work?

• Schema Design
• Hot Spots
• Load Testing
• Tools

Does This Work?

Compaction

OpenTSDB

Production

• Monitoring
• Alerting
• Health

Monitoring

Baselines

Visualization

Problems

Alerting

Baselines

Snapshots & DNS
HBASE-8473
17:10 <jeremy_carroll> jmhsieh: I think I found the root cause. All my region servers reach the barrier, but it does not continue.
17:11 <jeremy_carroll> jmhsieh: All RS have this in their logs:
DEBUG org.apache.hadoop.hbase.procedure.Subprocedure: Subprocedure 'backup1' coordinator notified of 'acquire', waiting on 'reached' or 'abort' from coordinator.
17:11 <jeremy_carroll> jmhsieh: Then the coordinator (Master) never sends anything. They just sit until the timeout.
17:12 <jeremy_carroll> jmhsieh: So basically 'reached' is never obtained. Then abort is set, and it fails.
...
17:24 <jeremy_carroll> jmhsieh: Found the bug. The hostnames don't match the master due to DNS resolution.
17:25 <jeremy_carroll> jmhsieh: The barrier acquired is putting in the local hostname from the regionservers. In EC2 (where reverse DNS does not work well), the master hands the internal name to the client.
17:26 <jeremy_carroll> jmhsieh: So it's waiting for something like 'ip-10-155-208-202.ec2.internal,60020,1367366580066' zNode to show up, but instead 'hbasemetaclustera-d1b0a484,60020,1367366580066,' is being inserted. Barrier is not reached.
17:27 <jeremy_carroll> jmhsieh: Reason being in our environment the master does not have a reverse DNS entry. So we get stuff like this on RegionServer startup in our logs.
17:27 <jeremy_carroll> jmhsieh: 2013-05-01 00:03:00,614 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us hostname to use. Was=hbasemetaclustera-d1b0a484, Now=ip-10-155-208-202.ec2.internal
17:54 <jeremy_carroll> jmhsieh: That was it. Verified. Now that reverse DNS is working, snapshots are working. Now to figure out how to get reverse DNS working on Route53. I wish there was something like 'slave.host.name' inside of Hadoop for this. Looking at source code.
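
The mismatch described in the chat above (a local hostname on one side, a reverse-DNS name on the other) can be checked for ahead of time. The following is a minimal sketch in Python, not HBase code: the function names `parse_server_name`, `reverse_lookup`, and `hostname_mismatch` are hypothetical, and the comparison rule (case-insensitive, ignoring a trailing dot) is an assumption about what "matching" should mean here. The resolver is injectable so the check can be exercised without a live DNS server.

```python
import socket
from typing import Callable, Optional, Tuple


def parse_server_name(server_name: str) -> Tuple[str, int, int]:
    """Split a 'host,port,startcode' string (the shape quoted in the chat)."""
    # A trailing comma, as in one of the quoted names, is tolerated.
    host, port, startcode = server_name.rstrip(",").split(",")
    return host, int(port), int(startcode)


def reverse_lookup(ip: str) -> Optional[str]:
    """Return the reverse-DNS (PTR) name for an IP, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def hostname_mismatch(
    local_name: str,
    ip: str,
    reverse: Callable[[str], Optional[str]] = reverse_lookup,
) -> Optional[str]:
    """Describe the mismatch between a host's local name and its reverse-DNS
    name, or return None when the two agree."""
    resolved = reverse(ip)
    if resolved is None:
        return f"no reverse DNS entry for {ip}"
    # Assumed comparison rule: ignore case and a trailing root dot.
    if resolved.lower().rstrip(".") != local_name.lower().rstrip("."):
        return f"local name {local_name!r} != reverse DNS name {resolved!r}"
    return None


if __name__ == "__main__":
    # Check this machine: its own hostname against the reverse lookup of its IP.
    name = socket.gethostname()
    try:
        ip = socket.gethostbyname(name)
    except OSError:
        print(f"cannot resolve local hostname {name!r}")
    else:
        print(hostname_mismatch(name, ip) or "hostname and reverse DNS agree")
```

Run on each node before enabling snapshots, a check like this would flag hosts such as the one in the log above, where the local name was 'hbasemetaclustera-d1b0a484' but the name handed out by the master was 'ip-10-155-208-202.ec2.internal'.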

Thanks!

Scaling HBase (NoSQL store) to handle massive loads at Pinterest by Jeremy Carroll