HBase @Twitter
@gario @ctrezzo
HBase Meetup 7/16
Agenda
● Infrastructure overview
● Example use cases
● hRaven
Infrastructure Overview
● HBase/Hadoop versions
○ HBase 0.94.x
○ Hadoop 2.0
● PROC, DW, TST, EXP
● Puppet
○ Config management
○ Packaging/Deployment (RPMs)
○ Rolling Upgrades
● Using replication for data movement between PROC and DW
Major Use Cases
● Mutable data store for batch processing
● Operational Intelligence
● Monitoring/Metrics
Mutable data store for batch processing
● Tables copied from MySQL
○ Allowing for incremental loads
● MapReduce jobs over data in HBase
● Snapshot of data copied into HDFS for processing
○ HBASE-8369 will optimize this
Operational Intelligence
● DCEvents - Audit log for changes in production
● TCC: big users of Python!
○ HappyBase
○ Thrift Gateway
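A minimal sketch of how a Python client could write an audit event through the Thrift gateway with HappyBase. The table name `dc_events`, the column family `e`, the row key layout, and the field names are all assumptions made for illustration; the slides do not describe the actual DCEvents schema.

```python
import time


def open_table(host, table_name="dc_events"):
    """Open a table via the HBase Thrift gateway (requires the happybase package)."""
    import happybase  # imported lazily so the helpers below work without it

    connection = happybase.Connection(host)  # Thrift gateway, default port 9090
    return connection.table(table_name)


def event_row_key(service, timestamp_ms):
    """Hypothetical key layout: service name plus zero-padded event time."""
    return f"{service}!{timestamp_ms:013d}".encode("utf-8")


def record_event(table, service, user, change, timestamp_ms=None):
    """Write one audit event; `table` is any object with a HappyBase-style put()."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    row = event_row_key(service, timestamp_ms)
    table.put(row, {
        b"e:user": user.encode("utf-8"),      # hypothetical column
        b"e:change": change.encode("utf-8"),  # hypothetical column
    })
    return row
```

With a running gateway, usage would look like `record_event(open_table("thrift-gateway.example.com"), "web", "alice", "deploy v2")`, where the host name is a placeholder.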
Monitoring/Metrics
hRaven: Why?
https://github.com/twitter/hRaven
● Stores stats, configuration and timing for every MapReduce job on every cluster
● Structured around the full DAG of jobs from a Pig or Scalding application
● Easily queryable for historical trending
● Allows for Pig reducer optimization based on historical run stats
● Keeps data online forever (12.6M jobs, 4.5B tasks + attempts)
hRaven: Key Concepts
● cluster - each cluster has a unique name mapping to the JobTracker
● user - MapReduce jobs are run as a given user
● application - a Pig or Scalding script (or plain MapReduce job)
● flow - the combined DAG of jobs executed from a single run of an application
● version - changes impacting the DAG are recorded as a new version of the same application
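The key concepts listed above can be modeled as a small record type, with a flow being the set of jobs that share one run of an application. This is only an illustrative sketch: the class name, the field names, and the `run_id` field are assumptions, not hRaven's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    cluster: str      # unique cluster name
    user: str         # user the job ran as
    application: str  # Pig/Scalding script or plain MapReduce job
    version: str      # changes when the application's DAG changes
    run_id: int       # hypothetical: identifies one run of the application
    job_id: str


def group_into_flows(jobs):
    """Group jobs by (cluster, user, application, run_id): one group per flow."""
    flows = defaultdict(list)
    for job in jobs:
        flows[(job.cluster, job.user, job.application, job.run_id)].append(job)
    return dict(flows)
```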
hRaven: Application Flows
hRaven: Flow Storage
● All jobs in a flow are ordered together
● Most recent flow is ordered first
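One common way to get both orderings in HBase, where rows are sorted by the bytes of the row key, is to share a key prefix across all jobs in a flow and to embed an inverted timestamp so newer runs sort first. The sketch below assumes this approach; the `!` separator, the field order, and the function names are illustrative and are not taken from the slides. It also assumes names never contain the separator.

```python
import struct

MAX_LONG = 2**63 - 1  # largest signed 64-bit value
SEP = b"!"            # hypothetical field separator


def flow_key_prefix(cluster, user, application, run_ts_ms):
    """Prefix shared by every job in one flow.

    Storing MAX_LONG - timestamp as a big-endian long makes newer runs
    compare lower, so they come first in a byte-ordered scan.
    """
    inverted = MAX_LONG - run_ts_ms
    return SEP.join([
        cluster.encode("utf-8"),
        user.encode("utf-8"),
        application.encode("utf-8"),
        struct.pack(">q", inverted),
    ])


def job_row_key(cluster, user, application, run_ts_ms, job_id):
    """Row key for one job: the flow prefix followed by the job ID."""
    prefix = flow_key_prefix(cluster, user, application, run_ts_ms)
    return prefix + SEP + job_id.encode("utf-8")
```

Sorting such keys as plain bytes keeps each flow's jobs adjacent and lists the newest flow of an application first, which is what a prefix scan over one application would return.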
hRaven: Key Features
● All jobs in a flow are ordered together
● Per-job metrics stored
○ Total map and reduce tasks
○ HDFS bytes read / written
○ File bytes read / written
○ Total map and reduce slot milliseconds
● Easy to aggregate stats for an entire flow
● Easy to scan the timeseries of each application’s flows
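Because the per-job metrics are plain counters, aggregating them for an entire flow amounts to summing each metric across the flow's jobs. A minimal sketch, assuming each job's metrics arrive as a dictionary; the metric names used here are hypothetical stand-ins for the counters listed above.

```python
from collections import Counter


def aggregate_flow(job_metrics):
    """Sum each metric across all jobs in a flow.

    `job_metrics` is an iterable of dicts mapping metric name to a number.
    """
    totals = Counter()
    for metrics in job_metrics:
        totals.update(metrics)  # Counter.update adds values per key
    return dict(totals)


# Hypothetical metrics for two jobs in the same flow.
flow_jobs = [
    {"total_maps": 10, "total_reduces": 2, "hdfs_bytes_read": 500, "map_slot_millis": 40000},
    {"total_maps": 4, "total_reduces": 1, "hdfs_bytes_read": 300, "map_slot_millis": 15000},
]
flow_totals = aggregate_flow(flow_jobs)
```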
hRaven: Current Uses
● Pig reducer optimizations
● Cluster utilization / capacity planning
● Application performance trending over time
● Identifying common job anti-patterns
● Ad-hoc analysis for troubleshooting cluster problems
Future Work
● HBase 0.96 on Hadoop 2.0
● Flow-centric hRaven UI
● Improvements to HBase replication
Questions?
We are Hiring!
http://twitter.com/jobs
@JoinTheFlock
