HBASE STATUS QUO
The State of Affairs in HBase Land
ApacheCon Europe, November 2012

Lars George
Director EMEA Services
About Me

•  Director EMEA Services @ Cloudera
    •  Consulting on Hadoop projects (everywhere)
•  Apache Committer
    •  HBase and Whirr
•  O’Reilly Author
    •  HBase – The Definitive Guide
      •  Now in Japanese! (日本語版も出ました — “The Japanese edition is out!”)

•  Contact
    •  lars@cloudera.com
    •  @larsgeorge
Agenda

•  HDFS and HBase
•  HBase Project Status
HDFS AND HBASE
Past, Present, Future
Framework for Discussion

•  Time Periods
    •  Past (Hadoop pre-1.0)
    •  Present (Hadoop 1.x, 2.0)
    •  Future (Hadoop 2.x and later)


•  Categories
    •  Reliability/Availability
    •  Performance
    •  Feature Set
HDFS and HBase History – 2006



Author: Douglass Cutting <cutting@apache.org>
Date:   Fri Jan 27 22:19:42 2006 +0000

    Create hadoop sub-project.
HDFS and HBase History – 2007



Author: Douglass Cutting <cutting@apache.org>
Date:   Tue Apr 3 20:34:28 2007 +0000

    HADOOP-1045. Add contrib/hbase, a
    BigTable-like online database.
HDFS and HBase History – 2008



Author: Jim Kellerman <jimk@apache.org>
Date:   Tue Feb 5 02:36:26 2008 +0000

    2008/02/04 HBase is now a subproject of
    Hadoop. The first HBase release as a
    subproject will be release 0.1.0 which will
    be equivalent to the version of HBase
    included in Hadoop 0.16.0...
HDFS and HBase History – Early 2010

HBase has been around for 3 years. But HDFS still
acts like MapReduce is the only important client!




“People have accused HDFS of being like a molasses train:
high throughput but not so fast.”
HDFS and HBase History – 2010

•  HBase becomes a top-level project
•  Facebook chooses HBase for Messages product
•  Jump from HBase 0.20 to HBase 0.89 and 0.90
•  First CDH3 betas include HBase
•  HDFS community starts to work on features for
   HBase
    •  Infamous hadoop-0.20-append branch
WHAT DID GET DONE?
And where is it going?
Reliability in the Past: Hadoop 1.0

•  Pre-1.0, if the DN crashed, HBase would lose its
 WALs (and your beloved data).
  •  1.0 integrated hadoop-0.20-append branch into a
     main-line release
  •  True durability support for HBase
  •  We have a fighting chance at metadata reliability!
•  Numerous bug fixes for write pipeline recovery
 and other error paths
  •  HBase is not nearly so forgiving as MapReduce!
  •  “Single-writer” fault tolerance vs. “job-level” fault
    tolerance
Reliability in the Past: Hadoop 1.0

•  Pre-1.0: if any disk failed, entire DN would go
 offline
   •  Problematic for HBase: local RS would lose all
      locality!
   •  1.0: per-disk failure detection in DN (HDFS-457)
   •  Allows HBase to lose a disk without losing all locality


•  Tip: Configure
   dfs.datanode.failed.volumes.tolerated = 1
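
A minimal sketch of that tip (the property normally lives in hdfs-site.xml;
setting it through the Configuration API here is purely for illustration):

   import org.apache.hadoop.conf.Configuration;

   public class VolumeTolerance {
     public static void main(String[] args) {
       Configuration conf = new Configuration();
       // Let each DataNode survive one failed data directory instead of
       // going fully offline (and costing the local RS all its locality).
       conf.setInt("dfs.datanode.failed.volumes.tolerated", 1);
       System.out.println(conf.getInt("dfs.datanode.failed.volumes.tolerated", 0));
     }
   }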
Reliability Today: Hadoop 2.0

•  Integrates Highly Available HDFS
•  Active-standby hot failover removes SPOF
•  Transparent to clients: no HBase changes
   necessary
•  Tested extensively under HBase read/write
   workloads
•  Coupled with HBase master failover, no more
   HBase SPOF!
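
A configuration sketch of that transparency, assuming an HA nameservice
named “mycluster” has been defined in hdfs-site.xml (the name is an
assumption for illustration):

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.hbase.HBaseConfiguration;

   public class HaRootDir {
     public static void main(String[] args) {
       Configuration conf = HBaseConfiguration.create();
       // "mycluster" is a logical nameservice, not a host: the HDFS client
       // resolves whichever NameNode is currently active behind it, so
       // HBase needs no changes for NameNode failover.
       conf.set("hbase.rootdir", "hdfs://mycluster/hbase");
       System.out.println(conf.get("hbase.rootdir"));
     }
   }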
HDFS HA (architecture diagram)
Reliability in the Future: HA in 2.x

•  Remove dependency on NFS (HDFS-3077)
    •  Quorum-commit protocol for NameNode edit logs
    •  Similar to ZAB/Multi-Paxos


•  Automatic failover for HA NameNodes
 (HDFS-3042)
   •  ZooKeeper-based master election, just like HBase
   •  Merged to trunk
Other Reliability Work for HDFS 2.x

•  2.0: current hflush() API only guarantees data
   is replicated to three machines – not fully on disk.
•  A cluster-wide power outage can lose data.
   •  Upcoming in 2.x: Support for hsync()
      (HDFS-744, HBASE-5954)
   •  Calls fsync() for all replicas of the WAL
   •  Full durability of edits, even with full cluster power
      outages
hflush() and hsync()

hflush()/hsync() both:
- send all queued data, note the sequence number
- block until the corresponding ACK is received

hflush(): flushes to the OS buffer
hsync():  fsync() to disk
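
A minimal write-path sketch of the two calls, assuming a Hadoop 2.x
client; the NameNode URI and file path are made up:

   import java.net.URI;
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FSDataOutputStream;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   public class FlushVersusSync {
     public static void main(String[] args) throws Exception {
       FileSystem fs = FileSystem.get(
           URI.create("hdfs://namenode:8020"), new Configuration());
       FSDataOutputStream out = fs.create(new Path("/tmp/wal-demo"));
       out.writeBytes("edit-1\n");
       out.hflush(); // replicated and visible, but possibly only in OS buffers
       out.writeBytes("edit-2\n");
       out.hsync();  // HDFS-744: fsync() on every replica, survives power loss
       out.close();
     }
   }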
HDFS Wire Compatibility in Hadoop 2.0

•  In 1.0: HDFS client version must match server
   version closely.
•  How many of you have manually copied HDFS
   client jars?
•  Client-server compatibility in 2.0:
   •  Protobuf-based RPC
   •  Easier HBase installs: no more futzing with jars
   •  Separate HBase upgrades from HDFS upgrades
•  Intra-cluster server compatibility in the works
     •  Allow for rolling upgrade without downtime
Performance: Hadoop 1.0

•  Pre-1.0: even for reads from local machine, client
   connects to DN via TCP
•  1.0: Short-circuit local reads
   •  Obtains direct access to underlying local block file, then
      uses regular FileInputStream access.
   •  2x speedup for random reads
•  Configure
   dfs.client.read.shortcircuit = true
   dfs.block.local-path-access.user = hbase
   dfs.datanode.data.dir.perm = 755
•  Note: Currently does not support security
Performance: Hadoop 2.0

•  Pre-2.0: Up to 50% CPU spent verifying CRC
•  2.0: Native checksums using SSE4.2 crc32 asm
 (HDFS-2080)
   •  2.5x speedup reading from buffer cache
   •  Now only 15% CPU overhead to checksumming
•  Pre-2.0: re-establishes TCP connection to DN for
   each seek
•  2.0: Rewritten BlockReader, keep-alive to DN
   (HDFS-941)
   •  40% improvement on random read for HBase
   •  2-2.5x in micro-benchmarks
•  Total improvement vs. 0.20.2: 3.4x!
Performance: Hadoop 2.x

•  Currently: lots of CPU spent copying data in
   memory
•  “Direct-read” API: read directly into user-provided
   DirectByteBuffers (HDFS-2834)
   •  Another ~2x improvement to sequential throughput
      reading from cache
   •  Opportunity to avoid two more buffer copies reading
      compressed data (HADOOP-8148)
   •  Codec APIs still in progress, needs integration into
      HBase
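
A sketch of the direct-read call shape, assuming a Hadoop version that
includes HDFS-2834; the NameNode URI and file path are made up:

   import java.net.URI;
   import java.nio.ByteBuffer;
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FSDataInputStream;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   public class DirectReadSketch {
     public static void main(String[] args) throws Exception {
       FileSystem fs = FileSystem.get(
           URI.create("hdfs://namenode:8020"), new Configuration());
       FSDataInputStream in = fs.open(new Path("/hbase/some-hfile"));
       // Off-heap buffer supplied by the caller; the stream fills it
       // directly, skipping the extra on-heap copy (streams that do not
       // implement ByteBufferReadable throw UnsupportedOperationException).
       ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
       int n = in.read(buf);
       System.out.println("read " + n + " bytes");
       in.close();
     }
   }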
Performance: Hadoop 2.x

•  True “zero-copy read” support (HDFS-3051)
    •  New API would allow direct access to mmaped block
       files
    •  No syscall or JNI overhead for reads
    •  Initial benchmarks indicate at least ~30% gain.
    •  Some open questions around best safe
       implementation
Current Read Path (diagram)
Proposed Read Path (diagram)
Performance: Why Emphasize CPU?

•  Machines with lots of RAM now inexpensive
   (48-96GB common)
•  Want to use that to improve cache hit ratios.
•  Unfortunately, 50GB+ Java heaps still impractical
   (GC pauses too long)
•  Allocate the extra RAM to the buffer cache
  •  OS caches compressed data: another win!
•  CPU overhead reading from buffer cache
 becomes limiting factor for read workloads
What’s Up Next in 2.x?

•  HDFS Hard-links (HDFS-3370)
    •  Will allow for HBase to clone/snapshot tables
       efficiently!
    •  Improves HBase table-scoped backup story
•  HDFS Snapshots (HDFS-2802)
    •  HBase-wide snapshot support for point-in-time
       recovery
    •  Enables consistent backups copied off-site for DR
What’s Up Next in 2.x?

•  Improved block placement policies (HDFS-1094)
    •  Fundamental tradeoff between probability of data
       unavailability and the amount of data that becomes
       unavailable
    •  Current scheme: if any 3 nodes not on the same rack
       die, some very small amount of data is unavailable
    •  Proposed scheme: lessen chances of unavailability,
       but if a certain three nodes die, a larger amount is
       unavailable
    •  For many HBase applications: any single lost block
       halts whole operation. Prefer to minimize probability.
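
A back-of-envelope sketch of that tradeoff; the cluster size, block
count, and replication factor are assumptions, and rack constraints are
ignored:

   public class PlacementTradeoff {
     public static void main(String[] args) {
       double nodes = 100, blocks = 1000000;
       // Number of distinct 3-node replica sets: C(100,3) = 161,700.
       double triples = nodes * (nodes - 1) * (nodes - 2) / 6;
       // Random placement: if any 3 given nodes die, each block is lost
       // with probability 1/triples.
       double expectedLost = blocks / triples;                  // ~6 blocks
       double pAnyLoss = 1 - Math.pow(1 - 1 / triples, blocks); // ~0.998
       System.out.printf("P(some loss) ~ %.3f, expected blocks lost ~ %.1f%n",
           pAnyLoss, expectedLost);
       // A grouped scheme uses far fewer of the 161,700 possible triples,
       // shrinking P(some loss), but a hit takes out many more blocks at once.
     }
   }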
What’s Up Next in 2.x?

•  HBase-specific block placement hints
 (HBASE-4755)
  •  Assign each region a set of three RS (primary and
     two backups)
  •  Place underlying data blocks on these three DNs
  •  Could then fail-over and load-balance without losing
     any locality!
Summary

                 Hadoop 1.0               Hadoop 2.0              Hadoop 2.x

Availability     •  DN volume failure     •  NameNode HA          •  HA without NAS
                    isolation             •  Wire compatibility   •  Rolling upgrade

Performance      •  Short-circuit reads   •  Native CRC           •  Direct-read API
                                          •  DN keep-alive        •  Zero-copy API
                                                                  •  Direct codec API

Features         •  Durable hflush()                              •  hsync()
                                                                  •  Snapshots
                                                                  •  Hard links
                                                                  •  HBase-aware
                                                                     block placement
Summary

•  HBase is no longer a second-class citizen.
•  We’ve come a long way since Hadoop 0.20.2 in
   performance, reliability, and availability.
•  New features coming in the 2.x line specifically to
   benefit HBase use cases
•  Hadoop 2.0 features available today via CDH4.
   Many Cloudera customers already using CDH4
   with HBase with great success.
PROJECT STATUS
Current Project Status

•  HBase 0.90.x “Advanced Concepts”
    •  Master Rewrite – More ZooKeeper
    •  Intra Row Scanning
    •  Further optimizations on algorithms and data
       structures

         CDH3
Current Project Status

•  HBase 0.92.x “Coprocessors”
    •  Multi-DC Replication
    •  Discretionary Access Control
    •  Coprocessors (see the sketch below)
      •  Endpoints and Observers
      •  Can hook into many explicit and implicit operations


        CDH4
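
As a flavor of the Observer half, a minimal RegionObserver sketch against
the 0.92-era API; the class name and printed message are illustrative
assumptions:

   import java.io.IOException;
   import org.apache.hadoop.hbase.client.Put;
   import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
   import org.apache.hadoop.hbase.coprocessor.ObserverContext;
   import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
   import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
   import org.apache.hadoop.hbase.util.Bytes;

   public class AuditObserver extends BaseRegionObserver {
     @Override
     public void prePut(ObserverContext<RegionCoprocessorEnvironment> ctx,
         Put put, WALEdit edit, boolean writeToWAL) throws IOException {
       // Implicitly invoked before every Put this region handles.
       System.out.println("prePut for row " + Bytes.toStringBinary(put.getRow()));
     }
   }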
Current Project Status (cont.)

•  HBase 0.94.x “Performance Release”
    •  Read CRC Improvements
    •  Seek Optimizations
      •  Lazy Seeks
    •  WAL Compression
    •  Prefix Compression (aka Block Encoding)
    •  Atomic Append (see the sketch below)
    •  Atomic Put+Delete
    •  Multi Increment and Multi Append
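
A usage sketch of the 0.94 Append API; the table, row, and column names
are assumptions for illustration:

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.hbase.HBaseConfiguration;
   import org.apache.hadoop.hbase.client.Append;
   import org.apache.hadoop.hbase.client.HTable;
   import org.apache.hadoop.hbase.util.Bytes;

   public class AtomicAppendDemo {
     public static void main(String[] args) throws Exception {
       Configuration conf = HBaseConfiguration.create();
       HTable table = new HTable(conf, "t1");
       Append append = new Append(Bytes.toBytes("row1"));
       // Atomically appends the value to whatever is already stored in cf:q,
       // without a client-side read-modify-write cycle.
       append.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(",more"));
       table.append(append);
       table.close();
     }
   }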
Current Project Status (cont.)

•  HBase 0.94.x “Performance Release”
    •  Per-region (i.e. local) Multi-Row Transactions


   // As shown in the talk; assumes t is an HTable and that ROW1 and ROW2
   // fall into the same region, since atomicity is per-region only.
   RegionMutation rm = new RegionMutation();
   Put p = new Put(ROW1);
   p.add(FAMILY, QUALIFIER, VALUE);
   rm.add(p);
   p = new Put(ROW2);
   p.add(FAMILY, QUALIFIER, VALUE);
   rm.add(p);
   t.mutateRegion(rm);
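
The per-region restriction is deliberate: a region is the largest unit
served by exactly one RegionServer, so an atomic multi-row commit needs no
distributed coordination; spanning regions would require a full
distributed-transaction protocol.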
Current Project Status (cont.)

•  HBase 0.94.x “Performance Release”
    •  Uber HBCK
    •  Embedded Thrift Server
    •  WALPlayer
    •  Enable/disable Replication Streams

        CDH4.x   (soon)
Current Project Status (cont.)

•  HBase 0.96.x “The Singularity”
    •  Protobuf RPC
      •  Rolling Upgrades
      •  Multiversion Access
    •  Metrics V2
    •  Preview Technologies
      •  Snapshots
      •  PrefixTrie Block Encoding

        CDH5?
Client/Server Compatibility Matrix


RegionServer/Master    Client 0.96.0    Client 0.96.1    Client 0.98.0    Client 1.0.0

0.96.0                 Works            Works*           Works*           No guarantee
0.96.1                 Works            Works            Works*           No guarantee
0.98.0                 Works            Works            Works            Works*
1.0.0                  No guarantee     No guarantee     Works            Works

Note: * works if new features are not used
Questions?

  • 39. Client/Server Compatibility Matrix RegionServer,   Client  0.96.0     Client  0.96.1     Client  0.98.0   Client  1.0.0     Master      0.96.0     Works     Works*   Works*   No  guarantee      0.96.1     Works     Works     Works*   No  guarantee      0.98.0   Works     Works     Works     Works*        1.0.0   No  guarantee   No  guarantee   Works   Works   Notes:  *  If  new  features  are  not  used     39 ©2012 Cloudera, Inc. All Rights Reserved.