SlideShare a Scribd company logo
1
HBase: Just the Basics
Jesse Anderson – Curriculum Developer and Instructor
v1
2
What Is HBase?
©2014 Cloudera, Inc. All rights reserved.2
• NoSQL datastore built on top of HDFS (Hadoop)
• An Apache Top Level Project
• Handles the various manifestations of Big Data
• Based on Google’s BigTable paper
3
Why Use HBase?
©2014 Cloudera, Inc. All rights reserved.3
• Storing large amounts of data (TB/PB)
• High throughput for a large number of requests
• Storing unstructured or variable column data
• Big Data with random read and writes
4
When to Consider Not Using HBase?
©2014 Cloudera, Inc. All rights reserved.4
• Only use with Big Data problems
• Read straight through files
• Write all at once or append new files
• Not random reads or writes
• Access patterns of the data are ill-defined
5
HBase Architecture
How it works
6
Meet the Daemons
©2014 Cloudera, Inc. All rights reserved.6
• HBase Master
• RegionServer
• ZooKeeper
• HDFS
• NameNode/Standby NameNode
• DataNode
7
Daemon Locations
©2014 Cloudera, Inc. All rights reserved.7
NameNode
DataNodeDataNode
Standby
NameNode
DataNode
RegionServer
Master
RegionServerRegionServer
ZooKeeper ZooKeeper ZooKeeper
Master Master
DataNodeDataNode DataNode
RegionServerRegionServerRegionServer
Master Nodes
Slave Nodes
8
Tables and Column Families
©2014 Cloudera, Inc. All rights reserved.8
Column Family “contactinfo” Column Family “profilephoto”
Tables are broken into groupings called Column Families.
Group data frequently
accessed together and
compress it Group photos with different settings
9
Rows and Columns
©2014 Cloudera, Inc. All rights reserved.9
Row key Column Family “contactinfo” Column Family “profilephoto”
adupont fname: Andre lname: Dupont
jsmith fname: John lname: Smith image: <smith.jpg>
mrossi fname: Mario lname: Rossi image: <mario.jpg>
Row keys identify a row
No storage penalty for unused columns
Each Column Family can have many columns
10
Regions
©2014 Cloudera, Inc. All rights reserved.10
Row key Column Family “contactinfo”
adupont fname: Andre lname: Dupont
jsmith fname: John lname: Smith
A table is broken into regions
NameNode
DataNodeDataNode
Standby
NameNode
DataNode
RegionServer
Master
RegionServerRegionServer
ZooKeeper ZooKeeper ZooKeeper
Master Master
DataNodeDataNode DataNode
RegionServerRegionServerRegionServer
Row key Column Family “contactinfo”
mrossi fname: Mario lname: Rossi
zstevens fname: Zack lname: Stevens
Regions are served by
RegionServers
11
Client
Write Path
©2014 Cloudera, Inc. All rights reserved.11
NameNode
DataNodeDataNode
Standby
NameNode
DataNode
RegionServer
Master
RegionServerRegionServer
ZooKeeper ZooKeeper ZooKeeper
Master Master
DataNodeDataNode DataNode
RegionServerRegionServerRegionServer
1. Which
RegionServer is
serving the Region?
2. Write to
RegionServer
12
Client
Read Path
©2014 Cloudera, Inc. All rights reserved.12
NameNode
DataNodeDataNode
Standby
NameNode
DataNode
RegionServer
Master
RegionServerRegionServer
ZooKeeper ZooKeeper ZooKeeper
Master Master
DataNodeDataNode DataNode
RegionServerRegionServerRegionServer
1. Which
RegionServer is
serving the Region?
2. Read from
RegionServer
13
HBase API
How to access the data
14
No SQL Means No SQL
©2014 Cloudera, Inc. All rights reserved.14
• Data is not accessed over SQL
• You must:
• Create your own connections
• Keep track of the type of data in a column
• Give each row a key
• Access a row by its key
15
Types of Access
©2014 Cloudera, Inc. All rights reserved.15
• Gets
• Gets a row’s data based on the row key
• Puts
• Upserts a row with data based on the row key
• Scans
• Finds all matching rows based on the row key
• Scan logic can be increased by using filters
16
Gets
©2014 Cloudera, Inc. All rights reserved.16
1
2
3
4
Get g = new Get(ROW_KEY_BYTES);
Result r= table.get(g);
byte[] byteArray =
r.getValue(COLFAM_BYTS,COLDESC_BYTS);
String columnValue =
Bytes.toString(byteArray);
17
Puts
©2014 Cloudera, Inc. All rights reserved.17
1
2
3
4
Put p = new
Put(Bytes.toBytes(ROW_KEY_BYTES);
p.add(COLFAM_BYTES, COLDESC_BYTES,
Bytes.toBytes("value"));
table.put(p);
18
HBase Schema Design
How to design
19
No SQL Means No SQL
©2014 Cloudera, Inc. All rights reserved.19
• Designing schemas for HBase requires an in-depth
knowledge
• Schema Design is ‘data-centric’ not ‘relationship-
centric’
• You design around how data is accessed
• Row keys are engineered
20
Treating HBase like a traditional RDBMS will lead
to abject failure!
Captain Picard
21
Row Keys
©2014 Cloudera, Inc. All rights reserved.21
• A row key is more than the glue between two tables
• Engineering time is spent just on constructing a row
key
• Contents of a row key vary by access pattern
• Often made up of several pieces of data
22
Schema Design
©2014 Cloudera, Inc. All rights reserved.22
• Schema design does not start in an ERD
• Access pattern must be known and ascertained
• Denormalize to improve performance
• Fewer, bigger tables
23 ©2014 Cloudera, Inc. All rights reserved.
Jesse Anderson
@jessetanderson

More Related Content

PPT
HBASE Overview
PPT
Chicago Data Summit: Apache HBase: An Introduction
PDF
Apache HBase - Just the Basics
ODP
Apache hadoop hbase
PPTX
HBaseCon 2015: Analyzing HBase Data with Apache Hive
PPTX
Introduction To HBase
PPT
Hw09 Practical HBase Getting The Most From Your H Base Install
PDF
HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase
HBASE Overview
Chicago Data Summit: Apache HBase: An Introduction
Apache HBase - Just the Basics
Apache hadoop hbase
HBaseCon 2015: Analyzing HBase Data with Apache Hive
Introduction To HBase
Hw09 Practical HBase Getting The Most From Your H Base Install
HBaseCon 2013: Honeycomb - MySQL Backed by Apache HBase

What's hot (20)

PPTX
Apache Spark on Apache HBase: Current and Future
PPTX
Introduction to HBase
PDF
HBase for Architects
PDF
Hadoop and HBase in the Real World
PDF
Apache Hadoop and HBase
PPTX
Apache HBase™
PDF
HBase Read High Availability Using Timeline-Consistent Region Replicas
PDF
Intro to HBase
PDF
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
PPTX
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
PPTX
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
PDF
HBaseCon 2013: Integration of Apache Hive and HBase
PPT
Big Data Fundamentals in the Emerging New Data World
PPTX
HBaseCon 2013: Full-Text Indexing for Apache HBase
PDF
HBaseCon 2015: Just the Basics
PDF
Intro to HBase - Lars George
PPTX
A Survey of HBase Application Archetypes
PDF
Apache HBase 1.0 Release
PPTX
PDF
HBaseCon 2015: Graph Processing of Stock Market Order Flow in HBase on AWS
Apache Spark on Apache HBase: Current and Future
Introduction to HBase
HBase for Architects
Hadoop and HBase in the Real World
Apache Hadoop and HBase
Apache HBase™
HBase Read High Availability Using Timeline-Consistent Region Replicas
Intro to HBase
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBaseCon 2012 | Mignify: A Big Data Refinery Built on HBase - Internet Memory...
HBaseCon 2013: Integration of Apache Hive and HBase
Big Data Fundamentals in the Emerging New Data World
HBaseCon 2013: Full-Text Indexing for Apache HBase
HBaseCon 2015: Just the Basics
Intro to HBase - Lars George
A Survey of HBase Application Archetypes
Apache HBase 1.0 Release
HBaseCon 2015: Graph Processing of Stock Market Order Flow in HBase on AWS
Ad

Similar to HBase: Just the Basics (20)

PPTX
HBase Data Modeling and Access Patterns with Kite SDK
PDF
Introduction to HBase
PPTX
Kite SDK: Working with Datasets
PDF
Cloudera Operational DB (Apache HBase & Apache Phoenix)
PDF
Hive 3 a new horizon
PPTX
Hive 3 - a new horizon
PDF
Big Data Conference April 2015
PDF
Kite SDK introduction for Portland Big Data
PDF
Sql on everything with drill
PPTX
Introduction to HBase - Phoenix HUG 5/14
PPTX
Twitter with hadoop for oow
PPTX
ch10 - lecture big data - database concept 7th edition.pptx
PDF
SQL Engines for Hadoop - The case for Impala
PPTX
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
PPTX
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
PPTX
NoSQL Databases
PPTX
H base introduction & development
PDF
Sql saturday pig session (wes floyd) v2
PPTX
Couchbase 101
PDF
Hive Quick Start Tutorial
HBase Data Modeling and Access Patterns with Kite SDK
Introduction to HBase
Kite SDK: Working with Datasets
Cloudera Operational DB (Apache HBase & Apache Phoenix)
Hive 3 a new horizon
Hive 3 - a new horizon
Big Data Conference April 2015
Kite SDK introduction for Portland Big Data
Sql on everything with drill
Introduction to HBase - Phoenix HUG 5/14
Twitter with hadoop for oow
ch10 - lecture big data - database concept 7th edition.pptx
SQL Engines for Hadoop - The case for Impala
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
A Graph Service for Global Web Entities Traversal and Reputation Evaluation B...
NoSQL Databases
H base introduction & development
Sql saturday pig session (wes floyd) v2
Couchbase 101
Hive Quick Start Tutorial
Ad

More from HBaseCon (20)

PDF
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
PDF
hbaseconasia2017: HBase on Beam
PDF
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
PDF
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
PDF
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
PDF
hbaseconasia2017: Apache HBase at Netease
PDF
hbaseconasia2017: HBase在Hulu的使用和实践
PDF
hbaseconasia2017: 基于HBase的企业级大数据平台
PDF
hbaseconasia2017: HBase at JD.com
PDF
hbaseconasia2017: Large scale data near-line loading method and architecture
PDF
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
PDF
hbaseconasia2017: HBase Practice At XiaoMi
PDF
hbaseconasia2017: hbase-2.0.0
PDF
HBaseCon2017 Democratizing HBase
PDF
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
PDF
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
PDF
HBaseCon2017 Transactions in HBase
PDF
HBaseCon2017 Highly-Available HBase
PDF
HBaseCon2017 Apache HBase at Didi
PDF
HBaseCon2017 gohbase: Pure Go HBase Client
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: HBase at JD.com
hbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: hbase-2.0.0
HBaseCon2017 Democratizing HBase
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Transactions in HBase
HBaseCon2017 Highly-Available HBase
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 gohbase: Pure Go HBase Client

Recently uploaded (20)

PDF
2025 Textile ERP Trends: SAP, Odoo & Oracle
PPTX
Introduction to Artificial Intelligence
PDF
Design an Analysis of Algorithms II-SECS-1021-03
PPTX
Online Work Permit System for Fast Permit Processing
PPTX
Transform Your Business with a Software ERP System
PPTX
CHAPTER 2 - PM Management and IT Context
PDF
Understanding Forklifts - TECH EHS Solution
PDF
Upgrade and Innovation Strategies for SAP ERP Customers
PDF
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
PDF
How Creative Agencies Leverage Project Management Software.pdf
PDF
Flood Susceptibility Mapping Using Image-Based 2D-CNN Deep Learnin. Overview ...
PPTX
CHAPTER 12 - CYBER SECURITY AND FUTURE SKILLS (1) (1).pptx
PDF
Which alternative to Crystal Reports is best for small or large businesses.pdf
PPTX
Odoo POS Development Services by CandidRoot Solutions
PDF
Audit Checklist Design Aligning with ISO, IATF, and Industry Standards — Omne...
PPTX
ManageIQ - Sprint 268 Review - Slide Deck
PPTX
Lecture 3: Operating Systems Introduction to Computer Hardware Systems
PPTX
ISO 45001 Occupational Health and Safety Management System
PDF
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
PPTX
Agentic AI : A Practical Guide. Undersating, Implementing and Scaling Autono...
2025 Textile ERP Trends: SAP, Odoo & Oracle
Introduction to Artificial Intelligence
Design an Analysis of Algorithms II-SECS-1021-03
Online Work Permit System for Fast Permit Processing
Transform Your Business with a Software ERP System
CHAPTER 2 - PM Management and IT Context
Understanding Forklifts - TECH EHS Solution
Upgrade and Innovation Strategies for SAP ERP Customers
SAP S4 Hana Brochure 3 (PTS SYSTEMS AND SOLUTIONS)
How Creative Agencies Leverage Project Management Software.pdf
Flood Susceptibility Mapping Using Image-Based 2D-CNN Deep Learnin. Overview ...
CHAPTER 12 - CYBER SECURITY AND FUTURE SKILLS (1) (1).pptx
Which alternative to Crystal Reports is best for small or large businesses.pdf
Odoo POS Development Services by CandidRoot Solutions
Audit Checklist Design Aligning with ISO, IATF, and Industry Standards — Omne...
ManageIQ - Sprint 268 Review - Slide Deck
Lecture 3: Operating Systems Introduction to Computer Hardware Systems
ISO 45001 Occupational Health and Safety Management System
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
Agentic AI : A Practical Guide. Undersating, Implementing and Scaling Autono...

HBase: Just the Basics

  • 1. 1 HBase: Just the Basics Jesse Anderson – Curriculum Developer and Instructor v1
  • 2. 2 What Is HBase? ©2014 Cloudera, Inc. All rights reserved.2 • NoSQL datastore built on top of HDFS (Hadoop) • An Apache Top Level Project • Handles the various manifestations of Big Data • Based on Google’s BigTable paper
  • 3. 3 Why Use HBase? ©2014 Cloudera, Inc. All rights reserved.3 • Storing large amounts of data (TB/PB) • High throughput for a large number of requests • Storing unstructured or variable column data • Big Data with random read and writes
  • 4. 4 When to Consider Not Using HBase? ©2014 Cloudera, Inc. All rights reserved.4 • Only use with Big Data problems • Read straight through files • Write all at once or append new files • Not random reads or writes • Access patterns of the data are ill-defined
  • 6. 6 Meet the Daemons ©2014 Cloudera, Inc. All rights reserved.6 • HBase Master • RegionServer • ZooKeeper • HDFS • NameNode/Standby NameNode • DataNode
  • 7. 7 Daemon Locations ©2014 Cloudera, Inc. All rights reserved.7 NameNode DataNodeDataNode Standby NameNode DataNode RegionServer Master RegionServerRegionServer ZooKeeper ZooKeeper ZooKeeper Master Master DataNodeDataNode DataNode RegionServerRegionServerRegionServer Master Nodes Slave Nodes
  • 8. 8 Tables and Column Families ©2014 Cloudera, Inc. All rights reserved.8 Column Family “contactinfo” Column Family “profilephoto” Tables are broken into groupings called Column Families. Group data frequently accessed together and compress it Group photos with different settings
  • 9. 9 Rows and Columns ©2014 Cloudera, Inc. All rights reserved.9 Row key Column Family “contactinfo” Column Family “profilephoto” adupont fname: Andre lname: Dupont jsmith fname: John lname: Smith image: <smith.jpg> mrossi fname: Mario lname: Rossi image: <mario.jpg> Row keys identify a row No storage penalty for unused columns Each Column Family can have many columns
  • 10. 10 Regions ©2014 Cloudera, Inc. All rights reserved.10 Row key Column Family “contactinfo” adupont fname: Andre lname: Dupont jsmith fname: John lname: Smith A table is broken into regions NameNode DataNodeDataNode Standby NameNode DataNode RegionServer Master RegionServerRegionServer ZooKeeper ZooKeeper ZooKeeper Master Master DataNodeDataNode DataNode RegionServerRegionServerRegionServer Row key Column Family “contactinfo” mrossi fname: Mario lname: Rossi zstevens fname: Zack lname: Stevens Regions are served by RegionServers
  • 11. 11 Client Write Path ©2014 Cloudera, Inc. All rights reserved.11 NameNode DataNodeDataNode Standby NameNode DataNode RegionServer Master RegionServerRegionServer ZooKeeper ZooKeeper ZooKeeper Master Master DataNodeDataNode DataNode RegionServerRegionServerRegionServer 1. Which RegionServer is serving the Region? 2. Write to RegionServer
  • 12. 12 Client Read Path ©2014 Cloudera, Inc. All rights reserved.12 NameNode DataNodeDataNode Standby NameNode DataNode RegionServer Master RegionServerRegionServer ZooKeeper ZooKeeper ZooKeeper Master Master DataNodeDataNode DataNode RegionServerRegionServerRegionServer 1. Which RegionServer is serving the Region? 2. Read from RegionServer
  • 13. 13 HBase API How to access the data
  • 14. 14 No SQL Means No SQL ©2014 Cloudera, Inc. All rights reserved.14 • Data is not accessed over SQL • You must: • Create your own connections • Keep track of the type of data in a column • Give each row a key • Access a row by its key
  • 15. 15 Types of Access ©2014 Cloudera, Inc. All rights reserved.15 • Gets • Gets a row’s data based on the row key • Puts • Upserts a row with data based on the row key • Scans • Finds all matching rows based on the row key • Scan logic can be increased by using filters
  • 16. 16 Gets ©2014 Cloudera, Inc. All rights reserved.16 1 2 3 4 Get g = new Get(ROW_KEY_BYTES); Result r= table.get(g); byte[] byteArray = r.getValue(COLFAM_BYTS,COLDESC_BYTS); String columnValue = Bytes.toString(byteArray);
  • 17. 17 Puts ©2014 Cloudera, Inc. All rights reserved.17 1 2 3 4 Put p = new Put(Bytes.toBytes(ROW_KEY_BYTES); p.add(COLFAM_BYTES, COLDESC_BYTES, Bytes.toBytes("value")); table.put(p);
  • 19. 19 No SQL Means No SQL ©2014 Cloudera, Inc. All rights reserved.19 • Designing schemas for HBase requires an in-depth knowledge • Schema Design is ‘data-centric’ not ‘relationship- centric’ • You design around how data is accessed • Row keys are engineered
  • 20. 20 Treating HBase like a traditional RDBMS will lead to abject failure! Captain Picard
  • 21. 21 Row Keys ©2014 Cloudera, Inc. All rights reserved.21 • A row key is more than the glue between two tables • Engineering time is spent just on constructing a row key • Contents of a row key vary by access pattern • Often made up of several pieces of data
  • 22. 22 Schema Design ©2014 Cloudera, Inc. All rights reserved.22 • Schema design does not start in an ERD • Access pattern must be known and ascertained • Denormalize to improve performance • Fewer, bigger tables
  • 23. 23 ©2014 Cloudera, Inc. All rights reserved. Jesse Anderson @jessetanderson