Introduction to
HBase
Dr. Fabio Fumarola
Contents
• BigTable
• HBase
– Shell
– Admin
– Put
– Get
– Scan
• Coding Session
BigTable
Bigtable at Google
• "Bigtable is a distributed storage system for
managing structured data that is designed to scale to
a very large size: petabytes of data across thousands
of commodity servers. Many projects at Google store
data in Bigtable including web indexing, Google
Earth, and Google Finance."
Features
• Distributed
• Sparse
• Column-Oriented
• Versioned
1. The map is indexed by a row key, a column key, and a timestamp.
2. Each value in the map is an uninterpreted array of bytes.
(row key, column key, timestamp) => value
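The mapping above can be pictured as nested sorted maps. The following is a minimal in-memory sketch in plain Java, not HBase code: the class name `SparseVersionedMap` and its methods are hypothetical, and real HBase compares keys as raw bytes rather than as strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of (row key, column key, timestamp) => value.
class SparseVersionedMap {
    // row key -> column key -> timestamp (newest first) -> value bytes
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> rows =
            new TreeMap<>();

    // Store one version of a cell. Cells that are never written take no space,
    // which is what "sparse" means here.
    void put(String rowKey, String columnKey, long timestamp, byte[] value) {
        rows.computeIfAbsent(rowKey, r -> new TreeMap<>())
            .computeIfAbsent(columnKey, c -> new TreeMap<Long, byte[]>(Collections.reverseOrder()))
            .put(timestamp, value);
    }

    // Return all stored versions of a cell, newest timestamp first.
    NavigableMap<Long, byte[]> versions(String rowKey, String columnKey) {
        NavigableMap<String, NavigableMap<Long, byte[]>> columns = rows.get(rowKey);
        if (columns == null || !columns.containsKey(columnKey)) {
            return new TreeMap<>();
        }
        return columns.get(columnKey);
    }

    // Return the newest value of a cell, or null if the cell was never written.
    byte[] getLatest(String rowKey, String columnKey) {
        NavigableMap<Long, byte[]> cell = versions(rowKey, columnKey);
        return cell.isEmpty() ? null : cell.firstEntry().getValue();
    }

    public static void main(String[] args) {
        SparseVersionedMap table = new SparseVersionedMap();
        table.put("20120407145045", "info:author", 1239135324741L,
                "John".getBytes(StandardCharsets.UTF_8));
        table.put("20120407145045", "info:author", 1239135325074L,
                "John Doe".getBytes(StandardCharsets.UTF_8));
        byte[] latest = table.getLatest("20120407145045", "info:author");
        System.out.println(new String(latest, StandardCharsets.UTF_8));
    }
}
```

Keeping the timestamps in descending order means the first entry of a cell is always its most recent version, which mirrors the "Versioned" feature listed earlier.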
Key Concepts
• row key => 20120407152657
• column family => "personal:"
• column key => "personal:givenName", "personal:surname"
• timestamp => 1239124584398
• column value => "mario", "rossi"
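As the key concepts show, a column key is a column family followed by a colon and a qualifier. A small sketch of that split, using a hypothetical helper class `ColumnKey` that is not part of the HBase API:

```java
// Hypothetical helper that splits a column key such as "personal:givenName".
class ColumnKey {
    final String family;
    final String qualifier;

    ColumnKey(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    // Split at the first colon. A key with no colon, or one that ends in a
    // colon (like "personal:"), names the family with an empty qualifier.
    static ColumnKey parse(String columnKey) {
        int colon = columnKey.indexOf(':');
        if (colon < 0) {
            return new ColumnKey(columnKey, "");
        }
        return new ColumnKey(columnKey.substring(0, colon), columnKey.substring(colon + 1));
    }

    public static void main(String[] args) {
        ColumnKey key = ColumnKey.parse("personal:givenName");
        System.out.println("family=" + key.family + ", qualifier=" + key.qualifier);
    }
}
```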
Example 1
Get row 20120407145045
HBase
• Use HBase when you need random, realtime read/write
access to your Big Data. This project's goal is
the hosting of very large tables -- billions of rows X
millions of columns -- atop clusters of commodity
hardware. HBase is an open-source, distributed,
versioned, column-oriented store modeled after
Google's Bigtable.
http://hbase.apache.org
HBase Shell
hbase(main):001:0> create 'blog', 'info', 'content'
0 row(s) in 4.3640 seconds
hbase(main):002:0> put 'blog', '20120320162535', 'info:title', 'Document-oriented storage using CouchDB'
0 row(s) in 0.0330 seconds
hbase(main):003:0> put 'blog', '20120320162535', 'info:author', 'Bob Smith'
0 row(s) in 0.0030 seconds
hbase(main):004:0> put 'blog', '20120320162535', 'content:', 'CouchDB is a document-oriented...'
0 row(s) in 0.0030 seconds
HBase Shell
hbase(main):005:0> put 'blog', '20120320162535', 'info:category', 'Persistence'
0 row(s) in 0.0030 seconds
hbase(main):006:0> get 'blog', '20120320162535'
COLUMN          CELL
content:        timestamp=1239135042862, value=CouchDB is a doc...
info:author     timestamp=1239135042755, value=Bob Smith
info:category   timestamp=1239135042982, value=Persistence
info:title      timestamp=1239135042623, value=Document-oriented...
4 row(s) in 0.0140 seconds
HBase Shell
hbase(main):015:0> get 'blog', '20120407145045', {COLUMN=>'info:author', VERSIONS=>3 }
timestamp=1239135325074, value=John Doe
timestamp=1239135324741, value=John
2 row(s) in 0.0060 seconds
hbase(main):016:0> scan 'blog', { STARTROW => '20120300', STOPROW => '20120400' }
ROW              COLUMN+CELL
20120320162535   column=content:, timestamp=1239135042862, value=CouchDB is...
20120320162535   column=info:author, timestamp=1239135042755, value=Bob Smith
20120320162535   column=info:category, timestamp=1239135042982, value=Persistence
20120320162535   column=info:title, timestamp=1239135042623, value=Document...
4 row(s) in 0.0230 seconds
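The scan above works because rows are kept sorted by row key, so a start row and a stop row select a contiguous slice, with the start row included and the stop row excluded. A minimal sketch of that behaviour on a plain Java sorted map follows. The class `RowRangeScan` is hypothetical and is not the HBase client. It compares keys as strings, which matches byte order only for ASCII keys like the ones used here, and the row values are placeholders.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of a row-key range scan over a sorted table.
class RowRangeScan {
    // Return the rows with startRow <= row key < stopRow, in key order.
    static NavigableMap<String, String> scan(NavigableMap<String, String> table,
                                             String startRow, String stopRow) {
        return table.subMap(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> blog = new TreeMap<>();
        blog.put("20120320162535", "post from March 2012 (placeholder value)");
        blog.put("20120407145045", "post from April 2012 (placeholder value)");

        // Timestamp-style row keys make "all of March 2012" a simple key range.
        for (String rowKey : scan(blog, "20120300", "20120400").keySet()) {
            System.out.println(rowKey);
        }
    }
}
```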
Java API
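Admin API
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
// Create a new table with three column families.
// Note: this is the older client API; later HBase releases deprecate
// HBaseAdmin and HTableDescriptor.
Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
String tableName = "people";
HTableDescriptor desc = new HTableDescriptor(tableName);
desc.addFamily(new HColumnDescriptor("personal"));
desc.addFamily(new HColumnDescriptor("contactinfo"));
desc.addFamily(new HColumnDescriptor("creditcard"));
admin.createTable(desc);
System.out.printf("%s is available? %b%n", tableName,
admin.isTableAvailable(tableName));
admin.close();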
Client API
import static org.apache.hadoop.hbase.util.Bytes.toBytes;
// Add some data into 'people' table
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
// One Put carries all the cells written to a single row
Put put = new Put(toBytes("connor-john-m-43299"));
put.add(toBytes("personal"), toBytes("givenName"), toBytes("John"));
put.add(toBytes("personal"), toBytes("mi"), toBytes("M"));
put.add(toBytes("personal"), toBytes("surname"), toBytes("Connor"));
put.add(toBytes("contactinfo"), toBytes("email"), toBytes("john.connor@gmail.com"));
table.put(put);
table.flushCommits();
table.close();
Finding Data
• GET (by row key)
• Scan (by row key ranges, filtering)
Get
// Get a row. Ask for only the data you need.
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Get get = new Get(toBytes("connor-john-m-43299"));
get.setMaxVersions(2);
get.addFamily(toBytes("personal"));
get.addColumn(toBytes("contactinfo"), toBytes("email"));
Result result = table.get(get);
table.close();
Update
// Update existing values, and add a new one
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
Put put = new Put(toBytes("connor-john-m-43299"));
put.add(toBytes("personal"), toBytes("surname"),
toBytes("Smith"));
put.add(toBytes("contactinfo"), toBytes("email"),
toBytes("john.m.smith@gmail.com"));
put.add(toBytes("contactinfo"), toBytes("address"),
toBytes("San Diego, CA"));
table.put(put);
table.flushCommits();
table.close();
Scans
// Scan rows, starting at the first row key >= "john-"
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "people");
long numRowsPerPage = 10; // page size; value chosen for illustration
Scan scan = new Scan(toBytes("john-"));
scan.addColumn(toBytes("personal"), toBytes("givenName"));
scan.addColumn(toBytes("contactinfo"), toBytes("email"));
scan.addColumn(toBytes("contactinfo"), toBytes("address"));
scan.setFilter(new PageFilter(numRowsPerPage));
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
// process result...
}
scanner.close();
table.close();
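A page-limiting filter only caps how many rows come back, so paging through a table usually means remembering the last row key seen and starting the next scan just after it. The following is a sketch of that client-side loop on a plain Java sorted map. The class `PagedScan` and its method are hypothetical and stand in for repeated scans; they are not HBase calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of paging through a sorted table by row key.
class PagedScan {
    // Return up to pageSize row keys that sort strictly after lastRow.
    // Pass null as lastRow to fetch the first page.
    static List<String> nextPage(NavigableMap<String, String> table, String lastRow, int pageSize) {
        NavigableMap<String, String> remaining =
                (lastRow == null) ? table : table.tailMap(lastRow, false);
        List<String> page = new ArrayList<>();
        for (String rowKey : remaining.keySet()) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(rowKey);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> people = new TreeMap<>();
        for (String key : new String[] {"a", "b", "c", "d", "e"}) {
            people.put(key, "row " + key);
        }
        String lastRow = null;
        List<String> page = nextPage(people, lastRow, 2);
        while (!page.isEmpty()) {
            System.out.println(page);
            lastRow = page.get(page.size() - 1); // resume after the last row seen
            page = nextPage(people, lastRow, 2);
        }
    }
}
```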
Time to Code
This is when things start to get hard
Set Up HBase with Docker
• https://registry.hub.docker.com/u/banno/hbase-standalo
• https://registry.hub.docker.com/u/oddpoet/hbase-cdh5/
Steps
• Shell
• Java Project
– Maven
– Gradle
