HBase Data Types 
Nick Dimiduk, Hortonworks 
@xefyr n10k.com
Agenda 
• Motivations 
• Progress thus far 
• Future work 
• Examples 
• More Examples 
Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. 2014-11-18
Why introduce types? 
• Δ(SQL, byte[]): (╯°□°)╯︵ ┻━┻ 
• Rule of least surprise 
• Interoperability across tools 
• Distill best practices 
Considerations 
• Opt-in for current users 
• Easy transition for existing applications 
• Client-side only, mostly
– Filters, Split policies, Coprocessors, Block encoding 
• Avoid POJO constraints 
– No required base-class/interface 
– No magic (avoid ASM, ORM) 
• Non-Java clients 
• HBASE-8089 
Inspiration 
• Orderly 
• PostgreSQL / PostGIS 
• HBASE-7221 
• HBASE-7692 
Features: Encoding 
• Order preservation 
• Override direction (ASC/DESC)
• Fixed, variable-width 
• Nullable
• Self-identifying 
• Efficient 
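To make "order preservation" and "override direction" concrete, here is a minimal sketch using only the JDK. It is an illustration of the general technique, not the byte format HBase's OrderedBytes actually uses; the class and method names (OrderedIntSketch, encode, decode, raw, compareBytes) are hypothetical. The idea: flip the sign bit so unsigned byte-wise comparison agrees with numeric order, and invert every byte to get descending order.

```java
/** Illustrative only: NOT the on-disk format of HBase's OrderedBytes. */
final class OrderedIntSketch {

  /** Encodes a signed int so that unsigned byte-wise comparison matches numeric order. */
  static byte[] encode(int value, boolean ascending) {
    int flipped = value ^ 0x80000000; // flip the sign bit so negatives sort first
    byte[] out = {
        (byte) (flipped >>> 24), (byte) (flipped >>> 16),
        (byte) (flipped >>> 8), (byte) flipped };
    if (!ascending) {
      for (int i = 0; i < out.length; i++) {
        out[i] = (byte) ~out[i]; // invert every byte to reverse the sort order
      }
    }
    return out;
  }

  /** Reverses encode(): undo the optional inversion, then restore the sign bit. */
  static int decode(byte[] in, boolean ascending) {
    int v = 0;
    for (byte b : in) {
      int octet = (ascending ? b : ~b) & 0xFF;
      v = (v << 8) | octet;
    }
    return v ^ 0x80000000;
  }

  /** Plain big-endian two's complement, shown for contrast. */
  static byte[] raw(int value) {
    return new byte[] {
        (byte) (value >>> 24), (byte) (value >>> 16),
        (byte) (value >>> 8), (byte) value };
  }

  /** Lexicographic comparison treating each byte as unsigned, as a byte[] rowkey sort would. */
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  public static void main(String[] args) {
    // Raw bytes sort -1 after 1, which is the surprise the encoding removes.
    System.out.println(compareBytes(raw(-1), raw(1)) > 0);                     // true
    System.out.println(compareBytes(encode(-1, true), encode(1, true)) < 0);   // true
    System.out.println(compareBytes(encode(-1, false), encode(1, false)) > 0); // true
    System.out.println(decode(encode(-42, false), false));                     // -42
  }
}
```

The fixed 4-byte width keeps the sketch short; variable-width, nullable and self-identifying values need extra header or terminator bytes, which this example does not attempt.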
Features: API 
• Complex type encoding 
– Compound rowkey pattern 
– Order preservation 
– Nullable fields 
• Runtime metadata 
• User-extensible 
Implementation
HBASE-8089
Implementation: Encoding 
o.a.h.h.util.OrderedBytes 
• null 
• numeric, +/-Inf, NaN 
• int8, int16, int32, int64 
• float32, float64 
• variable-length text 
• variable-length blob 
o.a.h.h.util.Bytes 
• numeric 
• boolean 
• int16, int32, int64 
• float32, float64 
• variable-length text 
Implementation: API 
interface DataType<T> 
• decode() 
• encode() 
• encodedClass() 
• encodedLength() 
• getOrder() 
• isNullable() 
• isOrderPreserving() 
• isSkippable() 
• skip() 
implements DataType 
• OrderedXXX 
• RawXXX 
• Struct 
– StructBuilder 
– StructIterator 
– TerminatedWrapper 
– FixedLengthWrapper 
• Union{2,3,4} 
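The shape of this API can be sketched with a pared-down, hypothetical analogue. The names below (SketchType, FixedInt, TerminatedString, CompoundKeySketch) are invented for illustration and operate on java.nio.ByteBuffer; the real DataType<T> works on HBase's PositionedByteRange and carries more metadata methods. The sketch shows why skip() matters for a compound rowkey: a reader can step over a leading field without decoding it, provided each field is either fixed-length or terminated.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical, reduced analogue of DataType<T>; not the HBase interface. */
interface SketchType<T> {
  int encode(ByteBuffer dst, T value); // returns the number of bytes written
  T decode(ByteBuffer src);            // reads one value, advancing the position
  int skip(ByteBuffer src);            // advances past one value; returns bytes skipped
}

/** Fixed-width field: always 4 bytes, so skipping is a position bump. */
final class FixedInt implements SketchType<Integer> {
  public int encode(ByteBuffer dst, Integer value) {
    dst.putInt(value);
    return 4;
  }

  public Integer decode(ByteBuffer src) {
    return src.getInt();
  }

  public int skip(ByteBuffer src) {
    src.position(src.position() + 4);
    return 4;
  }
}

/** Variable-width field closed by a 0x00 terminator (assumes the value contains no NUL). */
final class TerminatedString implements SketchType<String> {
  public int encode(ByteBuffer dst, String value) {
    byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
    dst.put(utf8).put((byte) 0);
    return utf8.length + 1;
  }

  public String decode(ByteBuffer src) {
    int start = src.position();
    int len = skip(src) - 1; // length without the terminator
    byte[] utf8 = new byte[len];
    for (int i = 0; i < len; i++) {
      utf8[i] = src.get(start + i); // absolute reads; position is already past the field
    }
    return new String(utf8, StandardCharsets.UTF_8);
  }

  public int skip(ByteBuffer src) {
    int start = src.position();
    while (src.get() != 0) {
      // scan forward to the terminator
    }
    return src.position() - start;
  }
}

final class CompoundKeySketch {
  public static void main(String[] args) {
    SketchType<String> user = new TerminatedString();
    SketchType<Integer> seq = new FixedInt();

    ByteBuffer key = ByteBuffer.allocate(32);
    user.encode(key, "alice");
    seq.encode(key, 7);
    key.flip();

    user.skip(key);                      // step over the first field without decoding it
    System.out.println(seq.decode(key)); // 7
  }
}
```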
Up Next 
• “Default” types 
• More complex types 
– Arrays/Lists 
– Maps/Dicts 
• Tool integration 
– Apache Phoenix 
– Cloudera Kite 
• Performance audit, HBASE-8694 
• Improved metadata, HBASE-8863
– isCastableTo 
– isCoercableTo 
– isComparableTo 
• TypedTable, HBASE-7941 
• Beyond Java, HBASE-10091 
– REST 
– Thrift 
– Shell 
• ImportTsv, HBASE-8593 
• User documentation 
• Coprocessors? 
• Filters? 
• CAS? 
• DataBlockEncoders? 
Examples
A case for TypedTable 
Put p = new Put(Bytes.toBytes(u.user)); 
p.add(INFO_FAM, USER_COL, Bytes.toBytes(u.user)); 
p.add(INFO_FAM, NAME_COL, Bytes.toBytes(u.name)); 
p.add(INFO_FAM, EMAIL_COL, Bytes.toBytes(u.email)); 
p.add(INFO_FAM, PASS_COL, Bytes.toBytes(u.password)); 
A case for TypedTable
static final RawString ENC_STR = new RawString();
static final RawLong ENC_LONG = new RawLong();
// ...

// Encode into a reusable buffer, then copy out only the bytes written.
SimplePositionedByteRange pbr =
    new SimplePositionedByteRange(100);
ENC_STR.encode(pbr, u.user);
Put p = new Put(Bytes.copy(pbr.getBytes(), pbr.getOffset(),
    pbr.getPosition()));
p.add(INFO_FAM, USER_COL, Bytes.copy(pbr.getBytes(), ...));
// Rewind the buffer before encoding the next value.
pbr.setPosition(0);
ENC_STR.encode(pbr, u.name);
p.add(INFO_FAM, NAME_COL, Bytes.copy(pbr.getBytes(), ...));
...
Structs: writing 
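// A two-field struct: an order-preserving numeric followed by a string.
Struct struct = new StructBuilder()
    .add(OrderedNumeric.ASCENDING)
    .add(OrderedString.ASCENDING)
    .toStruct();
PositionedByteRange buf1 =
    new SimplePositionedByteRange(7);
// Field values are passed positionally, matching the order added above.
struct.encode(buf1,
    new Object[] { BigDecimal.ONE, "foo" });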
Structs: reading 
// Rewind, then walk the fields with the struct that wrote them.
buf1.setPosition(0);
StructIterator it = struct.iterator(buf1);
while (it.hasNext()) {
  System.out.print(it.next() + ", ");
}

> BigDecimal.ONE, foo
Structs: schema migration 
// Same leading fields as before, plus two new trailing fields.
Struct addedFields = new StructBuilder()
    .add(OrderedNumeric.ASCENDING)
    .add(OrderedString.ASCENDING)
    .add(OrderedString.ASCENDING)
    .add(OrderedNumeric.ASCENDING)
    .toStruct();

// Read the old two-field value with the new four-field struct.
buf1.setPosition(0);
StructIterator it = addedFields.iterator(buf1);
while (it.hasNext()) {
  System.out.print(it.next() + ", ");
}
> BigDecimal.ONE, foo, null, null
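The output above suggests that fields appended to the end of a struct read back as null when the stored value predates them. A minimal JDK-only sketch of that reading rule follows; it assumes one-byte fields for brevity, and the class and method names (TrailingNullSketch, readFields) are hypothetical rather than part of HBase.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: models "missing trailing fields decode as null". */
final class TrailingNullSketch {

  /** Reads fieldCount one-byte fields; any field past the end of the buffer is null. */
  static List<Integer> readFields(ByteBuffer src, int fieldCount) {
    List<Integer> out = new ArrayList<>();
    for (int i = 0; i < fieldCount; i++) {
      out.add(src.hasRemaining() ? Integer.valueOf(src.get()) : null);
    }
    return out;
  }

  public static void main(String[] args) {
    // A value written under the old two-field schema.
    ByteBuffer stored = ByteBuffer.wrap(new byte[] {1, 2});
    // Reading it under a four-field schema yields nulls for the new fields.
    System.out.println(readFields(stored, 4)); // [1, 2, null, null]
  }
}
```

This only works when new fields are added at the end and are nullable; inserting a field in the middle would shift every later field.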
Protobuf (HBASE-11161) 
class PBKeyValue extends PBType<CellProtos.KeyValue> {

  @Override
  public int encode(PositionedByteRange dst, KeyValue val) {
    CodedOutputStream os = outputStreamFromByteRange(dst);
    // Measure bytes written as the drop in the stream's remaining space.
    int before = os.spaceLeft(), after, written;
    val.writeTo(os);
    after = os.spaceLeft();
    written = before - after;
    // Advance the range so the next encode starts after this value.
    dst.setPosition(dst.getPosition() + written);
    return written;
  }
More Examples
https://gist.github.com/ndimiduk/bcf33f09cc7e4408f684
Thanks! 
HBase in Action (Manning), by Nick Dimiduk and Amandeep Khurana, foreword by Michael Stack
hbaseinaction.com
Nick Dimiduk
github.com/ndimiduk
@xefyr
n10k.com
http://s.apache.org/bGN
