Apache HBase 0.98
Andrew Purtell

Committer, Apache HBase, Apache Software Foundation
Big Data US Research And Development, Intel
Who am I?
• Committer on the Apache HBase project
• Member of the Big Data Research And Development
Group at Intel
• Release manager for Apache HBase 0.98
What’s In Apache HBase 0.98?
• 212 resolved JIRAs
• New features
– Reverse scans (HBASE-4811)
– EXEC access checks for Endpoints (HBASE-6104)
– Transparent server side encryption (HBASE-7544)
– Per-cell ACLs (HBASE-7662)
– Visibility labels (HBASE-7663)
– Stripe compactions (HBASE-7667)
– MapReduce over snapshots (HBASE-8369)
– REST streaming scans (HBASE-9343)

• Performance improvements
– Improved WAL write threading model (HBASE-8755)

• API cleanups and many bug fixes
Branch Release Criteria
• Wire compatibility with HBase 0.96
– Mixed client↔server and server↔server operation with 0.96 is
possible as long as no 0.98-specific features are enabled

• Compatible with earlier on-disk data formats
• Direct upgrade possible from 0.94 → 0.98 using the
same offline data migration procedure necessary for
0.94 → 0.96
• No significant performance regression from 0.96 using
defaults
• Binary API compatibility with versions < 0.98 is not
guaranteed; code that directly references HBase JARs
may need to be recompiled
Reverse Scans (HBASE-4811)
• Introduces a new internal scanner type that seeks to
the end of a range and then steps backwards
• No longer necessary to maintain tables of keys in
reverse sort order for scanning
• Exposed at the client with a new Scan method
Scan#setReversed(boolean reversed)

• A few percent slower than forward scanning in CPU-bound
tests (server side, filters)
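The "seek to the end of the range, then step backwards" idea can be sketched on a plain sorted list. This is only a toy model, not HBase's scanner: the function name, the in-memory list, and the choice of an inclusive start row and exclusive stop row are assumptions made for illustration.

```python
from bisect import bisect_right

def reverse_scan(sorted_keys, start_row, stop_row):
    """Yield keys from start_row downwards, stopping before stop_row.

    Assumes sorted_keys is in ascending order and start_row >= stop_row.
    """
    # Seek to the end of the range: the last key that is <= start_row.
    i = bisect_right(sorted_keys, start_row) - 1
    # Step backwards until the (exclusive) stop row is reached.
    while i >= 0 and sorted_keys[i] > stop_row:
        yield sorted_keys[i]
        i -= 1
```

With a scanner like this, rows come back in descending order from the ordinary ascending store, which is why a second table with reversed keys is no longer needed.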
Endpoint EXEC Grants (HBASE-6104)
• HBase ACLs can grant a familiar set of privileges to
users (and groups):
– (R)ead
– (W)rite
– E(X)ecute
– (C)reate
– (A)dmin

• AccessController versions prior to 0.98 ignored X
• Now access to coprocessor Endpoint invocations can
be controlled on a global, per-table, or per-CF basis
– Enable the AccessController
– Set hbase.security.exec.permission.checks to “true”
– Grant or revoke permissions as appropriate
– Deploy the coprocessor application
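To illustrate how a grant at global, per-table, or per-CF scope could cover an Endpoint invocation, here is a minimal sketch. The `Grant` type and `is_allowed` function are hypothetical and do not reflect the AccessController's real data structures; the sketch only models the scoping rule described above.

```python
from typing import NamedTuple, Optional

class Grant(NamedTuple):
    user: str
    actions: str                  # any subset of "RWXCA"
    table: Optional[str] = None   # None: the grant is global
    family: Optional[str] = None  # None: the grant covers the whole table

def is_allowed(grants, user, action, table, family=None):
    """Return True if some grant for this user covers the requested scope."""
    for g in grants:
        if g.user != user or action not in g.actions:
            continue
        if g.table is None:
            return True  # global grant covers every table
        if g.table == table and (g.family is None or g.family == family):
            return True  # table-wide grant, or a grant on this column family
    return False
```

In this model an Endpoint call on table "t1" is allowed for a user holding X globally, X on "t1", or X on the specific column family being accessed.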
Cell Tags
• All values written to HBase are stored into cells
– Cell is used interchangeably with “key-value” or “KeyValue” for
legacy reasons

• Cells can now also carry an arbitrary number of tags
– Metadata, considered distinct from the key and the value
– Optional dictionary compression for tags in HFiles and WALs
– Only available server side
– Coprocessors can manage their own user defined tags
HFile Version 3
• HFile version 2 plus
– The ability to persist cell tags
– Support for optional file block encryption

• Enabled via a site file change
– hfile.format.version -> 3

• Once enabled, all data is transparently migrated over
time as new files are written by flushes and
compactions
• Required for:
– Transparent Encryption (HBASE-7544)
– Per-cell ACLs (HBASE-7662)
– Visibility labels (HBASE-7663)

• Considered experimental, but proven stable under load
Transparent Encryption (HBASE-7544)
• Introduces a new generic cryptographic codec and key
management framework into hbase-common
• Provides transparent encryption of HBase on disk data
– Optional per-file HFile block encryption (requires HFile v3)
– Optional secure WAL reader and writer

• Provides simple key management
– Flexible and non-intrusive key rotation
– Two-tier key architecture for consistency with best practices
– Key provider supports secure local key storage or any network
or hardware key storage with Java KeyStore support

• Shell support
Per-Cell ACLs (HBASE-7662)
• Extends the AccessController with support for
persisting and checking ACL data in cell tags
• Uses existing API facilities to transmit per cell ACLs
• Backward compatible with existing installs and code
• We treat ACLs on a cell as scoped only to the cell for
straightforward policy evolution
• All mutations must have covering permission in a
dominating grant
Visibility Labels (HBASE-7663)
• Introduces a new VisibilityController coprocessor
• Introduces per-cell visibility expressions, client API
extensions for setting visibility and authorizations, and new
shell commands for label management
• The maximal set of labels for a user is defined with the new
shell command ‘setauths’ or equivalent admin API
• Users specify visibility expressions on cells
• Users submit authorizations on Gets and Scans
• The effective label set for the request is built in the RPC
context from authorizations; those not in the maximal set
are dropped
– How this is done is pluggable, e.g. integration with enterprise
identity management solutions

• Scan results are filtered with (label) set membership tests
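The flow from requested authorizations to filtered results can be sketched as two set operations. This is a simplified model under stated assumptions: each cell carries a single label rather than a full expression, and the function names are hypothetical, not part of the VisibilityController.

```python
def effective_auths(requested, maximal):
    """Drop any requested label that is not in the user's maximal set."""
    return set(requested) & set(maximal)

def filter_cells(cells, auths):
    """Keep values whose label is in the effective set.

    cells is an iterable of (value, label) pairs.
    """
    return [value for value, label in cells if label in auths]
```

A user whose maximal set is {secret, public} and who asks for {secret, topsecret} ends up with {secret}, so only cells labeled "secret" are returned.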
Visibility Labels (HBASE-7663)
• Visibility expressions
– Labels: arbitrary strings (converted into ordinals with an
internal dictionary)
– Expressions: labels joined in boolean expressions
– Operators: &, |, !
– Parentheses for precedence
– Examples:
secret
secret | topsecret
( secret | topsecret ) & !probationary
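A small evaluator shows how such expressions can be tested against a set of authorizations. This is a sketch, not HBase's parser: it works on label strings directly rather than ordinals, restricts labels to word characters, and assumes that ! binds tighter than &, which binds tighter than |.

```python
import re

TOKEN = re.compile(r"\s*([&|!()]|\w+)")

def tokenize(expr):
    """Split an expression into labels, operators, and parentheses."""
    tokens, pos = [], 0
    expr = expr.rstrip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"unexpected character at {pos}: {expr[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def is_visible(expr, auths):
    """Evaluate a visibility expression against a set of authorizations."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or():
        result = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()
            result = result or rhs
        return result

    def parse_and():
        result = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            result = result and rhs
        return result

    def parse_not():
        tok = take()
        if tok == "!":
            return not parse_not()
        if tok == "(":
            result = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return result
        if tok is None or tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token: {tok!r}")
        # A bare label is true when the caller holds that authorization.
        return tok in auths

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected trailing token: {peek()!r}")
    return result
```

For the third example expression, a caller holding only "secret" sees the cell, while a caller holding both "secret" and "probationary" does not.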
Improved WAL Write Throughput (HBASE-8755)
• Introduces a new threading model for WAL writes that
reduces lock contention
• Provides better write throughput when under load
– A ~15% improvement in write ops/sec at high write
concurrency

• Lays groundwork for multiple WALs
– Will provide further write throughput increase
– Also important for limiting the impact of encrypting WAL
entries
Stripe Compactions (HBASE-7667)
• Stripe compactions split the data inside the region by
row key and create sub-ranges of data
• The sub-ranges are compacted independently
• Depending on ingest and access patterns, using stripe
compactions can reduce read latency variability and
reduce compaction data volume (write amplification)
• Two use cases in particular may benefit
1. Approximately uniform keys and large regions
2. Non-uniform data with sequential row keys (e.g. log data)

• Can be complex to configure and tune; consult the
documentation for details
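The splitting of a region into sub-ranges by row key can be pictured with a short sketch. The boundary list and function names here are illustrative assumptions; real stripe boundaries are chosen and maintained by the compaction policy, which this model does not attempt to reproduce.

```python
from bisect import bisect_right

def stripe_index(boundaries, row_key):
    """Return the index of the stripe whose key range contains row_key.

    boundaries is an ascending list of row keys; each boundary is the
    inclusive lower bound of the stripe to its right.
    """
    return bisect_right(boundaries, row_key)

def split_into_stripes(boundaries, rows):
    """Group row keys into len(boundaries) + 1 sorted sub-ranges."""
    stripes = [[] for _ in range(len(boundaries) + 1)]
    for row in sorted(rows):
        stripes[stripe_index(boundaries, row)].append(row)
    return stripes
```

Because each sub-range holds a disjoint slice of the key space, one stripe can be rewritten without touching the data in the others, which is where the reduction in compaction volume comes from.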
MapReduce Over Snapshots (HBASE-8369)
• Introduces MapReduce utilities supporting MR jobs
over snapshots of table data
• Similar to TableInputFormat but instead of running over
an online table using the HBase API it runs directly
over HFiles on disk collected from a table snapshot.
• For performance-dominant use cases where the
HBase API cannot provide sufficient throughput
– Can increase throughput of bulk scanning ~5x by streaming
HDFS reads directly to the client

• Caveat: Not recommended from a security perspective
– Built in access control is completely bypassed
– It is a risk to open direct access to HFile data in HDFS
REST Streaming Scans (HBASE-9343)
• The REST gateway provides stateful scanners to be
consistent with the HBase API but this is not REST-ful
– Scanner state is not shared across multiple gateways
– Scanner state will be lost if the gateway fails

• Introduces a new scanning mode to the REST API for
stateless scanning
• The client manages paging and limits
• Instead of forcing a batching up of results as they
come back from the RegionServers into multiple HTTP
transactions, the stateless scanner can stream all
results back to the client over one HTTP connection
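Client-managed paging over a stateless scan can be sketched as follows. This models the idea only: the functions, the in-memory row list, and the start-row-plus-limit request shape are assumptions and do not show the REST gateway's actual parameters.

```python
from bisect import bisect_left

def scan_page(sorted_rows, start_row, limit):
    """Stateless request: everything the server needs arrives with the call."""
    i = bisect_left(sorted_rows, start_row)
    return sorted_rows[i:i + limit]

def scan_all(sorted_rows, page_size):
    """Client loop that tracks its own position between requests."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    results, start = [], ""
    while True:
        page = scan_page(sorted_rows, start, page_size)
        results.extend(page)
        if len(page) < page_size:
            return results
        # Resume at the smallest key strictly after the last one seen.
        start = page[-1] + "\x00"
```

Since no scanner state lives on the server side, any gateway can answer any request, and a gateway failure costs the client nothing more than a retry of the current page.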
End
Questions?
