Security Features in Apache HBase –
An Operator’s Guide
Anoop Sam John, Andrew Purtell, Ramkrishna S. Vasudevan
Committers and PMC Members, Apache HBase, Apache Software Foundation
Big Data US Research And Development, Intel
v5
Outline
• New Security Features in Apache HBase 0.98
• Controlling Access To Data
– Role-Based Access Control Using Groups and ACLs
– Role-Based Access Control Using Labels
– Attribute-Based Access Control Using Labels
• Preventing Data Leaks
– Transparent Encryption
• Performance Considerations
New Security Features in Apache HBase 0.98
Cell Tags
• All values written to HBase are stored in cells
• Cells can now also carry an arbitrary number of tags
– Metadata, considered distinct from the key and the value
– Compressed when persisted to HFiles
– Server side only
• Clients cannot get or send cells with tags directly
• Tags will be correctly replicated if cross-cluster replication is enabled
Cell ACLs (HBASE-7662)
• Extends the existing HBase ACL model with support for persisting
and checking per-cell ACL data in tags
– (R)ead, (W)rite, E(X)ecute, (A)dmin, (C)reate
– Namespace → Table → Column Family → Cell
• Backwards compatible with existing installs and code
• Uses existing facilities (operation attributes) to carry cell ACLs to
supporting servers
Cell ACLs (HBASE-7662)
• Cell ACLs are scoped to the same point in time as the cell itself
– Simple and straightforward evolution of security policy over time without
expensive updates
• We require that mutations have covering permission
– The union of the user’s table perms, CF perms, and perms in the most
recent visible[1] version, if the value already exists, must allow the
pending mutation in order for it to be applied
– For Deletes, in addition, all visible prior versions covered by the Delete
must allow the Delete
– Delete semantics are being refined
• Complex Deletes may be rejected; just resubmit as simpler ops
• Improved in 0.98.2, likely fully resolved in 0.98.3
1. Visible is defined here as not covered already by a committed delete marker
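The covering-permission rule can be sketched as a small model. This is an illustrative simplification, not the AccessController implementation: the function names are hypothetical, permissions are plain sets of the letters R/W/X/C/A for a single user, and a Delete is assumed to be checked as a W action.

```python
def allowed(action, table_perms, cf_perms, cell_perms=frozenset()):
    """True if the union of table, CF, and cell permissions contains `action`."""
    return action in (set(table_perms) | set(cf_perms) | set(cell_perms))


def mutation_allowed(action, table_perms, cf_perms, visible_versions,
                     is_delete=False):
    """Apply the covering-permission rule to a pending mutation.

    visible_versions: cell ACLs (sets of permission letters) for this user,
    newest first, restricted to versions not already covered by a committed
    delete marker. An empty list means the value does not exist yet.
    """
    if not visible_versions:
        # Nothing stored yet: only table and CF grants can cover the mutation.
        return allowed(action, table_perms, cf_perms)
    # A Delete must be allowed by every visible version it covers;
    # other mutations are checked against the most recent visible version only.
    to_check = visible_versions if is_delete else visible_versions[:1]
    return all(allowed(action, table_perms, cf_perms, acl) for acl in to_check)
```

In this model a Put that only needs the newest version's ACL can succeed where a Delete covering an older, more restrictive version is rejected, which matches the advice to resubmit complex Deletes as simpler operations.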
Cell Labels (HBASE-7663)
• Visibility expression support via a new security coprocessor
– Labels: arbitrary strings
– Expressions: Labels joined in boolean expressions
– Operators: &, |, !, ( )
secret
secret | topsecret
( secret | topsecret ) & !probationary
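To make the expression semantics concrete, here is a minimal sketch of an evaluator for expressions like the three above. It is not the VisibilityController's parser. The label alphabet and the precedence (`!` binds tightest, then `&`, then `|`) are assumptions made for illustration, and `is_visible` is a hypothetical name.

```python
import re

# Assumed token grammar: an operator/parenthesis, or a label made of
# letters, digits, and a few punctuation characters.
_TOKEN = re.compile(r"\s*([&|!()]|[A-Za-z0-9_\-.:]+)")


def tokenize(expression):
    tokens, pos = [], 0
    expression = expression.strip()
    while pos < len(expression):
        match = _TOKEN.match(expression, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expression, auths):
    """True if a cell labelled with `expression` is visible to the set `auths`."""
    tokens = tokenize(expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # always parse the right side, then combine
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if token in {"&", "|", ")"}:
            raise ValueError(f"unexpected token {token!r}")
        return token in auths  # a bare label is true if the user holds it

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result
```

For example, `is_visible("( secret | topsecret ) & !probationary", {"secret"})` is true, while adding `probationary` to the authorization set makes it false.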
Cell Labels (HBASE-7663)
• New admin APIs and new shell commands for label management
• The universe of labels and the maximal set of labels for a user are
defined up front
• Users label cells using visibility expressions
• Other users ask for authorizations on Gets and Scans
• We build a user’s effective set of authorizations per request in a
pluggable way on the server
• Scan results are filtered according to the user’s effective
authorizations
• VisibilityController and AccessController can be used together
Transparent Encryption (HBASE-7544)
• Transparent encryption of HBase on disk data
– HFile blocks are encrypted as written and decrypted as read
– Write ahead log (WAL) serialization is pluggable; we provide new
secure writers and readers that encrypt and decrypt edits
• Built on a new extensible cryptographic codec and key management
framework in HBase
• Simple key management
– Default provider integrates with the Java Keystore
• Per column family configuration
– Supports schema design that places sensitive information in only a
subset of column families
Endpoint EXEC Grants (HBASE-6104)
• HBase ACLs grant a familiar set of privileges to users and groups:
– (R)ead, (W)rite, E(X)ecute, (C)reate, (A)dmin
• Versions prior to 0.98.0 ignored X
• Now access to coprocessor Endpoint invocations can be controlled
on a global, per-table, or per-column family basis
Controlling Access To Data
Our Example Schema
• A simple user information table
Row Key   Column Family: i   Column Family: pii
uid       i:fullname         pii:address
          i:nick             pii:phone
                             pii:cc
                             pii:cvv2
                             pii:expdate
> create 'user',
{ NAME => 'i', COMPRESSION => 'snappy', VERSIONS => 10 },
{ NAME => 'pii', COMPRESSION => 'snappy', VERSIONS => 10 }
Our Example Security Policy
• Column family: i
Our Example Security Policy
• Column family: pii
Getting Started
• Enable HFile V3
– hfile.format.version=3
• Enable SASL+Kerberos authentication
– RPC: Follow the steps in section 8.1 of the online manual:
https://hbase.apache.org/book/security.html
– ZooKeeper: Follow the steps in section 17.2 of the online manual:
https://hbase.apache.org/book/zk.sasl.auth.html
• Install security coprocessors
– hbase.coprocessor.region.classes=
org.apache.hadoop.hbase.security.access.AccessController,
org.apache.hadoop.hbase.security.visibility.VisibilityController,
org.apache.hadoop.hbase.security.token.TokenProvider
Getting Started
– hbase.coprocessor.master.classes=
org.apache.hadoop.hbase.security.access.AccessController,
org.apache.hadoop.hbase.security.visibility.VisibilityController
– hbase.coprocessor.regionserver.classes=
org.apache.hadoop.hbase.security.access.AccessController
• Enable Endpoint exec permission checks
– hbase.security.exec.permission.checks=true
• [Optional] Enable transport security
– hbase.rpc.protection=auth-conf
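The settings from the two "Getting Started" slides end up as `<property>` entries in hbase-site.xml. As a convenience sketch (the helper name `render_site_xml` and the dictionary are illustrative, not part of HBase), the following renders exactly the properties listed above into that form. Kerberos setup is not covered here; follow the manual sections referenced above for it.

```python
from xml.sax.saxutils import escape

# Property names and values copied from the slides above.
SECURITY_PROPERTIES = {
    "hfile.format.version": "3",
    "hbase.coprocessor.region.classes": ",".join([
        "org.apache.hadoop.hbase.security.access.AccessController",
        "org.apache.hadoop.hbase.security.visibility.VisibilityController",
        "org.apache.hadoop.hbase.security.token.TokenProvider",
    ]),
    "hbase.coprocessor.master.classes": ",".join([
        "org.apache.hadoop.hbase.security.access.AccessController",
        "org.apache.hadoop.hbase.security.visibility.VisibilityController",
    ]),
    "hbase.coprocessor.regionserver.classes":
        "org.apache.hadoop.hbase.security.access.AccessController",
    "hbase.security.exec.permission.checks": "true",
    # Optional: transport security
    "hbase.rpc.protection": "auth-conf",
}


def render_site_xml(properties):
    """Render a name -> value mapping as Hadoop-style site XML."""
    lines = ["<configuration>"]
    for name, value in properties.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_site_xml(SECURITY_PROPERTIES))
```

The output is meant to be merged into an existing hbase-site.xml by hand rather than used to overwrite it.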
Role-Based Access Control
Using the Hadoop Group Mapping Service and ACLs
• Map each role in the organization to an LDAP entity
– Employee ->
• cn=user, member: ou=users,dc=groups,dc=example,dc=org
– Developer ->
• cn=developer, member: ou=developers,dc=groups,dc=example,dc=org
– Test User Account ->
• cn=testuser, member: ou=users,dc=example,dc=org
– Service Account ->
• cn=service, member: ou=services,dc=example,dc=org
– Admin ->
• cn=manager,dc=example,dc=org
Role-Based Access Control
Using the Hadoop Group Mapping Service and ACLs
• Set up the Hadoop group mapper (core-site.xml)
– hadoop.security.group.mapping=
org.apache.hadoop.security.LdapGroupsMapping
– hadoop.security.group.mapping.ldap.url=…
– hadoop.security.group.mapping.ldap.bind.user=…
– hadoop.security.group.mapping.ldap.search.filter.user=
(& (|(objectclass=person)(objectclass=applicationProcess))(cn={0}))
– hadoop.security.group.mapping.ldap.search.filter.group=
(objectclass=groupofnames)
– hadoop.security.group.mapping.ldap.search.attr.member=member
– hadoop.security.group.mapping.ldap.search.attr.group.name=cn
Role-Based Access Control
Using the Hadoop Group Mapping Service and ACLs
• Confirm the configuration is working correctly
hbase> whoami
service (auth:KERBEROS)
groups: services
Role-Based Access Control
Using the Hadoop Group Mapping Service and ACLs
• Grant permissions to groups and service and test accounts
hbase> grant '@admins', 'RWXCA'
hbase> grant 'service', 'RWXCA', 'user'
hbase> grant '@developers', 'RW', 'user', 'i'
hbase> grant 'testuser', 'RW', 'user', 'i'
hbase> grant 'user', 
{ '@developers' => 'RW', 'testuser' => 'R' }, 
{ COLUMNS => 'pii', FILTER => "(PrefixFilter ('test'))" }
Note: Cell grants done by the shell apply to existing cells only. This is useful for testing. In practice applications must add the
desired cell ACL to the operation when submitting writes.
Role-Based Access Control
Using Labels
• Define labels corresponding to roles in the security policy
admin
service
test
developer
Role-Based Access Control
Using Labels
• Express access rules as visibility expressions
admin | service
admin | service | test
admin | service | developer
admin | service | developer | test
• Define labels
hbase> add_labels [ 'admin', 'service', 'developer', 'test' ]
Role-Based Access Control
Using Labels
• Assign one or more roles to each user by associating their principal
with a label set
hbase> set_auths 'service', [ 'service' ]
hbase> set_auths 'testuser', [ 'test' ]
hbase> set_auths 'manager', [ 'admin' ]
hbase> set_auths 'dev', [ 'developer' ]
hbase> set_auths 'qa', [ 'test', 'developer' ]
hbase> …
Role-Based Access Control
Using Labels
• Apply appropriate visibility expressions to cells
hbase> set_visibility 'user', 'admin|service|developer', 
{ COLUMNS => 'i' }
hbase> set_visibility 'user', 'admin|service', 
{ COLUMNS => 'pii' }
hbase> set_visibility 'user', 'admin|service|developer|test',
{ COLUMNS => [ 'i', 'pii' ], 
FILTER => "(PrefixFilter ('test'))" }
Note: Visibility expressions added to cells by the shell apply to existing cells only. This is useful for testing. In practice
applications must add the desired visibility expression to the operation when submitting writes.
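To check the policy before applying it, the label assignments and expressions from the preceding slides can be modelled directly. This is a sketch under two assumptions: every expression here is a pure disjunction, so a cell is visible when the user holds at least one of its labels, and each user requests all of their assigned authorizations on a Get or Scan. The dictionary and function names are illustrative.

```python
# Label sets from the set_auths commands above.
USER_AUTHS = {
    "service": {"service"},
    "testuser": {"test"},
    "manager": {"admin"},
    "dev": {"developer"},
    "qa": {"test", "developer"},
}

# Labels appearing in each set_visibility expression above.
CELL_LABELS = {
    "i": {"admin", "service", "developer"},
    "pii": {"admin", "service"},
    "test rows": {"admin", "service", "developer", "test"},
}


def visible_families(user):
    """Names of the cell groups whose expression the user's auths satisfy."""
    auths = USER_AUTHS[user]
    return sorted(name for name, labels in CELL_LABELS.items() if auths & labels)


if __name__ == "__main__":
    for user in USER_AUTHS:
        print(user, "->", visible_families(user))
```

Under these assumptions `dev` sees family `i` and the test rows but not `pii`, and `testuser` sees only the test rows.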
Attribute-Based Access Control
• We can construct the effective authorization set for a user in a
pluggable and stackable way
[Figure: stack of authorization providers, annotated as follows: retrieves principal for user; maps principal to group names; imports auths from request; enforces minimum auths; maps identity attributes to auths. Data sources shown: auths table, directory.]
Attribute-Based Access Control
• LDAP plugin can mix in auths corresponding to attributes of the
subject’s identity
– Expected soon in 0.98 (maybe 0.98.4)
Directory query:
(&(objectClass=person)(userPrincipalName={0}))
Attribute Mapping:
<attribute>: <regex> → <auth>
memberOf: .+ -> $1
division: .+ -> $1
department: .+ -> $1
employeeID: P[0-9]+ -> probationary
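The attribute mapping can be sketched as a list of (attribute, regex, auth) rules applied to the subject's directory attributes. Since the plugin was not yet released when these slides were written, this is only a guess at its behavior: `$1` is assumed to mean "use the matched attribute value as the auth", a rule is assumed to apply only when the regex matches the whole value, and all names are hypothetical.

```python
import re

# (attribute, regex, auth template), following the mapping shown above.
ATTRIBUTE_RULES = [
    ("memberOf", r".+", "$1"),
    ("division", r".+", "$1"),
    ("department", r".+", "$1"),
    ("employeeID", r"P[0-9]+", "probationary"),
]


def auths_from_attributes(attributes, rules=ATTRIBUTE_RULES):
    """Map directory attributes (name -> list of values) to a set of auths."""
    auths = set()
    for name, pattern, template in rules:
        for value in attributes.get(name, []):
            if re.fullmatch(pattern, value):
                # Assumption: "$1" stands for the matched value itself.
                auths.add(value if template == "$1" else template)
    return auths


if __name__ == "__main__":
    subject = {
        "memberOf": ["developer"],
        "department": ["storage"],
        "employeeID": ["P12345"],
    }
    print(sorted(auths_from_attributes(subject)))
```

A subject whose employee ID starts with `P` picks up the `probationary` auth, which a visibility expression can then exclude with `!probationary`.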
Attribute-Based Access Control
Using Labels
• Apply appropriate visibility expressions to cells
hbase> set_visibility 'user', 
'admin|service|(developer&(!probationary))', 
{ COLUMNS => 'i' }
hbase> set_visibility 'user', 'admin|service', 
{ COLUMNS => 'pii' }
hbase> set_visibility 'user', 
'admin|service|((developer|test)&(!probationary))', 
{ COLUMNS => [ 'i', 'pii' ], 
FILTER => "(PrefixFilter ('test'))" }
Attribute-Based Access Control
Using ACLs
• An area of future work
– We could consider an HBase-provided replacement for the Hadoop
Group Mapper that also supports mapping object attributes to strings
– For the VisibilityController, the mapped strings would be interpreted as
auths (see slide #27)
– For the AccessController, the mapped strings could be interpreted as
group names
– See HBASE-10919[1] or raise a discussion on user@hbase.apache.org
1. https://issues.apache.org/jira/browse/HBASE-10919
Preventing Data Leaks
Protecting Data At Rest
• HBase is deployed into a layered system
• Incorrect handling of permissions or storage volumes at the HDFS
layer or below could expose sensitive information
[Figure: layered deployment. Apache HBase (Master, standby Master, RegionServers) runs on top of Apache ZooKeeper (a ZooKeeper ensemble) and the Apache Hadoop Distributed File System (HDFS DataNodes).]
Getting Started
• Create the cluster master key in a KeyStore file
$ keytool -keystore hbase.jks -storetype jceks -genseckey \
    -keyalg AES -keysize 128 -storepass secret \
    -alias hbase-master-default
• Deploy the KeyStore file to all site configuration directories and
restrict local access to it
$ chown hbase:hbase hbase.jks
$ chmod 0600 hbase.jks    # -rw-------
• Enable HFile V3
– hfile.format.version=3
Getting Started
• Set up key provider configuration for KeyStore files
– hbase.crypto.keyprovider=
org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider
– hbase.crypto.keyprovider.parameters=
jceks:///path/to/hbase/conf/hbase.jks?password=secret
– hbase.crypto.master.key.name=hbase-master-default
• Restrict local access to the site file
$ chown hbase:hbase hbase-site.xml
$ chmod 0600 hbase-site.xml    # -rw-------
• The KeyStore password need not be embedded in the site file
– Use ?passwordFile=/path/to/password/file and protect that instead
Getting Started
• Enable WAL encryption
– hbase.crypto.wal.key.name=hbase-master-default
– hbase.regionserver.hlog.reader.impl=
org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader
– hbase.regionserver.hlog.writer.impl=
org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter
– hbase.regionserver.wal.encryption=true
WAL encryption is configured separately from HFile encryption to enable
storage management with tiered sensitivity
• (JRE 8+) Enable AES-NI acceleration features
– Add to hbase-env.sh: -XX:+UseAES -XX:+UseAESIntrinsics
Transparent Encryption
• Segregate sensitive information into one or a few column families
with HFile encryption enabled
– We are storing sensitive personally identifiable customer information in
the “pii” family
– Enable encryption on “pii” only to mitigate performance impact
– After changing schema, run a major compaction to ensure all files are
(eventually) transformed
hbase> disable 'user'
hbase> alter 'user', { NAME => 'pii',
COMPRESSION => 'snappy', 
ENCRYPTION => 'aes' }
hbase> enable 'user'
hbase> major_compact 'user'
Transparent Encryption
• Data key management
– RegionServers retrieve and unwrap CF keys from descriptors as
needed to encrypt HFiles
– The data key for a CF can be modified at any time by the admin
• Or, encryption can be enabled and disabled entirely
• CF encryption is completely reversible!
– HFiles contain the data key used for encryption, wrapped (encrypted) by
the master key
• Supports incremental rekeying without expensive IO or downtime
– Simply trigger major compaction to normalize encryption and data
keying state over the entire CF
• Can be done on a region by region basis with an HBase shell script
Transparent Encryption
• Master key rotation
– Should be an infrequent operation; even an attacker able to observe all
schema and HFiles gains very little information about the master key over time
– Store a copy of the current master key with an alternate alias e.g.
“hbase-master-alt”
– Replace the master key with a new one
– Update site file
• hbase.crypto.master.alternate.key.name=hbase-master-alt
– Do a rolling restart of all HBase server processes
– Trigger a major compaction and wait for completion
– Remove the old master key from the KMS and remove alt alias from site
– Do another rolling restart of all HBase server processes
Key Providers
• Any Key Management System with a Java KeyStore provider can be
supported by the KeyStoreKeyProvider
• Or natively, via custom HBase KeyProviders
• Update site configuration
hbase.crypto.keyprovider
hbase.crypto.keyprovider.parameters
[Figure: two paths to a key store. HBase → KeyStoreKeyProvider → JDK KeyStore provider framework → Thales Luna CloudHSM . . . ; or HBase → YourKeyProvider.]
Cipher Providers
• We support alternate or accelerated ciphers with either:
1. Java Cryptography (JCE) algorithm provider
• Install a signed JCE provider (supporting “AES/CTR/NoPadding”
mode with 128 bit keys)
• Add it with highest preference to the JCE site configuration file
$JAVA_HOME/lib/security/java.security
• Update site configuration
hbase.crypto.algorithm.aes.provider
hbase.crypto.algorithm.rng.provider
2. Custom HBase Cipher implementation
• Start at org.apache.hadoop.hbase.io.crypto.CipherProvider
• Make it available on the server classpath
• Update site configuration
hbase.crypto.cipherprovider
Performance Considerations
WAL Encryption
• Performance implications of WAL encryption
– As measured by HLogPerformanceEvaluation microbenchmark
– Relative differences are what is interesting
– WAL throughput ceiling ~10% lower with 7u45
– ~8% lower with 8u20
• Future mitigation: When HDFS storage tiering capability is in
production, configure separate storage tiers for WAL and HFile data
Test                                            Throughput (ops/sec)   Total cycles     Insns per cycle
Oracle Java 1.7.0_45-b18 - None                 52658.302              8878179986750    0.47
Oracle Java 1.7.0_45-b18 - AES WAL encryption   48045.834              9911748458387    0.57
OpenJDK 1.8.0_20-b09 - None                     54874.125              8662634367005    0.46
OpenJDK 1.8.0_20-b09 - AES WAL encryption       50659.507              9668111259270    0.61
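The relative differences can be recomputed from the throughput column. The short sketch below does only that arithmetic; the dictionary layout and function name are illustrative. With the figures above it gives a drop of roughly 8.8% for 7u45 and 7.7% for 8u20, consistent with the rounded numbers quoted on the slide.

```python
# Throughput in ops/sec from the table above: (no encryption, AES WAL encryption).
RESULTS = {
    "Oracle Java 1.7.0_45-b18": (52658.302, 48045.834),
    "OpenJDK 1.8.0_20-b09": (54874.125, 50659.507),
}


def relative_drop_percent(baseline, encrypted):
    """Percentage by which throughput falls when WAL encryption is enabled."""
    return (baseline - encrypted) / baseline * 100.0


if __name__ == "__main__":
    for jvm, (baseline, encrypted) in RESULTS.items():
        print(f"{jvm}: {relative_drop_percent(baseline, encrypted):.1f}% lower")
```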
Promoting Common ACLs
• When designing security policy for a table, consider that table and
column family level grants are inexpensive compared to cell level
grants
– Table and CF level grants are cached in memory
– Cell level grants require region scanning
• We consider permissions as the union of grants at all levels; a table
or CF grant allows us to early out
• If a user will always be granted permissions at the cell level,
promote their access to a column family or table level grant
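The early-out can be pictured as an ordered check in which the expensive step is only reached when the cached grants do not already allow the action. This is a conceptual sketch, not AccessController code: `check_access` is a hypothetical name and `scan_cell_grants` is a stand-in callable for the region scan that reads cell-level ACLs.

```python
def check_access(action, table_grants, cf_grants, scan_cell_grants):
    """Return (allowed, level) for `action`, consulting cheap grants first.

    table_grants and cf_grants model the in-memory cached grants.
    scan_cell_grants is called only if neither cached level allows the action.
    """
    if action in table_grants:
        return True, "table"            # early out: no scan needed
    if action in cf_grants:
        return True, "column family"    # early out: no scan needed
    cell_grants = scan_cell_grants()    # expensive path
    return action in cell_grants, "cell"
```

Promoting a user who always receives cell-level grants to a column family or table grant means every request takes one of the first two branches.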
End
Questions?

New Security Features in Apache HBase 0.98: An Operator's Guide

  • 1. Security Features in Apache HBase – An Operator’s Guide Anoop Sam John, Andrew Purtell, Ramkrishna S. Vasudevan Committers and PMC Members, Apache HBase, Apache Software Foundation Big Data US Research And Development, Intel v5
  • 2. • New Security Features in Apache HBase 0.98 • Controlling Access To Data – Role-Based Access Control Using Groups and ACLs – Role-Based Access Control Using Labels – Attribute-Based Access Control Using Labels • Preventing Data Leaks – Transparent Encryption • Performance Considerations Outline
  • 3. New Security Features in Apache HBase 0.98
  • 4. Cell Tags • All values written to HBase are stored in cells • Cells can now also carry an arbitrary number of tags – Metadata, considered distinct from the key and the value – Compressed when persisted to HFiles – Server side only • Clients cannot get or send cells with tags directly • Tags will be correctly replicated if cross-cluster replication is enabled
  • 5. Cell ACLs (HBASE-7662) • Extends the existing HBase ACL model with support for persisting and checking per-cell ACL data in tags – (R)ead, (W)rite, E(X)ecute, (A)dmin, (C)reate – Namespace → Table → Column Family → Cell • Backwards compatible with existing installs and code • Uses existing facilities (operation attributes) to carry cell ACLs to supporting servers
  • 6. Cell ACLs (HBASE-7662) • Cell ACLs are scoped to the same point in time as the cell itself – Simple and straightforward evolution of security policy over time without expensive updates • We require that mutations have covering permission – The union of the user’s table perms, CF perms, and perms in the most recent visible[1] version, if the value already exists, must allow the pending mutation in order for it to be applied – For Deletes, in addition, all visible prior versions covered by the Delete must allow the Delete – Delete semantics are being refined • Complex Deletes may be rejected; just resubmit as simpler ops • Improved in 0.98.2, likely fully resolved in 0.98.3 1. Visible is defined here as not covered already by a committed delete marker
  • 7. Cell Labels (HBASE-7663) • Visibility expression support via a new security coprocessor – Labels: arbitrary strings – Expressions: Labels joined in boolean expressions – Operators: &, |, !, ( ) secret secret | topsecret ( secret | topsecret ) & !probationary
  • 8. Cell Labels (HBASE-7663) • New admin APIs and new shell commands for label management • The universe of labels and the maximal set of labels for a user are defined up front • Users label cells using visibility expressions • Other users ask for authorizations on Gets and Scans • We build a user’s effective set of authorizations per request in a pluggable way on the server • Scan results are filtered according to the user’s effective authorizations • VisibilityController and AccessController can be used together
  • 9. Transparent Encryption (HBASE-7544) • Transparent encryption of HBase on disk data – HFile blocks are encrypted as written and decrypted as read – Write ahead log (WAL) serialization is pluggable; we provide new secure writers and readers that encrypt and decrypt edits • Built on a new extensible cryptographic codec and key management framework in HBase • Simple key management – Default provider integrates with the Java Keystore • Per column family configuration – Supports schema design that places sensitive information in only a subset of column families
  • 11. Endpoint EXEC Grants (HBASE-6104) • HBase ACLs grant a familiar set of privileges to users and groups: – (R)ead, (W)rite, E(X)excute, (C)reate, (A)dmin • Versions prior to 0.98.0 ignored X • Now access to coprocessor Endpoint invocations can be controlled on a global, per-table, or per-column family basis
  • 13. Our Example Schema • A simple user information table Row Key Column Family: i Column Family: pii uid i:fullname pii:address i:nick pii:phone pii:cc pii:cvv2 pii:expdate > create ‘user’, { NAME => ‘i’, COMPRESSION => ’snappy’, VERSIONS => 10 }, { NAME => ‘pii’, COMPRESSION => ’snappy’, VERSIONS => 10 }
  • 14. Our Example Security Policy • Column family: i
  • 15. Our Example Security Policy • Column family: pii
  • 16. Getting Started • Enable HFile V3 – hfile.format.version=3 • Enable SASL+Kerberos authentication – RPC: Follow the steps in section 8.1 of the online manual: https://hbase.apache.org/book/security.html – ZooKeeper: Follow the steps in section 17.2 of the online manual: https://hbase.apache.org/book/zk.sasl.auth.html • Install security coprocessors – hbase.coprocessor.region.classes= org.apache.hadoop.hbase.security.access.AccessController, org.apache.hadoop.hbase.security.visibility.VisibilityController, org.apache.hadoop.hbase.security.token.TokenProvider
  • 17. Getting Started – hbase.coprocessor.master.classes= org.apache.hadoop.hbase.security.access.AccessController, org.apache.hadoop.hbase.security.visibility.VisibilityController – hbase.coprocessor.regionserver.classes= org.apache.hadoop.hbase.security.access.AccessController • Enable Endpoint exec permission checks – hbase.security.exec.permission.checks=true • [Optional] Enable transport security – hbase.rpc.protection=auth-conf
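The settings on slides 16 and 17 are written as name=value pairs, but they end up as property entries in hbase-site.xml. The following is a minimal Python sketch that renders them, assuming the usual Hadoop-style configuration/property/name/value layout. The helper name render_site_xml is hypothetical, and the snippet only builds a string; it does not contact a cluster.

```python
import xml.etree.ElementTree as ET

# Property names and values are copied from slides 16 and 17; nothing is added.
SECURITY_PROPS = {
    "hfile.format.version": "3",
    "hbase.coprocessor.region.classes": ",".join([
        "org.apache.hadoop.hbase.security.access.AccessController",
        "org.apache.hadoop.hbase.security.visibility.VisibilityController",
        "org.apache.hadoop.hbase.security.token.TokenProvider",
    ]),
    "hbase.coprocessor.master.classes": ",".join([
        "org.apache.hadoop.hbase.security.access.AccessController",
        "org.apache.hadoop.hbase.security.visibility.VisibilityController",
    ]),
    "hbase.coprocessor.regionserver.classes":
        "org.apache.hadoop.hbase.security.access.AccessController",
    "hbase.security.exec.permission.checks": "true",
    "hbase.rpc.protection": "auth-conf",  # optional transport security
}


def render_site_xml(props):
    """Return a Hadoop-style <configuration> document for the given properties."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_site_xml(SECURITY_PROPS))
```

Keeping the properties in one dictionary makes it easy to diff the intended security configuration against what is actually deployed on each node.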
  • 18. Role-Based Access Control Using the Hadoop Group Mapping Service and ACLs • Map each role in the organization to a LDAP entity – Employee -> • cn=user, member: ou=users,dc=groups, dc=example,dc=org – Developer -> • cn=developer, member: ou=developers,dc=groups,dc=example,dc=org – Test User Account -> • cn=testuser, member: ou=users,dc=example,dc=org – Service Account -> • cn=service, member: ou=services,dc=example,dc=org – Admin -> • cn=manager,dc=example,dc=org
  • 19. Role-Based Access Control Using the Hadoop Group Mapping Service and ACLs • Set up the Hadoop group mapper (core-site.xml) – hadoop.security.group.mapping= org.apache.hadoop.security.LdapGroupsMapping – hadoop.security.group.mapping.ldap.url=… – hadoop.security.group.mapping.ldap.bind.user=… – hadoop.security.group.mapping.ldap.search.filter.user= (& (|(objectclass=person)(objectclass=applicationProcess))(cn={0})) – hadoop.security.group.mapping.ldap.search.filter.group= (objectclass=groupofnames) – hadoop.security.group.mapping.ldap.search.attr.member=member – hadoop.security.group.mapping.ldap.search.attr.group.name=cn
  • 20. Role-Based Access Control Using the Hadoop Group Mapping Service and ACLs • Confirm the configuration is working correctly hbase> whoami service (auth:KERBEROS) groups: services
  • 21. Role-Based Access Control Using the Hadoop Group Mapping Service and ACLs • Grant permissions to groups and service and test accounts hbase> grant '@admins', 'RWXCA' hbase> grant 'service', 'RWXCA', 'user' hbase> grant '@developers', 'RW', 'user', 'i' hbase> grant 'testuser', 'RW', 'user', 'i' hbase> grant 'user', { '@developers' => 'RW', 'testuser' => 'R' }, { COLUMNS => 'pii', FILTER => "(PrefixFilter ('test'))" } Note: Cell grants done by the shell apply to existing cells only. This is useful for testing. In practice applications must add the desired cell ACL to the operation when submitting writes.
  • 22. Role-Based Access Control Using Labels • Define labels corresponding to roles in the security policy admin service test developer
  • 23. Role-Based Access Control Using Labels • Express access rules as visibility expressions admin | service admin | service | test admin | service | developer admin | service | developer | test • Define labels hbase> add_labels [ 'admin', 'service', 'developer', 'test' ]
  • 24. Role-Based Access Control Using Labels • Assign one or more roles to each user by associating their principal with a label set hbase> set_auths 'service', [ 'service' ] hbase> set_auths 'testuser', [ 'test' ] hbase> set_auths 'manager', [ 'admin' ] hbase> set_auths 'dev', [ 'developer' ] hbase> set_auths 'qa', [ 'test', 'developer' ] hbase> …
  • 25. Role-Based Access Control Using Labels • Apply appropriate visibility expressions to cells hbase> set_visibility 'user', 'admin|service|developer', { COLUMNS => 'i' } hbase> set_visibility 'user', 'admin|service', { COLUMNS => 'pii' } hbase> set_visibility 'user', 'admin|service|developer|test', { COLUMNS => [ 'i', 'pii' ], FILTER => "(PrefixFilter ('test'))" } Note: Visibility expressions added to cells by the shell apply to existing cells only. This is useful for testing. In practice applications must add the desired visibility expression to the operation when submitting writes.
  • 26. Attribute-Based Access Control • We can construct the effective authorization set for a user in a pluggable and stackable way • [Diagram: stack of pluggable components, annotated as follows] – Retrieves principal for user – Maps principal to group names – Imports auths from request – Enforces minimum auths – Maps identity attributes to auths • [Diagram data sources: Auths table, Directory]
  • 27. Attribute-Based Access Control • LDAP plugin can mix in auths corresponding to attributes of the subject’s identity – Expected soon in 0.98 (maybe 0.98.4) • Directory query: (&(objectClass=person)(userPrincipalName={0})) • Attribute mapping, written as <attribute>: <regex> -> <auth> – memberOf: .+ -> $1 – division: .+ -> $1 – department: .+ -> $1 – employeeID: P[0-9]+ -> probationary
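The attribute mapping on slide 27 can be read as a list of (attribute, regex, auth) rules applied to a directory entry. The Python sketch below models that reading only; it is not the LDAP plugin itself. It assumes that "$1" stands for the text the regex matched (the slide's patterns carry no explicit capture group), and the function name auths_from_attributes and the sample directory entries are hypothetical.

```python
import re

# (attribute, regex, auth template) rules transcribed from slide 27.
# ASSUMPTION: "$1" means "the text the regex matched", so each pattern
# is wrapped in a capture group before matching.
RULES = [
    ("memberOf", r".+", "$1"),
    ("division", r".+", "$1"),
    ("department", r".+", "$1"),
    ("employeeID", r"P[0-9]+", "probationary"),
]


def auths_from_attributes(attributes, rules=RULES):
    """Map a directory entry (attribute name -> list of values) to a set of auths."""
    auths = set()
    for attr, pattern, template in rules:
        for value in attributes.get(attr, []):
            match = re.fullmatch("(" + pattern + ")", value)
            if match:
                auths.add(template.replace("$1", match.group(1)))
    return auths


if __name__ == "__main__":
    # Hypothetical directory entry, for illustration only.
    entry = {"memberOf": ["developer"], "division": ["bigdata"], "employeeID": ["P1234"]}
    print(sorted(auths_from_attributes(entry)))
```

Under these assumptions an employee ID that starts with "P" followed by digits adds the probationary auth, which visibility expressions can then exclude with "!probationary".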
  • 28. Attribute-Based Access Control Using Labels • Apply appropriate visibility expressions to cells hbase> set_visibility 'user', 'admin|service|(developer&(!probationary))', { COLUMNS => 'i' } hbase> set_visibility 'user', 'admin|service', { COLUMNS => 'pii' } hbase> set_visibility 'user', 'admin|service|((developer|test)&(!probationary))', { COLUMNS => [ 'i', 'pii' ], FILTER => "(PrefixFilter ('test'))" }
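To see how expressions such as those on slide 28 decide what a user can read, the following Python sketch evaluates a visibility expression against a set of auths. It is a simplified model, not the VisibilityController's code. It assumes labels consist of letters, digits, and underscores, and that "&" binds tighter than "|" when parentheses are omitted (the slide's expressions are fully parenthesized, so this does not affect them). The names tokenize and is_visible are hypothetical.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[()&|!])")


def tokenize(expression):
    """Split a visibility expression into labels and operator tokens."""
    tokens, pos = [], 0
    expression = expression.strip()
    while pos < len(expression):
        match = TOKEN.match(expression, pos)
        if not match:
            raise ValueError("bad character at offset %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expression, auths):
    """Return True if a cell labelled with `expression` is visible to `auths`."""
    tokens = tokenize(expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        if token is None:
            raise ValueError("unexpected end of expression")
        pos += 1
        return token

    def parse_or():
        result = parse_and()
        while peek() == "|":
            take()
            right = parse_and()  # always parse the operand before combining
            result = result or right
        return result

    def parse_and():
        result = parse_not()
        while peek() == "&":
            take()
            right = parse_not()
            result = result and right
        return result

    def parse_not():
        token = take()
        if token == "!":
            return not parse_not()
        if token == "(":
            result = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return result
        if token in ("&", "|", ")"):
            raise ValueError("unexpected %r" % token)
        return token in auths  # a bare label is true if the user holds it

    result = parse_or()
    if peek() is not None:
        raise ValueError("trailing input: %r" % peek())
    return result


if __name__ == "__main__":
    expr = "admin|service|(developer&(!probationary))"
    print(is_visible(expr, {"developer"}))
    print(is_visible(expr, {"developer", "probationary"}))
```

In this model a developer sees cells in family i unless the probationary auth is also present, while the pii expression admits only admin and service.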
  • 29. Attribute-Based Access Control Using ACLs • An area of future work – We could consider an HBase-provided replacement for the Hadoop Group Mapper that also supports mapping object attributes to strings – For the VisibilityController, the mapped strings would be interpreted as auths (see slide #27) – For the AccessController, the mapped strings could be interpreted as group names – See HBASE-10919[1] or raise a discussion on user@hbase.apache.org 1. https://issues.apache.org/jira/browse/HBASE-10919
  • 31. Protecting Data At Rest • HBase is deployed into a layered system • Incorrect handling of permissions or storage volumes at the HDFS layer or below could expose sensitive information • [Diagram: Apache HBase (Master, Master (Standby), RegionServers) and Apache ZooKeeper layered over Apache Hadoop Distributed File System (HDFS) DataNodes]
  • 32. Getting Started • Create the cluster master key in a KeyStore file $ keytool -keystore hbase.jks -storetype jceks -genseckey -keyalg AES -keysize 128 -storepass secret -alias hbase-master-default • Deploy the KeyStore file to all site configuration directories and restrict local access to it $ chown hbase:hbase hbase.jks $ chmod 0600 hbase.jks (-rw-------) • Enable HFile V3 – hfile.format.version=3
  • 33. Getting Started • Set up key provider configuration for KeyStore files – hbase.crypto.keyprovider= org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider – hbase.crypto.keyprovider.parameters= jceks:///path/to/hbase/conf/hbase.jks?password=secret – hbase.crypto.master.key.name=hbase-master-default • Restrict local access to the site file $ chown hbase:hbase hbase-site.xml $ chmod 0600 hbase-site.xml (-rw-------) • The KeyStore password need not be embedded in the site file – Use ?passwordFile=/path/to/password/file and protect that instead
  • 34. Getting Started • Enable WAL encryption – hbase.crypto.wal.key.name=hbase-master-default – hbase.regionserver.hlog.reader.impl= org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader – hbase.regionserver.hlog.writer.impl= org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter – hbase.regionserver.wal.encryption=true • WAL encryption is configured separately from HFile encryption to enable storage management with tiered sensitivity • (JRE 8+) Enable AES-NI acceleration features – Add to hbase-env.sh: -XX:+UseAES -XX:+UseAESIntrinsics
  • 35. Transparent Encryption • Segregate sensitive information into one or a few column families with HFile encryption enabled – We are storing sensitive personally identifiable customer information in the “pii” family – Enable encryption on “pii” only to mitigate performance impact – After changing schema, run a major compaction to ensure all files are (eventually) transformed hbase> disable 'user' hbase> alter 'user', { NAME => 'pii', COMPRESSION => 'snappy', ENCRYPTION => 'aes' } hbase> enable 'user' hbase> major_compact 'user' • Schema: row key uid; column family i holds i:fullname, i:nick; column family pii holds pii:address, pii:phone, pii:cc, pii:cvv2, pii:expdate
  • 36. Transparent Encryption • Data key management – RegionServers retrieve and unwrap CF keys from descriptors as needed to encrypt HFiles – The data key for a CF can be modified at any time by the admin • Or, encryption can be enabled and disabled entirely • CF encryption is completely reversible! – HFiles contain the data key used for encryption, wrapped (encrypted) by the master key • Supports incremental rekeying without expensive IO or downtime – Simply trigger major compaction to normalize encryption and data keying state over the entire CF • Can be done on a region by region basis with an HBase shell script
  • 37. Transparent Encryption • Master key rotation – Should be an infrequent operation; an attacker able to observe even all schema and HFiles gains very little information about the master key over time – Store a copy of the current master key with an alternate alias, e.g. “hbase-master-alt” – Replace the master key with a new one – Update site file • hbase.crypto.master.alternate.key.name=hbase-master-alt – Do a rolling restart of all HBase server processes – Trigger a major compaction and wait for completion – Remove the old master key from the KMS and remove the alternate alias from the site file – Do another rolling restart of all HBase server processes
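The rotation steps above rely on servers still being able to unwrap data keys that were wrapped with the old master key until the major compaction finishes. The Python sketch below models that fallback, assuming a server tries the current master key first and then the key registered under the alternate alias. The wrap and unwrap functions are a toy stand-in built from the standard library (hash-derived keystream plus an HMAC tag), NOT the AES key wrapping HBase uses, and all function names are hypothetical.

```python
import hashlib
import hmac


def _keystream(key, length):
    """Derive `length` pseudo-random bytes from `key` (toy construction)."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def wrap(master_key, data_key):
    """Toy wrap: XOR with a keystream, prefixed by an HMAC tag. Not for real use."""
    stream = _keystream(master_key, len(data_key))
    body = bytes(a ^ b for a, b in zip(data_key, stream))
    tag = hmac.new(master_key, body, hashlib.sha256).digest()
    return tag + body


def unwrap(master_key, wrapped):
    """Reverse wrap(); raises ValueError if the master key does not match."""
    tag, body = wrapped[:32], wrapped[32:]
    expected = hmac.new(master_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong master key")
    stream = _keystream(master_key, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


def unwrap_with_fallback(wrapped, current_key, alternate_key=None):
    """Try the current master key, then the alternate (old) key if configured."""
    try:
        return unwrap(current_key, wrapped)
    except ValueError:
        if alternate_key is None:
            raise
        return unwrap(alternate_key, wrapped)


if __name__ == "__main__":
    old, new = b"old-master-key", b"new-master-key"
    data_key = b"0123456789abcdef"
    wrapped = wrap(old, data_key)  # an HFile written before the rotation
    print(unwrap_with_fallback(wrapped, current_key=new, alternate_key=old) == data_key)
```

In this model, removing the alternate key before every file has been rewritten makes the remaining old files unreadable, which is why the procedure waits for the major compaction to complete first.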
  • 38. Key Providers • Any Key Management System with a Java KeyStore provider can be supported by the KeyStoreKeyProvider • Or natively, via custom HBase KeyProviders • Update site configuration – hbase.crypto.keyprovider – hbase.crypto.keyprovider.parameters • [Diagram labels: HBase KeyStoreKeyProvider, HBase YourKeyProvider, JDK KeyStore provider framework, Thales, Luna, CloudHSM, ...]
  • 39. Cipher Providers • We support alternate or accelerated ciphers with either: 1. Java Cryptography (JCE) algorithm provider • Install a signed JCE provider (supporting “AES/CTR/NoPadding” mode with 128 bit keys) • Add it with highest preference to the JCE site configuration file $JAVA_HOME/lib/security/java.security • Update site configuration hbase.crypto.algorithm.aes.provider hbase.crypto.algorithm.rng.provider 2. Custom HBase Cipher implementation • Start at org.apache.hadoop.hbase.io.crypto.CipherProvider • Make it available on the server classpath • Update site configuration hbase.crypto.cipherprovider
  • 41. WAL Encryption • Performance implications of WAL encryption – As measured by HLogPerformanceEvaluation microbenchmark – Relative differences are what is interesting – WAL throughput ceiling ~10% lower with 7u45 – ~8% lower with 8u20 • Future mitigation: When HDFS storage tiering capability is in production, configure separate storage tiers for WAL and HFile data
      Test                                            Throughput (ops/sec)   Total cycles     Insns per cycle
      Oracle Java 1.7.0_45-b18 - None                 52658.302              8878179986750    0.47
      Oracle Java 1.7.0_45-b18 - AES WAL encryption   48045.834              9911748458387    0.57
      OpenJDK 1.8.0_20-b09 - None                     54874.125              8662634367005    0.46
      OpenJDK 1.8.0_20-b09 - AES WAL encryption       50659.507              9668111259270    0.61
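The relative throughput differences can be recomputed directly from the benchmark table on this slide. The short Python sketch below does that arithmetic; the numbers are copied from the table, and the helper name relative_drop is hypothetical. Computed this way the drops come out at roughly 8.8% (7u45) and 7.7% (8u20), the same order of magnitude as the rounded figures quoted in the bullets.

```python
# Throughput figures (ops/sec) copied from the benchmark table on slide 41.
RESULTS = {
    "Oracle Java 1.7.0_45-b18": {"none": 52658.302, "aes_wal": 48045.834},
    "OpenJDK 1.8.0_20-b09": {"none": 54874.125, "aes_wal": 50659.507},
}


def relative_drop(baseline, encrypted):
    """Fraction of throughput lost when WAL encryption is enabled."""
    return (baseline - encrypted) / baseline


if __name__ == "__main__":
    for jvm, r in RESULTS.items():
        drop = 100 * relative_drop(r["none"], r["aes_wal"])
        print("%s: %.1f%% lower with AES WAL encryption" % (jvm, drop))
```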
  • 42. Promoting Common ACLs • When designing security policy for a table, consider that table and column family level grants are inexpensive compared to cell level grants – Table and CF level grants are cached in memory – Cell level grants require region scanning • We consider permissions as the union of grants at all levels; a table or CF grant allows us to early out • If a user will always be granted permissions at the cell level, promote their access to a column family or table level grant
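The cost argument above can be sketched as a permission check that consults cached table and column family grants first and only falls back to the expensive cell-level lookup when neither allows the action. This Python model is illustrative only and is not the AccessController's code; the class name AclChecker, the cell_acl_lookup callback, and the sample grants (loosely based on slide 21) are assumptions.

```python
class AclChecker:
    """Toy model: permissions are the union of table, CF, and cell grants."""

    def __init__(self, table_grants, family_grants, cell_acl_lookup):
        self.table_grants = table_grants        # {(user, table): set of actions}
        self.family_grants = family_grants      # {(user, table, family): set of actions}
        self.cell_acl_lookup = cell_acl_lookup  # callable standing in for the region scan
        self.cell_lookups = 0                   # counts how often the costly path runs

    def allowed(self, user, action, table, family, row):
        # Cached table-level grant: early out.
        if action in self.table_grants.get((user, table), set()):
            return True
        # Cached column-family-level grant: early out.
        if action in self.family_grants.get((user, table, family), set()):
            return True
        # Only now pay for the cell-level check.
        self.cell_lookups += 1
        return action in self.cell_acl_lookup(user, table, family, row)


def sample_cell_acls(user, table, family, row):
    """Hypothetical cell ACLs: testuser may read pii cells in rows prefixed 'test'."""
    if user == "testuser" and family == "pii" and row.startswith("test"):
        return {"R"}
    return set()


if __name__ == "__main__":
    checker = AclChecker(
        table_grants={("service", "user"): set("RWXCA")},
        family_grants={("testuser", "user", "i"): set("RW")},
        cell_acl_lookup=sample_cell_acls,
    )
    print(checker.allowed("service", "R", "user", "pii", "test001"), checker.cell_lookups)
    print(checker.allowed("testuser", "R", "user", "pii", "test001"), checker.cell_lookups)
```

In this model the service account never triggers a cell lookup, which is the effect that promoting a always-granted cell permission to a table or column family grant is meant to achieve.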