1© Cloudera, Inc. All rights reserved.
Key Management in Accumulo
for Encryption at Rest
Anthony Young-Garner
Slide 2
Past and future threats, a refresher
[Diagram: Client Machines, Accumulo Cluster, HDFS Cluster, and Zookeeper Cluster inside an implicit Trusted Zone, with actors Sven (User), Jo (User), Molly (Network Admin), William (Accumulo Admin), and Halim (HDFS Admin).]
IMAGE DESIGN CREDIT: MICHAEL ALLEN, SEE SLIDE 7
Slide 3
This is not theoretical (1 of 3)
[accumulo@secure-2 lib]# accumulo shell -u root
Password: **************
Shell - Apache Accumulo Interactive Shell
-
- version: 1.6.0-cdh5.1.4
- instance name: accumulo
- instance id: cce72c83-826a-41bf-a11f-a8aecebeebaf
-
- type 'help' for a list of available commands
-
root@accumulo table1> scan -s public,private
alice properties:age [public] 48
alice properties:ssn [private] 123-45-6789
bob properties:age [public] 51
bob properties:ssn [private] 231-32-6789
root@accumulo table1> quit
Accumulo user with proper visibility authorizations accessing data.
Slide 4
This is not theoretical (2 of 3)
[hdfs@secure-2 ~]$ hadoop distcp \
  hdfs://secure-1:8020/accumulo/tables/3/default_tablet/F000018w.rf \
  hdfs://insecure-1:8020/tmp/table1_export_dest/
HDFS administrator copying an RFile from a cluster on which he has no privileges to one on which he does.
Slide 5
This is not theoretical (3 of 3)
[root@insecure-5 ~]# accumulo shell -u root
Password: **************
Shell - Apache Accumulo Interactive Shell
-
- version: 1.6.0-cdh5.1.4
- instance name: accumulo
- instance id: ebfe2e64-ba12-4231-8261-3a89115046ed
-
- type 'help' for a list of available commands
-
root@accumulo> importtable table1_copy /tmp/table1_export_dest
root@accumulo table1> setauths -u root -s public,private
root@accumulo> scan -t table1_copy -s public,private
alice properties:age [public] 48
alice properties:ssn [private] 123-45-6789
bob properties:age [public] 51
bob properties:ssn [private] 231-32-6789
root@accumulo> quit
HDFS admin reading unauthorized data.
Slide 6
Past and future threats, where we left off
[Diagram: Client Machines, Accumulo Cluster, HDFS Cluster, and Zookeeper Cluster inside an implicit Trusted Zone, with actors Sven (User), Jo (User), Molly (Network Admin), William (Accumulo Admin), and Halim (HDFS Admin).]
IMAGE DESIGN CREDIT: MICHAEL ALLEN, SEE SLIDE 7
Slide 7
Accumulo SecretKeyEncryptionStrategy
• Accumulo encryption at rest encrypts each RFile and WAL file with a data encryption key (DEK)
• Data encryption keys are encrypted with a key encryption key (KEK)
• Data is secure at rest and in transit
• Key encryption key is stored in HDFS (default implementation)
See Michael Allen's "Past and Future Threats: Encryption and Security in Accumulo" presentation from Accumulo Summit 2014 for more detail on message encryption (SSL) and data encryption support in Accumulo 1.6:
http://accumulosummit.com/archives/2014/program/talks/
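The DEK/KEK envelope pattern described above can be sketched in a few lines of Python. This is a toy model only: a SHA-256 keystream XOR stands in for a real cipher (Accumulo 1.6 uses AES), and the function and variable names are illustrative, not from the Accumulo codebase. The point is the structure: a fresh DEK per file, wrapped by a long-lived KEK that never touches the data directly.

```python
import hashlib
import secrets

def keystream_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher keyed via SHA-256 (illustration only, NOT AES).
    XOR is its own inverse, so the same call encrypts and decrypts."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        out.extend(block)
        counter += 1
    return bytes(b ^ k for b, k in zip(data, out))

# Envelope encryption: a fresh DEK per file, wrapped by a long-lived KEK.
kek = secrets.token_bytes(16)   # key encryption key (held by the key store)
dek = secrets.token_bytes(16)   # data encryption key (per RFile/WAL file)

wrapped_dek = keystream_cipher(kek, dek)   # stored alongside the file
ciphertext = keystream_cipher(dek, b"alice 123-45-6789")

# To read: unwrap the DEK with the KEK, then decrypt the data with the DEK.
recovered_dek = keystream_cipher(kek, wrapped_dek)
plaintext = keystream_cipher(recovered_dek, ciphertext)
assert plaintext == b"alice 123-45-6789"
```

Because only the wrapped DEK is stored with the file, rolling or revoking the KEK controls access to every file it wraps without re-encrypting the data itself.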
Slide 8
Enabling Accumulo encryption - accumulo-site.xml
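The original slide showed a screenshot of the accumulo-site.xml settings, which did not survive extraction. A representative fragment is sketched below; the property names follow Accumulo 1.6's experimental crypto configuration, but exact names, classes, and defaults should be verified against the 1.6 user manual before use.

```xml
<!-- Illustrative accumulo-site.xml fragment (Accumulo 1.6 experimental encryption at rest). -->
<property>
  <name>crypto.module.class</name>
  <value>org.apache.accumulo.core.security.crypto.DefaultCryptoModule</value>
</property>
<property>
  <name>crypto.cipher.suite</name>
  <value>AES/CFB/NoPadding</value>
</property>
<property>
  <name>crypto.cipher.key.length</name>
  <value>128</value>
</property>
<property>
  <name>crypto.secret.key.encryption.strategy.class</name>
  <value>org.apache.accumulo.core.security.crypto.CachingHDFSSecretKeyEncryptionStrategy</value>
</property>
```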
Slide 9
Access attempt thwarted
[root@insecure-5 ~]# accumulo shell -u root
root@accumulo> importtable table1_copy /tmp/table1_export_dest
root@accumulo> scan -t table1_copy -s public,private
2015-04-18 22:45:15,282 [shell.Shell] ERROR: java.lang.RuntimeException:
org.apache.accumulo.core.client.impl.AccumuloServerException:
Error on server insecure-5.vpc.cloudera.com:10011
root@accumulo> quit
HDFS admin attempt to read unauthorized data fails.
Slide 10
Data access threats summarized
Vector                  Protection mechanism
Unauthorized users      Visibility labels
Network administrator   Thrift/SSL
HDFS administrator      Accumulo encryption at rest
Misconfiguration        All of the above
Slide 11
Not so fast!
[hdfs@secure-2 ~]$ hadoop distcp \
  hdfs://secure-1:8020/accumulo/crypto/secret/keyEncryptionKey \
  hdfs://insecure-4:8020/accumulo/crypto/secret/keyEncryptionKey
HDFS admin can copy the Accumulo key encryption key!
Slide 12
Current threats: nearly back where we started?!
[Diagram: Client Machines, Accumulo Cluster, HDFS Cluster, and Zookeeper Cluster inside an implicit Trusted Zone, with actors Sven (User), Jo (User), Molly (Network Admin), William (Accumulo Admin), and Halim (HDFS Admin).]
Slide 13
An interlude: HDFS transparent encryption at rest
• Data in encryption zones is transparently encrypted by HDFS client
• Secure at rest and in transit
• Prevents attacks at HDFS, FS and OS levels
• Key management is independent of HDFS
• Designed for performance, scalability, compartmentalization and compatibility
• Keys are stored by Hadoop Key Management Service (KMS)
• Proxy between key store and HDFS encryption subsystems on HDFS client/server
Slide 14
HDFS encryption, simple version
[Diagram: an HDFS client interacting with the Name Node and Data Nodes of the HDFS cluster, and with the Hadoop KMS over REST/HTTP via the Hadoop Key Provider API.]
1. User or process creates a key (KEK).
2. HDFS admin creates an encryption zone, associating an empty directory with the KEK.
3. User or process initiates a read/write to a file in the encryption zone.
Slide 15
HDFS encryption, name node actions
[Diagram: the same components as the previous slide.]
3. User or process initiates a read/write to a file in the encryption zone.
4. On file creation, the name node requests an encrypted data encryption key (EDEK) from the KMS. The EDEK is stored with the file metadata on the name node.
5. The name node returns the file stream and the encrypted key to the client.
Slide 16
HDFS encryption, client actions
[Diagram: the same components as the previous slides.]
6. Client requests the decrypted DEK from the KMS.
7. KMS uses the KEK to decrypt the EDEK and returns the decrypted DEK to the client.
8. Client uses the DEK to read/write encrypted data to/from the stream.
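The numbered interactions on these three slides can be condensed into a toy model. This is a conceptual sketch, not the real KMS API: class and method names are invented for illustration, and a SHA-256 keystream XOR stands in for AES. It shows why the name node never sees a plaintext DEK — it only ever handles the EDEK, while decryption happens between the client and the KMS.

```python
import hashlib
import secrets

def xor_ks(key: bytes, data: bytes) -> bytes:
    """Toy XOR keystream 'cipher' standing in for AES (illustration only)."""
    ks = b""
    i = 0
    while len(ks) < len(data):
        ks += hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        i += 1
    return bytes(a ^ b for a, b in zip(data, ks))

class ToyKMS:
    """Holds KEKs; hands out EDEKs and unwraps them for authorized clients."""
    def __init__(self):
        self.keks = {}

    def create_key(self, name):            # step 1: user/admin creates the KEK
        self.keks[name] = secrets.token_bytes(16)

    def generate_edek(self, name):         # step 4: name node asks for an EDEK
        dek = secrets.token_bytes(16)
        return xor_ks(self.keks[name], dek)

    def decrypt_edek(self, name, edek):    # step 7: client asks KMS to unwrap
        return xor_ks(self.keks[name], edek)

kms = ToyKMS()
kms.create_key("accumulo-key")                 # 1. KEK created in the KMS
edek = kms.generate_edek("accumulo-key")       # 4. EDEK stored in file metadata
dek = kms.decrypt_edek("accumulo-key", edek)   # 6-7. client obtains plaintext DEK
ciphertext = xor_ks(dek, b"row data")          # 8. client encrypts the stream
assert xor_ks(dek, ciphertext) == b"row data"
```

In the real protocol the KMS response also carries an IV and key-version metadata, and per-user/per-key ACLs decide whether a given caller may perform the unwrap at all — which is exactly the lever used against the HDFS admin later in this deck.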
Slide 17
Hadoop KMS: what's in the black orange box?
[Diagram: the Hadoop KMS fronting a backing key store, reached over REST/HTTPS via the Hadoop Key Provider API.]
• Hadoop Key Management Server is a proxy between KMS clients and a backing key store
• Default store is a Java key store file
• Implementations for full-featured key servers with support for Hardware Security Module (HSM) integration are available today
• HSM integration moves the root of trust to the HSM
• Provides a unified API and scalability
• Configurable caching support
• Provides key lifecycle management (create, delete, roll, etc.)
• Provides a broad set of access control capabilities
• Per-user ACL configuration for access to KMS
• Per-key ACL configuration for access to specific keys
• Strong authentication via Kerberos support
• Full-featured hadoop shell command line provided
Slide 18
Hadoop KMS ACL example: blacklisting hdfs admin
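The original slide showed a screenshot of the KMS ACL configuration, which did not survive extraction. A fragment along the lines below reproduces the idea; `hadoop.kms.blacklist.DECRYPT_EEK` and the per-key `key.acl.<keyname>.<op>` pattern are standard Hadoop KMS ACL properties, but the key name `accumulo-key` and the exact values are illustrative and should be adapted to the deployment.

```xml
<!-- Illustrative kms-acls.xml fragment. -->
<!-- Blacklist the hdfs superuser from decrypting any EDEK, so an HDFS admin
     can move ciphertext around but never recover plaintext keys. -->
<property>
  <name>hadoop.kms.blacklist.DECRYPT_EEK</name>
  <value>hdfs</value>
</property>
<!-- Optionally scope access per key: only the accumulo user may decrypt
     EDEKs wrapped by accumulo-key. -->
<property>
  <name>key.acl.accumulo-key.DECRYPT_EEK</name>
  <value>accumulo</value>
</property>
```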
Slide 19
Finally, HDFS admin is truly blocked
[hdfs@secure-2 ~]$ hadoop distcp \
  hdfs://secure-1:8020/accumulo/crypto/secret/keyEncryptionKey \
  hdfs://insecure-4:8020/accumulo/crypto/secret/keyEncryptionKey
15/04/18 22:41:09 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
...
15/04/18 22:41:10 ERROR util.RetriableCommand: Failure in Retriable command: Copying hdfs://secure-1.vpc.cloudera.com:8020/accumulo/crypto/secret/keyEncryptionKey to hdfs://insecure-4.vpc.cloudera.com:8020/accumulo/crypto/secret/keyEncryptionKey
org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException:
org.apache.hadoop.security.authorize.AuthorizationException: User:hdfs not allowed to do 'DECRYPT_EEK' on 'accumulo-key'
HDFS admin can no longer copy the Accumulo key encryption key!
Slide 20
Well, mostly blocked...
[hdfs@secure-2 ~]$ hadoop distcp \
  hdfs://secure-1:8020/.reserved/raw/accumulo/crypto/secret/keyEncryptionKey \
  hdfs://insecure-4:8020/.reserved/raw/accumulo/crypto/secret/keyEncryptionKey
15/04/18 22:43:52 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/04/18 22:43:52 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
...
15/04/18 22:43:53 INFO mapreduce.Job: Job job_local2004063696_0001 completed successfully
The /.reserved/raw virtual path allows HDFS admins to perform distcp operations, but the data is moved in its encrypted form. No decryption occurs.
Slide 21
Accumulo SecretKeyEncryptionStrategy revisited
• Accumulo encryption at rest encrypts each RFile with a data encryption key (DEK)
• Data encryption keys are encrypted with a key encryption key (KEK)
• Data is secure at rest and in transit
• Key encryption key is stored in HDFS
• Options to protect the Accumulo KEK
• Leverage HDFS encryption in Hadoop 2.6
• Default SecretKeyEncryptionStrategy with HDFS encryption
• Accumulo on HDFS encryption
• Custom SecretKeyEncryptionStrategy
Slide 22
Using HDFS encryption to protect the Accumulo KEK via the Hadoop KMS
[Diagram: Accumulo Cluster, HDFS Cluster, Zookeeper Cluster, and Hadoop KMS, with William (Accumulo Admin) and Halim (HDFS Admin); Trusted Zone (implicit).]
Slide 23
Moving Accumulo KEK to an encryption zone
# sudo -u accumulo hadoop key create accumulo-key
accumulo-key has been successfully created with options Options{cipher='AES/CTR/NoPadding', bitLength=128, description='null', attributes=null}.
KMSClientProvider[http://secure-3.vpc.cloudera.com:16000/kms/v1/] has been updated.
# sudo -u accumulo hadoop fs -mv /accumulo/crypto/secret /accumulo/crypto/secret-tmp
# sudo -u hdfs hadoop fs -mkdir -p /accumulo/crypto/secret
# sudo -u hdfs hadoop fs -chown accumulo:accumulo /accumulo/crypto/secret
# sudo -u hdfs hdfs crypto -createZone -keyName accumulo-key -path /accumulo/crypto/secret
Added encryption zone /accumulo/crypto/secret
# sudo -u hdfs hadoop distcp -pugpx -skipcrccheck -update /accumulo/crypto/secret-tmp \
  /accumulo/crypto/secret
# sudo -u accumulo hadoop fs -rm -r /accumulo/crypto/secret-tmp
Deleted /accumulo/crypto/secret-tmp
Creating the KEK in an existing encryption zone is much simpler.
Slide 24
Tradeoffs of hybrid approach (Accumulo KEK + KMS)
Pros
• Least effort path forward
• Minimal operational risk
• Minimal Accumulo downtime
• Allows gentle adoption of HDFS encryption and Hadoop KMS
• Leverage nearly all administrative capabilities of Hadoop KMS
Cons
• Accumulo 1.6 encryption at rest supports RFiles and write-ahead logs, but not yet recovered write-ahead logs
• Current implementation and framework are experimental
Slide 25
Using HDFS encryption to protect the Accumulo directory directly
[Diagram: Accumulo Cluster, HDFS Cluster, Zookeeper Cluster, and Hadoop KMS, with William (Accumulo Admin) and Halim (HDFS Admin); Trusted Zone (implicit).]
Slide 26
Moving Accumulo directory to an encryption zone
# sudo -u accumulo hadoop key create accumulo-key
accumulo-key has been successfully created with options Options{cipher='AES/CTR/NoPadding', bitLength=128, description='null', attributes=null}.
KMSClientProvider[http://secure-3.vpc.cloudera.com:16000/kms/v1/] has been updated.
# sudo -u accumulo hadoop fs -mv /accumulo /accumulo-tmp
# sudo -u hdfs hadoop fs -mkdir /accumulo
# sudo -u hdfs hadoop fs -chown accumulo:accumulo /accumulo
# sudo -u hdfs hdfs crypto -createZone -keyName accumulo-key -path /accumulo
Added encryption zone /accumulo
# sudo -u hdfs hadoop distcp -pugpx -skipcrccheck -update /accumulo-tmp /accumulo
# sudo -u hdfs hadoop fs -rm -r /accumulo-tmp
Deleted /accumulo-tmp
Stop tablet servers before moving the data directory.
Slide 27
Tradeoffs of full HDFS encryption approach
Pros
• Least effort path forward
• HDFS encryption and KMS can be leveraged by multiple services (skill re-use and operational efficiency)
• HDFS encryption and KMS are generally available
• Leverage all administrative capabilities of HDFS encryption and Hadoop KMS
Cons
• Moderate operational risk (see HBase)
• Accumulo downtime during data move
• Possible operational performance impact
Slide 28
Other options
• Custom SecretKeyEncryptionStrategy
• Tighter connection to core Accumulo functionality and release cycle
• Support arbitrary key servers
• But it's easy to get the details of both encryption and key management wrong
• Arbitrary key server support can also be developed via a custom key provider for
the Hadoop KMS
• Native KMS SecretKeyEncryptionStrategy
• Leverage administrative functions of KMS without relying on HDFS encryption
Slide 29
Thank you. Let's talk about keys!

Accumulo Summit 2015: Attempting to answer unanswerable questions: Key management in Accumulo for Encryption at Rest [Security]


Editor's Notes

  • #2: Accumulo 1.6 introduced support for encryption in transit and at rest. These capabilities increase the protection Accumulo provides against the threat of unauthorized data access. Michael Allen covered these features and topics in a presentation at last year's Accumulo Summit. However, at that point in time, there were some unanswered questions around key management. Today, I'd like to highlight where advances in the underlying Hadoop platform and HDFS start to provide some clearer choices and better answers to these questions.
  • #3: Last year, Michael Allen introduced us to a set of threat vectors and a cast of characters. As a reminder, Accumulo's built-in access control mechanisms prevent unauthorized data access by users and processes participating in the Accumulo processing paths. However, the design of the system also creates an implicit zone of trust composed of users and processes that are not participants in the Accumulo processing paths but who do have visibility into portions of those processing paths. In particular, there are user roles with administrator privileges within the larger data storage and network environments in which Accumulo operates who can see data in the clear over the wire between Accumulo clients and servers, and RFiles and WAL files at rest on HDFS. Thus, whether by side effect or intention, these user roles must be considered trusted users by the system unless steps can be taken to push them out of the Trusted Zone.
  • #4: The threat is not theoretical. Here we see a properly authorized accumulo user accessing table data on what is intended to be a secure production cluster. Within accumulo, the data is protected from unauthorized access by visibility rules defined with identity theft rather than age discrimination in mind. But regardless of the naming choices made in defining the visibility labels, the table data is only visible to authenticated users with specific authorization for these specific labels (public and private). Another accumulo user who has not been specifically authorized to view data with the public visibility label will not be able to see the age data in table1, and an accumulo user who has not been specifically authorized to view data with the private visibility label will not be able to see the social security number data in table1.
  • #5: But whatever the niceties of privacy, our HDFS administrator is really curious to know Alice's age. And after all, if he really wasn't supposed to see the data on the secure cluster, it wouldn't be so easy for him to copy it to an insecure cluster over which he has full control.
  • #6: Once the unencrypted data is on an insecure cluster under the curious HDFS administrator's control, he can give himself whatever privileges he needs to view the data. Proper auditing might alert others to the HDFS administrator's actions. We and other vendors offer auditing functionality that, properly configured, would detect, report and possibly even send alerts in response to the suspect distcp invocation. But even so, detection after the fact does not change the fact of the unauthorized data access and therefore the only available actions at that point are reactions and damage control. And this is the least sophisticated method. An OS administrator with rights on the underlying file system could copy from the local filesystem rather than using distcp and use a hexdump to read the data. A network administrator might sniff the data off the wire while it is in transit between Accumulo client and server or between servers in the cluster.
  • #7: The encryption capabilities in Accumulo 1.6 address these threats. The introduction of SSL encryption between Accumulo clients and servers and among Accumulo servers pushes those with network sniffing privileges out of the implicit zone of trust. And introducing encryption of Accumulo data before writing to persistent storage pushes HDFS administrators out of the zone of trust. The SSL functionality is straightforward but before continuing, I’ll talk a bit more about the way in which Accumulo encryption of data at rest works.
  • #8: The collection of Accumulo 1.6 encryption at rest JIRAs encrypt each RFile and WAL file with a data encryption key before writing to persistent storage. Each of these data encryption keys is encrypted with a service-wide key encryption key before also being written to storage, each DEK along with its associated file. This design largely achieves the goal of ensuring data security both in transit and at rest. The linchpin in this scheme, then, is deciding how to secure the service-wide key encryption key. This is done by implementing the Accumulo SecretKeyEncryptionStrategy interface. The SecretKeyEncryptionStrategy interface defines two methods, encryptSecretKey and decryptSecretKey, which are intended to do, to the service-wide key encryption key, exactly what their names suggest. The default implementation of the SecretKeyEncryptionStrategy interface in Accumulo 1.6 stores the key encryption key in HDFS. In this talk, I’m going to focus on the use and operational aspects of the data encryption at rest support. For more detail on the design and implementation of both message encryption (SSL) and data encryption support in Accumulo 1.6, see Michael Allen's "Past and Future Threats: Encryption and Security in Accumulo" presentation from last year’s Accumulo Summit. http://accumulosummit.com/archives/2014/program/talks/
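The DEK/KEK envelope scheme described above can be sketched with standalone openssl commands. This is a hypothetical illustration only: Accumulo's actual implementation lives in its crypto module classes, not the openssl CLI, and a real system would use a random IV per file rather than the fixed zero IV used here for brevity.

```shell
# Envelope encryption sketch: a per-file DEK protects the data,
# and a service-wide KEK protects the DEK. Illustration only.
set -e
cd "$(mktemp -d)"
printf 'alice properties:ssn [private] 123-45-6789' > rfile.dat

openssl rand -hex 16 > dek.hex          # 128-bit data encryption key
openssl rand -hex 16 > kek.hex          # 128-bit key encryption key
IV=00000000000000000000000000000000     # fixed IV for illustration only

# Encrypt the "RFile" with the DEK, then wrap the DEK with the KEK;
# the wrapped DEK is what gets stored alongside the encrypted file.
openssl enc -aes-128-ctr -K "$(cat dek.hex)" -iv "$IV" -in rfile.dat -out rfile.enc
openssl enc -aes-128-ctr -K "$(cat kek.hex)" -iv "$IV" -in dek.hex  -out dek.wrapped

# To read: unwrap the DEK with the KEK, then decrypt the file with the DEK.
openssl enc -d -aes-128-ctr -K "$(cat kek.hex)" -iv "$IV" -in dek.wrapped -out dek.unwrapped
openssl enc -d -aes-128-ctr -K "$(cat dek.unwrapped)" -iv "$IV" -in rfile.enc -out rfile.dec
```

The point of the indirection is that rotating or re-securing the KEK only requires re-wrapping the small DEK files, never re-encrypting the bulk data.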
  • #9: Enabling data encryption in Accumulo is mostly painless, involving the setting of about a dozen properties in the accumulo-site.xml file, most of which can be set by rote. Only a few are shown here. The most important settings to note for the purposes of our discussion today are the crypto.secret.key.encryption.strategy.class and crypto.default.key.strategy.key.location properties, the first of which defines which implementation class will be used to secure and store the key encryption key and the second of which, in the case of the CachingHDFSSecretKeyEncryptionStrategy, specifies where in HDFS the KEK will be stored. Properties: crypto.module.class, crypto.cipher.suite, crypto.cipher.algorithm.name, crypto.block.stream.size, crypto.cipher.key.length, crypto.secure.rng, crypto.secure.rng.provider, crypto.secret.key.encryption.strategy.class, crypto.default.key.strategy.hdfs.uri, crypto.default.key.strategy.key.location, crypto.default.key.strategy.cipher.suite
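Put concretely, the two properties called out above might look like this in accumulo-site.xml. The class name follows the Accumulo 1.6 default strategy; the key location path is an assumed example chosen to match the /accumulo/crypto/secret directory used on slide 23.

```xml
<!-- Illustrative accumulo-site.xml fragment; the key location path is an assumption. -->
<property>
  <name>crypto.secret.key.encryption.strategy.class</name>
  <value>org.apache.accumulo.core.security.crypto.CachingHDFSSecretKeyEncryptionStrategy</value>
</property>
<property>
  <name>crypto.default.key.strategy.key.location</name>
  <value>/accumulo/crypto/secret/keyEncryptionKey</value>
</property>
```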
  • #10: Once Accumulo encryption at rest is configured, a rogue actor can still use distcp or other means to copy data from the secure cluster to an insecure location. But attempts to actually use the data will fail because the encrypted data is meaningless to the Accumulo runtime, which cannot parse the encrypted file(s).
  • #11: So let’s pause for a moment and summarize what we’ve discussed so far. We’ve recognized that visibility labels are probably the Accumulo project's most famous security feature. They provide fine-grained, cell-level access control and thereby protect against unauthorized data access by Accumulo users and client applications. But as we’ve seen, visibility labels only protect against data access from within Accumulo processes. HDFS administrators can still access the Accumulo RFiles on disk and network administrators can access Accumulo data in transit between hosts. The support for SSL and encryption at rest addresses these threat vectors and allows us to more closely manage who is included in the implicit zone of trust around our Accumulo cluster.
  • #12: But as Michael Allen has discussed in the past and as I’ve alluded to earlier in our discussion, we’re not quite done here because the key encryption key itself is still available to anyone with proper access on HDFS, including the HDFS administrator.
  • #13: So it may seem that we’ve done quite a bit of work, or at least in our particular case, a lot of talking, to end up right back where we started. Our network administrator has definitely been removed from the zone of trust by SSL, but it seems that our HDFS administrator is a bit more stubborn.
  • #14: In the interim between the release of Accumulo 1.6 and today, support for encryption at rest in HDFS and key management in Hadoop were added to the platform. This support became generally available in Hadoop 2.6. HDFS encryption at rest, like Accumulo encryption at rest, secures data at rest and in transit between HDFS client and server. (Full stop) It prevents attacks at the HDFS, FS and OS levels. In addition it leverages the general purpose Key Management Service (KMS) also introduced in Hadoop 2.6. Both features, HDFS encryption and the KMS, are designed to meet the performance and scalability requirements necessary to provide transparent data encryption and key management function to Hadoop services without adversely impacting the operational capabilities of the services. HDFS encryption, in particular, was designed and implemented to be entirely transparent to and compatible with clients and services running in and on top of HDFS other than the hopefully minimal performance impact of the actual encryption and decryption operations. The KMS, in particular, is designed to not only provide a robust proxy between the Hadoop cluster and back end key stores but also to provide the particular management, administrative and role-based compartmentalization of access and function needed to effectively leverage key management within the Hadoop ecosystem. Before talking about how HDFS encryption and the KMS help us to solve the problem of key management for encrypted Accumulo data, I’ll talk in a bit more detail about how both HDFS encryption and the KMS work.
  • #15: HDFS encryption, like Accumulo encryption, makes use of data encryption keys (DEKs) to protect individual files and key encryption keys (KEKs) to protect a set of data encryption keys. The implementation also relies on the Hadoop Key Management Service (KMS) to manage and provide the root of trust for key encryption keys, which removes the ability of HDFS administrators to control or access KEK key material. This also blocks HDFS administrator access to encrypted data. Operational use of HDFS encryption starts with a user or process creating a key via the Hadoop Key Management Service. In our case, the Accumulo user might create an accumulo-key. Then the HDFS administrator can create an encryption zone for this user on an empty directory set with the appropriate ownership and ACLs. The encryption zone ties together a directory (and all of its subdirectories) with a particular key. The HDFS admin only needs to know the name of the key; he neither has nor requires access to the key material. Once these two steps are completed, client reads and writes to the encryption zone are automatically and transparently decrypted and encrypted by the HDFS client. As a matter of fact, no user or process on the HDFS server ever sees or has access to the key encryption key. The result is that no processes or users on the server side are able to decrypt the data. The most important takeaway message for our purposes here is that HDFS encryption establishes the root of trust and control in the KMS rather than through any mechanism or authority within HDFS itself, which allows the flexibility to separate the data access roles from file administration roles. The most important message in general about the KMS within the context of HDFS encryption is that the key encryption key never leaves the KMS host (and if a key server is used, it never leaves the KMS process).
  • #16: NOTE: This level of detail may be unnecessary. Consider skipping this slide unless people want to know more. The steps on the previous slide are all you need to know to use HDFS encryption. All other details after step 3 are transparent to the user and to the HDFS client processes. However, in order to understand how the key encryption keys (and, thus, the data) are protected, you may want to know more. The details in steps 4 and 5 above are the mechanism used to keep data encryption keys away from any server-side processes, including the name node and data nodes. So then how does the client get the unencrypted data encryption key?
  • #17: NOTE: This level of detail may be unnecessary. Consider skipping this slide unless people want to know more. The client requests the decrypted data encryption key directly from the KMS. This functionality is implemented transparently in the CryptoStream support in HDFS encryption. The decryption of the Data Encryption Key occurs on the KMS, so the Key Encryption Key never leaves the KMS. Furthermore, assuming ACLs are configured properly, only the client process is able to request the decryption of the Data Encryption Key. The key takeaway from this slide and the previous one is that all of the steps on these two slides happen within the HDFS implementation with no changes to client code. This is why it is called transparent encryption. Only administrative actions are required to enable or disable encryption. No code changes are necessary.
  • #18: I’ve said that the Hadoop KMS is the root of trust when using HDFS encryption. This is true, full stop. HDFS encryption keys are secured by the KMS. However, the keys are not stored by the KMS. Instead, the KMS provides a key provider API which defines how keys are stored and relies on an implementation of this API to actually store the keys somewhere. The default implementation of the key provider API stores keys in Java key store files on the local file system of the KMS. This is relatively secure, assuming the KMS host is protected from unauthorized local file system access. However, it is not particularly robust in that it cannot provide for high availability or failover. And in some environments, storing key material on an operational host, even one that is protected from unauthorized local file system access, is in violation of policy. In such environments, it is likely that a dedicated and full-featured key server is in use. For example, we provide a key provider implementation that leverages a backend keystore that provides for availability and failover and also supports the Hardware Security Modules (known as HSMs) used by the most security-conscious organizations. Having the KMS as a proxy allows it to provide the particular management, administrative and role-based compartmentalization of access and function needed to effectively leverage key management within the Hadoop ecosystem while respecting the fact that the most security-conscious environments will require a dedicated key server to meet quality of service standards and/or to meet security requirements that the root of trust for keys rests in an organizational HSM.
  • #19: Getting back to our Accumulo use case, one key aspect of the separation of roles and concerns between HDFS encryption and the Hadoop KMS is that it allows us to define KMS ACLs that block particular users, including the HDFS administrator, from accessing key material and/or particular key management functions. KMS ACLs support both white and black lists on all exposed KMS functions (create, delete, rollover, get key, get key metadata, generate encrypted key, decrypt encrypted key) You can also configure white lists on a per-key basis. ACL updates in config files are loaded dynamically without re-starting the service.
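As a sketch, a kms-acls.xml fragment along these lines would blacklist the hdfs superuser from decrypting key material while granting the accumulo user decrypt rights on its own key. Property names follow the Hadoop KMS ACL conventions, and the key name accumulo-key matches the earlier slides; treat the exact values as illustrative for your deployment.

```xml
<!-- Illustrative kms-acls.xml fragment; the KMS hot-reloads this file without a restart. -->
<property>
  <name>hadoop.kms.blacklist.DECRYPT_EEK</name>
  <value>hdfs</value>
</property>
<property>
  <name>key.acl.accumulo-key.DECRYPT_EEK</name>
  <value>accumulo</value>
</property>
```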
  • #20: Having enabled HDFS encryption and configured the KMS, we can finally block the HDFS administrator from getting unauthorized access to secure data.
  • #21: But then again, we have to stop for a bit and consider that moving data around on the cluster is a core part of the HDFS administrator’s job. We don’t want to tempt the HDFS administrator with access to data he shouldn’t see but we also don’t want to tie his hands. Therefore, HDFS encryption introduces the /.reserved/raw virtual path. This virtual path allows the HDFS admin to perform arbitrary HDFS operations on data but when operations are performed in this virtual path, no transparent encryption or decryption occurs. So, returning to our now canonical example, we see that the HDFS administrator can still use distcp to copy the Accumulo key encryption key from the secure cluster to the insecure cluster. But this time the copy of the file on the insecure cluster is still encrypted via HDFS encryption and cannot be read on the insecure cluster which doesn’t have access to the necessary keys. Also get screen shots of checksum of key on original fs vs. exploit fs
  • #22: With the introduction of HDFS encryption, we now have a couple of viable options to protect the Accumulo key encryption key without needing to implement a custom SecretKeyEncryptionStrategy, although this remains a reasonable option for specific situations. The first option is to use HDFS encryption to protect the Accumulo KEK stored by the default SecretKeyEncryptionStrategy. The second option is to use HDFS encryption to secure the entire Accumulo data directory directly in HDFS. I’ll discuss each of these options in turn.
  • #23: The simplest option, especially if you’re already using Accumulo encryption at rest and the default SecretKeyEncryptionStrategy implementation, is to use HDFS encryption to protect the Accumulo key encryption key (KEK) with the root of trust in the backing key store of the Hadoop KMS. In this deployment model, Accumulo data continues to be protected using Accumulo encryption at rest while the Accumulo KEK itself is protected by HDFS encryption, effectively blocking the HDFS administrator from unauthorized access.
  • #24: How do we actually do this? The process is straightforward. First, we’ll need to find a brief Accumulo maintenance window (no more than five minutes, but you may want to allow more time if you want to run some acceptance tests or smoke tests afterward) as access to Accumulo data may be briefly interrupted while we encrypt the Accumulo key encryption key. TODO: More detailed speaker notes here.
  • #25: This hybrid approach requires little effort to implement and introduces the least possible operational risk, assuming you’re already using Accumulo encryption at rest. It also requires the least amount of Accumulo downtime because the distcp operation only has to copy the one Accumulo key encryption key file which will in most cases be completed in seconds. Since only the Accumulo KEK is being protected by HDFS encryption, it allows your Accumulo administrator, HDFS administrator and other Hadoop administrators to gently ramp up their adoption of HDFS encryption and Hadoop KMS without having to become experts on day one while still providing your admins with nearly all of the administrative capabilities that full use of the Hadoop KMS with Accumulo data would provide. One negative aspect of this approach, especially if you’re not already using Accumulo encryption at rest, is that the Accumulo encryption at rest implementation in Accumulo 1.6 is considered experimental and is not yet fully complete. While Accumulo 1.6 supports encryption of rfiles and write ahead logs, it does not yet provide support for encryption of recovered write ahead logs that are created and stored on HDFS when a tablet server fails. See JIRA https://issues.apache.org/jira/browse/ACCUMULO-981 (support pluggable encryption codecs for MapFile when recovering write-ahead logs) for more details.
  • #26: Another option worth considering, especially if you’re not currently already using Accumulo encryption at rest, is leveraging HDFS transparent encryption to protect your entire Accumulo data directory.
  • #27: The procedure to move the entire Accumulo data directory to an encryption zone is quite similar to the process we followed to encrypt the Accumulo key encryption key. However, in this case, because much more data is likely involved, a longer maintenance window will be needed (both because the actual distcp operation will likely take longer and also because you may want to perform more extensive verification and smoke testing on the Accumulo cluster after the modification). IMPORTANT NOTES: When using CDH, tablet servers should still be stopped via “decommissioning a node” method described in Apache Accumulo user manual BEFORE stopping service using CM. This prevents errors in the distcp process due to stale HDFS Name Node information about the Accumulo write ahead logs. The command is accumulo admin stop <host[:port]> <host[:port]> <host[:port]> <host[:port]> … (be sure to include all tablet servers) Accumulo gc.trash.ignore policy should be set to true before re-starting the Accumulo service since the /user/accumulo/.Trash directory is now outside of the encryption zone and files cannot be moved between encryption zones. Doing steps in this order avoids the issue in CDH-26178 on certain versions of Hadoop (pre C5.4 I think). Gives more people a chance of avoiding issues if they try this at home on C5.3.
  • #28: As we saw on the previous slide, using HDFS encryption to protect your entire Accumulo data directory requires no more effort than protecting only the Accumulo key encryption key. However because the entire Accumulo data directory is being moved and encrypted, the distcp operation is likely to take significantly longer. And you may want to run more tests after encrypting the entire Accumulo directory than you would after encrypting only the Accumulo key encryption key. One great advantage of using HDFS encryption and KMS is that the skills and operational practices can be used across the set of services that run on top of HDFS rather than having different solutions for Accumulo versus other services. Also, HDFS encryption and KMS are generally available and fully supported today. The primary risk of immediately moving all of your Accumulo data to an HDFS encryption zone is that HDFS encryption, while generally available and fully supported, is a relatively new feature. It was delivered in Hadoop 2.6 which was released last summer, less than a year ago. As a consequence, HDFS or service developers are occasionally, though infrequently thus far, finding that there are subtle interactions between services running in HDFS and HDFS encryption that were not accounted for in the implementation. As a user, you would probably call these subtle interactions bugs, either bugs in HDFS encryption or bugs in the service that heretofore had little impact but which surfaced with the advent of HDFS encryption. And we’ve seen one such example in HBase recently. But over time, this disadvantage could become an advantage. Because there are a large number of services that run on top of HDFS and therefore on top of HDFS encryption, it is likely that bugs of this type will surface quickly and that HDFS encryption will achieve maturity quickly.
Another consideration in choosing to move the Accumulo directory into an encryption zone is that there will be longer downtime depending on the amount of data you have stored in Accumulo as all of the data is copied from an unencrypted location to an encrypted location. There may also be performance impacts as encryption and decryption are computationally expensive operations. In testing of sample HDFS workloads, our teams have seen performance hits of between 4 and 10 percent. However, the performance impact for services running on top of HDFS will vary. For example, nascent testing has shown that some sample HBase workloads have smaller performance hits than seen on HDFS workloads because the additional processing overhead of HBase itself makes the overhead added by encryption/decryption operations negligible. Performance testing is ongoing and we’d definitely welcome offers of real and/or realistic workloads that you’re willing to share with us for testing purposes. (If more detail is requested about the HBase issue): We’ve found that, under load, the HBase write ahead log can fail when running in an HDFS encryption zone due to differing concurrency assumptions between the HBase WAL implementation and HDFS. This is being fixed in an Apache JIRA and, like many concurrency bugs, only occurs in specific circumstances but certainly issues like this are a risk and cannot be discounted when introducing significant new functionality at the HDFS layer. https://issues.apache.org/jira/browse/HBASE-13221 (HDFS Transparent Encryption breaks WAL writing)