© 2017 Bloomberg Finance L.P. All rights reserved.
HBase and Oozie
Multitenancy at Bloomberg
DataWorks Summit
June 14th, 2017
Clay Baenziger
Hadoop Infrastructure
hadoop@bloomberg.net
What we will cover today
About Bloomberg
Technology Introduction
HBase Multitenancy
●Workloads, Resources, Access
Oozie Self-Service
●Deployments
HBase and Oozie
●Scheduled Compactions, Exporting HBase Snapshots
About Bloomberg
Bloomberg quickly and accurately delivers business and financial information, news and insight around the world.
A Sense of Scale:
●550 exchange feeds and over 100 billion market data messages a day
●400 million emails and 17 million IMs daily across the Bloomberg Professional Service
●Over 2,700 journalists and analysts in over 120 countries
●Producing more than 5,000 stories a day
●Reaching over 360 million homes worldwide
About Bloomberg
Project      JIRAs
Phoenix      24
HBase        20
Spark         9
ZooKeeper     8
HDFS          6
Oozie         4
Bigtop        3
Storm         2
Hive          2
Hadoop        2
YARN          2
Kafka         2
Flume         1
HAWQ          1
Total        86
Apache Solr: 3 core committers (one PMC member), with commits in every release since 4.6
(JIRAs where the reporter or assignee is from our Foundational Services group or affiliated projects)
Technology Intro – Apache HBase
What?
●Distributed database designed to host very large tables: billions of rows by millions of columns
●Block cache, Bloom filters and timeline consistency for highly available, real-time queries
●Sharded, versioned, non-relational database modeled after Google's Bigtable, using a compacting log-structured merge-tree (LSM) design
●Supports exports and backups, performed by global administrators
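To make the "compacting log-structured merge-tree" description concrete, here is a deliberately tiny, single-process sketch of the idea in plain Java. It is not HBase code: `ToyLsmStore`, the entry-count flush threshold and the in-memory "files" are hypothetical stand-ins for HBase's memstore and HFiles, and unlike HBase it keeps only the newest value per row (no timestamped versions, no delete markers). Writes land in a sorted in-memory buffer, a full buffer is frozen as an immutable sorted file, reads consult the newest data first, and compaction merges all files into one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical toy LSM store: a sorted in-memory "memstore" plus immutable sorted "files". */
class ToyLsmStore {
    private final int flushThreshold;                    // entries per memstore before a flush
    private TreeMap<String, String> memstore = new TreeMap<>();
    private final List<TreeMap<String, String>> files = new ArrayList<>(); // oldest first

    ToyLsmStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String row, String value) {
        memstore.put(row, value);
        if (memstore.size() >= flushThreshold) {
            flush();
        }
    }

    /** Freezes the current memstore as a new immutable, sorted file. */
    void flush() {
        if (memstore.isEmpty()) {
            return;
        }
        files.add(memstore);
        memstore = new TreeMap<>();
    }

    /** Newest data wins: check the memstore, then files from newest to oldest. */
    String get(String row) {
        String value = memstore.get(row);
        if (value != null) {
            return value;
        }
        for (int i = files.size() - 1; i >= 0; i--) {
            value = files.get(i).get(row);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Compaction: merge all files into one, letting newer values overwrite older ones. */
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : files) {     // iterate oldest to newest
            merged.putAll(file);
        }
        files.clear();
        if (!merged.isEmpty()) {
            files.add(merged);
        }
    }

    int fileCount() {
        return files.size();
    }
}

class ToyLsmDemo {
    public static void main(String[] args) {
        ToyLsmStore store = new ToyLsmStore(2);
        store.put("row1", "a");
        store.put("row2", "b");        // second entry triggers flush #1
        store.put("row1", "c");
        store.put("row3", "d");        // flush #2; row1 now exists in two files
        System.out.println(store.fileCount() + " files, row1=" + store.get("row1"));
        store.compact();
        System.out.println(store.fileCount() + " file, row1=" + store.get("row1"));
    }
}
```

Each flush adds one more file that a read may have to consult, which is why flush sizing and compaction policy show up as tuning knobs for the different workloads later in the deck.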
Technology Intro – Apache Oozie
What?
•Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
•Oozie workflow jobs are Directed Acyclic Graphs (DAGs) of actions.
•Oozie workflows can be templated with properties.
•Oozie coordinator jobs are recurring Oozie workflow jobs triggered by time and data availability.
•Oozie is integrated with the rest of the Hadoop stack: it supports several types of Hadoop jobs and security tokens, and provides system-specific jobs out of the box.
•Oozie is a scalable, highly available and extensible system.
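Because a workflow must be a DAG, a scheduler can always find an order in which every action runs after the actions it depends on, and it can reject cyclic definitions up front. The sketch below illustrates that property with Kahn's topological sort over a hypothetical action graph (the action names are made up). It is a generic illustration, not Oozie's internal code; real Oozie workflows express control flow in XML with start, end, kill, decision, fork and join nodes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: orders workflow actions so each runs after everything it depends on. */
class WorkflowDag {
    /**
     * Kahn's algorithm. {@code deps} maps each action to the actions it depends on.
     * Throws IllegalArgumentException if the graph has a cycle, i.e. is not a DAG.
     */
    static List<String> executionOrder(Map<String, List<String>> deps) {
        Map<String, Integer> inDegree = new LinkedHashMap<>();   // unmet dependencies per action
        Map<String, List<String>> dependents = new HashMap<>();  // action -> actions waiting on it
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            inDegree.putIfAbsent(e.getKey(), 0);
            for (String dep : e.getValue()) {
                inDegree.putIfAbsent(dep, 0);
                inDegree.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((action, n) -> {
            if (n == 0) {
                ready.add(action);
            }
        });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String action = ready.poll();
            order.add(action);
            for (String next : dependents.getOrDefault(action, List.of())) {
                if (inDegree.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != inDegree.size()) {
            throw new IllegalArgumentException("workflow graph contains a cycle");
        }
        return order;
    }
}

class WorkflowDagDemo {
    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("snapshot", List.of());
        deps.put("export", List.of("snapshot"));
        deps.put("notify", List.of("export"));
        System.out.println(WorkflowDag.executionOrder(deps));
    }
}
```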
HBase Multitenancy
Why?
Larger capacity reserve:
●Can handle request spikes
●Can lose a rack
●Higher per-machine usage
●Multi-cluster Hadoop support is evolving
Why Not?
●Weaker isolation between tenants
●Single-tenant clusters are easier to understand
HBase Workloads
Write Heavy
●Memstore Heavy
●Compactions Optimized for HFile Size
●Flush Size Tuning
Read Heavy
●Cache Heavy
●Compactions Optimized for Minimal HFiles
●Read Replicas
Mixed Read/Write
●SSDs
●Read Replicas
HBase Contested Resources
Availability:
●Data Bugs
●Thread Death/Starvation
●User Code Bugs
Storage:
●Memstore
●HDFS
●Cache
Input/Output:
●Latency
●Queues
●Ingest
HBase Resources – Availability
●Data Bugs:
●Can isolate tenants with Region Server Groups – HBASE-6721
●Thread Death/Starvation:
●Master becomes a zombie if filesystem object closes – HBASE-17287
●“Region Server Too Busy” – Request Quotas
●Garbage Collection “Bombs” – HBASE-18023 – “Log multi-* requests for more than threshold number of rows”
●User Code Bugs (Coprocessors):
●“Coprocessors - Uses, Abuses, Solutions”, Esther Kundin and Clay Baenziger – HBaseCon East, September 26th, 2016
●Can run only approved coprocessors – HBASE-16700 – “Allow for Coprocessor Whitelisting”
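Request quotas keep one tenant from monopolizing region server handlers by throttling requests per user, table or namespace. A common way to enforce such a limit is a token bucket, and the sketch below shows that generic technique in plain Java. It is not HBase's quota implementation (in HBase the limits are configured rather than coded, for example with the shell's `set_quota` command once quotas are enabled). The class name, capacity and refill rate are illustrative assumptions, and time is passed in explicitly so the behaviour is deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/** Generic token bucket: bursts up to {@code capacity}, refilled at {@code ratePerSecond}. */
class TokenBucket {
    private final double capacity;
    private final double ratePerSecond;
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(double capacity, double ratePerSecond, long nowNanos) {
        this.capacity = capacity;
        this.ratePerSecond = ratePerSecond;
        this.tokens = capacity;            // start with a full bucket
        this.lastRefillNanos = nowNanos;
    }

    /** Returns true if the request may proceed, false if the caller should be throttled. */
    synchronized boolean tryAcquire(long nowNanos) {
        double elapsedSeconds = (nowNanos - lastRefillNanos) / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * ratePerSecond);
        lastRefillNanos = nowNanos;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}

class QuotaDemo {
    public static void main(String[] args) {
        // One bucket per tenant, so a noisy tenant exhausts only its own budget.
        Map<String, TokenBucket> perTenant = new HashMap<>();
        long now = System.nanoTime();
        for (String tenant : new String[] {"tenantA", "tenantA", "tenantA", "tenantB"}) {
            TokenBucket bucket = perTenant.computeIfAbsent(tenant, t -> new TokenBucket(2, 1, now));
            System.out.println(tenant + (bucket.tryAcquire(now) ? " allowed" : " throttled"));
        }
    }
}
```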
HBase Resources – Storage
●Memstore (Flushes)
●Affects HFile Quantity
●Can Block Writes
●Compacting Memstore
●HDFS
●Denial-of-Service – HBASE-16961 – “FileSystem Quotas”
●Cache
●Multiple workloads thrashing the cache
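The "Can Block Writes" point follows from how a region's memstore is bounded: HBase flushes a region's memstore when it reaches the flush size (`hbase.hregion.memstore.flush.size`), and if flushing falls behind, it rejects further updates once the memstore grows past the flush size multiplied by `hbase.hregion.memstore.block.multiplier`. The simulation below models only that rule, with each completed flush producing one more HFile. The sizes, the tick-based flusher and the class names are illustrative assumptions, not HBase defaults or internals.

```java
/** Illustrative model of one region's memstore; sizes are in arbitrary units. */
class RegionMemstoreModel {
    private final long flushSize;      // cf. hbase.hregion.memstore.flush.size
    private final long blockingSize;   // flush size x hbase.hregion.memstore.block.multiplier
    private long memstoreSize;
    private int hfileCount;

    RegionMemstoreModel(long flushSize, int blockMultiplier) {
        this.flushSize = flushSize;
        this.blockingSize = flushSize * blockMultiplier;
    }

    /** Accepts a write unless the memstore is already above the blocking limit. */
    boolean write(long size) {
        if (memstoreSize > blockingSize) {
            return false;              // HBase would reject the update as "region too busy"
        }
        memstoreSize += size;
        return true;
    }

    boolean needsFlush() {
        return memstoreSize >= flushSize;
    }

    /** A completed flush turns the whole memstore into one new HFile. */
    void flush() {
        if (memstoreSize > 0) {
            hfileCount++;
            memstoreSize = 0;
        }
    }

    int hfileCount() { return hfileCount; }
    long memstoreSize() { return memstoreSize; }
}

class MemstoreDemo {
    public static void main(String[] args) {
        RegionMemstoreModel region = new RegionMemstoreModel(10, 2);
        // A flusher that only runs every 4th tick falls behind a steady load of 8 units per tick.
        for (int tick = 1; tick <= 8; tick++) {
            boolean accepted = region.write(8);
            if (tick % 4 == 0 && region.needsFlush()) {
                region.flush();
            }
            System.out.println("tick " + tick + ": write " + (accepted ? "accepted" : "blocked")
                + ", memstore=" + region.memstoreSize() + ", hfiles=" + region.hfileCount());
        }
    }
}
```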
HBase Resources – Input/Output
Latency
●Caching – moving the cache off-heap
●De-Prioritizing Scanners
Queues – Monitoring!
●Replication Queue
●Handler Queue
Ingest
●Splits
●Compactions
●Bulk-Load
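One way to de-prioritize scanners is to let a long-running scan's priority decay each time it comes back for another batch, so short gets and puts are not stuck behind it in the handler queue. The sketch shows that generic idea with a `java.util.PriorityQueue`; the `Call` class, the linear penalty and the other names are illustrative assumptions, not HBase's RPC scheduler, whose call-queue options differ in detail.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** A queued request; {@code batchesServed} counts how many batches a scan has already returned. */
final class Call {
    final String name;
    final boolean isScan;
    final int batchesServed;
    final long arrivalSeq;   // tie-breaker: equal priorities are served first-in, first-out

    Call(String name, boolean isScan, int batchesServed, long arrivalSeq) {
        this.name = name;
        this.isScan = isScan;
        this.batchesServed = batchesServed;
        this.arrivalSeq = arrivalSeq;
    }

    /** Lower runs first; each batch a scan has already returned pushes it further back. */
    long penalty() {
        return isScan ? batchesServed : 0;
    }
}

/** Illustrative handler queue that de-prioritizes long-running scans. */
class HandlerQueue {
    private final PriorityQueue<Call> queue = new PriorityQueue<>(
        Comparator.comparingLong(Call::penalty).thenComparingLong(c -> c.arrivalSeq));
    private long nextSeq = 0;

    void submit(String name, boolean isScan, int batchesServed) {
        queue.add(new Call(name, isScan, batchesServed, nextSeq++));
    }

    /** Returns the name of the next call to run, or null if the queue is empty. */
    String next() {
        Call call = queue.poll();
        return call == null ? null : call.name;
    }

    /** Queue depth: the kind of number worth exporting to monitoring. */
    int depth() {
        return queue.size();
    }
}

class HandlerQueueDemo {
    public static void main(String[] args) {
        HandlerQueue q = new HandlerQueue();
        q.submit("scan-next#50", true, 50);
        q.submit("get", false, 0);
        q.submit("put", false, 0);
        while (q.depth() > 0) {
            System.out.println(q.next());
        }
    }
}
```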
HBase Access
●Kerberos – need a way to get identity:
●LDAP Group Traversal – HADOOP-12291
●Namespaces
●Grants
●Multiple Clusters:
●Spark-HBase Connector (https://github.com/hortonworks-spark/shc, PR #120)
●Oozie Delegation Token Acquisition – OOZIE-1646 – “HBase Table Copy between two HBase servers doesn't work with Kerberos”
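Multitenant access control comes down to resolving who the caller is (Kerberos), which groups they belong to, including groups nested inside other groups (the LDAP traversal HADOOP-12291 is about), and whether any of those principals holds a grant on the table's namespace. The sketch models that decision with in-memory maps. The data structures and class name are hypothetical; only the "@" prefix for group principals mirrors how HBase's `grant` command names groups (for example `grant '@analysts', 'R', '@ns1'`).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of nested groups and namespace grants. */
class NamespaceAccess {
    // user or group -> groups it is a direct member of (users and groups share one map here)
    private final Map<String, Set<String>> parentGroups;
    // namespace -> (principal -> permission letters such as "RW"); groups are written "@group"
    private final Map<String, Map<String, String>> grants;

    NamespaceAccess(Map<String, Set<String>> parentGroups, Map<String, Map<String, String>> grants) {
        this.parentGroups = parentGroups;
        this.grants = grants;
    }

    /** Breadth-first walk up the membership graph; the seen set also guards against cycles. */
    Set<String> allGroups(String user) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>(parentGroups.getOrDefault(user, Set.of()));
        while (!todo.isEmpty()) {
            String group = todo.poll();
            if (seen.add(group)) {
                todo.addAll(parentGroups.getOrDefault(group, Set.of()));
            }
        }
        return seen;
    }

    /** True if the user, or any group they transitively belong to, holds the permission. */
    boolean isAllowed(String user, String namespace, char permission) {
        Map<String, String> nsGrants = grants.getOrDefault(namespace, Map.of());
        if (nsGrants.getOrDefault(user, "").indexOf(permission) >= 0) {
            return true;
        }
        for (String group : allGroups(user)) {
            if (nsGrants.getOrDefault("@" + group, "").indexOf(permission) >= 0) {
                return true;
            }
        }
        return false;
    }
}

class NamespaceAccessDemo {
    public static void main(String[] args) {
        NamespaceAccess acl = new NamespaceAccess(
            Map.of("alice", Set.of("quants"), "quants", Set.of("analysts")),
            Map.of("ns1", Map.of("@analysts", "R")));
        System.out.println("alice may read ns1: " + acl.isAllowed("alice", "ns1", 'R'));
    }
}
```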
Oozie and Self-Service
Self-Service:
“The serving of oneself with goods or services” – Merriam-Webster.com
•Automation – pipelines via workflows; recurrence via coordinators
•Job Status – callback URLs to notify of job progress
•Job IDs – MapReduce or YARN IDs for each action's internal sub-jobs
•Authentication – delegation tokens (no keytabs!)
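For the job-status callbacks, Oozie lets a job set `oozie.wf.workflow.notification.url`; Oozie replaces the `$jobId` and `$status` tokens in that URL before calling it. The sketch below mimics that substitution and shows what a tenant's receiving endpoint might do with the request. The callback host and parameter names are hypothetical, and the URL-encoding is a precaution of this sketch, not a statement about what Oozie itself does.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class OozieCallback {
    /** Illustrative stand-in for the token substitution Oozie performs on the notification URL. */
    static String expand(String template, String jobId, String status) {
        return template
            .replace("$jobId", URLEncoder.encode(jobId, StandardCharsets.UTF_8))
            .replace("$status", URLEncoder.encode(status, StandardCharsets.UTF_8));
    }

    /** What a tenant's callback endpoint might do: read the query parameters of the request URI. */
    static Map<String, String> queryParams(String url) {
        Map<String, String> params = new HashMap<>();
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue;   // ignore parameters without a value
            }
            params.put(URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8),
                       URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8));
        }
        return params;
    }
}

class OozieCallbackDemo {
    public static void main(String[] args) {
        // Hypothetical endpoint owned by the tenant team.
        String template = "http://callbacks.example.com/oozie?id=$jobId&state=$status";
        String url = OozieCallback.expand(template, "0000001-170614000000000-oozie-oozi-W", "SUCCEEDED");
        System.out.println(url + " -> " + OozieCallback.queryParams(url));
    }
}
```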
Oozie Deployments
Separates Continuous Integration from Continuous Deployment
Leverages:
●Git – OOZIE-2877
●Maven – OOZIE-2878
Axiom: write a deployment workflow for your job (and its workflow)
Must follow the principles of:
●Idempotency – a re-run results in the same state as the first run
●Cleanliness – removes old artifacts (state)
●Separation of Configuration from Code – deploy the same workflow to development and production, differing only in workflow properties
●See also the slides for “Cluster Continuous Delivery with Oozie” – ApacheCon North America - Big Data, May 18th, 2017
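The idempotency and cleanliness principles can be checked mechanically: a deploy step should converge the target directory to exactly the desired set of artifacts, whatever state a previous (possibly failed) run left behind. The sketch uses `java.nio.file` against a local, flat directory as a hypothetical stand-in for an application path on HDFS; the class, method and file names are assumptions, and a real deployment workflow would act on HDFS through Oozie actions instead (Java 11+).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Stream;

class IdempotentDeploy {
    /**
     * Converges a flat directory to hold exactly {@code artifacts} (file name -> contents):
     * every desired file is (re)written, then anything else is deleted as stale state.
     * Re-running with the same input leaves the directory in the same state.
     */
    static void deploy(Path target, Map<String, String> artifacts) throws IOException {
        Files.createDirectories(target);
        for (Map.Entry<String, String> artifact : artifacts.entrySet()) {
            Files.writeString(target.resolve(artifact.getKey()), artifact.getValue(), StandardCharsets.UTF_8);
        }
        Set<String> stale = listing(target);
        stale.removeAll(artifacts.keySet());
        for (String name : stale) {
            Files.delete(target.resolve(name));   // cleanliness: drop leftovers of older deployments
        }
    }

    static Set<String> listing(Path dir) throws IOException {
        Set<String> names = new HashSet<>();
        try (Stream<Path> entries = Files.list(dir)) {
            entries.forEach(p -> names.add(p.getFileName().toString()));
        }
        return names;
    }
}

class DeployDemo {
    public static void main(String[] args) throws IOException {
        Path appDir = Files.createTempDirectory("oozie-app");
        Files.writeString(appDir.resolve("old-coordinator.xml"), "<stale/>");
        // Same workflow "code" (placeholder contents); only the properties would differ per environment.
        Map<String, String> dev = Map.of("workflow.xml", "<workflow-app/>", "job.properties", "env=dev");
        IdempotentDeploy.deploy(appDir, dev);
        IdempotentDeploy.deploy(appDir, dev);   // second run changes nothing
        System.out.println(IdempotentDeploy.listing(appDir));
    }
}
```

Deploying to production would call `deploy` with the same `workflow.xml` and a different `job.properties`, which is the configuration/code split the slide asks for.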
HBase and Oozie
Why?
Self-Service – No DBA!
Regularly schedulable
Runs as the project user with no keytab
Logs and reporting
●Can record HBase errors with logs organized by each Oozie job
●Can report success of job with proactive callbacks
●Can verify performance of job with SLA subsystem
●Can provide infrastructure insight into what’s running
HBase Scheduled Compactions
Compactions are I/O and processing heavy, and can:
●Degrade read or write performance
●Lead to split storms or rebalancing
●Good to plan for impact – schedule them
●Can also go region-by-region to lessen impact
●Can poll to know when compaction is complete
●A simple Java action: https://tinyurl.com/oozie-hbase-compaction
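The region-by-region approach with polling can be sketched as: request a major compaction for one region, poll until it no longer reports a compaction in progress, then move to the next, so only one region at a time pays the I/O cost. To keep the sketch runnable without a cluster, `RegionCompactor` is a hypothetical interface standing in for the HBase client (with HBase 2.x the corresponding calls would be along the lines of `Admin.majorCompactRegion(byte[])` and `Admin.getCompactionStateForRegion(byte[])`), and `FakeCompactor` is a test double. This is not the code behind the tinyurl link above; the poll interval is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the HBase admin calls a real Oozie Java action would make. */
interface RegionCompactor {
    void requestMajorCompaction(String region);
    boolean isCompacting(String region);
}

class RollingCompaction {
    /** Compacts one region at a time, polling each until it finishes; returns the regions done. */
    static List<String> compactRegionByRegion(RegionCompactor admin, List<String> regions,
                                              long pollMillis) throws InterruptedException {
        List<String> done = new ArrayList<>();
        for (String region : regions) {
            admin.requestMajorCompaction(region);
            while (admin.isCompacting(region)) {   // poll rather than assume completion
                Thread.sleep(pollMillis);
            }
            done.add(region);
        }
        return done;
    }
}

/** Test double: each region reports "compacting" for a fixed number of polls. */
class FakeCompactor implements RegionCompactor {
    private final Map<String, Integer> pollsRemaining = new HashMap<>();
    private final int pollsPerRegion;
    int maxConcurrent = 0;   // most regions ever compacting at the same time

    FakeCompactor(int pollsPerRegion) {
        this.pollsPerRegion = pollsPerRegion;
    }

    @Override
    public void requestMajorCompaction(String region) {
        pollsRemaining.put(region, pollsPerRegion);
        long active = pollsRemaining.values().stream().filter(n -> n > 0).count();
        maxConcurrent = Math.max(maxConcurrent, (int) active);
    }

    @Override
    public boolean isCompacting(String region) {
        int left = pollsRemaining.getOrDefault(region, 0);
        if (left > 0) {
            pollsRemaining.put(region, left - 1);
        }
        return left > 0;
    }
}

class RollingCompactionDemo {
    public static void main(String[] args) throws InterruptedException {
        FakeCompactor admin = new FakeCompactor(3);
        List<String> done = RollingCompaction.compactRegionByRegion(admin, List.of("region-a", "region-b"), 10);
        System.out.println("compacted " + done + ", max concurrent = " + admin.maxConcurrent);
    }
}
```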
HBase Scheduled Compactions
Key Oozie-isms:
●Java action needs to know how to get configuration:
    // Oozie writes the action's configuration to a local XML file and passes its path as a system property
    conf.addResource(new Path("file:///",
        System.getProperty("oozie.action.conf.xml")));
●Need to pass delegation tokens from Oozie:
    // Oozie provides the job's delegation tokens in a file named by this environment variable
    String tokenFile = System.getenv("HADOOP_TOKEN_FILE_LOCATION");
    if (tokenFile != null) {
        conf.set("mapreduce.job.credentials.binary", tokenFile);
    }
●Need to use Oozie's Credentials Action Authentication
●Must pass in properties manually – no job-xml support – OOZIE-2947
HBase Snapshot Export
Backup Requirements:
●Live backups (cannot disable the table or take HBase offline)
●Self-service (a non-HBase user can back up and restore their own data)
●Automatable procedure (Oozie)
●Works on a secure cluster (hdfs:///hbase is not world-readable)
●Backup location may not be running HBase
●Does not require adding significant architectural “baggage” to HBase
[Diagram: HBase Snapshot Export flow. Participants: Namespace Admin User, HDFS, Oozie Export Snapshot Action, HBase Export Snapshot. Labeled steps: create “dropbox” directory (HDFS); Oozie workflow submitted (as namespace admin); table snapshot initiated; permission check; snapshot exported.]
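The export flow above hinges on one authorization step: the requester must administer the namespace that owns the snapshotted table, and the export may only write into that tenant's "dropbox" directory. A minimal sketch of that check follows, with hypothetical names throughout (`SnapshotExportCheck`, the `namespaceAdmins` map and the `<dropboxRoot>/<namespace>/` layout are illustrative, not the layout used in the talk). Only the `namespace:table` naming and the `default` namespace follow HBase conventions; the copy itself would be done by HBase's `ExportSnapshot` tool, which this sketch does not call.

```java
import java.net.URI;
import java.util.Map;
import java.util.Set;

class SnapshotExportCheck {
    /** HBase table names are "namespace:qualifier"; unqualified tables live in "default". */
    static String namespaceOf(String table) {
        int colon = table.indexOf(':');
        return colon < 0 ? "default" : table.substring(0, colon);
    }

    /**
     * Hypothetical pre-export check: the user must administer the table's namespace, and the
     * destination path (after normalizing "..") must sit under that namespace's dropbox.
     * Only the path is checked; the destination cluster is whatever the URI names.
     */
    static boolean mayExport(String user, String table, String destination,
                             Map<String, Set<String>> namespaceAdmins, String dropboxRoot) {
        String ns = namespaceOf(table);
        if (!namespaceAdmins.getOrDefault(ns, Set.of()).contains(user)) {
            return false;
        }
        String allowedPrefix = dropboxRoot + "/" + ns + "/";
        String normalized = URI.create(destination).normalize().getPath();
        return normalized != null && normalized.startsWith(allowedPrefix);
    }
}

class SnapshotExportDemo {
    public static void main(String[] args) {
        Map<String, Set<String>> admins = Map.of("ns1", Set.of("alice"));
        System.out.println(SnapshotExportCheck.mayExport(
            "alice", "ns1:prices", "hdfs://backup-nn/dropbox/ns1/prices-snap", admins, "/dropbox"));
    }
}
```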
HBase and Oozie
Multitenancy at Bloomberg
Clay Baenziger
Hadoop Infrastructure
https://github.com/bloomberg
hadoop@bloomberg.net
