Columnar databases
Roy Ian
05/02/2020
Row vs. columnar relational databases
‱ All relational databases deal with tables, rows, and columns
‱ But there are sub-types:
‱ row-oriented: they are internally organized around the handling of rows
‱ columnar / column-oriented: these mainly work with columns
‱ Both types usually offer SQL interfaces and produce tables (with rows
and columns) as their result sets
‱ Both types can generally solve the same queries
‱ Both types have specific use cases that they're good for (and use
cases that they're not good for)
Row vs. columnar relational databases
‱ In practice, row-oriented databases are often optimized and
particularly good for OLTP workloads
‱ whereas column-oriented databases are often well-suited for OLAP
workloads
‱ This is due to the different internal designs of row- and column-
oriented databases
Row-oriented storage
‱ In row-oriented databases, row value data is usually stored
contiguously:
Row-oriented storage
‱ When looking at a table's datafile, it could look as follows:
‱ Actual row values are stored at specific offsets within the record struct:
‱ Offsets depend on column types, e.g. 4 bytes for int32, 8 bytes for int64, etc.
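As a hedged illustration (not any specific engine's on-disk format), a minimal Python sketch of a fixed-width row layout, assuming an example schema of (id int32, amount int64, qty int32):

```python
import struct

# Assumed example schema: id int32, amount int64, qty int32
ROW_FORMAT = "<iqi"                      # little-endian: 4 + 8 + 4 bytes
ROW_SIZE = struct.calcsize(ROW_FORMAT)   # 16 bytes per row

# All values of one row are packed contiguously, as on a row store's page
row_bytes = struct.pack(ROW_FORMAT, 42, 19999, 3)

# A column sits at a fixed offset inside the packed row:
# id at offset 0, amount at offset 4, qty at offset 12
(amount,) = struct.unpack_from("<q", row_bytes, offset=4)
```

Note that reading any single column still means fetching the whole 16-byte row from its page first.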
Row-oriented storage
‱ Row-oriented storage is good if we need to touch one row. This
normally requires reading/writing a single page
‱ Row-oriented storage is beneficial if all or most columns of a row
need to be read or written. This can be done with a single read/write.
‱ Row-oriented storage is very inefficient if not all columns are needed
but a lot of rows need to be read:
‱ Full rows are read, including columns not used by a query
‱ Reads are done page-wise. Not many rows may fit on a page when rows are
big
‱ Pages are normally not fully filled, which leads to reading lots of unused areas
‱ Record (and sometimes page) headers need to be read, too, but do not contain actual row data
Column-oriented storage
‱ Column-oriented databases primarily work on columns
‱ All columns are treated individually
‱ Values of a single column are stored contiguously
‱ This allows array-processing the values of a column
‱ Rows may be constructed from column values later if required
‱ This means column stores can still produce row output (tables)
‱ Values from multiple columns need to be retrieved and assembled for that, making the implementation a bit more complex
‱ Query processors in columnar databases work on columns, too
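A minimal Python sketch (illustrative only) of the same idea: each column lives in its own contiguous array, and rows are assembled only on demand:

```python
# Each column is stored as its own contiguous array
ids     = [1, 2, 3, 4]
amounts = [100, 250, 75, 300]
qtys    = [2, 1, 5, 3]

# Array-processing one column touches only that column's data
total_amount = sum(amounts)

# Late tuple construction: a full row is built only when required
row_2 = (ids[1], amounts[1], qtys[1])
```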
Column-oriented storage
‱ Column stores store data in column-specific files
‱ Simplest case: one datafile per column
‱ Row values for each column are stored contiguously
Column-oriented storage
‱ Since data is stored column-wise, a single block can hold many more values than a block in a row-oriented database
‱ Records per block:
‱ Row-oriented: block size / total record size
‱ Column-oriented: block size / column value size
‱ More data per block => fewer block reads => improved I/O efficiency (worked example below)
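A quick worked example of the formulas above, with an assumed 8 KB block, 128-byte record, and 4-byte (int32) column:

```python
block_size = 8192                # assumed 8 KB block

# Row-oriented: whole records per block
record_size = 128                # assumed total record size in bytes
rows_per_block = block_size // record_size          # 64 records

# Column-oriented: values of a single column per block
column_value_size = 4            # e.g. an int32 column
values_per_block = block_size // column_value_size  # 2048 values
```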
Column-oriented storage – Compression
‱ Almost all column stores perform compression
– Compression further reduces the storage footprint of each column
– Column data type tailored compression
‱ RLE (Run-length encoding)
‱ Integer packing
‱ Dictionary and lookup string compression
‱ Other (depends on column store)
‱ Effective compression reduces storage cost
‱ I/O reduction also decreases query response times
- Queries may execute an order of magnitude faster compared to queries over
the same data set on a row store
‱ 10:1 to 30:1 compression ratios may be seen (see the RLE sketch below)
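As a hedged sketch of the first technique in the list, a minimal run-length encoder/decoder in Python; it pays off on sorted or low-cardinality columns with long runs of equal values:

```python
from itertools import groupby

def rle_encode(values):
    """Encode a column as [(value, run_length), ...]."""
    return [(v, len(list(run))) for v, run in groupby(values)]

def rle_decode(pairs):
    return [v for v, n in pairs for _ in range(n)]

country = ["DE"] * 5000 + ["FR"] * 3000 + ["US"] * 2000
encoded = rle_encode(country)  # [('DE', 5000), ('FR', 3000), ('US', 2000)]
assert rle_decode(encoded) == country
```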
Column-oriented storage – Compression
‱ All data within each column datafile has the same type, making it
ideal for compression
‱ Usually a much better compression factor can be achieved for single
columns than for entire rows
‱ Compression allows reducing disk I/O when reading/writing column
data but has some CPU cost
‱ For data sets bigger than memory, compression is often
beneficial because disk access is slower than decompression
Column-oriented storage – Compression
‱ A good use case for compression in column stores is dictionary
compression for variable length string values
‱ Each unique string is assigned an integer number
‱ The dictionary, consisting of integer number and string value, is saved
as column meta data
‱ Column values are then integers only, making them small and fixed
width
‱ This can save much space if string values are non-unique
‱ With dictionaries sorted by column value, this will also allow range
queries
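A minimal Python sketch of dictionary compression as described above; the dictionary is sorted by value, so the integer codes preserve string order and a range predicate can be evaluated on the codes alone:

```python
import bisect

strings = ["berlin", "paris", "berlin", "tokyo", "paris", "berlin"]

dictionary = sorted(set(strings))          # ['berlin', 'paris', 'tokyo']
code_of = {s: i for i, s in enumerate(dictionary)}
column = [code_of[s] for s in strings]     # small, fixed-width integers

# Range query city >= 'p', answered on integer codes without decoding:
lo = bisect.bisect_left(dictionary, "p")
matching_rows = [i for i, c in enumerate(column) if c >= lo]
```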
Column-oriented storage – IO saving
‱ Column stores can greatly improve the performance of queries that only
touch a small number of columns
‱ This is because they will only access these columns’ content
‱ Simple math: table t has a total of 10 GB data, with
‱ column a: 4 GB
‱ column b: 2 GB
‱ column c: 3 GB
‱ column d: 1 GB
‱ If a query only uses column d, at most 1 GB of data will be processed by a
column store
‱ Could read even less with compression
‱ In a row store, the full 10 GB will be processed
Column-oriented storage – segments
‱ Column data in column stores is often grouped into segments/packets of a
specific size (e.g. 64 K values)
‱ Meta data is calculated and stored separately per segment, e.g.:
‱ Min value in segment
‱ Max value in segment
‱ Number of NOT NULL values in segment
‱ Histograms
‱ Compression meta data
‱ Segment meta data can be checked during query processing when no indexes are available
‱ Segment meta data may provide information about whether the segment can be
skipped entirely, allowing to reduce the number of values that need to be
processed in the query
‱ Calculating segment meta data is a relatively cheap operation (it only needs to
traverse the column values in the segment) but should still occur infrequently
‱ In a read-only or read-mostly workload, this is tolerable (see the sketch below)
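A hedged Python sketch of per-segment min/max meta data ("zone maps") and segment skipping; the tiny segment size is for demonstration only:

```python
SEGMENT_SIZE = 4                 # toy value; real stores use e.g. 64 K

def build_segments(column):
    segments = []
    for start in range(0, len(column), SEGMENT_SIZE):
        chunk = column[start:start + SEGMENT_SIZE]
        segments.append({"min": min(chunk), "max": max(chunk),
                         "values": chunk})
    return segments

def scan_greater_than(segments, threshold):
    result = []
    for seg in segments:
        if seg["max"] <= threshold:   # meta data: nothing here can match
            continue                  # skip the whole segment
        result.extend(v for v in seg["values"] if v > threshold)
    return result

segs = build_segments([1, 2, 3, 4, 10, 11, 12, 13, 20, 21, 22, 23])
print(scan_greater_than(segs, 15))    # only the last segment is scanned
```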
Column-oriented storage – processing
‱ Column values are not processed row-at-a-time, but block-at-a-time
‱ This reduces the number of function calls (function call per block of
values, but not per row)
‱ Operating in blocks allows compiler optimizations, e.g. loop unrolling,
parallelization, pipelining
‱ Column values are normally positioned in contiguous memory
locations, also allowing SIMD operations (vectorization)
‱ Working on many subsequent memory positions also improves cache
usage (multiple values are in the same cache line) and reduces
pipeline stalls
‱ All these make column stores ideal for batch processing
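A small illustration of block-at-a-time processing using NumPy (an assumption for this sketch, not something a given column store necessarily uses): one vectorized call processes a whole block of contiguous column values instead of a per-row interpreter loop:

```python
import numpy as np

prices = np.random.rand(1_000_000)              # one contiguous column
qty = np.random.randint(1, 10, size=1_000_000)  # another column

# Row-at-a-time would be: [p * q for p, q in zip(prices, qty)]
# Block-at-a-time: a single vectorized expression over whole arrays,
# which the runtime can unroll, pipeline, and execute with SIMD
revenue = prices * qty
total = revenue.sum()
```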
Column-oriented storage – processing
‱ Reading all columns of a row is an expensive operation in a column
store, so full row tuple construction is avoided or delayed as much as
possible internally
‱ Updating/deleting or inserting rows may also be very expensive and
may cost much more time than in a row store
‱ Some column stores are hybrids, with read-optimized (column)
storage and write-optimized OLTP storage
‱ Still, column stores are not really made for OLTP workloads, and if you
need to work with many columns at once, you'll pay a price in a
column store
OLTP
‱ Transactional processing
‱ Retrieve or modify individual records (mostly few records)
‱ Use indexes to quickly find relevant records
‱ Queries often triggered by end user actions and should complete
instantly
‱ ACID properties may be important
‱ Mixed read/write workload; the working set should fit in RAM
OLAP
‱ Analytical processing / reporting
‱ Derive new information from existing data (aggregates,
transformations, calculations)
‱ Queries often run on many records or the complete data set; the data set
may easily exceed the size of RAM
‱ Mainly read or even read-only workload
‱ ACID properties often not important, data can often be regenerated
‱ Queries often run interactively
‱ Common: it is not known in advance which aspects are interesting, so pre-
indexing "relevant" columns is difficult
Big Data Storage: HBase
Big-data landscape
History of HBase
HBase definition
‱ HBase: a sparse, consistent, distributed, multi-dimensional, sorted map
HBase vs. RDBMS
HBase | RDBMS
Column-oriented | Row-oriented (mostly)
Flexible schema, add columns on the fly | Fixed schema
Good with sparse tables | Not optimized for sparse tables
Not optimized for joins (still possible with MapReduce) | Optimized for joins
Tight integration with MapReduce | Not really
Horizontal scalability: just add hardware | Hard to shard and scale
Good for semi-structured as well as structured data | Good for structured data
When to use HBase?
‱ Unstructured data
‱ High scalability requirements
‱ Versioned data
‱ Generating data from MapReduce workflows
‱ Column-oriented data
‱ High volumes of data to be stored
When not to use HBase?
‱ When you have only a few thousand/million rows
‱ When you need the RDBMS commands that HBase lacks
‱ When you have fewer than 5 data nodes and a replication factor of 3
Note: HBase can run quite well in stand-alone mode on a laptop, but
this should be considered a development configuration only
Uses of HBase
Column family
‱ In the HBase data model columns are grouped into column families
‱ Column families must be defined up front during table creation
‱ Column families are stored together on disk, which is why HBase is referred to
as a column-oriented data store
Data model
Column families: Personal_data (Name, Address), Demographic (Birth Date, Gender); row key: Personal_ID
Personal_ID | Name | Address | Birth Date | Gender
1 | H. Houdini | Budapest | 1926-10-31 | M
2 | D. Copper | - | 1956-09-16 | -
3 | - | - | - | M
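A hedged sketch of this data model using the third-party happybase Python client (it assumes an HBase Thrift server on localhost; the table and column names just mirror the example above):

```python
import happybase

conn = happybase.Connection("localhost")
conn.create_table("person", {"personal_data": {}, "demographic": {}})

table = conn.table("person")
table.put(b"1", {b"personal_data:name":     b"H. Houdini",
                 b"personal_data:address":  b"Budapest",
                 b"demographic:birth_date": b"1926-10-31",
                 b"demographic:gender":     b"M"})

# Sparse rows simply omit cells: nothing at all is stored for the
# missing address and gender of row 2
table.put(b"2", {b"personal_data:name":     b"D. Copper",
                 b"demographic:birth_date": b"1956-09-16"})

print(table.row(b"1"))   # {b'personal_data:name': b'H. Houdini', ...}
```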
HBase architecture – overview
Data distribution
Architecture – HMaster
‱ HBase HMaster is a lightweight process that assigns regions to region servers in
the Hadoop cluster for load balancing
‱ Manages and monitors the Hadoop cluster
‱ Performs administration (the interface for creating, updating, and deleting tables)
‱ Controls failover
‱ DDL operations are handled by the HMaster
‱ Whenever a client wants to change the schema or any of the metadata,
the HMaster is responsible for these operations
Architecture – Region server
‱ These are the worker nodes, which handle read, write, update, and delete
requests from clients
‱ The Region Server process runs on every node in the Hadoop cluster
‱ Block Cache – the read cache. The most frequently read data is stored in the
read cache, and whenever the block cache is full, the least recently used data is evicted
‱ MemStore – the write cache; it stores new data that has not yet been written to
disk. Every column family in a region has its own MemStore
‱ Write-Ahead Log (WAL) – a file that stores new data that has not yet been persisted to
permanent storage
‱ HFile – the actual storage file that stores the rows as sorted key-values on disk
HBase write path
Data distribution
HBase read path
HBase compaction
HBase storage architecture
CAP theorem in HBase
‱ HBase supports consistency and partition tolerance
‱ It compromises on the availability factor
‱ Partition tolerance
‱ HBase runs on top of the Hadoop distribution
‱ All HBase data is stored in HDFS
‱ Hadoop is designed for fault tolerance; therefore, HBase inherits its
partition tolerance
CAP theorem in HBase (contd.)
‱ Consistency
‱ Access to row data is atomic and covers any number of columns being read
or written
‱ This atomic access is one reason the architecture is strictly consistent, as
each concurrent reader and writer can make safe assumptions about the state
of a row
‱ When data is updated, it is first written to a commit log, called a write-ahead
log (WAL) in HBase
‱ It is then stored in the in-memory memstore, sorted by row key
‱ Once the data in memory exceeds a given maximum size, it is flushed
to disk as an HFile
‱ After the flush, the commit logs can be discarded up to the last unflushed
modification (sketch below)
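A drastically simplified, hedged Python sketch of the write path just described: append to the WAL first, then to the sorted in-memory memstore, and flush to an immutable file once a threshold is exceeded:

```python
FLUSH_THRESHOLD = 3     # toy value; real memstores flush at ~128 MB

wal = []                # stand-in for the commit log (WAL)
memstore = {}           # in-memory write cache, sorted by key on flush
hfiles = []             # immutable sorted files on disk

def put(row_key, value):
    wal.append((row_key, value))           # 1. durability first
    memstore[row_key] = value              # 2. then the write cache
    if len(memstore) >= FLUSH_THRESHOLD:   # 3. flush when full
        hfiles.append(sorted(memstore.items()))
        memstore.clear()
        wal.clear()     # log entries up to the flush can be discarded

for k, v in [(b"r2", b"a"), (b"r1", b"b"), (b"r3", b"c")]:
    put(k, v)
print(hfiles)           # [[(b'r1', b'b'), (b'r2', b'a'), (b'r3', b'c')]]
```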
CAP theorem in HBase (contd.)
‱ Availability
‱ HBase compromises on the availability factor
‱ However, Cloudera Enterprise 5.9.x and Hortonworks Data Platform 2.2
implement a high-availability feature in HBase
‱ They provide a feature called region replication to achieve high availability for
reads