© 2014 Center for Social Media Cloud Computing
Contents
I. HBase
II. Hive
III. Hive+HBase Motivation
IV. Integration
V. StorageHandler
VI. Schema/Type Mapping
VII. Data Flows
VIII. Use Cases
HBase
• Apache HBase in a few words:
“HBase is an open-source, distributed, column-oriented, versioned NoSQL database modeled after Google's Bigtable”
• Used for:
– Powering websites/products, such as StumbleUpon and Facebook’s Messages
– Storing data that’s used as a sink or a source for analytical jobs (usually MapReduce)
• Main features:
– Horizontal scalability
– Machine failure tolerance
– Row-level atomic operations, including compare-and-swap and counter increments
– Augmented key-value schemas: the user can group columns into families, which are configured independently
– Multiple clients, such as the native Java library, Thrift, and REST
Apache HBase Architecture
Hive
• Apache Hive in a few words:
“A data warehouse infrastructure built on top of Apache Hadoop”
• Used for:
– Ad-hoc querying and analysis of large data sets without having to learn MapReduce
• Main features:
– SQL-like query language called HiveQL
– Built-in user-defined functions (UDFs) for manipulating dates and strings, plus other data-mining tools
– Plug-in capabilities for custom mappers, reducers, and UDFs
– Support for different storage types, such as plain text, RCFiles, HBase, and others
– Multiple clients, such as a shell, JDBC, and Thrift
Apache Hive Architecture
Hive+HBase Motivation
• Hive and HBase have different characteristics:
– Hive: high latency, structured data, used by analysts
– HBase: low latency, unstructured data, used by programmers
• Hive data warehouses on Hadoop are high latency:
– Long ETL times
– No access to real-time data
• Analyzing HBase data with MapReduce requires custom coding
• Hive and SQL are already known by many analysts
Integration
• Reasons to use Hive on HBase:
– A lot of data sits in HBase because of its use in a real-time environment, but is never used for analysis
– It gives people who don’t code (business analysts) access to HBase data that is usually only queried through MapReduce
– It provides a more flexible storage solution: rows can be updated live by either a Hive job or an application, and the change is immediately visible to the other
• Reasons not to do it:
– Running SQL queries on HBase to answer live user requests (each query is still an MR job)
– Hoping to see interoperability with other SQL analytics systems
Integration
• How it works:
– Hive can use tables that already exist in HBase or manage its own, but they all still reside in the same HBase instance
[Diagram: Hive table definitions pointing into HBase; one definition points to some columns of an HBase table, another points to other columns under different names]
Integration
• How it works:
– Columns are mapped however you want, changing names and giving types
Hive table definition          HBase table
name STRING                    d:fullname
age INT                        d:age
(not mapped)                   d:address
siblings MAP<string,string>    f:
Integration
• Drawbacks (that can be fixed with brain juice):
– Binary keys and values (like integers represented on 4 bytes) aren’t supported, since Hive prefers string representations (HIVE-1634)
– Compound row keys aren’t supported: there’s no way of using multiple parts of a key as different “fields”
– This means that concatenated binary row keys, which are what people often use with HBase, are completely unusable
– Filters are applied at the Hive level instead of being pushed down to the region servers
– Partitions aren’t supported
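The binary-key drawback is easier to see with concrete bytes. The following is a minimal Python sketch, not Hive or HBase code: both helper functions are hypothetical names, and the 4-byte big-endian layout is the one HBase's Bytes.toBytes(int) produces. HBase orders row keys by their raw bytes, so the encoding decides the scan order.

```python
import struct

def as_string_bytes(n: int) -> bytes:
    """Encode an integer as its decimal text, as a string-based mapping would."""
    return str(n).encode("utf-8")

def as_binary_bytes(n: int) -> bytes:
    """Encode an integer as 4 big-endian bytes (the layout of a Java int)."""
    return struct.pack(">i", n)

values = [9, 10, 100]

# Sorting by the encoded bytes mimics the byte-wise ordering of row keys.
string_order = sorted(values, key=as_string_bytes)
binary_order = sorted(values, key=as_binary_bytes)

print(string_order)  # [10, 100, 9]
print(binary_order)  # [9, 10, 100]
```

With decimal strings, 9 sorts after 100, which is one reason HBase users often prefer fixed-width binary keys. In this sketch the binary order matches the numeric order only for non-negative values, because negative integers set the leading sign bit.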
Apache Hive+HBase Architecture
Example: Hive+HBase (HBase table)
hbase(main):001:0> create 'short_urls', {NAME => 'u'}, {NAME => 's'}
hbase(main):014:0> scan 'short_urls'
ROW            COLUMN+CELL
bit.ly/aaaa    column=s:hits, value=100
bit.ly/aaaa    column=u:url, value=hbase.apache.org/
bit.ly/abcd    column=s:hits, value=123
bit.ly/abcd    column=u:url, value=example.com/foo
Example: Hive+HBase (Hive table)
CREATE TABLE short_urls(
  short_url string,
  url string,
  hit_count int
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
-- Mapping entries are listed without whitespace; Hive may treat a space as part of the column name
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,u:url,s:hits")
TBLPROPERTIES ("hbase.table.name" = "short_urls");
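The mapping string is positional: its n-th entry belongs to the n-th Hive column. The Python sketch below models that pairing for one row of the short_urls scan output. It is an illustration only, not Hive's parser; the function to_hive_row is a hypothetical name, and values are left as the stored strings instead of being converted to the declared Hive types.

```python
def to_hive_row(row_key, cells, hive_columns, mapping):
    """Build one Hive row from one HBase row using a positional column mapping."""
    entries = mapping.split(",")
    if len(entries) != len(hive_columns):
        raise ValueError("the mapping needs exactly one entry per Hive column")
    row = {}
    for column, entry in zip(hive_columns, entries):
        # ":key" selects the HBase row key; any other entry names a family:qualifier cell.
        row[column] = row_key if entry == ":key" else cells.get(entry)
    return row

# Cells of the HBase row 'bit.ly/aaaa', as shown by the scan of short_urls.
cells = {"s:hits": "100", "u:url": "hbase.apache.org/"}

hive_row = to_hive_row(
    "bit.ly/aaaa",
    cells,
    ["short_url", "url", "hit_count"],
    ":key,u:url,s:hits",
)
print(hive_row)
```

A cell that is missing from the HBase row comes back as None in this model, which mirrors the fact that HBase rows are sparse.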
Storage Handler
• Hive defines the HiveStorageHandler interface for plugging in different storage backends: HBase, Cassandra, MongoDB, etc.
• A storage handler has hooks for:
– Getting input/output formats
– Metadata operations: CREATE TABLE, DROP TABLE, etc.
• A storage handler is a table-level concept
– It does not support Hive partitions or buckets
Schema Mapping
• Hive table + columns + column types <=> HBase table + column families (+ column qualifiers)
• Every field in the Hive table is mapped, in order, to one of:
– The table key (using :key as selector)
– A column family (cf:), which becomes a MAP field in Hive
– A column (cf:cq)
• The Hive table does not need to include all columns in HBase
CREATE TABLE short_urls(
  short_url string,
  url string,
  hit_count int,
  props map<string,string>
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,u:url,s:hits,p:");
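Mapping a whole column family to a Hive MAP can be pictured as collecting every cell of that family into a dictionary keyed by column qualifier. The Python sketch below shows the idea under simple assumptions: family_as_map is a hypothetical helper, and the qualifiers lang and owner in family p are invented for the illustration.

```python
def family_as_map(cells, family):
    """Collect all cells of one column family into a qualifier -> value map."""
    prefix = family + ":"
    return {
        column[len(prefix):]: value
        for column, value in cells.items()
        if column.startswith(prefix)
    }

# One HBase row; the two p: cells are hypothetical example data.
cells = {
    "u:url": "example.com/foo",
    "s:hits": "123",
    "p:lang": "en",
    "p:owner": "alice",
}

props = family_as_map(cells, "p")
print(props)  # {'lang': 'en', 'owner': 'alice'}
```

Because the map is built per row, two rows can expose different keys in props without any schema change on the Hive side.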
Type Mapping
• Recently added to Hive (0.9.0)
• Previously, all types were converted to strings in HBase
• Hive has:
– Primitive types: INT, STRING, BINARY, DATE, etc.
– ARRAY<Type>
– MAP<PrimitiveType, Type>
– STRUCT<a:INT, b:STRING, c:STRING>
• HBase does not have types: every value is a byte array, typically produced with Bytes.toBytes()
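Since HBase stores only byte arrays, the reader has to know which type was written. The Python sketch below imitates the big-endian layouts that Bytes.toBytes() uses for int and long values; the helper names are hypothetical and the sketch does not call HBase itself.

```python
import struct

def int_to_bytes(n: int) -> bytes:
    """4 big-endian bytes, like Bytes.toBytes(int)."""
    return struct.pack(">i", n)

def long_to_bytes(n: int) -> bytes:
    """8 big-endian bytes, like Bytes.toBytes(long)."""
    return struct.pack(">q", n)

def bytes_to_int(raw: bytes) -> int:
    """Decode 4 big-endian bytes back into an integer."""
    return struct.unpack(">i", raw)[0]

binary_hits = int_to_bytes(100)        # binary storage
string_hits = "100".encode("utf-8")    # string storage

print(binary_hits)                # b'\x00\x00\x00d'
print(string_hits)                # b'100'
print(bytes_to_int(binary_hits))  # 100
```

The same value 100 takes 4 bytes in binary storage and 3 different bytes in string storage, so a table read with the wrong storage type returns garbage or fails to decode.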
Type Mapping
• Table-level property:
"hbase.table.default.storage.type" = "binary"
• The storage type can also be given per column, after #:
– Any prefix of “binary”, e.g. u:url#b
– Any prefix of “string”, e.g. u:url#s
– The dash character “-”, e.g. u:url#-, which falls back to the table-level default
CREATE TABLE short_urls(
  short_url string,
  url string,
  hit_count int,
  props map<string,string>
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key#b,u:url#b,s:hits#b,p:#s");
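The per-column suffix rule can be sketched in a few lines of Python. This is a model of the rule as the slide states it, not Hive's actual parser: parse_storage_type is a hypothetical name, the sketch assumes that a missing suffix or a dash means "use the table default", and it handles only the single-type forms shown on the slide.

```python
def parse_storage_type(entry, table_default="string"):
    """Split a mapping entry into (column, storage type)."""
    column, _, suffix = entry.partition("#")
    # Assumption: no suffix, or "-", means the table-level default applies.
    if suffix in ("", "-"):
        return column, table_default
    # Any prefix of "binary" or "string" is accepted, e.g. "b" or "bin".
    for name in ("binary", "string"):
        if name.startswith(suffix):
            return column, name
    raise ValueError(f"unknown storage type: {suffix!r}")

mapping = ":key#b,u:url#b,s:hits#b,p:#s"
parsed = [parse_storage_type(entry) for entry in mapping.split(",")]
for column, storage in parsed:
    print(column, storage)
```

Calling parse_storage_type("u:url#-", table_default="binary") returns ("u:url", "binary") in this model, which is how the table-level property and the per-column suffix combine.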
Type Mapping
• If the type is not a primitive or a MAP, it is converted to a JSON string and serialized
• Still a few rough edges for schema and type mapping:
– No Hive BINARY support in the HBase mapping
– No mapping of the HBase timestamp (can only provide the put timestamp)
– No arbitrary mapping of STRUCTs/ARRAYs into the HBase schema
Data Flows
• Data is being generated all over the place:
– Apache logs
– Application logs
– MySQL clusters
– HBase clusters
Data Flows
• Moving application log files
Data Flows
• Moving MySQL data
Data Flows
• Moving HBase data
Use Cases
• Front-end engineers
– They need some statistics regarding their latest product
• Research engineers
– Ad-hoc queries on user data to validate some assumptions
– Generating statistics about recommendation quality
• Business analysts
– Statistics on growth and activity
– Effectiveness of advertiser campaigns
– Users’ behavior vs. past activities to determine, for example, why certain groups react better to email communications
– Ad-hoc queries on stumbling behaviors of slices of the user base
Use Cases
• Using a simple table in HBase
CREATE EXTERNAL TABLE blocked_users(
  userid INT,
  blockee INT,
  blocker INT,
  created BIGINT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f:blockee,f:blocker,f:created")
TBLPROPERTIES ("hbase.table.name" = "m2h_repl-userdb.stumble.blocked_users");
The HBase row key is a special case here: every row has a unique key, which is mapped with :key
Not all the columns in the HBase table need to be mapped
Use Cases
• Using a complicated table in HBase
CREATE EXTERNAL TABLE ratings_hbase(
  userid INT,
  created BIGINT,
  urlid INT,
  rating INT,
  topic INT,
  modified BIGINT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key#b@0,:key#b@1,:key#b@2,default:rating#b,default:topic#b,default:modified#b")
TBLPROPERTIES ("hbase.table.name" = "ratings_by_userid");
#b means binary; @ gives the position within the composite key (a StumbleUpon-specific hack)
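To make the composite key concrete, the Python sketch below packs and unpacks a row key made of the three key fields of ratings_hbase. The byte layout is an assumption, since the slide gives only the types and positions: a 4-byte userid, an 8-byte created timestamp, and a 4-byte urlid, concatenated big-endian with no separators. The helper names and sample values are hypothetical.

```python
import struct

# Assumed layout: INT userid (@0), BIGINT created (@1), INT urlid (@2), big-endian.
KEY_LAYOUT = struct.Struct(">iqi")

def encode_key(userid: int, created: int, urlid: int) -> bytes:
    """Concatenate the three key parts into one binary row key."""
    return KEY_LAYOUT.pack(userid, created, urlid)

def decode_key(row_key: bytes):
    """Split a binary row key back into (userid, created, urlid)."""
    return KEY_LAYOUT.unpack(row_key)

row_key = encode_key(42, 1325376000, 7)
print(len(row_key))         # 16
print(decode_key(row_key))  # (42, 1325376000, 7)
```

Putting userid first means all ratings of one user sit next to each other in the table, so a scan over a key prefix retrieves them together; that is the usual motivation for concatenated keys of this kind.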
Wrapping up
• Hive is a good complement to HBase: it adds ad-hoc querying capabilities without having to write a new MR job each time.
(All you need to know is SQL)
• Even though it enables relational queries, it is not meant for live systems.
(Not a MySQL replacement)
• The Hive/HBase integration is functional but still lacks some features before it can be called ready.
(Unless you want to get your hands dirty)
Thank you


Hive and HBase integration


Editor's Notes

  • #5: Apache ZooKeeper is an open-source server that enables highly reliable distributed coordination.