Apache Hive and commands PPT Presentation

Apache Hive
Presentation by,
Gyathri,
Dibakaran,
Dhanush,
Deveraj

Hive
• Data warehousing package built on top of
hadoop.
• Used for data analysis on structured data.
• Targeted towards users comfortable with SQL.
• It is similar to SQL and called HiveQL.
• Abstracts complexity of hadoop.
• No Java is required.
• Developed by Facebook.

Features of Hive
How is it Different from SQL
•The major difference is that a Hive query
executes on a Hadoop infrastructure rather than
a traditional database.
•This allows Hive to handle huge data sets - data
sets so large that high-end, expensive, traditional
databases would fail.
•The internal execution of a Hive query is via a
series of automatically generated Map Reduce
jobs

Hive Modes
Tostart the hive shell, type hive and Enter.
• Hive in Local mode
No HDFS is required, All files run on local file
system.
hive> SET mapred.job.tracker=local
• Hive in MapReduce(hadoop) mode
hive> SET mapred.job.tracker=master:9001;

Components
• Thrift Client
It is possible to interact with hive by using any
programming language that usages Thrift server. For e.g.
Python
Ruby
• JDBC Driver
Hive provides a pure java JDBC driver for java application
to connect to hive , defined in the class
org.hadoop.hive.jdbc.HiveDriver
• ODBC Driver
An ODBC driver allows application that supports ODBC
protocol

Hive Program Structure
• The Hive Shell
 The shell is the primary way that we will interact with Hive, by issuing
commands in HiveQL.
 HiveQL is heavily influenced by MySQL, so if you are familiar with
MySQL, you should feel at home using Hive.
 The command must be terminated with a semicolon to tell Hive to
execute it.
 HiveQL is generally case insensitive.
 The Tab key will autocomplete Hive keywords and functions.
• Hive can run in non-interactive mode.
 Use -f option to run the commands in the specified file,
 hive -f script.hql
 For short scripts, you can use the -e option to specify the commands
inline, in which case the final semicolon is not required.
 hive -e 'SELECT * FROM dummy'

Hive Tables
A Hive table is logically made up of the data being stored in HDFS and the
associated metadata describing the layout of the data in the MySQL table.
• Managed Table
 When you create a table in Hive and load data into a managed table, it is moved into
Hive’s warehouse directory.
 CREATE TABLE managed_table (dummy STRING);
 LOAD DATAINPATH '/user/tom/data.txt' INTO table managed_table;
• External Table
 Alternatively, you may create an external table, which tells Hive to refer to the data that
is at an existing location outside the warehouse directory.
 The location of the external data is specified at table creation time:
 CREATE EXTERNAL TABLE external_table (dummy STRING)
 LOCATION '/user/tom/external_table';
 LOAD DATAINPATH '/user/tom/data.txt' INTO TABLE external_table;
• When you drop an external table, Hive will leave the data untouched and
only delete the metadata.
• Hive does not do any transformation while loading data into tables. Load
operations are currently pure copy/move operations that move data files
into locations corresponding to Hive tables.

Thank You
• Question?
• Feedback?

Apache Hive and commands PPT Presentation

More Related Content

Similar to Apache Hive and commands PPT Presentation (20)

Recently uploaded (20)

Apache Hive and commands PPT Presentation