1 ©	Hortonworks	Inc.	2011	– 2016.	All	Rights	Reserved
Apache	Hive	2
~Interactive	SQL	for	Big	Data~
Yifeng	Jiang
Solutions	Engineering	Lead
August	5,	2017
About	Me
• 蒋燚峰 (Yifeng Jiang)
• Solutions Engineering Lead, Hortonworks
– Hadooper since 2009
– HBase book author
– Software engineer, cloud, PaaS, DevOps
• Jogger, hiker
• Twitter: @uprush
Hortonworks Delivers Connected Data Platforms
[Diagram: the Internet of Anything feeds Hortonworks DataFlow (data in motion, perishable insights) and Hortonworks Data Platform (data at rest, historical insights); together they deliver actionable intelligence for modern data applications.]
Hortonworks	Data	Platform	(HDP)
Powered	by	Apache	Hadoop,	Spark
Hive:	SQL	for	Big	Data
Hive’s	Unique	Advantages
Why	Hive:
• The	data	warehouse	on	Hadoop
• Per-User	dynamic	row	and	column	security.
• Replication	and	DR	for	critical	workloads.
• Compatible	with	every	major	BI	Tool.
• Proven	at	300+	PB	Scale.
• Significant	innovation	in	Hive	2
What's	new	in	HDP	2.6	for	Hive	2	and	Druid
HDP	2.6	Continues	Strong	Momentum	for	Hive
• At a High Level:
– 1200+ features, improvements and bug fixes in Hive since HDP 2.5.
– 400+ of these from outside of Hortonworks.
• Major Improvements:
– Hive LLAP Now GA
– ACID MERGE
– SQL: All 99 TPC-DS queries run out-of-the-box with only trivial rewrites
– Tech Preview: Hive OLAP Indexes powered by Druid
[Chart: Hive 2 in HDP 2.6 Improvements. 820 from Hortonworks, 413 from Community. Callouts: Hive LLAP GA, SQL MERGE, All TPC-DS Queries.]
Hive	LLAP	– MPP	Performance	at	Hadoop	Scale
[Architecture diagram: SQL queries arrive over ODBC / JDBC at HiveServer2 (query endpoint). Query coordinators work with LLAP daemons in a YARN cluster; each LLAP daemon contains query executors and an in-memory cache shared across all users. Deep storage: HDFS and compatible stores (S3, WASB, Isilon).]
Enable	Hive	LLAP	in	Ambari
Enabling LLAP is one click in Ambari.
Hive	LLAP	in	HDP	2.6:	Stable	Performance	with	High	Concurrency
Concurrent Queries    Average Runtime
5                     7.76s
25                    36.24s
100                   102.89s

5 to 25 concurrent queries: 5x queries, 4.6x runtime difference
25 to 100 concurrent queries: 4x queries, 2.8x runtime difference
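The scaling factors on this slide can be rechecked from the three measurements. The following is a small sketch, not part of the original deck; the function and variable names are made up for illustration, and only the query counts and runtimes come from the slide.

```python
# Measurements copied from the slide: (concurrent queries, average runtime in seconds).
measurements = [(5, 7.76), (25, 36.24), (100, 102.89)]


def scaling_steps(points):
    """Compare each load level with the previous one."""
    steps = []
    for (q0, t0), (q1, t1) in zip(points, points[1:]):
        steps.append({
            "from": q0,
            "to": q1,
            "query_factor": q1 / q0,      # how much more load was applied
            "runtime_factor": t1 / t0,    # how much slower the average query got
        })
    return steps


steps = scaling_steps(measurements)
for step in steps:
    print(f"{step['from']} -> {step['to']} queries: "
          f"{step['query_factor']:.0f}x queries, "
          f"{step['runtime_factor']:.2f}x runtime")
```

In both steps the runtime factor stays below the query factor, which is the sense in which the slide calls performance stable under high concurrency.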
Hive	LLAP	RAM	&	SSD	Cache
Highlights
• Use the combination of DRAM and SSD to dynamically cache data.
• Cache 4x more data than using DRAM alone.
• Deliver fast analytics on larger datasets with higher concurrency.
• Especially good for cloud environments.
[Diagram: DRAM cache layered over SSD cache, layered over deep storage]
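The idea of layering a DRAM cache over an SSD cache over deep storage can be illustrated with a toy two-tier cache. This is only a sketch under simplifying assumptions: it uses plain LRU eviction, the class name, slot counts and `read_from_storage` callback are hypothetical, and it does not claim to reflect LLAP's actual cache policy or API.

```python
from collections import OrderedDict


class TieredCache:
    """Toy two-tier LRU cache: a small fast tier backed by a larger slow tier."""

    def __init__(self, dram_slots, ssd_slots, read_from_storage):
        self.dram = OrderedDict()   # hottest entries
        self.ssd = OrderedDict()    # entries demoted from the DRAM tier
        self.dram_slots = dram_slots
        self.ssd_slots = ssd_slots
        self.read_from_storage = read_from_storage  # hypothetical deep-storage reader
        self.stats = {"dram": 0, "ssd": 0, "storage": 0}

    def get(self, key):
        if key in self.dram:
            self.dram.move_to_end(key)          # mark as most recently used
            self.stats["dram"] += 1
            return self.dram[key]
        if key in self.ssd:
            value = self.ssd.pop(key)           # promote back to DRAM
            self.stats["ssd"] += 1
        else:
            value = self.read_from_storage(key)  # slowest path
            self.stats["storage"] += 1
        self._put_dram(key, value)
        return value

    def _put_dram(self, key, value):
        self.dram[key] = value
        if len(self.dram) > self.dram_slots:
            old_key, old_value = self.dram.popitem(last=False)
            self.ssd[old_key] = old_value       # demote instead of dropping
            if len(self.ssd) > self.ssd_slots:
                self.ssd.popitem(last=False)    # finally evict from the SSD tier


cache = TieredCache(dram_slots=2, ssd_slots=6,
                    read_from_storage=lambda key: f"block-{key}")
for key in [1, 2, 3, 4, 1, 2, 3, 4]:
    cache.get(key)
print(cache.stats)
```

In this run the working set of four blocks does not fit in the two DRAM slots, yet the second pass over the keys is served from the SSD tier rather than from deep storage.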
ACID	MERGE
Highlights
• ACID MERGE in Hive, based on ANSI standard SQL.
• Efficiently perform record-level inserts, updates and deletes within Hive tables.
• Delivers real data management in Hadoop, massively simplifying updates, deletes and change data capture.
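To make the record-level semantics concrete, here is a minimal in-memory model of what a MERGE does to a table. It is an illustration only: the table names, column names and the `deleted` flag are hypothetical, and real Hive MERGE runs as a distributed query against transactional tables rather than over Python lists.

```python
def merge(target, source, key="id"):
    """Apply MERGE-style changes from source rows to target rows.

    Roughly mirrors a statement of this shape (names are hypothetical):
      MERGE INTO customers AS t USING updates AS s ON t.id = s.id
      WHEN MATCHED AND s.deleted THEN DELETE
      WHEN MATCHED THEN UPDATE SET name = s.name
      WHEN NOT MATCHED AND NOT s.deleted THEN INSERT VALUES (s.id, s.name)
    """
    merged = {row[key]: dict(row) for row in target}
    for change in source:
        k = change[key]
        if k in merged:
            if change.get("deleted"):
                del merged[k]                       # matched and flagged: delete
            else:
                merged[k]["name"] = change["name"]  # matched: update in place
        elif not change.get("deleted"):
            merged[k] = {key: k, "name": change["name"]}  # not matched: insert
    return sorted(merged.values(), key=lambda row: row[key])


target = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
source = [
    {"id": 2, "name": "robert"},   # update
    {"id": 1, "deleted": True},    # delete
    {"id": 3, "name": "carol"},    # insert
]
result = merge(target, source)
print(result)
```

A single pass over the change set covers all three cases, which is why MERGE is a natural fit for applying change data capture feeds.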
Hive	View	2.0
Highlights
• Create and manage databases and tables.
• View & compute table and column stats with one click.
• View query explain plans with costs.
Tez	UI
Highlights
• Powerful new search capabilities to help you find queries that need to be optimized.
• New Total Timeline View shows exactly where query time is spent to quickly pinpoint Hive query bottlenecks.
What	Is	Druid?
Druid is a distributed, real-time, column-oriented datastore
designed to quickly ingest and index large amounts of data
and make it available for real-time query.
Features:
• Streaming	Data	Ingestion
• Real-Time	Query
• Merge	Historical	and	Real-Time	Data
• Approximate	Computation
(Tech	Preview)	Hive	+	Druid	=	Insight	When	You	Need	It
OLAP Cubes                                  SQL Tables
Streaming Data                              Historical Data
Pre-Aggregate                               ACID MERGE
Easily ingest event data into OLAP cubes    Keep data up-to-date with Hive MERGE
Real-Time Query                             Deep Analytics

Unified SQL Layer
• Build OLAP Cubes from Hive
• Archive data to Hive for history
• Run OLAP queries in real-time or Deep Analytics over all history
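The "Pre-Aggregate" step can be pictured as rolling raw events up into one row per time bucket and dimension combination before they are queried. The sketch below assumes hourly buckets and a single summed metric; the function name, field names and sample events are invented for illustration and are not taken from the deck or from Druid's API.

```python
from collections import defaultdict
from datetime import datetime


def rollup(events, dimensions, metric):
    """Pre-aggregate raw events into one cell per (hour, dimension values)."""
    cube = defaultdict(lambda: {"count": 0, "sum": 0})
    for event in events:
        # Truncate the timestamp to the hour to form the time bucket.
        hour = event["time"].replace(minute=0, second=0, microsecond=0)
        cell = cube[(hour, *(event[d] for d in dimensions))]
        cell["count"] += 1
        cell["sum"] += event[metric]
    return dict(cube)


# Made-up sample events, for illustration only.
events = [
    {"time": datetime(2017, 8, 5, 10, 5), "country": "JP", "amount": 10},
    {"time": datetime(2017, 8, 5, 10, 40), "country": "JP", "amount": 5},
    {"time": datetime(2017, 8, 5, 11, 15), "country": "US", "amount": 7},
]
cube = rollup(events, dimensions=["country"], metric="amount")
for cell_key, cell in sorted(cube.items()):
    print(cell_key, cell)
```

Queries that group by hour and country can then read the small pre-aggregated cells instead of scanning every raw event, which is the trade-off that makes OLAP-style queries fast.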
Preview:	OLAP	Analytics	in	Milliseconds	with	Hive	over	Druid
[Chart: Star Schema Benchmark at 1TB scale with Hive over 10 Druid nodes. Y axis: runtime (s), 0 to 3.0. X axis: queries Q1.1, Q1.2, Q1.3, Q2.1, Q2.2, Q2.3, Q3.1, Q3.2, Q3.3, Q3.4, Q4.1, Q4.2, Q4.3.]
Create Druid Cube from Hive
Query Druid Cube from Hive
Preview	OLAP	Analytics	in	Milliseconds	with	Hive	over	Druid
Tech	Preview:	Simple	Druid	Management	with	Ambari
Hive	2	– Use	Cases
Typical	Legacy	EDW	Implementations
Hive2	EDW	Optimization	Use	Cases
[Diagram: EDW optimization. ETL/ELT, data landing & deep archive, data marts and cube marts, serving end users, end user applications and apps.]

Use Case: HDP Advantage
• Fast BI on Hadoop: Hive LLAP in-memory architecture makes Fast BI a reality using Hadoop-native technologies.
• ETL Offload: Save 50-90% of EDW CPU cycles by offloading ETL to the scale-out HDP platform.
• Active Archive: With cost per terabyte on par with tape, HDP lets you store and analyze years of data rather than months.
Thank	You

Hive2 Introduction -- Interactive SQL for Big Data