Hive ACID and 2.x New Features

Hive on ACID (GA) &
Hive 2.0 New Features
(non-GA)
(State of Hive in HDP 2.5)
Alberto Romero

© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
 Hive Intro
 Improvements over time
 Hive LLAP (In-memory Cache Processing)
 Hive on ACID
 Demos:
– Hive LLAP
– Hive on ACID
 Giveaways
 Q&A

Hive: Single Tool for multiple SQL use cases
OLTP, ERP, CRM Systems
Clickstream
Sentiment, Web Data
Sensor, Machine Data
Geolocation
Interactive
Analytics
Batch Reports /
Deep Analytics
Hive - SQL
ETL / ELT

Apache Hive: Improvements over Time

Hive: Batch to Sub-Second
 Hive 0.10: Batch Processing
 Hive 0.13: Human Interactive (10 seconds)
– Vectorized SQL Engine, Tez Execution Engine, ORC Columnar format
– 52x Average Query Speedup (7.8 days to 9.3 hours)
 Hive 0.14: Human Interactive (5 seconds)
– Cost Based Optimizer, Faster Map Joins
– 3x Average Query Speedup
 Hive 1.2: Human Interactive (5 seconds)
 Hive 2.0: Sub-Second
– Stinger.Next Initiative: LLAP In-memory Cache, LLAP Resident Process, New Metastore for Compile, Vectorization Improvements
– Significant Query Speedup

Hive: Scalable Modern Data Warehousing with HDP
 Capabilities and Applications
– Batch SQL: ETL, Reporting, Data Mining, Deep Analytics
– OLAP / Cube: Multidimensional Analytics, MDX Tools, Excel
– Interactive SQL: Reporting, BI Tools (Tableau, MicroStrategy, Cognos)
– Sub-Second SQL: Ad-Hoc, Drill-Down, BI Tools (Tableau, Excel)
– ACID / MERGE: Continuous Ingestion from Operational DBMS, Slowly Changing Dimensions
 Core Platform (Hive)
– Scale-Out Storage
– Petabyte Scale Processing
– Core SQL Engine
– Apache Tez: Scalable Distributed Processing
– Advanced Cost-Based Optimizer
– Connectivity
– Advanced Security
– JDBC / ODBC
– Comprehensive SQL:2011 Coverage
 Legend: Existing, Development, Future

HDP 2.5 is a Major Milestone for Hive
 At a High Level:
– 2000+ features, improvements and bug fixes in
Hive since HDP 2.4.
– 600+ of these from outside of Hortonworks.
 Major Improvements:
– Preview: Hive LLAP: Persistent query servers
with intelligent in-memory caching.
– ACID GA: Hardened and proven at scale.
– Expanded SQL Compliance: More capable
integration with BI tools, Primary Key/Foreign
Key support
– Performance: Interactive query, 2x faster ETL.
– Security: Row / Column security extending to
views, Column level security for Spark.
– Operations: LLAP integration in Ambari, new
Grafana dashboards.
 Hive 2 Improvements: 1391 from Hortonworks, 642 from Community
– Interactive Query with Hive LLAP
– SQL ACID Fully Supported
– 2x Faster ETL

Hive: Real, Community-Driven Open Source
 This release of Hive incorporates improvements from individuals within these 43 organizations:
Aetna, Agora SA, Amazon, Baidu, CellVision, Cerner, Cloudera, Dell, EPAM Systems, Flipkart.com, Grupa Allegro,
Hortonworks, Hotels.com, Huawei, Information Control Company, ING, InMobi, Intel, iQiYi.com, LG, LinkedIn, MapR,
Microsoft, Netflix, NexR, NTT Data, PA Consulting, Qubole, Radius Intelligence, Rocket Fuel, Samsung, science+computing, Simba,
Splunk, StreamSets, t4g, Target, Treasure Data, Uber, WANdisco, Yahoo, Yahoo Japan, ZTE

Apache Hive: LLAP (Live Long And Process)

Hive 2 with LLAP: Why?
 Use of Hive is growing exponentially, and so are new requirements
 Disk->Memory is getting further away
– Cloud Storage isn’t co-located
– Disks are connected to the CPU via network
 Security landscape is changing
– Cells & Columns are the new security boundary, not files
– Safely masking columns needs a process boundary
 Concurrency, Performance & Scale are in conflict
– Concurrency at 100k queries/hour
– Latencies at 2-5 seconds/query
– Petabyte scale warehouses (with terabytes of “hot” data)

Hive 2 with LLAP: What is it?
 Hybrid model combining daemons and containers for fast, concurrent execution of
analytical workloads (e.g. Hive SQL queries)
– Concurrent queries without specialized YARN queue setup
– Multi-threaded execution of vectorized operator pipelines, which facilitates JIT optimization
 Asynchronous IO and efficient in-memory caching
 Functions performed by the long-lived daemon
– Caching
– Pre-fetching
– Some query processing
– Access control
 Relational view of the data available through the API
– High performance scans, execution code pushdown
– Centralized data security

Hive 2 with LLAP: What is it?
 Transparent to Hive users, BI tools, etc.
 Hive decides where query fragments run
(LLAP, Container, AM) based on
configuration, data size, format, etc.
 Each Query coordinated independently by a
Tez AM
 Number of concurrent queries throttled by
number of active AMs
 Hive Operators used for processing
 Tez Runtime used for data transfer
[Diagram: Client(s) connect to HiveServer2 (Query/AM Controller), which drives a YARN cluster containing AM1, AM2 and AM3, several llapd daemons, and containers for AM1 and AM2]

Hive 2 with LLAP: Architecture Overview
[Diagram: ODBC / JDBC SQL queries arrive at HiveServer2 (Query Endpoint) and are handed to Query Coordinators; inside the YARN cluster, LLAP Daemons (each with Query Executors and an In-Memory Cache) read from Deep Storage: HDFS, S3 + other HDFS-compatible filesystems]

Hive 2 with LLAP: Customer Case Study
[Chart: maximum and average execution time in seconds (axis 0 to 8) for Tez, Phoenix and LLAP, each measured over D, W, M and Y ranges]
SELECT yyyymmdd,
sum(total_1),
sum(total_2),
...
from table
where yyyymmdd >= xxx
and yyyymmdd < xxx
and userid = xxx
group by userid, yyyymmdd;

Hive 2 with LLAP: Execution Overview
• LLAP daemon has a number of executors (think containers) that execute work "fragments"
• Fragments are parts of one, or multiple parallel workloads (e.g. Hive SQL queries)
• Work queue with pluggable priority
• Geared towards low latency queries over long-running queries (by default)
• I/O is similar to containers – read/write to HDFS, shuffle, other storages and formats
• Streaming output for data API
[Diagram: LLAP queue and executors running fragments such as Q1 Map 1, Q3 Map 19, Q3 Reducer 3 (waiting for shuffle inputs) and an external read; connected to HDFS, HBase, a container (shuffle input) and a Spark executor]
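
The work queue with pluggable priority on this slide can be sketched roughly as below. This is a minimal Python illustration under stated assumptions, not LLAP's actual scheduler: the `Fragment` class, the `shortest_first` priority function, the `run_queue` helper and the time estimates are all hypothetical names and values chosen for the example.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(order=True)
class Fragment:
    priority: float                    # lower value is scheduled first
    name: str = field(compare=False)   # e.g. "Q1 Map 1"


def shortest_first(estimated_seconds: float) -> float:
    """Hypothetical priority function: favour low-latency work."""
    return estimated_seconds


def run_queue(
    work: List[Tuple[str, float]],
    num_executors: int,
    priority_fn: Callable[[float], float] = shortest_first,
) -> List[List[str]]:
    """Drain the queue in waves of at most `num_executors` fragments.

    `work` holds (fragment name, estimated seconds) pairs. The result lists
    the fragment names picked in each wave, best priority first.
    """
    queue: List[Fragment] = []
    for name, estimated_seconds in work:
        heapq.heappush(queue, Fragment(priority_fn(estimated_seconds), name))

    waves = []
    while queue:
        size = min(num_executors, len(queue))
        waves.append([heapq.heappop(queue).name for _ in range(size)])
    return waves


# Estimates are invented for illustration only.
waves = run_queue(
    [("Q3 Reducer 3", 45.0), ("Q1 Map 1", 0.5), ("Q3 Map 19", 30.0)],
    num_executors=2,
)
print(waves)
```

Passing a different `priority_fn` is what "pluggable" means in this sketch: the queue itself does not change, only the ordering key does.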

Hive 2 with LLAP: IO Layer
• Optional: when executing inside LLAP
• All other formats use in-sync mode
• Asynchronous IO for Hive
• Wraps over InputFormat, reads through cache
• Supported with ORC
• Transparent, compressed in-memory cache
• Format-specific, extensible
• NVMe/NVDIMM caches
[Diagram: an executor fragment pulls vectorized data through a RecordReader from an IO thread (plan & decode; read, decompress), backed by a cache of data buffers, a metadata cache (splits, indexes, what to read) and the actual data (HDFS, S3, …)]
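
The "reads through cache" idea on this slide can be illustrated with a small read-through cache. This is a sketch under assumptions, not the LLAP IO layer: the `ReadThroughCache` class, the `(path, stripe, column)` key shape, the file path and the `fake_storage_read` loader are hypothetical, and the sketch skips compression and eviction entirely.

```python
from typing import Callable, Dict, Tuple

ChunkKey = Tuple[str, int, str]   # (file path, stripe index, column name)


class ReadThroughCache:
    """Serve column chunks from memory, loading from storage on a miss."""

    def __init__(self, load_chunk: Callable[[ChunkKey], bytes]):
        self._load_chunk = load_chunk
        self._chunks: Dict[ChunkKey, bytes] = {}
        self.hits = 0
        self.misses = 0

    def read(self, key: ChunkKey) -> bytes:
        if key in self._chunks:
            self.hits += 1
        else:
            # Miss: fetch from the backing store and keep the chunk in memory.
            self.misses += 1
            self._chunks[key] = self._load_chunk(key)
        return self._chunks[key]


def fake_storage_read(key: ChunkKey) -> bytes:
    """Stand-in for a slow HDFS or S3 read; returns dummy bytes."""
    path, stripe, column = key
    return f"{path}:{stripe}:{column}".encode()


cache = ReadThroughCache(fake_storage_read)
key = ("/warehouse/t/part-0.orc", 0, "total_1")   # hypothetical path and column
first = cache.read(key)    # miss: goes to storage
second = cache.read(key)   # hit: served from memory
print(cache.hits, cache.misses)
```

Keying by column and stripe rather than by whole file reflects why a columnar format suits this kind of cache: a query touching two columns only needs those two columns resident.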

Key Features: Spark Column Security with LLAP
 Fine-Grained Column Level Access Control for SparkSQL.
 Fully dynamic policies per user. Doesn’t rely on view-per-user.
 Use Standard Ranger policies and tools to control access and masking policies.
Flow:
1. SparkSQL gets data locations from
HiveServer and plans query.
2. HiveServer2 authorizes access
using Ranger. Per-user policies
like row filtering are applied.
3. Spark gets a modified query plan
based on Ranger security policy.
4. Spark reads data from LLAP with
security policies applied.
[Diagram: Spark Client, HiveServer2 (Enforce Policies), Hive Metastore (Data Locations, View Definitions), Ranger Server (Security Policies) and LLAP (Data Read, Filter Pushdown), linked by flow steps 1 to 4]
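
The effect of per-user row filtering and column masking, applied before any data reaches the client, can be sketched as follows. This is an illustrative Python model only: the `POLICIES` table, the `read_with_policies` function, the user names and the sample columns are hypothetical, and none of it reflects the real Ranger or LLAP APIs.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical per-user policies, standing in for what a policy server supplies.
POLICIES = {
    "analyst": {
        "row_filter": lambda row: row["region"] == "EU",
        "masks": {"ssn": lambda value: "***-**-" + str(value)[-4:]},
    },
    "admin": {
        "row_filter": lambda row: True,
        "masks": {},
    },
}


def read_with_policies(user: str, rows: List[Row]) -> List[Row]:
    """Return only the rows and column values this user may see."""
    policy = POLICIES[user]
    visible = []
    for row in rows:
        if not policy["row_filter"](row):
            continue                      # row filtered out for this user
        masked = dict(row)                # copy so the source rows stay intact
        for column, mask in policy["masks"].items():
            masked[column] = mask(masked[column])
        visible.append(masked)
    return visible


rows = [
    {"name": "a", "region": "EU", "ssn": "123-45-6789"},
    {"name": "b", "region": "US", "ssn": "987-65-4321"},
]
analyst_view = read_with_policies("analyst", rows)
admin_view = read_with_policies("admin", rows)
print(analyst_view)
```

Because the policy is looked up per user at read time, no view-per-user is needed, which is the point the slide makes about fully dynamic policies.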

Apache Hive: ACID
(Atomic, Consistent, Isolated, Durable)

ACID at High Level
 A new type of table that supports Insert/Update/Delete SQL operations
 Concept of ACID transaction
– Atomic, Consistent, Isolated, Durable
 Streaming Ingest API
– Write a continuous stream of events to Hive in micro batches with transactional semantics
 Data is not static (it changes daily, hourly…or even every second)
BUT…
– We need to have a consistent view…
– whilst allowing for concurrency

ACID Motivations/Goals
 Continuously adding new data to Hive in the past
– INSERT INTO Target SELECT * FROM Staging
– ALTER TABLE Target ADD PARTITION (dt='2016-06-30')
• Lots of files – bad for performance
• Fewer files – users wait longer to see latest data
 Modifying existing data
– Analyzing log files – not that important. Sourcing data from an Operational Data Store – may be
really important.
– INSERT OVERWRITE TABLE Target SELECT * FROM Target WHERE …
• Concurrency
– Hope for the best (multiple updates)
– ZooKeeper lock manager S/X locks – restrictive
• Expensive to do repeatedly (write side)

Goals Summary
 Delete Old Records
– Remove records for compliance
 Update/Restate Table Dimensions
– Fix problems/update records after they are in the warehouse
 Long running analytics queries
– Run concurrently with update commands
 Streaming Data Ingest
– A continuous stream of data coming in
– Typically from Flume or Storm
 NOT OLTP!!!
– Support slowly changing tables
– Not for 100s of concurrent queries trying to update the same partition

User Point of View
 CREATE TABLE T(a int, b int) CLUSTERED BY (b) INTO 8 BUCKETS STORED AS ORC
TBLPROPERTIES ('transactional'='true');
 Not all tables support transactional semantics
 Table must be bucketed – important for query performance
 Table cannot be sorted – ACID implementation requires its own sort order
 Currently requires ORC File, but any format could be supported by implementing
– AcidInputFormat/AcidOutputFormat
 Snapshot Isolation
– Lock in the state of the DB as of the start of the query for the duration of the query
 autoCommit=true

Design – Storage Layer
 Storage layer enhanced to support MVCC architecture
– Multiple versions of each row
– Allows concurrent readers/writers
 HDFS – append only file system
– All update operations are written to a delta file first
– Files are combined on read and compaction
 Even if you could update a file in the middle
– The architecture of choice for analytics is columnar storage (ORC File)
– Compresses by column – difficult to update
 Random data access is prohibitively slow

Transaction Compactions
1. Original File: Task reads the latest read-optimized ORCFile
2. Edits Made: Task reads the ORCFile and merges the delta file with the edits
3. Edits Merged: Task reads the updated (merged) read-optimized ORCFile
Hive ACID Compactor periodically merges the delta files in the background.

Transaction Compactions
[Diagram: Minor and Major Compactions. Minor Compaction (labelled "10% local") merges several delta files into one merged delta file; Major Compaction (labelled "10% global") merges delta files with the read-optimized ORCFile into a new read-optimized ORCFile]

Locking and Concurrency
 We follow Snapshot isolation where
– Writers do traditional 2 phase locking (and write newer versions of data)
– Readers read the latest version number when the query arrived
– Readers and writers do not block each other
• Writers can write newer versions
• Readers can read a consistent view based on version number
 We do Table level and Partition level locking
– If we cannot figure out the partition, we’ll do table level locking
– Two transactions trying to update the same Table/Partition will block behind one another
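
The lock granularity rule above (partition level when the partition is known, table level otherwise) can be sketched as a small overlap check. This is a simplified Python model, not Hive's lock manager: `lock_target` and `conflicts` are hypothetical helper names, and the sketch only models writers competing with writers.

```python
from typing import Optional, Tuple

# (table name, partition spec or None when the whole table is locked)
LockTarget = Tuple[str, Optional[str]]


def lock_target(table: str, partition: Optional[str] = None) -> LockTarget:
    """Lock the partition when it is known, otherwise the whole table."""
    return (table, partition)


def conflicts(a: LockTarget, b: LockTarget) -> bool:
    """Two writers block each other when their lock targets overlap."""
    table_a, part_a = a
    table_b, part_b = b
    if table_a != table_b:
        return False
    # A table-level lock (partition None) overlaps every partition of the table.
    return part_a is None or part_b is None or part_a == part_b


same_partition = conflicts(
    lock_target("t", "dt=2016-06-30"), lock_target("t", "dt=2016-06-30")
)
different_partitions = conflicts(
    lock_target("t", "dt=2016-06-30"), lock_target("t", "dt=2016-07-01")
)
table_vs_partition = conflicts(lock_target("t"), lock_target("t", "dt=2016-06-30"))
print(same_partition, different_partitions, table_vs_partition)
```

The table-level fallback is the expensive case: one update whose partition cannot be determined serializes against every other writer on that table.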

Storage Layer Example
 CREATE TABLE T(a int, b int) CLUSTERED BY (b) INTO 1 BUCKETS STORED AS ORC
TBLPROPERTIES ('transactional'='true');
 Suppose the table contains (1,2),(3,4)
hive> update T set a = -3 where a = 3;
hive> update T set a = -1 where a = 1;
Now the table has (-1,2),(-3,4)
 hive> dfs -ls -R /user/hive/warehouse/t;
/user/hive/warehouse/t/base_0000022/bucket_00000
/user/hive/warehouse/t/delta_0000023_0000023_0000/bucket_00000
/user/hive/warehouse/t/delta_0000024_0000024_0000/bucket_00000

Producing The Snapshot
base_0000022/bucket_00000
oTxn  bucket  rowId  cTxn  a   b
22    0       0      22    3   4
22    0       1      22    1   2
delta_0000023_0000023_0000
oTxn  bucket  rowId  cTxn  a   b
22    0       0      23    -3  4
delta_0000024_0000024_0000
oTxn  bucket  rowId  cTxn  a   b
22    0       1      24    -1  2
select * from T
a   b
-3  4
-1  2

Example Continued
 bin/hive --orcfiledump -j -d /user/hive/warehouse/t/base_0000022/bucket_00000
{"operation":0,"originalTransaction":22,"bucket":0,"rowId":0,"currentTransaction":22,"row":{"a":3,"b":4}}
{"operation":0,"originalTransaction":22,"bucket":0,"rowId":1,"currentTransaction":22,"row":{"a":1,"b":2}}
 bin/hive --orcfiledump -j -d /…/t/delta_0000023_0000023_0000/bucket_00000
{"operation":1,"originalTransaction":22,"bucket":0,"rowId":0,"currentTransaction":23,"row":{"_col1":-3,"_col2":4}}
 Each file is sorted by PK: originalTransaction,bucket,rowid
 On read base & deltas are stitched together to produce correct version of each row.
 Each read operation “knows” the state of all transactions up to the moment it started
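
How base and deltas are stitched together on read can be sketched with the records from this example. This is a minimal Python model, not Hive's reader: the `snapshot` function is a hypothetical name, visibility is simplified to a single highest-visible transaction ID (ignoring open and aborted transactions), the delta rows use `a` and `b` as column names for readability, and the delete code `2` is an assumption since the slides show only operations 0 and 1.

```python
from typing import Dict, List, Tuple

INSERT, UPDATE, DELETE = 0, 1, 2   # DELETE = 2 is assumed; the slides show only 0 and 1

base_22 = [
    {"operation": 0, "originalTransaction": 22, "bucket": 0, "rowId": 0,
     "currentTransaction": 22, "row": {"a": 3, "b": 4}},
    {"operation": 0, "originalTransaction": 22, "bucket": 0, "rowId": 1,
     "currentTransaction": 22, "row": {"a": 1, "b": 2}},
]
delta_23 = [
    {"operation": 1, "originalTransaction": 22, "bucket": 0, "rowId": 0,
     "currentTransaction": 23, "row": {"a": -3, "b": 4}},
]
delta_24 = [
    {"operation": 1, "originalTransaction": 22, "bucket": 0, "rowId": 1,
     "currentTransaction": 24, "row": {"a": -1, "b": 2}},
]


def snapshot(files: List[List[dict]], max_visible_txn: int) -> List[dict]:
    """Return the newest visible version of each row, in primary-key order."""
    latest: Dict[Tuple[int, int, int], dict] = {}
    for events in files:
        for event in events:
            if event["currentTransaction"] > max_visible_txn:
                continue   # written after this reader's snapshot was taken
            key = (event["originalTransaction"], event["bucket"], event["rowId"])
            current = latest.get(key)
            if current is None or event["currentTransaction"] > current["currentTransaction"]:
                latest[key] = event
    return [
        latest[key]["row"]
        for key in sorted(latest)
        if latest[key]["operation"] != DELETE
    ]


files = [base_22, delta_23, delta_24]
now = snapshot(files, max_visible_txn=24)
before_updates = snapshot(files, max_visible_txn=22)
print(now)
```

A reader that started before transactions 23 and 24 still sees (3,4),(1,2), while a reader starting afterwards sees (-3,4),(-1,2), and neither blocks the writers.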

Design - Compactor
 More operations = more delta files
 Compactor rewrites the table in the background
– Minor compaction - merges delta files into fewer deltas
– Major compaction - merges deltas with base - more expensive
– This amortizes the cost of updates and self tunes the tables
• Makes ORC more efficient - larger stripes, better compression
 Compaction can be triggered automatically or on demand
– There are various configuration options to control when the process kicks in.
– Compaction itself is a Map-Reduce job
 Key design principle is that compactor does not affect readers/writers
 Cleaner process – removes obsolete files
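
The difference between minor and major compaction can be sketched at the level of directory names, using the layout from the storage example. This is an illustrative Python model only: it rewrites names, not data, the helper names are hypothetical, and the output naming (one delta spanning the transaction range for minor, one new base for major) is an assumption about the convention rather than something stated in these slides.

```python
import re
from typing import List

# delta_<minTxn>_<maxTxn> with an optional statement suffix, and base_<txn>
DELTA = re.compile(r"^delta_(\d{7})_(\d{7})(?:_\d{4})?$")
BASE = re.compile(r"^base_(\d{7})$")


def minor_compaction(dirs: List[str]) -> List[str]:
    """Replace all delta directories with one delta spanning their range."""
    deltas = [m for m in map(DELTA.match, dirs) if m]
    if len(deltas) < 2:
        return list(dirs)                 # nothing worth merging
    low = min(int(m.group(1)) for m in deltas)
    high = max(int(m.group(2)) for m in deltas)
    kept = [d for d in dirs if not DELTA.match(d)]
    return kept + [f"delta_{low:07d}_{high:07d}"]


def major_compaction(dirs: List[str]) -> List[str]:
    """Fold the base and every delta into a single new base."""
    highs = [int(m.group(2)) for m in map(DELTA.match, dirs) if m]
    highs += [int(m.group(1)) for m in map(BASE.match, dirs) if m]
    if not highs:
        return list(dirs)
    others = [d for d in dirs if not DELTA.match(d) and not BASE.match(d)]
    return others + [f"base_{max(highs):07d}"]


dirs = ["base_0000022", "delta_0000023_0000023_0000", "delta_0000024_0000024_0000"]
after_minor = minor_compaction(dirs)
after_major = major_compaction(dirs)
print(after_minor)
print(after_major)
```

In this model the old directories simply disappear from the returned list; in the design described above that removal is deferred to the cleaner process so that in-flight readers are not affected.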

Design - Concurrency
 Transaction Manager
– manages transaction ID assignment
– keeps track of transaction state: open, committed, aborted
 Lock Manager
– DDL operations acquire eXclusive locks
– Read operations acquire Shared locks.
– Main goal is to prevent someone dropping a table while a query is in progress
 State of both persisted in Hive Metastore
 Write Set tracking to prevent Write-Write conflicts in concurrent transactions
 Note that 2 Inserts are never in conflict since Hive does not enforce unique constraints.
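
Transaction state tracking and write-set conflict detection can be sketched together. This is a minimal Python model under a first-committer-wins assumption, not the Hive Metastore implementation: the `TransactionManager` class, its method names and the resource strings are hypothetical. Inserts record no write set here, which is why two inserts never conflict in the sketch.

```python
from typing import Dict, List, Set, Tuple


class TransactionManager:
    """Assigns transaction IDs, tracks state, and checks write sets at commit."""

    def __init__(self) -> None:
        self._next_txn_id = 1
        self._commit_counter = 0
        self.state: Dict[int, str] = {}               # open / committed / aborted
        self._start_point: Dict[int, int] = {}        # commit counter seen at begin
        self._write_sets: Dict[int, Set[str]] = {}    # resources updated or deleted
        self._committed: List[Tuple[int, Set[str]]] = []

    def begin(self) -> int:
        txn = self._next_txn_id
        self._next_txn_id += 1
        self.state[txn] = "open"
        self._start_point[txn] = self._commit_counter
        self._write_sets[txn] = set()
        return txn

    def record_update(self, txn: int, resource: str) -> None:
        """Note that `txn` updated or deleted rows in `resource`."""
        self._write_sets[txn].add(resource)

    def commit(self, txn: int) -> bool:
        mine = self._write_sets[txn]
        for sequence, theirs in self._committed:
            # Conflict: someone committed an overlapping write after we began.
            if sequence > self._start_point[txn] and mine & theirs:
                self.state[txn] = "aborted"
                return False
        self._commit_counter += 1
        self._committed.append((self._commit_counter, set(mine)))
        self.state[txn] = "committed"
        return True


tm = TransactionManager()
t1, t2 = tm.begin(), tm.begin()
tm.record_update(t1, "t/dt=2016-06-30")
tm.record_update(t2, "t/dt=2016-06-30")
first = tm.commit(t1)     # commits
second = tm.commit(t2)    # aborted: overlapping write committed after t2 began

i1, i2 = tm.begin(), tm.begin()   # two concurrent inserts, empty write sets
inserts_ok = tm.commit(i1) and tm.commit(i2)
print(first, second, inserts_ok, tm.state)
```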

 You are allowed to read acid and non-acid tables in same query.
 You cannot write to acid and non-acid tables at the same time (multi-insert statement)

Giveaways: Test it yourself
 Download the Sandbox
– http://hortonworks.com/downloads/
 Spin up a cluster in the cloud
– http://hortonworks.github.io/hdp-aws/quick/
– http://hortonworks.com/hadoop-tutorial/deploying-hortonworks-sandbox-on-microsoft-azure/
– http://sequenceiq.com/cloudbreak-docs/latest/#introduction
 Deploy it with Vagrant
– https://github.com/hortonworks/structor
 Deploy it with Ambari shell, Docker and Blueprints
– Run in shell: curl -Lo .amb j.mp/docker-ambari && . .amb && amb-deploy-cluster

Q & A
