Page1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
LLAP: Building Cloud-First BI
Sergey Shelukhin
LLAP: Building Cloud-First BI
• What is LLAP? Overview
• Cloud-first BI
• Efficient, scalable multi-user execution and caching
• Secure
• Universal (not just for Hive)
• Production-ready tools
• How to run LLAP
Overview
What is LLAP?
• Hybrid model combining daemons and containers for
fast, concurrent execution of analytical workloads
(e.g. Hive SQL queries)
• Concurrent queries without specialized YARN queue setup
• Multi-threaded execution of vectorized operator pipelines
• Asynchronous IO and efficient in-memory caching
• Relational view of the data available through the API
• High performance scans, code pushdown, centralized security
• Not an "execution engine" (like Tez, MR, Spark)
• Not a storage substrate – reads from HDFS/S3/…
[Diagram: an LLAP process on a node, hosting a cache and query fragments that read from HDFS]
LLAP and Hive
• Transparent to users, BI tools, etc. – HS2/JDBC is the access point
[Diagram: ODBC/JDBC SQL queries enter via HiveServer2 (the query endpoint); query coordinators submit DAGs to LLAP daemons in the YARN cluster, each running query executors and sharing an in-memory cache across all users, over HDFS-compatible deep storage (S3, WASB, Isilon)]
LLAP and Hive
• LLAP is Hive
• Supports all file formats that Hive does
• Hive Operators used for processing, same compiler, etc.
• ⇒ Automatic support for most new optimizations and features
• HS2 controls the concurrent queries (session pool)
• Each query is coordinated by a Tez AM; LLAP hosts the Tez shuffle
• Can run in parallel with container-based jobs
• After perf testing, we recommend running most Hive workloads in LLAP
• In this new model, the cluster capacity is divided proportionally if needed
LLAP in a BI system - overview
• No split brain – a single platform for ETL…
• Fault-tolerant components and proven scalability (runs TPC-DS at 100 TB)
• ANSI SQL, cost-based optimizer, ACID transactions
• …and BI
• Vectorized engine for fast processing
• Intelligent scheduling of different workloads running together
• Caching, incl. on local disk (SSD), to avoid expensive reads
• Zero-ETL analytics with efficient caching of text data
• External access, one metadata store, unified security
Efficient BI queries
Short overview – execution
• LLAP daemon has a number of executors
(think containers) that run "fragments"
• Fragments are parts of multiple parallel
workloads (e.g. Hive SQL queries)
• Work queue with pluggable priority
• Geared towards low-latency queries over long-running queries (by default)
• I/O is similar to containers – read/write to
HDFS, shuffle, other storages and formats
• Streaming output for data API
[Diagram: executors in the LLAP queue run fragments such as Q1 Map 1, Q3 Reducer 3 (waiting for shuffle inputs), and an external read; inputs come from HDFS, HBase, LLAP shuffle, or a Spark executor]
Efficiency for individual queries
• Eliminates container startup costs
• JIT optimizer has a chance to work (esp. for vectorization)
• Data sharing (hash join tables, etc.)
[Chart: query time in seconds (0-25) for new containers, cold LLAP, warm LLAP (no cache, new user), and the 4th run of the query (no cache)]
Individual queries – LLAP vs Hive 1 (up to 26x faster)
[Chart: per-query times for Hive 1 / Tez vs. Hive 2 / LLAP with the speedup factor; lower time is better]
Hive LLAP vs Impala; TPC-DS 10 TB on 9 nodes
[Charts: total runtime in seconds (lower is better) for CDH 5.12 vs. HDP 2.6, and TPC-DS queries supported (higher is better): 60 of 99 for CDH 5.12 vs. 99 of 99 for HDP 2.6]
Hive LLAP vs Impala (log scale; lower is better)
[Chart: per-query TPC-DS runtimes on a log scale (lower is better), HDP 2.6 (Hive LLAP) versus CDH 5.12 (Impala)]
Parallel queries – priorities, preemption
• Lower-priority fragments can be preempted
• For example, a fragment can start running before its inputs are
ready, for better pipelining; such fragments may be preempted
• LLAP work queue examines the DAG parameters to give
preference to interactive (BI) queries
[Diagram: executors run interactive query maps 1-3 while a wide query's reduce task waits in the LLAP queue; chart: cluster utilization over time for a long-running vs. a short-running query]
Workload management (WIP - HIVE-17481)
Overview
• Effectively share LLAP cluster resources
• Resource allocation per user policy; separate ETL and BI, etc.
• Resource-based guardrails
• Protect against long-running queries and high memory usage
• Improved, query-aware scheduling
• Scheduler is aware of query characteristics, types, etc.
• Fragments are easier to preempt than containers
• Queries get guaranteed fractions of the cluster, but
can use empty space
Resource plans
• Resource plan is a workload management configuration for a cluster
• Switching is allowed without stopping queries, e.g. based on time of day
• Cluster is divided into query pools (optionally nested)
• Each pool defines query parallelism, cluster resources percentage
• Queries are automatically routed to pools based on user name, app, etc.
• Rules to kill, move, or deprioritize queries based on DFS usage, runtime, etc.
• Example (commands may change in the final version):
CREATE RESOURCE PLAN daytime;
CREATE POOL bi IN daytime (resource_percent=75, concurrent_queries=5);
CREATE POOL etl IN daytime (resource_percent=25, concurrent_queries=10);
CREATE RULE downgrade IN daytime WHEN total_runtime > 300 THEN MOVE etl;
ADD RULE downgrade TO bi;
CREATE MAPPING tableau (application='Tableau', pool=bi);
ALTER PLAN daytime SET default_pool='etl';
APPLY PLAN daytime;
Decentralized guaranteed resources
• A guaranteed task for each resource (currently executor slots)
• HS2 gives N guaranteed tasks to an AM based on configured resource plan
• AMs mark N of their most important tasks as guaranteed at any given time
• Guarantee is not a requirement to use resource
• Guaranteed tasks pre-empt speculative tasks
• Future improvements – better coordination
• E.g. try to place on the least busy nodes
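The proportional split can be sketched with integer arithmetic (a toy illustration assuming 18 executor slots and an 80/20 BI/ETL plan, the numbers used in the example that follows; the remainder after integer division goes to the other pool):

```shell
# Toy guaranteed-task split (assumption: 18 executor slots, 80/20 plan).
total=18
bi=$((total * 80 / 100))   # 14 guaranteed tasks for the BI pool
etl=$((total - bi))        # 4 guaranteed tasks for the ETL pool
echo "guaranteed: bi=$bi etl=$etl"
```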
Guaranteed tasks – BI and ETL example
[Diagram: 18 executors total; the BI pool (80%) gets 14 guaranteed tasks and the ETL pool (20%) gets 4. Query 1 (BI) has 10 active tasks, all guaranteed and running, with 4 guarantees unused for now; Query 2 (ETL) has 19 active tasks (8 running): 4 guaranteed (running) and 15 speculative (4 running), spread across three LLAP daemons with wait queues, coordinated by HS2]
Caching
Caching for BI workloads - basics
• Fine-grained (columnar), compact (dictionary, RLE encoded)
• Important due to projections over many wide EDW tables
• Prioritized – indexes are cached with higher priority
• Important to make use of PPD for BI query filters
• Off-heap (no extra GC), supports SSD
• Saves on cloud reads
• LRFU replacement policy avoids the damage from large scans
• Automatic coherence, flexible locality (hash based)
• Locality can be strict, or create additional replicas to avoid hotspots
Caching for BI workloads – formats, zero-ETL
• On-prem, caching does not make large ORC queries much faster
• On S3/Azure, though, reads are much slower
• Even the disk cache is still much faster than FS reads
• Especially if text is involved
• Cache supports columnar data and text
• ORC cached natively; in 3.0, Parquet is also cached natively
• Possible future improvement – move Parquet caching onto IO threads
• Zero-ETL analytics on CSV and JSON data with text caching
• Text is efficiently encoded in background; once cached, queries speed up
In-memory processing – native columnar (ORC)
[Diagram: I/O threads plan reads, pull compressed data from the distributed FS, run it through the compression codec and the ORC decoder, and store compact encoded columns (col1, col2) in the off-heap cache, with an SSD tier and a replacement policy; the execution thread consumes them as native data vectors for vectorized processing by Hive operators within a fragment]
In-memory processing – columnar (Parquet)
[Diagram: as above, but the Parquet reader runs on the execution thread; compact column data is still cached off-heap/SSD under a replacement policy, with a cache coordinator]
Up to 10x speed-up with a cloud FS (100 TB dataset, SSD)
[Chart: per-query cold vs. cached runtimes and speedup, up to 10x per query; 1.5x overall, 2.1x on small queries]
In-memory processing – text – the first read
[Diagram: on the first read, the text "decoder" parses data from the distributed FS on the I/O threads, an "ORC lite" encoder writes compact encoded columns to the off-heap/SSD cache, and the requested columns flow to the execution thread as native data vectors for vectorized processing]
In-memory processing – text – the second read
[Diagram: subsequent reads are planned against the cache and served from the ORC-encoded columns via the ORC decoder; the text source is not re-read]
Up to 38x speed-up with the text cache + cloud FS
External access
External access – relational view for everyone
• Hive-on-Tez and other DAG executors can use LLAP directly
• LLAP also provides a "relational datanode" view of the data
• Anyone (with access) can push the (approved) code in, from
complex query fragments to simple data reads
• E.g. a Spark DataFrame can be created with LlapInputFormat
• Gives external services access to
• Hive data: centralized, secure data access
• Ability to read all Hive table types, like ACID transactional tables
• Hive features: from column-level security, to LLAP columnar cache
SparkSQL+LLAP example
• Ranger for Hive supports cell-level security and masking
• With SparkSQL utilizing LLAP, this can be used from Spark
• More at https://hortonworks.com/blog/row-column-level-control-apache-spark/
Security
LLAP security – the EDW usage patterns
• Single, centrally administered LLAP cluster for all users
• For now, separate ad hoc clusters cannot be managed via Ambari
• Use Ranger
• Hive SQL standard auth is an option; doAs is not recommended
• Hive session AMs and LLAP run as hive superuser; managed by HS2
• HS2 serves as a central coordinator for security
• Beeline and JDBC access; no CLI (requires client kinit)
• HS2 checks permissions, enforces Ranger policies
• Coordinates the usual Hadoop security dance (tokens for tasks, etc.)
LLAP security – internals with Tez
[Diagram: HS2 sessions launch Tez AMs, which submit tasks with signed job tokens to LLAP daemons; ZK paths are protected with ACLs, Ranger policies govern data access, and tasks reach the FS using the job's tokens]
LLAP security – external (Spark) additions
[Diagram: the Spark client obtains signed task specs and keys through HS2 and submits them to LLAP daemons, which verify the signatures before running the read tasks]
Integration and tools
LLAP in Ambari (and HDP)
• Ambari 2.5 + HDP 2.6 = LLAP GA
• The latest recommended update is HDP 2.6.2.0
• Do not use Tech Preview versions; use GA
• Separate version of Hive, "Hive Interactive"
• Ambari 3.0 – no more separate versions
• Enable "Interactive Query" in Hive tab
• A default configuration is chosen; more on that later
• No Ambari? Will cover this later
LLAP on the cloud
• LLAP is in HDC (Hortonworks Data Cloud)
• For quick, automated cluster deployments on AWS
• Also available on Azure HDInsight
• Details and links at the end!
Tez UI – queries and LLAP integration
Tez UI – query swimlane
Tez UI – DAG swimlane with LLAP counters
Tez UI – query information and debug data
Monitoring
• LLAP exposes a UI for monitoring
• Aggregate monitoring UI is work in progress
• Lots of debug views and tools – JMX, logging, etc. – see backup slides
Making the best use of LLAP - summary
• Java 8, G1GC; some kernel configs, esp. for TCP connections
• Ambari comes with reasonable Hive perf configuration
• May still need tweaking for specific workload; more so w/o Ambari
• SSD and text cache require "advanced config" until Ambari 3.0
• LLAP cluster sizing in a nutshell
• AM per query (+ 1-2), AMs on all nodes; 2Gb RAM per AM, rest to LLAP
• (Executor+IO thread) per core, 3-4Gb RAM per executor, rest to cache
• Without Ambari, see hive --service llap command + Slider
• See backup slides for some details ;)
Future work for manageability
• UDF, configuration changes without affecting queries (rolling restart)
• In-place restart of the daemons – preserves cache
• Ambari integration for faster startup in common cases
• Improved scale-up/down (with Ambari and Slider integration)
Summary
LLAP provides a
• unified, managed, and secure cloud-ready EDW solution for BI and ETL workloads via a
• fast execution substrate harnessing Hive's vectorized SQL engine and an
• efficient in-memory caching layer for columnar and text data
Try Hive LLAP Today on-prem or in the Cloud
Hortonworks Data Platform 2.6
Powered by 100% open source Apache Hadoop
http://hortonworks.com/downloads/
Hortonworks Data Cloud
Easy HDP on Amazon Web Services
http://hortonworks.com/products/cloud/aws/
Microsoft Azure HDInsight
A cloud Spark and Hadoop service for your enterprise
http://azure.microsoft.com/en-us/services/hdinsight/
Questions?
Interested? Stop by the Hortonworks booth to learn more
Backup slides
Tez UI – LLAP counters
Debugging
• JMX view (:15002/jmx) contains many detailed metrics
• E.g. "ExecutorsState" shows the running tasks and their state
• Log lines for most operations are annotated with task attempt #
• By default, stores separate log files per query
• Logs can be downloaded (yarn logs ...) even for a running LLAP app
• Log file contains the statements annotated for the query
• File name contains the session application ID and DAG # – "dag_TTTT_MMM_N", e.g. dag_1490656001509_4602_1 for Tez AM application_1490656001509_4602
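As a toy illustration of the naming scheme above, the per-query log name can be derived from the Tez AM's application ID plus the DAG number:

```shell
# Derive the per-query log-file name from the session application ID.
app_id="application_1490656001509_4602"
dag_num=1
# Strip the "application_" prefix and append the DAG number.
log_name="dag_${app_id#application_}_${dag_num}"
echo "$log_name"   # dag_1490656001509_4602_1
```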
Making the best use of LLAP – OS and Java
• Java 8 strongly recommended
• G1 GC recommended (e.g. --args " -XX:+UseG1GC -XX:+ResizeTLAB -XX:-ResizePLAB")
• Kernel settings
sysctl -w net.core.somaxconn=16384;
echo "never" > /sys/kernel/mm/transparent_hugepage/enabled
echo "never" > /sys/kernel/mm/transparent_hugepage/defrag
/etc/init.d/nscd restart
Making the best use of LLAP – cluster sizing
1. Pick the number of parallel queries (not just sessions)
• AM per query + some constant slack
2. Pick executors per node, and io.threadpool count (# of cores per node)
3. Determine total memory size for LLAP – subtract the AM(s) from each NM (YARN memory per node)
• 1 AM = 2 Gb; best to spread the AMs across the nodes
4. Determine the Xmx for LLAP; one executor (core) == 3-4Gb
5. Determine cache size - from the total, take out Xmx + ~3Gb (if lower, 20%)
• ~3Gb for Java overhead, shared overhead
6. Tweak based on workload?
Cluster sizing example
• 6 node cluster – 100Gb RAM each, 24 cores, NM size = 96Gb, 25 concurrent queries
• 28 AMs; total AM memory = 28*2 = 56
• (96*6 – 28*2)/6 = 86.66 => "Memory per daemon" = 85Gb
• "LLAP heap size" (Xmx) = 3Gb*24 = 72Gb
• "Cache size" = 85Gb – 72Gb – 3Gb ~= 10Gb
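The arithmetic above can be sketched as a small script (assumptions: 2 Gb per AM, 3 Gb heap per executor, ~3 Gb shared overhead; integer division gives 86 Gb per daemon, which the slide rounds down to 85 Gb, leaving a ~10 Gb cache):

```shell
# Hypothetical sizing helper following the steps on the previous slide.
nodes=6; nm_gb=96; cores=24; queries=25; slack=3

ams=$((queries + slack))                                # 28 AMs
am_total_gb=$((ams * 2))                                # 56 Gb reserved for AMs
daemon_gb=$(( (nm_gb * nodes - am_total_gb) / nodes ))  # 86 Gb (slide uses 85)
heap_gb=$((3 * cores))                                  # Xmx = 72 Gb
cache_gb=$((daemon_gb - heap_gb - 3))                   # minus ~3 Gb overhead
echo "ams=$ams daemon=${daemon_gb}Gb heap=${heap_gb}Gb cache=${cache_gb}Gb"
```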
Making the best use of LLAP – Hive settings
• Ambari has a lot of this configured by default
• The basics - use ORC; enable PPD, configure mapjoin size, etc.
• Enable vectorization and consider new vectorization features
• hive.vectorized.execution.*.enabled – see the configuration file documentation
• Enable LLAP split locality - hive.llap.client.consistent.splits
• hive.llap.task.scheduler.locality.delay to tweak strict/relaxed locality
• Consider disabling CBO for interactive queries (test your queries!)
• Use parallel compilation in HS2 - hive.driver.parallel.compilation
• Shuffle improvement - tez.am.am-rm.heartbeat.interval-ms.max=5000
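For illustration, the settings named above might look as follows in a site configuration (values are examples, not tested recommendations; verify names and defaults against your Hive version's documentation):

```properties
# Illustrative values only -- tune per workload.
hive.llap.client.consistent.splits=true
hive.driver.parallel.compilation=true
tez.am.am-rm.heartbeat.interval-ms.max=5000
```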
Making the best use of LLAP – new cache stuff
• Text cache (not turned on in Ambari until 3.0!)
• hive.llap.io.encode.enabled,
hive.vectorized.use.(row|vector).serde.deserialize
• SSD cache (not turned on in Ambari until 3.0!)
• hive.llap.io.allocator.mmap; hive.llap.io.allocator.mmap.path
• hive.llap.io.memory.size controls the total cache size (on disk)
• You have to disable YARN memory check for now until YARN 2.8
• Cache on cloud FS
• hive.orc.splits.allow.synthetic.fileid,
hive.llap.cache.allow.synthetic.fileid
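A sketch of the cache settings above (names from the slides; the mmap path is a placeholder for a local SSD mount and the size is an example value):

```properties
# Text caching (zero-ETL analytics):
hive.llap.io.encode.enabled=true
# SSD cache; the path below is a placeholder for a local SSD mount:
hive.llap.io.allocator.mmap=true
hive.llap.io.allocator.mmap.path=/mnt/ssd/llap
# Total cache size (on disk when mmap is enabled); example value:
hive.llap.io.memory.size=80Gb
```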
LLAP without Ambari
• Requires Hive 2.X (2.2-2.3 are coming soon); ZK, YARN, Slider
• hive --service llap generates a slider package
• Run this as the correct user – slider paths are user-specific! kinit on secure cluster
• Specify a name, # of instances; memory, cache size, etc.; see --help
• Generates run.sh to start the cluster (in --output) directory
• Or use --startImmediately in newer versions
• Queries can be run from HS2 and CLI; basic configuration:
• hive.execution.mode=llap, hive.llap.execution.mode=all, hive.llap.io.enabled=true, hive.llap.daemon.service.hosts=@<cluster name>
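A hedged sketch of the manual deployment flow described above (option names and sizes are illustrative and vary by version; run hive --service llap --help to confirm, and run it as the user that will own the Slider paths):

```shell
# Generate a Slider package for a 4-node LLAP cluster (sizes follow the
# earlier sizing example; flags shown are examples, check --help).
hive --service llap --name llap_demo --instances 4 \
    --size 85g --xmx 72g --cache 10g --executors 24 \
    --output /tmp/llap_demo
# Start the cluster with the generated script (or use --startImmediately
# in newer versions):
/tmp/llap_demo/run.sh
```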

More Related Content

PPTX
Hive + Tez: A Performance Deep Dive
PDF
Optimizing Hive Queries
PDF
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
PPTX
Hive: Loading Data
PPTX
Introduction to Apache Kudu
PDF
Apache Iceberg - A Table Format for Hige Analytic Datasets
PPT
Chicago Data Summit: Apache HBase: An Introduction
PPTX
Reshape Data Lake (as of 2020.07)
Hive + Tez: A Performance Deep Dive
Optimizing Hive Queries
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Hive: Loading Data
Introduction to Apache Kudu
Apache Iceberg - A Table Format for Hige Analytic Datasets
Chicago Data Summit: Apache HBase: An Introduction
Reshape Data Lake (as of 2020.07)

What's hot (20)

PDF
Apache HBase Improvements and Practices at Xiaomi
PDF
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
PPTX
Understanding the Value and Architecture of Apache Drill
PPTX
Apache HBase Performance Tuning
PDF
A Thorough Comparison of Delta Lake, Iceberg and Hudi
PDF
Apache Iceberg Presentation for the St. Louis Big Data IDEA
PDF
Parquet performance tuning: the missing guide
PDF
The Parquet Format and Performance Optimization Opportunities
PDF
Module 2 - Datalake
PPTX
Securing Hadoop with Apache Ranger
PDF
Apache Iceberg: An Architectural Look Under the Covers
PDF
HBase Advanced - Lars George
PDF
Hive Data Modeling and Query Optimization
PPTX
HBase in Practice
PDF
Deep Dive: Memory Management in Apache Spark
PPTX
The Impala Cookbook
PDF
Intro to Delta Lake
PPTX
HBase Low Latency
PPTX
File Format Benchmark - Avro, JSON, ORC & Parquet
PPTX
Hive 3 - a new horizon
Apache HBase Improvements and Practices at Xiaomi
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Understanding the Value and Architecture of Apache Drill
Apache HBase Performance Tuning
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Parquet performance tuning: the missing guide
The Parquet Format and Performance Optimization Opportunities
Module 2 - Datalake
Securing Hadoop with Apache Ranger
Apache Iceberg: An Architectural Look Under the Covers
HBase Advanced - Lars George
Hive Data Modeling and Query Optimization
HBase in Practice
Deep Dive: Memory Management in Apache Spark
The Impala Cookbook
Intro to Delta Lake
HBase Low Latency
File Format Benchmark - Avro, JSON, ORC & Parquet
Hive 3 - a new horizon
Ad

Similar to LLAP: Building Cloud First BI (20)

PPTX
LLAP: Sub-Second Analytical Queries in Hive
PPTX
LLAP: Sub-Second Analytical Queries in Hive
PPTX
LLAP: Sub-Second Analytical Queries in Hive
PPTX
LLAP: Sub-Second Analytical Queries in Hive
PPTX
LLAP: Sub-Second Analytical Queries in Hive
PPTX
LLAP: long-lived execution in Hive
PPTX
Hive acid and_2.x new_features
PPTX
Apache Hive 2.0: SQL, Speed, Scale
PPTX
Stinger.Next by Alan Gates of Hortonworks
PPTX
Apache Hive 2.0: SQL, Speed, Scale
PPTX
Apache Hive 2.0; SQL, Speed, Scale
PPTX
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
PPTX
Apache Hive 2.0: SQL, Speed, Scale
PDF
What is New in Apache Hive 3.0?
PPTX
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
PDF
Sub-second-sql-on-hadoop-at-scale
PPTX
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...
PDF
What is new in Apache Hive 3.0?
PDF
Apache Hive 2.0 SQL, Speed, Scale by Alan Gates
PDF
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
LLAP: long-lived execution in Hive
Hive acid and_2.x new_features
Apache Hive 2.0: SQL, Speed, Scale
Stinger.Next by Alan Gates of Hortonworks
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0; SQL, Speed, Scale
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Apache Hive 2.0: SQL, Speed, Scale
What is New in Apache Hive 3.0?
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Sub-second-sql-on-hadoop-at-scale
Achieving Mega-Scale Business Intelligence Through Speed of Thought Analytics...
What is new in Apache Hive 3.0?
Apache Hive 2.0 SQL, Speed, Scale by Alan Gates
The Enterprise and Connected Data, Trends in the Apache Hadoop Ecosystem by A...
Ad

More from DataWorks Summit (20)

PPTX
Data Science Crash Course
PPTX
Floating on a RAFT: HBase Durability with Apache Ratis
PPTX
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
PDF
HBase Tales From the Trenches - Short stories about most common HBase operati...
PPTX
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
PPTX
Managing the Dewey Decimal System
PPTX
Practical NoSQL: Accumulo's dirlist Example
PPTX
HBase Global Indexing to support large-scale data ingestion at Uber
PPTX
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
PPTX
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
PPTX
Supporting Apache HBase : Troubleshooting and Supportability Improvements
PPTX
Security Framework for Multitenant Architecture
PDF
Presto: Optimizing Performance of SQL-on-Anything Engine
PPTX
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
PPTX
Extending Twitter's Data Platform to Google Cloud
PPTX
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
PPTX
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
PPTX
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
PDF
Computer Vision: Coming to a Store Near You
PPTX
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Data Science Crash Course
Floating on a RAFT: HBase Durability with Apache Ratis
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
HBase Tales From the Trenches - Short stories about most common HBase operati...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Managing the Dewey Decimal System
Practical NoSQL: Accumulo's dirlist Example
HBase Global Indexing to support large-scale data ingestion at Uber
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Security Framework for Multitenant Architecture
Presto: Optimizing Performance of SQL-on-Anything Engine
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Extending Twitter's Data Platform to Google Cloud
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Computer Vision: Coming to a Store Near You
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark

Recently uploaded (20)

PDF
Hybrid model detection and classification of lung cancer
PPTX
SOPHOS-XG Firewall Administrator PPT.pptx
PDF
Enhancing emotion recognition model for a student engagement use case through...
PPTX
Programs and apps: productivity, graphics, security and other tools
PDF
Transform Your ITIL® 4 & ITSM Strategy with AI in 2025.pdf
PDF
Accuracy of neural networks in brain wave diagnosis of schizophrenia
PDF
Microsoft Solutions Partner Drive Digital Transformation with D365.pdf
PDF
Hindi spoken digit analysis for native and non-native speakers
PDF
Heart disease approach using modified random forest and particle swarm optimi...
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PDF
ENT215_Completing-a-large-scale-migration-and-modernization-with-AWS.pdf
PPTX
Group 1 Presentation -Planning and Decision Making .pptx
PPTX
TechTalks-8-2019-Service-Management-ITIL-Refresh-ITIL-4-Framework-Supports-Ou...
PPTX
Chapter 5: Probability Theory and Statistics
PPTX
Tartificialntelligence_presentation.pptx
PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
Assigned Numbers - 2025 - Bluetooth® Document
PDF
A novel scalable deep ensemble learning framework for big data classification...
PDF
WOOl fibre morphology and structure.pdf for textiles
PPTX
TLE Review Electricity (Electricity).pptx
Hybrid model detection and classification of lung cancer
SOPHOS-XG Firewall Administrator PPT.pptx
Enhancing emotion recognition model for a student engagement use case through...
Programs and apps: productivity, graphics, security and other tools
Transform Your ITIL® 4 & ITSM Strategy with AI in 2025.pdf
Accuracy of neural networks in brain wave diagnosis of schizophrenia
Microsoft Solutions Partner Drive Digital Transformation with D365.pdf
Hindi spoken digit analysis for native and non-native speakers
Heart disease approach using modified random forest and particle swarm optimi...
MIND Revenue Release Quarter 2 2025 Press Release
ENT215_Completing-a-large-scale-migration-and-modernization-with-AWS.pdf
Group 1 Presentation -Planning and Decision Making .pptx
TechTalks-8-2019-Service-Management-ITIL-Refresh-ITIL-4-Framework-Supports-Ou...
Chapter 5: Probability Theory and Statistics
Tartificialntelligence_presentation.pptx
Encapsulation_ Review paper, used for researhc scholars
Assigned Numbers - 2025 - Bluetooth® Document
A novel scalable deep ensemble learning framework for big data classification...
WOOl fibre morphology and structure.pdf for textiles
TLE Review Electricity (Electricity).pptx

LLAP: Building Cloud First BI

  • 1. Page1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved LLAP: Building Cloud-First BI Sergey Shelukhin
  • 2. Page2 © Hortonworks Inc. 2011 – 2015. All Rights Reserved LLAP: Building Cloud-First BI • What is LLAP? Overview • Cloud-first BI • Efficient, scalable multi-user execution and caching • Secure • Universal (not just for Hive) • Production-ready tools • How to run LLAP
  • 3. Page3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Overview
  • 4. Page4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved What is LLAP? • Hybrid model combining daemons and containers for fast, concurrent execution of analytical workloads (e.g. Hive SQL queries) • Concurrent queries without specialized YARN queue setup • Multi-threaded execution of vectorized operator pipelines • Asynchronous IO and efficient in-memory caching • Relational view of the data available thru the API • High performance scans, code pushdown, centralized security • Not an "execution engine" (like Tez, MR, Spark) • Not a storage substrate – reads from HDFS/S3/… Node LLAP Process Cache Query Fragment HDFS Query Fragment
  • 5. Page5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved LLAP and Hive • Transparent to users, BI tools, etc. – HS2/JDBC is the access point [Diagram: ODBC/JDBC SQL queries enter HiveServer2 (query endpoint); query coordinators submit DAGs to LLAP daemons with query executors and an in-memory cache shared across all users, in a YARN cluster over HDFS-compatible deep storage (S3, WASB, Isilon)]
  • 6. Page6 © Hortonworks Inc. 2011 – 2015. All Rights Reserved LLAP and Hive • LLAP is Hive • Supports all file formats that Hive does • Hive Operators used for processing, same compiler, etc. • ⇒ Automatic support for most new optimizations and features • HS2 controls the concurrent queries (session pool) • Each Query coordinated by a Tez AM; LLAP hosts Tez shuffle • Can run in parallel with container-based jobs • After perf testing, we recommend running most Hive workloads in LLAP • In this new model, the cluster capacity is divided proportionally if needed
  • 7. Page7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved LLAP in a BI system - overview • No split brain – a single platform for ETL… • Fault-tolerant components and proven scalability (runs TPC-DS at 100 TB) • ANSI SQL, cost-based optimizer, ACID transactions • …and BI • Vectorized engine for fast processing • Intelligent scheduling of different workloads running together • Caching, incl. on local disk (SSD), to avoid expensive reads • Zero-ETL analytics with efficient caching of text data • External access, one metadata store, unified security
  • 8. Page8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Efficient BI queries
  • 9. Page9 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Short overview – execution • LLAP daemon has a number of executors (think containers) that run "fragments" • Fragments are parts of multiple parallel workloads (e.g. Hive SQL queries) • Work queue with pluggable priority • Geared towards low-latency queries over long-running queries (by default) • I/O is similar to containers – read/write to HDFS, shuffle, other storages and formats • Streaming output for data API [Diagram: LLAP queue feeding executors running fragments from several queries (maps, reducers, external reads), with shuffle inputs from HDFS, HBase, other LLAP daemons, and Spark executors]
  • 10. Page10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Efficiency for individual queries • Eliminates container startup costs • JIT optimizer has a chance to work (esp. for vectorization) • Data sharing (hash join tables, etc.) [Chart: query time in seconds for new containers vs cold LLAP vs warm LLAP (no cache, new user) vs 4th run of the query]
  • 11. Page11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Individual queries – LLAP vs Hive 1 (x26 faster) [Chart: per-query times for Hive 1 / Tez vs Hive 2 / LLAP, with speedup factor; lower query time is better]
  • 12. Page12 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Hive LLAP vs Impala; TPC-DS 10 TB on 9 nodes [Charts: total runtime in seconds, CDH 5.12 vs HDP 2.6 (lower is better); TPC-DS queries supported (higher is better): 60 for CDH 5.12 vs 99/99 for HDP 2.6]
  • 13. Page13 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Hive LLAP vs Impala (log scale; lower is better) [Chart: per-query runtimes on a log scale, HDP 2.6 (Hive LLAP) versus CDH 5.12 (Impala), across ~60 TPC-DS queries]
  • 14. Page14 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Parallel queries – priorities, preemption • Lower-priority fragments can be preempted • For example, a fragment can start running before its inputs are ready, for better pipelining; such fragments may be preempted • LLAP work queue examines the DAG parameters to give preference to interactive (BI) queries [Diagram: LLAP queue and executors running interactive query maps while a wide query's reduce waits; chart of cluster utilization over time for a long-running vs a short-running query]
  • 15. Page15 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Workload management (WIP - HIVE-17481)
  • 16. Page16 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Overview • Effectively share LLAP cluster resources • Resource allocation per user policy; separate ETL and BI, etc. • Resource-based guardrails • Protect against long-running queries, high memory usage • Improved, query-aware scheduling • Scheduler is aware of query characteristics, types, etc. • Fragments are easier to preempt than containers • Queries get guaranteed fractions of the cluster, but can use empty space [Images: "Theory" vs "Practice"]
  • 17. Page17 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Resource plans • Resource plan is a workload management configuration for a cluster • Switching is allowed without stopping queries, e.g. based on time of day • Cluster is divided into query pools (optionally nested) • Each pool defines query parallelism, cluster resources percentage • Queries are automatically routed to pools based on user name, app, etc. • Rules to kill, move, or deprioritize queries based on DFS usage, runtime, etc. • Example (commands may change in the final version): CREATE RESOURCE PLAN daytime; CREATE POOL bi IN daytime (resource_percent=75, concurrent_queries=5); CREATE POOL etl IN daytime (resource_percent=25, concurrent_queries=10); CREATE RULE downgrade IN daytime WHEN total_runtime > 300 THEN MOVE etl; ADD RULE downgrade TO bi; CREATE MAPPING tableau (application='Tableau', pool=bi); ALTER PLAN daytime SET default_pool='etl'; APPLY PLAN daytime;
  • 18. Page18 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Decentralized guaranteed resources • A guaranteed task for each resource (currently executor slots) • HS2 gives N guaranteed tasks to an AM based on the configured resource plan • Each AM marks N of its most important tasks as guaranteed at any given time • A guarantee is not a requirement to use the resource • Guaranteed tasks preempt speculative tasks • Future improvements – coordination improvements • E.g. try to place on the least busy nodes
  • 19. Page19 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Guaranteed tasks – BI and ETL example [Diagram: 18 executors total across 3 LLAP daemons with wait queues; HS2 splits guarantees between pools – BI (80% = 14 guaranteed) and ETL (20% = 4 guaranteed); Query 1: 10 active tasks running, 10 guaranteed (4 guarantees unused for now); Query 2: 19 active tasks (8 running), 4 guaranteed (4 running) and 15 speculative (4 running)]
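The 14/4 split in the example above follows from rounding each pool's percentage of the total executor slots. A toy sketch of that allocation (names and rounding scheme are mine for illustration, not Hive's actual scheduler code):

```python
def guaranteed_slots(total_executors, pool_percents):
    """Divide guaranteed executor slots among pools by resource percent.

    Uses largest-remainder rounding so the per-pool slots always sum
    to the total number of executors.
    """
    exact = {p: total_executors * pct / 100.0 for p, pct in pool_percents.items()}
    slots = {p: int(v) for p, v in exact.items()}  # floor each share
    leftover = total_executors - sum(slots.values())
    # hand leftover slots to the pools with the largest fractional remainder
    for p in sorted(exact, key=lambda p: exact[p] - slots[p], reverse=True)[:leftover]:
        slots[p] += 1
    return slots

# The slide's example: 18 executors, BI at 80%, ETL at 20%
print(guaranteed_slots(18, {"bi": 80, "etl": 20}))
```

With 18 executors, BI's exact share is 14.4 and ETL's is 3.6; flooring gives 14 + 3 = 17, and the one leftover slot goes to ETL (larger remainder), reproducing the 14/4 split from the diagram.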
  • 20. Page20 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Caching
  • 21. Page21 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Caching for BI workloads - basics • Fine-grained (columnar), compact (dictionary, RLE encoded) • Important due to projections over many wide EDW tables • Prioritized – indexes are cached with higher priority • Important to make use of PPD for BI query filters • Off-heap (no extra GC), supports SSD • Saves on cloud reads • LRFU replacement policy avoids the damage from large scans • Automatic coherence, flexible locality (hash based) • Locality can be strict, or create additional replicas to avoid hotspots
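The LRFU idea mentioned above (avoiding damage from large scans) can be illustrated with a toy policy, not Hive's implementation: each cached entry keeps a combined recency/frequency score that decays exponentially with age, so a buffer touched once by a huge scan fades quickly while genuinely hot buffers keep a high score. All names and the decay parameter below are illustrative:

```python
class LRFUCache:
    """Toy LRFU replacement policy sketch.

    Each entry's priority combines frequency and recency, decayed by
    2^(-lam * age). lam near 0 behaves like LFU; large lam approaches LRU.
    """
    def __init__(self, capacity, lam=0.5):
        self.capacity, self.lam = capacity, lam
        self.clock = 0
        self.crf = {}  # key -> (combined recency/frequency score, last access time)

    def _decay(self, score, last):
        return score * 2.0 ** (-self.lam * (self.clock - last))

    def access(self, key):
        """Record an access; return the evicted key, if any."""
        self.clock += 1
        score, last = self.crf.get(key, (0.0, self.clock))
        self.crf[key] = (1.0 + self._decay(score, last), self.clock)
        if len(self.crf) > self.capacity:
            # evict the entry with the lowest decayed score as of now
            victim = min(self.crf, key=lambda k: self._decay(*self.crf[k]))
            del self.crf[victim]
            return victim
        return None
```

In a quick experiment, a key accessed five times in a row survives a subsequent one-pass scan of three cold keys: the scan's own entries have lower decayed scores, so one of them is evicted instead of the hot entry, which is exactly the scan-resistance the slide claims for LRFU.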
  • 22. Page22 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Caching for BI workloads – formats, zero-ETL • On-prem, does not make large queries on ORC much faster • On S3/Azure though, reads are much slower • Even the disk cache is still much faster than FS reads • Especially if text is involved • Cache supports columnar data and text • ORC cached natively; in 3.0, Parquet is also cached natively • Possible future improvement – move Parquet caching onto IO threads • Zero-ETL analytics on CSV and JSON data with text caching • Text is efficiently encoded in background; once cached, queries speed up
  • 23. Page23 © Hortonworks Inc. 2011 – 2015. All Rights Reserved In-memory processing – native columnar (ORC) [Diagram: I/O threads run the read planner, ORC decoder, and compression codec over compressed data from the distributed FS; decoded columns land in the off-heap cache (with SSD tier and replacement policy) as compact encoded data; execution threads run fragments of Hive operators over native data vectors with vectorized processing]
  • 24. Page24 © Hortonworks Inc. 2011 – 2015. All Rights Reserved In-memory processing – columnar (Parquet) [Diagram: the Parquet reader on the execution thread reads compressed data from the distributed FS; a cache coordinator places compact column data in the off-heap cache (with SSD tier and replacement policy); fragments of Hive operators process native data vectors with vectorized processing]
  • 25. Page25 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Up to 10x speed-up with a cloud FS (100 TB dataset, SSD) [Chart: per-query times, cold vs with cache, and the speedup factor per query] • 1.5x overall • 2.1x small queries
  • 26. Page26 © Hortonworks Inc. 2011 – 2015. All Rights Reserved In-memory processing – text – the first read [Diagram: on the first read, I/O threads run the read planner and a text "decoder" over compressed data from the distributed FS; an "ORC lite" encoder writes compact encoded data into the off-heap/SSD cache while execution threads process native data vectors with vectorized Hive operators]
  • 27. Page27 © Hortonworks Inc. 2011 – 2015. All Rights Reserved In-memory processing – text – the second read [Diagram: on the second read, the ORC decoder serves columns straight from the compact encoded data in the off-heap/SSD cache; the distributed FS and the text decoder are bypassed]
  • 28. Page28 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Up to 38x speed-up with text cache + cloud FS
  • 29. Page29 © Hortonworks Inc. 2011 – 2015. All Rights Reserved External access
  • 30. Page30 © Hortonworks Inc. 2011 – 2015. All Rights Reserved External access – relational view for everyone • Hive-on-Tez and other DAG executors can use LLAP directly • LLAP also provides a "relational datanode" view of the data • Anyone (with access) can push the (approved) code in, from complex query fragments to simple data reads • E.g. a Spark DataFrame can be created with LlapInputFormat • Gives external services access to • Hive data: centralized, secure data access • Ability to read all Hive table types, like ACID transactional tables • Hive features: from column-level security to the LLAP columnar cache
  • 31. Page31 © Hortonworks Inc. 2011 – 2015. All Rights Reserved SparkSQL+LLAP example • Ranger for Hive supports cell-level security and masking • With SparkSQL utilizing LLAP, this can be used from Spark • More at https://hortonworks.com/blog/row-column-level-control-apache-spark/
  • 32–33. Pages 32–33 © Hortonworks Inc. 2011 – 2015. All Rights Reserved SparkSQL+LLAP example (continued) [Screenshots illustrating the example]
  • 34. Page34 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Security
  • 35. Page35 © Hortonworks Inc. 2011 – 2015. All Rights Reserved LLAP security – the EDW usage patterns • Single, centrally administered LLAP cluster for all users • For now, separate ad hoc clusters cannot use Ambari • Use Ranger • Hive SQL standard auth is an option; doAs is not recommended • Hive session AMs and LLAP run as hive superuser; managed by HS2 • HS2 serves as a central coordinator for security • Beeline and JDBC access; no CLI (requires client kinit) • HS2 checks permissions, enforces Ranger policies • Coordinates the usual Hadoop security dance (tokens for tasks, etc.)
  • 36. Page36 © Hortonworks Inc. 2011 – 2015. All Rights Reserved LLAP security – internals with Tez [Diagram: HS2 sessions and Tez AMs pass tokens (TK) to LLAP daemons; tasks and jobs carry tokens for FS access; ZK paths are protected with ACLs; Ranger policies are enforced centrally]
  • 37. Page37 © Hortonworks Inc. 2011 – 2015. All Rights Reserved LLAP security – external (Spark) additions [Diagram: the Spark client obtains signed fragment specs and keys via an HS2 session; LLAP daemons verify the signatures before running the tasks]
  • 38. Page38 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Integration and tools
  • 39. Page39 © Hortonworks Inc. 2011 – 2015. All Rights Reserved LLAP in Ambari (and HDP) • Ambari 2.5 + HDP 2.6 = LLAP GA • The latest recommended update is HDP 2.6.2.0 • Do not use Tech Preview versions; use GA • Separate version of Hive, "Hive Interactive" • Ambari 3.0 – no more separate versions • Enable "Interactive Query" in Hive tab • A default configuration is chosen; more on that later • No Ambari? Will cover this later
  • 40. Page40 © Hortonworks Inc. 2011 – 2015. All Rights Reserved LLAP on the cloud • LLAP is in HDC (Hortonworks Data Cloud) • For quick, automated cluster deployments on AWS • Also available on Azure HDInsight • Details and links at the end!
  • 41. Page41 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Tez UI – queries and LLAP integration
  • 42. Page42 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Tez UI – query swimlane
  • 43. Page43 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Tez UI – DAG swimlane with LLAP counters
  • 44. Page44 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Tez UI – query information and debug data
  • 45. Page45 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Monitoring • LLAP exposes a UI for monitoring • Aggregate monitoring UI is work in progress • Lots of debug views and tools – JMX, logging, etc. – see backup slides
  • 46. Page46 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Making the best use of LLAP - summary • Java 8, G1GC; some kernel configs, esp. for TCP connections • Ambari comes with reasonable Hive perf configuration • May still need tweaking for specific workload; more so w/o Ambari • SSD and text cache require "advanced config" until Ambari 3.0 • LLAP cluster sizing in a nutshell • AM per query (+ 1-2), AMs on all nodes; 2Gb RAM per AM, rest to LLAP • (Executor+IO thread) per core, 3-4Gb RAM per executor, rest to cache • Without Ambari, see hive --service llap command + Slider • See backup slides for some details ;)
  • 47. Page47 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Future work for manageability • UDF, configuration changes without affecting queries (rolling restart) • In-place restart of the daemons – preserves cache • Ambari integration for faster startup in common cases • Improved scale-up/down (with Ambari and Slider integration)
  • 48. Page48 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Summary LLAP provides a • unified, managed and secure cloud-ready EDW solution for BI and ETL workloads via a • fast execution substrate harnessing Hive vectorized SQL engine and • efficient in-memory caching layer for columnar and text data
  • 49. Page49 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Try Hive LLAP Today on-prem or in the Cloud Hortonworks Data Platform 2.6 Powered by 100% open source Apache Hadoop http://hortonworks.com/downloads/ Hortonworks Data Cloud Easy HDP on Amazon Web Services http://hortonworks.com/products/cloud/aws/ Microsoft Azure HDInsight A cloud Spark and Hadoop service for your enterprise http://azure.microsoft.com/en-us/services/hdinsight/
  • 50. Page50 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Questions? ? Interested? Stop by the Hortonworks booth to learn more
  • 51. Page51 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Backup slides
  • 52. Page52 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Tez UI – LLAP counters
  • 53. Page53 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Debugging • JMX view (:15002/jmx) contains many detailed metrics • E.g. "ExecutorsState" shows the running tasks and their state • Log lines for most operations are annotated with task attempt # • By default, stores separate log files per query • Logs can be downloaded (yarn logs ...) even for a running LLAP app • Log file contains the statements annotated for the query • File name contains session application ID and DAG# - "dag_TTTT_MMM_N" - e.g. dag_1490656001509_4602_1 for Tez AM application_1490656001509_4602
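The log-file naming rule above (dag_TTTT_MMM_N derived from the Tez AM's YARN application ID) can be expressed as a tiny helper; this is an illustrative function of mine, not a Hive utility:

```python
def dag_log_name(application_id, dag_number):
    """Build the per-query log file tag dag_<cluster_ts>_<app_seq>_<dag#>
    from a YARN application ID like application_1490656001509_4602."""
    prefix, cluster_ts, app_seq = application_id.split("_")
    if prefix != "application":
        raise ValueError(f"not a YARN application ID: {application_id}")
    return f"dag_{cluster_ts}_{app_seq}_{dag_number}"

# The slide's example: DAG #1 of Tez AM application_1490656001509_4602
print(dag_log_name("application_1490656001509_4602", 1))  # dag_1490656001509_4602_1
```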
  • 54. Page54 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Making the best use of LLAP – OS and Java • Java 8 strongly recommended • G1 GC recommended (e.g. --args " -XX:+UseG1GC -XX:+ResizeTLAB -XX:-ResizePLAB") • Kernel settings: sysctl -w net.core.somaxconn=16384; echo "never" > /sys/kernel/mm/transparent_hugepage/enabled; echo "never" > /sys/kernel/mm/transparent_hugepage/defrag; /etc/init.d/nscd restart
  • 55. Page55 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Making the best use of LLAP – cluster sizing 1. Pick the number of parallel queries (not just sessions) • AM per query + some constant slack 2. Pick executors per node, and io.threadpool count (# of cores per node) 3. Determine total memory size for LLAP - subtract the AM(s) from each NM (YARN memory per node) • 1 AM = 2 Gb; best to spread the AMs across each node 4. Determine the Xmx for LLAP; one executor (core) == 3-4Gb 5. Determine cache size - from the total, take out Xmx + ~3Gb (if lower, 20%) • ~3Gb for Java overhead, shared overhead 6. Tweak based on workload?
  • 56. Page56 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Cluster sizing example • 6 node cluster – 100Gb RAM each, 24 cores, NM Size = 96Gb, 25 concurrent queries • 28 AMs; total AM memory = 28*2 = 56 • (96*6 – 28*2)/6 = 86.66 => "Memory per daemon" = 85Gb • "LLAP heap size" (Xmx) = 3Gb*24 = 72Gb • "Cache size" = 85Gb – 72Gb – 3Gb ~= 10Gb
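The sizing walkthrough above can be checked with a small calculator. The function and parameter names are mine; the 2 GB per AM, 3 GB per executor, and ~3 GB overhead figures are the rules of thumb from the previous slide. Note the slide rounds the 86.66 GB per-daemon figure down a little further, to 85 GB, which is why its cache ends up at ~10 GB rather than the 11 GB computed here:

```python
def llap_sizing(nodes, nm_gb, cores, concurrent_queries, slack_ams=3,
                am_gb=2, gb_per_executor=3, overhead_gb=3):
    """Rule-of-thumb LLAP sizing from the slides.

    AM memory comes out of total YARN memory first; the daemon Xmx is
    executors * per-executor GB; what remains after Xmx and JVM/shared
    overhead becomes cache.
    """
    ams = concurrent_queries + slack_ams                  # AM per query + slack
    daemon_gb = (nm_gb * nodes - ams * am_gb) // nodes    # per-daemon container size
    xmx_gb = gb_per_executor * cores                      # executor heap (Xmx)
    cache_gb = daemon_gb - xmx_gb - overhead_gb
    return {"ams": ams, "daemon_gb": daemon_gb, "xmx_gb": xmx_gb, "cache_gb": cache_gb}

# The example cluster: 6 nodes, 96 GB NM size, 24 cores, 25 concurrent queries
print(llap_sizing(nodes=6, nm_gb=96, cores=24, concurrent_queries=25))
```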
  • 57. Page57 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Making the best use of LLAP – Hive settings • Ambari has a lot of this configured by default • The basics - use ORC; enable PPD, configure mapjoin size, etc. • Enable vectorization and consider new vectorization features • hive.vectorized.execution.*.enabled – see the configuration file documentation • Enable LLAP split locality - hive.llap.client.consistent.splits • hive.llap.task.scheduler.locality.delay to tweak strict/relaxed locality • Consider disabling CBO for interactive queries (test your queries!) • Use parallel compilation in HS2 - hive.driver.parallel.compilation • Shuffle improvement - tez.am.am-rm.heartbeat.interval-ms.max=5000
  • 58. Page58 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Making the best use of LLAP – new cache stuff • Text cache (not turned on in Ambari until 3.0!) • hive.llap.io.encode.enabled, hive.vectorized.use.(row|vector).serde.deserialize • SSD cache (not turned on in Ambari until 3.0!) • hive.llap.io.allocator.mmap; hive.llap.io.allocator.mmap.path • hive.llap.io.memory.size controls the total cache size (on disk) • You have to disable YARN memory check for now until YARN 2.8 • Cache on cloud FS • hive.orc.splits.allow.synthetic.fileid, hive.llap.cache.allow.synthetic.fileid
  • 59. Page59 © Hortonworks Inc. 2011 – 2015. All Rights Reserved LLAP without Ambari • Requires Hive 2.X (2.2-2.3 are coming soon); ZK, YARN, Slider • hive --service llap generates a slider package • Run this as the correct user – slider paths are user-specific! kinit on secure cluster • Specify a name, # of instances; memory, cache size, etc.; see --help • Generates run.sh to start the cluster (in --output) directory • Or use --startImmediately in newer versions • Queries can be run from HS2 and CLI; basic configuration: • hive.execution.mode=llap, hive.llap.execution.mode=all, hive.llap.io.enabled=true, hive.llap.daemon.service.hosts=@<cluster name>
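Pulling the commands and settings from this slide together, a no-Ambari bring-up might look like the following sketch. The cluster name, instance count, and size values are illustrative, and exact flag names can vary across Hive 2.x versions; check hive --service llap --help on your install:

```shell
# On a secure cluster, kinit first; run as the service user,
# since Slider paths are user-specific
hive --service llap --name llap_demo --instances 4 \
     --size 85g --xmx 72g --cache 10g --executors 24 \
     --output /tmp/llap_demo

# Start the cluster with the generated script
/tmp/llap_demo/run.sh

# Basic session settings (e.g. via beeline "set key=value;") to route
# queries to the LLAP cluster:
#   hive.execution.mode=llap
#   hive.llap.execution.mode=all
#   hive.llap.io.enabled=true
#   hive.llap.daemon.service.hosts=@llap_demo
```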
