Dynamic Resource Allocation for Spark on YARN

Tsuyoshi Ozawa (ozawa@apache.org)

What's YARN
• A resource manager implementation for computer clusters

Hadoop Stack
[Figure: the Hadoop stack, with HDFS at the bottom, YARN above it, and MapReduce, Spark, and Tez on top.]

YARN overview
• All resources are managed by the ResourceManager
• All tasks run in containers launched by NodeManagers
• Clients submit jobs to the ResourceManager
[Figure: a client talks to the ResourceManager, which manages several NodeManagers.]
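To make this division of labour concrete, here is a toy Python model. It assumes a simplified world where memory is the only resource and the ResourceManager places each container request on the first NodeManager with enough free memory. The class and method names (ResourceManager.allocate, release) are hypothetical and are not YARN's API; real YARN schedulers such as the Capacity Scheduler or the Fair Scheduler are far more involved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class NodeManager:
    """A node that offers memory (in MB) for containers."""
    name: str
    free_mb: int
    containers: Dict[str, int] = field(default_factory=dict)  # container id -> MB


@dataclass
class ResourceManager:
    """Toy RM (hypothetical): first-fit placement of memory-only requests."""
    nodes: List[NodeManager]
    next_id: int = 0

    def allocate(self, app_id: str, mb: int) -> Optional[Tuple[str, str]]:
        for node in self.nodes:
            if node.free_mb >= mb:
                node.free_mb -= mb
                self.next_id += 1
                container_id = f"{app_id}_container_{self.next_id}"
                node.containers[container_id] = mb
                return container_id, node.name
        return None  # no node has room; a real RM would keep the request pending

    def release(self, container_id: str) -> bool:
        for node in self.nodes:
            if container_id in node.containers:
                node.free_mb += node.containers.pop(container_id)
                return True
        return False


rm = ResourceManager([NodeManager("nm1", 4096), NodeManager("nm2", 4096)])
print(rm.allocate("app1", 2048))  # ('app1_container_1', 'nm1')
```

Releasing a container returns its memory to the NodeManager that hosted it, so later requests can be placed there again.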

Spark on YARN
• Two modes
  • yarn-cluster
  • yarn-client

yarn-cluster mode
• The Spark driver runs inside a YARN container
• Works well with spark-submit
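[Figure: the client submits the application to the ResourceManager (1), which launches the Spark AppMaster, hosting the Spark driver, in a container on a NodeManager (2); the AppMaster then launches executors in containers on other NodeManagers (3).]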
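In yarn-cluster mode the job is normally started with spark-submit. The Python helper below is a sketch that only assembles the command line, and app.jar and com.example.Main are placeholder names. It can emit either the Spark 1.x master URL (yarn-cluster) used in this deck or the "--master yarn --deploy-mode cluster" form that newer releases expect, because the yarn-cluster and yarn-client master strings were deprecated in Spark 2.0.

```python
import shlex
from typing import List


def spark_submit_cmd(app_jar: str, main_class: str,
                     deploy_mode: str = "cluster", legacy: bool = False) -> List[str]:
    """Assemble (but do not run) a spark-submit command for YARN."""
    if deploy_mode not in ("cluster", "client"):
        raise ValueError("deploy_mode must be 'cluster' or 'client'")
    if legacy:
        # Spark 1.x style: the deploy mode is part of the master URL.
        master_args = ["--master", f"yarn-{deploy_mode}"]
    else:
        # Spark 2.0+ style: plain "yarn" master plus an explicit deploy mode.
        master_args = ["--master", "yarn", "--deploy-mode", deploy_mode]
    return ["spark-submit", *master_args, "--class", main_class, app_jar]


print(shlex.join(spark_submit_cmd("app.jar", "com.example.Main", legacy=True)))
# spark-submit --master yarn-cluster --class com.example.Main app.jar
```

Returning an argument list rather than a single string avoids quoting problems if the result is later passed to subprocess.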

yarn-client mode
• The Spark driver runs on the client side
• Works well with spark-shell
[Figure: the client submits the application to the ResourceManager (1), which launches the AppMaster (2); the AppMaster launches the executors (3), and the Spark driver on the client sends commands to them (4).]
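The practical difference between the two modes is where the driver lives, and that determines who issues the commands to the executors. Below is a small Python model of the numbered steps in the diagrams, assuming the simplified four-step sequence they show. The function name launch_sequence is made up for illustration.

```python
from typing import List, Tuple


def launch_sequence(mode: str) -> Tuple[str, List[str]]:
    """Return (where the driver runs, launch steps) for a Spark-on-YARN mode."""
    steps = [
        "client submits the application to the ResourceManager",
        "ResourceManager launches the AppMaster in a container",
        "AppMaster launches executors in further containers",
    ]
    if mode == "yarn-cluster":
        # The driver runs inside the AppMaster, so the client can go away after submitting.
        return "AppMaster container", steps + ["driver (in the AppMaster) sends tasks to executors"]
    if mode == "yarn-client":
        # The driver stays in the client process, e.g. an interactive spark-shell.
        return "client process", steps + ["driver (on the client) sends commands to executors"]
    raise ValueError(f"unknown mode: {mode}")


for mode in ("yarn-cluster", "yarn-client"):
    where, steps = launch_sequence(mode)
    print(f"{mode}: driver runs in the {where}; step 4: {steps[3]}")
```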

Spark on YARN
• yarn-cluster mode
[Figure: yarn-cluster mode on three nodes (Node1, Node2, Node3), with the AppMaster and executor containers placed on different nodes.]

Problem
• Inefficient resource management
• Containers cannot exit until the job exits, even when they are idle
[Figure: four containers on Node1 and Node2. In stage1 all four run at 100%; in stage2 one runs at 100% and the other three sit at 0%, yet all four stay allocated.]
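The waste in the figure can be quantified with a little arithmetic. The sketch below assumes both stages take the same amount of time and treats each container as either fully busy (1.0) or idle (0.0), matching the percentages in the diagram.

```python
from typing import List

# Busy fraction of each of the four containers, per stage (from the figure).
stages: List[List[float]] = [
    [1.0, 1.0, 1.0, 1.0],  # stage1: every container is working
    [1.0, 0.0, 0.0, 0.0],  # stage2: one works, three are held but idle
]


def stage_utilization(stages: List[List[float]]) -> List[float]:
    return [sum(busy) / len(busy) for busy in stages]


def idle_containers(stages: List[List[float]]) -> List[int]:
    return [sum(1 for b in busy if b == 0.0) for busy in stages]


def overall_utilization(stages: List[List[float]]) -> float:
    # Assumes every stage lasts the same amount of time.
    return sum(map(sum, stages)) / sum(map(len, stages))


print(stage_utilization(stages))    # [1.0, 0.25]
print(idle_containers(stages))      # [0, 3]
print(overall_utilization(stages))  # 0.625
```

Under these assumptions only 62.5% of the reserved container time does useful work, and all of the waste comes from the three containers that cannot be released during stage2.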

Dynamic resource allocation (since v1.2)
• Containers are allocated more dynamically
• The number of executors is decided by the workload
[Figure: the client submits the application to the ResourceManager (1), which launches the AppMaster (2); the AppMaster then launches executors or kills them as the Spark driver's workload changes (3).]
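According to the Spark documentation, executors are requested in rounds while tasks remain pending. The first round starts after spark.dynamicAllocation.schedulerBacklogTimeout, and later rounds follow every spark.dynamicAllocation.sustainedSchedulerBacklogTimeout. The request size doubles each round (1, 2, 4, 8, ...). An executor is removed once it has been idle longer than spark.dynamicAllocation.executorIdleTimeout (60 seconds by default). The Python sketch below models only those two rules. It assumes a fixed target number of executors capped by maxExecutors and a floor of minExecutors, and it abstracts time into rounds. The function names are hypothetical.

```python
from typing import Dict, List


def ramp_up(target: int, max_executors: int, start: int = 0) -> List[int]:
    """Executor count after each request round while tasks remain backlogged.

    Each round adds twice as many executors as the previous one (1, 2, 4, ...),
    never exceeding min(target, max_executors).
    """
    cap = min(target, max_executors)
    current, to_add, history = start, 1, []
    while current < cap:
        current = min(current + to_add, cap)
        history.append(current)
        to_add *= 2
    return history


def executors_to_remove(idle_seconds: Dict[str, float],
                        idle_timeout: float = 60.0,
                        min_executors: int = 0) -> List[str]:
    """Executors idle longer than idle_timeout, keeping at least min_executors alive."""
    candidates = sorted(e for e, idle in idle_seconds.items() if idle > idle_timeout)
    removable = max(0, len(idle_seconds) - min_executors)
    return candidates[:removable]


print(ramp_up(target=20, max_executors=10))                 # [1, 3, 7, 10]
print(executors_to_remove({"e1": 0, "e2": 120, "e3": 90}))  # ['e2', 'e3']
```

Exponential ramp-up lets a short burst of work start with few executors while still reaching a large pool quickly when the backlog persists.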

Yak shaving
• Where should we keep the state of Spark RDDs?
• If executors are killed, it will be lost…
[Figure: a NodeManager running two executors, each holding its own RDD data.]

External shuffle
• Spark RDD intermediate (shuffle) files are kept by the NodeManager
• NodeManager has a plugin interface (auxiliary services), and Spark provides an external shuffle plugin for it
• Now executors are stateless!
[Figure: a NodeManager running two executors; the RDD intermediate files are held by the external shuffle plugin in the NodeManager rather than by the executors.]
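The effect of the shuffle plugin on executor removal can be shown with a toy Python model. It assumes each intermediate block is served either by the executor that wrote it or, when the external shuffle service is enabled, by the NodeManager-side service. The Node class and its methods are hypothetical and are not Spark's API.

```python
from typing import Dict, Optional


class Node:
    """Toy NodeManager host: who can still serve a shuffle block after executors go away?"""

    def __init__(self, external_shuffle: bool) -> None:
        self.external_shuffle = external_shuffle
        self.service_blocks: Dict[str, bytes] = {}              # served by the NodeManager plugin
        self.executor_blocks: Dict[str, Dict[str, bytes]] = {}  # served by each executor

    def write_block(self, executor: str, block: str, data: bytes) -> None:
        if self.external_shuffle:
            self.service_blocks[block] = data
        else:
            self.executor_blocks.setdefault(executor, {})[block] = data

    def kill_executor(self, executor: str) -> None:
        # Dynamic allocation removed this executor; blocks only it could serve are gone.
        self.executor_blocks.pop(executor, None)

    def fetch(self, block: str) -> Optional[bytes]:
        if block in self.service_blocks:
            return self.service_blocks[block]
        for blocks in self.executor_blocks.values():
            if block in blocks:
                return blocks[block]
        return None  # lost: the stage that produced it would have to be recomputed


for enabled in (False, True):
    node = Node(external_shuffle=enabled)
    node.write_block("exec1", "shuffle_0_0_0", b"map output")
    node.kill_executor("exec1")
    print(enabled, node.fetch("shuffle_0_0_0"))
```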

How to install (with Apache Hadoop)
• Copy the shuffle plugin jar to the NodeManager's classpath
• Edit yarn-site.xml and restart the NodeManagers
• Edit spark-defaults.conf

Copy the shuffle jar to the NodeManager's classpath
$ cp lib/spark-*-yarn-shuffle.jar /home/ubuntu/hadoop/share/hadoop/yarn/

Edit yarn-site.xml
• Add the shuffle plugin as a NodeManager auxiliary service
• Note that the documentation for 1.2 contains a typo (I sent a PR to fix it :-))
• See the documentation for 1.4
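The change consists of two NodeManager properties described in the Spark "Running on YARN" documentation. First, add spark_shuffle to yarn.nodemanager.aux-services. Second, set yarn.nodemanager.aux-services.spark_shuffle.class to org.apache.spark.network.yarn.YarnShuffleService. The Python sketch below applies those two settings to an existing yarn-site.xml string using the standard library's ElementTree, and it keeps any services already listed (such as mapreduce_shuffle). The helper names are made up, and XML comments in the original file are dropped by the default parser.

```python
import xml.etree.ElementTree as ET
from typing import Optional

AUX_SERVICES = "yarn.nodemanager.aux-services"
SHUFFLE_CLASS = "org.apache.spark.network.yarn.YarnShuffleService"


def get_property(root: ET.Element, name: str) -> Optional[str]:
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def set_property(root: ET.Element, name: str, value: str) -> None:
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            break
    else:  # not found: append a new <property>
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
    value_el = prop.find("value")
    if value_el is None:
        value_el = ET.SubElement(prop, "value")
    value_el.text = value


def enable_spark_shuffle(yarn_site_xml: str) -> str:
    """Register Spark's shuffle service as a NodeManager auxiliary service."""
    root = ET.fromstring(yarn_site_xml)
    current = get_property(root, AUX_SERVICES) or ""
    services = [s.strip() for s in current.split(",") if s.strip()]
    if "spark_shuffle" not in services:
        services.append("spark_shuffle")  # keep existing services, e.g. mapreduce_shuffle
    set_property(root, AUX_SERVICES, ",".join(services))
    set_property(root, AUX_SERVICES + ".spark_shuffle.class", SHUFFLE_CLASS)
    return ET.tostring(root, encoding="unicode")
```

The function is idempotent: running it on its own output changes nothing. The NodeManagers still have to be restarted before they load the new auxiliary service.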

We're ready!!
• The number of executors (num-executors) is decided automatically
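On the Spark side, dynamic allocation is switched on in spark-defaults.conf with two settings. spark.dynamicAllocation.enabled turns the feature on, and spark.shuffle.service.enabled makes Spark use the NodeManager shuffle plugin. spark.dynamicAllocation.minExecutors and spark.dynamicAllocation.maxExecutors bound the executor count. The Python sketch below renders and checks such a file. The bounds 1 and 20 are arbitrary example values, and the helper names are hypothetical.

```python
from typing import Dict

# Example settings; the min/max bounds are arbitrary and should be tuned per cluster.
DYNAMIC_ALLOCATION: Dict[str, str] = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "20",
}


def render_spark_defaults(settings: Dict[str, str]) -> str:
    """One "key value" pair per line, the format spark-defaults.conf uses."""
    return "".join(f"{key} {value}\n" for key, value in settings.items())


def parse_spark_defaults(text: str) -> Dict[str, str]:
    conf: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # key, then the rest of the line as the value
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def dynamic_allocation_ready(conf: Dict[str, str]) -> bool:
    required = ("spark.dynamicAllocation.enabled", "spark.shuffle.service.enabled")
    return all(conf.get(key, "").lower() == "true" for key in required)


print(render_spark_defaults(DYNAMIC_ALLOCATION), end="")
```

With both flags set, jobs no longer need an explicit executor count; the bounds only limit how far dynamic allocation may scale.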

Summary
• Spark on YARN
  • yarn-client mode
  • yarn-cluster mode
• Spark can launch jobs efficiently on YARN with dynamic allocation
