Copyright©2016 NTT corp. All Rights Reserved.
YARN: A Resource Manager for Analytic Platform
Tsuyoshi Ozawa
ozawa.tsuyoshi@lab.ntt.co.jp
ozawa@apache.org
About me
• Tsuyoshi Ozawa
  • Research Engineer @ NTT
  • Twitter: @oza_x86_64
  • Over 150 patch reviews in 2015
  • Apache Hadoop Committer and PMC member
  • Author of "Introduction to Hadoop, 2nd Edition" (Japanese), Chapter 22 (YARN)
  • Online article on gihyo.jp: "Why and How does Hadoop work?"
Agenda
• Three features of YARN
• Behavior of DAG processing engines on YARN
  • Case study 1: Tez on YARN
  • Case study 2: Spark on YARN
  • How they work, and best practices
YARN
• A resource manager for Apache Hadoop
  • Shares resources not only across MapReduce jobs, but also across other processing frameworks
• Three kinds of features:
  1. Managing resources in the cluster
  2. Managing applications' history logs
  3. A mechanism for users to discover the locations of services running on YARN
YARN features
1. Resource Management
Why manage resources with YARN?
• Different data analytics platforms are chosen based on workloads
  • Google uses Tenzing/MapReduce for batch/ETL processing, BigQuery for trial and error, and TensorFlow for machine learning
• There are two ways to do so:
  1. Separating clusters per workload
    • Pros: easy to separate workloads and resources
    • Cons: difficult to copy big data between clusters
  2. Living together in the same cluster
    • Pros: no need to copy big data between clusters
    • Cons: difficult to separate resources
Design policy of YARN
• Hadoop puts weight on scalability, since Hadoop is for processing big data!
  • Hence, YARN puts weight on scalability too
• Different frameworks live together in the same cluster, without moving data
• A master is launched per job for each processing framework
  • In Hadoop MapReduce v1, the single MR master gets overloaded and hits a scalability limit in large-scale processing
Architecture of YARN
• Master-worker architecture
  • The master (ResourceManager) manages resources in the cluster
  • Each worker (NodeManager) manages the containers on its machine
• Job flow (from the architecture diagram):
  1. The client submits a job to the ResourceManager
  2. The ResourceManager allocates a container for the application master and launches it
  3. The master requests containers from the RM and launches workers in those containers
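The three-step launch above can be tried with Hadoop's bundled distributed-shell example application; a hedged sketch (the jar path and version are illustrative and depend on your installation):

```shell
# Submits a toy YARN application: the Client asks the RM to launch an
# ApplicationMaster, which then requests 3 containers and runs `hostname`
# in each of them (illustrative jar path; adjust to your Hadoop layout).
yarn jar "$HADOOP_HOME"/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-*.jar \
  org.apache.hadoop.yarn.applications.distributedshell.Client \
  -jar "$HADOOP_HOME"/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-*.jar \
  -shell_command hostname \
  -num_containers 3
```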
Issue of resource management in YARN
• Issue
  • Allocated containers are not monitored: a container may be allocated but never fully utilize its resources
• Proposal from the YARN community: YARN-1011 (Karthik, Cloudera)
  • Monitor all allocated containers and launch extra tasks (overcommitting) on resources that are not being used
• Effect
  • A NodeManager can utilize more resources per node
YARN features
2. Managing history logs
What is a history log?
• Information about applications that are running or have completed
  • Useful for performance tuning and debugging!
• Types of history logs:
  1. Common information that all YARN applications have
    • Allocated containers, names of used queues, etc.
  2. Application-specific information
    • E.g., in the MapReduce case: Mapper/Reducer counters, amount of shuffled data, etc.
• The server that preserves and displays history logs is the Timeline Server
What can the Timeline Server do?
• Show a generic view of any YARN application, using "YARN's common information"
• (Screenshot of the Timeline Server)
What can the Timeline Server do?
• Rich performance analysis with YARN application-specific information
  • Example: Tez UI
• Figure: http://hortonworks.com/wp-content/uploads/2014/09/tez_2.png
Example of Tez UI
• We can check job statistics in detail
• From: http://hortonworks.com/wp-content/uploads/2015/06/Tez-UI-1024x685.png
Design philosophy of the Timeline Server
• Scalability in the number of applications
• Easy access to history logs
  • Users retrieve data via a RESTful API
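In practice, users fetch entities from the v1 Timeline Server over HTTP (e.g. a GET on `/ws/v1/timeline/{entityType}` on the server's web port) and parse the JSON it returns. A minimal, self-contained sketch using a hand-written sample payload (the entity values below are made up for illustration; the field names follow the v1 REST API):

```python
import json

# Illustrative sample of a Timeline Server v1 entity response
# (field names follow the v1 REST API; the values are made up).
sample = """
{
  "entities": [
    {
      "entitytype": "TEZ_DAG_ID",
      "entity": "dag_1452000000000_0001_1",
      "starttime": 1452000000000,
      "events": [
        {"eventtype": "DAG_STARTED", "timestamp": 1452000000000}
      ]
    }
  ]
}
"""

data = json.loads(sample)
for entity in data["entities"]:
    events = ", ".join(e["eventtype"] for e in entity["events"])
    print(f'{entity["entitytype"]} {entity["entity"]}: {events}')
# prints: TEZ_DAG_ID dag_1452000000000_0001_1: DAG_STARTED
```

Against a real cluster, the same parsing applies to the response body returned by the Timeline Server's REST endpoint.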
Current status of the Timeline Server
• Three versions: V1, V1.5, V2
• V1 design
  • Pluggable backend storage (LevelDB is the default)
  • The Timeline Server includes both Writer and Reader
  • Limited scalability
• V2 design, for scalability
  • Distributed Writer and Reader
  • Scalable KVS as backend storage, focusing on HBase
  • Drastically changed API
  • Enables profiling across multiple jobs
• (New!) V1.5 design
  • HDFS backend storage with the V2 API
  • Reader/Writer separation, like V2
  • Please note the incompatibility between V1 and V1.5
Question from me
• Do you need other backends for the Timeline Server?
• Alternatives:
  • Time-series DB
  • Apache Kudu
  • RDBMS: PostgreSQL or MySQL
YARN features
3. Service Registry
What is the Service Registry?
• A service for discovering the location (IP, port) of a YARN application
  • For long-running jobs such as HBase on YARN, LLAP (Hive on Tez), and the MixServer in Hivemall
• Why do we need a service registry?
  • To notify clients of the destination for write operations
• Flow (from the diagram):
  1. The service's AM requests registration of its port and address
  2. ZooKeeper preserves the registration
  3. The client gets the info from ZooKeeper
  4. The client connects to the service container
Behavior of applications on YARN
MR-inspired distributed and parallel processing frameworks on YARN
• New processing frameworks include higher-level DSLs than MapReduce
  • Apache Tez: HiveQL/Pig/Cascading
  • Apache Spark: RDD/SQL/DataFrame/Datasets
  • Apache Flink: Table/DataSets/DataStream
• Features
  • Jobs are described as a more generic Directed Acyclic Graph (DAG), instead of MapReduce
  • The DSL is compiled down to a DAG in Spark, Hive on Tez, and Flink
Why use a DAG?
• A DAG can express a complex job, like Map – Map – Reduce, as a single job
• Why is increasing the number of jobs bad?
  • Reduce cannot start before Shuffle finishes
  • The next job cannot start before Reduce finishes
  → Decreased parallelism
• Figure from: https://tez.apache.org/ (Hive on MR vs. Hive on Tez/Spark)
Features of modern processing frameworks
• The first shared feature is job description with a DAG
• The second depends on the processing framework:
  • Apache Tez: DAG + disk-I/O optimization + low-latency query support
    • Hive/Pig/Spark/Flink can run on Tez
  • Apache Spark: DAG + low-latency in-memory computation + functional programming-style interface
  • Apache Flink: DAG + shuffle optimization + an interface transparent across streaming/batch processing
    • Similar to Apache Beam
Workloads modern processing frameworks can handle
• Large-scale batch processing
  • Yes, MapReduce handles it well
  • Sometimes MapReduce is still the best choice for stability
• Short queries for data analytics
  • Trial and error to get a feel for the data by changing queries
  • Google Dremel handles this well
  • As the MR interface evolved (toward SQL), users came to need improved latency, not throughput
• Currently, we adopt a different processing framework per workload
  • However, this is just a workaround
• YARN treated the former as the first-class citizen at first
  • How can we run the latter on YARN effectively?
Problems for running low-latency queries on YARN
• Overhead of job launching
  • YARN needs two steps to launch a job: the master, then the workers
• Time to warm up
  • YARN terminates containers after the job exits
• Tez/Spark/Flink run on server-side JVMs, so these problems are significant
→ In this talk, I'll cover how Tez and Spark manage resources on YARN!
Overview of Apache Tez
• A MapReduce-inspired DAG processing framework for processing large-scale data effectively under a DSL (Pig/Hive)
  • Passes key-value pairs internally, like MapReduce
  • Runtime DAG rewriting
    • The DAG plan itself is emitted by the Hive optimizer or the Pig optimizer
  • Can describe a DAG without sorting during the shuffle
• Example of the Hive query execution flow:
  1. Write a Hive query
  2. Submit the Hive query
  3. The Hive optimizer emits a Tez DAG
  4. Tez executes it
Effective resource management in Tez
• Container reuse
  • The master reuses containers within a job instead of releasing them, as much as possible
  • Allocated containers are kept if the session is still alive after jobs complete
• Q. How does Tez's container reuse differ from MapReduce's?
• A.
  • Container reuse was removed by accident in the implementation of MapReduce on YARN
  • Sessions enable containers to be reused across multiple jobs
Effective resource management in Tez (cont.)
• LLAP (Live Long and Process)
  • Allocates containers independent of "sessions"
  → Warmed-up JVMs and caches can be reused, for even lower latency
• LLAP uses the Service Registry
• Q. Is it like a database daemon?
• A.
  • That's right. The difference is that Tez can accelerate jobs by adding resources from YARN.
Tuning parameters
• Rule
  • Keeping containers long can improve latency but can worsen throughput
  • Keeping containers short can improve throughput (resource utilization) but can worsen latency
• Container reuse
  • tez.am.container.idle.release-timeout-min.millis
    • Idle containers are kept at least for this period
  • tez.am.container.idle.release-timeout-max.millis
    • Idle containers are released after this period
• LLAP
  • Still appears to be work in progress
• Note: Hive configuration overrides Tez configuration, so please test carefully
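For example, the two idle-release timeouts above could be set in tez-site.xml like this (the values are illustrative, not recommendations):

```xml
<!-- tez-site.xml: illustrative values, not recommendations -->
<property>
  <name>tez.am.container.idle.release-timeout-min.millis</name>
  <value>10000</value> <!-- keep idle containers at least 10s -->
</property>
<property>
  <name>tez.am.container.idle.release-timeout-max.millis</name>
  <value>20000</value> <!-- release idle containers after at most 20s -->
</property>
```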
Tez on YARN summary
• Hive on Tez assumed batch processing at first
  • Releases all containers after jobs finish
  → Puts more weight on throughput than on latency
• As Hive's use cases expanded, users came to need low latency for interactive queries running in a REPL
• Latency has been improved on top of a high-throughput architecture
Overview of Apache Spark
• An in-memory processing framework (originally!)
  • Standalone mode is supported
  • Spark on YARN is also supported
• Spark has two kinds of running mode on YARN:
  • yarn-cluster mode: job submission (spark-submit)
  • yarn-client mode: REPL style (spark-shell)
  • Both run on YARN
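The two modes above are selected on the command line; a minimal sketch (the example class and jar path are illustrative and depend on your Spark version; older Spark releases spelled the mode as `--master yarn-cluster` / `--master yarn-client`):

```shell
# yarn-cluster mode: the driver runs inside a YARN container (batch submission)
spark-submit --master yarn --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  "$SPARK_HOME"/examples/jars/spark-examples_*.jar 100

# yarn-client mode: the driver runs on the client machine (good for the REPL)
spark-shell --master yarn --deploy-mode client
```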
yarn-cluster mode
• Launches the Spark driver in a YARN container
• Assumes the spark-submit command
• Flow (from the diagram):
  1. The client submits a job via spark-submit
  2. The ResourceManager launches the Spark ApplicationMaster, which holds the driver
  3. The Spark ApplicationMaster allocates resources for the workers
yarn-client mode
• Launches the Spark driver on the client side
• Assumes a REPL like spark-shell
• Flow (from the diagram):
  1. Launch spark-shell, which connects to the ResourceManager (the driver stays on the client)
  2. The ResourceManager launches the Spark ApplicationMaster
  3. The Spark ApplicationMaster allocates resources for the workers
  4. Enjoy interactive programming!!
Resource management of Spark on YARN
• Spark natively has a container-reuse feature like Tez's
  • Spark can handle low-latency queries very well
• However, Spark cannot simply "release containers":
  → Task state held in memory would be lost
  → Containers cannot be released even when their allocated resources are unused
  → Decreased resource utilization
• (Diagram: in stage 1 all containers run at 100% utilization; in stage 2 most sit at 0% but are still held)
Dynamic resource allocation (since Spark 1.2)
• Allocates/releases containers based on the workload
  • Intermediate data is persisted by the NodeManager's shuffle service
  → Intermediate data no longer lives only in executor memory
• Containers are allocated/released on timer-based triggers:
  • The number of executors is doubled if executors have had pending tasks for a period
  • Containers are released when an executor has been idle for a period
Tuning parameters
• Rule
  • Keeping containers long can improve latency but can worsen throughput
  • Keeping containers short can improve throughput (resource utilization) but can worsen latency
  → Please tune these points!
• Max/min/initial number of YARN containers for executors
  → A high number of executors increases throughput, but can decrease resource utilization
  • spark.dynamicAllocation.maxExecutors
  • spark.dynamicAllocation.minExecutors
  • spark.dynamicAllocation.initialExecutors
• Period of task backlog before requesting more executors (the request grows on each trigger; the second parameter controls subsequent requests)
  → A small value scales up sooner, improving latency, but can worsen stability and resource utilization
  • spark.dynamicAllocation.schedulerBacklogTimeout
  • spark.dynamicAllocation.sustainedSchedulerBacklogTimeout
• Period to keep an idle executor before releasing its container
  → A large value can improve latency but worsen resource utilization
  • spark.dynamicAllocation.executorIdleTimeout
• Period to keep an idle executor that holds cached data before releasing it
  • spark.dynamicAllocation.cachedExecutorIdleTimeout
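Putting the parameters above together, a spark-defaults.conf fragment might look like this (the values are illustrative, not recommendations; dynamic allocation also requires the external shuffle service on each NodeManager):

```
# spark-defaults.conf: illustrative values, not recommendations
spark.dynamicAllocation.enabled                           true
spark.shuffle.service.enabled                             true
spark.dynamicAllocation.minExecutors                      2
spark.dynamicAllocation.initialExecutors                  4
spark.dynamicAllocation.maxExecutors                      50
spark.dynamicAllocation.schedulerBacklogTimeout           1s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout  1s
spark.dynamicAllocation.executorIdleTimeout               60s
spark.dynamicAllocation.cachedExecutorIdleTimeout         600s
```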
Spark on YARN summary
• Spark natively assumes an interactive programming interface
  • Containers are not released by default
  → Puts more weight on latency than on throughput
• A feature to release containers dynamically was added when supporting YARN
• Throughput has been increased on top of a low-latency architecture
Summary
• Three kinds of features YARN provides:
  1. Resource management in the cluster
  2. Timeline Server
  3. Service Registry
• Dive into resource management in Tez/Spark
  • Apache Tez originated from batch processing
  • Apache Spark originated from interactive processing
  • Both Tez and Spark can process a wider range of queries by switching between high-throughput and low-latency modes

More Related Content

PDF
Taming YARN @ Hadoop Conference Japan 2014
PDF
Dynamic Resource Allocation Spark on YARN
PDF
Taming YARN @ Hadoop conference Japan 2014
PPTX
Spark on Yarn
PPTX
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
PPTX
Investing the Effects of Overcommitting YARN resources
PDF
Extending Spark Streaming to Support Complex Event Processing
PPTX
Emr zeppelin & Livy demystified
Taming YARN @ Hadoop Conference Japan 2014
Dynamic Resource Allocation Spark on YARN
Taming YARN @ Hadoop conference Japan 2014
Spark on Yarn
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
Investing the Effects of Overcommitting YARN resources
Extending Spark Streaming to Support Complex Event Processing
Emr zeppelin & Livy demystified

What's hot (20)

PDF
Suning OpenStack Cloud and Heat
PDF
Running Spark on Cloud
PPTX
Deploying Apache Flume to enable low-latency analytics
PDF
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
PPTX
Hello OpenStack, Meet Hadoop
PPTX
Solr Lucene Conference 2014 - Nitin Presentation
PPTX
Lessons Learned Running Hadoop and Spark in Docker Containers
PPTX
Deploy an Elastic, Resilient, Load-Balanced Cluster in 5 Minutes with Senlin
PDF
How netflix manages petabyte scale apache cassandra in the cloud
PDF
PaaSTA: Autoscaling at Yelp
PPTX
Zoo keeper in the wild
PDF
Running Dataproc At Scale in production - Searce Talk at GDG Delhi
PDF
MariaDB Auto-Clustering, Vertical and Horizontal Scaling within Jelastic PaaS
PDF
An introduction into Spark ML plus how to go beyond when you get stuck
PDF
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
PDF
Top 5 mistakes when writing Spark applications
PDF
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...
PPTX
Apache Tez – Present and Future
PDF
Make 2016 your year of SMACK talk
PPTX
Tuning Apache Ambari performance for Big Data at scale with 3000 agents
Suning OpenStack Cloud and Heat
Running Spark on Cloud
Deploying Apache Flume to enable low-latency analytics
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Hello OpenStack, Meet Hadoop
Solr Lucene Conference 2014 - Nitin Presentation
Lessons Learned Running Hadoop and Spark in Docker Containers
Deploy an Elastic, Resilient, Load-Balanced Cluster in 5 Minutes with Senlin
How netflix manages petabyte scale apache cassandra in the cloud
PaaSTA: Autoscaling at Yelp
Zoo keeper in the wild
Running Dataproc At Scale in production - Searce Talk at GDG Delhi
MariaDB Auto-Clustering, Vertical and Horizontal Scaling within Jelastic PaaS
An introduction into Spark ML plus how to go beyond when you get stuck
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 mistakes when writing Spark applications
S3, Cassandra or Outer Space? Dumping Time Series Data using Spark - Demi Ben...
Apache Tez – Present and Future
Make 2016 your year of SMACK talk
Tuning Apache Ambari performance for Big Data at scale with 3000 agents
Ad

Viewers also liked (8)

PDF
Intel TSX HLE を触ってみた x86opti
PDF
10分で分かるデータストレージ
PDF
Tuning Java for Big Data
PDF
トランザクションの並行実行制御 rev.2
PDF
Effective Modern C++ 勉強会#6 Item25
PDF
Effective Modern C++ 勉強会#8 Item38
PDF
10分で分かるバックアップとレプリケーション
PDF
トランザクションの並行処理制御
Intel TSX HLE を触ってみた x86opti
10分で分かるデータストレージ
Tuning Java for Big Data
トランザクションの並行実行制御 rev.2
Effective Modern C++ 勉強会#6 Item25
Effective Modern C++ 勉強会#8 Item38
10分で分かるバックアップとレプリケーション
トランザクションの並行処理制御
Ad

Similar to YARN: a resource manager for analytic platform (20)

PDF
Hadoop 2 - Beyond MapReduce
PPT
Venturing into Large Hadoop Clusters
PPTX
Apache Tez: Accelerating Hadoop Query Processing
PDF
20140202 fosdem-nosql-devroom-hadoop-yarn
PDF
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
PPTX
Running Non-MapReduce Big Data Applications on Apache Hadoop
PDF
Venturing into Hadoop Large Clusters
PDF
Venturing into Large Hadoop Clusters
PDF
Apache Hadoop YARN - The Future of Data Processing with Hadoop
PDF
Hadoop 2 - More than MapReduce
PPTX
Running Yarn at Scale
PPTX
Hackathon bonn
PDF
Hadoop 2.0 YARN webinar
PPTX
YARN - Next Generation Compute Platform fo Hadoop
PDF
Hadoop ecosystem
PDF
Hadoop ecosystem
PPTX
Apache Hadoop 3.0 What's new in YARN and MapReduce
PDF
Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with Yarn
PDF
Hadoop 2 - Going beyond MapReduce
PPTX
Huhadoop - v1.1
Hadoop 2 - Beyond MapReduce
Venturing into Large Hadoop Clusters
Apache Tez: Accelerating Hadoop Query Processing
20140202 fosdem-nosql-devroom-hadoop-yarn
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0
Running Non-MapReduce Big Data Applications on Apache Hadoop
Venturing into Hadoop Large Clusters
Venturing into Large Hadoop Clusters
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Hadoop 2 - More than MapReduce
Running Yarn at Scale
Hackathon bonn
Hadoop 2.0 YARN webinar
YARN - Next Generation Compute Platform fo Hadoop
Hadoop ecosystem
Hadoop ecosystem
Apache Hadoop 3.0 What's new in YARN and MapReduce
Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with Yarn
Hadoop 2 - Going beyond MapReduce
Huhadoop - v1.1

More from Tsuyoshi OZAWA (9)

PDF
Spark shark
PDF
Fluent logger-scala
PDF
Multilevel aggregation for Hadoop/MapReduce
PDF
Memcached as a Service for CloudFoundry
KEY
First step for dynticks in FreeBSD
PDF
Memory Virtualization
PDF
第二回Bitvisor読書会 前半 Intel-VT について
PDF
第二回KVM読書会
PDF
Linux KVM のコードを追いかけてみよう
Spark shark
Fluent logger-scala
Multilevel aggregation for Hadoop/MapReduce
Memcached as a Service for CloudFoundry
First step for dynticks in FreeBSD
Memory Virtualization
第二回Bitvisor読書会 前半 Intel-VT について
第二回KVM読書会
Linux KVM のコードを追いかけてみよう

Recently uploaded (20)

PDF
Votre score augmente si vous choisissez une catégorie et que vous rédigez une...
PPT
Predictive modeling basics in data cleaning process
PPTX
STERILIZATION AND DISINFECTION-1.ppthhhbx
PDF
[EN] Industrial Machine Downtime Prediction
PPTX
Pilar Kemerdekaan dan Identi Bangsa.pptx
PPTX
modul_python (1).pptx for professional and student
PDF
Business Analytics and business intelligence.pdf
PPTX
Qualitative Qantitative and Mixed Methods.pptx
PDF
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
PPTX
A Complete Guide to Streamlining Business Processes
PPT
ISS -ESG Data flows What is ESG and HowHow
PDF
Jean-Georges Perrin - Spark in Action, Second Edition (2020, Manning Publicat...
PDF
Transcultural that can help you someday.
PDF
Optimise Shopper Experiences with a Strong Data Estate.pdf
PPTX
(Ali Hamza) Roll No: (F24-BSCS-1103).pptx
PPTX
Market Analysis -202507- Wind-Solar+Hybrid+Street+Lights+for+the+North+Amer...
PPTX
SAP 2 completion done . PRESENTATION.pptx
PDF
Microsoft Core Cloud Services powerpoint
PDF
Introduction to the R Programming Language
PDF
Systems Analysis and Design, 12th Edition by Scott Tilley Test Bank.pdf
Votre score augmente si vous choisissez une catégorie et que vous rédigez une...
Predictive modeling basics in data cleaning process
STERILIZATION AND DISINFECTION-1.ppthhhbx
[EN] Industrial Machine Downtime Prediction
Pilar Kemerdekaan dan Identi Bangsa.pptx
modul_python (1).pptx for professional and student
Business Analytics and business intelligence.pdf
Qualitative Qantitative and Mixed Methods.pptx
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
A Complete Guide to Streamlining Business Processes
ISS -ESG Data flows What is ESG and HowHow
Jean-Georges Perrin - Spark in Action, Second Edition (2020, Manning Publicat...
Transcultural that can help you someday.
Optimise Shopper Experiences with a Strong Data Estate.pdf
(Ali Hamza) Roll No: (F24-BSCS-1103).pptx
Market Analysis -202507- Wind-Solar+Hybrid+Street+Lights+for+the+North+Amer...
SAP 2 completion done . PRESENTATION.pptx
Microsoft Core Cloud Services powerpoint
Introduction to the R Programming Language
Systems Analysis and Design, 12th Edition by Scott Tilley Test Bank.pdf

YARN: a resource manager for analytic platform

  • 1. Copyright©2016 NTT corp. All Rights Reserved. YARN: A Resource Manager for Analytic Platform Tsuyoshi Ozawa ozawa.tsuyoshi@lab.ntt.co.jp ozawa@apache.org
  • 2. 2Copyright©2016 NTT corp. All Rights Reserved. • Tsuyoshi Ozawa • Research Engineer @ NTT • Twitter: @oza_x86_64 • Over 150 reviews in 2015 • Apache Hadoop Committer and PMC • Introduction to Hadoop 2nd Edition(Japanese)” Chapter 22(YARN) • Online article: gihyo.jp “Why and How does Hadoop work?” About me
  • 3. 3Copyright©2016 NTT corp. All Rights Reserved. • 3 features of YARN • Behaviors of DAG processing engine on YARN • Case study1: Tez on YARN • Case study2: Spark on YARN • How do they work and best practice Agenda
  • 4. 4Copyright©2016 NTT corp. All Rights Reserved. • A resource manager for Apache Hadoop • Being able to share resources not only across MapReduce job, but also across the other processing framework • 3 kind of features 1. Managing resources in cluster 2. Managing history logs of application 3. Mechanism for users to know locations running on YARN YARN
  • 5. Copyright©2016 NTT corp. All Rights Reserved. YARN features 1. Resource Management
  • 6. 6Copyright©2016 NTT corp. All Rights Reserved. • Taking different data analytics platform based on workloads • Google uses Tenzing/MapReduce for batch/ETL processing, BigQuery for try and error, TensorFlow for machine learning • There are 2 ways to do so: 1. Separating each clusters • Pros: easy to separate workload and resource • Cons: difficult to copy big data between clusters 2. Living together in the same cluster • Pros: no need to copy big data between clusters • Cons: difficult to separate resources Why manages resources with YARN?
  • 7. 7Copyright©2016 NTT corp. All Rights Reserved. • Hadoop attaches weight to scalability since Hadoop is for processing big data! • Hence, YARN attaches weight to scalability! • Different frameworks living together in the same cluster without moving data • Launching masters of processing framework per job • At Hadoop MapReduce v1, master of MR get overload and reaches limit of scalability in large scale processing Design policy of YARN
  • 8. 8Copyright©2016 NTT corp. All Rights Reserved. • Master-Worker architecture • Master manages resources in cluster (ResourceManager) • Worker manages containers per machine (NodeManager) Architecture of YARN ResourceManager Worker NodeManager Container Container Container Worker NodeManager Container Container Container Worker NodeManager Container Container Container Master Worker Worker MasterWorker WorkerMaster Worker Worker Client 1. Submitting job 2. Allocating a container for master, launching master 3. Master requests containers to RM and launching worker on the container
  • 9. 9Copyright©2016 NTT corp. All Rights Reserved. • Issue • Allocated containers are not monitored: it’s allocated, but it has a possibility not to utilize them • Suggestion by YARN community YARN-1011(Karthik, Cloudera) • Monitoring all allocated containers and launching(over committing) tasks if it’s not used • Effect • NodeManager can utilize more resources per node Issue of resource management in YARN
  • 10. Copyright©2016 NTT corp. All Rights Reserved. YARN features 2. Managing history logs
  • 11. 11Copyright©2016 NTT corp. All Rights Reserved. • Application information which is running or completed • It’s useful to tune performance and debug! • Types of history log 1. Common information which all YARN applications have • Allocated containers, name of used queues 2. Application specific information • E.g. MapReduce case: Mapper/Reducer’s counter/amount of shuffle etc. • Server for preserving and displaying history logs → Timeline Server What is history log?
  • 12. 12Copyright©2016 NTT corp. All Rights Reserved. • YARN applications only show generic log with “YARN’s common information” • Screenshot of Timeline Server What can Timeline Server do?
  • 13. 13Copyright©2016 NTT corp. All Rights Reserved. • Rich performance analysis With YARN application specific information • examples: Tez-UI What can Timeline Server do? http://hortonworks.com/wp-content/uploads/2014/09/tez_2.png
  • 14. 14Copyright©2016 NTT corp. All Rights Reserved. Example of Tez-UI • We can check job statistics in detail From: http://hortonworks.com/wp- content/uploads/2015/06/Tez-UI- 1024x685.png
  • 15. 15Copyright©2016 NTT corp. All Rights Reserved. • Scalability to number of applications • Easy to access history logs • Users retrieve data via RESTful API Design phylosophy of Timeline Server
  • 16. 16Copyright©2016 NTT corp. All Rights Reserved. • 3 versions: V1, V1.5, V2 • V1 design • Pluggable backend storage • LevelDB is default backend storage • Timeline Server includes Writer and Reader • Limited scalability • V2 design for scalability • Distributed Writer and Reader • Scalable KVS as Backend storage • Focusing on HBase • Changing API drastically • Being enable to profiling across multiple jobs • (New!)V1.5 design • HDFS backend storage with V2 API • Reader/Writer separation like V2 • Please note the incompatibility between V1 and V1.5 Current status of Timeline Server
  • 17. 17Copyright©2016 NTT corp. All Rights Reserved. • Do you need other backends for Timeline Server? • Alternatives • Time-series DB • Apache Kudu • RDBMS • PostgreSQL or MySQL Question from me
  • 18. Copyright©2016 NTT corp. All Rights Reserved. YARN feature 3. Service Registry
  • 19. 19Copyright©2016 NTT corp. All Rights Reserved. • A service to know the location(IP, port) of YARN application • For running long-running jobs like HBase on YARN, Hbase on YARN, LLAP(Hive on Tez), MixServer in Hivemall • Why do we need service registry? • To notify clients destination of write operation What is Service Registry? NodeManager Service AM Service Container client ZK Resource Manager 1. Request to register port and address 2. Preserving3. Get info 4. Connect
  • 20. Copyright©2016 NTT corp. All Rights Reserved. Behavior of applications on YARN
  • 21. 21Copyright©2016 NTT corp. All Rights Reserved. • New processing frameworks include a high- level DSL than MapReduce • Apache Tez: HiveQL/Pig/Cascading • Apache Spark: RDD/SQL/DataFrame/DataSets • Apache Flink: Table/DataSets/DataStream • Features • More generic Directed Acyclic Graph = DAG, instead of MapReduce, describes jobs • One kind of compilation DSL to DAG with Spark, Hive on Tez, Flink MR-inspired distributed and parallel processing framework on YARN
  • 22. 22Copyright©2016 NTT corp. All Rights Reserved. • DAG can express complex job, like Map – Map – Reduce, as single job • Why it is bad to increase number of jobs? • Before finishing Shuffle, Reduce cannot be started • Before finishing Reduce, next job cannot be started → Decreasing parallelism Why use DAG? Figure from: https://tez.apache.org/ Hive on MR Hive on Tez/Spark
  • 23. 23Copyright©2016 NTT corp. All Rights Reserved. • The first feature is job description with DAG • The second one depends on processing framework • Apache Tez • DAG + Disk-IO optimization + Low-latency query support • Hive/Pig/Spark/Flink can run on Tez • Apache Spark • DAG + Low-latency in-memory processing calculation + Functional programming-style interface • Apache Flink • DAG + Shuffle optimization + Streaming/Batch processing transparent interface • Like, Apache Beam Features of Modern processing frameworks
  • 24. 24Copyright©2016 NTT corp. All Rights Reserved. • Large-scale batch processing • Yes, MapReduce can handle it well • Sometimes MapReduce is still best choice for the stability • Short query for data analytics • Trial and error to check the mood of data with changing queries • Google Dremel can handle it well • After evolving interface of MR(SQL), users need improvement of latency, not throughput • Currently, we take on different processing framework • However, it’s a just workaround • YARN think the former is 1st citizen at first. • How can we run the latter on YARN effectively? Workloads modern processing frameworks can handle
  • 25. 25Copyright©2016 NTT corp. All Rights Reserved. • Overhead of job launching • Because YARN needs 2 step to launch jobs: master and workers • Taking time to warm up • Because YARN terminates containers after the job exits • Tez/Spark/Flink runs on server-side JVM, so these problems are remarkable → In this talk, I’ll talk about how Tez and Spark manages resources on YARN! Problem for running low-latency query on YARN
  • 26. 26Copyright©2016 NTT corp. All Rights Reserved. • MapReduce-inspired DAG processing framework for large-scale data effectively under DSL(Pig/Hive) • Passing key-value pair internally like MapReduce • Runtime DAG rewriting • DAG plan itself is dumped by Hive optimizer or Pig Optimizer • Being able to describe DAG without Sorting while shuffling • Example of Hive query execution flow • Writing Hive query • Submitting Hive query • Dumping DAG of Tez by Hive optimizer • Executing Tez Overview of Apache Tez
  • 27. 27Copyright©2016 NTT corp. All Rights Reserved. • Container reusing • Reusing containers instead of releasing containers in a job as possible as master can • Keeping allocated containers if session lives after completing jobs Q. What is different between MapReduce’s Container Reuse and Tez’s one? A. • Container reusing is removed by accident at the implementation of MapReduce on YARN • Session enable us to reuse containers across multiple jobs Effective resource management in Tez
  • 28. • LLAP (Live Long and Process) • Allocating containers independently of the “Session” → Able to reuse warmed-up JVMs and cache → lower latency • LLAP uses the Registry Service • Q. Is it like a database daemon? A. • That’s right. The difference is that Tez can accelerate jobs by adding resources from YARN. Effective resource management in Tez (cont.)
  • 29. • Rule • Keeping containers long can improve latency but can worsen throughput • Keeping containers short can improve throughput (resource utilization) but can worsen latency • Container reuse • tez.am.container.idle.release-timeout-min.millis • Containers are kept during this period • tez.am.container.idle.release-timeout-max.millis • Containers are released after this period • LLAP • Appears to be work in progress • Note: Hive configuration overrides Tez configuration, so please test it carefully Tuning parameters
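The two timeouts above are set in tez-site.xml; a sketch with illustrative values only (the property names come from the slide, the values are assumptions to tune per workload):

```xml
<!-- tez-site.xml: illustrative values. Idle containers are kept for
     reuse up to the min timeout; containers still idle at the max
     timeout are released back to YARN. -->
<property>
  <name>tez.am.container.idle.release-timeout-min.millis</name>
  <value>10000</value>
</property>
<property>
  <name>tez.am.container.idle.release-timeout-max.millis</name>
  <value>20000</value>
</property>
```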
  • 30. • Hive on Tez assumed batch processing at first • Releasing all containers after jobs finish → Attaching more weight to throughput than to latency • As the use cases of Hive expanded, users needed low latency for interactive queries running on a REPL • Improving latency on top of a high-throughput architecture Tez on YARN summary
  • 31. • In-memory processing framework (originally!) • Standalone mode is supported • Spark on YARN is also supported • Spark has 2 running modes on YARN • yarn-cluster mode • Submitting jobs (spark-submit) • yarn-client mode • REPL style (spark-shell) • Both of them run on YARN Overview of Apache Spark
  • 32. • Launching the Spark driver in a YARN container • Assuming the spark-submit command yarn-cluster mode ResourceManager NodeManager Spark App Master NodeManager Worker 2 Worker 1 Client Driver 1. Submitting jobs via spark-submit 2. Launching master 3. Allocating resources for workers by Spark AppMaster
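The yarn-cluster submission in the steps above looks like this; a sketch where the application jar, main class, and resource sizes are hypothetical:

```shell
# yarn-cluster mode: the driver runs inside the YARN
# ApplicationMaster container, not on the client machine.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-memory 2g \
  --class com.example.MyApp \
  my-app.jar
```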
  • 33. • Launching the Spark driver at the client side • Assuming a REPL like spark-shell yarn-client mode ResourceManager NodeManager Spark App Master NodeManager Worker 2 Worker 1 Client Driver 1. Launching spark-shell, connecting to RM 2. Launching master 3. Allocating resources for workers by Spark AppMaster 4. Enjoy interactive programming!!
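The yarn-client flow above starts from the REPL; a sketch with an illustrative executor count:

```shell
# yarn-client mode: the driver (and the REPL) stays on the client
# machine while executors run in YARN containers.
spark-shell --master yarn --deploy-mode client --num-executors 4
```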
  • 34. • Spark natively has Tez’s container-reuse behavior • Spark can handle low-latency queries very well • However, Spark cannot release containers → Task state kept in memory would be lost → Containers cannot be released even when the allocated resources are unused → Decreasing resource utilization Resource management of Spark on YARN (Diagram: in Stage 1 all containers on both NodeManagers run at 100% utilization; in Stage 2 several containers sit at 0% while still allocated)
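A back-of-the-envelope sketch of this utilization problem: with static allocation, containers acquired for a busy stage stay held through a later, lighter stage. The container counts are illustrative, not from the slide:

```python
# Cluster utilization = busy container-stages / allocated container-stages.
def utilization(busy_per_stage, allocated_per_stage):
    return sum(busy_per_stage) / sum(allocated_per_stage)

# Stage 1 keeps all 4 containers busy; Stage 2 keeps only 1 busy.
# Static allocation holds all 4 containers through both stages.
static = utilization(busy_per_stage=[4, 1], allocated_per_stage=[4, 4])
# If idle containers could be released before Stage 2:
dynamic = utilization(busy_per_stage=[4, 1], allocated_per_stage=[4, 1])

print(static)   # 0.625
print(dynamic)  # 1.0
```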
  • 35. • Allocating/releasing containers based on workload • Intermediate data is persisted via the NodeManager’s Shuffle Service → not kept in executor memory • Allocation/release of containers is triggered by timers Dynamic resource allocation (since Spark 1.2) (Diagram: the Spark AppMaster doubles the number of executors if executors have had pending tasks for a period, and releases containers when an executor has been idle for a period; intermediate data stays with the NodeManagers)
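The timer-driven policy described above — double the executor request while tasks stay backlogged, release executors idle past a timeout — can be sketched as follows. This is a toy model of the behavior, not Spark's actual ExecutorAllocationManager, and all names and defaults here are illustrative:

```python
def step_allocation(current, backlogged, idle_seconds,
                    max_executors=8, idle_timeout=60):
    """Return the executor count after one scheduling tick."""
    if backlogged:
        # Backlog persisted for a full period: double the request,
        # capped at the configured maximum.
        return min(current * 2, max_executors)
    if idle_seconds >= idle_timeout:
        # Pool has been idle past the timeout: release one executor.
        return max(current - 1, 0)
    return current

# Ramp-up under sustained backlog: 1 -> 2 -> 4 -> 8, then capped.
n = 1
history = []
for _ in range(4):
    n = step_allocation(n, backlogged=True, idle_seconds=0)
    history.append(n)
print(history)  # [2, 4, 8, 8]
```

Note the trade-off baked into the timers: a short backlog period ramps up quickly (better latency), a short idle timeout shrinks quickly (better utilization).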
  • 36. • Rule • Keeping containers long can improve latency but can worsen throughput • Keeping containers short can improve throughput (resource utilization) but can worsen latency → Please tune these points! • Max/min/initial number of YARN containers → a high number of executors increases throughput, but can decrease resource utilization • spark.dynamicAllocation.maxExecutors • spark.dynamicAllocation.minExecutors • spark.dynamicAllocation.initialExecutors • A period before doubling the number of containers for Executors when all Executors are active → a large value can improve latency and throughput but worsen stability and resource utilization • spark.dynamicAllocation.schedulerBacklogTimeout • A period to keep containers → a large value can improve latency but worsen resource utilization • spark.dynamicAllocation.sustainedSchedulerBacklogTimeout • A period before releasing containers which are empty → a large value can improve latency but worsen resource utilization • spark.dynamicAllocation.executorIdleTimeout • Releasing an executor with cached data if the executor is idle for the specified period • spark.dynamicAllocation.cachedExecutorIdleTimeout Tuning parameters
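The knobs above are typically set in spark-defaults.conf; a sketch with illustrative values (the property names come from the slide, the values are assumptions to tune per workload). Dynamic allocation also needs the external shuffle service enabled in each NodeManager so intermediate data survives executor release:

```properties
# spark-defaults.conf: illustrative values only.
spark.dynamicAllocation.enabled                          true
spark.shuffle.service.enabled                            true
spark.dynamicAllocation.minExecutors                     1
spark.dynamicAllocation.initialExecutors                 2
spark.dynamicAllocation.maxExecutors                     20
spark.dynamicAllocation.schedulerBacklogTimeout          1s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout 1s
spark.dynamicAllocation.executorIdleTimeout              60s
spark.dynamicAllocation.cachedExecutorIdleTimeout        600s
```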
  • 37. • Spark natively assumes an interactive programming interface; containers are not released by default → Attaching more weight to latency than to throughput • Added a feature to release containers dynamically when supporting YARN • Increasing throughput on top of a low-latency architecture Spark on YARN summary
  • 38. • 3 kinds of features YARN provides 1. Resource management in the cluster 2. Timeline Server 3. Service registry • Dive into resource management in Tez/Spark • Apache Tez, originating from batch processing • Apache Spark, originating from interactive processing Both Tez and Spark can process a wider range of queries by switching between a high-throughput mode and a low-latency mode Summary