Apache Hadoop YARN:
State of the union
Wangda Tan, Billie Rinaldi
@ Hortonworks
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Speaker intro
 Wangda Tan: Apache Hadoop PMC member, mostly focused on GPU / deep learning on
YARN; has worked on scheduler features such as node labels, preemption, etc.
 Billie Rinaldi: Apache Hadoop committer, PMC member of various other Apache top-level
and incubating projects, currently focusing on long-running services
and Docker containers on Apache Hadoop YARN
3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
 Introduction
 Past
 State of the Union
4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Multi-colored YARN
 Multi-colored YARN
– Apps
– Long running services
 It’s all about data!
 Layers that enable applications and
higher order frameworks that interact
with data
https://www.flickr.com/photos/happyskrappy/15699919424
5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Categories of recent initiatives: containerization / containers, GPUs / FPGAs,
more powerful scheduling, much faster scheduling, scale, SLAs, usability,
and service workloads.
6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Hadoop Compute Platform – Today and Tomorrow
Platform services: storage, resource management, service discovery, cluster management,
monitoring / alerts, security, governance.
Frameworks on top: MR, Tez, Spark, Hive / Pig, LLAP, Flink, REEF.
Long-running services: Kafka, Storm, HBase, Solr (e.g. as part of an IoT assembly).
7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Past: A quick history
Page8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
A brief Timeline: Pre GA
• Sub-project of Apache Hadoop
• Alphas and betas
– In production at several large sites for MapReduce already by that time
June–July 2010 · August 2011 · May 2012 · August 2013
Page9 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
A brief Timeline: GA Releases 1/3
2.2 (15 October 2013): 1st GA, MR binary compatibility, YARN API cleanup, testing!
2.3 (24 February 2014): 1st post-GA release, bug fixes, alpha features
2.4 (07 April 2014): RM fail-over, CS preemption, Timeline Service V1, writable REST APIs
2.5 (11 August 2014): Timeline Service V1 security
2.6 (18 November 2014): rolling upgrades, Docker, node labels
2.7 (21 April 2015): move to JDK 7+, pluggable YARN authentication
The most essential requirements for enterprise usage
Page10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
A brief Timeline: GA Releases 2/3
2.7.2 (25 January 2016), 2.7.3 (25 August 2016), 2.6.5 (18 October 2016), 2.7.4 (04 August 2017):
maintenance releases
2.8.0 (22 March 2017): application priority, reservations, node label improvements
2.8.1 (08 June 2017), 2.8.2 (03 Oct 2017), 2.8.3 (12 Dec 2017): follow-up stabilization releases
Enterprise consumption; stabilization needed
Page11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
A brief Timeline: GA Releases 3/3
3.0.0-alpha1–4 (Sep 2016 – Aug 2017), 3.0.0-beta1 (03 Oct 2017), 3.0.0 GA (13 Dec 2017), 3.0.1 (25 March 2018)
2.9.0 (17 Nov 2017): YARN Federation, opportunistic containers, resource types, new YARN UI, Timeline Service V2
3.1.0 (06 April 2018): GPU / FPGA support, native services, placement constraints
More requirements keep coming (compute-intensive workloads, larger clusters, long-running services)
12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Hadoop 2.8/2.9
13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Application priorities – YARN-1963
• Allocate resources to important apps first.
• Applies within a leaf queue.
• FIFO policy: App 1, App 2, App 3, App 4 are served in arrival order.
• FIFO policy with priorities: apps are served from higher priority to lower priority,
regardless of arrival order (see the configuration sketch below).
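A rough, hedged sketch of how this is typically wired up with the Capacity Scheduler: the property names and CLI flags below follow the upstream documentation for Hadoop 2.8+, but the queue name and application id are made up, so verify against your release.

```bash
# yarn-site.xml:          yarn.cluster.max-application-priority = 10
# capacity-scheduler.xml: yarn.scheduler.capacity.root.myqueue.default-application-priority = 3

# Raise the priority of a running application from the CLI:
yarn application -appId application_1523456789012_0042 -updatePriority 8
```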
14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Queue priorities – YARN-5864
 For interactive / SLA-sensitive workloads
 Today
– Resources go to the least-satisfied queue first
 With priorities
– Resources go to the highest-priority queue first (for important workloads); see the sketch below
Example under root:
– Queue A: 20% configured capacity but 5% used capacity → usage = 5/20 = 25%
– Queue B: 80% configured capacity but 8% used capacity → usage = 8/80 = 10%
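For reference, a hedged capacity-scheduler.xml fragment showing how queue priorities might be assigned; the property name follows the Capacity Scheduler documentation for Hadoop 2.9+/3.x, and the queue names are illustrative.

```xml
<!-- Higher value = higher priority; default is 0. The SLA-sensitive queue
     is served before less-satisfied but lower-priority queues. -->
<property>
  <name>yarn.scheduler.capacity.root.interactive.priority</name>
  <value>10</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.batch.priority</name>
  <value>0</value>
</property>
```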
16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Reservations – YARN-1051
• “Run my workload tomorrow at 6AM”
• Persistence of the plans with RM failover: YARN-2573
Reservation-based Scheduling: If You’re Late Don’t Blame Us! – Curino et al., 2015
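A sketch of the reservation flow through the ResourceManager REST API; the paths and JSON fields follow the ReservationSystem documentation, but host, queue, ids, and timestamps are illustrative only.

```bash
# 1) Ask the RM for a new reservation id.
curl -X POST http://rm-host:8088/ws/v1/cluster/reservation/new-reservation

# 2) Submit a definition against that id: e.g. 100 containers of <2 GB, 1 vcore>
#    for one hour, starting tomorrow at 6AM (epoch milliseconds).
curl -X POST -H "Content-Type: application/json" \
  http://rm-host:8088/ws/v1/cluster/reservation/submit -d '{
    "queue": "dedicated",
    "reservation-id": "reservation_1523456789012_0001",
    "reservation-definition": {
      "reservation-name": "morning-report",
      "arrival": 1524024000000,
      "deadline": 1524031200000,
      "reservation-requests": {
        "reservation-request-interpreter": 0,
        "reservation-request": [{
          "capability": { "memory": 2048, "vCores": 1 },
          "num-containers": 100,
          "duration": 3600000
        }]
      }
    }
  }'
```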
17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Hadoop 3.0/3.1
18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Looking at the Scale!
 Many sites run clusters with a large number of nodes
– Yahoo!, Twitter, LinkedIn, Microsoft, Alibaba, etc.
 Previously, the largest clusters were 6K–8K nodes
 Now: 40K nodes (federated), 20K nodes (single cluster)
 Roadmap: to 100K and beyond
19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Moving towards Global & Fast Scheduling
 Problems
– The one-node-at-a-time allocation cycle can lead to suboptimal decisions
– Several coarse-grained locks
 Recent efforts improved this to
– Look at several nodes at a time
– Fine-grained locks
– Multiple allocator threads
– The YARN scheduler can allocate 3K+ containers per second ≈ 10 million allocations / hour!
– 10x throughput gains with recently added enhancements
– Much better placement decisions
20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Resource profiles and custom resource types
 Past
– Supported only memory and CPU
 Now
– A generalized resource vector
– Custom resource types!
 Easier resource requests using profiles (see the configuration sketch below)
NodeManager resource vector: memory, CPU, GPU, FPGA
Profiles (example): Small = 2 GB memory, 4 cores, 0 GPUs; Medium = 4 GB, 8 cores, 0 GPUs;
Large = 16 GB, 16 cores, 4 GPUs
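A hedged sketch of the server-side configuration: custom resource types are declared in resource-types.xml (names per the Hadoop 3.x resource model documentation; verify for your release).

```xml
<!-- resource-types.xml: extend the resource vector beyond memory and CPU -->
<configuration>
  <property>
    <name>yarn.resource-types</name>
    <value>yarn.io/gpu,yarn.io/fpga</value>
  </property>
</configuration>
```

Profiles such as small/medium/large can then be defined in a resource-profiles.json mapping each profile name to values like memory-mb, vcores, and yarn.io/gpu, assuming the feature is enabled via yarn.resourcemanager.resource-profiles.enabled.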
21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
GPU support on YARN
 Why is isolation needed?
– Multiple processes sharing a single GPU will be:
• Serialized.
• Prone to OOM.
 GPU isolation on YARN:
– Granularity is per GPU device.
– Cgroups / Docker are used to enforce the isolation.
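A hedged configuration sketch, following the Hadoop 3.1 GPU-on-YARN documentation: the NodeManager enables the GPU resource plugin, with LinuxContainerExecutor plus cgroups (or the Docker runtime) assumed for enforcement.

```xml
<!-- yarn-site.xml on the NodeManagers -->
<property>
  <name>yarn.nodemanager.resource-plugins</name>
  <value>yarn.io/gpu</value>
</property>
```

Containers can then request GPUs alongside memory and vcores; for example, the distributed shell demo accepts something like `-container_resources memory-mb=3072,vcores=1,yarn.io/gpu=2` (flag per the upstream docs; verify on your release).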
22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
FPGA on YARN!
 FPGA isolation on YARN:
– Granularity is per FPGA device.
– Cgroups are used to enforce the isolation.
 Currently, only the Intel OpenCL SDK for FPGA is supported, but the implementation is
extensible to other FPGA SDKs.
23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Better placement strategies (YARN-6592)
 Affinity and anti-affinity
– Affinity example: place Storm containers close to HBase.
– Anti-affinity example: spread HBase RegionServers across different nodes.
A placement-spec sketch follows.
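As a concrete, hedged illustration, the distributed shell demo in Hadoop 3.1 accepts a placement specification; the syntax below follows the upstream placement-constraints documentation, with component names used as allocation tags and everything else illustrative.

```bash
# Illustrative: run 3 "zk" containers with anti-affinity to each other (at most
# one per node) and 5 "hbase" containers with rack affinity to "zk".
yarn org.apache.hadoop.yarn.applications.distributedshell.Client \
  -jar hadoop-yarn-applications-distributedshell-*.jar \
  -shell_command sleep -shell_args 600 \
  -placement_spec "zk=3,NOTIN,NODE,zk:hbase=5,IN,RACK,zk"
```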
24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
YARN Federation!
 Enables applications to scale to 100s of thousands of nodes
 Federation divides a large (10–100K node) cluster into smaller units called sub-clusters
 Federation negotiates with the sub-clusters’ RMs and provides resources to the application
 Applications can schedule tasks on any node
25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Packaging
 Containers
– Lightweight mechanism for packaging and resource isolation
– Popularized and made accessible by Docker
– Can replace VMs in some cases
– Or more accurately, VMs got used in places where they didn’t
need to be
 Native integration ++ in YARN
– Support for “Container Runtimes” in LCE: YARN-3611
– Process runtime
– Docker runtime
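A hedged sketch of what opting into the Docker runtime looks like in practice: the LCE must allow it on the NodeManager side, and each container opts in via environment variables (names per the upstream Docker-on-YARN documentation; image and command are illustrative).

```bash
# NodeManager side (yarn-site.xml):
#   yarn.nodemanager.runtime.linux.allowed-runtimes = default,docker

# Application side: run a distributed shell container inside a Docker image.
yarn org.apache.hadoop.yarn.applications.distributedshell.Client \
  -jar hadoop-yarn-applications-distributedshell-*.jar \
  -shell_command "cat /etc/os-release" \
  -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \
  -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos:7
```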
26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Services support
 Application & Services upgrades
– “Do an upgrade of my Spark / HBase apps with minimal impact to end-users”
– YARN-4726
 Simplified discovery of services via DNS mechanisms: YARN-4757
– regionserver-0.hbase-app-3.hadoop.yarn.site
 Placement policies
 Container restart
27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Simplified APIs for service definitions
 Applications need simple APIs
 Need to be deployable “easily”
 A simple REST API layer fronting YARN
– YARN-4793 Simplified API layer for services and beyond
 Spawn and manage services (see the spec sketch below)
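To make this concrete, a minimal service spec in the style of the YARN native services API; field names follow the upstream “sleeper” example shipped with Hadoop 3.1, and the values are illustrative.

```json
{
  "name": "sleeper-service",
  "version": "1.0.0",
  "components": [
    {
      "name": "sleeper",
      "number_of_containers": 2,
      "launch_command": "sleep 900000",
      "resource": {
        "cpus": 1,
        "memory": "256"
      }
    }
  ]
}
```

Saved as sleeper.json, such a spec can be launched and flexed with commands along the lines of `yarn app -launch my-sleeper sleeper.json` and `yarn app -flex my-sleeper -component sleeper 3`, or POSTed to the REST API service fronting the RM.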
28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Services Framework
 Platform is only as good as the tools
 A native YARN services framework
– YARN-4692
– [Umbrella] Native YARN framework layer for services and
beyond
 Assembly: Supporting a DAG of apps:
– SLIDER-875
29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
User experience
 API-based queue management, decentralized (YARN-5734)
 Improved log management (YARN-4904)
 Live application logs
30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
User experience
New web UI
31 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
User experience
New web UI
32 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Timeline Service
 Application History
– “Where did my containers run?”
– “Why is my application slow?”
– “Is it really slow?”
– “Why is my application failing?”
– “What happened with my application?
Succeeded?”
 Cluster History
– Run analytics on historical apps!
– “User with most resource utilization”
– “Largest application run”
– “Why is my cluster slow?”
– “Why is my cluster down?”
– “What happened in my clusters?”
 Collect and use past data
– To schedule “my application” better
– To do better capacity planning
33 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Timeline Service 2.0
• Next generation
– Today’s solution helped us understand the space
– Limited scalability and availability
• “Analyzing Hadoop Clusters is becoming a big-data problem”
– Don’t want to throw away the Hadoop application metadata
– Large scale
– Enable near real-time analysis: “Find me the user who is hammering the FileSystem with rogue applications. Now.”
• Timeline data stored in HBase and accessible to queries
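For example, once the timeline reader daemon is running, flow and per-application data can be pulled over REST; the paths follow the ATSv2 documentation, while host, port, cluster, and application id are illustrative.

```bash
# List recent flows for a cluster, then drill into one application's containers.
curl "http://timeline-reader-host:8188/ws/v2/timeline/clusters/my-cluster/flows"
curl "http://timeline-reader-host:8188/ws/v2/timeline/apps/application_1523456789012_0042/entities/YARN_CONTAINER"
```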
34 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Hadoop 3.2 and beyond
35 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Node Attributes (YARN-3409)
• Node partition vs. node attribute
• Partition:
• A node belongs to exactly one partition
• ACLs
• Shares between queues
• Preemption enforced
• Attribute:
• Used for container placement
• No ACLs / shares on attributes
• First-come, first-served
36 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Container overcommit (YARN-1011)
 Each node has some allocated but unutilized capacity
 Use that capacity to run opportunistic tasks
 Preempt such tasks when needed
37 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Auto-spawning of system services
(YARN-8048)
• System services are services required by
YARN that need to be started during
bootstrap.
• For example, YARN ATSv2 needs HBase, so
HBase is a system service of YARN.
• Only admins can configure them
• Started along with the ResourceManager
• Place spec files under the
yarn.service.system-service.dir FS path (layout sketch below)
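A hedged sketch of the expected layout: per YARN-8048, the directory contains sync/ and async/ launch modes, each with per-user subdirectories holding Yarnfile specs that the RM picks up at start. Paths and file names below are illustrative.

```bash
# yarn-site.xml:  yarn.service.system-service.dir = /services/system
hdfs dfs -ls -R /services/system
#   /services/system/sync/yarn-ats/timeline-hbase.yarnfile    # launched synchronously at RM start
#   /services/system/async/admin/cleanup-helper.yarnfile      # launched asynchronously
```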
38 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Lessons learned running a container cloud
on YARN
https://dataworkssummit.com/berlin-2018/session/lessons-learned-running-a-container-cloud-on-
yarn/
4PM, Room I, Wed April 18th
-- Related Session --
Billie Rinaldi
39 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Deep learning on YARN: running
distributed Tensorflow, etc. on Hadoop
clusters
https://dataworkssummit.com/berlin-2018/session/deep-learning-on-yarn-running-distributed-
tensorflow-mxnet-caffe-xgboost-on-hadoop-clusters/
2PM, Room II, Wed April 18th
-- Related Session --
Wangda Tan
40 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
BoF’s: Apache Hadoop – YARN, HDFS
https://dataworkssummit.com/berlin-2018/bofs/#apache-hadoop-8211-yarn-hdfs
Thursday April 19th
-- Related Session --
Editor's Notes
  • #4: For new people, 10%
  • #5: Application centric
  • #6: Categories of recent initiatives
  • #10: Many users are requesting the most necessary features, which is why we released so fast. Many of these features are necessary to run a YARN cluster, such as RM fail-over / HA, etc.
  • #11: 2.6/2.7/2.8 are the versions most production clusters are using, so much feature development remained in the background and much community effort focused on stabilizing features.
  • #12: Again, the most essential YARN features no longer meet users’ requirements; that’s why we released 3 minor releases in 6 months, including features like YARN Federation (larger clusters), GPU / FPGA, etc.
  • #22: Even though TensorFlow provides options to use less GPU memory than the whole device, we cannot enforce this externally.
  • #34: High-level talk on ATSv2; it is a scalable solution compared to v1.5.