YARN Code Overview
Ocular bleeding is no reason to stop programming!

© Hortonworks Inc. 2013

Quick Bio – Joseph Niemiec
• Hadoop user for 2+ years
• 1 of 5 authors of Apache Hadoop YARN
• Originally used Hadoop for location-based services
(March 2014)

– Destination Prediction
– Traffic Analysis
– Effects of weather at client locations on call center call types

• Pending Patent in Automotive/Telematics domain
• Defensive Paper on M2M Validation
• Started on analytics to be better at an MMORPG

Agenda
• What Is YARN
• YARN Concepts & Architecture
• Code and more Code
• Q&A

From Batch To Anything
• HADOOP 1.0: Single Use System
–Batch Apps
–MapReduce (cluster resource management & data processing)
–HDFS (redundant, reliable storage)

• HADOOP 2.0: Multi Purpose Platform
–Batch, Interactive, Online, Streaming, …
–MapReduce (data processing) and Others (data processing)
–YARN (cluster resource management)
–HDFS2 (redundant, reliable storage)

Concepts
• Application
–Application is a job submitted to the framework
–Examples
– Map Reduce Job
– MoYa Cluster

• Container
–Basic unit of allocation
–Fine-grained resource allocation across multiple resource
types (memory, cpu, disk, network, gpu etc.)
– container_0 = 2GB, 1 CPU
– container_1 = 1GB, 6 CPU

–Replaces the fixed map/reduce slots
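As a rough illustration of why resource-sized containers are more flexible than fixed slots, the sketch below places the two container shapes from this slide onto a single node. It is a simplified Python model, not the YARN API: the `Resource` class, the `pack` helper, and the node capacities are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Resource:
    """A multi-dimensional resource vector (only memory and CPU here)."""
    memory_gb: int
    vcores: int

    def fits_in(self, other: "Resource") -> bool:
        # Every dimension must fit, not just one of them.
        return self.memory_gb <= other.memory_gb and self.vcores <= other.vcores

    def minus(self, other: "Resource") -> "Resource":
        return Resource(self.memory_gb - other.memory_gb, self.vcores - other.vcores)


def pack(node_capacity: Resource, requests: Dict[str, Resource]) -> List[str]:
    """Greedily place requests on one node and return the names that fit."""
    free = node_capacity
    placed = []
    for name, wanted in requests.items():
        if wanted.fits_in(free):
            free = free.minus(wanted)
            placed.append(name)
    return placed


# The two container shapes from the slide.
requests = {
    "container_0": Resource(memory_gb=2, vcores=1),
    "container_1": Resource(memory_gb=1, vcores=6),
}

# Hypothetical node sizes, chosen only for illustration.
big_node = pack(Resource(memory_gb=4, vcores=8), requests)
small_node = pack(Resource(memory_gb=2, vcores=4), requests)
print(big_node, small_node)
```

With fixed slots, both requests would cost one slot each regardless of shape. Here a memory-heavy and a CPU-heavy container share a node only when both dimensions allow it.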

Architecture
• Resource Manager
–Global resource scheduler
–Hierarchical queues

• Node Manager
–Per-machine agent
–Manages the life-cycle of containers
–Container resource monitoring

• Application Master
–Per-application
–Manages application scheduling and task execution
–E.g. MapReduce Application Master
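The division of labour between the three roles can be sketched with a toy model. This is a deliberately simplified Python illustration, not Hadoop code: the class and method names (`allocate`, `launch`, `run_tasks`), the first-fit placement rule, and the host names are all assumptions. Queues, heartbeats, and security are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class NodeManager:
    """Per-machine agent: tracks and launches containers on one host."""
    host: str
    free_mb: int
    running: List[str] = field(default_factory=list)

    def launch(self, container_id: str, mb: int) -> None:
        self.free_mb -= mb
        self.running.append(container_id)


class ResourceManager:
    """Global scheduler: decides which node hosts each container."""

    def __init__(self, nodes: List[NodeManager]) -> None:
        self.nodes = nodes
        self._next_id = 0

    def allocate(self, mb: int) -> Optional[Tuple[str, NodeManager]]:
        # First-fit placement; real schedulers are far more involved.
        for node in self.nodes:
            if node.free_mb >= mb:
                container_id = f"container_{self._next_id}"
                self._next_id += 1
                return container_id, node
        return None  # nothing fits right now


class ApplicationMaster:
    """Per-application: asks the RM for containers, then runs tasks in them."""

    def __init__(self, rm: ResourceManager) -> None:
        self.rm = rm

    def run_tasks(self, count: int, mb_each: int) -> Dict[str, str]:
        placement = {}
        for _ in range(count):
            grant = self.rm.allocate(mb_each)
            if grant is None:
                break
            container_id, node = grant
            node.launch(container_id, mb_each)  # the AM talks to the NM directly
            placement[container_id] = node.host
        return placement


nodes = [NodeManager("host01", free_mb=4096), NodeManager("host02", free_mb=2048)]
placement = ApplicationMaster(ResourceManager(nodes)).run_tasks(count=3, mb_each=2048)
print(placement)
```

The point of the split is that the ResourceManager only hands out capacity, while each application's master decides what to run in the capacity it receives.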

To the code!

Q&A

YARN - ApplicationMaster
• ApplicationMaster
– ApplicationSubmissionContext is the complete specification of the
ApplicationMaster, provided by Client
– ResourceManager responsible for allocating and launching
ApplicationMaster container

– ApplicationSubmissionContext fields: resourceRequest, containerLaunchContext, appName, queue
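To make the four ApplicationSubmissionContext fields concrete, here is a minimal Python stand-in for the record and for a client handing it to the ResourceManager. It is a sketch under assumptions: the real API is Java, the `submit` function and its return strings are hypothetical, and the resource request is reduced to a single memory figure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContainerLaunchContext:
    """How to start the ApplicationMaster process (simplified)."""
    commands: List[str]
    environment: Dict[str, str] = field(default_factory=dict)


@dataclass
class ApplicationSubmissionContext:
    """The four fields named on the slide."""
    app_name: str
    queue: str
    resource_request_mb: int  # simplified: memory only
    container_launch_context: ContainerLaunchContext


def submit(ctx: ApplicationSubmissionContext, queue_free_mb: Dict[str, int]) -> str:
    """Hypothetical RM-side check before the ApplicationMaster container starts."""
    if ctx.queue not in queue_free_mb:
        return "REJECTED: unknown queue"
    if ctx.resource_request_mb > queue_free_mb[ctx.queue]:
        return "PENDING: waiting for resources"
    return "ACCEPTED: launching ApplicationMaster"


# Illustrative values only; the command line and queue sizes are made up.
ctx = ApplicationSubmissionContext(
    app_name="demo-app",
    queue="default",
    resource_request_mb=1024,
    container_launch_context=ContainerLaunchContext(commands=["java DemoAppMaster"]),
)
result = submit(ctx, {"default": 4096})
print(result)
```

Everything the ResourceManager needs to start the ApplicationMaster travels in this one record, which is why the slide calls it the complete specification.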

YARN – Resource Allocation & Usage
• ContainerLaunchContext
– The context provided by ApplicationMaster to NodeManager to
launch the Container
– Complete specification for a process
– LocalResource used to specify container binary and
dependencies
– NodeManager responsible for downloading from shared namespace
(typically HDFS)

– ContainerLaunchContext fields: container, commands, environment, localResources
– LocalResource fields: uri, type
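The following Python sketch shows how a ContainerLaunchContext and its LocalResource entries could be turned into the steps a NodeManager performs before starting the process. It is an illustration, not NodeManager code: the `launch_script` helper, the `hdfs:///apps/demo/app.jar` URI, and the working directory are hypothetical, and nothing is actually downloaded.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class LocalResource:
    uri: str   # where the file lives in the shared namespace
    type: str  # "FILE" or "ARCHIVE" in this sketch


@dataclass
class ContainerLaunchContext:
    commands: List[str]
    environment: Dict[str, str] = field(default_factory=dict)
    local_resources: Dict[str, LocalResource] = field(default_factory=dict)


def launch_script(ctx: ContainerLaunchContext, work_dir: str) -> List[str]:
    """Describe localization, environment setup, and commands, in that order."""
    lines = []
    for link_name, res in sorted(ctx.local_resources.items()):
        verb = "unpack" if res.type == "ARCHIVE" else "fetch"
        lines.append(f"# {verb} {res.uri} -> {work_dir}/{link_name}")
    for key, value in sorted(ctx.environment.items()):
        lines.append(f"export {key}={value}")
    lines.extend(ctx.commands)
    return lines


ctx = ContainerLaunchContext(
    commands=["java -cp ./app.jar Demo"],
    environment={"CLASSPATH": "./app.jar"},
    local_resources={"app.jar": LocalResource("hdfs:///apps/demo/app.jar", "FILE")},
)
script = launch_script(ctx, "/tmp/container_0")
print("\n".join(script))
```

The ordering matters: resources are localized first, so that the environment and commands can refer to them by their local names.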

YARN – Resource Allocation & Usage
• ResourceRequest

priority   capability       resourceName   numContainers
1          <2gb, 1 core>    host01         1
1          <2gb, 1 core>    rack0          1
1          <2gb, 1 core>    *              1
0          <4gb, 1 core>    *              1
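A ResourceRequest on this slide has four fields: priority, capability, resourceName, and numContainers. The resourceName can be a host, a rack, or `*` for anywhere. The Python sketch below shows one way to read such a set of requests as a locality preference. It is an assumption-laden illustration: the rows are one reading of the slide's example values, and the `locality` helper with its level names ("node-local", "rack-local", "off-switch") is hypothetical, not the scheduler's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional

ANY = "*"


@dataclass(frozen=True)
class ResourceRequest:
    priority: int
    memory_gb: int
    cores: int
    resource_name: str  # host name, rack name, or "*"
    num_containers: int


# Example rows, as read from the slide.
TABLE = [
    ResourceRequest(1, 2, 1, "host01", 1),
    ResourceRequest(1, 2, 1, "rack0", 1),
    ResourceRequest(1, 2, 1, ANY, 1),
    ResourceRequest(0, 4, 1, ANY, 1),
]


def locality(requests: List[ResourceRequest], priority: int,
             host: str, rack: str) -> Optional[str]:
    """Return the tightest locality level at which this node matches the ask."""
    names = {
        r.resource_name
        for r in requests
        if r.priority == priority and r.num_containers > 0
    }
    if host in names:
        return "node-local"
    if rack in names:
        return "rack-local"
    if ANY in names:
        return "off-switch"
    return None


print(locality(TABLE, 1, "host01", "rack0"))
print(locality(TABLE, 1, "host07", "rack0"))
print(locality(TABLE, 0, "host01", "rack0"))
```

Read this way, the three priority-1 rows describe a single ask for one 2 GB container, preferably on host01, otherwise on rack0, otherwise anywhere.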
YARN – Resource Allocation & Usage
• Container
– The basic unit of allocation in YARN
– The result of the ResourceRequest provided by ResourceManager
to the ApplicationMaster
– A specific amount of resources (cpu, memory etc.) on a specific
machine
– Container fields: containerId, resourceName, capability, tokens
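The `tokens` field is what lets a NodeManager trust that a Container really was granted by the ResourceManager. The Python sketch below illustrates that idea with an HMAC over the container's fields. This is a simplification under assumptions: the real YARN token format and key management differ, and `grant`, `node_accepts`, and the shared secret are hypothetical names for the example.

```python
import hashlib
import hmac
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Container:
    container_id: str
    resource_name: str  # the host the container was granted on
    memory_mb: int
    vcores: int
    token: str


def _sign(secret: bytes, container_id: str, resource_name: str,
          memory_mb: int, vcores: int) -> str:
    message = f"{container_id}|{resource_name}|{memory_mb}|{vcores}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def grant(secret: bytes, container_id: str, resource_name: str,
          memory_mb: int, vcores: int) -> Container:
    """RM side: hand out a container whose token covers all of its fields."""
    token = _sign(secret, container_id, resource_name, memory_mb, vcores)
    return Container(container_id, resource_name, memory_mb, vcores, token)


def node_accepts(secret: bytes, host: str, c: Container) -> bool:
    """NM side: launch only on the right host and only if the token checks out."""
    expected = _sign(secret, c.container_id, c.resource_name, c.memory_mb, c.vcores)
    return c.resource_name == host and hmac.compare_digest(expected, c.token)


secret = b"demo-secret"  # illustrative only
granted = grant(secret, "container_0", "host01", 2048, 1)
tampered = replace(granted, memory_mb=8192)  # an AM trying to claim more memory
print(node_accepts(secret, "host01", granted), node_accepts(secret, "host01", tampered))
```

Because the token covers the capability and the host, an ApplicationMaster cannot enlarge a grant or move it to another machine without the check failing.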


Hortonworks Yarn Code Walk Through January 2014

Editor's Notes

  • #5: Hadoop 1.x had its uses, but this is really about turning Hadoop into the next-generation platform. What does that mean? A platform should be able to do multiple things, so more than just batch processing. It needs batch, interactive, online, and streaming capabilities to really turn Hadoop into a next-gen platform. It scales! Yahoo plans to move to a 10k-node cluster.
  • #6: Now we have a concept of deploying applications into the Hadoop cluster. These applications run in containers with set resources.
  • #7: The RM (ResourceManager) takes the place of the JT (JobTracker) and still has scheduling queues, such as the fair, capacity, and hierarchical queues.