Writing Application Frameworks
on Apache Hadoop YARN


Hitesh Shah
hitesh@hortonworks.com




© Hortonworks Inc. 2011      Page 1
Hitesh Shah - Background
• Member of Technical Staff at Hortonworks Inc.
• Committer for Apache MapReduce and Ambari
• Earlier, spent 8+ years at Yahoo! building infrastructure
  ranging from data storage platforms to high-throughput
  online ad-serving systems.




     Architecting the Future of Big Data
                                                        Page 2
     © Hortonworks Inc. 2011
Agenda

• YARN Architecture and Concepts
• Writing a New Framework




YARN Architecture
• Resource Manager
  –Global resource scheduler
  –Hierarchical queues
• Node Manager
  –Per-machine agent
  –Manages the life-cycle of containers
  –Container resource monitoring
• Application Master
  –Per-application
  –Manages application scheduling and task execution
  –E.g. MapReduce Application Master

YARN Architecture

[Figure: two Clients, the Resource Manager, and three Node Managers hosting App Masters and Containers. Legend: MapReduce Status, Job Submission, Node Status, Resource Request.]

YARN Concepts
• Application ID
  –Application Attempt IDs
• Container
  –ContainerLaunchContext
• ResourceRequest
  –Host/Rack/Any match
  –Priority
  –Resource constraints
• Local Resource
  –File/Archive
  –Visibility – public/private/application
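To make the Host/Rack/Any match and the resource constraint concrete, here is a minimal plain-Java sketch. `PlacementRequest` and `Placement.canPlace` are hypothetical names, not part of the YARN API, and a real scheduler does considerably more (priorities, queues, capacities); the sketch only assumes that a request names a host, a rack, or the `*` wildcard, plus a memory requirement.

```java
/** Hypothetical, simplified resource request: where it may run and how much memory it needs. */
record PlacementRequest(String resourceName, int memoryMb) {}

final class Placement {
    /** Wildcard resource name meaning "any node" (the "Any" in Host/Rack/Any). */
    static final String ANY = "*";

    private Placement() {}

    /**
     * Returns true if a node on the given host and rack, with freeMemoryMb available,
     * satisfies both the locality and the memory constraint of the request.
     */
    static boolean canPlace(PlacementRequest req, String host, String rack, int freeMemoryMb) {
        String where = req.resourceName();
        boolean localityOk = where.equals(ANY) || where.equals(host) || where.equals(rack);
        return localityOk && req.memoryMb() <= freeMemoryMb;
    }
}
```

A host-level request only matches that host, a rack-level request matches any host on the rack, and `*` matches every node with enough free memory.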


What you need for a new Framework
• Application Submission Client
  –For example, the MR Job Client
• Application Master
  –The core framework library
• Application History (optional)
  –History of all previously run instances
• Auxiliary Services (optional)
  –Long-running application-specific services running on the
   NodeManager




Use Case: Distributed Shell
• Take a user-provided script or application and run it on a
  set of nodes in the cluster

• Input:
   – User script to execute
   – Number of containers to run the script in
   – Arguments that can vary per container
   – Memory requirements for the shell script
   – Output location/dir

[Figure: the DS AppMaster runs on one Node Manager; the shell script runs in containers on other Node Managers.]
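The inputs above could be captured in a small value class from which the AppMaster derives one shell command per container. This is a hypothetical sketch: `ShellJob`, the `/bin/sh` invocation, and the `container_<i>.out` output naming are illustrative assumptions, not the Distributed Shell example's actual options.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical holder for the Distributed Shell inputs: the script, one argument string per
 * container, the memory each container should request, and an output directory.
 */
record ShellJob(String script, int numContainers, List<String> perContainerArgs, int memoryMb, String outputDir) {
    ShellJob {
        if (numContainers <= 0) {
            throw new IllegalArgumentException("numContainers must be positive");
        }
        if (perContainerArgs.size() != numContainers) {
            throw new IllegalArgumentException("expected one argument string per container");
        }
        perContainerArgs = List.copyOf(perContainerArgs); // defensive, immutable copy
    }

    /** Shell command for container i; the output file naming is an illustrative assumption. */
    String commandFor(int i) {
        return "/bin/sh " + script + " " + perContainerArgs.get(i)
            + " > " + outputDir + "/container_" + i + ".out";
    }

    /** One command per container, in container order. */
    List<String> allCommands() {
        List<String> commands = new ArrayList<>();
        for (int i = 0; i < numContainers; i++) {
            commands.add(commandFor(i));
        }
        return commands;
    }
}
```

`memoryMb` is carried along so the AM can put it into each container's resource request.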


Client: RPC calls
• Uses the ClientRMProtocol to talk to the RM
• Get a new application ID from the RM
  – ClientRMProtocol#getNewApplication
• Application submission
  – ClientRMProtocol#submitApplication
• Application monitoring
  – ClientRMProtocol#getApplicationReport
• Kill the application (if needed)
  – ClientRMProtocol#forceKillApplication




Client
• Registration with the RM
  –New Application ID


• Application Submission
  –User information
  –Scheduler queue
  –Define the container for the Distributed Shell App Master via
   the ContainerLaunchContext

• Application Monitoring
  – AppMaster host details (with tokens if needed) and tracking URL
  – Application Status (submitted/running/finished)
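Application monitoring usually amounts to polling the application report until the application reaches a terminal state. The sketch below models that loop in plain Java; `AppState`, `AppMonitor`, and the `Supplier` standing in for a `getApplicationReport` call are hypothetical, and the state names loosely mirror the submitted/running/finished states listed above, plus assumed FAILED and KILLED states.

```java
import java.util.function.Supplier;

/** Hypothetical application states, loosely mirroring the report states listed above. */
enum AppState {
    SUBMITTED, RUNNING, FINISHED, FAILED, KILLED;

    boolean isTerminal() {
        return this == FINISHED || this == FAILED || this == KILLED;
    }
}

final class AppMonitor {
    private AppMonitor() {}

    /**
     * Calls reportSource (standing in for getApplicationReport) until it returns a terminal
     * state or maxPolls calls have been made; returns the last state observed.
     */
    static AppState waitForCompletion(Supplier<AppState> reportSource, int maxPolls, long sleepMillis) {
        AppState state = reportSource.get();
        for (int polls = 1; !state.isTerminal() && polls < maxPolls; polls++) {
            try {
                Thread.sleep(sleepMillis); // wait between report requests
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop polling
                return state;
            }
            state = reportSource.get();
        }
        return state;
    }
}
```

Bounding the number of polls keeps a client from waiting forever on an application that never reports a terminal state; a caller that sees a non-terminal result can then decide whether to kill the application.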


Defining a Container
• ContainerLaunchContext class
  –Can run a shell script, a Java process, or launch a VM


• Command(s) to run
• Local resources needed for the process to run
  –Dependent jars, native libs, data files/archives
• Environment to set up
  –Java Classpath
• Security-related data
  –Container Tokens
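A ContainerLaunchContext essentially bundles the commands, local resources, and environment listed above. The plain-Java sketch below assembles those pieces for a Java process; `LaunchSpec` and `LaunchSpecs.javaProcess` are hypothetical stand-ins, not the YARN classes. It assumes each local resource is made available in the container's working directory under its map key, so the classpath can refer to jars by that link name, and that `${JAVA_HOME}` is expanded by the shell on the node.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical plain-Java stand-in for what a ContainerLaunchContext carries. */
record LaunchSpec(List<String> commands, Map<String, String> localResources, Map<String, String> env) {}

final class LaunchSpecs {
    private LaunchSpecs() {}

    /**
     * Builds a spec that runs mainClass with args. jarsByLinkName maps the name each jar
     * gets in the container's working directory to its source location (e.g. an HDFS path).
     */
    static LaunchSpec javaProcess(String mainClass, Map<String, String> jarsByLinkName, List<String> args) {
        // Refer to each jar by its link name; iteration order follows the caller's map.
        String classpath = String.join(":", jarsByLinkName.keySet());
        StringBuilder cmd = new StringBuilder("${JAVA_HOME}/bin/java ").append(mainClass);
        for (String arg : args) {
            cmd.append(' ').append(arg);
        }
        return new LaunchSpec(List.of(cmd.toString()), Map.copyOf(jarsByLinkName),
            Map.of("CLASSPATH", classpath));
    }
}
```

Passing an ordered map (such as a `LinkedHashMap`) keeps the classpath order predictable.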



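Application Master: RPC calls
• Uses the AMRM (AM to RM) and CM (AM to NodeManager) protocols
• Register the AM with the RM
  – AMRMProtocol#registerApplicationMaster
• Ask the RM to allocate resources
  – AMRMProtocol#allocate
• Launch tasks on allocated containers
  – ContainerManager#startContainer (on each NM)
• Manage tasks to final completion
  – App-specific RPC between the Client, the AM, and its tasks
• Inform the RM of completion
  – AMRMProtocol#finishApplicationMaster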
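These calls have to happen in order: register first, then any number of allocate calls (which also serve as the AM's heartbeat to the RM), then finish. A hypothetical guard class makes that ordering explicit; `AmLifecycle` is illustrative only and does not wrap any real RPC client.

```java
/** Hypothetical guard enforcing the register, allocate*, finish order of AM-to-RM calls. */
final class AmLifecycle {
    enum Phase { NEW, REGISTERED, FINISHED }

    private Phase phase = Phase.NEW;
    private int allocateCalls;

    void register() {
        require(Phase.NEW, "register");
        phase = Phase.REGISTERED;
    }

    void allocate() {
        require(Phase.REGISTERED, "allocate");
        allocateCalls++; // each allocate call doubles as a heartbeat
    }

    void finish() {
        require(Phase.REGISTERED, "finish");
        phase = Phase.FINISHED;
    }

    Phase phase() {
        return phase;
    }

    int allocateCalls() {
        return allocateCalls;
    }

    private void require(Phase expected, String call) {
        if (phase != expected) {
            throw new IllegalStateException(call + " not allowed in phase " + phase);
        }
    }
}
```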




Application Master
• Set up RPC to handle requests from the Client and/or tasks launched
  in containers

• Register and send regular heartbeats to the RM

• Request resources from the RM.

• Launch the user shell script on containers as they are allocated.

• Monitor the status of the user script on remote containers and handle
  failures by retrying if needed.

• Inform RM of completion when application is done.
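The "retry on failure" step needs some per-task bookkeeping. The sketch below is one hypothetical way to do it: it counts launch attempts per task index, treats a non-zero exit status as a failure, and gives up on a task after `maxAttempts` launches. `RetryTracker` and its policy are assumptions, not the Distributed Shell AppMaster's actual logic.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-task retry bookkeeping for an AppMaster. */
final class RetryTracker {
    private final int maxAttempts;
    private final Map<Integer, Integer> attemptsByTask = new HashMap<>();
    private int succeeded;
    private int failedPermanently;

    RetryTracker(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** Records that a container for taskIndex was launched. */
    void launched(int taskIndex) {
        attemptsByTask.merge(taskIndex, 1, Integer::sum);
    }

    /**
     * Records a completed container. Returns true if the task should be retried,
     * i.e. it exited non-zero and has launch attempts left.
     */
    boolean completed(int taskIndex, int exitStatus) {
        if (exitStatus == 0) {
            succeeded++;
            return false;
        }
        if (attemptsByTask.getOrDefault(taskIndex, 0) < maxAttempts) {
            return true; // ask the RM for another container for this task
        }
        failedPermanently++;
        return false;
    }

    /** True once every task has either succeeded or exhausted its attempts. */
    boolean allDone(int numTasks) {
        return succeeded + failedPermanently == numTasks;
    }

    /** Whether the AM should report FAILED rather than SUCCEEDED when it finishes. */
    boolean anyFailed() {
        return failedPermanently > 0;
    }
}
```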


AMRM#allocate
• Request:
  – Containers needed
      – Not a delta protocol: each request restates the full outstanding ask
      – Locality constraints: Host/Rack/Any
      – Resource constraints: memory
      – Priority-based assignments

  – Containers to release (extra/unwanted)
      – Only non-launched containers

• Response:
  – Allocated Containers
      – Launch or release

  – Completed Containers
      – Status of completion
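"Not a delta protocol" means each allocate call restates the total number of containers still wanted for a given priority, location, and size, rather than adding to what was asked before. The hypothetical model below shows that replace-not-add behavior on the RM side; `AskKey`, `Ask`, and `AskTable` are illustrative types, and treating a zero count as a cancellation is a simplifying assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Identifies one outstanding request: same priority, location, and container size. */
record AskKey(int priority, String resourceName, int memoryMb) {}

/** An ask carries the TOTAL number of containers still wanted for its key, not an increment. */
record Ask(AskKey key, int numContainers) {}

/** RM-side bookkeeping in this simplified model of AMRM#allocate. */
final class AskTable {
    private final Map<AskKey, Integer> totals = new HashMap<>();

    /** Applies one allocate call's asks: each ask replaces the previous total for its key. */
    void update(List<Ask> asks) {
        for (Ask ask : asks) {
            if (ask.numContainers() <= 0) {
                totals.remove(ask.key()); // nothing left to allocate for this key
            } else {
                totals.put(ask.key(), ask.numContainers()); // overwrite, never add
            }
        }
    }

    int outstanding(AskKey key) {
        return totals.getOrDefault(key, 0);
    }
}
```

The consequence for an AM is that resending the same ask is harmless, but after receiving containers it must lower its ask itself, or the RM will keep trying to satisfy the old total.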

YARN Applications
• Data Processing:
  – OpenMPI on Hadoop
  – Spark (UC Berkeley)
       – Shark (Hive-on-Spark)

  – Real-time data processing
       – Storm ( Twitter )
       – Apache S4

  – Graph processing – Apache Giraph
• Beyond data:
  – Deploying Apache HBase via YARN (HBASE-4329)
  – HBase coprocessors via YARN (HBASE-4047)




References

• Doc on writing new applications:
  – WritingYarnApplications.html (available at
    http://hadoop.apache.org/common/docs/r2.0.0-alpha/)




Questions?


Thank You!
Hitesh Shah
hitesh@hortonworks.com




Appendix: Code Examples



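Client: Registration
// conf: an existing Hadoop Configuration.
// Hadoop 2.0.0-alpha API; later releases renamed ClientRMProtocol
// to ApplicationClientProtocol.
YarnConfiguration yarnConf = new YarnConfiguration(conf);
YarnRPC rpc = YarnRPC.create(yarnConf);
InetSocketAddress rmAddress = NetUtils.createSocketAddr(
  yarnConf.get(YarnConfiguration.RM_ADDRESS,
               YarnConfiguration.DEFAULT_RM_ADDRESS));

// Proxy to the RM's client-facing interface (the ApplicationsManager)
ClientRMProtocol applicationsManager = (ClientRMProtocol)
  rpc.getProxy(ClientRMProtocol.class, rmAddress, yarnConf);

// Ask the RM for a new application ID
GetNewApplicationRequest request =
  Records.newRecord(GetNewApplicationRequest.class);
GetNewApplicationResponse response =
  applicationsManager.getNewApplication(request);
ApplicationId appId = response.getApplicationId();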
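The response carries the new application ID, and each run of the AppMaster then gets its own application attempt ID under it. The hypothetical helper below renders such IDs as strings, assuming the `application_<clusterTimestamp>_<id>` and `appattempt_<clusterTimestamp>_<id>_<attempt>` formats with 4- and 6-digit zero padding used by Hadoop 2.x; check the formats against the version you run before relying on them.

```java
import java.util.Locale;

/** Hypothetical helpers that render YARN-style ID strings (formats assumed from Hadoop 2.x). */
final class YarnIds {
    private YarnIds() {}

    /** e.g. application_<clusterTimestamp>_0001 */
    static String applicationId(long clusterTimestamp, int id) {
        // Locale.ROOT keeps the digits ASCII regardless of the JVM's default locale.
        return String.format(Locale.ROOT, "application_%d_%04d", clusterTimestamp, id);
    }

    /** e.g. appattempt_<clusterTimestamp>_0001_000001 */
    static String attemptId(long clusterTimestamp, int appId, int attempt) {
        return String.format(Locale.ROOT, "appattempt_%d_%04d_%06d", clusterTimestamp, appId, attempt);
    }
}
```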




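Client: App Submission
ApplicationSubmissionContext appContext =
  Records.newRecord(ApplicationSubmissionContext.class);
appContext.setApplicationId(appId);
appContext.setQueue(queue);  // scheduler queue

// Define the container in which the AppMaster will run
ContainerLaunchContext amContainer =
  Records.newRecord(ContainerLaunchContext.class);
amContainer.setLocalResources(localResources); // Map<String, LocalResource>
amContainer.setEnvironment(env);               // Map<String, String>
String command = "${JAVA_HOME}/bin/java MyAppMaster arg1 arg2";
amContainer.setCommands(Collections.singletonList(command));
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(amMemory);  // in MB
amContainer.setResource(capability);

appContext.setAMContainerSpec(amContainer);

SubmitApplicationRequest appRequest =
  Records.newRecord(SubmitApplicationRequest.class);
appRequest.setApplicationSubmissionContext(appContext);

applicationsManager.submitApplication(appRequest);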


Client: App Monitoring
• Get Application Status

GetApplicationReportRequest reportRequest =
    Records.newRecord(GetApplicationReportRequest.class);
reportRequest.setApplicationId(appId);
GetApplicationReportResponse reportResponse =
  applicationsManager.getApplicationReport(reportRequest);
ApplicationReport report = reportResponse.getApplicationReport();


• Kill the application

KillApplicationRequest killRequest =
      Records.newRecord(KillApplicationRequest.class);
killRequest.setApplicationId(appId);
applicationsManager.forceKillApplication(killRequest);

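AM: Ask RM for Containers
// resourceManager: the AM's AMRMProtocol proxy; pri: a Priority record
ResourceRequest rsrcRequest = Records.newRecord(ResourceRequest.class);
rsrcRequest.setHostName("*"); // hostname, rack, or "*" wildcard
rsrcRequest.setPriority(pri);
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(containerMemory);
rsrcRequest.setCapability(capability);
rsrcRequest.setNumContainers(numContainers);

List<ResourceRequest> requestedContainers = new ArrayList<ResourceRequest>();
requestedContainers.add(rsrcRequest);
// Unwanted containers to give back (only ones not yet launched)
List<ContainerId> releasedContainers = new ArrayList<ContainerId>();

AllocateRequest req = Records.newRecord(AllocateRequest.class);
req.setApplicationAttemptId(appAttemptID);
req.setResponseId(rmRequestID);
req.addAllAsks(requestedContainers);
req.addAllReleases(releasedContainers);
req.setProgress(currentProgress);
AllocateResponse allocateResponse = resourceManager.allocate(req);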



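AM: Launch Containers
// rpc: a YarnRPC instance; conf: the AM's Configuration
AMResponse amResp = allocateResponse.getAMResponse();

List<Container> allocatedContainers = amResp.getAllocatedContainers();
for (Container allocatedContainer : allocatedContainers) {
    // Connect to the NodeManager that hosts this container
    NodeId nodeId = allocatedContainer.getNodeId();
    InetSocketAddress cmAddress =
      NetUtils.createSocketAddr(nodeId.getHost() + ":" + nodeId.getPort());
    ContainerManager cm = (ContainerManager)
      rpc.getProxy(ContainerManager.class, cmAddress, conf);

    ContainerLaunchContext ctx =
      Records.newRecord(ContainerLaunchContext.class);
    ctx.setContainerId(allocatedContainer.getId());
    ctx.setResource(allocatedContainer.getResource());
    // set env, command, local resources, …

    StartContainerRequest startReq =
      Records.newRecord(StartContainerRequest.class);
    startReq.setContainerLaunchContext(ctx);
    cm.startContainer(startReq);
}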

AM: Monitoring Containers
• Running Containers
GetContainerStatusRequest statusReq =
  Records.newRecord(GetContainerStatusRequest.class);
statusReq.setContainerId(containerId);
GetContainerStatusResponse statusResp =
  cm.getContainerStatus(statusReq);
// statusResp.getStatus() holds the container's ContainerStatus


• Completed Containers
AMResponse amResp = allocateResponse.getAMResponse();
List<ContainerStatus> completedContainers =
  amResp.getCompletedContainersStatuses();
for (ContainerStatus containerStatus : completedContainers) {
    // containerStatus.getContainerId()
    // containerStatus.getExitStatus()
    // containerStatus.getDiagnostics()
}
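When acting on `getExitStatus()`, an AM may want to separate containers that exited cleanly, containers the framework aborted (for example, released or lost along with their node), and genuine task failures. The sketch below is a hypothetical classifier; the `-100` sentinel mirrors the aborted exit status that Hadoop 2.x reports, but treat that value as an assumption and check your version's constants.

```java
/** Hypothetical outcome categories for a completed container. */
enum Outcome { SUCCEEDED, ABORTED, FAILED }

final class ExitStatuses {
    /** Assumed exit status for containers aborted by the framework rather than by the task. */
    static final int ABORTED = -100;

    private ExitStatuses() {}

    static Outcome classify(int exitStatus) {
        if (exitStatus == 0) {
            return Outcome.SUCCEEDED;
        }
        if (exitStatus == ABORTED) {
            return Outcome.ABORTED; // re-request a container without counting a task failure
        }
        return Outcome.FAILED; // the task itself exited non-zero
    }
}
```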



AM: I am done
FinishApplicationMasterRequest finishReq =
  Records.newRecord(FinishApplicationMasterRequest.class);
finishReq.setAppAttemptId(appAttemptID);

finishReq.setFinishApplicationStatus
   (FinalApplicationStatus.SUCCEEDED); // or FAILED

finishReq.setDiagnostics(diagnostics);

resourceManager.finishApplicationMaster(finishReq);





Writing Yarn Applications Hadoop Summit 2012
