Building Applications on YARN




Chris Riccomini
10/11/2012
Staff Software Engineer at LinkedIn
http://riccomini.name
@criccomini
What I Want to Talk About
Anatomy of a YARN Application

Things to consider when building your application
  Architecture
  Operations
Anatomy of a YARN App
Client

Application Master

Container Code

Resource Manager

Node Manager
Anatomy of a YARN App
[Diagram, simplified: Clients, RM (Resource Manager), NMs (Node Managers), AM (Application Master), CC (Container Code)]
A lot to consider
Deployment
Metrics
Configuration
Security
Language
Logging
Fault Tolerance
Isolation
Dashboard
State
Deployment
HDFS

HTTP

File (NFS)

DDOS’ing your servers

What we do: Tarball over HTTP. Life is easier with HDFS,
but operational overhead is too high.
Metrics
Application-level metrics

YARN-level metrics

metrics2

Containers are transient

What we do: Both app-level and framework-level metrics use the
same metrics framework. We pipe them to an in-house metrics
dashboard. We don’t use metrics2 since we don’t want a
dependency on Hadoop in our core jar.
Configuration
YARN config (yarn-site.xml, core-site.xml, etc.)

Application Configuration

Transporting Configuration

What we do: Config is fully resolved at client execution time.
No admin-override/locked config protection yet. Config is
passed from client to AM to containers via environment
variables.
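
A minimal sketch of the hand-off described above, assuming the resolved config is serialized as JSON into a single environment variable. The variable name APP_CONFIG, the JSON encoding, and the helper names are hypothetical; the slides only say that config travels from client to AM to containers via environment variables.

```python
import json
import os

# Hypothetical variable name; the slides do not say which variable(s) are used.
CONFIG_ENV_VAR = "APP_CONFIG"


def resolve_config(defaults, overrides):
    """Resolve the config once, on the client, before anything is launched."""
    resolved = dict(defaults)
    resolved.update(overrides)
    return resolved


def config_to_env(config, env=None):
    """Return a copy of the environment with the resolved config attached.

    The client would use this environment when launching the AM, and the AM
    would reuse it when launching containers.
    """
    child_env = dict(os.environ if env is None else env)
    child_env[CONFIG_ENV_VAR] = json.dumps(config, sort_keys=True)
    return child_env


def config_from_env(env=None):
    """Read the config back inside the AM or a container."""
    source = os.environ if env is None else env
    return json.loads(source[CONFIG_ENV_VAR])
```

Because the config is resolved once on the client, the AM and the containers only decode it; they never re-read config files of their own.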
Security
Kerberos?

Firewalls are your friend

Gateway machine

Dashboard

What we do: Firewall all YARN machines so they can only
talk to each other. All users go through an LDAP-controlled
dashboard.
Language
Favor complexity in Application Master, and make
container-logic thin

Talk to RM via REST

Potential to talk to RM via Protobuf RPC

What we do: The AM is written in Java. The task side of the
application has Python and Java implementations.
Logging
Local storage (application is running)

HDFS storage (application has stopped for a while)

Be careful with STDOUT/STDERR (rollover)

What we do: No HDFS. Logs sit for 7 days, then disappear.
Not ideal.
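
The slides do not say how logs disappear after 7 days. Purely as an illustration of a 7-day retention window on local storage, the sketch below lists log files whose modification time is older than the cutoff. The function names and the use of file mtime are assumptions.

```python
import os
import time

# 7-day retention window from the slide, in seconds.
RETENTION_SECONDS = 7 * 24 * 60 * 60


def is_expired(mtime, now, retention=RETENTION_SECONDS):
    """True if a file last modified at `mtime` is past the retention window."""
    return now - mtime > retention


def expired_logs(log_dir, now=None):
    """Return the paths under `log_dir` that are older than the retention window.

    Only lists candidates; deleting them is left to the caller.
    """
    now = time.time() if now is None else now
    expired = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            if is_expired(os.path.getmtime(path), now):
                expired.append(path)
    return sorted(expired)
```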
Fault Tolerance
Failure matrix

HA RM/NM

Orphaned processes

Pay attention to process trees

What we do: No HA. Manual failover when RM dies.
Orphaned process monitor (proc start time < RM start time).
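
A rough sketch of the orphan check named above (proc start time < RM start time). The reasoning is presumably that, without HA, anything launched before the current RM started cannot belong to an application the RM still knows about. The ProcessInfo type and find_orphans helper are hypothetical; how the process list and the RM start time are collected is not covered in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessInfo:
    """Hypothetical record for one container process on a node."""
    pid: int
    start_time: float  # seconds since the epoch
    command: str


def find_orphans(processes, rm_start_time):
    """Return processes that started before the RM did.

    Implements only the comparison from the slide; what to do with the
    orphans (alert, kill) is left to the caller.
    """
    return [p for p in processes if p.start_time < rm_start_time]
```

Since the slide also says to pay attention to process trees, a real monitor would likely need to apply this check to child processes as well, not just the top-level container process.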
Isolation
Memory

Disk

CPU

Network

What we do: Nothing, right now. Hoping YARN will solve
this before we need it (cgroups?).
Dashboard
Application-specific information

Integrate with YARN

Application Master or Standalone?

What we do: Dashboard enforces security, talks to RM/AM
via HTTP/JSON to get information about jobs.
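
As a sketch of the HTTP/JSON call, assuming the dashboard reads the RM's cluster applications REST resource (/ws/v1/cluster/apps) and reduces each entry to a few fields. The helper names and the choice of fields are illustrative, not taken from the slides.

```python
import json
from urllib.request import urlopen


def fetch_apps_json(rm_base_url, timeout=5):
    """Fetch the raw JSON application list from the RM web services.

    `rm_base_url` is whatever address the RM web UI listens on in your cluster.
    """
    url = rm_base_url.rstrip("/") + "/ws/v1/cluster/apps"
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def summarize_apps(payload):
    """Reduce the RM response to the fields a dashboard row might show."""
    doc = json.loads(payload)
    # "apps" can be null when nothing has been submitted, so guard each level.
    apps = (doc.get("apps") or {}).get("app") or []
    return [
        {"id": a.get("id"), "name": a.get("name"), "state": a.get("state")}
        for a in apps
    ]
```

Keeping the parsing separate from the fetch makes it easy to test against a saved response without a running cluster.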
State
HDFS

Deployed with Application

Remote data store

What we do: Nothing, right now.
Takeaways
There’s a lot more than just the YARN API

Look for examples (Spark, Storm, MapReduce)

Decide your level of Hadoop integration
  Metrics2

  HDFS

  Config

  Kerberos and doAs
Questions?
