High-Availability of YARN (MRv2)
Project presentation by Mário Almeida
Implementation of Distributed Systems
EMDC @ KTH




Outline
  What is YARN?
  Why is YARN not Highly Available?
  How to make it Highly Available?
  What storage to use?
  What about NDB?
  Our Contribution
  Results
  Future work
  Conclusions
  Our Team
What is YARN?
  YARN, or MapReduce v2, is a complete overhaul of the original MapReduce.
  [Architecture diagram: the JobTracker is split up, each application gets its own per-app ApplicationMaster, and there are no more dedicated M/R containers.]
Is YARN Highly-Available?
  [Diagram: the ResourceManager fails and all jobs are lost!]
How to make it H.A?
 Store application states!




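To make the idea concrete, here is a minimal sketch of the kind of state-store abstraction this implies. It is a simplified, hypothetical interface written for illustration, not the actual YARN RMStateStore API: the ResourceManager persists each application's state as it changes and reloads everything when it (re)starts.

// Hypothetical, simplified state-store abstraction for RM recovery.
// The real YARN RMStateStore API differs; this only illustrates the idea.
import java.io.IOException;
import java.util.Map;

public interface ApplicationStateStore {

  // Persist the serialized state of one application (and its attempts).
  void storeApplicationState(String applicationId, byte[] serializedState)
      throws IOException;

  // Remove the state once an application finishes.
  void removeApplicationState(String applicationId) throws IOException;

  // On ResourceManager (re)start, load everything back so that
  // unfinished applications can be recovered instead of being lost.
  Map<String, byte[]> loadAllApplicationStates() throws IOException;
}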
How to make it H.A?
  Failure recovery
  [Diagram: RM1 stores its state, suffers downtime, and the restarted RM1 loads the state back.]
How to make it H.A?
  Failure recovery -> Fail-over chain
  [Diagram: RM1 stores its state; on failure, RM2 loads it and takes over with no downtime.]
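A minimal sketch of the fail-over step, reusing the hypothetical ApplicationStateStore interface from the earlier sketch (again illustrative only, not the real ResourceManager code): the standby RM becomes active by reloading the state the failed RM had written.

// Sketch: a standby RM takes over by reloading state from the shared store.
// Uses the hypothetical ApplicationStateStore interface sketched earlier.
import java.io.IOException;
import java.util.Map;

public class FailoverSketch {

  public static void becomeActive(ApplicationStateStore store) throws IOException {
    // Load every unfinished application's state written by the failed RM...
    Map<String, byte[]> states = store.loadAllApplicationStates();
    // ...and recover them, so clients see no downtime beyond the takeover.
    for (Map.Entry<String, byte[]> e : states.entrySet()) {
      recoverApplication(e.getKey(), e.getValue());
    }
  }

  // Placeholder: in a real RM this would recreate the application,
  // its attempts, and reschedule its containers.
  private static void recoverApplication(String appId, byte[] state) {
  }
}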
How to make it H.A?
  Failure recovery -> Fail-over chain -> Stateless RM
  [Diagram: RM1, RM2 and RM3 run side by side; the scheduler would have to be kept in sync!]
What storage to use?
 Hadoop proposed:
    Hadoop Distributed File System (HDFS):
        fault-tolerant, large datasets, streaming access to data, and more.
    Zookeeper: highly reliable distributed coordination.
        wait-free, FIFO client ordering, linearizable writes, and more.




What about NDB?
  NDB MySQL Cluster is a scalable, ACID-compliant transactional database.
  Some features:
    Auto-sharding for R/W scalability;
    SQL and NoSQL interfaces (see the session sketch below);
    No single point of failure;
    In-memory data;
    Load balancing;
    Adding nodes = no downtime;
    Fast R/W rates;
    Fine-grained locking;
    Now GA (general availability)!


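To give a feel for the NoSQL Java interface, the following is a small sketch of how a client typically opens a ClusterJ session against an NDB cluster. The connect string and database name are placeholder values, not the project's configuration.

// Sketch: opening a ClusterJ session against MySQL Cluster (NDB).
// Connect string and database name below are placeholder values.
import com.mysql.clusterj.ClusterJHelper;
import com.mysql.clusterj.Session;
import com.mysql.clusterj.SessionFactory;
import java.util.Properties;

public class NdbConnectionExample {
  public static void main(String[] args) {
    Properties props = new Properties();
    // Address of the NDB management node (ndb_mgmd).
    props.put("com.mysql.clusterj.connectstring", "localhost:1186");
    // Database that holds the state tables.
    props.put("com.mysql.clusterj.database", "yarn_state");

    // The SessionFactory connects to the clustered data nodes;
    // sessions are lightweight handles used per unit of work.
    SessionFactory factory = ClusterJHelper.getSessionFactory(props);
    Session session = factory.getSession();

    // ... perform reads/writes through the session ...

    session.close();
  }
}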
What about NDB?
  [Architecture diagram: the NDB API nodes are connected to all clustered storage nodes; management nodes handle configuration and network partitioning.]
What about NDB?
  [Benchmark chart: linear horizontal scalability, up to 4.3 billion reads per minute!]
Our Contribution
  Two phases, dependent on YARN patch releases.

  Phase 1 (not really H.A. yet!)
    Apache
      Implemented Resource Manager recovery using a Memory Store (MemoryRMStateStore).
      Stores the Application State and Application Attempt State.
    We
      Implemented an NDB MySQL Cluster Store (NdbRMStateStore) using ClusterJ; up to 10.5x faster than openjpa-jdbc (see the sketch below).
      Implemented TestNdbRMRestart to prove the H.A. of YARN.


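As a rough illustration of what a ClusterJ-based store like NdbRMStateStore has to do, the sketch below maps an application-state table to a ClusterJ interface and persists one row. The table and column names are made up for the example and are not the project's actual schema.

// Sketch of persisting application state through ClusterJ.
// Table/column names here are illustrative, not the project's real schema.
import com.mysql.clusterj.Session;
import com.mysql.clusterj.annotation.Column;
import com.mysql.clusterj.annotation.PersistenceCapable;
import com.mysql.clusterj.annotation.PrimaryKey;

public class NdbStateStoreSketch {

  // ClusterJ maps a plain Java interface onto an NDB table.
  @PersistenceCapable(table = "applicationstate")
  public interface ApplicationStateRow {
    @PrimaryKey
    @Column(name = "applicationid")
    String getApplicationId();
    void setApplicationId(String id);

    @Column(name = "appstate")
    byte[] getAppState();          // serialized application state
    void setAppState(byte[] state);
  }

  // Store (or overwrite) the state of one application.
  public static void storeApplicationState(Session session,
                                           String appId,
                                           byte[] serializedState) {
    ApplicationStateRow row = session.newInstance(ApplicationStateRow.class);
    row.setApplicationId(appId);
    row.setAppState(serializedState);
    session.savePersistent(row);   // insert or update the row
  }
}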
Our Contribution
  testNdbRMRestart
  [Diagram: after the ResourceManager recovers, all unfinished jobs are restarted.]
Our Contribution
  Phase 2:
    Apache
      Implemented Zookeeper Store (ZKRMStateStore).
      Implemented FileSystem Store (FileSystemRMStateStore).
    We
      Developed a storage benchmark framework (zkndb), with support for ClusterJ, to benchmark both of those stores together with our own (see the sketch below):
      https://github.com/4knahs/zkndb
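The zkndb code itself lives in the repository above; as a loose sketch of the kind of measurement such a framework performs (not zkndb's actual code), the snippet below drives a generic store with writes from several threads for a fixed time window and reports throughput. The StateStore interface and all parameters are assumptions for the example.

// Loose sketch of a write-throughput benchmark (not zkndb's actual code).
// StateStore is a stand-in for a ZK-, HDFS- or NDB-backed store.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ThroughputBenchmark {

  public interface StateStore {
    void write(String key, byte[] value) throws Exception;
  }

  public static long run(StateStore store, int threads, int seconds)
      throws InterruptedException {
    AtomicLong ops = new AtomicLong();
    long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(seconds);
    ExecutorService pool = Executors.newFixedThreadPool(threads);

    for (int t = 0; t < threads; t++) {
      final int id = t;
      pool.submit(() -> {
        byte[] payload = new byte[64];          // small serialized-state payload
        long i = 0;
        while (System.nanoTime() < deadline) {
          try {
            store.write("app_" + id + "_" + i++, payload);
            ops.incrementAndGet();              // count only successful writes
          } catch (Exception e) {
            // failed writes are not counted
          }
        }
      });
    }

    pool.shutdown();
    pool.awaitTermination(seconds + 30, TimeUnit.SECONDS);
    long throughput = ops.get() / seconds;      // writes per second
    System.out.println("throughput: " + throughput + " writes/s");
    return throughput;
  }
}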
Our Contribution
 Zkndb architecture:




Our Contribution
 Zkndb extensibility:




Results
  Ran multiple experiments: 1 node, 12 threads, 60 seconds.
  Each node with dual six-core CPUs @ 2.6 GHz.
  All clusters with 3 nodes.
  Same code as Hadoop (ZK & HDFS).
  [Throughput chart: ZK is limited by the store; HDFS has problems with file creation, not good for small files!]
Results
  Ran multiple experiments: 3 nodes, 12 threads each, 30 seconds.
  Each node with dual six-core CPUs @ 2.6 GHz.
  All clusters with 3 nodes.
  Same code as Hadoop (ZK & HDFS).
  [Throughput chart: ZK could scale a bit more; HDFS gets even worse due to the root lock in the NameNode.]
Future work
 Implement stateless architecture.
 Study the overhead of writing state to NDB.




Conclusions
  HDFS and Zookeeper both have disadvantages for this purpose.
  HDFS performs badly when creating many small files, so it would not be suitable for storing state from the Application Masters.
  Zookeeper serializes all updates through a single leader (up to ~50K requests per second). Horizontal scalability?
  NDB throughput outperforms both HDFS and ZK.
  A combination of HDFS and ZK does support Apache's proposal, with a few restrictions.
Our team!
 Mário Almeida (site – 4knahs(at)gmail)
 Arinto Murdopo (site – arinto(at)gmail)
 Strahinja Lazetic (strahinja1984(at)gmail)
 Umit Buyuksahin (ucbuyuksahin(at)gmail)


 Special thanks
    Jim Dowling (SICS, supervisor)
    Vasia Kalavri (EMJD-DC, supervisor)
    Johan Montelius (EMDC coordinator, course teacher)


Editor's Notes

  • #4: Guest talks + student presentations
  • #11: Data nodes manage the storage and access to data. Tables are automatically sharded across the data nodes which also transparently handle load balancing, replication, failover and self-healing.
  • #12: MySQL Cluster is deployed in some of the largest web and telecom systems. The storage nodes (SN) are the main nodes of the system; all data is stored on the storage nodes. Data is replicated between storage nodes to ensure it remains continuously available in case one or more storage nodes fail. The storage nodes handle all database transactions. The management server nodes (MGM) handle the system configuration and are used to change the setup of the system. Usually only one management server node is used, but it is also possible to run several. The management server node is only used at startup and during system reconfiguration, which means that the storage nodes are operable without the management nodes.