High-Availability of YARN (MRv2)
Project presentation by Mário Almeida
Implementation of Distributed Systems
EMDC @ KTH




Outline
  What is YARN?
  Why is YARN not Highly Available?
  How to make it Highly Available?
  What storage to use?
  What about NDB?
  Our Contribution
  Results
  Future work
  Conclusions
  Our Team
What is YARN?
  YARN, or MapReduce v2, is a complete overhaul of the original MapReduce.
  [Architecture diagram: the JobTracker is split up, each application gets its own per-app ApplicationMaster, and there are no more dedicated M/R containers.]
Is YARN Highly-Available?
  [Diagram: the ResourceManager fails and all jobs are lost!]
How to make it H.A?
 Store application states!




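To make the idea concrete, here is a minimal sketch of the kind of state-store abstraction this implies. It is a simplified, hypothetical interface written for illustration, not the actual YARN RMStateStore API: the ResourceManager persists each application's state as it changes and reloads everything when it (re)starts.

// Hypothetical, simplified state-store abstraction for RM recovery.
// The real YARN RMStateStore API differs; this only illustrates the idea.
import java.io.IOException;
import java.util.Map;

public interface ApplicationStateStore {

  // Persist the serialized state of one application (and its attempts).
  void storeApplicationState(String applicationId, byte[] serializedState)
      throws IOException;

  // Remove the state once an application finishes.
  void removeApplicationState(String applicationId) throws IOException;

  // On ResourceManager (re)start, load everything back so that
  // unfinished applications can be recovered instead of being lost.
  Map<String, byte[]> loadAllApplicationStates() throws IOException;
}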
How to make it H.A?
  Failure recovery
  [Diagram: RM1 stores its state, suffers downtime, and the restarted RM1 loads the state back.]
How to make it H.A?
  Failure recovery -> Fail-over chain
  [Diagram: RM1 stores its state; on failure, RM2 loads it and takes over with no downtime.]
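A minimal sketch of the fail-over step, reusing the hypothetical ApplicationStateStore interface from the earlier sketch (again illustrative only, not the real ResourceManager code): the standby RM becomes active by reloading the state the failed RM had written.

// Sketch: a standby RM takes over by reloading state from the shared store.
// Uses the hypothetical ApplicationStateStore interface sketched earlier.
import java.io.IOException;
import java.util.Map;

public class FailoverSketch {

  public static void becomeActive(ApplicationStateStore store) throws IOException {
    // Load every unfinished application's state written by the failed RM...
    Map<String, byte[]> states = store.loadAllApplicationStates();
    // ...and recover them, so clients see no downtime beyond the takeover.
    for (Map.Entry<String, byte[]> e : states.entrySet()) {
      recoverApplication(e.getKey(), e.getValue());
    }
  }

  // Placeholder: in a real RM this would recreate the application,
  // its attempts, and reschedule its containers.
  private static void recoverApplication(String appId, byte[] state) {
  }
}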
How to make it H.A?
  Failure recovery -> Fail-over chain -> Stateless RM
  [Diagram: RM1, RM2 and RM3 run side by side; the scheduler would have to be kept in sync!]
What storage to use?
 Hadoop proposed:
    Hadoop Distributed File System (HDFS):
        fault-tolerant, large datasets, streaming access to data, and more.
    Zookeeper: highly reliable distributed coordination.
        wait-free, FIFO client ordering, linearizable writes, and more.




What about NDB?
  NDB MySQL Cluster is a scalable, ACID-compliant transactional database.
  Some features:
    Auto-sharding for R/W scalability;
    SQL and NoSQL interfaces (see the session sketch below);
    No single point of failure;
    In-memory data;
    Load balancing;
    Adding nodes = no downtime;
    Fast R/W rates;
    Fine-grained locking;
    Now GA (general availability)!


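To give a feel for the NoSQL Java interface, the following is a small sketch of how a client typically opens a ClusterJ session against an NDB cluster. The connect string and database name are placeholder values, not the project's configuration.

// Sketch: opening a ClusterJ session against MySQL Cluster (NDB).
// Connect string and database name below are placeholder values.
import com.mysql.clusterj.ClusterJHelper;
import com.mysql.clusterj.Session;
import com.mysql.clusterj.SessionFactory;
import java.util.Properties;

public class NdbConnectionExample {
  public static void main(String[] args) {
    Properties props = new Properties();
    // Address of the NDB management node (ndb_mgmd).
    props.put("com.mysql.clusterj.connectstring", "localhost:1186");
    // Database that holds the state tables.
    props.put("com.mysql.clusterj.database", "yarn_state");

    // The SessionFactory connects to the clustered data nodes;
    // sessions are lightweight handles used per unit of work.
    SessionFactory factory = ClusterJHelper.getSessionFactory(props);
    Session session = factory.getSession();

    // ... perform reads/writes through the session ...

    session.close();
  }
}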
What about NDB?
  [Architecture diagram: the NDB API nodes are connected to all clustered storage nodes; management nodes handle configuration and network partitioning.]
What about NDB?
  [Benchmark chart: linear horizontal scalability, up to 4.3 billion reads per minute!]
Our Contribution
  Two phases, dependent on YARN patch releases.

  Phase 1 (not really H.A. yet!)
    Apache
      Implemented Resource Manager recovery using a Memory Store (MemoryRMStateStore).
      Stores the Application State and Application Attempt State.
    We
      Implemented an NDB MySQL Cluster Store (NdbRMStateStore) using ClusterJ; up to 10.5x faster than openjpa-jdbc (see the sketch below).
      Implemented TestNdbRMRestart to prove the H.A. of YARN.


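As a rough illustration of what a ClusterJ-based store like NdbRMStateStore has to do, the sketch below maps an application-state table to a ClusterJ interface and persists one row. The table and column names are made up for the example and are not the project's actual schema.

// Sketch of persisting application state through ClusterJ.
// Table/column names here are illustrative, not the project's real schema.
import com.mysql.clusterj.Session;
import com.mysql.clusterj.annotation.Column;
import com.mysql.clusterj.annotation.PersistenceCapable;
import com.mysql.clusterj.annotation.PrimaryKey;

public class NdbStateStoreSketch {

  // ClusterJ maps a plain Java interface onto an NDB table.
  @PersistenceCapable(table = "applicationstate")
  public interface ApplicationStateRow {
    @PrimaryKey
    @Column(name = "applicationid")
    String getApplicationId();
    void setApplicationId(String id);

    @Column(name = "appstate")
    byte[] getAppState();          // serialized application state
    void setAppState(byte[] state);
  }

  // Store (or overwrite) the state of one application.
  public static void storeApplicationState(Session session,
                                           String appId,
                                           byte[] serializedState) {
    ApplicationStateRow row = session.newInstance(ApplicationStateRow.class);
    row.setApplicationId(appId);
    row.setAppState(serializedState);
    session.savePersistent(row);   // insert or update the row
  }
}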
Our Contribution
  testNdbRMRestart
  [Diagram: after the ResourceManager recovers, all unfinished jobs are restarted.]
Our Contribution
  Phase 2:
    Apache
      Implemented Zookeeper Store (ZKRMStateStore).
      Implemented FileSystem Store (FileSystemRMStateStore).
    We
      Developed a storage benchmark framework (zkndb), with support for ClusterJ, to benchmark both of those stores together with our own (see the sketch below):
      https://github.com/4knahs/zkndb
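The zkndb code itself lives in the repository above; as a loose sketch of the kind of measurement such a framework performs (not zkndb's actual code), the snippet below drives a generic store with writes from several threads for a fixed time window and reports throughput. The StateStore interface and all parameters are assumptions for the example.

// Loose sketch of a write-throughput benchmark (not zkndb's actual code).
// StateStore is a stand-in for a ZK-, HDFS- or NDB-backed store.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ThroughputBenchmark {

  public interface StateStore {
    void write(String key, byte[] value) throws Exception;
  }

  public static long run(StateStore store, int threads, int seconds)
      throws InterruptedException {
    AtomicLong ops = new AtomicLong();
    long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(seconds);
    ExecutorService pool = Executors.newFixedThreadPool(threads);

    for (int t = 0; t < threads; t++) {
      final int id = t;
      pool.submit(() -> {
        byte[] payload = new byte[64];          // small serialized-state payload
        long i = 0;
        while (System.nanoTime() < deadline) {
          try {
            store.write("app_" + id + "_" + i++, payload);
            ops.incrementAndGet();              // count only successful writes
          } catch (Exception e) {
            // failed writes are not counted
          }
        }
      });
    }

    pool.shutdown();
    pool.awaitTermination(seconds + 30, TimeUnit.SECONDS);
    long throughput = ops.get() / seconds;      // writes per second
    System.out.println("throughput: " + throughput + " writes/s");
    return throughput;
  }
}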
Our Contribution
 Zkndb architecture:




Our Contribution
 Zkndb extensibility:




Results
  Ran multiple experiments: 1 node, 12 threads, 60 seconds.
  Each node with dual six-core CPUs @ 2.6 GHz.
  All clusters with 3 nodes.
  Same code as Hadoop (ZK & HDFS).
  [Throughput chart: ZK is limited by the store; HDFS has problems with file creation, not good for small files!]
Results
  Ran multiple experiments: 3 nodes, 12 threads each, 30 seconds.
  Each node with dual six-core CPUs @ 2.6 GHz.
  All clusters with 3 nodes.
  Same code as Hadoop (ZK & HDFS).
  [Throughput chart: ZK could scale a bit more; HDFS gets even worse due to the root lock in the NameNode.]
Future work
 Implement stateless architecture.
 Study the overhead of writing state to NDB.




Conclusions
  HDFS and Zookeeper both have disadvantages for this purpose.
  HDFS performs badly when creating many small files, so it would not be suitable for storing state from the Application Masters.
  Zookeeper serializes all updates through a single leader (up to ~50K requests per second). Horizontal scalability?
  NDB throughput outperforms both HDFS and ZK.
  A combination of HDFS and ZK does support Apache's proposal, with a few restrictions.
Our team!
 Mário Almeida (site – 4knahs(at)gmail)
 Arinto Murdopo (site – arinto(at)gmail)
 Strahinja Lazetic (strahinja1984(at)gmail)
 Umit Buyuksahin (ucbuyuksahin(at)gmail)


 Special thanks
    Jim Dowling (SICS, supervisor)
    Vasia Kalavri (EMJD-DC, supervisor)
    Johan Montelius (EMDC coordinator, course teacher)


Editor's Notes

  • #4: Guest talks + student presentations
  • #11: Data nodes manage the storage and access to data. Tables are automatically sharded across the data nodes which also transparently handle load balancing, replication, failover and self-healing.
  • #12: MySQL Cluster is deployed in some of the largest web and telecom systems. The storage nodes (SN) are the main nodes of the system; all data is stored on the storage nodes. Data is replicated between storage nodes to ensure it remains continuously available in case one or more storage nodes fail. The storage nodes handle all database transactions. The management server nodes (MGM) handle the system configuration and are used to change the setup of the system. Usually only one management server node is used, but it is also possible to run several. The management server node is only used at startup and during system reconfiguration, which means that the storage nodes are operable without the management nodes.