Hadoop HDFS NameNode HA
Anty Rao
April 10, 2011
Outline
• Architecture of HDFS
• Available NN HA options
HDFS architecture
The NameNode (NN) is a single point of failure (SPOF), so the NN needs some form of high availability (HA).
NN HA
There are currently two main HA options:
• AvatarNode (Facebook)
• BackupNode (Yahoo!) (available?)
AvatarNode
AvatarNode (AN)
• Active-Standby pair
  - Coordinated via ZooKeeper
  - Failover in a few seconds
  - Wrapper over the NameNode
• Active AvatarNode
  - Writes the transaction log to an NFS filer
• Standby AvatarNode
  - Reads and consumes transactions from the NFS filer
  - Processes all messages from the DataNodes
  - Keeps the latest metadata in memory
[Diagram: the Client retrieves block locations from the Primary or the Standby. The Active AvatarNode (NameNode) writes transactions and the Standby AvatarNode (NameNode) reads them; DataNodes send block location messages to both.]
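The division of work above can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not Facebook's AvatarNode code: `SharedEditLog` stands in for the transaction log on the NFS filer, and `AvatarNodeModel`, `active_write`, and `datanode_report` are hypothetical names. It shows the key point of the design: the standby learns the namespace from the shared log, but learns block locations directly from the DataNodes.

```python
from dataclasses import dataclass, field


@dataclass
class SharedEditLog:
    """Hypothetical stand-in for the transaction log kept on the NFS filer."""
    entries: list = field(default_factory=list)


@dataclass
class AvatarNodeModel:
    """Hypothetical toy NameNode state: namespace plus block locations."""
    name: str
    namespace: dict = field(default_factory=dict)        # path -> [block ids]
    block_locations: dict = field(default_factory=dict)  # block id -> {DataNodes}
    applied: int = 0  # how many shared-log entries have been applied

    def apply(self, op):
        kind, path, block = op
        if kind == "add_block":
            self.namespace.setdefault(path, []).append(block)

    def consume(self, log):
        """Apply every shared-log entry written since the last call."""
        for op in log.entries[self.applied:]:
            self.apply(op)
        self.applied = len(log.entries)

    def block_received(self, datanode, block):
        """Block locations come from DataNode messages, not from the log."""
        self.block_locations.setdefault(block, set()).add(datanode)


def active_write(active, log, op):
    # Active side: persist the transaction to shared storage, then apply it.
    log.entries.append(op)
    active.consume(log)


def datanode_report(datanode, block, *avatars):
    # Every DataNode reports to both AvatarNodes, keeping both block maps hot.
    for node in avatars:
        node.block_received(datanode, block)


log = SharedEditLog()
active, standby = AvatarNodeModel("active"), AvatarNodeModel("standby")
active_write(active, log, ("add_block", "/user/a.txt", "blk_1"))
datanode_report("dn1", "blk_1", active, standby)
standby.consume(log)  # the standby tails the shared log
print(standby.namespace == active.namespace)  # True
print(standby.block_locations)                # {'blk_1': {'dn1'}}
```

In the real system the standby consumes the log continuously; calling `consume` once here stands in for that loop.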
Four steps to failover
• Wipe the ZooKeeper entry. Clients will know that a failover is in progress. (0 seconds)
• Stop the primary NameNode. The last bits of data are flushed to the transaction log, and then it exits. (Seconds)
• Switch the standby to primary. It consumes the rest of the transaction log and leaves SafeMode, ready to serve traffic. (Seconds)
• Update the entry in ZooKeeper. All clients waiting for the failover pick up the new connection. (0 seconds)

• Afterwards: start the old primary in standby mode. (This takes a while, but the cluster is already up and running.)
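The four failover steps can be sketched as a sequence over a single ZooKeeper entry. This is a simplified Python model, assuming the entry just holds the primary's name and that an empty entry tells clients to wait; `ZooKeeperEntry`, `AvatarNodeState`, `failover`, and the node names `an1`/`an2` are hypothetical, and no real ZooKeeper client API is used.

```python
class ZooKeeperEntry:
    """Hypothetical in-memory stand-in for the ZooKeeper entry naming the primary."""

    def __init__(self, primary):
        self.value = primary


class AvatarNodeState:
    """Hypothetical node state with only the fields the failover steps touch."""

    def __init__(self, name, role, unconsumed_edits=0):
        self.name = name
        self.role = role                      # "primary", "standby" or "stopped"
        self.in_safe_mode = role == "standby"
        self.unconsumed_edits = unconsumed_edits


def lookup_primary(zk):
    """Client side: None means a failover is in progress, so wait and retry."""
    return zk.value


def failover(zk, primary, standby):
    """Run the four steps; return what a client lookup sees after each one."""
    seen = []
    # 1. Wipe the entry so clients know a failover is in progress.
    zk.value = None
    seen.append(lookup_primary(zk))
    # 2. Stop the primary (the real node flushes its last edits before exiting).
    primary.role = "stopped"
    seen.append(lookup_primary(zk))
    # 3. Promote the standby: consume the remaining edits, leave SafeMode.
    standby.unconsumed_edits = 0
    standby.in_safe_mode = False
    standby.role = "primary"
    seen.append(lookup_primary(zk))
    # 4. Publish the new primary; waiting clients pick up the new connection.
    zk.value = standby.name
    seen.append(lookup_primary(zk))
    return seen


def restart_as_standby(node):
    """Afterwards: bring the old primary back as the new standby."""
    node.role = "standby"
    node.in_safe_mode = True


zk = ZooKeeperEntry("an1")
an1 = AvatarNodeState("an1", "primary")
an2 = AvatarNodeState("an2", "standby", unconsumed_edits=3)
seen = failover(zk, an1, an2)
restart_as_standby(an1)
print(seen)                # [None, None, None, 'an2']
print(an1.role, an2.role)  # standby primary
```

The returned list shows that clients see no primary from step 1 until step 4, so the client-visible outage is roughly the seconds spent in steps 2 and 3.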
AvatarNode @Facebook
[Diagram from Facebook]
Contrib module for Hadoop 0.20 (HDFS-976)
Conclusions
• Complete hot standby
  - The fsimage and edit logs are stored on NFS (no data loss)
  - The standby node continuously consumes transactions from the edit logs on NFS (namespace hot standby)
  - DataNodes send messages to both the primary and the standby node (block reports hot standby)
• Fast switchover
  - Less than a minute
• Makes sense!
BackupNode
BackupNode (BN)
• The NN synchronously streams its transaction log to the BackupNode
• The BackupNode applies the log to its in-memory and on-disk image
• The BN always commits to disk before reporting success to the NN
• If the BN restarts, it has to catch up with the NN
• Available in the HDFS 0.21 release
[Diagram: the Client retrieves block locations from the NN. The NN (NameNode) synchronously streams transaction logs to the BN (BackupNode); DataNodes send block location messages to the NN.]
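A minimal Python sketch of the synchronous journaling described above, assuming a single BN. `NameNodeModel`, `BackupNodeModel`, and their methods are hypothetical; the order of operations (the BN commits to disk, then applies in memory, then acknowledges; the NN reports success only after that acknowledgement) follows the slide. What the NN does when the BN is unreachable is an assumption here: the sketch keeps serving without it and lets the BN catch up when it restarts.

```python
class BackupNodeModel:
    """Hypothetical BN: an on-disk journal plus an in-memory namespace."""

    def __init__(self):
        self.disk_journal = []   # stands in for the BN's edits on disk
        self.namespace = set()   # stands in for the in-memory image (dir paths)
        self.up = True

    def journal(self, op):
        if not self.up:
            raise ConnectionError("BackupNode is down")
        self.disk_journal.append(op)  # commit to disk first...
        self.namespace.add(op)        # ...then apply in memory
        return True                   # ...and only then acknowledge

    def restart_and_catch_up(self, nn_log):
        """Reload state from the on-disk journal, then replay what was missed."""
        self.up = True
        self.namespace = set(self.disk_journal)
        for op in nn_log[len(self.disk_journal):]:
            self.disk_journal.append(op)
            self.namespace.add(op)


class NameNodeModel:
    """Hypothetical NN that streams each transaction to its single BN."""

    def __init__(self, backup):
        self.log = []
        self.namespace = set()
        self.backup = backup

    def mkdir(self, path):
        self.log.append(path)
        try:
            self.backup.journal(path)  # waits for the BN's ack before continuing
        except ConnectionError:
            pass  # assumption: keep serving; the BN must catch up later
        self.namespace.add(path)
        return "success"


bn = BackupNodeModel()
nn = NameNodeModel(bn)
nn.mkdir("/a")
bn.up = False
nn.mkdir("/b")                   # the BN misses this transaction
bn.restart_and_catch_up(nn.log)
print(sorted(bn.namespace))      # ['/a', '/b']
```

Note that nothing in this model gives the BN any block locations, which is the gap the next slide points out.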
Limitations of BackupNode (BN)
• At most one BackupNode per NN
  - Failover is possible only between two machines (the NN and its BN)
• The NN does not forward block reports to the BackupNode
• Time to restart from a 12GB image with 70M files + 100M blocks:
  - 3-5 minutes to read the image from disk
  - 20 minutes to process block reports
  - So failover with a BN will still take 25+ minutes!
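Adding up the two restart phases quoted above for the 12GB image makes the bottleneck explicit: block-report processing accounts for 20 of the roughly 23 to 25 minutes, before any other startup work is counted.

```latex
t_{\text{restart}} \approx t_{\text{image}} + t_{\text{block reports}}
  = (3 \text{ to } 5) + 20 = 23 \text{ to } 25 \ \text{minutes}
```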
Conclusions
• Incomplete hot standby / semi-hot standby
  - Namespace: hot standby
  - Block reports: cold standby
• Switchover is still slow
Other HA solutions
• DRBD + Linux-HA
  http://www.cloudera.com/blog/2009/07/hadoop-ha-configuration/
• Metadata backup
  http://wiki.apache.org/hadoop/NameNodeFailover
