High Availability Hadoop Clusters
Impact
• Planned downtime
− Upgrades
− Config changes
• Unplanned downtime
− Hardware failure
− Server unresponsive
− Software failures
− Occurs infrequently
Different Kinds Of HA Configurations
• HDFS HA using the Quorum Journal Manager (QJM)
• HDFS HA using NFS for shared storage
• Resource Manager HA
HDFS HA - Necessary Hardware Resources
• NameNode machines
− Active NN
− Standby NN
Both machines should ideally have equivalent hardware.
• JournalNodes
− Lightweight daemons that can run on machines that already host other Hadoop daemons.
− There must be at least 3 JournalNode daemons, because edit log changes must be written to a majority of the JournalNodes; with 3, the system can tolerate the failure of one machine.
− Run JournalNode daemons in odd numbers (3, 5, 7, etc.): adding a node to an odd count does not increase the number of failures tolerated.
− With N JournalNodes, the system tolerates at most (N - 1) / 2 failures (rounded down).
• ZooKeeper service (needed for automatic failover)
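The majority rule above is simple arithmetic, sketched here in Python for illustration only (the helper name `journal_quorum` is hypothetical, not part of Hadoop). An edit needs acknowledgements from a strict majority of N JournalNodes, floor(N/2) + 1, so N - majority = floor((N - 1)/2) nodes can fail without blocking writes.

```python
def journal_quorum(n: int) -> tuple[int, int]:
    """Return (acks needed per edit, failures tolerated) for n JournalNodes.

    Hypothetical helper for illustration; Hadoop does not expose this function.
    """
    if n < 3:
        # mirrors the slide's guidance of running at least 3 JournalNodes
        raise ValueError("run at least 3 JournalNodes")
    majority = n // 2 + 1        # an edit is durable once a strict majority accepts it
    tolerated = n - majority     # same value as (n - 1) // 2
    return majority, tolerated


if __name__ == "__main__":
    for n in (3, 4, 5, 6, 7):
        majority, tolerated = journal_quorum(n)
        print(f"{n} JournalNodes: majority {majority}, tolerates {tolerated} failure(s)")
```

Running it shows that 4 JournalNodes tolerate only 1 failure, the same as 3, which is why odd counts are recommended.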
HDFS HA Architecture Using the Quorum Journal Manager
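As a rough illustration of how this architecture is wired up, the Python sketch below builds a minimal set of hdfs-site.xml properties for QJM-based HA. The property names are standard HDFS HA settings, but the nameservice `mycluster`, the host names, and the helper functions are hypothetical. A working setup also needs fencing (`dfs.ha.fencing.methods`), NameNode HTTP addresses, and `ha.zookeeper.quorum` in core-site.xml, all omitted here.

```python
import xml.etree.ElementTree as ET


def qjm_ha_properties(nameservice: str, namenodes: dict[str, str],
                      journal_nodes: list[str], jn_port: int = 8485) -> dict[str, str]:
    """Build a minimal (incomplete) set of hdfs-site.xml properties for QJM HA."""
    if len(journal_nodes) < 3 or len(journal_nodes) % 2 == 0:
        # follows the slide's guidance: an odd number, at least 3
        raise ValueError("use an odd number of JournalNodes, at least 3")
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        # both NameNodes reach the shared edit log through this quorum URI;
        # the nameservice id is reused as the journal id
        "dfs.namenode.shared.edits.dir":
            "qjournal://" + ";".join(f"{h}:{jn_port}" for h in journal_nodes)
            + f"/{nameservice}",
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
        "dfs.ha.automatic-failover.enabled": "true",
    }
    for nn_id, rpc_addr in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = rpc_addr
    return props


def to_hadoop_xml(props: dict[str, str]) -> str:
    """Render properties in Hadoop's <configuration><property> layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    props = qjm_ha_properties(
        "mycluster",
        {"nn1": "nn1.example.com:8020", "nn2": "nn2.example.com:8020"},
        ["jn1.example.com", "jn2.example.com", "jn3.example.com"],
    )
    print(props["dfs.namenode.shared.edits.dir"])
    # qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster
```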
RM HA - Necessary Hardware Resources
• Resource Manager machines
− Active RM
− Standby RM
Both machines should ideally have equivalent hardware.
• ZooKeeper service
Resource Manager HA Architecture
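The Resource Manager HA architecture is likewise driven by yarn-site.xml settings. The Python sketch below is a hedged illustration, not a complete configuration: it collects the core properties (the RM ids every client is configured with, the ZooKeeper ensemble used for leader election and the state store, and the ZooKeeper-based state store class). The cluster id, hosts, and helper name are hypothetical; Hadoop 3.x reads the ZooKeeper address from `hadoop.zk.address`, while Hadoop 2.x used `yarn.resourcemanager.zk-address`.

```python
def rm_ha_properties(cluster_id: str, rm_hosts: dict[str, str],
                     zk_quorum: list[str], hadoop3: bool = True) -> dict[str, str]:
    """Minimal yarn-site.xml properties for ZooKeeper-backed RM HA (hypothetical helper)."""
    if len(rm_hosts) < 2:
        raise ValueError("RM HA needs at least two Resource Managers")
    # Hadoop 3.x: hadoop.zk.address; Hadoop 2.x: yarn.resourcemanager.zk-address
    zk_key = "hadoop.zk.address" if hadoop3 else "yarn.resourcemanager.zk-address"
    props = {
        "yarn.resourcemanager.ha.enabled": "true",
        "yarn.resourcemanager.cluster-id": cluster_id,
        # the full RM list that clients poll in round-robin order
        "yarn.resourcemanager.ha.rm-ids": ",".join(rm_hosts),
        # one ZooKeeper ensemble backs both leader election and the state store
        zk_key: ",".join(zk_quorum),
        "yarn.resourcemanager.recovery.enabled": "true",
        "yarn.resourcemanager.store.class":
            "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
    }
    for rm_id, host in rm_hosts.items():
        props[f"yarn.resourcemanager.hostname.{rm_id}"] = host
    return props


if __name__ == "__main__":
    props = rm_ha_properties(
        "cluster1",
        {"rm1": "master1.example.com", "rm2": "master2.example.com"},
        ["zk1.example.com:2181", "zk2.example.com:2181", "zk3.example.com:2181"],
    )
    for name, value in props.items():
        print(f"{name} = {value}")
```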
RM Failover
• Two failover mechanisms
− Manual transition: transition the current active RM to standby, then transition the standby RM to active (done with the yarn rmadmin CLI).
− Automatic failover: an embedded ZooKeeper-based ActiveStandbyElector decides which RM is active.
• Each client must be configured with the list of all Resource Managers. Clients try the RMs in round-robin fashion until they reach the active one, and resume round-robin polling after a failover until they reach the new active RM.
• The promoted RM loads the stored RM state and continues from where the previous active RM left off. The new RM spawns a new attempt for each managed application; applications can create checkpoints to avoid losing work. The state store must be visible to both RMs; the ZooKeeper-based state store is recommended because it allows only a single RM to have write access at any time.
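The client-side round-robin behavior can be sketched as a simple polling loop. This is an illustrative Python model, not YARN's actual client code (which is implemented by its failover proxy providers): the names `find_active_rm`, `NoActiveRMError`, and the `is_active` probe are hypothetical, and `rm_ids` plays the role of the ids listed in `yarn.resourcemanager.ha.rm-ids`.

```python
from typing import Callable


class NoActiveRMError(RuntimeError):
    """Raised when no configured RM answers as active (hypothetical name)."""


def find_active_rm(rm_ids: list[str], is_active: Callable[[str], bool],
                   start: int = 0, max_rounds: int = 3) -> str:
    """Poll the configured RMs round-robin, beginning at index `start`,
    until one reports that it is active.

    `is_active` is a hypothetical probe standing in for a real RPC call:
    a standby (or dead) RM fails the call and the client moves on.
    """
    n = len(rm_ids)
    for attempt in range(n * max_rounds):
        rm_id = rm_ids[(start + attempt) % n]
        if is_active(rm_id):
            return rm_id
    raise NoActiveRMError(f"no active RM among {rm_ids} after {max_rounds} rounds")


if __name__ == "__main__":
    state = {"rm1": "standby", "rm2": "active"}

    def probe(rm_id: str) -> bool:
        return state[rm_id] == "active"

    print("connected to", find_active_rm(["rm1", "rm2"], probe))          # rm2
    # rm2 dies and rm1 is promoted; the client resumes polling at rm2's slot
    state.update(rm1="active", rm2="down")
    print("failed over to", find_active_rm(["rm1", "rm2"], probe, start=1))  # rm1
```

Bounding the loop with `max_rounds` keeps the sketch from spinning forever when no RM is active; a real client would also sleep between rounds.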

Failsafe Hadoop Infrastructure and the way they work
