Page1 © Hortonworks Inc. 2015
Enterprise-Grade Rolling Upgrade for a Live
Hadoop Cluster
Sanjay Radia, Vinod Kumar Vavilapalli
Hortonworks Inc
June 9, 2015
Page2 © Hortonworks Inc. 2015
Agenda
• Introduction
• What is Rolling Upgrade?
• Problem – several key issues to be addressed
– Wire compatibility and side-by-side installs are not sufficient!!
– Must address: data safety, service degradation and disruption
• Enhancements to various components
– Packaging – side-by-side install
– HDFS, YARN, Hive, Oozie, …
Page3 © Hortonworks Inc. 2015
Sanjay Radia
• Chief Architect, Founder, Hortonworks
• Part of the Hadoop team at Yahoo! since 2007
– Chief Architect of Hadoop Core at Yahoo!
– Apache Hadoop PMC and Committer
• Prior
– Data center automation, schedulers, virtualization, Java, HA, OSs, file systems
– (Startup, Sun Microsystems, Inria, …)
– Ph.D., University of Waterloo
Page4 © Hortonworks Inc. 2015
Vinod Kumar Vavilapalli
– Long time Hadooper since 2007
– Apache Hadoop Committer / PMC
– Apache Member
– Yahoo! -> Hortonworks
– MapReduce -> YARN from day one
Page5 © Hortonworks Inc. 2015
HDP Upgrade: Two Upgrade Modes
Stop-the-Cluster Upgrade
Shut down the services and the cluster, then upgrade.
Traditionally this was the only way.
Rolling Upgrade
Upgrade the cluster and its services while the cluster is
actively running applications.
Note: Upgrade time is proportional to # nodes, not data size
Enterprises run critical services and data on a Hadoop cluster.
They need a live cluster upgrade that maintains SLAs without degradation.
Page6 © Hortonworks Inc. 2015
But you can also “Revert to Prior State”
Rollback
Revert the bits and state of the cluster and its services back to a
checkpointed state.
Why? This is an emergency procedure.
Downgrade
Downgrade the services and components to the prior version, but
keep any new data and metadata that has been generated.
Why? You are not happy with performance, or app compatibility, …
Page7 © Hortonworks Inc. 2015
But aren’t wire compatibility and
side-by-side installs sufficient for
rolling upgrades?
Unfortunately, no!! Not if you want to
• Preserve data safety
• Keep jobs/apps running during upgrades, and running correctly
• Maintain SLAs
• Allow downgrades/rollbacks in case of problems
Page8 © Hortonworks Inc. 2015
Issues that need to be addressed (1)
• Data safety
• HDFS’s upgrade checkpoint does not work for rolling upgrade
• Service degradation – note every daemon is restarted in rolling fashion
• HDFS write pipeline
• Application Masters on YARN restart
• NodeManagers restart
• Hive server is processing client queries – it cannot restart to new version without loss
• Clients must not see failures – many components do not have retry logic
BUT Hadoop deals with failures: it will fix pipelines and restart tasks.
So what is the big deal?!
Service degradation will be high, because every daemon is restarted.
Page9 © Hortonworks Inc. 2015
Issues that need to be addressed (2)
• Maintaining the application submitter’s context (correctness)
• MR tasks get their context from the local node
– In the past, the submitter’s and the node’s contexts were identical
– But with RU, a node’s binaries are being upgraded and hence may be inconsistent with the submitter’s
- Half of the job could execute with the old binaries and the other half with the new ones!!
• Persistent state
• Backward compatibility for upgrade (or convert)
• Forward compatibility for downgrade (or convert)
• Wire compatibility
• With clients (forward and backward)
• Internally (Between Masters and Slaves or Peers)
– Note: the upgrade is in a rolling fashion
Page10 © Hortonworks Inc. 2015
Component Enhancements
• Packaging – Side-by-side installs
• HDFS Enhancements
• YARN Enhancements
• Retaining Job/App Context
• Hive Enhancements
Page11 © Hortonworks Inc. 2015
Packaging: Side-by-side Installs (1)
• Need side-by-side installs of multiple versions on same node
• Some components are at version N, while others are at N+1
• For the same component, some daemons are at version N and others at N+1 on the same node (e.g., NN and DN)
• HDP’s solution: use the OS distro’s standard packaging system
• Rejected proprietary packaging as a solution (no lock-in)
• Want to support RU via Ambari and manually
• Standard packaging systems like RPM have useful tools and mechanisms
– Tools to install, uninstall, query, etc.
– Dependencies managed automatically
– Admins do not need to learn new tools and formats
• Side benefit for “stop-the-world” upgrades:
• Can install the new binaries before the shutdown
Page12 © Hortonworks Inc. 2015
Packaging: Side-by-side installs (2)
• Layout: side-by-side
• /usr/hdp/2.2.0.0/hadoop
• /usr/hdp/2.2.0.0/hive
• /usr/hdp/2.3.0.0/hadoop
• /usr/hdp/2.3.0.0/hive
• Define what is current for each component’s
daemon and clients
• /usr/hdp/current/hdfs-nn->/usr/hdp/2.3.0.0/hadoop
• /usr/hdp/current/hadoop-client->/usr/hdp/2.2.0.0/hadoop
• /usr/hdp/current/hdfs-dn->/usr/hdp/2.2.0.0/hadoop
• Distro-select helps you manage the version switch
• Our solution: the package name contains the version number
• E.g. hadoop_2_2_0_0 is the RPM package name itself
– hadoop_2_3_0_0 is a different, peer package
• Bin commands point to current:
/usr/bin/hadoop->/usr/hdp/current/hadoop-client/bin/hadoop
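A minimal sketch of the on-disk mechanics, using the deck’s simplified link names (hdfs-nn, hdfs-dn). HDP ships distro-select as the hdp-select command, and its real link and component names are longer (e.g. hadoop-hdfs-namenode), so treat the names and versions below as illustrative:

    # Two stack versions installed side by side
    ls /usr/hdp                        # 2.2.0.0  2.3.0.0  current

    # "current" holds per-daemon and per-client symlinks; mid-upgrade they diverge
    readlink /usr/hdp/current/hdfs-nn  # -> /usr/hdp/2.3.0.0/hadoop  (already upgraded)
    readlink /usr/hdp/current/hdfs-dn  # -> /usr/hdp/2.2.0.0/hadoop  (not yet)

    # Flip a single daemon's "current" pointer to the new stack version
    hdp-select set hdfs-dn 2.3.0.0

    # Bin commands resolve through "current", so nothing hard-codes a version
    readlink /usr/bin/hadoop           # -> /usr/hdp/current/hadoop-client/bin/hadoop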
Page13 © Hortonworks Inc. 2015
Packaging: Side-by-side installs (3)
• distro-select tool to select current binary
• Per-component, Per-daemon
• Maintain stack consistency – that is what QE tested
• Each component refers to its siblings of same stack version
• Each component knows the “hadoop home” of the same stack
– Wrapper bin-scripts set this up
• Config updates can optionally be synchronized with the binary upgrade
• Configs can stay in their old location
• But what if the new binary version requires slightly different config?
• Each binary version has its own config pointer
– /usr/hdp/2.2.0.0/hadoop/conf -> /etc/hadoop/conf
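A sketch of the config pointer, assuming the layout above (the conf-2.3 path is illustrative):

    # Each binary version carries its own conf symlink; by default both point
    # at the shared location, so configs need not move during an upgrade
    readlink /usr/hdp/2.2.0.0/hadoop/conf   # -> /etc/hadoop/conf
    readlink /usr/hdp/2.3.0.0/hadoop/conf   # -> /etc/hadoop/conf

    # If 2.3.0.0 needs a slightly different config, repoint only its own link
    ln -sfn /etc/hadoop/conf-2.3 /usr/hdp/2.3.0.0/hadoop/conf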
Page14 © Hortonworks Inc. 2015
Component Enhancements
• Packaging – Side-by-side installs
• HDFS Enhancements
• YARN Enhancements
• Retaining Job/App Context
• Hive Enhancements
Page15 © Hortonworks Inc. 2015
HDFS Enhancements (1)
Data safety
• Since 2007, HDFS has supported an upgrade checkpoint
• Backups of HDFS are not practical – too large
• Protects against bugs in a new HDFS version deleting files
– Standard practice is to use it for ALL upgrades, even patch releases
• But it only works for a “stop-the-world” full upgrade, and it does not support downgrade
• It would be irresponsible to do rolling upgrades without such a mechanism
HDP 2.2 has an enhanced upgrade checkpoint (HDFS-5535)
• Markers for rollback
• “Hardlinks” to protect against deletes caused by bugs in the new version of the HDFS code
– The old scheme had hardlinks too, but we now delay the deletes
• Added downgrade capability
• Protobuf-based fsimage for compatible extensibility
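The enhanced checkpoint is driven through dfsadmin; a minimal sketch of the happy path (the rolling daemon upgrades themselves happen between prepare and finalize):

    # Create the rolling-upgrade marker and trigger the rollback fsimage
    hdfs dfsadmin -rollingUpgrade prepare

    # Poll until the NameNode reports the rollback image is ready
    hdfs dfsadmin -rollingUpgrade query

    # ... upgrade NameNodes and DataNodes in rolling fashion ...

    # Finalize once satisfied; until then, block deletes are delayed
    # ("hardlinks"), so rollback to the checkpointed state stays possible
    hdfs dfsadmin -rollingUpgrade finalize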
Page16 © Hortonworks Inc. 2015
HDFS Enhancements (2)
Minimize service degradation and retain data safety
• Fast datanode restart (HDFS-5498)
• Write pipeline – every DN will be upgraded, hence many write
pipelines will break and be repaired
• Umbrella JIRA: HDFS-5535
– Repair the pipeline to the same DN during RU (avoids replica data copies)
– Retain the same number of replicas in the pipeline
• Upgrade the HA standby and fail over (NN HA has been available for a long time)
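The fast DN restart pairs with a graceful shutdown command; hostname and IPC port below are illustrative:

    # Ask the DN to shut down for upgrade: clients writing through it wait
    # briefly for the same DN to return instead of rebuilding their pipelines
    hdfs dfsadmin -shutdownDatanode dn1.example.com:8010 upgrade

    # Verify it is down, then start the new-version DN on the same node
    hdfs dfsadmin -getDatanodeInfo dn1.example.com:8010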
Page17 © Hortonworks Inc. 2015
Component Enhancements
• Packaging – Side-by-side installs
• HDFS Enhancements
• YARN Enhancements
• Retaining Job/App Context
• Hive Enhancements
Page18 © Hortonworks Inc. 2015
YARN Enhancements: Minimize Service Degradation
• YARN RM retains application queue (2013)
• YARN RM fail-over (2014)
– Note: this retains the queues, but ALL running jobs are re-kicked
• YARN RM can restart while retaining applications (2015)
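Work-preserving RM restart is configuration-driven; a sketch of the usual yarn-site.xml properties (ZK quorum illustrative). The RM persists app/attempt state in ZooKeeper and, on restart or failover, resyncs running containers from the NMs instead of killing all jobs:

    <!-- yarn-site.xml: work-preserving RM restart -->
    <property><name>yarn.resourcemanager.recovery.enabled</name><value>true</value></property>
    <property><name>yarn.resourcemanager.work-preserving-recovery.enabled</name><value>true</value></property>
    <property>
      <name>yarn.resourcemanager.store.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <property><name>yarn.resourcemanager.zk-address</name><value>zk1:2181,zk2:2181,zk3:2181</value></property>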
Page19 © Hortonworks Inc. 2015
YARN Enhancements: Minimize Service Degradation
• A restarted YARN NodeManager retains existing containers (2015)
• Recall: restarting containers will cause serious SLA degradation
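NM container retention is likewise configuration-driven; a sketch of the properties involved (recovery dir and port illustrative). The NM journals container state locally and reacquires live containers after restart; it must bind a fixed, non-ephemeral port so recovered containers can reconnect:

    <!-- yarn-site.xml: NM restart that retains running containers -->
    <property><name>yarn.nodemanager.recovery.enabled</name><value>true</value></property>
    <property><name>yarn.nodemanager.recovery.dir</name><value>/var/lib/hadoop-yarn/nm-recovery</value></property>
    <property><name>yarn.nodemanager.address</name><value>0.0.0.0:45454</value></property>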
Page20 © Hortonworks Inc. 2015
YARN Enhancements: Compatibility
• Versioning of state-stores of RM and NMs
• Compatible evolution of tokens over time
• Wire compatibility between mixed versions of RM
Page21 © Hortonworks Inc. 2015
Component Enhancements
• Packaging – Side-by-side installs
• HDFS Enhancements
• YARN Enhancements
• Retaining Job/App Context
• Hive Enhancements
Page22 © Hortonworks Inc. 2015
Retaining Job/App context
• Previously, jobs/apps used libraries from the local node
• This worked because the client node and the compute nodes had the same version
• But during RU, NodeManagers run a mix of versions
• A job must use the same version the client used when submitting it
• Solution:
• Framework libraries are now installed in HDFS
• Client-context sent as “distro-version” variable in job config
• Has side benefits
– Frameworks are now installed on a single node and then uploaded to HDFS
• Note: Oozie was also enhanced to maintain a consistent context
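In HDP this looks roughly like the following: the framework tarball is versioned in HDFS and resolved through the hdp.version recorded in the submitter’s job config (paths and upload step illustrative):

    <!-- mapred-site.xml: fetch the MR framework from HDFS at the submitter's version -->
    <property>
      <name>mapreduce.application.framework.path</name>
      <value>/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework</value>
    </property>

    # Upload the tarball once per stack version, from any single node
    hdfs dfs -mkdir -p /hdp/apps/2.3.0.0/mapreduce
    hdfs dfs -put /usr/hdp/2.3.0.0/hadoop/mapreduce.tar.gz /hdp/apps/2.3.0.0/mapreduce/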
Page23 © Hortonworks Inc. 2015
YARN Rolling Upgrades: A Cluster Snapshot
Page24 © Hortonworks Inc. 2015
Component Enhancements
• Packaging – Side-by-side installs
• HDFS Enhancements
• YARN Enhancements
• Retaining Job/App Context
• Hive Enhancements
Page25 © Hortonworks Inc. 2015
Hive Enhancements
• Fast restarts + client-side reconnection
• Hive metastore and Hive client
• HiveServer2: a stateful server that submits the client’s queries
• Need to keep it running until the old queries complete
• Solution:
• Allow multiple Hive servers to run, each registered in ZooKeeper
• New client requests go to the new servers
• The old server completes its old queries but does not receive any new ones
– The old server is removed from ZooKeeper
• Side benefit
• An HA + load-balancing solution for HiveServer2
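A sketch of the ZooKeeper-based discovery (quorum, namespace, and the version argument to --deregister are illustrative):

    <!-- hive-site.xml: each HiveServer2 registers an ephemeral znode -->
    <property><name>hive.server2.support.dynamic.service.discovery</name><value>true</value></property>
    <property><name>hive.zookeeper.quorum</name><value>zk1:2181,zk2:2181,zk3:2181</value></property>

    # Clients resolve a live server through ZooKeeper, not a fixed host:port;
    # new sessions land on the new-version servers
    beeline -u 'jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'

    # Remove the old server from ZooKeeper; it finishes in-flight queries,
    # receives no new ones, and can then be retired
    hive --service hiveserver2 --deregister 0.14.0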
Page26 © Hortonworks Inc. 2015
Automated Rolling Upgrade
• Via Ambari
• Via your own cluster-management scripts
Page27 © Hortonworks Inc. 2015
HDP Rolling Upgrades Runbook
• Pre-requisites: HA, configs
• Prepare: install bits, DB backups, HDFS upgrade checkpoint
• Rolling Upgrade
• Finalize
Before finalizing, two ways back:
• Rolling Downgrade
• Rollback – NOT rolling; shut down all services
Note: Upgrade time is proportional to # nodes, not data size
Page30 © Hortonworks Inc. 2015
Both Manual and Automated Rolling Upgrade
• Ambari supports fully automated upgrades
• Verifies prerequisites
• Performs the HDFS upgrade checkpoint, prompts for DB backups
• Performs the rolling upgrade
• All the components, in the right order
• Smoke tests at each critical stage
• Opportunities for admin verification at critical stages
• Downgrade if you change your mind
• We have published the runbook for those who do not use Ambari
• You can do it manually or automate your own process – a hedged sketch follows
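For the manual path, a hedged sketch of the per-worker loop; the host list, IPC port, component names, and daemon scripts vary by install and are illustrative here (Ambari automates the same sequence):

    VERSION=2.3.0.0
    for host in $(cat worker-nodes.txt); do
      # DN: the "upgrade" flag makes client write pipelines wait briefly for
      # the same DN to come back instead of failing over to another replica
      ssh "$host" "hdfs dfsadmin -shutdownDatanode \$(hostname -f):8010 upgrade"
      ssh "$host" "hdp-select set hadoop-hdfs-datanode $VERSION && hadoop-daemon.sh start datanode"

      # NM: with recovery enabled, running containers survive the restart
      ssh "$host" "yarn-daemon.sh stop nodemanager"
      ssh "$host" "hdp-select set hadoop-yarn-nodemanager $VERSION && yarn-daemon.sh start nodemanager"
    done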
Page31 © Hortonworks Inc. 2015
Runbook: Rolling Upgrade
Ambari has an automated process for Rolling Upgrades.
Services are switched over to the new version in rolling fashion.
Any components not installed on the cluster are skipped.
Upgrade order:
Zookeeper → Ranger → Core Masters (HDFS, YARN, HBase) → Core Slaves (HDFS, YARN, HBase) → Hive → Oozie → Falcon → Clients (HDFS, YARN, MR, Tez, HBase, Pig, Hive, Phoenix, Mahout) → Kafka → Knox → Storm → Slider → Flume → Hue → Finalize
Page32 © Hortonworks Inc. 2015
Runbook: Rolling Downgrade
The same services are walked in rolling fashion to revert to the prior version, then the downgrade is finalized:
Zookeeper → Ranger → Core Masters → Core Slaves → Hive → Oozie → Falcon → Clients → Kafka → Knox → Storm → Slider → Flume → Hue → Finalize Downgrade
Page33 © Hortonworks Inc. 2015
Summary
• Enterprises run critical services and data on a Hadoop cluster
• They need a live cluster upgrade that maintains SLAs without degradation
• We enhanced Hadoop components for enterprise-grade rolling upgrade
• Non-proprietary packaging using OS-standard mechanisms (RPMs, Debs, …)
• Data safety
– HDFS checkpoints and write pipelines
• Maintained SLAs – solved a number of service-degradation problems
– HDFS write pipelines, YARN RM and NM state recovery, Hive, …
• Jobs/apps continue to run correctly, with the right context
• Allow downgrades/rollbacks in case of problems
• Were all enhancements truly open-sourced and pushed back to Apache?
• Yes, of course – that is how Hortonworks does business …
Page34 © Hortonworks Inc. 2015
Backup slides
Page35 © Hortonworks Inc. 2015
Why didn’t you use alternatives?
• The alternatives mechanism generally keeps one version active, not two
• We need to move some services as a pack (clients)
• We need to support managing confs and binaries both together and separately
• Maybe we could have done it, but it was getting complex …
Editor's Notes
• #9: HDFS write pipeline – slows down writes, risks data. YARN App Masters restart – app failure if the App Master does not have persistent state. NodeManager restart – tasks fail and restart, SLAs degrade. Hive server is processing client queries – it cannot restart for the new version. Clients must not see failures – many components do not have retry.
• #28: Yahoo! upgrades approximately 1K nodes (out of 40K) a day; a 4K-node cluster takes 2 days.