Status of Hadoop 0.23
Operations at Yahoo!
From Inception to Customer Validation

Charles Wimmer, Staff Site Reliability Engineer at LinkedIn
Summary of This Talk
● Includes
  ○ Operational changes required to support 0.23
● Does not include
  ○ Specifics about customer testing
  ○ Deployment into Research or Production clusters
Scope of This Change at Yahoo!
● 42,000+ Hadoop servers
● 20+ clusters
● Three tiers: Sandbox, Research, Production
● Current version: 0.20.205.x
Overview of the Process
● Provide customers a 0.23 Sandbox cluster
● Provide customers enough data to test their
  applications
● Provide developer support to address
  application issues quickly
● Upgrade Research and Production clusters
  as applications are certified to work with 0.23
Test Cluster
● 420 nodes
● 2 x quad-core Westmere processors per node
● 24 GB RAM per node
● 12 x 2 TB disks per node
● No HDFS Federation
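A quick back-of-the-envelope sketch of what the test cluster adds up to. It assumes "2T" means 2 TB per disk and that HDFS runs with its default replication factor of 3 (`dfs.replication`); neither the replication setting nor any reserved-space figure is stated on the slide, so the logical figure is only an upper bound.

```sh
#!/bin/sh
# Rough totals for the 420-node test cluster.
# Assumptions (not from the slides): 2 TB per disk, dfs.replication = 3,
# and no space reserved via dfs.datanode.du.reserved or used by non-DFS data.

NODES=420
SOCKETS=2
CORES_PER_SOCKET=4
DISKS_PER_NODE=12
TB_PER_DISK=2
REPLICATION=3

physical_cores() { echo $((NODES * SOCKETS * CORES_PER_SOCKET)); }
raw_tb()         { echo $((NODES * DISKS_PER_NODE * TB_PER_DISK)); }
# Upper bound on unique data stored once every block has REPLICATION copies.
logical_tb()     { echo $(( $(raw_tb) / REPLICATION )); }

echo "physical cores: $(physical_cores)"
echo "raw disk (TB): $(raw_tb)"
echo "max logical data at replication $REPLICATION (TB): $(logical_tb)"
```

With these inputs the sketch reports 3,360 physical cores and 10,080 TB of raw disk, or at most 3,360 TB of unique data at three-way replication.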
Configuration
● Hierarchical Queues
● Memory Configuration
● Kerberos
Hierarchical Queues
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>BIZUNIT-A,BIZUNIT-U,BIZUNIT-C,unfunded</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.capacity</name>
  <value>100</value>
</property>
Hierarchical Queues
<property>
  <name>yarn.scheduler.capacity.root.BIZUNIT-A.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.BIZUNIT-U.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.BIZUNIT-C.capacity</name>
  <value>15</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.unfunded.capacity</name>
  <value>5</value>
</property>
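The CapacityScheduler rejects a configuration in which the capacities of sibling queues under one parent do not add up to 100. A minimal pre-deployment check, sketched under the assumption that the capacities have already been pulled out of `capacity-scheduler.xml` by hand or by some other tool; `capacity_sum` and `check_siblings` are hypothetical helper names.

```sh
#!/bin/sh
# Check that sibling queue capacities under one parent sum to exactly 100.

capacity_sum() {
  total=0
  for c in "$@"; do
    total=$((total + c))
  done
  echo "$total"
}

check_siblings() {
  # $1 = parent queue path; remaining args = child capacities (percent)
  parent=$1; shift
  sum=$(capacity_sum "$@")
  if [ "$sum" -ne 100 ]; then
    echo "ERROR: children of $parent sum to $sum, expected 100" >&2
    return 1
  fi
  echo "OK: children of $parent sum to 100"
}

# BIZUNIT-A, BIZUNIT-U, BIZUNIT-C, unfunded, as configured above
check_siblings root 50 30 15 5
```

The same call can be repeated for every parent queue in the hierarchy, for example `check_siblings root.BIZUNIT-U 50 50`.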
Hierarchical Queues
<property>
  <name>yarn.scheduler.capacity.root.BIZUNIT-U.proj-a.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.BIZUNIT-U.proj-b.capacity</name>
  <value>50</value>
</property>
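Making BIZUNIT-U a parent queue takes more than the per-child capacities: the CapacityScheduler also needs `yarn.scheduler.capacity.root.BIZUNIT-U.queues` to list the children, just as `yarn.scheduler.capacity.root.queues` lists the top-level queues. The following is a hypothetical generator (`emit_child_queues` is not part of Hadoop) that writes both pieces for one parent from `name:capacity` pairs, which can help keep large queue trees consistent.

```sh
#!/bin/sh
# Hypothetical generator for one parent queue's capacity-scheduler.xml fragment.
# Usage: emit_child_queues PARENT_PATH name:capacity [name:capacity ...]

emit_property() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

emit_child_queues() {
  parent=$1; shift
  names=""
  for spec in "$@"; do
    # Append the queue name (text before the colon), comma-separated.
    names=${names:+$names,}${spec%%:*}
  done
  emit_property "yarn.scheduler.capacity.$parent.queues" "$names"
  for spec in "$@"; do
    emit_property "yarn.scheduler.capacity.$parent.${spec%%:*}.capacity" "${spec#*:}"
  done
}

emit_child_queues root.BIZUNIT-U proj-a:50 proj-b:50
```

For the example above this prints a `.queues` property with value `proj-a,proj-b`, followed by the two capacity properties shown on the slide.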
Hierarchical Queues
Memory Configuration
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>21504</value>
</property>
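The value 21504 MB is exactly 21 GB. On the 24 GB test-cluster nodes, that leaves roughly 3 GB for the operating system and the Hadoop daemons themselves. The arithmetic is worked through below, assuming "24G" means 24 × 1024 MB.

```sh
#!/bin/sh
# Worked arithmetic for yarn.nodemanager.resource.memory-mb on the test cluster.
# Assumption: 24 GB of RAM per node = 24 * 1024 MB.

NODE_RAM_MB=$((24 * 1024))
NM_MEMORY_MB=21504          # value from the slide
NODES=420

headroom_mb()     { echo $((NODE_RAM_MB - NM_MEMORY_MB)); }
cluster_yarn_gb() { echo $((NODES * NM_MEMORY_MB / 1024)); }

echo "YARN memory per node: $((NM_MEMORY_MB / 1024)) GB"
echo "left for OS and daemons: $(headroom_mb) MB"
echo "schedulable memory across $NODES nodes: $(cluster_yarn_gb) GB"
```

This gives 3,072 MB of headroom per node and 8,820 GB of memory that YARN can hand out across the 420 nodes.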
Kerberos Configuration
<property>
  <name>yarn.resourcemanager.principal</name>
  <value>mapred/clustername-jt1.domain.name.com@REALM.NAME.COM</value>
</property>
<property>
  <name>yarn.nodemanager.principal</name>
  <value>tt/_HOST@REALM.NAME.COM</value>
</property>
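The ResourceManager principal names one fixed host, while the NodeManager principal uses `_HOST`, which Hadoop replaces with each node's fully qualified host name when the daemon starts. Hadoop performs that substitution itself. The sketch below only mimics it, for example to work out which principal each NodeManager will expect to find in its keytab. `expand_principal` and the host name `node0001.domain.name.com` are hypothetical.

```sh
#!/bin/sh
# Mimic the _HOST substitution for a Kerberos principal pattern.

expand_principal() {
  # $1 = principal pattern; $2 = FQDN (defaults to this host's FQDN)
  host=${2:-$(hostname -f)}
  printf '%s\n' "$1" | sed "s/_HOST/$host/"
}

expand_principal 'tt/_HOST@REALM.NAME.COM' node0001.domain.name.com
```

The expanded principal, here `tt/node0001.domain.name.com@REALM.NAME.COM`, should appear in the output of `klist -kt` run against that node's NodeManager keytab. Patterns without `_HOST`, such as the ResourceManager principal above, pass through unchanged.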
Init Scripts
●   DataNode/NameNode
●   SecondaryNameNode
●   HistoryServer
●   NodeManager
●   ResourceManager
DataNode/NameNode
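start_20() {
  : # . . . (0.20 start logic elided on the slide)
}
start_next() {
  : # . . . (0.23 start logic elided on the slide)
}
# bin/hdfs is only present in the 0.23 layout, so it selects the start path
if [ -x /home/gs/hadoop/current/bin/hdfs ] ; then
  start_next "$@"
else
  start_20 "$@"
fi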
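The dispatch above lets one init script serve both versions during the migration. It relies on `bin/hdfs` existing only in the newer layout, since the 0.20 line runs HDFS daemons through `bin/hadoop`. Below is a runnable sketch of the same idea with the install directory passed in as a parameter. The `echo` bodies are placeholders, not the real Yahoo! start logic, and `dispatch_start` is a hypothetical name.

```sh
#!/bin/sh
# Choose the 0.23 ("next") or 0.20 start path based on the installed layout.

start_20()   { echo "0.20 start: $*"; }   # placeholder
start_next() { echo "0.23 start: $*"; }   # placeholder

dispatch_start() {
  # $1 = Hadoop install dir (e.g. /home/gs/hadoop/current); rest = daemon args
  hadoop_home=$1; shift
  if [ -x "$hadoop_home/bin/hdfs" ]; then
    start_next "$@"
  else
    start_20 "$@"
  fi
}
```

Because the check runs at start time, repointing the `current` symlink to a 0.23 install is enough to switch a node over. The init script itself does not need to change.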
SecondaryNameNode
clean_checkpoint_dir() {
  CHECKPOINT_DIR=/grid/0/tmp/hadoop-hdfs/dfs/namesecondary/current
  if [ -d "$CHECKPOINT_DIR" ] ; then
    # Move the existing checkpoint into a fresh temporary directory...
    DELETE_DIR=`mktemp -p /grid/0/tmp -d delete-XXXXXX`
    if [ $? -eq 0 ] ; then
      echo "moving $CHECKPOINT_DIR to ${DELETE_DIR}/"
      mv "$CHECKPOINT_DIR" "${DELETE_DIR}/"
      # ...and schedule its removal one minute from now with at(1).
      # EOF is unquoted so $DELETE_DIR expands when the job is queued.
      cat <<EOF | at now+1min 2>/dev/null
if [ -d "$DELETE_DIR" ] ; then
  rm -rf --preserve-root "$DELETE_DIR"
fi
EOF
    fi
  fi
}
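Moving the checkpoint aside is a single rename, assuming both paths are on the same filesystem. A large recursive delete can then happen later without holding up the SecondaryNameNode start. The following is a parameterized variant of the move step that can be exercised outside `/grid/0`. The deferred `at` deletion is deliberately left out so the sketch runs on hosts without `atd`, and `move_aside` is a hypothetical name.

```sh
#!/bin/sh
# Move directory SRC into a new mktemp directory under PARENT; print the new home.

move_aside() {
  src=$1 parent=$2
  [ -d "$src" ] || return 0                 # nothing to clean
  dest=$(mktemp -p "$parent" -d delete-XXXXXX) || return 1
  mv "$src" "$dest/" || return 1
  echo "$dest"
}
```

The printed directory is what the init script would later hand to `rm -rf`.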
HistoryServer
case "$1" in
  start)
    su "$HADOOP_USER" -s /bin/sh -c \
      "$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver"
    RET=$?
    ;;
  # remaining actions are not shown on the slide
esac
ResourceManager/NodeManager
case "$1" in
  start)
    su "$HADOOP_USER" -s /bin/sh -c \
      "$HADOOP_PREFIX/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start $PROC"
    RET=$?
    ;;
  # remaining actions are not shown on the slide
esac
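A single case statement can serve both the ResourceManager and the NodeManager because `yarn-daemon.sh` takes the daemon name as its last argument. The sketch below is a hypothetical fuller version that also handles `stop` and rejects unknown actions. `PROC` is passed as an argument instead of being set by the script, and the `su "$HADOOP_USER"` wrapper is omitted so the sketch runs unprivileged. The real init script should keep running the daemon as `$HADOOP_USER`.

```sh
#!/bin/sh
# Hypothetical init dispatch for YARN daemons.
# Usage: yarn_service {start|stop} {resourcemanager|nodemanager}

yarn_service() {
  action=$1 proc=$2
  case "$action" in
    start|stop)
      "$HADOOP_PREFIX/sbin/yarn-daemon.sh" --config "$HADOOP_CONF_DIR" "$action" "$proc"
      RET=$?
      ;;
    *)
      echo "Usage: $0 {start|stop} {resourcemanager|nodemanager}" >&2
      RET=2
      ;;
  esac
  return "$RET"
}
```

The HistoryServer script follows the same shape, calling `mr-jobhistory-daemon.sh` with `historyserver` as the daemon name.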
Questions?
Charles Wimmer

@cwimmer

charles@wimmer.net

cwimmer@linkedin.com

