APOLLO GROUP




Hadoop Operations: Starting Out Small
So Your Cluster Isn't Yahoo-sized (yet)
Michael Arnold
Principal Systems Engineer
14 June 2012
Agenda

  Who
  What (Definitions)
  Decisions for Now
  Decisions for Later
  Lessons Learned




APOLLO GROUP             © 2012 Apollo Group        2
APOLLO GROUP




  Who




Who is Apollo?

        Apollo Group is a leading provider of higher
          education programs for working adults.




Who is Michael Arnold?

  Systems Administrator
  Automation geek
  13 years in IT
  I deal with:
      –Server hardware specification/configuration
      –Server firmware
      –Server operating system
      –Hadoop application health
      –Monitoring all the above


APOLLO GROUP




  What
  Definitions




Definitions

  Q: What is a tiny/small/medium/large cluster?
  A:
      –Tiny:          1-9 nodes
      –Small:         10-99 nodes
      –Medium:        100-999 nodes
      –Large:         1000+ nodes
      –Yahoo-sized:   ~4000 nodes




Definitions

  Q: What is a “headnode”?
  A: A server that runs one or more of the following
   Hadoop processes:
      –NameNode
      –JobTracker
      –Secondary NameNode
      –ZooKeeper
      –HBase Master




APOLLO GROUP




  What decisions should you
  make now and which can
  you postpone for later?
  Decisions for Now



Which Hadoop distribution?

  Amazon
  Apache
  Cloudera
  Greenplum
  Hortonworks
  IBM
  MapR
  Platform Computing



Should you virtualize?

  Can be OK for small clusters BUT
      –virtualization adds overhead
      –can cause performance degradation
      –cannot take advantage of Hadoop rack locality
  Virtualization can be good for:
      –functional testing of M/R job or workflow changes
      –evaluation of Hadoop upgrades




What sort of hardware should you be considering?

  Inexpensive
  Not “enterprisey” hardware
     –No RAID*
     –No Redundant power*
  Low power consumption
  No optical drives
     –get systems that can boot off the network



                                              * except in headnodes

Plan for capacity expansion

  Start at the bottom and
   work your way up
  Leave room in your
   cabinets for more
   machines




Plan for capacity expansion (cont.)

  Deploy your initial
   cluster in two cabinets
     –One headnode, one
      switch, and several
      (five) datanodes per
      cabinet




Plan for capacity expansion (cont.)

  Install a second cluster
   in the empty space in
   the upper half of the
   cabinet




APOLLO GROUP




  What decisions should you
  make now and which can
  you postpone for later?
  Decisions for Later



What size cluster?

  Depends upon your:
  Budget
  Data size
  Workload characteristics
  SLA




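A back-of-envelope calculation can turn those inputs into a first node count. All of the figures below (data size, replication factor, headroom, per-node disk) are illustrative assumptions, not recommendations:

```shell
# Rough cluster sizing (every number here is a made-up example):
# 20 TB of raw data, 3x HDFS replication, 25% headroom for temp/shuffle
# space, and datanodes with 12 TB of usable disk each.
raw_tb=20
replication=3
headroom_pct=25
per_node_tb=12

needed_tb=$(( raw_tb * replication * (100 + headroom_pct) / 100 ))
# Round up to whole nodes: (needed + per_node - 1) / per_node
nodes=$(( (needed_tb + per_node_tb - 1) / per_node_tb ))
echo "need ${needed_tb} TB of raw disk -> ${nodes} datanodes"
```

Re-run it with your own numbers; the point is that budget, data size, and SLA all feed a simple calculation before any hardware is ordered.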
What size cluster? (cont.)

  Are your MapReduce jobs:
  compute-intensive?
  reading lots of data?

  http://www.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/




Should you implement rack awareness?


        If more than one switch in the cluster:

                           YES




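Rack awareness is wired up by pointing topology.script.file.name (the Hadoop 1.x-era property in core-site.xml) at a script that receives IPs/hostnames as arguments and prints one rack path per argument. A minimal sketch, with a made-up subnet-to-rack mapping:

```shell
#!/bin/sh
# Minimal rack-topology script. The subnet-to-rack mapping below is an
# invented example; replace it with your own cabling reality.
rack_for() {
  case "$1" in
    10.1.1.*) echo /dc1/rack1 ;;
    10.1.2.*) echo /dc1/rack2 ;;
    *)        echo /default-rack ;;
  esac
}
# Hadoop may pass several hosts at once; answer each on its own line.
for host in "$@"; do
  rack_for "$host"
done
```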
Should you use automation?

       If not in the beginning, then as soon as
                        possible.

  Boot disks will fail.
  Automated OS and application installs:
      –Save time
      –Reduce errors
          •Cobbler/Spacewalk/Foreman/xCAT/etc
          •Puppet/Chef/CFEngine/shell scripts/etc

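As one illustration of what "automation" means in practice, here is a minimal Puppet-style sketch of a datanode role. The package and service names follow CDH3-era conventions; the file path and source are hypothetical:

```puppet
# Sketch of a datanode role (illustrative, not a real manifest).
class hadoop::datanode {
  package { 'hadoop-0.20-datanode':
    ensure => installed,
  }
  file { '/etc/hadoop/conf/hdfs-site.xml':
    ensure  => file,
    source  => 'puppet:///modules/hadoop/hdfs-site.xml',
    require => Package['hadoop-0.20-datanode'],
  }
  service { 'hadoop-0.20-datanode':
    ensure    => running,
    enable    => true,
    subscribe => File['/etc/hadoop/conf/hdfs-site.xml'],
  }
}
```

The payoff is that a replaced boot disk means "reinstall and run the agent," not an afternoon of hand configuration.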
APOLLO GROUP




  Lessons Learned




Keep It Simple

            Don't add redundancy and features
         (server/network) that will make things more
                 complicated and expensive.

               Hadoop has built-in redundancies.

                     Don't overlook them.




Automate the Hardware

  Twelve hours of manual work in the datacenter is
   not fun.
  Make sure all server firmware is configured
   identically.
      –HP SmartStart Scripting Toolkit
      –Dell OpenManage Deployment Toolkit
      –IBM ServerGuide Scripting Toolkit




Rolling upgrades are possible

               (Just not of the Hadoop software.)

   Datanodes can be decommissioned, patched, and
       added back into the cluster without service
                      downtime.




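In Hadoop 1.x-era releases, decommissioning is driven by an excludes file named in hdfs-site.xml; a sketch (the file path is an example):

```xml
<!-- hdfs-site.xml: point the NameNode at an excludes file.
     List hosts to decommission there, then run:
       hadoop dfsadmin -refreshNodes -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>
```

The rolling cycle is: add the host to the excludes file, refresh, wait for its blocks to re-replicate, patch and reboot the node, remove it from the file, and refresh again.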
The smallest thing can have a big impact on the cluster


  Bad NIC/switchport can cause cluster slowness.

  Slow disks can cause intermittent job slowdowns.




HDFS blocks are weird

  On ext3/ext4:
      –Small blocks are not padded out to the HDFS
       block size; they occupy only the actual size
       of the data.
      –Each HDFS block is actually two files on the
       datanode's filesystem:
          •The actual data and
          •A metadata/checksum file

 # ls -l blk_1058778885645824207*
 -rw-r--r-- 1 hdfs hdfs 35094 May 14 01:26 blk_1058778885645824207
 -rw-r--r-- 1 hdfs hdfs   283 May 14 01:26 blk_1058778885645824207_19155994.meta



Do not prematurely optimize

  Be careful tuning your datanode filesystems.
      • mkfs -t ext4 -T largefile4 ... (probably bad)
      • mkfs -t ext4 -i 131072 -m 0 ... (better)

 /etc/mke2fs.conf
 [fs_types]
  hadoop = {
         features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
         inode_ratio = 131072
         blocksize = -1
         reserved_ratio = 0
         default_mntopts = acl,user_xattr
  }

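To see why the largefile4 preset is probably bad here: it budgets one inode per 4 MiB of disk, while the profile above budgets one per 128 KiB. Since every HDFS block costs two files (data plus .meta), inode demand is far higher than "large file" presets assume. A quick worked example for a hypothetical 2 TiB data disk:

```shell
# Inode budgets for a 2 TiB disk (disk size is an illustrative choice).
disk_bytes=$(( 2 * 1024 * 1024 * 1024 * 1024 ))
largefile4_inodes=$(( disk_bytes / 4194304 ))   # one inode per 4 MiB
tuned_inodes=$(( disk_bytes / 131072 ))         # one inode per 128 KiB
echo "largefile4: ${largefile4_inodes} inodes"
echo "tuned:      ${tuned_inodes} inodes"
# Half a million inodes sounds like a lot, but at two inodes per HDFS
# block a workload with many small blocks can exhaust them long before
# the disk itself is full.
```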
Use DNS-friendly names for services

       hdfs://hdfs.delta.hadoop.apollogrp.edu:8020/
         mapred.delta.hadoop.apollogrp.edu:8021
      http://oozie.delta.hadoop.apollogrp.edu:11000/
      hiveserver.delta.hadoop.apollogrp.edu:10000



   Yes, the names are long, but I bet you can figure out how to
                    connect to Bravo Cluster.




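One way to implement this is a layer of DNS CNAMEs, so each service name tracks whichever headnode currently runs it. A BIND-style sketch; the target hostnames are invented:

```text
; Service names as CNAMEs to the current headnodes (hosts are examples).
hdfs.delta.hadoop.apollogrp.edu.       IN CNAME headnode1.delta.hadoop.apollogrp.edu.
mapred.delta.hadoop.apollogrp.edu.     IN CNAME headnode1.delta.hadoop.apollogrp.edu.
oozie.delta.hadoop.apollogrp.edu.      IN CNAME headnode2.delta.hadoop.apollogrp.edu.
hiveserver.delta.hadoop.apollogrp.edu. IN CNAME headnode2.delta.hadoop.apollogrp.edu.
```

Moving a service to new hardware then means updating one DNS record, not reconfiguring every client.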
Use a parallel, remote execution tool

  pdsh/Cluster SSH/mussh/etc

                 SSH in a for loop is so 2010

  FUNC/MCollective




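The idea is simple fan-out: run the same command on every node concurrently instead of serially. Real usage would be something like pdsh -w node[01-04] uptime; below is a runnable stand-in using xargs -P, where echo takes the place of ssh and the hostnames are made up:

```shell
# pdsh-style fan-out, sketched with xargs -P so it runs anywhere.
hosts='node01 node02 node03 node04'   # invented hostnames
# -P 4 runs four commands at once; echo stands in for ssh.
printf '%s\n' $hosts | xargs -P 4 -n 1 -I{} echo "would run: ssh {} uptime"
```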
Make your log directories as large as you can.

  20-100GB /var/log
      –Implement log purging cronjobs or your log
       directories will fill up.


  Beware: M/R jobs can fill up /tmp as well.




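A purge job can be as small as a find with a retention window. A sketch, where the path, pattern, and 14-day retention are examples, and touch -d (GNU-specific) is used only to fabricate an old file for the demo:

```shell
# Delete rotated logs older than a retention window; run daily from cron,
# e.g.:  30 3 * * * purge_old_logs /var/log/hadoop 14   (illustrative)
purge_old_logs() {
  dir=$1; days=$2
  find "$dir" -type f -name '*.log*' -mtime +"$days" -delete
}

# Demonstrate against a scratch directory so this is safe to run:
tmp=$(mktemp -d)
touch "$tmp/new.log"
touch -d '30 days ago' "$tmp/old.log"   # GNU touch, for the demo only
purge_old_logs "$tmp" 14
ls "$tmp"
```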
Insist on IPMI 2.0 for out-of-band management of server hardware.

  Serial Over LAN is awesome when booting a
   system.
  Standardized hardware/temperature monitoring.
  Simple remote power control.




Spanning-tree is the devil

  Enable portfast on your server switch ports or the
   BMCs may never get a DHCP lease.




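On Cisco-style switches the fix looks like the following (the interface range is illustrative). portfast skips the roughly 30-second listening/learning delay on edge ports so DHCP/PXE requests from BMCs and NICs are answered in time, and bpduguard shuts the port if someone accidentally plugs a switch into it:

```text
! Cisco IOS-style sketch; adjust the interface range to your cabling.
interface range GigabitEthernet0/1 - 24
 spanning-tree portfast
 spanning-tree bpduguard enable
```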
Apollo has re-built its cluster four times.

               You may end up doing so as well.




Apollo Timeline

  First build
  Cloudera Professional Services helped install CDH
  Four nodes
  Manually built OS via USB CD-ROM.
  CDH2




Apollo Timeline

  Second build
  Cobbler
  All software deployment was via kickstart; very little
   was in puppet. Config files were deployed via wget.
  CDH2




Apollo Timeline

  Third build
  OS filesystem partitioning needed to change.
  Most software deployment still via kickstart.
  CDH3b2




Apollo Timeline

  Fourth build
  HDFS filesystem inodes needed to be increased.
  Full puppet automation.
  Added redundant/hotswap enterprise hardware for
   headnodes.
  CDH3u1




Cluster failures at Apollo

  Hardware
      –disk failures (40+)
      –disk cabling (6)
      –RAM (2)
      –switch port (1)
  Software
      –Cluster
          •NFS (NN -> 2NN metadata)
      –Job
          •TT java heap
          •Running out of /tmp or /var/log/hadoop
          •Running out of HDFS space

Know your workload

  You can spend all the time in the world trying to get
   the best CPU/RAM/HDD/switch/cabinet
   configuration, but you are running on pure luck
   until you understand your cluster's workload.




APOLLO GROUP




  Questions?




