Hadoop Installation and basic configuration

Hadoop HDFS/MapReduce
Architecture
Hardware
Installation and Configuration
Monitoring
Namenode

Hardware Requirements
● NameNode + JobTracker
– >= 2 cores
– >= 8 GB RAM
– >= 40 GB disk, RAID 10
● DataNode + TaskTracker
– >= 4 cores
– >= (1 (OS) + 1 (TaskTracker) + 1 (DataNode) + reduce slots + map slots) GB RAM
– >= N GB disk space, JBOD (no RAID)
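
The DataNode RAM rule above can be worked through as a small calculation. This is a minimal sketch in Python; the function name and the `gb_per_task` parameter are assumptions for illustration (the slide implies 1 GB per map or reduce slot).

```python
def datanode_ram_gb(map_slots, reduce_slots, gb_per_task=1):
    """Minimum RAM in GB for a host running DataNode + TaskTracker."""
    os_gb = 1           # operating system
    tasktracker_gb = 1  # TaskTracker daemon
    datanode_gb = 1     # DataNode daemon
    # gb_per_task is an assumption: the slide counts 1 GB per task slot.
    task_gb = (map_slots + reduce_slots) * gb_per_task
    return os_gb + tasktracker_gb + datanode_gb + task_gb


# 8 map slots and 4 reduce slots: 3 GB for OS and daemons + 12 GB for tasks
print(datanode_ram_gb(map_slots=8, reduce_slots=4))  # 15
```

Tasks with larger heaps need a larger `gb_per_task`, so treat the result as a lower bound.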

Installation
● Download the tar file from the Apache Hadoop site or use a prebuilt RPM
● https://github.com/gerritjvv/repo
● http://bigtop.apache.org/

Configuration
● $HADOOP_HOME/conf/core-site.xml
● $HADOOP_HOME/conf/mapred-site.xml
● $HADOOP_HOME/conf/hdfs-site.xml
● http://hadoop.apache.org/docs/stable/cluster_setup

Configuration Namenode
● Create directory for namenode metadata
– /data/hadoop/name
● Open core-site.xml
– Define fs.default.name = hdfs://<host>:8020
● Open hdfs-site.xml
– Define dfs.name.dir=/data/hadoop/name
– Define dfs.replication=3
– Create dir /data/hadoop/hdfs
– Define dfs.data.dir=/data/hadoop/hdfs
– Define dfs.http.address=<host>:50070
● Format the namenode (first install only; formatting erases existing metadata)
– /opt/hadoop/bin/hadoop namenode -format
● After the format, start the namenode
– service hadoop-namenode start
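
The properties listed above end up as `<property>` entries inside a `<configuration>` element in each site file. The following is a minimal sketch in Python that renders that layout; `render_site_xml` is a hypothetical helper and `namenode.example.com` is a placeholder host, neither comes from the slides.

```python
from xml.sax.saxutils import escape


def render_site_xml(properties):
    """Render a dict of Hadoop properties in the *-site.xml layout."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in properties.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(str(value))}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


NAMENODE_HOST = "namenode.example.com"  # placeholder, replace with your host

core_site = {"fs.default.name": f"hdfs://{NAMENODE_HOST}:8020"}
hdfs_site = {
    "dfs.name.dir": "/data/hadoop/name",
    "dfs.replication": 3,
    "dfs.data.dir": "/data/hadoop/hdfs",
    "dfs.http.address": f"{NAMENODE_HOST}:50070",
}

print(render_site_xml(core_site))
print(render_site_xml(hdfs_site))
```

The printed text can be saved as `core-site.xml` and `hdfs-site.xml` in the conf directory; editing the files by hand gives the same result.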

Configuration JobTracker
● Open /opt/hadoop/conf/mapred-site.xml
– Define the property mapred.job.tracker=<host>:8021
– Create the directory /data/hadoop/mapred
– Define mapred.local.dir=/data/hadoop/mapred
● Start the JobTracker with service hadoop-jobtracker start
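
Before starting the JobTracker it can help to confirm that the two properties above are present and well formed. This is a hedged sketch in Python using only the standard library; `read_site_xml`, `check_jobtracker`, and the sample host `jobtracker.example.com` are illustrative assumptions, not part of Hadoop.

```python
import xml.etree.ElementTree as ET

SAMPLE_MAPRED_SITE = """<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:8021</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/data/hadoop/mapred</value>
  </property>
</configuration>"""


def read_site_xml(text):
    """Parse *-site.xml text into a {name: value} dict."""
    root = ET.fromstring(text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.findall("property")}


def check_jobtracker(props):
    """Return a list of problems with the JobTracker settings."""
    problems = []
    host, sep, port = props.get("mapred.job.tracker", "").rpartition(":")
    # For a cluster the value should be <host>:<port>, not a bare word.
    if not (sep and host and port.isdigit()):
        problems.append("mapred.job.tracker must be <host>:<port>")
    if not props.get("mapred.local.dir"):
        problems.append("mapred.local.dir is not set")
    return problems


print(check_jobtracker(read_site_xml(SAMPLE_MAPRED_SITE)))  # []
```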

Configuration DataNode
● On each datanode create the directory
/data/hadoop/hdfs (one directory per disk)
● Open /opt/hadoop/conf/hdfs-site.xml
– Define dfs.http.address=<host>:50070
– Define dfs.data.dir=/data/hadoop/hdfs
● Start the datanodes with service hadoop-datanode start
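
With one directory per disk, dfs.data.dir takes a comma-separated list of those directories. A minimal sketch in Python of building that value; the mount points `/data1` to `/data3` and the helper name `data_dirs` are assumptions for illustration.

```python
def data_dirs(mount_points, subdir="hadoop/hdfs"):
    """Build a dfs.data.dir value with one directory per disk."""
    return ",".join(f"{m.rstrip('/')}/{subdir}" for m in mount_points)


# Hypothetical host with three JBOD disks mounted separately
print(data_dirs(["/data1", "/data2", "/data3"]))
# /data1/hadoop/hdfs,/data2/hadoop/hdfs,/data3/hadoop/hdfs
```

Each listed directory still has to be created on the datanode before the service starts.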

Configuration Mapreduce
● On each datanode create the directory /data/hadoop/mapred
● Open /opt/hadoop/conf/mapred-site.xml
– Define mapred.local.dir=/data/hadoop/mapred
– Define mapred.tasktracker.map.tasks.maximum=<Number of map tasks>
– Define mapred.tasktracker.reduce.tasks.maximum=<Number of reduce tasks>
● Start the TaskTrackers with service hadoop-tasktracker start

Monitoring
● Web HTML scraping
– https://github.com/gerritjvv/hadoop-monitoring
● Ganglia
– http://ganglia.info/?p=88
● Cacti
– http://blog.cloudera.com/blog/2009/07/hadoop-graphing

Namenode Edits
● Writes, updates, and deletes are applied to the metadata in RAM and
appended to a write-ahead log (the edits log).
● The logged edits are only merged into the on-disk binary image
(the fsimage) during the secondary namenode checkpoint
● This file corrupts easily
● Recovery is a manual task
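
The relationship between the in-memory metadata, the write-ahead log, and the checkpoint can be illustrated with a toy model. This is a sketch in Python under simplifying assumptions: `ToyNameNode` and its methods are invented for illustration and do not reflect the real NameNode file formats or code.

```python
class ToyNameNode:
    """Toy model: metadata in RAM, an edits log, and a checkpointed image."""

    def __init__(self):
        self.fsimage = {}    # last checkpoint (stands in for the binary file)
        self.edits = []      # write-ahead log of operations since checkpoint
        self.namespace = {}  # live metadata held in RAM

    @staticmethod
    def _apply_to(ns, op, path, value):
        if op == "delete":
            ns.pop(path, None)
        else:  # "write" or "update"
            ns[path] = value

    def apply(self, op, path, value=None):
        self.edits.append((op, path, value))  # log the change first
        self._apply_to(self.namespace, op, path, value)

    def checkpoint(self):
        """Merge the edits into a new image, then truncate the log."""
        image = dict(self.fsimage)
        for op, path, value in self.edits:
            self._apply_to(image, op, path, value)
        self.fsimage = image
        self.edits = []

    def restart(self):
        """Rebuild RAM state from the image plus any unmerged edits."""
        self.namespace = dict(self.fsimage)
        for op, path, value in self.edits:
            self._apply_to(self.namespace, op, path, value)


nn = ToyNameNode()
nn.apply("write", "/a", 1)
nn.apply("write", "/b", 2)
nn.apply("delete", "/a")
print(nn.fsimage, len(nn.edits))  # {} 3  (nothing merged yet)
nn.checkpoint()
print(nn.fsimage, len(nn.edits))  # {'/b': 2} 0
```

In this model, losing either the image or the unmerged edits loses metadata, which is why both need protecting between checkpoints.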

HA
● YARN and Hadoop 2.0.0
● Experimental
● http://hadoop.apache.org/docs/current/hadoop-yarn
