Single-Node Hadoop Cluster Installation 
Presented By: 
Mahantesh Angadi, Nagarjuna D. N., Manoj P. T. 
2nd Sem Mtech-CNE (2014) 
Dept. of ISE, AIT 
Under The Guidance of: 
Manjunath T. N. 
Amogh P. K. 
Assistant Professor 
Dept. of ISE, AIT
OUTLINE 
•Requirements for Java & Hadoop installation 
•JDK installation steps 
•Hadoop installation steps
How to Install A Single-Node Hadoop Cluster 
•Assumptions 
oYou are running 32-bit Windows 
oYour laptop has 4 GB or more of RAM 
•Downloads 
oVMware Workstation 10 or later 
oUbuntu 10 or later 
oJava JDK 1.5 or later (e.g. JDK 1.7) 
oHadoop 1.2.1 or later
•Instructions to Install Hadoop 
1. Install VMWare Workstation 
2. Create a new Virtual machine 
3. Point the installer disc image to the ISO file (e.g. Ubuntu 10) that you downloaded 
4. Give the user name & password (e.g. hduser for both) 
5. Hard disk: 40 GB (more is better, but leave some space for your host machine) 
6. Customize hardware 
a. Memory: 2 GB RAM (more is better, but leave some for your host (Windows) machine) 
b. Processors: 2 (more is better, but leave some for your host (Windows) machine)
7. Launch your Virtual machine (all the instructions after this step will be performed in Ubuntu) 
8. Log in as the user (e.g. hduser) 
9. Open a terminal window with Ctrl + Alt + T (you will use this shortcut a lot) 
• Type the following command in the terminal to update the Linux package lists (requires an internet connection):
$ sudo apt-get update
JDK Installation Steps 
$ sudo apt-get install openssh-server (recommended, since Hadoop connects to localhost over SSH)
10. Install Java JDK 7 
a. Download the Java JDK (http://www.wikihow.com/Install-Oracle-Java-JDK-on-Ubuntu-Linux) 
b. Unzip the file 
$ tar -xvf jdk-7u25-linux-i586.tar.gz (or) $ tar xzf jdk-7u25-linux-i586.tar.gz
•Now move the JDK 7 directory to /usr/lib/java (you first need to create the java folder inside /usr/lib, or a location of your choice): 
$ sudo mkdir -p /usr/lib/java 
•Now copy the extracted JDK from the Downloads/Desktop folder into the java folder using the terminal:
•$ sudo cp -r jdk1.7.0_25 /usr/lib/java/
c. Do the following steps 
Edit the system PATH file /etc/profile and add the following system variables to your system path. As root, open /etc/profile with nano, gedit or any other text editor. 
•Type/Copy/Paste: $ sudo gedit /etc/profile 
or 
•Type/Copy/Paste: $ sudo nano /etc/profile
•Scroll down to the end of the file using your arrow keys and add the following lines to the end of your /etc/profile file: 
Type/Copy/Paste: 
JAVA_HOME=/usr/lib/java/jdk1.7.0_25 
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin 
export JAVA_HOME 
export PATH
•Change the JDK path to match the version you installed. 
Save the /etc/profile file and exit (in nano: Ctrl+X, then Y, then Enter).
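•To apply the new variables to your current terminal session without logging out (a quick check we added; the echoed path should match your install), reload the profile: 
$ source /etc/profile 
$ echo $JAVA_HOME 
/usr/lib/java/jdk1.7.0_25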
d. Now run 
•$ sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/java/jdk1.7.0_25/bin/java" 1 
oThis command notifies the system that Oracle Java JRE is available for use 
• $ sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/lib/java/jdk1.7.0_25/bin/javac" 1 
oThis command notifies the system that Oracle Java JDK is available for use 
•$ sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/lib/java/jdk1.7.0_25/bin/javaws" 1 
oThis command notifies the system that Oracle Java Web start is available for use
Now tell your Ubuntu Linux system that the Oracle Java JDK/JRE must be the default Java. 
•Type/Copy/Paste: $ sudo update-alternatives --set java /usr/lib/java/jdk1.7.0_25/bin/java 
oThis command will set the Java runtime environment for the system 
•Type/Copy/Paste: $ sudo update-alternatives --set javac /usr/lib/java/jdk1.7.0_25/bin/javac 
oThis command will set the javac compiler for the system 
•Type/Copy/Paste: $ sudo update-alternatives --set javaws /usr/lib/java/jdk1.7.0_25/bin/javaws 
oThis command will set Java Web Start for the system
•A successful installation of 32-bit Oracle Java will display: 
Type/Copy/Paste: $ java -version 
oThis command displays the version of java running on your system 
You should receive a message which displays: 
java version "1.7.0_25" 
Java(TM) SE Runtime Environment (build 1.7.0_25-b18) 
Java HotSpot(TM) Server VM (build 24.25-b08, mixed mode) 
Type/Copy/Paste: $ javac -version 
oThis command lets you know that you are now able to compile Java programs from the terminal. 
You should receive a message which displays: 
javac 1.7.0_25
•With the version checks passing, the Java installation is complete. 
“Congratulations, you have successfully installed the Java JDK!”
Hadoop Installation Steps 
Prerequisites 
•Configure JDK: 
oThe Sun Java JDK is required to run Hadoop, so every node in a Hadoop cluster must have the JDK configured. E.g. JDK 1.5 & above (preference: jdk-7u25-linux-i586.tar.gz) 
•Download hadoop package: 
Ex:- hadoop-1.2.1-bin.tar.gz 
•NOTE: 
In a multi-node hadoop cluster, the master node uses Secure Shell (SSH) commands 
to manage the remote nodes. This requires that all nodes have the same version of the JDK and the Hadoop core. If the versions differ among nodes, errors will occur when you start the cluster.
Adding a dedicated Hadoop system user 
•We will use a dedicated Hadoop user account for running Hadoop. While that’s not required, it is recommended because it helps to separate the Hadoop installation from other software applications and user accounts running on the same machine (think: security, permissions, backups, etc.). 
$ sudo addgroup hadoop 
$ sudo adduser --ingroup hadoop hduser 
oThis will add the user hduser and the group hadoop to your local machine. 
$ su - hduser 
oThis will switch to the hduser account.
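•To confirm the account and group were created correctly (a quick check we added; the numeric IDs will differ on your machine): 
$ id hduser 
uid=1001(hduser) gid=1001(hadoop) groups=1001(hadoop)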
Configuring SSH 
•Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine if you want to use Hadoop on it (which is what we want to do in this short hadoop installation tutorial). For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost for the hduser user we created in the previous section. 
•We assume that you have SSH up and running on your machine and have configured it to allow SSH public-key authentication. 
•First, we have to generate an SSH key for the hduser user. 
hduser@ubuntu:~$ ssh-keygen -t rsa -P ""
oThis creates an RSA key pair with an empty password: a private key ($HOME/.ssh/id_rsa) and a public key ($HOME/.ssh/id_rsa.pub). The empty password lets Hadoop log in to its nodes without prompting each time.
•Second, you have to enable SSH access to your local machine with this newly created key. 
hduser@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
•The final step is to test the SSH setup by connecting to your local machine as the hduser user. This step is also needed to save your local machine’s host key fingerprint to the hduser user’s known_hosts file. 
•If you have any special SSH configuration for your local machine like a non-standard SSH port, you can define host-specific SSH options in $HOME/.ssh/config (see man ssh_config for more information). 
•hduser@ubuntu:~$ ssh localhost 
Are you sure you want to continue connecting (yes/no)? yes
•If the SSH connection fails, these general tips may help: 
•Enable debugging with ssh -vvv localhost and investigate the error in detail. 
•Check the SSH server configuration in /etc/ssh/sshd_config, in particular the options PubkeyAuthentication (which should be set to yes) and AllowUsers (if this option is active, add the hduser user to it). If you made any changes to the SSH server configuration file, you can force a configuration reload with sudo /etc/init.d/ssh reload. 
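•If public-key login is still refused, SSH may be rejecting the key files because their permissions are too open; tightening them (our addition, not on the original slides) usually helps: 
hduser@ubuntu:~$ chmod 700 $HOME/.ssh 
hduser@ubuntu:~$ chmod 600 $HOME/.ssh/authorized_keys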
•A successful connection to localhost confirms that SSH is configured correctly.
Disabling IPv6 
•One problem with IPv6 on Ubuntu is that using 0.0.0.0 for the various networking-related Hadoop configuration options results in Hadoop binding to the IPv6 addresses of the Ubuntu box. There is no practical point in enabling IPv6 on a box that is not connected to any IPv6 network, so we simply disabled IPv6 on our Ubuntu machine. Your mileage may vary. 
•To disable IPv6 on Ubuntu 10.04 LTS, open /etc/sysctl.conf in the editor of your choice and add the following lines to the end of the file: 
# disable ipv6 
net.ipv6.conf.all.disable_ipv6 = 1 
net.ipv6.conf.default.disable_ipv6 = 1 
net.ipv6.conf.lo.disable_ipv6 = 1 
/etc/sysctl.conf
•You have to reboot your machine in order to make the changes take effect. 
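•Alternatively (our addition, not on the original slides), you can apply the settings immediately without a reboot by reloading /etc/sysctl.conf: 
$ sudo sysctl -p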
•You can check whether IPv6 is enabled on your machine with the following command: 
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6 
•A return value of 0 means IPv6 is enabled; a value of 1 means it is disabled (that’s what we want). 
Alternative 
•You can also disable IPv6 only for Hadoop, as documented in HADOOP-3437, by adding the following line to conf/hadoop-env.sh: 
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
Hadoop Installation 
•Download Hadoop from the Apache Download Mirrors and extract the contents of the Hadoop package to a location of your choice. We picked /usr/local/hadoop. 
$ cd /usr/local 
$ sudo tar xzf hadoop-1.2.1.tar.gz 
$ sudo mv hadoop-1.2.1 hadoop 
$ sudo chown -R hduser:hadoop hadoop 
Update $HOME/.bashrc 
•Add the following lines to the end of the $HOME/.bashrc file of user hduser. If you use a shell other than bash, you should of course update the appropriate configuration file instead of .bashrc.
Copy and paste the following into $HOME/.bashrc and edit it to your requirements: 
# Set Hadoop-related environment variables 
export HADOOP_HOME=/usr/local/hadoop (edit here) 
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on) 
export JAVA_HOME=/usr/lib/jvm/java-6-sun(edit here) 
# Some convenient aliases and functions for running Hadoop-related commands 
unalias fs &> /dev/null 
alias fs="hadoop fs" 
unalias hls &> /dev/null 
alias hls="fs -ls" 
# If you have LZO compression enabled in your Hadoop cluster and 
# compress job outputs with LZOP (not covered in this tutorial): 
# Conveniently inspect an LZOP compressed file from the command 
# line; run via: 
# 
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo 
# 
# Requires installed 'lzop' command. 
# lzohead () { hadoop fs -cat $1 | lzop -dc | head -1000 | less; } 
# Add Hadoop bin/ directory to PATH 
export PATH=$PATH:$HADOOP_HOME/bin
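•The new variables take effect in new terminal sessions; to apply them to the current session immediately (a convenience step we added), reload the file and verify: 
$ source $HOME/.bashrc 
$ echo $HADOOP_HOME 
/usr/local/hadoop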
•The following picture gives an overview of the most important HDFS components.
Configuration 
•The only required environment variable we have to configure for Hadoop in this tutorial is JAVA_HOME. Open conf/hadoop-env.sh in the editor of your choice (if you used the installation path in this tutorial, the full path is /usr/local/hadoop/conf/hadoop-env.sh) and set the JAVA_HOME environment variable to the JDK directory we installed earlier. 
Change: 
# The java implementation to use. Required. 
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun 
to: 
# The java implementation to use. Required. 
export JAVA_HOME=/usr/lib/java/jdk1.7.0_25 
conf/hadoop-env.sh
•You can leave the settings below “as is” with the exception of the hadoop.tmp.dir parameter – this parameter you must change to a directory of your choice. We will use the directory /app/hadoop/tmp in this tutorial. Hadoop’s default configurations use hadoop.tmp.dir as the base temporary directory both for the local file system and HDFS, so don’t be surprised if you see Hadoop creating the specified directory automatically on HDFS at some later point. 
•Now we create the directory and set the required ownerships and permissions: 
$ sudo mkdir -p /app/hadoop/tmp 
$ sudo chown hduser:hadoop /app/hadoop/tmp 
# ...and if you want to tighten up security, chmod from 755 to 750... 
$ sudo chmod 750 /app/hadoop/tmp
•If you forget to set the required ownerships and permissions, you will see a java.io.IOException when you try to format the NameNode in the next section. 
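•A quick way to confirm the ownership and permissions before formatting (our sanity check, not on the original slides): 
$ ls -ld /app/hadoop/tmp 
drwxr-x--- 2 hduser hadoop 4096 <date> /app/hadoop/tmp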
•Add the following snippets between the <configuration> ... </configuration> tags in the respective configuration XML file. 
•In file conf/core-site.xml: conf/core-site.xml 
<property> 
<name>hadoop.tmp.dir</name> 
<value>/app/hadoop/tmp</value> 
<description>A base for other temporary directories.</description> 
</property> 
<property> 
<name>fs.default.name</name> 
<value>hdfs://localhost:54310</value> 
<description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description> 
</property>
•In file conf/hdfs-site.xml: conf/hdfs-site.xml 
<property> 
<name>dfs.replication</name> 
<value>1</value> 
<description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time. </description> 
</property>
•In file conf/mapred-site.xml: conf/mapred-site.xml 
<property> 
<name>mapred.job.tracker</name> 
<value>localhost:54311</value> 
<description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task. </description> 
</property>
Formatting the HDFS filesystem via the NameNode 
•The first step to starting up your Hadoop installation is formatting the Hadoop filesystem which is implemented on top of the local filesystem of your “cluster” (which includes only your local machine if you followed this tutorial). You need to do this the first time you set up a Hadoop cluster. 
•Do not format a running Hadoop filesystem as you will lose all the data currently in the cluster (in HDFS)! 
•To format the filesystem (which simply initializes the directory specified by the dfs.name.dir variable), run the command 
hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format
•The output will look like this: 
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop namenode -format 
10/05/08 16:59:56 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************ 
STARTUP_MSG: Starting NameNode 
STARTUP_MSG:   host = ubuntu/127.0.1.1 
STARTUP_MSG:   args = [-format] 
STARTUP_MSG:   version = 0.20.2 
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010 
************************************************************/ 
10/05/08 16:59:56 INFO namenode.FSNamesystem: fsOwner=hduser,hadoop 
10/05/08 16:59:56 INFO namenode.FSNamesystem: supergroup=supergroup 
10/05/08 16:59:56 INFO namenode.FSNamesystem: isPermissionEnabled=true 
10/05/08 16:59:56 INFO common.Storage: Image file of size 96 saved in 0 seconds. 
10/05/08 16:59:57 INFO common.Storage: Storage directory .../hadoop-hduser/dfs/name has been successfully formatted. 
10/05/08 16:59:57 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************ 
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1 
************************************************************/ 
hduser@ubuntu:/usr/local/hadoop$
Starting your single-node cluster 
•Run the command: 
hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh 
•This will start up a NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker on your machine. 
•The output will look like this:
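•A representative Hadoop 1.x transcript (our reconstruction; exact log paths and hostnames will vary): 
hduser@ubuntu:/usr/local/hadoop$ bin/start-all.sh 
starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-hduser-namenode-ubuntu.out 
localhost: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-hduser-datanode-ubuntu.out 
localhost: starting secondarynamenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-hduser-secondarynamenode-ubuntu.out 
starting jobtracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-hduser-jobtracker-ubuntu.out 
localhost: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-hduser-tasktracker-ubuntu.out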
•A nifty tool for checking whether the expected Hadoop processes are running is jps (part of Sun’s Java since v1.5.0). 
hduser@ubuntu:/usr/local/hadoop$ jps 
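•If everything started correctly, jps lists the five Hadoop daemons plus jps itself (sample output we added; process IDs will differ): 
2287 TaskTracker 
2149 JobTracker 
1938 DataNode 
2085 SecondaryNameNode 
2349 Jps 
1788 NameNode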
•Stopping your single-node cluster 
Run the command 
hduser@ubuntu:~$ /usr/local/hadoop/bin/stop-all.sh 
•to stop all the daemons running on your machine.
Hadoop Web Interfaces 
•Hadoop comes with several web interfaces which are by default (see conf/hadoop-default.xml) available at these locations: 
http://localhost:50070/ – web UI of the NameNode daemon 
http://localhost:50030/ – web UI of the JobTracker daemon 
http://localhost:50060/ – web UI of the TaskTracker daemon 
•These web interfaces provide concise information about what’s happening in your Hadoop cluster. You might want to give them a try. 
•Where: 
o50070 – NameNode port number 
o50030 – JobTracker port number 
o50060 – TaskTracker port number 
•Open these links in a local browser to see your Hadoop setup in action.
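•If your VM has no browser handy, a quick command-line check that a web UI is up (our addition; any HTTP client will do — a 200 means the NameNode UI is serving): 
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070/ 
200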
  • 41. Hadoop Web Interfaces •Hadoop comes with several web interfaces which are by default (see conf/hadoop-default.xml) available at these locations: http://localhost:50070/ – web UI of the NameNode daemon http://localhost:50030/ – web UI of the JobTracker daemon http://localhost:50060/ – web UI of the TaskTracker daemon •These web interfaces provide concise information about what’s happening in your Hadoop cluster. You might want to give them a try. •Where o50070- namenode port number o50030-jobtracker port number o50060-tasktracker port number •Type the links In local browser to see the hadoop setup output