Hadoop installation on windows
HADOOP INSTALLATION
TO INSTALL HADOOP ON WINDOWS WE CAN USE VARIOUS METHODS.
ONE METHOD IS:
1. Install VirtualBox on your system: https://www.virtualbox.org/wiki/Downloads
2. Download the latest version of Ubuntu 16.04: https://www.ubuntu.com/download/desktop
3. Open VirtualBox and install Ubuntu 16.04 in a new virtual machine.
4. Keep the internet connection active while installing Ubuntu; it will automatically download the related software.
Creating a User
 It is recommended to create a separate user for Hadoop to isolate the Hadoop file system from the Unix file system.
 Open the root account using the command “su”.
 Create a user from the root account using the command “useradd username”.
 Now you can open an existing user account using the command “su username”.
 $ su
 password:
 # useradd hadoop
 # passwd hadoop
 New passwd:
 Retype new passwd:
Changing the password of su
 If “su” gives a permission error, you can reset the password as follows:
 $ sudo -i
 Enter the password:
 # passwd
 Enter new UNIX password:
 Retype new UNIX password:
 # exit
SSH Setup and Key Generation
 SSH setup is required to perform different operations on a cluster, such as starting, stopping, and distributed daemon shell operations.
 To authenticate different users of Hadoop, it is required to provide a public/private key pair for a Hadoop user and share it with different users.
 The following commands generate a key pair using SSH. Copy the public key from id_rsa.pub to authorized_keys, and give the owner read and write permissions on the authorized_keys file.
 $ ssh-keygen -t rsa
 $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
 $ chmod 0600 ~/.ssh/authorized_keys
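 Once the keys are in place, passwordless login to the local machine should work; a quick check (assuming sshd is already running):
 $ ssh localhost
 (You should log in without a password prompt; type "exit" to return.)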
Error correcting SSH
 If SSH gives errors, remove the SSH server:
 $ sudo apt-get remove ssh
 Then install it again:
 $ sudo apt-get install ssh
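 After reinstalling, you can confirm that the SSH daemon is running (a quick check; on Ubuntu the service is named "ssh"):
 $ sudo service ssh status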
Download Java JDK
 Open a terminal and check whether the Java JDK is already installed.
 If not, install it with the following commands:
 sudo apt-get update
 sudo apt-get install openjdk-8-jdk
 (Alternatively, Oracle Java can be added via sudo add-apt-repository ppa:webupd8team/java.)
 Check the Java version:
 $ java -version
 To find the default Java path:
 readlink -f /usr/bin/java | sed "s:bin/java::"
 Output:
 /usr/lib/jvm/java-8-openjdk-amd64/jre/
Java installation
 $ cd Downloads/
 $ ls
 jdk-7u71-linux-x64.gz
 $ tar zxf jdk-7u71-linux-x64.gz
 $ ls
 jdk1.7.0_71 jdk-7u71-linux-x64.gz
 To set up the PATH and JAVA_HOME variables, add the following commands to the ~/.bashrc file.
 export JAVA_HOME=/usr/local/jdk1.7.0_71
 export PATH=$PATH:$JAVA_HOME/bin
 Now apply all the changes to the current running system.
 $ source ~/.bashrc
 To make Java available to all users, you have to move it to the location “/usr/local/”. Open root and type the following commands.
 $ su
 password:
 # mv jdk1.7.0_71 /usr/local/
 # exit
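 As a quick check (a sketch, not in the original slides), the relocated JDK should now respond from its new location:
 $ /usr/local/jdk1.7.0_71/bin/java -version
 (This should report java version "1.7.0_71".)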
Download Hadoop
 GNU/Linux is supported as a development and production platform.
 Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
 ssh must be installed and sshd must be running in order to use the Hadoop scripts that manage remote Hadoop daemons.
 Download Hadoop with the following command:
 wget https://dist.apache.org/repos/dist/release/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
 You can download a later release (for example, 2.9.0) by replacing 2.7.3 with that version number.
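 Optionally (a sketch, not in the original slides), verify the downloaded archive against the checksums Apache publishes alongside each release:
 $ sha256sum hadoop-2.7.3.tar.gz
 (Compare the printed hash with the official checksum file for the release.)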
Hadoop Download
 $ su
 password:
 # cd /usr/local
 # wget http://apache.claz.org/hadoop/common/hadoop-2.4.1/hadoop-2.4.1.tar.gz
 # tar xzf hadoop-2.4.1.tar.gz
 # mkdir hadoop
 # chmod -R 0777 /usr/local/hadoop
 # mv hadoop-2.4.1/* hadoop/
 # exit
 Open hadoop-env.sh and set the Java home path (see the sketch below).
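 A minimal sketch of that edit, assuming the OpenJDK 8 path found earlier with readlink (substitute your own path, e.g. /usr/local/jdk1.7.0_71 if you installed the tarball):
 In $HADOOP_HOME/etc/hadoop/hadoop-env.sh, replace the default JAVA_HOME line with:
 export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64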
Hadoop Operation Modes
 Local/standalone mode
• Runs as a single Java process.
 Pseudo-distributed mode
• A distributed simulation on a single machine.
• Each Hadoop daemon, such as HDFS, YARN, and MapReduce, runs as a separate Java process.
 Fully distributed mode
• Fully distributed, with a minimum of two or more machines forming a cluster.
Setting up Hadoop
 You can set the Hadoop environment variables by appending the following commands to the ~/.bashrc file.
 export HADOOP_HOME=/usr/local/hadoop
 export HADOOP_MAPRED_HOME=$HADOOP_HOME
 export HADOOP_COMMON_HOME=$HADOOP_HOME
 export HADOOP_HDFS_HOME=$HADOOP_HOME
 export YARN_HOME=$HADOOP_HOME
 export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
 export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
 export HADOOP_INSTALL=$HADOOP_HOME
 Now apply all the changes to the current running system.
 $ source ~/.bashrc
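 As a sanity check (not in the original slides), the hadoop command should now be on the PATH:
 $ hadoop version
 (This should print the installed release, e.g. Hadoop 2.4.1.)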
Hadoop Configuration
 You can find all the Hadoop configuration files under $HADOOP_HOME/etc/hadoop:
 $ cd $HADOOP_HOME/etc/hadoop
 If the hadoop folder is not present, create it:
 $ mkdir hadoop
 core-site.xml
The core-site.xml file contains information such as the port number used for the Hadoop instance, the memory allocated for the file system, the memory limit for storing data, and the size of the read/write buffers.
Open core-site.xml and add the following properties between the <configuration> and </configuration> tags.
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
hdfs-site.xml
 The hdfs-site.xml file contains information such as the replication value, the namenode path, and the datanode paths of your local file systems, that is, the place where you want to store the Hadoop infrastructure.
 Open this file and add the following properties between the <configuration> and </configuration> tags.
 <configuration>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
   <property>
     <name>dfs.name.dir</name>
     <value>file:///home/hadoop/hadoopinfra/hdfs/namenode</value>
   </property>
   <property>
     <name>dfs.data.dir</name>
     <value>file:///home/hadoop/hadoopinfra/hdfs/datanode</value>
   </property>
 </configuration>
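 The namenode and datanode directories named above must exist and be writable by the hadoop user before formatting; a minimal sketch using the paths from this file:
 $ mkdir -p /home/hadoop/hadoopinfra/hdfs/namenode
 $ mkdir -p /home/hadoop/hadoopinfra/hdfs/datanode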
ERROR MAY OCCUR WHEN RUNNING HDFS
• An error can occur due to an incorrect configuration such as:
<value>file://home/hadoop/hadoopinfra/hdfs/namenode</value>
<value>file://home/hadoop/hadoopinfra/hdfs/datanode</value>
 The text above is an incorrect configuration and can raise an authority exception: in a file:// URI, whatever follows the double slash is parsed as the host (authority), so "home" is read as a hostname rather than a directory.
 The correct configuration is:
 <value>file:/home/hadoop/hadoopinfra/hdfs/namenode</value>
 <value>file:/home/hadoop/hadoopinfra/hdfs/datanode</value>
 yarn-site.xml
 This file is used to configure YARN in Hadoop. Open the yarn-site.xml file and add the following properties between the <configuration> and </configuration> tags.
 <configuration>
   <property>
     <name>yarn.nodemanager.aux-services</name>
     <value>mapreduce_shuffle</value>
   </property>
 </configuration>
mapred-site.xml
 This file is used to specify which MapReduce framework we are using. By default, Hadoop contains only a template of mapred-site.xml, so first copy mapred-site.xml.template to mapred-site.xml using the following command:
 $ cp mapred-site.xml.template mapred-site.xml
 Open the mapred-site.xml file and add the following properties between the <configuration> and </configuration> tags.
 <configuration>
   <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
   </property>
 </configuration>
Verifying Hadoop Installation
Name Node Setup
 Set up the namenode using the command “hdfs namenode -format” as follows:
 $ cd ~
 $ hdfs namenode -format
 The expected result is as follows:
 10/24/14 21:30:55 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************ STARTUP_MSG: Starting
NameNode STARTUP_MSG: host = localhost/192.168.1.11 STARTUP_MSG:
args = [-format] STARTUP_MSG: version = 2.4.1 ... ... 10/24/14 21:30:56 INFO
common.Storage: Storage directory /home/hadoop/hadoopinfra/hdfs/namenode
has been successfully formatted. 10/24/14 21:30:56 INFO
namenode.NNStorageRetentionManager: Going to retain 1 images with txid >=
0 10/24/14 21:30:56 INFO util.ExitUtil: Exiting with status 0 10/24/14 21:30:56
INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************ SHUTDOWN_MSG:
Shutting down NameNode at localhost/192.168.1.11
************************************************************/
Verifying Hadoop dfs
 The following command is used to start dfs. Executing this
command will start your Hadoop file system.
 $ start-dfs.sh
 The expected output is as follows:
10/24/14 21:37:56 Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/hadoop/hadoop-2.4.1/logs/hadoop-hadoop-namenode-localhost.out
localhost: starting datanode, logging to /home/hadoop/hadoop-2.4.1/logs/hadoop-hadoop-datanode-localhost.out
Starting secondary namenodes [0.0.0.0]
Verifying Yarn Script
 The following command is used to start the yarn script. Executing
this command will start your yarn daemons.
 $ start-yarn.sh
 The expected output is as follows:
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.4.1/logs/yarn-hadoop-resourcemanager-localhost.out
localhost: starting nodemanager, logging to /home/hadoop/hadoop-2.4.1/logs/yarn-hadoop-nodemanager-localhost.out
Accessing Hadoop on Browser
 The default port number to access Hadoop is 50070. Use the following URL to get Hadoop services on the browser:
 http://localhost:50070/
Verify All Applications for Cluster
 The default port number to access all applications of the cluster is 8088. Use the following URL to visit this service:
 http://localhost:8088/
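 As a final check (not in the original slides), the JDK's jps tool lists the running Java daemons; a healthy pseudo-distributed setup shows something like the following (PIDs are illustrative and will differ):
 $ jps
 2891 NameNode
 3021 DataNode
 3270 SecondaryNameNode
 3451 ResourceManager
 3580 NodeManager
 3721 Jps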