HADOOP PRESENTATION
Installation and Command Usage
Hadoop
 Hadoop is a framework for running applications on large clusters built of commodity hardware.
 The Hadoop framework transparently provides applications with both reliability and data motion.
 Hadoop implements a computational paradigm named Map/Reduce, in which the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster.
 It provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both Map/Reduce and the distributed file system are designed so that node failures are handled automatically by the framework.
HDFS (Hadoop Distributed File System)
 The Hadoop Distributed File System is designed to reliably store very large files across machines in a large cluster.
 It is inspired by the Google File System. HDFS stores each file as a sequence of blocks; all blocks in a file except the last block are the same size.
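For example, with HDFS's default 128 MB block size in Hadoop 2.x, a 300 MB file is stored as two full 128 MB blocks followed by a final 44 MB block, and each block is replicated across DataNodes according to the dfs.replication setting.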
Map Reduce
A MapReduce job usually splits the input data set into independent chunks, which are processed by the map tasks in a completely parallel manner.
The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file system.
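For example, running word count over the line "to be or not to be": each map task emits a (word, 1) pair per word, the framework groups the pairs by key during the shuffle and sort, and the reduce tasks sum the counts for each key, producing (be, 2), (not, 1), (or, 1), (to, 2).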
Hadoop Requirements
 Download Hadoop 2.8.0 (Link: http://www-eu.apache.org/dist/hadoop/common/hadoop-2.8.0/hadoop-2.8.0.tar.gz)
 Download Java JDK 1.8.0 (Link: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html)
Installation Steps
 Check whether Java 1.8.0 is already installed on your system; use "javac -version" to check.
 If Java is not installed on your system, first install Java under "C:\Java".
 Extract Hadoop-2.8.0.tar.gz (or Hadoop-2.8.0.zip) and place it under "C:\Hadoop-2.8.0".
 Set the HADOOP_HOME environment variable on Windows (Variable Name: HADOOP_HOME, Variable Value: C:\Hadoop-2.8.0\bin) and click OK.
 Set the JAVA_HOME environment variable on Windows (Variable Name: JAVA_HOME, Variable Value: C:\Java\bin) and click OK. (A command-line alternative is sketched after this list.)
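As a minimal sketch of the same step from a command prompt (assuming you prefer setx to the Environment Variables dialog; setx writes user-level variables that take effect in newly opened command windows):
setx HADOOP_HOME "C:\Hadoop-2.8.0\bin"
setx JAVA_HOME "C:\Java\bin"
Reopen cmd afterwards so the new values are picked up.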
Configuration
 Edit the file C:/Hadoop-2.8.0/etc/hadoop/core-site.xml, paste the XML below into it, and save the file.
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
 Rename "mapred-site.xml.template" to "mapred-site.xml", edit the file C:/Hadoop-2.8.0/etc/hadoop/mapred-site.xml, paste the XML below into it, and save the file.
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
 Create folder "data" under "C:Hadoop-2.8.0"
 Create folder "datanode" under "C:Hadoop-2.8.0data"
 Create folder "namenode" under "C:Hadoop-2.8.0data"
 Edit the file C:/Hadoop-2.8.0/etc/hadoop/hdfs-site.xml, paste the XML below into it, and save the file.
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>C:\hadoop-2.8.0\data\namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>C:\hadoop-2.8.0\data\datanode</value>
  </property>
</configuration>
 Edit the file C:/Hadoop-2.8.0/etc/hadoop/yarn-site.xml, paste the XML below into it, and save the file.
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
 Edit the file C:/Hadoop-2.8.0/etc/hadoop/hadoop-env.cmd: replace the line set "JAVA_HOME=%JAVA_HOME%" with set "JAVA_HOME=C:\Java" (where C:\Java is the path to your JDK 1.8.0 installation).
Testing
 Open cmd, change directory to "C:\Hadoop-2.8.0\sbin", and type "start-all.cmd" to start the Hadoop daemons.
 Make sure these applications are running (one way to check is sketched after this list):
 Hadoop Namenode
 Hadoop Datanode
 YARN Resource Manager
 YARN Node Manager
 Open: http://localhost:8088 (YARN Resource Manager web UI)
 Open: http://localhost:50070 (HDFS NameNode web UI)
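A minimal sketch of the check mentioned above (assuming the JDK's jps tool is on the PATH):
jps
The output should include NameNode, DataNode, ResourceManager, and NodeManager entries.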
Run wordcount Using MapReduce Example
 Download MapReduceClient.jar
(Link: https://github.com/MuhammadBilalYar/HADOOP-INSTALLATION-ON-WINDOW-10/blob/master/MapReduceClient.jar)
 Download input_file.txt
(Link: https://github.com/MuhammadBilalYar/HADOOP-INSTALLATION-ON-WINDOW-10/blob/master/input_file.txt)
 Open cmd in Administrator mode, move to "C:/Hadoop-2.8.0/sbin", and start the cluster.
 start-all.cmd
 Create an input directory in HDFS.
 hadoop fs -mkdir /input_dir
 Copy the input text file named input_file.txt into the HDFS input directory (input_dir).
 hadoop fs -put C:/input_file.txt /input_dir
 Verify that input_file.txt is available in the HDFS input directory (input_dir).
 hadoop fs -ls /input_dir/
 Verify the content of the copied file.
 hadoop fs -cat /input_dir/input_file.txt
 Run MapReduceClient.jar, providing the input and output directories.
 hadoop jar C:/MapReduceClient.jar wordcount /input_dir /output_dir
 Verify the content of the generated output file (see the sketch below).
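A minimal sketch of that last check (assuming the job writes the usual part-r-00000 file into /output_dir):
hadoop fs -cat /output_dir/part-r-00000
This prints each word from input_file.txt together with its count; hadoop fs -ls /output_dir lists the generated files if the name differs.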
File System Commands
 Starting HDFS
 Initially you have to format the configured HDFS file system. Open the namenode (HDFS server) and execute the following command.
 hadoop namenode -format
 After formatting HDFS, start the distributed file system. The following command will start the namenode as well as the datanodes as a cluster.
 start-dfs.sh
 Listing Files in HDFS
 bin/hadoop fs -ls <args>
 Inserting Data into HDFS
 /bin/hadoop fs -mkdir /user/input (You have to create an input directory.)
 /bin/hadoop fs -put /home/file.txt /user/input (Transfer and store a data file from the local system to HDFS.)
 /bin/hadoop fs -ls /user/input (You can verify the file using this command.)
 Retrieving Data from HDFS
 /bin/hadoop fs -cat /user/output/outfile (View the data from HDFS using the cat command.)
 /bin/hadoop fs -get /user/output/ /home/hadoop_tp/ (Get the file from HDFS to the local file system using the get command.)
 stop-dfs.sh (Shut down HDFS.)
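One more sketch that is often needed when re-running jobs (assuming the same /user/output path as above): MapReduce will not start if the output directory already exists, so remove it first.
/bin/hadoop fs -rm -r /user/output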