Hadoop HDFS/MapReduce
Architecture
Hardware
Installation and Configuration
Monitoring
Namenode
HDFS Architecture
Replication
Map Reduce
Hardware Requirements
● NameNode + JobTracker
– >= 2 cores
– >= 8 GB RAM
– >= 40 GB disk, RAID 10
● DataNode + TaskTracker
– >= 4 cores
– >= (1 (OS) + 1 (TT) + 1 (DN) + reducers + maps) GB RAM
– >= N GB disk space, JBOD (no RAID)
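The DataNode RAM rule above is a simple sum, so it can be sketched as a small shell function. This is only an illustration: it assumes roughly 1 GB per map or reduce slot, which is how the slide's sum reads, and the function name datanode_ram_gb is made up for this example.

```shell
#!/bin/sh
# Estimate the minimum RAM (in GB) for a DataNode + TaskTracker host.
# Assumption: 1 GB each for the OS, the TaskTracker and the DataNode,
# plus 1 GB per map slot and 1 GB per reduce slot.
# $1 = number of map slots, $2 = number of reduce slots
datanode_ram_gb() {
    maps=$1
    reducers=$2
    echo $((1 + 1 + 1 + maps + reducers))
}

# Example: 6 map slots and 2 reduce slots
datanode_ram_gb 6 2   # prints 11
```

If tasks run with larger heaps than 1 GB, the per-slot term would need to be scaled accordingly.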
Installation
● Download the tar file from the Hadoop site or use a prebuilt RPM
● https://github.com/gerritjvv/repo
● http://bigtop.apache.org/
Configuration
● $HADOOP_HOME/conf/core-site.xml
● $HADOOP_HOME/conf/mapred-site.xml
● $HADOOP_HOME/conf/hdfs-site.xml
● http://hadoop.apache.org/docs/stable/cluster_setup
Configuration Namenode
● Create directory for namenode metadata
– /data/hadoop/name
● Open core-site.xml
– Define fs.default.name = hdfs://<host>:8020
● Open hdfs-site.xml
– Define dfs.name.dir=/data/hadoop/name
– Define dfs.replication=3
– Create dir /data/hadoop/hdfs
– Define dfs.data.dir=/data/hadoop/hdfs
– Define dfs.http.address=localhost:50070
● Start the namenode with the format option
– /opt/hadoop/bin/hadoop namenode -format
– After the format start the namenode with service hadoop-namenode start
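As a rough sketch, the NameNode settings listed above could be written out like this. The files go to a staging directory so they can be reviewed before being copied into $HADOOP_HOME/conf. CONF_DIR, NN_HOST and the host name namenode.example.com are placeholders invented for this example. Note that the value of fs.default.name uses the hdfs:// scheme, since it names the HDFS filesystem URI rather than a web address.

```shell
#!/bin/sh
# Write minimal NameNode settings into a staging directory.
# CONF_DIR and NN_HOST are placeholders for this sketch.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
NN_HOST="${NN_HOST:-namenode.example.com}"
mkdir -p "$CONF_DIR"

# core-site.xml: where clients find the filesystem
cat > "$CONF_DIR/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://${NN_HOST}:8020</value>
  </property>
</configuration>
EOF

# hdfs-site.xml: metadata directory, replication, data directory, web UI
cat > "$CONF_DIR/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/data/hadoop/name</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/hadoop/hdfs</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>localhost:50070</value>
  </property>
</configuration>
EOF

echo "wrote $CONF_DIR/core-site.xml and $CONF_DIR/hdfs-site.xml"

# On the NameNode host itself (not run by this sketch):
#   mkdir -p /data/hadoop/name /data/hadoop/hdfs
#   /opt/hadoop/bin/hadoop namenode -format
#   service hadoop-namenode start
```

Formatting wipes any existing metadata in dfs.name.dir, so it should only be run once, on a new cluster.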
Configuration JobTracker
● Open /opt/hadoop/conf/mapred-site.xml
– Define the property
mapred.job.tracker=<host>:8021
– Create the directory /data/hadoop/mapred
– Define mapred.local.dir=/data/hadoop/mapred
● Start the JobTracker with service hadoop-jobtracker start
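The JobTracker steps above might look like the following when scripted. This is a sketch, not a tested deployment script: CONF_DIR, JT_HOST and jobtracker.example.com are placeholders, and the file is written to a staging directory rather than directly into /opt/hadoop/conf.

```shell
#!/bin/sh
# Write a minimal JobTracker mapred-site.xml into a staging directory.
# CONF_DIR and JT_HOST are placeholders for this sketch.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
JT_HOST="${JT_HOST:-jobtracker.example.com}"
mkdir -p "$CONF_DIR"

cat > "$CONF_DIR/mapred-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>${JT_HOST}:8021</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/data/hadoop/mapred</value>
  </property>
</configuration>
EOF

echo "wrote $CONF_DIR/mapred-site.xml"

# On the JobTracker host itself (not run by this sketch):
#   mkdir -p /data/hadoop/mapred
#   service hadoop-jobtracker start
```

Unlike fs.default.name, mapred.job.tracker is a plain host:port pair with no URI scheme.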
Configuration DataNode
● On each datanode create the directory
/data/hadoop/hdfs (one directory per disk)
● Open /opt/hadoop/conf/hdfs-site.xml
– Define dfs.http.address=<host>:50070
– Define dfs.data.dir=/data/hadoop/hdfs
● Start the datanodes with service hadoop-datanode start
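With one directory per disk, dfs.data.dir becomes a comma-separated list of directories. The helper below sketches how that value could be assembled. The mount points /data1, /data2 and /data3 and the function name data_dirs are hypothetical; the slide itself only shows the single path /data/hadoop/hdfs.

```shell
#!/bin/sh
# Build a comma-separated dfs.data.dir value, one directory per disk.
# Arguments are the mount points of the data disks (hypothetical here).
data_dirs() {
    result=""
    for mount in "$@"; do
        dir="$mount/hadoop/hdfs"
        if [ -z "$result" ]; then
            result="$dir"
        else
            result="$result,$dir"
        fi
    done
    echo "$result"
}

data_dirs /data1 /data2 /data3
# prints /data1/hadoop/hdfs,/data2/hadoop/hdfs,/data3/hadoop/hdfs
```

The printed string is what would go into the value of dfs.data.dir in hdfs-site.xml on that datanode.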
Configuration Mapreduce
● On each datanode create the directory /data/hadoop/mapred
● Open /opt/hadoop/conf/mapred-site.xml
– Define mapred.local.dir=/data/hadoop/mapred
– Define mapred.tasktracker.map.tasks.maximum=<Number of map tasks>
– Define mapred.tasktracker.reduce.tasks.maximum=<Number of reduce tasks>
● Start the TaskTrackers with service hadoop-tasktracker start
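The TaskTracker settings above can be rendered from a few parameters. The sketch below prints a mapred-site.xml to stdout. The function names property and tasktracker_site and the host jobtracker.example.com are invented for this example, and the mapred.job.tracker entry is carried over from the JobTracker section on the assumption that every TaskTracker needs to know where the JobTracker runs.

```shell
#!/bin/sh
# Emit one Hadoop <property> element.
# $1 = property name, $2 = property value
property() {
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Render a TaskTracker mapred-site.xml to stdout.
# $1 = JobTracker host, $2 = map slots, $3 = reduce slots
tasktracker_site() {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    property mapred.job.tracker "$1:8021"
    property mapred.local.dir /data/hadoop/mapred
    property mapred.tasktracker.map.tasks.maximum "$2"
    property mapred.tasktracker.reduce.tasks.maximum "$3"
    echo '</configuration>'
}

# Example: 6 map slots and 2 reduce slots per node
tasktracker_site jobtracker.example.com 6 2
```

Redirecting the output to a file and copying it to /opt/hadoop/conf/mapred-site.xml on each datanode would complete the step.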
Monitoring
● Web HTML scraping
– https://github.com/gerritjvv/hadoop-monitoring
● Ganglia
– http://ganglia.info/?p=88
● Cacti
– http://blog.cloudera.com/blog/2009/07/hadoop-graphing
Namenode Edits
● Writes/Updates/Deletes are written to RAM and to a write-ahead log.
● The metadata in RAM is only merged into a binary file during the secondary namenode checkpoint
● This file corrupts easily
● Recovery is a manual task
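Because recovery is manual, keeping dated copies of the NameNode metadata directory is one way to have something to recover from. The function below is only a sketch of that idea: the name backup_name_dir and the destination /backup/hadoop are hypothetical, and archiving a directory that a running NameNode is writing to may capture an inconsistent state, so it is safer to run it while the NameNode is stopped or against a copy.

```shell
#!/bin/sh
# Archive a NameNode metadata directory into a timestamped tarball.
# $1 = metadata directory (the dfs.name.dir value)
# $2 = backup destination directory
# Prints the path of the archive on success.
backup_name_dir() {
    name_dir=$1
    dest_dir=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest_dir/namenode-meta-$stamp.tar.gz"
    mkdir -p "$dest_dir" || return 1
    # -C keeps paths in the archive relative to the parent directory
    tar -czf "$archive" -C "$(dirname "$name_dir")" "$(basename "$name_dir")" || return 1
    echo "$archive"
}

# Example (the destination path is hypothetical):
#   backup_name_dir /data/hadoop/name /backup/hadoop
```

Restoring would mean unpacking an archive back into the dfs.name.dir location before starting the NameNode.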
HA
● YARN and Hadoop 2.0.0
● Experimental
● http://hadoop.apache.org/docs/current/hadoop-yarn
End

Hadoop Installation and basic configuration
