Hadoop 2.0
Architecture
HELLO!
I am Ganesh
Balavengattaraman
Big Data
Huge amounts of data (terabytes or petabytes)
Big data is the term for a collection of data sets so large
and complex that it becomes difficult to process using
on-hand database management tools or traditional data
processing applications (MySQL, Oracle, etc.).
The challenges include capture, storage, search, sharing,
transfer, analysis, and visualization.
Hadoop Architecture
Hadoop
Apache Hadoop is a framework that allows the distributed
processing of large data sets across clusters of
commodity hardware using a simple programming model.
It is an open-source data management platform with scale-out
storage and distributed processing.
Hadoop Ecosystem
Hadoop Architecture 2.0
Hadoop Architecture
Block Split
A block split is the physical division of a data file, performed by HDFS
when the file is stored.
The default block size in Hadoop 2.0 is 128 MB.
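The block arithmetic above can be sketched in a few lines of Python. This is only an illustration of the size calculation, not HDFS code; the function name `split_into_blocks` is hypothetical, and the sketch assumes the last block holds only the leftover bytes.

```python
# Illustrative sketch only: how a file of a given size divides into blocks.
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the Hadoop 2.0 default stated above
MB = 1024 * 1024


def split_into_blocks(file_size_bytes, block_size=BLOCK_SIZE):
    """Return the list of block lengths (in bytes) for a file of this size."""
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    full_blocks, remainder = divmod(file_size_bytes, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # Assumption: the final block is only as large as the remaining data.
        sizes.append(remainder)
    return sizes


blocks = split_into_blocks(300 * MB)
print(len(blocks), [b // MB for b in blocks])  # 3 [128, 128, 44]
```

A 300 MB file therefore occupies three blocks: two full 128 MB blocks and one 44 MB block.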
Rack Awareness
HDFS stores block replicas on the cluster in a rack-aware
fashion, i.e. one replica on one rack and the other
two replicas on a different rack.
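The placement described above can be sketched as follows. This is a simplified, deterministic toy, not the actual HDFS placement policy: the function `place_replicas`, the topology dictionary, and the node names are all hypothetical, and a real cluster chooses nodes with more care (load, free space, randomness).

```python
# Toy sketch of rack-aware placement: one copy on the writer's rack,
# the other two copies on two different nodes of another rack.
def place_replicas(topology, writer_node):
    """topology maps rack name -> list of node names (hypothetical layout)."""
    local_rack = next(
        (rack for rack, nodes in topology.items() if writer_node in nodes), None
    )
    if local_rack is None:
        raise ValueError("writer node is not part of the topology")
    for rack in sorted(topology):
        nodes = topology[rack]
        if rack != local_rack and len(nodes) >= 2:
            return [writer_node, nodes[0], nodes[1]]
    raise ValueError("no remote rack with two nodes available")


topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
print(place_replicas(topology, "n1"))  # ['n1', 'n3', 'n4']
```

Spreading the copies over two racks means the data survives the loss of an entire rack, while keeping two copies on the same rack limits cross-rack traffic.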
Block Replication in HDFS
Replication provides redundancy and fault tolerance for the
stored data. The default replication factor is 3.
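To make the cost and the benefit of a replication factor of 3 concrete, here is a small sketch. The helper names `raw_storage_bytes` and `block_is_readable` and the node names are hypothetical; the code only models the idea that a block stays readable while at least one copy sits on a live node.

```python
# Illustrative sketch: storage cost and fault tolerance of replication.
def raw_storage_bytes(logical_bytes, replication=3):
    """Disk space consumed across the cluster for the given logical data size."""
    return logical_bytes * replication


def block_is_readable(replica_nodes, failed_nodes):
    """A block is readable while at least one replica is on a live node."""
    return any(node not in failed_nodes for node in replica_nodes)


replicas = ["n1", "n3", "n4"]
print(raw_storage_bytes(100))                          # 300
print(block_is_readable(replicas, {"n1", "n3"}))       # True
print(block_is_readable(replicas, {"n1", "n3", "n4"})) # False
```

With three copies, a block survives the failure of any two of the nodes holding it, at the price of three times the raw storage.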
Hadoop Architecture
RM - Resource Manager
1. It is the global resource scheduler.
2. It runs on the master node of the cluster.
3. It is responsible for negotiating the resources of
the system among the competing applications.
4. It keeps track of the heartbeats from the Node
Managers.
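The heartbeat tracking in point 4 can be illustrated with a minimal sketch. The class `HeartbeatTracker`, its methods, the node names, and the 10-second timeout are all hypothetical choices for the example; this is not YARN's implementation, and timestamps are passed in explicitly to keep the example deterministic.

```python
# Toy sketch of heartbeat tracking: remember when each node last reported
# and flag nodes that have been silent for longer than a timeout.
class HeartbeatTracker:
    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # node id -> time of the last heartbeat

    def record(self, node_id, now):
        """Store the time at which a node last sent a heartbeat."""
        self.last_seen[node_id] = now

    def lost_nodes(self, now):
        """Return the nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


tracker = HeartbeatTracker(timeout_seconds=10)
tracker.record("nm1", now=0)
tracker.record("nm2", now=0)
tracker.record("nm1", now=8)      # nm1 reports again, nm2 stays silent
print(tracker.lost_nodes(now=12))  # ['nm2']
```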
NM - Node Manager
1. The Node Manager communicates with the Resource
Manager.
2. It runs on the slave nodes of the cluster.
AM - Application Master
1. There is one AM per application, and it is
application-specific or framework-specific.
2. The AM runs in a container that is allocated by
the Resource Manager on request.
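The request-and-grant relationship between applications and the Resource Manager can be sketched as a toy model. The class `ToyResourceManager`, the method `request_container`, and the node names and memory sizes are hypothetical; real scheduling involves queues, priorities and more resource types than memory.

```python
# Toy sketch: applications ask for containers, and the resource manager
# grants them on whichever node still has enough free memory.
class ToyResourceManager:
    def __init__(self, node_memory_mb):
        self.free_mb = dict(node_memory_mb)  # node name -> free memory in MB
        self.containers = []                 # (app id, node, memory in MB)

    def request_container(self, app_id, memory_mb):
        """Grant a container on the first node with enough free memory."""
        for node in sorted(self.free_mb):
            if self.free_mb[node] >= memory_mb:
                self.free_mb[node] -= memory_mb
                self.containers.append((app_id, node, memory_mb))
                return node
        return None  # no node can satisfy the request right now


rm = ToyResourceManager({"nm1": 4096, "nm2": 2048})
print(rm.request_container("app-1", 3072))  # nm1
print(rm.request_container("app-2", 2048))  # nm2
print(rm.request_container("app-2", 2048))  # None
```

The third request is refused because neither node has 2048 MB left, which is the sense in which competing applications negotiate for a fixed pool of resources.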
THANKS!
Any questions?

