Hadoop_Introduction_pptx.pptx
• Introduction to Hadoop
• Hadoop nodes & daemons
• Hadoop Architecture
• Characteristics
• Hadoop Features
Hadoop
The technology that powers Yahoo, Facebook, Twitter, Walmart and others
An open-source framework that allows distributed processing of large data sets across a cluster of commodity hardware
Open Source
• Source code is freely available
• It may be redistributed and modified
Distributed Processing
• Data is distributed across multiple nodes (servers) and processed there
• Multiple machines process the data independently
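The idea of splitting data across nodes that each work on their own share can be sketched in a few lines. This is a minimal illustration, assuming worker threads stand in for cluster nodes; `split_into_partitions`, `process_partition` and `distributed_total` are hypothetical names, not Hadoop APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_nodes):
    """Deal records round-robin into one partition per simulated node."""
    partitions = [[] for _ in range(num_nodes)]
    for i, record in enumerate(records):
        partitions[i % num_nodes].append(record)
    return partitions


def process_partition(partition):
    """Work done by one node on its own share: total characters in its lines."""
    return sum(len(line) for line in partition)


def distributed_total(records, num_nodes=3):
    partitions = split_into_partitions(records, num_nodes)
    # Each partition goes to its own worker with no shared state, standing in
    # for nodes that process their share of the data independently.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partial_results = list(pool.map(process_partition, partitions))
    # Combine the independent partial results into one answer.
    return sum(partial_results)


lines = ["alpha", "beta", "gamma", "delta", "epsilon"]
print(distributed_total(lines))  # 26
```

Because no partition depends on another, adding workers changes only how many shares run at once, not the final result.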
An open source framework
that allows distributed
processing of large data-sets
across the Cluster of
commodity hardware
Cluster
• Multiple machines connected together
• Nodes are connected via a LAN
Commodity Hardware
• Economical, affordable machines
• Typically low-performance hardware
• Open-source framework written in Java
• Inspired by Google's MapReduce programming model and its file system (GFS)
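The MapReduce programming model that inspired Hadoop can be illustrated with the classic word count. This is a single-process sketch of the three phases (map, shuffle, reduce), assuming plain Python functions in place of Hadoop's Java `Mapper` and `Reducer` classes; the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    # Mapper: emit (word, 1) for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(mapped_pairs):
    # Shuffle: group all values that share the same key.
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reducer: combine the grouped values for one key.
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


print(word_count(["Hadoop stores data", "Hadoop processes data"]))
# {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

In a real cluster the map calls run on many nodes in parallel and the shuffle moves pairs over the network; here all three phases run in one process.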
Hadoop History
• 2002: Doug Cutting started working on Nutch
• 2003-2004: Google published the GFS and MapReduce papers
• 2005: Doug Cutting added DFS and MapReduce to Nutch
• 2006: Development of Hadoop started as a Lucene sub-project
• 2007: The New York Times converted 4 TB of image archives using 100 EC2 instances
• 2008: Hadoop became a top-level project and defeated a supercomputer
• 2008: Facebook launched Hive, SQL support for Hadoop
• 2009: Doug Cutting joined Cloudera
Hadoop consists of three key parts: HDFS (storage), MapReduce (processing) and YARN (resource management)
Nodes
• Master node: runs the NameNode and ResourceManager daemons
• Slave node: runs the DataNode and NodeManager daemons
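The storage side of this master/slave split can be modelled as a toy: the master keeps only metadata (which blocks live where), while the slaves hold the actual bytes. This sketch assumes hypothetical `NameNode` and `DataNode` classes and a made-up block id `blk_1`; it does not model the real HDFS protocol, and the ResourceManager/NodeManager pair (the same split for compute) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Toy slave-side storage daemon: holds block contents by block id."""
    name: str
    blocks: dict = field(default_factory=dict)


@dataclass
class NameNode:
    """Toy master-side daemon: keeps metadata only, never the data itself."""
    block_locations: dict = field(default_factory=dict)  # block id -> DataNode names

    def register_block(self, block_id, datanode_names):
        self.block_locations[block_id] = list(datanode_names)

    def locate(self, block_id):
        return self.block_locations[block_id]


dn1, dn2 = DataNode("dn1"), DataNode("dn2")
namenode = NameNode()

# In this toy, the bytes are stored on the DataNodes and the
# NameNode merely records where each copy lives.
for dn in (dn1, dn2):
    dn.blocks["blk_1"] = b"hello hadoop"
namenode.register_block("blk_1", [dn1.name, dn2.name])

print(namenode.locate("blk_1"))  # ['dn1', 'dn2']
```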
[Figure: Work divided into many Sub Work units]
Open Source
• Source code is freely available
• It can be redistributed
• It can be modified
Benefits: free, affordable, community, transparent, interoperable, no vendor lock-in
Distributed Processing
• Data is processed in a distributed manner across the cluster
• Multiple nodes in the cluster process data independently
[Figure: centralized processing vs. distributed processing]
Fault Tolerance
• Node failures are recovered from automatically
• The framework handles hardware failures as well as task failures
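Automatic recovery from a failed task can be sketched as "retry the same work on another node". This is a simplified illustration, assuming a hypothetical `run_with_retries` helper, an assumed limit of `max_attempts=4`, and a made-up `count_records` task that fails on a node marked as down; it is not Hadoop's scheduler code.

```python
def run_with_retries(task, nodes, max_attempts=4):
    """Run `task` on successive nodes until one attempt succeeds."""
    failures = []
    for attempt, node in enumerate(nodes[:max_attempts], start=1):
        try:
            return node, task(node)
        except RuntimeError as err:
            # Record the failure and move the task to the next node.
            failures.append((attempt, node, str(err)))
    raise RuntimeError(f"task failed after {len(failures)} attempts: {failures}")


failed_nodes = {"node1"}


def count_records(node):
    # Hypothetical task: fails on a dead node, otherwise returns a count.
    if node in failed_nodes:
        raise RuntimeError(f"{node} is down")
    return 1000


print(run_with_retries(count_records, ["node1", "node2", "node3"]))
# ('node2', 1000)
```

The user's task code contains no recovery logic; rescheduling is entirely the framework's job, which is the point of the slide.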
Reliability
• Data is stored reliably on the cluster despite machine failures
• Failure of a node does not cause data loss
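Reliable storage in HDFS comes from keeping several copies (replicas) of each block on different nodes; the default replication factor is 3. The sketch below uses a simple round-robin placement, which is an assumption and much simpler than HDFS's rack-aware policy; `place_replicas` and `lost_blocks` are hypothetical helpers.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Place each block on `replication` distinct nodes, rotating the start node."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block_id in enumerate(block_ids):
        # Rotating the starting node spreads replicas across the cluster.
        placement[block_id] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def lost_blocks(placement, failed_nodes):
    """Blocks whose every replica sits on a failed node."""
    return [b for b, holders in placement.items()
            if all(h in failed_nodes for h in holders)]


nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_replicas(["blk_1", "blk_2", "blk_3"], nodes)
print(placement["blk_1"])                      # ['dn1', 'dn2', 'dn3']
print(lost_blocks(placement, {"dn1", "dn2"}))  # []
```

With three distinct holders per block, any two node failures still leave at least one copy of every block.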
High Availability
• Data remains available and accessible despite hardware failures
• End-user applications experience no downtime due to unavailable data
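Availability follows from replication: a reader that finds one replica's node down simply reads another copy. A minimal sketch, assuming a hypothetical `read_block` helper and in-memory dictionaries for block placement and node storage:

```python
def read_block(block_id, placement, live_nodes, storage):
    """Return (node, data) from the first replica whose node is still up."""
    for node in placement[block_id]:
        if node in live_nodes:
            return node, storage[node][block_id]
    raise IOError(f"no live replica for {block_id}")


placement = {"blk_1": ["dn1", "dn2", "dn3"]}
storage = {n: {"blk_1": b"payload"} for n in ("dn1", "dn2", "dn3")}

# dn1 has failed, but the read is served transparently by dn2.
print(read_block("blk_1", placement, {"dn2", "dn3"}, storage))
# ('dn2', b'payload')
```

The caller sees the same bytes whichever replica answers, which is why a single failed node does not interrupt the application.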
Scalability
• Vertical scalability: new hardware (e.g. CPU, memory, disks) can be added to existing nodes
• Horizontal scalability: new nodes can be added on the fly
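The effect of both kinds of scaling can be estimated with simple arithmetic. The idealized sketch below assumes every node offers a fixed number of processing slots and each slot handles one block per round ("wave"). It ignores network cost, data skew and stragglers, and `processing_waves` is a hypothetical helper.

```python
import math


def processing_waves(num_blocks, num_nodes, slots_per_node=2):
    """Idealized number of rounds if every slot processes one block per round."""
    total_slots = num_nodes * slots_per_node
    return math.ceil(num_blocks / total_slots)


# Horizontal scaling: more nodes, fewer rounds for the same 800 blocks.
for nodes in (10, 20, 40):
    print(nodes, processing_waves(800, nodes))
# 10 40
# 20 20
# 40 10

# Vertical scaling: same 10 nodes, but each one upgraded to 4 slots.
print(processing_waves(800, 10, slots_per_node=4))  # 20
```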
Economic
• No need to purchase a costly license
• No need to purchase costly hardware
Open Source + Commodity Hardware = Economic
Easy to Use
• The framework handles the challenges of distributed computing
• Clients only need to concentrate on business logic
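The division of labour on this slide can be shown with a hypothetical mini-framework: `run_job` hides splitting, parallel execution and grouping, so the user writes only two small business-logic functions (here, maximum temperature per year). `run_job`, the mapper/reducer names and the CSV record format are all assumptions for illustration, not Hadoop's API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_job(records, mapper, reducer, num_workers=4):
    """Hypothetical framework: splitting, parallelism and grouping live here."""
    chunks = [records[i::num_workers] for i in range(num_workers)]

    def map_chunk(chunk):
        return [pair for record in chunk for pair in mapper(record)]

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        mapped = [pair for pairs in pool.map(map_chunk, chunks) for pair in pairs]

    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)
    return {key: reducer(key, values) for key, values in sorted(grouped.items())}


# --- Business logic: the only part the user writes ---
def max_temp_mapper(record):
    year, temp = record.split(",")
    yield year, int(temp)


def max_temp_reducer(year, temps):
    return max(temps)


readings = ["2020,31", "2021,35", "2020,38", "2021,29"]
print(run_job(readings, max_temp_mapper, max_temp_reducer))
# {'2020': 38, '2021': 35}
```

Changing the job (say, to average temperature) means editing only the two user functions; the framework code stays untouched.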
Data Locality
• Move computation to the data instead of moving data to the computation
• Data is processed on the nodes where it is stored
[Figure: separate Storage Servers and App Servers, where Data travels to the Algorithm, vs. Servers that hold the Data and run the Algo locally]
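"Move computation to data" boils down to a scheduling preference: run each task on a node that already stores the block it needs, and move data only when no such node is free. The greedy sketch below assumes one task per free node and a hypothetical `assign_tasks` helper; real Hadoop schedulers are more elaborate (for example, they also consider rack locality).

```python
def assign_tasks(block_locations, free_nodes):
    """Assign one task per block, preferring a free node that stores the block."""
    assignments = {}
    available = list(free_nodes)
    for block_id, holders in block_locations.items():
        if not available:
            raise RuntimeError("more blocks than free nodes")
        local = [n for n in holders if n in available]
        # Data-local when possible; otherwise any free node (data must move).
        node = local[0] if local else available[0]
        assignments[block_id] = (node, "local" if local else "remote")
        available.remove(node)
    return assignments


block_locations = {
    "blk_1": ["dn1", "dn2"],
    "blk_2": ["dn2", "dn3"],
    "blk_3": ["dn1", "dn3"],
}
print(assign_tasks(block_locations, ["dn1", "dn2", "dn4"]))
# {'blk_1': ('dn1', 'local'), 'blk_2': ('dn2', 'local'), 'blk_3': ('dn4', 'remote')}
```

Only `blk_3` has to cross the network, because neither of its holders had a free slot.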
• Every day we generate 2.5 quintillion bytes of data
• Hadoop handles huge volumes of data efficiently
• Hadoop uses the power of distributed computing
• HDFS and YARN are the two main components of Hadoop
• It is highly fault tolerant, reliable and available