Hadoop Architecture
Dr. C.V. Suresh Babu
(CentreforKnowledgeTransfer)
institute
Discussion Topics
• Introduction
• Components of Hadoop
• MapReduce
• Map Task
• Reduce Task
• Anatomy of a MapReduce Job
Introduction
• Hadoop is a framework written in Java that uses a large cluster of commodity hardware to store and process very large volumes of data.
• Hadoop works on the MapReduce programming model, which was introduced by Google.
• Today many big-brand companies, e.g. Facebook, Yahoo, Netflix, and eBay, use Hadoop in their organizations to deal with big data.
Components of Hadoop
The Hadoop architecture mainly consists of 4 components:
• MapReduce
• HDFS (Hadoop Distributed File System)
• YARN (Yet Another Resource Negotiator)
• Common utilities (Hadoop Common)
A Hadoop cluster consists of a single master node and multiple slave nodes. The master node includes the JobTracker, TaskTracker, NameNode, and DataNode, whereas each slave node includes a DataNode and a TaskTracker.
MapReduce
• MapReduce is essentially an algorithm-like programming model that runs on top of the YARN framework.
• Its major feature is distributed, parallel processing across a Hadoop cluster, which is what makes Hadoop so fast.
• When you are dealing with Big Data, serial processing is no longer practical.
• MapReduce mainly has 2 tasks, which are divided phase-wise (a word-count sketch of both follows this list):
 Map Task
 Reduce Task
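To make the two phases concrete, here is a minimal word-count sketch written against Hadoop's standard org.apache.hadoop.mapreduce Java API. The class names WordCountMapper and WordCountReducer are illustrative; only the Mapper and Reducer base classes and the Writable types come from Hadoop.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map Task: split each input line into words and emit a (word, 1) pair per word.
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);   // intermediate key-value pair
        }
    }
}

// Reduce Task: after shuffle and sort, sum the counts received for each word.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        total.set(sum);
        context.write(word, total);     // final (word, count) pair
    }
}

The mapper emits (word, 1) for every word it sees; the framework groups those pairs by key, so the reducer only has to sum them.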
Map Task
Here we can see that the input is provided to the Map() function; its output is then used as the input to the Reduce() function, and after that we receive the final output. Map is applied in the first phase and Reduce in the second.
Map()
• An input is provided to Map(); since we are working with Big Data, that input is a set of data blocks.
• The Map() function breaks these data blocks into tuples, which are nothing but key-value pairs.
• These key-value pairs are then sent as input to Reduce().
Reduce()
• The Reduce() function combines these tuples (key-value pairs) based on their key, forms a new set of tuples, and performs operations such as sorting or summation, which are then sent to the final output node.
• Finally, the output is obtained.
Note: The data processing done in the Reducer always depends on the business requirement of the industry concerned. This is how Map() and then Reduce() are applied one after the other (a sketch of the driver that wires the two together follows).
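Continuing the word-count sketch above, the following driver shows how the Map() and Reduce() functions are wired into one job. WordCountDriver is an illustrative name (assumed to live in the same package as the mapper and reducer sketched earlier), and the input and output paths are placeholders taken from the command line.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);

        job.setMapperClass(WordCountMapper.class);     // Map phase
        job.setCombinerClass(WordCountReducer.class);  // optional local reducer
        job.setReducerClass(WordCountReducer.class);   // Reduce phase

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));    // input in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output in HDFS

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}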
Map Task
• RecordReader: The purpose of the RecordReader is to break the input into records. It is responsible for providing the key-value pairs to the Map() function: the key is the record's location information and the value is the data associated with it.
• Map: A map is simply a user-defined function whose job is to process the tuples obtained from the RecordReader. The Map() function may generate no key-value pairs at all, or it may generate many such pairs.
• Combiner: The combiner is used for grouping data in the Map workflow. It is similar to a local reducer: the intermediate key-value pairs generated by the Map are combined with its help. Using a combiner is optional.
• Partitioner: The partitioner is responsible for fetching the key-value pairs generated in the mapper phase and generating the shards corresponding to each reducer. It fetches the hash code of each key and takes its modulus with the number of reducers (key.hashCode() % numberOfReducers), as sketched after this list.
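The modulus rule above is essentially what Hadoop's default HashPartitioner does. A minimal sketch follows; the class name SketchHashPartitioner is illustrative, and the bitmask simply keeps the result non-negative when hashCode() is negative.

import org.apache.hadoop.mapreduce.Partitioner;

// Route each intermediate key to a reducer by hash code modulo the reducer count.
class SketchHashPartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}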
Reduce Task
• Shuffle and Sort: The reducer's task starts with this step. The process in which the mapper's intermediate key-value pairs are transferred to the reducer task is known as shuffling. During shuffling the system can sort the data by key. Shuffling begins as soon as some of the map tasks are done, rather than waiting for all of them to finish, which makes the overall process faster.
• Reduce: The main task of the reduce step is to gather the tuples generated by the map step and perform sorting- and aggregation-type operations on those key-value pairs, grouped by key.
• OutputFormat: Once all the operations are performed, the key-value pairs are written to the output file by a RecordWriter, each record on a new line, with the key and value separated, by default, by a tab character (illustrative output follows this list).
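For the word-count sketch above, the final output written by the RecordWriter would look something like the following. The file name part-r-00000 follows Hadoop's naming convention; the path, words, and counts are made up for illustration, and each key and value are separated by the default tab.

/user/example/output/part-r-00000:
hadoop	3
map	5
reduce	4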
Anatomy of a MapReduce Job