Gandhinagar Institute of Technology
SUBJECT – DMBI (2170715)
Hadoop MapReduce Paradigm
Prepared By-
Tarj Mehta (170120107074)
Guided By – Prof. Nisha Khurana
Have you ever wondered how Google runs queries over its
mountain of data?
How Facebook is able to process such large quantities of
information so quickly?
The problem…
• In the early days, companies paid database vendors to house their
data.
• This approach worked well for small to medium amounts of data.
• In the early 2000s, Google ran into a problem.
• Fitting its data required paying large sums to database vendors such
as Oracle, IBM and Microsoft, so data processing was becoming
expensive.
The solution…
• To address the problem, a team at Google developed an algorithm.
• The algorithm allows a computation over a large data set to be
chopped into smaller chunks (tuples of data).
• The small chunks are mapped to many computers, which work on them
independently.
• The partial results are then brought back together (reduced) to
produce the resulting data set.
• This algorithm is called MapReduce.
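The chop, map and bring-back-together idea above can be sketched in a few lines of Python. This is only an illustration on a single machine, not Google's implementation: the function names (split_into_chunks, map_chunk, combine) and the choice of summing numbers as the calculation are assumptions made for the example.

```python
from functools import reduce

def split_into_chunks(records, chunk_size):
    """Chop a large list of records into smaller chunks."""
    return [records[i:i + chunk_size]
            for i in range(0, len(records), chunk_size)]

def map_chunk(chunk):
    """Work done independently on one chunk (here: a partial sum)."""
    return sum(chunk)

def combine(partials):
    """Bring the partial results back together into one result."""
    return reduce(lambda a, b: a + b, partials, 0)

# Hypothetical data set: the numbers 1 to 100.
records = list(range(1, 101))

chunks = split_into_chunks(records, 25)          # 4 chunks of 25 records
partials = [map_chunk(c) for c in chunks]        # one partial result per chunk
total = combine(partials)                        # same answer as sum(records)

print(partials, total)
```

Each call to map_chunk needs only its own chunk, which is what makes it possible to hand the chunks to different computers.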
Hadoop
• The MapReduce algorithm was later used to develop an open-source
project called Hadoop.
• Hadoop allows different applications to run using the MapReduce
algorithm.
• Put simply, data is processed in parallel rather than serially.
• Hadoop is written in Java, and its MapReduce jobs are typically
written in Java as well.
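To make "in parallel and not in serial" concrete, here is a small Python sketch. It is an analogy rather than Hadoop code: worker threads from the standard library stand in for the machines of a cluster, and the function count_words and the sample chunks are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def count_words(chunk):
    """Work for one chunk: count its words."""
    return len(chunk.split())

# Hypothetical chunks of a larger input.
chunks = [
    "big data needs parallel processing",
    "hadoop runs map and reduce tasks",
    "each chunk is handled independently",
]

# Serial: one chunk after another.
serial_counts = [count_words(c) for c in chunks]

# Parallel: the chunks are handed to a pool of workers at the same time.
# pool.map returns results in the same order as the inputs.
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel_counts = list(pool.map(count_words, chunks))

print(serial_counts, parallel_counts)
```

Both approaches give the same counts. The difference is only in how the work is scheduled, which is why independent chunks can be spread over many workers.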
Why Hadoop?
• In organizations with more than 10 terabytes of data, computationally
complex work such as statistical simulation takes a long time.
• Hadoop plays a central role in statistical analysis, ETL (Extract,
Transform, Load) processing and business applications.
The Algorithm
• MapReduce is based on “sending the computation to where the data
resides”.
• It has 2 stages:
1. Map stage: the mapper processes input data (a file or directory)
stored in the Hadoop Distributed File System (HDFS). The input file is
passed to the mapper line by line, and the mapper turns it into several
smaller chunks of data.
2. Reduce stage: the reducer processes the data that comes from the
mapper. After processing, it produces a new set of output, which is
stored in HDFS.
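The two stages can be illustrated with the classic word-count example, sketched here in plain Python rather than with the Hadoop API. The names mapper, shuffle and reducer and the sample lines are assumptions for the illustration. The shuffle step, which groups the mapper output by key before reducing, is not described in the slides; it is included because the reducer needs all values for a key together.

```python
from collections import defaultdict

def mapper(line):
    """Map stage: turn one input line into (word, 1) pairs."""
    for word in line.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Group the mapper output by key so each reducer sees one word."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    """Reduce stage: combine all values for one key into a single result."""
    return (key, sum(values))

# Hypothetical input file, passed to the mapper line by line.
lines = [
    "Hadoop stores data in HDFS",
    "MapReduce processes data in parallel",
    "HDFS stores data in blocks",
]

intermediate = [pair for line in lines for pair in mapper(line)]
grouped = shuffle(intermediate)
word_counts = dict(reducer(key, values) for key, values in grouped.items())

print(word_counts)
```

In a real job the input lines would come from HDFS and the final counts would be written back to HDFS. Here both are ordinary Python objects in memory.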
• During a MapReduce job, Hadoop sends the map and reduce tasks to the
appropriate servers in the cluster.
• The framework manages all the details of data passing, such as issuing
tasks, verifying task completion and copying data around the cluster.
• Most of the computation takes place on the nodes that hold the data,
which reduces network traffic.
• After the tasks complete, the cluster collects and reduces the data to
form the final result and sends it back to the Hadoop server.
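The idea of running tasks on the nodes that already hold the data can be sketched as a toy scheduler. This is a simplified illustration and not how Hadoop's scheduler is implemented. The function assign_map_tasks, the block and node names, and the one-slot-per-node setup are all hypothetical.

```python
def assign_map_tasks(block_locations, free_slots):
    """Assign each data block to a node, preferring a node that stores it.

    block_locations: block id -> list of nodes holding a copy of the block
    free_slots:      node -> number of tasks the node can still accept
    Returns block id -> (chosen node, True if the task is data-local).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignments = {}
    for block, replica_nodes in block_locations.items():
        local_nodes = [n for n in replica_nodes if slots.get(n, 0) > 0]
        if local_nodes:
            node, data_local = local_nodes[0], True
        else:
            # No free node holds the block: fall back to any free node,
            # which means the block must travel over the network.
            remote_nodes = [n for n, free in slots.items() if free > 0]
            if not remote_nodes:
                raise RuntimeError("no free task slots left in the cluster")
            node, data_local = remote_nodes[0], False
        slots[node] -= 1
        assignments[block] = (node, data_local)
    return assignments

# Hypothetical cluster: three nodes with one free slot each.
block_locations = {
    "block-1": ["node-a", "node-b"],
    "block-2": ["node-a", "node-c"],
    "block-3": ["node-c"],
}
free_slots = {"node-a": 1, "node-b": 1, "node-c": 1}

assignments = assign_map_tasks(block_locations, free_slots)
for block, (node, data_local) in assignments.items():
    print(block, node, "local" if data_local else "remote")
```

In this run two of the three tasks land on a node that already stores their block, and only the third has to read its data over the network.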
References
• https://www.youtube.com/watch?v=9s-vSeWej1U
• https://www.tutorialspoint.com/hadoop/hadoop_mapreduce.htm
Thank YOU.
