Map Reduce introduction (google white papers)
 A simple programming model
 Functional model
 For large-scale data processing
 Exploits a large set of commodity computers
 Executes processing in a distributed manner
 Offers high availability
 Heavy demand for very-large-scale data processing
 These demands share certain common themes
 Lots of machines needed (scaling)
 Two basic operations on the input
▪ Map
▪ Reduce
 Map:
 Accepts an input key/value pair
 Emits intermediate key/value pairs
 Reduce:
 Accepts an intermediate key and the list of values for that key (key/value*)
 Emits output key/value pairs
[Figure: very big data → MAP → partitioning function → REDUCE → result]
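
The Map and Reduce signatures above can be illustrated with a word count. This is a minimal single-process sketch, not the distributed library: the names `map_fn`, `reduce_fn`, and `run` are made up for illustration, and the grouping step is done with an in-memory dictionary rather than intermediate files.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: accept one input key/value pair, emit intermediate pairs.

    Here the key is a document name (unused) and the value is its text.
    """
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """Reduce: accept one intermediate key plus all of its values."""
    yield key, sum(values)


def run(inputs):
    # Map phase: apply map_fn to every input pair and group values by key.
    intermediate = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)

    # Reduce phase: one reduce_fn call per distinct intermediate key.
    result = {}
    for key in sorted(intermediate):
        for out_key, out_value in reduce_fn(key, intermediate[key]):
            result[out_key] = out_value
    return result


counts = run([("doc1", "the quick fox"), ("doc2", "the lazy dog the end")])
print(counts)
```

The word "the" appears once in the first document and twice in the second, so the reduce call for that key receives three values.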
[Figure: very big data is split into four chunks; grep runs on each split and produces matches; cat concatenates them into all matches]
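
The grep pipeline in the figure can be sketched in a few lines. This is an illustrative single-machine version: `split_data`, `grep`, and `cat` are hypothetical helper names that mirror the boxes in the figure, and the log lines are invented sample data. In a real cluster each `grep` call would run on a different machine.

```python
import re


def split_data(lines, num_splits):
    """Cut the input into roughly equal contiguous chunks."""
    size = max(1, -(-len(lines) // num_splits))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def grep(chunk, pattern):
    """Map step: keep only the lines of one split that match the pattern."""
    regex = re.compile(pattern)
    return [line for line in chunk if regex.search(line)]


def cat(match_lists):
    """Reduce step: concatenate the per-split matches, preserving order."""
    return [line for matches in match_lists for line in matches]


lines = [
    "error: disk full",
    "ok",
    "warning: slow",
    "error: timeout",
    "ok",
    "ok",
    "error: retry",
]
splits = split_data(lines, 4)
all_matches = cat(grep(chunk, r"^error") for chunk in splits)
print(all_matches)
```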
 Map
 Processes a key/value pair to generate intermediate key/value pairs
 Reduce
 Merges all intermediate values associated with the same key
 Partition
 By default: hash(key) mod R
 Well balanced
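
The default partitioning rule, hash(key) mod R, can be sketched as follows. One assumption to note: `zlib.crc32` stands in for the hash function here, because Python's built-in `hash()` on strings is salted per process and would send the same key to different reducers on different machines. The key names are invented sample data.

```python
import zlib
from collections import Counter


def partition(key: str, num_reducers: int) -> int:
    """Return the index of the reduce task responsible for this key."""
    # crc32 is deterministic across processes, unlike the built-in hash().
    return zlib.crc32(key.encode("utf-8")) % num_reducers


R = 4
keys = [f"url-{i}" for i in range(1000)]
load = Counter(partition(k, R) for k in keys)

# Print how many keys landed on each of the R reduce tasks.
for reducer in range(R):
    print(reducer, load[reducer])
```

Because every occurrence of a key maps to the same index, all values for that key end up at a single reduce task, which is what lets reduce merge them.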
 No reduce task can begin until all map tasks are complete
 The master must communicate the locations of intermediate files to the reduce workers
 Tasks are scheduled based on the location of the data
 If a map worker fails at any time before the reduce finishes, its task must be completely rerun
 The MapReduce library does most of the hard work for us!
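
Two of the points above, the barrier between map and reduce and the complete rerun of a failed map task, can be sketched in a toy scheduler. Everything here is hypothetical: `run_map_tasks`, the `fail_first` set used to simulate a worker failure, and the retry limit are illustrative and not part of any real MapReduce library.

```python
def run_map_tasks(splits, map_fn, max_attempts=3, fail_first=frozenset()):
    """Run every map task to completion and return only when all are done.

    Returning only after the loop finishes models the barrier: reduce
    cannot start until this function has returned.
    """
    outputs = {}
    attempts = {}
    for task_id, split in enumerate(splits):
        for attempt in range(1, max_attempts + 1):
            attempts[task_id] = attempt
            try:
                if task_id in fail_first and attempt == 1:
                    raise RuntimeError("simulated worker failure")
                outputs[task_id] = list(map_fn(split))
                break
            except RuntimeError:
                # Discard any partial output; the task is rerun from scratch.
                outputs.pop(task_id, None)
        else:
            raise RuntimeError(f"map task {task_id} failed {max_attempts} times")
    return outputs, attempts


def word_pairs(split):
    """Emit (word, 1) for every word in the split."""
    return ((word, 1) for word in split.split())


outputs, attempts = run_map_tasks(["a b", "c"], word_pairs, fail_first={1})
print(attempts)
```

Task 1 fails on its first attempt and succeeds on the second, so its output comes entirely from the rerun.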
 User to-do list:
 Indicate:
▪ Input/output files
▪ M: number of map tasks
▪ R: number of reduce tasks
▪ W: number of machines
 Write map and reduce functions
 Submit the job
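
The to-do list above maps naturally onto a small job description. The sketch below is hypothetical: `JobSpec`, `submit`, the field names, and the file paths are invented for illustration and do not correspond to the API of Google's library or of Hadoop. `submit` only validates the specification and returns a summary string.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

KV = Tuple[str, object]


@dataclass
class JobSpec:
    input_files: List[str]
    output_dir: str
    num_map_tasks: int      # M
    num_reduce_tasks: int   # R
    num_machines: int       # W
    map_fn: Callable[[str, str], Iterable[KV]]
    reduce_fn: Callable[[str, List[object]], Iterable[KV]]

    def validate(self) -> None:
        if not self.input_files:
            raise ValueError("at least one input file is required")
        for name in ("num_map_tasks", "num_reduce_tasks", "num_machines"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be >= 1")


def submit(job: JobSpec) -> str:
    """Stand-in for handing the job to a runtime: validate and summarize."""
    job.validate()
    return f"M={job.num_map_tasks} R={job.num_reduce_tasks} W={job.num_machines}"


job = JobSpec(
    input_files=["input/part-0.txt", "input/part-1.txt"],  # hypothetical paths
    output_dir="output/",
    num_map_tasks=200,
    num_reduce_tasks=10,
    num_machines=20,
    map_fn=lambda key, value: ((word, 1) for word in value.split()),
    reduce_fn=lambda key, values: [(key, sum(values))],
)
summary = submit(job)
print(summary)
```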
 String matching, such as grep
 Reverse index
 Count URL access frequency
 Many more examples in data mining
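
Two of the examples above, URL access frequency and the reverse index, differ only in their map and reduce functions. The sketch below assumes a tiny in-memory driver named `run_job` (hypothetical) and a log format in which the URL is the first whitespace-separated field; both are illustrative choices, and the sample data is invented.

```python
from collections import defaultdict


def run_job(inputs, map_fn, reduce_fn):
    """Toy driver: map every input pair, group by key, reduce each group."""
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Count URL access frequency: emit (url, 1), then sum per URL.
def url_map(_line_no, log_line):
    yield log_line.split()[0], 1  # assumes the URL is the first field


def url_reduce(_url, counts):
    return sum(counts)


# Reverse index: emit (word, doc_id), then collect the documents per word.
def index_map(doc_id, text):
    for word in set(text.lower().split()):
        yield word, doc_id


def index_reduce(_word, doc_ids):
    return sorted(doc_ids)


log = [(0, "/home 200"), (1, "/about 200"), (2, "/home 304")]
url_counts = run_job(log, url_map, url_reduce)

docs = [("d1", "map reduce"), ("d2", "reduce tasks")]
index = run_job(docs, index_map, index_reduce)

print(url_counts)
print(index)
```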
 Provides a general-purpose model to simplify large-scale computation
 Allows users to focus on the problem without worrying about the details
 Original paper (http://labs.google.com/papers/mapreduce.html)
 On Wikipedia (http://en.wikipedia.org/wiki/MapReduce)
 Hadoop – MapReduce in Java (http://lucene.apache.org/hadoop/)
 http://code.google.com/edu/parallel/mapreduce-tutorial.html

