MapReduce
Functional Programming
  • Map: apply a function to transform the elements of a list and return the results as a list
  • Reduce: apply a function to all elements of a list, then collect the results and return them as a single value

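As a minimal illustration of these two functional building blocks, the sketch below uses Python's built-in `map` and `functools.reduce`. The sample list and the two lambdas are made up for the example and are not taken from the slides.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# Map: transform every element, producing a new list of the same length.
squares = list(map(lambda n: n * n, numbers))

# Reduce: fold all elements into a single value (here, their sum).
total = reduce(lambda acc, n: acc + n, squares, 0)

print(squares)  # [1, 4, 9, 16]
print(total)    # 30
```
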
Definition
  • MapReduce: a software framework to support processing of massive data sets across distributed computers

Sample Use Case
  • Back-end credit card processor
  • Nightly processing of millions of transactions
  • Processing requires grouping, sorting, and merchant-wide analysis
  • Can't just divide the overall list into equal parts, as further analysis is necessary
  • Tight processing window

Description
  • Simple, powerful programming model
  • Language independent
  • Can run on a single machine, but shines for distributed computing and extreme datasets
  • Breaks the processing problem down into embarrassingly parallel atomic operations

Algorithm
  • Map phase: raw data is analyzed and converted to name/value pairs
  • Shuffle phase: all name/value pairs are sorted and grouped by their keys
  • Reduce phase: all values associated with a key are processed for results

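The three phases can be sketched in a single process as follows. This is only an illustration of the data flow, not a distributed framework: `map_reduce`, `emit_amount`, `total_amount`, and the sample transactions (merchant name, amount in cents) are hypothetical names and data, loosely modeled on the credit card use case.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run map, shuffle, and reduce in memory and return {key: result}."""
    # Map phase: every record is converted to zero or more (key, value) pairs.
    pairs = [pair for record in records for pair in mapper(record)]

    # Shuffle phase: values are grouped under their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)

    # Reduce phase: each key's values are processed into one result,
    # visiting the keys in sorted order.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}

# Hypothetical input: (merchant, amount in cents).
transactions = [("acme", 1250), ("globex", 400), ("acme", 750)]

def emit_amount(txn):
    merchant, cents = txn
    yield merchant, cents

def total_amount(merchant, amounts):
    return sum(amounts)

totals = map_reduce(transactions, emit_amount, total_amount)
print(totals)  # {'acme': 2000, 'globex': 400}
```

In a real cluster each phase would run on many machines at once; only the mapper and reducer would need to be written by the user.
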
MapReduce Walk-Through
  • Goal: construct a word frequency count of all the words in Wikipedia

Step 0: Split Data
  • Raw input data is divided into N parts
  • N > number of machines
  • The split must be context specific

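One way to read "context specific" is that a split should never cut a record in half. The sketch below assumes that a line of text is the smallest unit that must stay intact, which is an assumption for this example; `split_lines` and the sample text are hypothetical.

```python
def split_lines(text: str, num_parts: int) -> list[list[str]]:
    """Divide text into num_parts slices without ever splitting a line."""
    lines = text.splitlines()
    # Round-robin assignment keeps the slices close to equal in size.
    return [lines[i::num_parts] for i in range(num_parts)]

machines = 2
# N is chosen larger than the number of machines, as the slide requires.
parts = split_lines("a b\nc d\ne f\ng h\ni j", num_parts=machines + 1)
print(parts)  # [['a b', 'g h'], ['c d', 'i j'], ['e f']]
```
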
Step 1: Map
  • Each machine takes/receives a single slice of the raw input for mapping
  • The map function processes the input file and emits a name/value pair for each piece of relevant data

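For the word frequency goal, a map function might look like the sketch below. The tokenization rule (lowercase runs of letters, digits, and apostrophes) is an assumption made for the example, and `map_words` is a hypothetical name.

```python
import re
from typing import Iterator

# Assumed tokenization: lowercase letters, digits, and apostrophes form a word.
WORD = re.compile(r"[a-z0-9']+")

def map_words(slice_text: str) -> Iterator[tuple[str, int]]:
    """Emit one (word, 1) pair for every word found in a slice of input."""
    for word in WORD.findall(slice_text.lower()):
        yield word, 1

pairs = list(map_words("The map, the reduce"))
print(pairs)  # [('the', 1), ('map', 1), ('the', 1), ('reduce', 1)]
```
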
Step 2: Shuffle
  • The results of the map phase are sorted and grouped by the key in each key/value pair

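A minimal in-memory sketch of the sort-and-group step, using `sorted` and `itertools.groupby` from the standard library. The function name `shuffle` and the sample pairs are hypothetical; a real framework performs this step across the network between machines.

```python
from itertools import groupby
from operator import itemgetter

def shuffle(pairs):
    """Sort (key, value) pairs by key, then collect each key's values in a list."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    ]

mapped = [("the", 1), ("map", 1), ("the", 1), ("reduce", 1)]
grouped = shuffle(mapped)
print(grouped)  # [('map', [1]), ('reduce', [1]), ('the', [1, 1])]
```

Sorting first matters here because `groupby` only merges adjacent items that share a key.
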
Step 3: Reduce
  • Results from the shuffle phase are divided into M parts
  • M > number of machines
  • Each machine runs a reduction method on a part of the shuffle results

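The division into M parts and the per-part reduction could be sketched as below. Assigning keys to parts with a CRC32 checksum is an assumption for this example (the slides do not say how parts are chosen); it is used instead of Python's built-in `hash`, which is randomized per process for strings and so would not agree across machines. `partition` and `reduce_part` are hypothetical names.

```python
import zlib

def partition(groups, num_parts):
    """Assign each (key, values) group to one of num_parts reduce parts."""
    parts = [[] for _ in range(num_parts)]
    for key, values in groups:
        # A stable checksum sends the same key to the same part on any machine.
        index = zlib.crc32(key.encode("utf-8")) % num_parts
        parts[index].append((key, values))
    return parts

def reduce_part(part):
    """Word-count reduction: sum the values collected for each key."""
    return {key: sum(values) for key, values in part}

groups = [("map", [1]), ("reduce", [1]), ("the", [1, 1])]

counts = {}
for part in partition(groups, num_parts=4):
    # Each part could be handed to a different machine; here they run in turn.
    counts.update(reduce_part(part))

print(counts)
```
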
MapReduce Benefits
  • Scale: processing speed increases with the number of machines involved
  • Reliable: loss of any one machine doesn't stop processing
  • Cost: often built from heterogeneous commodity-grade computers

Use Case Results
  • Processing time of 1 million records
  • Originally ~3 hours
  • Reduced to 40 minutes on 5 computers

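To put these figures in perspective, the speedup and parallel efficiency can be worked out as below. This assumes the original ~3 hour run was on a single machine, which the slide does not state, and both timings are approximate.

```python
original_minutes = 3 * 60   # ~3 hours, assumed to be a single-machine run
distributed_minutes = 40
machines = 5

speedup = original_minutes / distributed_minutes
efficiency = speedup / machines  # fraction of ideal linear scaling

print(f"speedup: {speedup:.1f}x")        # speedup: 4.5x
print(f"efficiency: {efficiency:.0%}")   # efficiency: 90%
```
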
Other MapReduce Installations
  • Google – Index building
  • Visa – Transaction processing
  • Facebook – Facebook Lexicon
  • Intelligence community
  • Yahoo/Google – Terabyte Sort
      - 10 billion 100-byte records
      - Yahoo: 910 nodes, 206 seconds
      - Google: ~1,000 nodes, 68 seconds

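The Terabyte Sort numbers can be converted to throughput with a little arithmetic. The sketch below uses only the figures on the slide; Google's node count is given as approximate, so its per-node rate is approximate too. The name `throughput` is hypothetical.

```python
RECORDS = 10_000_000_000
RECORD_BYTES = 100
total_bytes = RECORDS * RECORD_BYTES  # 10**12 bytes, one decimal terabyte

# (nodes, seconds) as quoted on the slide; Google's node count is approximate.
runs = {"Yahoo": (910, 206), "Google": (1000, 68)}

def throughput(nodes, seconds):
    """Return (aggregate bytes/s, bytes/s per node) for sorting total_bytes."""
    aggregate = total_bytes / seconds
    return aggregate, aggregate / nodes

for name, (nodes, seconds) in runs.items():
    aggregate, per_node = throughput(nodes, seconds)
    print(f"{name}: {aggregate / 1e9:.2f} GB/s aggregate, "
          f"{per_node / 1e6:.2f} MB/s per node")
```
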
Questions
