Introduction to MapReduce
CS-439: Cloud Computing
Dr. A. Aziz Bhatti
Plan for today
- Introduction
- Census example
- MapReduce architecture
  - Data flow
  - Execution flow
  - Fault tolerance etc.
Example: National census
http://www.census.gov/2010census/pdf/2010_Questionnaire_Info.pdf
Analogy: National census
- Suppose we have 10,000 employees, whose job is to collate census forms and to determine how many people live in each city
- How would you organize this task?
http://www.census.gov/2010census/pdf/2010_Questionnaire_Info.pdf
National census "data flow"
Making things more complicated
- Suppose people take vacations, get sick, work at different rates
- Suppose some forms are incorrectly filled out and require corrections or need to be thrown away
- What if the supervisor gets sick?
- How big should the stacks be?
- How do we monitor progress?
- ...
A bit of introspection
- What is the main challenge?
  - Are the individual tasks complicated?
  - If not, what makes this so challenging?
- How resilient is our solution?
- How well does it balance work across employees?
  - What factors affect this?
- How general is the set of techniques?
I don't want to deal with all this!!!
- Wouldn't it be nice if there were some system that took care of all these details for you?
- Ideally, you'd just tell the system what needs to be done
- That's the MapReduce framework.
Data Partitioning for MapReduce
- We decided that the best scheme for parallelism in doing a census was:
  - Given n workers, divide the workload into k*n tasks
  - Each worker starts a task and reports back (with its stack of cards) to a coordinator when done
  - It receives a new task if there are any left
  - If anyone is a long-time straggler, the coordinator may reassign that task to someone else
  - Take the stacks and sort them by region / city / zip or whatever we want to count
  - Assign counting of the regions in parallel (a small coordinator/worker sketch follows below)
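To make the task-queue scheme concrete, here is a minimal, hypothetical sketch in plain Java: n worker threads repeatedly pull one of k*n tasks from a shared queue and merge per-city counts into a shared table. Every name in it (CensusCoordinator, formsInTask, the city strings) is made up for illustration, and straggler reassignment is left out.

import java.util.concurrent.*;

public class CensusCoordinator {
    public static void main(String[] args) throws InterruptedException {
        int n = 4, k = 3;                                   // n workers, k*n tasks
        BlockingQueue<Integer> tasks = new LinkedBlockingQueue<>();
        for (int t = 0; t < k * n; t++) tasks.add(t);       // each task = one stack of forms

        ConcurrentMap<String, Integer> cityCounts = new ConcurrentHashMap<>();
        ExecutorService workers = Executors.newFixedThreadPool(n);
        for (int w = 0; w < n; w++) {
            workers.submit(() -> {
                Integer task;
                while ((task = tasks.poll()) != null) {     // get a new task if any are left
                    for (String city : formsInTask(task)) { // tally the forms in this stack
                        cityCounts.merge(city, 1, Integer::sum);
                    }
                }
            });
        }
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.MINUTES);
        cityCounts.forEach((city, count) -> System.out.println(city + ": " + count));
    }

    // Stand-in for reading one stack of census forms; returns the city on each form.
    static String[] formsInTask(int task) {
        return new String[] { "Lahore", "Karachi", "Lahore" };
    }
}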
Abstracting into a digital data flow
(figure: four Filter+Stack workers feed five CountStack workers, which produce the per-stack totals blue: 4k, green: 4k, cyan: 3k, gray: 1k, orange: 4k)
Abstracting once more
- There are two kinds of workers:
  - Those that take input data items and produce output items for the “stacks”
  - Those that take the stacks and aggregate the results to produce outputs on a per-stack basis
- We’ll call these:
  - map: takes (item_key, value), produces one or more (stack_key, value’) pairs
  - reduce: takes (stack_key, {set of value’}), produces one or more output results – typically (stack_key, agg_value)
- We will refer to the stack_key as the reduce key
Why MapReduce?
- Scenario:
  - You have a huge amount of data, e.g., all the Google searches of the last three years
  - You would like to perform a computation on the data, e.g., find out which search terms were the most popular
  - How would you do it?
- Analogy to the census example:
  - The computation isn't necessarily difficult, but parallelizing and distributing it, as well as handling faults, is challenging
- Idea: A programming language!
  - Write a simple program to express the (simple) computation, and let the language runtime do all the hard work
Plan for today
- Introduction
- Census example
- MapReduce architecture
  - Data flow
  - Execution flow
  - Fault tolerance etc.
What is MapReduce?
- A famous distributed programming model
- In many circles, considered the key building block for much of Google’s data analysis
- A programming language built on it: Sawzall, http://labs.google.com/papers/sawzall.html
  - … Sawzall has become one of the most widely used programming languages at Google. … [O]n one dedicated Workqueue cluster with 1500 Xeon CPUs, there were 32,580 Sawzall jobs launched, using an average of 220 machines each. While running those jobs, 18,636 failures occurred (application failure, network outage, system crash, etc.) that triggered rerunning some portion of the job. The jobs read a total of 3.2x10^15 bytes of data (2.8 PB) and wrote 9.9x10^12 bytes (9.3 TB).
- Other similar languages: Yahoo’s Pig Latin and Pig; Microsoft’s Dryad
- Cloned in open source: Hadoop, http://hadoop.apache.org/
What is MapReduce?
- A famous distributed programming model
- In many circles, considered the key building block for much of Google’s data analysis
- Example of usage rates for one language that compiled to MapReduce, in 2010:
  - … [O]n one dedicated Workqueue cluster with 1500 Xeon CPUs, there were 32,580 Sawzall jobs launched, using an average of 220 machines each. While running those jobs, 18,636 failures occurred (application failure, network outage, system crash, etc.) that triggered rerunning some portion of the job. The jobs read a total of 3.2x10^15 bytes of data (2.8 PB) and wrote 9.9x10^12 bytes (9.3 TB).
- Other similar languages: Yahoo’s Pig Latin and Pig; Microsoft’s Dryad
- Many “successors” now – we’ll talk about them later
The MapReduce programming model
- Simple distributed functional programming primitives
- Modeled after Lisp primitives:
  - map (apply function to all items in a collection) and
  - reduce (apply function to set of items with a common key)
- We start with:
  - A user-defined function to be applied to all data, map: (key, value) → (key, value)
  - Another user-specified operation, reduce: (key, {set of values}) → result
  - A set of n nodes, each with data
- All nodes run map on all of their data, producing new data with keys
  - This data is collected by key, then shuffled, and finally reduced
  - Dataflow is through temp files on GFS
(A single-machine analogy of map and reduce, written with Java streams, follows below.)
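Purely as an analogy to these Lisp-style primitives (and not part of the distributed framework itself), the same map / group-by-key / reduce steps can be sketched on one machine with Java streams; the grouping step plays the role of the shuffle. The input strings are made up for illustration.

import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class LocalMapReduce {
    public static void main(String[] args) {
        String[] lines = { "the apple", "is an apple", "not an orange" };

        Map<String, Long> counts = Arrays.stream(lines)
            // "map": turn each input line into the words it contains
            .flatMap(line -> Arrays.stream(line.split(" ")))
            // "shuffle" + "reduce": group by word (the reduce key) and count each group
            .collect(Collectors.groupingBy(w -> w, Collectors.counting()));

        System.out.println(counts);   // e.g. {the=1, apple=2, is=1, an=2, not=1, orange=1}
    }
}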
Simple example: Word count
- Goal: Given a set of documents, count how often each word occurs
- Input: Key-value pairs (document:lineNumber, text)
- Output: Key-value pairs (word, #occurrences)
- What should be the intermediate key-value pairs?

map(String key, String value) {
    // key: document name, line no
    // value: contents of line
    for each word w in value:
        emit(w, "1")
}

reduce(String key, Iterator values) {
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    emit(key, result)
}

(A runnable Hadoop version of this pseudocode is sketched below.)
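For reference, here is the same pseudocode written against the Hadoop MapReduce API, one open-source implementation of the model (the pseudocode above is framework-neutral); splitting words on whitespace is a simplification.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // map: (line offset, line text) -> one (word, 1) pair per word on the line
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String w : value.toString().split("\\s+")) {
                if (w.isEmpty()) continue;
                word.set(w);
                context.write(word, ONE);   // emit(w, 1)
            }
        }
    }

    // reduce: (word, {1, 1, ...}) -> (word, total count)
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));   // emit(key, result)
        }
    }
}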
Simple example: Word count (data flow)
(figure: four mappers and four reducers; each node is responsible for a range of keys)
1. Each mapper receives some of the KV-pairs as input:
   Mapper (1-2): (1, the apple), (2, is an apple)
   Mapper (3-4): (3, not an orange), (4, because the)
   Mapper (5-6): (5, orange), (6, unlike the apple)
   Mapper (7-8): (7, is orange), (8, not green)
2. The mappers process the KV-pairs one by one, emitting one pair per word: (the, 1), (apple, 1), (is, 1), (apple, 1), (an, 1), (not, 1), (orange, 1), (an, 1), (because, 1), (the, 1), (orange, 1), (unlike, 1), (apple, 1), (the, 1), (is, 1), (orange, 1), (not, 1), (green, 1)
3. Each KV-pair output by a mapper is sent to the reducer that is responsible for it; the reducers cover the key ranges (A-G), (H-N), (O-U), (V-Z)
4. The reducers sort their input by key and group it: (an, {1, 1}), (apple, {1, 1, 1}), (because, {1}), (green, {1}), (is, {1, 1}), (not, {1, 1}), (orange, {1, 1, 1}), (the, {1, 1, 1}), (unlike, {1})
5. The reducers process their input one group at a time, producing (an, 2), (apple, 3), (because, 1), (green, 1), (is, 2), (not, 2), (orange, 3), (the, 3), (unlike, 1)
MapReduce dataflow
(figure: input data is split across mappers; their intermediate (key, value) pairs are redistributed to the reducers in "the shuffle"; the reducers then write the output data)
- What is meant by a 'dataflow'?
- What makes this so scalable?
More examples
- Distributed grep – all lines matching a pattern
  - Map: filter by pattern
  - Reduce: output set
- Count URL access frequency
  - Map: output each URL as key, with count 1
  - Reduce: sum the counts
- Reverse web-link graph
  - Map: output (target, source) pairs when a link to target is found in source
  - Reduce: concatenate the values and emit (target, list(source))
- Inverted index
  - Map: emit (word, documentID)
  - Reduce: combine these into (word, list(documentID)) – a sketch follows below
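As an illustration of the inverted-index example, here is a sketch using the Hadoop API. It assumes the input key has already been arranged to be the document ID (e.g., by a key-value input format); that keying, and keeping duplicate document IDs when a word repeats, are simplifications made for the example rather than anything the slide specifies.

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class InvertedIndex {

    // map: (documentID, text) -> one (word, documentID) pair per word
    public static class IndexMapper extends Mapper<Text, Text, Text, Text> {
        private final Text word = new Text();

        @Override
        protected void map(Text docId, Text contents, Context context)
                throws IOException, InterruptedException {
            for (String w : contents.toString().split("\\s+")) {
                if (w.isEmpty()) continue;
                word.set(w);
                context.write(word, docId);
            }
        }
    }

    // reduce: (word, {documentIDs}) -> (word, comma-separated list of documentIDs)
    public static class IndexReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text word, Iterable<Text> docIds, Context context)
                throws IOException, InterruptedException {
            StringBuilder list = new StringBuilder();
            for (Text id : docIds) {
                if (list.length() > 0) list.append(",");
                list.append(id.toString());
            }
            context.write(word, new Text(list.toString()));
        }
    }
}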
Common mistakes to avoid
- Mapper and reducer should be stateless
  - Don't use static variables - after map + reduce return, they should remember nothing about the processed data!
  - Reason: No guarantees about which key-value pairs will be processed by which workers!
- Don't try to do your own I/O!
  - Don't try to read from, or write to, files in the file system
  - The MapReduce framework does all the I/O for you:
    - All the incoming data will be fed as arguments to map and reduce
    - Any data your functions produce should be output via emit

Wrong! (keeps state across calls):
    HashMap h = new HashMap();
    map(key, value) {
        if (h.contains(key)) {
            h.add(key, value);
            emit(key, "X");
        }
    }

Wrong! (does its own I/O):
    map(key, value) {
        File foo = new File("xyz.txt");
        while (true) {
            s = foo.readLine();
            ...
        }
    }
More common mistakes to avoid
- Mapper must not map too much data to the same key
  - In particular, don't map everything to the same key!!
  - Otherwise the reduce worker will be overwhelmed!
  - It's okay if some reduce workers have more work than others
    - Example: In WordCount, the reduce worker that works on the key 'and' has a lot more work than the reduce worker that works on 'syzygy'.

Wrong! (funnels everything to a single key):
    map(key, value) {
        emit("FOO", key + " " + value);
    }
    reduce(key, value[]) {
        /* do some computation on all the values */
    }

(A sketch of why a single hot key lands on a single reduce worker follows below.)
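The reason a single hot key overloads a single worker is that each intermediate key is assigned to a reducer deterministically, typically by hashing the key. The sketch below mirrors the behavior of Hadoop's default hash partitioner; it is illustrative only.

public class HashPartitionDemo {
    // The reducer index for a key: hash it, clear the sign bit, and take it modulo
    // the number of reduce tasks (this is what Hadoop's default HashPartitioner does).
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (String key : new String[] { "and", "syzygy", "FOO", "orange" }) {
            System.out.println(key + " -> reducer " + partitionFor(key, reducers));
        }
        // If every mapper emits only the key "FOO", every intermediate pair goes to
        // the same reducer, and the other reducers sit idle.
    }
}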
Designing MapReduce algorithms
- Key decision: What should be done by map, and what by reduce?
  - map can do something to each individual key-value pair, but it can't look at other key-value pairs
    - Example: Filtering out key-value pairs we don't need
  - map can emit more than one intermediate key-value pair for each incoming key-value pair
    - Example: Incoming data is text, map produces (word, 1) for each word
  - reduce can aggregate data; it can look at multiple values, as long as map has mapped them to the same (intermediate) key
    - Example: Count the number of words, add up the total cost, ...
- Need to get the intermediate format right!
  - If reduce needs to look at several values together, map must emit them using the same key!
More details on the MapReduce data flow
(figure: the input data partitions, split by key, feed the map computation partitions; the map output is redistributed by its output key – "the shuffle" – to the reduce computation partitions; a coordinator oversees the flow, and the default MapReduce implementation passes data between stages through the filesystem)
Some additional details
- To make this work, we need a few more parts…
- The file system (distributed across all nodes):
  - Stores the inputs, outputs, and temporary results
- The driver program (executes on one node):
  - Specifies where to find the inputs and the outputs
  - Specifies what mapper and reducer to use
  - Can customize the behavior of the execution (a sample Hadoop driver is sketched below)
- The runtime system (controls nodes):
  - Supervises the execution of tasks
  - Esp. the JobTracker
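As a concrete illustration of the driver's role (Hadoop-specific; other implementations differ in the details), the sketch below configures and submits the WordCount job from the earlier example. The input and output paths are placeholders taken from the command line.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);

        // which mapper and reducer to use (from the earlier WordCount sketch)
        job.setMapperClass(WordCount.TokenizerMapper.class);
        job.setReducerClass(WordCount.IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // where to find the inputs, and where to write the outputs
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // hand the job to the runtime system and wait for it to finish
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}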
Some details
- Fewer computation partitions than data partitions
  - All data is accessible via a distributed filesystem with replication
  - Worker nodes produce data in key order (makes it easy to merge)
  - The master is responsible for scheduling, keeping all nodes busy
  - The master knows how many data partitions there are and which have completed – atomic commits to disk
- Locality: The master tries to do work on nodes that have replicas of the data
- The master can deal with stragglers (slow machines) by re-executing their tasks somewhere else
What if a worker crashes?
- We rely on the file system being shared across all the nodes
- Two types of (crash) faults:
  - Node wrote its output and then crashed
    - Here, the file system is likely to have a copy of the complete output
  - Node crashed before finishing its output
    - The JobTracker sees that the job isn’t making progress, and restarts the job elsewhere on the system
    - (Of course, we have fewer nodes to do work…)
- But what if the master crashes?
Other challenges
- Locality
  - Try to schedule map tasks on machines that already have the data
- Task granularity
  - How many map tasks? How many reduce tasks?
- Dealing with stragglers
  - Schedule some backup tasks
- Saving bandwidth
  - E.g., with combiners (see the sketch below)
- Handling bad records
  - "Last gasp" packet with current sequence number
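To illustrate the combiner idea with the Hadoop API: because WordCount's reduce is an associative, commutative sum, the same reducer class can also be registered as a combiner, so each map task pre-aggregates its (word, 1) pairs locally before they cross the network. This sketch reuses the WordCount classes from the earlier example and omits the parts of the driver that are unchanged.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class CombinerExample {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count with combiner");
        job.setMapperClass(WordCount.TokenizerMapper.class);
        // the combiner runs on each map node; it is safe here because summing counts
        // gives the same result whether it is done locally, at the reducer, or both
        job.setCombinerClass(WordCount.IntSumReducer.class);
        job.setReducerClass(WordCount.IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // input/output paths and job submission as in the earlier driver sketch
    }
}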
Scale and MapReduce
- From a particular Google paper on a language built over MapReduce:
  - … Sawzall has become one of the most widely used programming languages at Google. … [O]n one dedicated Workqueue cluster with 1500 Xeon CPUs, there were 32,580 Sawzall jobs launched, using an average of 220 machines each. While running those jobs, 18,636 failures occurred (application failure, network outage, system crash, etc.) that triggered rerunning some portion of the job. The jobs read a total of 3.2x10^15 bytes of data (2.8 PB) and wrote 9.9x10^12 bytes (9.3 TB).
- We will see some MapReduce-based languages, like Pig Latin, later in the semester

Source: Interpreting the Data: Parallel Analysis with Sawzall (Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan)
Stay tuned
Next time you will learn about: Programming in MapReduce (cont.)