Cloud Computing: MapReduce Tutorial
Prof. Soumya K Ghosh
Department of Computer Science and Engineering
IIT KHARAGPUR
Introduction
• MapReduce: programming model developed at Google
• Objective:
– Implement large-scale search
– Text processing on massively scalable web data stored using BigTable and the GFS distributed file system
• Designed for processing and generating large volumes of data via massively parallel computations, utilizing tens of thousands of processors at a time
• Fault tolerant: ensures that the computation keeps making progress even if processors or networks fail
• Example:
– Hadoop: open-source implementation of MapReduce (developed at Yahoo!)
– Available as pre-packaged AMIs on the Amazon EC2 cloud platform
9/11/2017 2
MapReduce Model
• Parallel programming abstraction
• Used by many different parallel applications that carry out large-scale computations involving thousands of processors
• Leverages a common underlying fault-tolerant implementation
• Two phases of MapReduce:
– Map operation
– Reduce operation
• A configurable number of M ‘mapper’ processors and R ‘reducer’ processors are
assigned to work on the problem
• The computation is coordinated by a single master process
MapReduce Model Contd…
• Map phase:
– Each mapper reads approximately 1/M of the input from the global file system, using locations given by the master
– The map operation transforms one set of key-value pairs into another: (k1, v1) → [(k2, v2)]
– Each mapper writes its results to one file per reducer
– Files are sorted by key and stored on the mapper's local file system
– The master keeps track of the location of these files
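The map-side bookkeeping described above, with one key-sorted output file per reducer, can be sketched in Python as follows. This is a minimal single-process sketch, not Google's or Hadoop's implementation. The helpers `map_task`, `partition` and `word_count_map` are hypothetical, and in-memory lists stand in for local files. The CRC32 hash partitioner is also an assumption: hash partitioning is a common default, but the slides do not say how keys are assigned to reducers.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    # Deterministic hash partitioner. Python's built-in hash() is salted per
    # process for strings, so CRC32 is used to keep assignments reproducible.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_task(records, map_fn, num_reducers):
    # Apply map_fn to every (k1, v1) record in this mapper's input split and
    # return one key-sorted list (a stand-in for a local file) per reducer.
    files = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            files[partition(k2, num_reducers)].append((k2, v2))
    return [sorted(files[r]) for r in range(num_reducers)]


def word_count_map(doc_id, text):
    # Hypothetical map function: emit (word, 1) for every word.
    for word in text.split():
        yield word, 1


files = map_task([("d1", "the cat saw the dog")], word_count_map, num_reducers=2)
for r, f in enumerate(files):
    print(f"reducer {r}: {f}")
```

Both occurrences of "the" always land in the same file. The master only needs to record which file belongs to which reducer.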
MapReduce Model Contd…
• Reduce phase:
– The master informs the reducers where the partial computations have been stored, in the local files of the respective mappers
– Reducers make remote procedure call (RPC) requests to the mappers to fetch the files
– Each reducer groups the results of the map step by key and applies a function f to the list of values for each key: (k2, [v2]) → (k2, f([v2]))
– Final results are written back to GFS
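On the reduce side, each reducer receives one already-sorted file from every mapper. It can therefore merge those files in key order and apply f to each group of values. The sketch below is minimal and assumes a hypothetical helper `reduce_task`, with in-memory lists in place of the files fetched over RPC. Here f is passed in as an ordinary Python function.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def reduce_task(mapper_files, f):
    # Each mapper's file for this reducer is already sorted by key, so a k-way
    # merge yields a single key-ordered stream without re-sorting everything.
    merged = heapq.merge(*mapper_files, key=itemgetter(0))
    # Group consecutive pairs that share a key and apply f to their values.
    return [(key, f([v for _, v in group]))
            for key, group in groupby(merged, key=itemgetter(0))]


# This reducer's files, fetched from two mappers (already sorted by key).
from_mapper_1 = [("cat", 1), ("the", 1), ("the", 1)]
from_mapper_2 = [("cat", 1), ("dog", 1)]
print(reduce_task([from_mapper_1, from_mapper_2], sum))
# [('cat', 2), ('dog', 1), ('the', 2)]
```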
MapReduce: Example
• 3 mappers; 2 reducers
• Map function:
• Reduce function:
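The slide's figure with the concrete map and reduce functions did not survive extraction. As an illustration only, the sketch below assumes the classic word-count job, in which map emits (word, 1) and reduce sums the counts. It runs that job end to end with 3 mappers and 2 reducers. The input documents and all function names are hypothetical.

```python
import zlib
from collections import defaultdict

M, R = 3, 2  # 3 mappers and 2 reducers, as in the slide's example


def map_fn(doc_id, text):
    # Hypothetical map function (word count): emit (word, 1) for each word.
    for word in text.split():
        yield word, 1


def reduce_fn(word, counts):
    # Hypothetical reduce function: f sums the counts collected for one word.
    return word, sum(counts)


def partition(key):
    # Deterministic hash partitioner: routes each key to exactly one reducer.
    return zlib.crc32(key.encode("utf-8")) % R


def run(docs):
    # Master: give each of the M mappers roughly 1/M of the input.
    splits = [docs[i::M] for i in range(M)]

    # Map phase: each mapper produces one key-sorted "file" per reducer.
    mapper_outputs = []
    for split in splits:
        files = defaultdict(list)
        for doc_id, text in split:
            for key, value in map_fn(doc_id, text):
                files[partition(key)].append((key, value))
        mapper_outputs.append([sorted(files[r]) for r in range(R)])

    # Reduce phase: reducer r fetches file r from every mapper, groups by key
    # and applies reduce_fn to each group.
    results = {}
    for r in range(R):
        groups = defaultdict(list)
        for files in mapper_outputs:
            for key, value in files[r]:
                groups[key].append(value)
        for key in sorted(groups):
            word, total = reduce_fn(key, groups[key])
            results[word] = total
    return results


docs = [("d1", "the cat"), ("d2", "the dog"), ("d3", "a cat"),
        ("d4", "the end"), ("d5", "a dog"), ("d6", "cat")]
print(run(docs))
```

The partitioner routes each word to exactly one reducer. As a result, the two reducers produce disjoint sets of keys, and their outputs can simply be concatenated.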
Problem-1
Consider a MapReduce framework in which the HDFS block size is 64 MB. We have 3 files of sizes 64 KB, 65 MB and 127 MB. How many blocks will be created by the Hadoop framework?
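Here is a worked check. It assumes 1 MB = 1024 KB and that each file is split into blocks independently. Each file then needs ceil(size / 64 MB) blocks, giving 1 + 2 + 2 = 5 blocks in total. The final block of a file holds only the remaining data, since HDFS does not pad partial blocks to the full block size. For example, the second block of the 65 MB file holds 1 MB. The names `num_blocks` and `file1`…`file3` are illustrative.

```python
import math

BLOCK_SIZE_MB = 64
# File sizes from Problem-1, in MB (64 KB = 64/1024 MB).
file_sizes_mb = {"file1": 64 / 1024, "file2": 65, "file3": 127}


def num_blocks(size_mb, block_mb=BLOCK_SIZE_MB):
    # A file occupies ceil(size / block size) blocks; the last one may be partial.
    return math.ceil(size_mb / block_mb)


per_file = {name: num_blocks(s) for name, s in file_sizes_mb.items()}
print(per_file, sum(per_file.values()))
# {'file1': 1, 'file2': 2, 'file3': 2} 5
```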
Problem-2
Write the pseudocode (map and reduce functions) for calculating the average of a set of integers in MapReduce.
Suppose A = (10, 20, 30, 40, 50) is a set of integers. Show the map and reduce outputs.
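One possible answer, sketched in Python: every mapper emits its integers under a single shared key, paired with a count of 1. The single reducer then divides the total by the count. The key name `"avg"` is an arbitrary choice. Emitting (value, 1) pairs rather than bare values is a design choice, not something the problem requires. It lets partial (sum, count) results be combined before the reduce step, whereas averaging partial averages would in general give the wrong answer.

```python
def map_fn(_, value):
    # Emit each integer under one shared key, paired with a count of 1,
    # so that partial (sum, count) pairs can be combined safely.
    yield "avg", (value, 1)


def reduce_fn(key, pairs):
    # Add up the partial sums and counts, then divide once at the end.
    total = sum(s for s, _ in pairs)
    count = sum(c for _, c in pairs)
    return key, total / count


A = [10, 20, 30, 40, 50]
map_output = [kv for i, x in enumerate(A) for kv in map_fn(i, x)]
print(map_output)
# [('avg', (10, 1)), ('avg', (20, 1)), ('avg', (30, 1)), ('avg', (40, 1)), ('avg', (50, 1))]
values = [v for _, v in map_output]  # shuffle: a single key, so a single group
print(reduce_fn("avg", values))
# ('avg', 30.0)
```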
Problem-3
Using MapReduce, compute the total and average salary of organization XYZ, grouped by gender (male or female). The input is as follows:
Name, Gender, Salary
John, M, 10,000
Martha, F, 15,000
----
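Here is a sketch in Python that uses only the two rows shown, because the remaining rows are elided in the original. The map function emits (gender, salary), and the reduce function returns the total and average salary per gender. Parsing with `split(", ", 2)` is an assumption about the input format. It was chosen because the salary field itself contains a comma used as a thousands separator.

```python
from collections import defaultdict


def map_fn(_, line):
    # "John, M, 10,000": split on the first two ", " separators only,
    # because the salary itself contains a thousands separator.
    _name, gender, salary = line.split(", ", 2)
    yield gender, int(salary.replace(",", ""))


def reduce_fn(gender, salaries):
    # Total and average salary for one gender.
    total = sum(salaries)
    return gender, total, total / len(salaries)


def run(lines):
    groups = defaultdict(list)  # shuffle: group salaries by the gender key
    for i, line in enumerate(lines):
        for key, value in map_fn(i, line):
            groups[key].append(value)
    return [reduce_fn(k, vs) for k, vs in sorted(groups.items())]


rows = ["John, M, 10,000", "Martha, F, 15,000"]  # only the rows shown on the slide
print(run(rows))
# [('F', 15000, 15000.0), ('M', 10000, 10000.0)]
```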
Problem-4
Write the Map and Reduce functions (pseudocode) for the following Word Length Categorization problem under the MapReduce model.
Word Length Categorization: Given a text paragraph (containing only words), categorize each word into the following categories, and output the frequency of occurrence of words in each category.
Categories:
tiny: 1-2 letters; small: 3-5 letters; medium: 6-9 letters; big: 10 or more letters
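Here is a sketch in Python. The map function emits (category, 1) for each word, and the reduce function sums the counts per category. This mirrors word count, with the category in place of the word. The sample paragraph is hypothetical. Word length is taken to be `len(word)`, which matches the letter count only because the input contains words alone.

```python
def category(word):
    # Category boundaries from the problem statement.
    n = len(word)
    if n <= 2:
        return "tiny"
    if n <= 5:
        return "small"
    if n <= 9:
        return "medium"
    return "big"


def map_fn(_, paragraph):
    # Emit (category, 1) for each whitespace-separated word.
    for word in paragraph.split():
        yield category(word), 1


def reduce_fn(cat, counts):
    # Frequency of words in one category.
    return cat, sum(counts)


def run(paragraph):
    grouped = {}  # shuffle: collect the 1s for each category key
    for key, value in map_fn(0, paragraph):
        grouped.setdefault(key, []).append(value)
    return dict(reduce_fn(k, vs) for k, vs in grouped.items())


text = "a cloud platform distributes computation across many machines"
print(run(text))
# {'tiny': 1, 'small': 2, 'medium': 3, 'big': 2}
```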