www.edureka.co/big-data-and-hadoop | EDUREKA HADOOP CERTIFICATION TRAINING
Agenda for today’s Session
• MapReduce Way
• Classes and Packages in MapReduce
• Explanation of a Complete MapReduce Program
• MapReduce Examples on Analytics
• MapReduce Example on Testing
MapReduce Example on Word Count Process
MapReduce Way
MapReduce Way – Word Count Process
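The word count flow can be sketched in plain Java without any Hadoop dependency. This is only an illustration of the map, shuffle, and reduce phases under stated assumptions: the class name WordCountFlow, its method names, and the three sample lines are hypothetical and are not taken from the slides.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java imitation of the three MapReduce phases.
class WordCountFlow {

    // Map phase: split one line into words and emit a (word, 1) pair per word.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key, sorted by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the list of ones collected for a single word.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases over every input line and returns word -> count.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(emitted).entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative input lines, not data from the slides.
        List<String> lines = Arrays.asList("Dear Bear River", "Car Car River", "Deer Car Bear");
        System.out.println(run(lines)); // prints {Bear=2, Car=3, Dear=1, Deer=1, River=2}
    }
}
```

In a real Hadoop job the shuffle step is done by the framework; only the map and reduce logic is written by the programmer.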
Input/Output Classes in MapReduce
Input Format – Class Hierarchy
Output Format – Class Hierarchy
Packages and Classes in Word Count MapReduce Example
Packages to Import
import java.io.IOException;
import java.util.*;
// The next three imports are present in hadoop-common.jar.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
// The remaining imports are present in hadoop-mapreduce-client-core.jar.
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
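Mapper Class
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
Map is the name of the Mapper class; it inherits from the superclass Mapper.
The Mapper class takes 4 type parameters, i.e.
Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
Here the input pair is (LongWritable byte offset, Text line) and the output pair is (Text word, IntWritable count).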
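To make the four type parameters concrete, here is a minimal plain-Java sketch of the map step. It does not use the Hadoop API: SketchMapper is a hypothetical stand-in for Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>, Long, String, and Integer stand in for LongWritable, Text, and IntWritable, and a BiConsumer plays the role of the output collector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.function.BiConsumer;

// Hypothetical stand-in for the four type parameters of Hadoop's Mapper.
interface SketchMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    void map(KEYIN key, VALUEIN value, BiConsumer<KEYOUT, VALUEOUT> emit);
}

// Word count mapper: input is (line offset, line), output is (word, 1).
class WordCountMapSketch implements SketchMapper<Long, String, String, Integer> {

    @Override
    public void map(Long offset, String line, BiConsumer<String, Integer> emit) {
        // Split the line on whitespace and emit one (word, 1) pair per token.
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            emit.accept(tokens.nextToken(), 1);
        }
    }

    // Demo helper: collects the emitted pairs as "word=1" strings.
    static List<String> emitted(String line) {
        List<String> out = new ArrayList<>();
        new WordCountMapSketch().map(0L, line, (word, one) -> out.add(word + "=" + one));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(emitted("Car Car River")); // prints [Car=1, Car=1, River=1]
    }
}
```

The input key (the offset) is ignored, which is typical for word count: only the line text matters.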
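Reducer Class
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
Reduce is the name of the Reducer class; it inherits from the superclass Reducer.
The Reducer class takes 4 type parameters, i.e.
Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
Here both the input and output pairs are (Text word, IntWritable count); the Reducer's input types match the Mapper's output types.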
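The reduce step can be sketched the same way in plain Java. Again, this is not the Hadoop API: SketchReducer is a hypothetical stand-in for Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>, and String and Integer stand in for Text and IntWritable.

```java
import java.util.Arrays;
import java.util.function.BiConsumer;

// Hypothetical stand-in for the four type parameters of Hadoop's Reducer.
interface SketchReducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    void reduce(KEYIN key, Iterable<VALUEIN> values, BiConsumer<KEYOUT, VALUEOUT> emit);
}

// Word count reducer: input is (word, [1, 1, ...]), output is (word, total).
class WordCountReduceSketch implements SketchReducer<String, Integer, String, Integer> {

    @Override
    public void reduce(String word, Iterable<Integer> counts, BiConsumer<String, Integer> emit) {
        // Add up every count that was grouped under this word.
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        emit.accept(word, sum);
    }

    // Demo helper: runs the reducer for one word and returns the emitted total.
    static int total(String word, Integer... counts) {
        int[] result = new int[1];
        new WordCountReduceSketch().reduce(word, Arrays.asList(counts), (k, v) -> result[0] = v);
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(total("Car", 1, 1, 1)); // prints 3
    }
}
```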
It's Time to See Some MapReduce Examples
MapReduce is useful in a wide range of applications across multiple domains.
It is mainly used for two things:
• Analytics: process the data and produce the desired results
• Testing: run a few test cases using MRUnit
Let us see a few MapReduce Examples
on Analytics
MapReduce Temperature Example
Temperature Example - Weather Forecasting
• Problem Statement:
» Analyse weather data for Austin to determine hot and cold
days.
We have a weather data set for Austin from NCEI.
NOAA's National Centers for Environmental Information (NCEI),
previously NCDC, is responsible for preserving, monitoring, assessing,
and providing public access to the nation's archive of climate and
historical weather data and information.
Refer -> ftp://ftp.ncdc.noaa.gov/pub/data/uscrn/products/daily01
Temperature Example - Weather Dataset
6th Column: Max Temp
7th Column: Min Temp
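A hot/cold day check over one record might look like the plain-Java sketch below. Everything specific in it is an assumption rather than something the slides state: fields are taken to be whitespace-separated, with the maximum temperature in the 6th field and the minimum in the 7th; the thresholds of 35.0 and 10.0 degrees, the -9999.0 missing-value marker, the class name HotColdDaySketch, and the sample records are all hypothetical.

```java
// Hypothetical sketch of the per-record logic a temperature mapper could apply.
class HotColdDaySketch {
    static final double HOT_ABOVE = 35.0;   // assumed threshold for a hot day
    static final double COLD_BELOW = 10.0;  // assumed threshold for a cold day
    static final double MISSING = -9999.0;  // assumed marker for a missing reading

    // Classifies one whitespace-separated record as "HOT", "COLD", or "NORMAL".
    // Records with fewer than 7 fields are reported as "INVALID".
    static String classify(String record) {
        String[] fields = record.trim().split("\\s+");
        if (fields.length < 7) {
            return "INVALID";
        }
        double maxTemp = Double.parseDouble(fields[5]); // 6th field (assumed max temp)
        double minTemp = Double.parseDouble(fields[6]); // 7th field (assumed min temp)
        if (maxTemp != MISSING && maxTemp > HOT_ABOVE) {
            return "HOT";
        }
        if (minTemp != MISSING && minTemp < COLD_BELOW) {
            return "COLD";
        }
        return "NORMAL";
    }

    public static void main(String[] args) {
        // Made-up records for illustration only; they are not real weather data.
        System.out.println(classify("00000 20150101 0.000 0.00 0.00 36.0 22.5")); // prints HOT
        System.out.println(classify("00000 20150102 0.000 0.00 0.00 18.0 4.5"));  // prints COLD
    }
}
```

A day that is both above the hot threshold and below the cold threshold is reported as HOT here, because the hot check runs first; a real job might emit both labels.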
MapReduce Example
Last.fm Example
Last.fm is an online music website where users listen to various tracks,
and the listening data is collected in log files. Write a MapReduce
program to get the number of unique listeners.
The log data looks like this:
UserId   TrackId   Shared   Radio   Skip
100001   150       1        1       0
100005   103       0        0       1
100142   78        1        0       0
110005   289       1        0       1
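One way to approach the unique-listener count is sketched below in plain Java, without the Hadoop API. It assumes the count is wanted per track (the problem statement only says "unique listeners") and that each record has exactly five whitespace-separated fields with no header row; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: count distinct UserIds per TrackId.
class UniqueListenersSketch {

    static Map<Integer, Integer> uniqueListeners(List<String> logLines) {
        // "Map" step: key each record by TrackId and collect UserIds in a set,
        // so repeated plays by the same user are stored only once.
        Map<Integer, Set<Integer>> listenersByTrack = new TreeMap<>();
        for (String line : logLines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length != 5) {
                continue; // skip malformed records
            }
            int userId = Integer.parseInt(fields[0]);
            int trackId = Integer.parseInt(fields[1]);
            listenersByTrack.computeIfAbsent(trackId, t -> new HashSet<>()).add(userId);
        }
        // "Reduce" step: the size of each set is the number of unique listeners.
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Map.Entry<Integer, Set<Integer>> e : listenersByTrack.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "100001 150 1 1 0",
            "100005 103 0 0 1",
            "100142 78 1 0 0",
            "110005 289 1 0 1");
        System.out.println(uniqueListeners(log)); // prints {78=1, 103=1, 150=1, 289=1}
    }
}
```

In the four sample rows every track has a single listener, so each count is 1; the set only makes a difference once the same user plays the same track more than once.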
Let us see a MapReduce Example
on Testing
MRUnit Testing Framework
• Provides 4 drivers for testing MapReduce code separately:
» MapDriver
» ReduceDriver
» MapReduceDriver
» PipelineMapReduceDriver
• Helps fill the gap between MapReduce programs and JUnit*
• Gives better control over log messages through JUnit integration
*JUnit is a simple framework for writing repeatable tests.
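The idea behind a driver such as MapDriver can be illustrated with a small plain-Java imitation: feed the mapper one input, collect what it emits, and compare that with the expected output. MiniMapDriver below is hypothetical and is not MRUnit's API; its method names are only loosely modeled on the driver style, and it handles a single String input and (String, int) outputs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical, simplified imitation of a map-only test driver (not MRUnit).
class MiniMapDriver {

    // The map function under test: one input line, any number of emitted pairs.
    interface MapFn {
        void map(String line, BiConsumer<String, Integer> emit);
    }

    private final MapFn mapper;
    private String input;
    private final List<String> expected = new ArrayList<>();

    MiniMapDriver(MapFn mapper) {
        this.mapper = mapper;
    }

    // Sets the single input record for the test.
    MiniMapDriver withInput(String line) {
        this.input = line;
        return this;
    }

    // Adds one expected output pair, in the order it should be emitted.
    MiniMapDriver withOutput(String key, int value) {
        expected.add(key + "=" + value);
        return this;
    }

    // Runs the mapper and throws AssertionError if the output differs.
    void runTest() {
        List<String> actual = new ArrayList<>();
        mapper.map(input, (k, v) -> actual.add(k + "=" + v));
        if (!actual.equals(expected)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        MapFn wordCount = (line, emit) -> {
            for (String w : line.trim().split("\\s+")) {
                if (!w.isEmpty()) {
                    emit.accept(w, 1);
                }
            }
        };
        new MiniMapDriver(wordCount)
            .withInput("cat cat dog")
            .withOutput("cat", 1)
            .withOutput("cat", 1)
            .withOutput("dog", 1)
            .runTest();
        System.out.println("mapper test passed");
    }
}
```

Testing the mapper in isolation like this needs no cluster and no input files, which is the gap between MapReduce programs and JUnit that the slide refers to.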
MapReduce MRUnit Example
Learning Resources
• Hadoop Tutorial: www.edureka.co/blog/hadoop-tutorial
• MapReduce Tutorial: www.edureka.co/blog/mapreduce-tutorial
• MapReduce Interview Questions: www.edureka.co/blog/interview-questions/hadoop-interview-questions-mapreduce
Thank You …
Questions/Queries/Feedback

MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka