UNDERSTANDING INPUTS AND OUTPUTS IN MAPREDUCE
By S. Subha Thilagam
• Your data might be XML files sitting behind a
number of FTP servers, text log files sitting on a
central web server, or Lucene indexes in HDFS.
How does MapReduce support reading and writing
these different serialization structures across the
various storage mechanisms? You’ll need to know
the answer in order to support a specific serialization
format.
BIG DATA ANALYTICS
• The two classes that support data input in
MapReduce are InputFormat and RecordReader.
The InputFormat class is consulted to
determine how the input data should be
partitioned for the map tasks, and the
RecordReader reads the data
from the inputs.
• Every job in MapReduce must define its
inputs according to the contracts specified in the
InputFormat abstract class. InputFormat
implementers must fulfill three contracts: first,
they describe type information for map input
keys and values; next, they specify how the
input data should be partitioned; and finally,
they indicate the RecordReader instance that
should read the data from the source.
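
The three contracts can be sketched in plain Java without a Hadoop dependency. Every type below (`Split`, `SketchInputFormat`, `FixedLineInputFormat`) is a hypothetical stand-in, not a Hadoop class. In Hadoop itself the generic parameters of `InputFormat<K, V>` carry the key/value types, and the other two contracts correspond to its `getSplits` and `createRecordReader` methods. The sketch assumes the input is an in-memory list of lines.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an input split: a half-open range [start, end) of line indexes.
final class Split {
    final int start;
    final int end;
    Split(int start, int end) { this.start = start; this.end = end; }
}

// Contract 1: the generic parameters K and V describe the map input key/value types.
interface SketchInputFormat<K, V> {
    // Contract 2: decide how the input is partitioned.
    List<Split> getSplits(int totalLines);
    // Contract 3: supply the reader that turns one split into records.
    Iterator<Map.Entry<K, V>> createRecordReader(List<String> lines, Split split);
}

// Partitions the input into splits of at most linesPerSplit lines;
// keys are line indexes, values are line contents.
final class FixedLineInputFormat implements SketchInputFormat<Integer, String> {
    private final int linesPerSplit;

    FixedLineInputFormat(int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        this.linesPerSplit = linesPerSplit;
    }

    @Override
    public List<Split> getSplits(int totalLines) {
        List<Split> splits = new ArrayList<>();
        for (int start = 0; start < totalLines; start += linesPerSplit) {
            splits.add(new Split(start, Math.min(start + linesPerSplit, totalLines)));
        }
        return splits;
    }

    @Override
    public Iterator<Map.Entry<Integer, String>> createRecordReader(List<String> lines, Split split) {
        List<Map.Entry<Integer, String>> records = new ArrayList<>();
        for (int i = split.start; i < split.end; i++) {
            records.add(new AbstractMap.SimpleEntry<>(i, lines.get(i)));
        }
        return records.iterator();
    }
}
```

With five input lines and two lines per split, this sketch yields three splits, so three map tasks would each receive their own reader.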
• The RecordReader class is used by
MapReduce in the map tasks to read data from
an input split and provide each record in the
form of a key/value pair for use by mappers. A
map task is typically created for each input split,
and each task has a single RecordReader that’s
responsible for reading the data for that input
split.
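
A minimal sketch of that read loop follows. The method names `nextKeyValue`, `getCurrentKey`, and `getCurrentValue` mirror those on Hadoop's `RecordReader`, but `LineSplitReader` and `MapTaskSketch` are hypothetical classes written for illustration. The sketch assumes the data is an in-memory string, that keys are character offsets, and that every split begins at the start of a line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader: yields (offset, line) pairs from data[start, end).
// Assumes the split starts at the beginning of a line.
final class LineSplitReader {
    private final String data;
    private final int end;
    private int pos;
    private long currentKey;
    private String currentValue;

    LineSplitReader(String data, int start, int end) {
        this.data = data;
        this.pos = start;
        this.end = end;
    }

    // Advances to the next record; returns false once the split is exhausted.
    boolean nextKeyValue() {
        if (pos >= end) {
            return false;
        }
        int newline = data.indexOf('\n', pos);
        int lineEnd = (newline == -1 || newline >= end) ? end : newline;
        currentKey = pos;
        currentValue = data.substring(pos, lineEnd);
        pos = lineEnd + 1; // step past the newline
        return true;
    }

    long getCurrentKey() { return currentKey; }
    String getCurrentValue() { return currentValue; }
}

final class MapTaskSketch {
    // Mimics a map task draining its single reader and handing each record to a mapper.
    static List<String> run(String data, int start, int end) {
        LineSplitReader reader = new LineSplitReader(data, start, end);
        List<String> seen = new ArrayList<>();
        while (reader.nextKeyValue()) {
            seen.add(reader.getCurrentKey() + ":" + reader.getCurrentValue());
        }
        return seen;
    }
}
```

A production reader also has to handle lines that straddle split boundaries, which this sketch sidesteps by assumption.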
• MapReduce supports output data in much the
same way as it supports input data. Two classes
must exist: an OutputFormat
and a RecordWriter. The OutputFormat
performs some basic validation of the data
sink properties, and the RecordWriter writes
each reducer output to the data sink.
• Much like the InputFormat class, the
OutputFormat class, as shown in figure 3.5,
defines the contracts that implementers must
fulfill, including checking the information
related to the job output, providing a
RecordWriter, and specifying an output
committer, which allows writes to be staged
and then made “permanent” upon task and/or
job success.
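
The stage-then-commit idea can be illustrated with a small in-memory sketch. `SketchOutputFormat` is a hypothetical class, not Hadoop's API: in Hadoop the three contracts are the `checkOutputSpecs`, `getRecordWriter`, and `getOutputCommitter` methods of `OutputFormat`, and the committer is a separate object. Here the data sink is assumed to be a map from path to lines, the "writer" is simply the staged list, and the `/_staging` suffix is an invented naming choice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory data sink: path -> lines written to that path.
final class SketchOutputFormat {
    private final Map<String, List<String>> sink = new HashMap<>();

    // Contract 1: validate the job output before any task runs.
    void checkOutputSpecs(String outputPath) {
        if (outputPath == null || outputPath.isEmpty()) {
            throw new IllegalStateException("output path not set");
        }
        if (sink.containsKey(outputPath)) {
            throw new IllegalStateException("output already exists: " + outputPath);
        }
    }

    // Contract 2: provide a writer. Records land in a staging location, not the final path.
    List<String> getRecordWriter(String outputPath) {
        return sink.computeIfAbsent(staging(outputPath), p -> new ArrayList<>());
    }

    // Contract 3 (committer role): promote staged output once the task succeeds...
    void commit(String outputPath) {
        List<String> staged = sink.remove(staging(outputPath));
        if (staged != null) {
            sink.put(outputPath, staged);
        }
    }

    // ...or discard it if the task fails.
    void abort(String outputPath) {
        sink.remove(staging(outputPath));
    }

    boolean exists(String path) { return sink.containsKey(path); }
    List<String> read(String path) { return sink.get(path); }

    private static String staging(String outputPath) { return outputPath + "/_staging"; }
}
```

Because nothing appears at the final path until `commit` runs, a failed or abandoned task leaves no partial output behind.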
• You’ll use the RecordWriter to write the
reducer outputs to the destination data sink.
• It’s a simple class: it writes a key/value pair
and closes the sink when the task is done.
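
To show how little a writer needs, here is a hedged sketch with one `write` method and one `close` method, the same two operations Hadoop's `RecordWriter` declares. `TabSeparatedWriter` and `ReducerOutputSketch` are hypothetical names, and the key, a tab, the value, then a newline layout is an assumption chosen for illustration.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical writer: emits one "key<TAB>value" line per record.
final class TabSeparatedWriter<K, V> implements AutoCloseable {
    private final Writer out;

    TabSeparatedWriter(Writer out) { this.out = out; }

    // Writes a single reducer output record to the sink.
    void write(K key, V value) throws IOException {
        out.write(String.valueOf(key));
        out.write('\t');
        out.write(String.valueOf(value));
        out.write('\n');
    }

    // Releases the underlying sink once the task has finished writing.
    @Override
    public void close() throws IOException {
        out.close();
    }
}

final class ReducerOutputSketch {
    // Writes two sample records into an in-memory buffer and returns the result.
    static String demo() {
        StringWriter buffer = new StringWriter();
        try (TabSeparatedWriter<String, Integer> writer = new TabSeparatedWriter<>(buffer)) {
            writer.write("apple", 3);
            writer.write("pear", 1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }
}
```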