Advance MapReduce Concepts - Module 4
Counters
• Counters provide a way to measure progress or to count the operations
that occur within a MapReduce program.
Built-in counter groups
• Task counters
• File system counters
• FileInputFormat counters
• FileOutputFormat counters
Creating Custom Counters
• The first step is to create an enum that contains the names of all custom
counters for a particular job.
    public enum CustomCounters { VALID, INVALID }
• Inside the map or reduce task, the counter can be incremented:
    if (validRecord) {
        context.getCounter(CustomCounters.VALID).increment(1);   // increase the counter by 1
    } else if (invalidRecord) {
        context.getCounter(CustomCounters.INVALID).increment(1); // increase the counter by 1
    }
• The custom counter values are displayed alongside the built-in counter
values on the job's summary web page, viewed through the JobTracker.
• The values can also be read programmatically once the job has finished:
    long validCounterValue = job.getCounters().findCounter(CustomCounters.VALID).getValue();
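The enum-keyed counting pattern can be tried without a cluster. The following is a minimal sketch that uses only the Java standard library: an EnumMap stands in for Hadoop's counter store, and the class CounterSketch, the method countRecords, and the "three comma-separated fields" validity rule are assumptions made for illustration, not part of the Hadoop API.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Stand-in for Hadoop's counter mechanism: an enum names the counters,
// and an EnumMap holds one running total per name.
class CounterSketch {
    enum CustomCounters { VALID, INVALID }

    // Hypothetical validity rule: a record is valid when it has
    // exactly three comma-separated fields.
    static boolean isValid(String record) {
        return record.split(",", -1).length == 3;
    }

    // Plays the role of the map task: classify each record and bump a counter.
    static Map<CustomCounters, Long> countRecords(List<String> records) {
        Map<CustomCounters, Long> counters = new EnumMap<>(CustomCounters.class);
        for (CustomCounters c : CustomCounters.values()) {
            counters.put(c, 0L); // start every counter at zero
        }
        for (String record : records) {
            CustomCounters key = isValid(record) ? CustomCounters.VALID : CustomCounters.INVALID;
            counters.merge(key, 1L, Long::sum);
        }
        return counters;
    }

    public static void main(String[] args) {
        Map<CustomCounters, Long> counters =
                countRecords(List.of("a,b,c", "broken", "1,2,3", "x,y"));
        System.out.println(counters); // prints {VALID=2, INVALID=2}
    }
}
```

In a real job the framework aggregates the per-task totals across all tasks; this sketch only shows the single-process bookkeeping.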
Serialization
• Serialization is the process of turning structured objects into a byte
stream for transmission over a network or for writing to persistent
storage.
• Deserialization is the reverse process of turning a byte stream back
into a series of structured objects.
• Hadoop uses its own serialization format, Writables, which is certainly
compact and fast, but not so easy to extend or use from languages
other than Java.
The Writable Interface
• The Writable interface defines two methods:
    • write(), for writing its state to a DataOutput binary stream
    • readFields(), for reading its state from a DataInput binary stream
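The two-method contract can be exercised with plain java.io streams. This is a sketch under stated assumptions: SimpleWritable is a local stand-in that mirrors the shape of Hadoop's Writable so the example runs without Hadoop on the classpath, and PointWritable and WritableRoundTrip are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in with the same two-method shape as Hadoop's Writable.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical value type: a point with two int fields.
class PointWritable implements SimpleWritable {
    int x;
    int y;

    PointWritable() { }
    PointWritable(int x, int y) { this.x = x; this.y = y; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields must be read in the same order they were written.
        x = in.readInt();
        y = in.readInt();
    }
}

class WritableRoundTrip {
    // Serialize an object into an in-memory byte array.
    static byte[] serialize(SimpleWritable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Overwrite the state of an existing object from a byte array.
    static void deserialize(SimpleWritable w, byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            w.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new PointWritable(3, -7));
        PointWritable copy = new PointWritable();
        deserialize(copy, data);
        System.out.println(data.length + " bytes -> (" + copy.x + ", " + copy.y + ")");
        // prints 8 bytes -> (3, -7)
    }
}
```

Note that readFields() fills in an existing instance rather than returning a new one, which is why the copy is created with a no-argument constructor first.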
[Figure: Writable class hierarchy]
Custom Writables Example
    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;

    // A pair of Text values that can be used as a MapReduce key.
    public class TextPair implements WritableComparable<TextPair> {
        private Text first;
        private Text second;

        // No-arg constructor: needed so an empty instance can be created
        // before readFields() populates it.
        public TextPair() { set(new Text(), new Text()); }
        public TextPair(String first, String second) { set(new Text(first), new Text(second)); }
        public TextPair(Text first, Text second) { set(first, second); }

        public void set(Text first, Text second) {
            this.first = first;
            this.second = second;
        }
        public Text getFirst() { return first; }
        public Text getSecond() { return second; }

        // Serialize both fields, first then second.
        @Override
        public void write(DataOutput out) throws IOException {
            first.write(out);
            second.write(out);
        }
        // Deserialize in the same order used by write().
        @Override
        public void readFields(DataInput in) throws IOException {
            first.readFields(in);
            second.readFields(in);
        }
        @Override
        public int hashCode() { return first.hashCode() * 163 + second.hashCode(); }
        @Override
        public boolean equals(Object o) {
            if (o instanceof TextPair) {
                TextPair tp = (TextPair) o;
                return first.equals(tp.first) && second.equals(tp.second);
            }
            return false;
        }
        // Tab-separated text form.
        @Override
        public String toString() { return first + "\t" + second; }
        // Order by the first field, then by the second.
        @Override
        public int compareTo(TextPair tp) {
            int cmp = first.compareTo(tp.first);
            if (cmp != 0) {
                return cmp;
            }
            return second.compareTo(tp.second);
        }
    }
Error Handling
• Non-fatal errors that need to be tracked can be recorded with a counter.
• In the mapper:
    if (some_error_condition) {
        context.getCounter(COUNTER_GROUP, COUNTER).increment(1);
    }
• In the client:
    boolean okay = job.waitForCompletion(true);
    if (okay) {
        Counters counters = job.getCounters();
        Counter bwc = counters.findCounter(COUNTER_GROUP, COUNTER);
        System.out.println("Errors: " + bwc.getDisplayName() + ": " + bwc.getValue());
    }
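The same count-and-continue idea can be sketched in plain Java, assuming a bad record shows up as a parse exception. Everything here is hypothetical scaffolding rather than Hadoop API: the class ErrorTrackingSketch, its string-keyed counter map, and the group and counter names only imitate the (group, name) lookup used above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ErrorTrackingSketch {
    static final String COUNTER_GROUP = "Errors";      // hypothetical group name
    static final String COUNTER = "MALFORMED_NUMBER";  // hypothetical counter name

    // Counters keyed by "group:name", imitating a (group, name) lookup.
    final Map<String, Long> counters = new HashMap<>();
    long sum = 0;

    void increment(String group, String name) {
        counters.merge(group + ":" + name, 1L, Long::sum);
    }

    long value(String group, String name) {
        return counters.getOrDefault(group + ":" + name, 0L);
    }

    // Plays the role of map(): a bad record is counted and skipped
    // instead of failing the whole task.
    void map(String line) {
        try {
            sum += Long.parseLong(line.trim());
        } catch (NumberFormatException e) {
            increment(COUNTER_GROUP, COUNTER);
        }
    }

    public static void main(String[] args) {
        ErrorTrackingSketch job = new ErrorTrackingSketch();
        for (String line : List.of("10", "oops", " 32 ", "")) {
            job.map(line);
        }
        // Plays the role of the client: inspect the counter after the run.
        System.out.println("sum=" + job.sum
                + ", errors=" + job.value(COUNTER_GROUP, COUNTER));
        // prints sum=42, errors=2
    }
}
```

A client could compare the error count against a threshold of its own choosing to decide whether the output is still usable.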
Compression
• Compression reduces the space needed to store files.
• It also speeds up data transfer across the network and to or from disk.
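The space saving is easy to observe with gzip from the Java standard library. This is an illustrative sketch only: GzipSketch and the sample log line are assumptions, the code uses java.util.zip directly rather than Hadoop's codec classes, and the ratio achieved depends heavily on how repetitive the data is.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipSketch {
    // Compress a byte array in memory with gzip.
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // stream is closed, so the gzip trailer is written
    }

    // Restore the original bytes from gzip-compressed data.
    static byte[] decompress(byte[] packed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            return gzip.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Highly repetitive sample data, similar in spirit to log records.
        byte[] raw = "2016-10-01,INFO,record processed\n"
                .repeat(1000)
                .getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println(raw.length + " bytes raw, " + packed.length + " bytes gzipped");
    }
}
```

Fewer bytes on disk also means fewer bytes to move between nodes, which is where the transfer speed-up comes from; the trade-off is the CPU time spent compressing and decompressing.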
Tuning
Map Side Tuning Properties
Reduce Side Tuning Properties
