Unit 3 writable collections
Writable collections
There are six Writable collection types in the org.apache.hadoop.io
package:
1. ArrayWritable
2. ArrayPrimitiveWritable
3. TwoDArrayWritable
4. MapWritable
5. SortedMapWritable
6. EnumSetWritable
ArrayWritable and TwoDArrayWritable are Writable implementations for arrays and
two-dimensional arrays (arrays of arrays) of Writable instances. All the elements of an
ArrayWritable or a TwoDArrayWritable must be instances of the same class, which is
specified at construction, as follows:

    ArrayWritable writable = new ArrayWritable(Text.class);

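In contexts where the Writable is defined by type, such as in SequenceFile
keys or values, or as input to MapReduce in general, the framework creates
instances through a no-argument constructor and so cannot pass the element
class. You therefore need to subclass ArrayWritable (or TwoDArrayWritable,
as appropriate) to set the type statically. For example:

    import org.apache.hadoop.io.ArrayWritable;
    import org.apache.hadoop.io.Text;

    public class TextArrayWritable extends ArrayWritable {
        public TextArrayWritable() {
            super(Text.class); // fixes the element type for every instance
        }
    }
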
ArrayWritable and TwoDArrayWritable both have get() and set()
methods, as well as a toArray() method, which creates a shallow
copy of the array.
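Why the element class has to be fixed in advance becomes clear from the serialized form: ArrayWritable writes the array length followed by each element's own bytes, with no per-element type information, so on the reading side the class to instantiate must already be known. The sketch below imitates that layout in plain Java, without a Hadoop dependency. SimpleWritable, StringWritable, SimpleArrayWritable, StringArrayWritable, and ArrayWritableDemo are hypothetical stand-ins (for Writable, Text, ArrayWritable, and TextArrayWritable), and writeUTF replaces Text's real encoding.

```java
import java.io.*;
import java.util.Arrays;

// Hypothetical stand-in for org.apache.hadoop.io.Writable
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical stand-in for Text (writeUTF is NOT Text's real encoding)
class StringWritable implements SimpleWritable {
    String value = "";
    public StringWritable() {}
    StringWritable(String value) { this.value = value; }
    public void write(DataOutput out) throws IOException { out.writeUTF(value); }
    public void readFields(DataInput in) throws IOException { value = in.readUTF(); }
    @Override public String toString() { return value; }
}

// Mimics ArrayWritable's layout: an int length, then each element's bytes.
// No class name is written, so the element class must be known in advance.
class SimpleArrayWritable implements SimpleWritable {
    private final Class<? extends SimpleWritable> valueClass;
    private SimpleWritable[] values = new SimpleWritable[0];

    SimpleArrayWritable(Class<? extends SimpleWritable> valueClass) {
        this.valueClass = valueClass;
    }

    void set(SimpleWritable[] values) { this.values = values; }
    SimpleWritable[] get() { return values; }

    public void write(DataOutput out) throws IOException {
        out.writeInt(values.length);
        for (SimpleWritable v : values) {
            v.write(out);
        }
    }

    public void readFields(DataInput in) throws IOException {
        values = new SimpleWritable[in.readInt()];
        for (int i = 0; i < values.length; i++) {
            try {
                // Create the statically known element class, then let it read itself
                values[i] = valueClass.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IOException(e);
            }
            values[i].readFields(in);
        }
    }
}

// Like TextArrayWritable: a no-arg constructor that fixes the element type
class StringArrayWritable extends SimpleArrayWritable {
    StringArrayWritable() { super(StringWritable.class); }
}

class ArrayWritableDemo {
    // Serialize, then deserialize into a fresh instance made with the no-arg constructor
    static String[] roundTrip(String... items) {
        try {
            SimpleWritable[] elems = new SimpleWritable[items.length];
            for (int i = 0; i < items.length; i++) elems[i] = new StringWritable(items[i]);
            StringArrayWritable src = new StringArrayWritable();
            src.set(elems);

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            src.write(new DataOutputStream(bytes));

            StringArrayWritable dest = new StringArrayWritable();
            dest.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return Arrays.stream(dest.get()).map(Object::toString).toArray(String[]::new);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(roundTrip("cat", "dog")));
    }
}
```

Because StringArrayWritable fixes its element type in a no-argument constructor, a reader can be created from the class alone, which mirrors why TextArrayWritable is needed for SequenceFile keys and values.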
ArrayPrimitiveWritable is a wrapper for arrays of Java primitives. The
component type is detected when you call set(), so there is no need to subclass
to set the type. MapWritable and SortedMapWritable are implementations of
java.util.Map and java.util.SortedMap, respectively. Here’s a demonstration of
using a MapWritable with different types for keys and values:
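
    import static org.hamcrest.CoreMatchers.is;
    import static org.hamcrest.MatcherAssert.assertThat;
    import org.apache.hadoop.io.*;
    import org.junit.Test;

    public class MapWritableTest {
        @Test
        public void mapWritableWithMixedTypes() throws Exception {
            MapWritable src = new MapWritable();
            src.put(new IntWritable(1), new Text("cat"));
            src.put(new VIntWritable(2), new LongWritable(163));
            MapWritable dest = new MapWritable();
            WritableUtils.cloneInto(dest, src); // copy via a serialization round trip
            assertThat((Text) dest.get(new IntWritable(1)), is(new Text("cat")));
            assertThat((LongWritable) dest.get(new VIntWritable(2)), is(new LongWritable(163)));
        }
    }
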
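MapWritable, by contrast, can mix key and value types without subclassing because the type information travels with the data: for every entry it writes a one-byte class ID before the key and another before the value. The plain-Java sketch below imitates that idea for a map like the one in the demonstration above. TaggedMapCodec and its fixed tag table (0 for Integer, 1 for String, 2 for Long) are hypothetical and much simpler than Hadoop's actual class-ID handling.

```java
import java.io.*;
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java sketch: each key and each value is preceded by a one-byte type tag,
// so entries of different types can share one map. The tag values are hypothetical.
class TaggedMapCodec {
    private static final byte INT = 0, STRING = 1, LONG = 2;

    static void writeTagged(DataOutput out, Object o) throws IOException {
        if (o instanceof Integer) {
            out.writeByte(INT);
            out.writeInt((Integer) o);
        } else if (o instanceof String) {
            out.writeByte(STRING);
            out.writeUTF((String) o);
        } else if (o instanceof Long) {
            out.writeByte(LONG);
            out.writeLong((Long) o);
        } else {
            throw new IOException("unsupported type: " + o.getClass());
        }
    }

    static Object readTagged(DataInput in) throws IOException {
        byte tag = in.readByte(); // the tag tells the reader which type follows
        switch (tag) {
            case INT: return in.readInt();
            case STRING: return in.readUTF();
            case LONG: return in.readLong();
            default: throw new IOException("unknown tag: " + tag);
        }
    }

    static byte[] serialize(Map<Object, Object> map) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(map.size());            // entry count first
            for (Map.Entry<Object, Object> e : map.entrySet()) {
                writeTagged(out, e.getKey());    // tag + key bytes
                writeTagged(out, e.getValue());  // tag + value bytes
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Map<Object, Object> deserialize(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            Map<Object, Object> map = new LinkedHashMap<>();
            int size = in.readInt();
            for (int i = 0; i < size; i++) {
                Object key = readTagged(in);
                map.put(key, readTagged(in));
            }
            return map;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<Object, Object> src = new LinkedHashMap<>();
        src.put(1, "cat");   // Integer key, String value
        src.put(2L, 163L);   // Long key, Long value
        System.out.println(deserialize(serialize(src)));
    }
}
```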
There are no general-purpose Writable collection implementations for sets
and lists. A general set can be emulated by using a MapWritable
(or a SortedMapWritable for a sorted set) with
NullWritable values. There is also EnumSetWritable for
sets of enum types. For lists of a single type of Writable,
ArrayWritable is adequate, but to store different types of
Writable in a single list, you can use GenericWritable to
wrap the elements in an ArrayWritable.
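GenericWritable handles mixed types with a fixed, subclass-declared list of permitted classes: when written, it emits one byte holding the index of the wrapped object's class in that list, followed by the object's own bytes. The plain-Java sketch below mimics that design and wraps each element of a length-prefixed list, which is how an ArrayWritable of GenericWritable instances would be laid out. Rec, IntRec, StrRec, GenericRec, MixedRec, and MixedListDemo are hypothetical names, not Hadoop classes.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.hadoop.io.Writable
interface Rec {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

class IntRec implements Rec {
    int value;
    public IntRec() {}
    IntRec(int value) { this.value = value; }
    public void write(DataOutput out) throws IOException { out.writeInt(value); }
    public void readFields(DataInput in) throws IOException { value = in.readInt(); }
    @Override public String toString() { return Integer.toString(value); }
}

class StrRec implements Rec {
    String value = "";
    public StrRec() {}
    StrRec(String value) { this.value = value; }
    public void write(DataOutput out) throws IOException { out.writeUTF(value); }
    public void readFields(DataInput in) throws IOException { value = in.readUTF(); }
    @Override public String toString() { return value; }
}

// Mimics GenericWritable: subclasses declare the permitted types, and the
// wrapper writes one byte (an index into that array) before the wrapped object.
abstract class GenericRec implements Rec {
    private byte type = -1;
    private Rec instance;
    protected abstract Class<? extends Rec>[] getTypes();

    void set(Rec obj) {
        Class<? extends Rec>[] types = getTypes();
        for (int i = 0; i < types.length; i++) {
            if (types[i].equals(obj.getClass())) {
                type = (byte) i;
                instance = obj;
                return;
            }
        }
        throw new IllegalArgumentException("type not registered: " + obj.getClass());
    }

    Rec get() { return instance; }

    public void write(DataOutput out) throws IOException {
        if (instance == null) throw new IOException("nothing has been set");
        out.writeByte(type);
        instance.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        type = in.readByte();
        try {
            // Look up the class by index and create it via its no-arg constructor
            instance = getTypes()[type & 0xff].getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IOException(e);
        }
        instance.readFields(in);
    }
}

// Hypothetical concrete wrapper that permits IntRec and StrRec
class MixedRec extends GenericRec {
    @SuppressWarnings("unchecked")
    private static final Class<? extends Rec>[] TYPES = new Class[] {IntRec.class, StrRec.class};
    @Override protected Class<? extends Rec>[] getTypes() { return TYPES; }
}

class MixedListDemo {
    // Writes a length-prefixed list of wrapped elements, then reads it back
    static List<String> roundTrip(Rec... items) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(items.length);
            for (Rec item : items) {
                MixedRec wrapper = new MixedRec();
                wrapper.set(item);
                wrapper.write(out);
            }
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            List<String> result = new ArrayList<>();
            for (int n = in.readInt(); n > 0; n--) {
                MixedRec wrapper = new MixedRec();
                wrapper.readFields(in);
                result.add(wrapper.get().getClass().getSimpleName() + ":" + wrapper.get());
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new IntRec(1), new StrRec("cat")));
    }
}
```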
Hadoop comes with a useful set of Writable implementations that
serve most purposes; however, on occasion, you may need to write
your own custom implementation. With a custom Writable, you have
full control over the binary representation and the sort order.
Because Writables are at the heart of the MapReduce data path,
tuning the binary representation can have a significant effect on
performance. To demonstrate how to create a custom Writable, we
shall write an implementation that represents a pair of strings, called
TextPair. The basic implementation is shown in the following example.
Implementing a Custom Writable

    import java.io.*;

    import org.apache.hadoop.io.*;

    public class TextPair implements WritableComparable<TextPair> {

        private Text first;
        private Text second;

        // Default constructor: required so the framework can instantiate, then call readFields()
        public TextPair() {
            set(new Text(), new Text());
        }

        public TextPair(String first, String second) {
            set(new Text(first), new Text(second));
        }

        public TextPair(Text first, Text second) {
            set(first, second);
        }

        public void set(Text first, Text second) {
            this.first = first;
            this.second = second;
        }

        public Text getFirst() {
            return first;
        }

        public Text getSecond() {
            return second;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            // Delegate serialization to each Text field in turn
            first.write(out);
            second.write(out);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Read the fields back in the same order write() wrote them
            first.readFields(in);
            second.readFields(in);
        }

        @Override
        public int hashCode() {
            return first.hashCode() * 163 + second.hashCode();
        }

        @Override
        public boolean equals(Object o) {
            if (o instanceof TextPair) {
                TextPair tp = (TextPair) o;
                return first.equals(tp.first) && second.equals(tp.second);
            }
            return false;
        }

        @Override
        public String toString() {
            // Tab-separated, for use with TextOutputFormat
            return first + "\t" + second;
        }

        @Override
        public int compareTo(TextPair tp) {
            // Sort by the first string, then break ties on the second
            int cmp = first.compareTo(tp.first);
            if (cmp != 0) {
                return cmp;
            }
            return second.compareTo(tp.second);
        }
    }

A Writable implementation that stores a pair of Text objects
The first part of the implementation is straightforward: there are two Text instance
variables, first and second, and associated constructors, getters, and setters. All
Writable implementations must have a default constructor so that the MapReduce
framework can instantiate them and then populate their fields by calling readFields().
TextPair’s write() method serializes each Text object in turn to the output stream by
delegating to the Text objects themselves. Similarly, readFields() deserializes the
bytes from the input stream by delegating to each Text object, reading the fields back
in the same order in which write() wrote them. The DataOutput and DataInput interfaces
have a rich set of methods for serializing and deserializing Java primitives.
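The write()/readFields() contract and the role of the default constructor can be exercised without Hadoop. In the sketch below, StringPair is a hypothetical plain-Java analogue of TextPair that uses writeUTF in place of Text's own serialization, and the hypothetical PairSerde.deserialize() follows the create-then-populate pattern described above: it calls the no-argument constructor reflectively and then readFields().

```java
import java.io.*;

// Plain-Java analogue of TextPair; writeUTF stands in for Text's serialization
class StringPair {
    private String first = "";
    private String second = "";

    public StringPair() {}  // lets a framework create an empty instance

    StringPair(String first, String second) {
        this.first = first;
        this.second = second;
    }

    String getFirst() { return first; }
    String getSecond() { return second; }

    void write(DataOutput out) throws IOException {
        out.writeUTF(first);   // fields are written in a fixed order...
        out.writeUTF(second);
    }

    void readFields(DataInput in) throws IOException {
        first = in.readUTF();  // ...and read back in exactly the same order
        second = in.readUTF();
    }
}

class PairSerde {
    static byte[] serialize(StringPair pair) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            pair.write(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Default-construct, then populate via readFields(), as the framework does
    static StringPair deserialize(byte[] data) {
        try {
            StringPair pair = StringPair.class.getDeclaredConstructor().newInstance();
            pair.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return pair;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        StringPair p = deserialize(serialize(new StringPair("hello", "world")));
        System.out.println(p.getFirst() + " / " + p.getSecond());
    }
}
```

Each writeUTF call emits a two-byte length followed by the string's modified UTF-8 bytes, so the pair ("a", "b") serializes to six bytes.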
Just as you would for any value object you write in Java, you should override the
hashCode(), equals(), and toString() methods from java.lang.Object. The hashCode()
method is used by the HashPartitioner (the default partitioner in MapReduce) to
choose a reduce partition, so you should make sure that you write a good hash
function that mixes well to ensure reduce partitions are of a similar size.
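HashPartitioner chooses a partition as (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, masking the sign bit so that a negative hash cannot produce a negative partition number. The plain-Java sketch below applies that formula to TextPair-style hashes, with String.hashCode() standing in for Text.hashCode() (the two differ, so the actual partition numbers will not match Hadoop's). It contrasts the 163-multiplier combination with a hypothetical hash that ignores the second field: when every key shares the same first string, the poor hash sends every record to one reducer. PartitionDemo and its methods are hypothetical names.

```java
import java.util.Arrays;

class PartitionDemo {
    // Same formula as Hadoop's HashPartitioner: clear the sign bit, then take the modulus
    static int partition(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    // TextPair-style combination; String.hashCode() stands in for Text.hashCode()
    static int pairHash(String first, String second) {
        return first.hashCode() * 163 + second.hashCode();
    }

    // A deliberately poor hash that ignores the second field
    static int firstOnlyHash(String first, String second) {
        return first.hashCode();
    }

    // Counts keys per partition when every key shares the same first field
    static int[] histogram(boolean useBoth, int numKeys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (int i = 0; i < numKeys; i++) {
            String first = "station-1";
            String second = "reading-" + i;
            int h = useBoth ? pairHash(first, second) : firstOnlyHash(first, second);
            counts[partition(h, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("pair hash:  " + Arrays.toString(histogram(true, 1000, 4)));
        System.out.println("first only: " + Arrays.toString(histogram(false, 1000, 4)));
    }
}
```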
If you plan to use your custom Writable with TextOutputFormat, you must override its
toString() method, because TextOutputFormat calls toString() on keys and values to
produce their output representation. For TextPair, we write the underlying Text objects
as strings separated by a tab character.
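As a rough plain-Java illustration of the resulting output, each TextOutputFormat record becomes the key's toString(), a separator (a tab by default), the value's toString(), and a newline. With a TextPair key, each output line therefore has three tab-separated columns. TextLineDemo, Pair, and formatLine are hypothetical names, and the sketch ignores TextOutputFormat's special handling of null and NullWritable keys or values.

```java
class TextLineDemo {
    // Hypothetical minimal pair whose toString() matches TextPair's tab-separated form
    static final class Pair {
        final String first;
        final String second;

        Pair(String first, String second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public String toString() {
            return first + "\t" + second;
        }
    }

    // Sketch of one output record: key, tab, value, newline
    static String formatLine(Object key, Object value) {
        return key + "\t" + value + "\n";
    }

    public static void main(String[] args) {
        System.out.print(formatLine(new Pair("2024", "station-1"), 42));
    }
}
```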
TextPair is an implementation of WritableComparable, so it provides an
implementation of the compareTo() method that imposes the ordering you would
expect: it sorts by the first string followed by the second.
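The ordering that TextPair.compareTo() imposes can be checked with a plain-Java analogue. OrderedPair is hypothetical, and String.compareTo() stands in for Text.compareTo(). The two agree for ASCII text, but Text compares raw UTF-8 bytes, so results can differ for some non-ASCII strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java analogue of TextPair.compareTo(): order by first, break ties on second
class OrderedPair implements Comparable<OrderedPair> {
    final String first;
    final String second;

    OrderedPair(String first, String second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public int compareTo(OrderedPair other) {
        int cmp = first.compareTo(other.first);
        if (cmp != 0) {
            return cmp;
        }
        return second.compareTo(other.second);
    }

    // Sorts the pairs and renders each as "first,second" for easy inspection
    static List<String> sorted(OrderedPair... pairs) {
        List<OrderedPair> list = new ArrayList<>(Arrays.asList(pairs));
        Collections.sort(list);
        List<String> out = new ArrayList<>();
        for (OrderedPair p : list) {
            out.add(p.first + "," + p.second);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sorted(new OrderedPair("b", "a"), new OrderedPair("a", "z"), new OrderedPair("a", "b")));
    }
}
```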