ANATOMY OF FILE READ IN HADOOP
RAJESH_1290K@YAHOO.COM
A SAMPLE HADOOP CLUSTER
[Diagram: data center D1 containing a Name Node and two racks, R1 (nodes R1N1–R1N4) and R2 (nodes R2N1–R2N4).]

1. This is a Hadoop cluster with one name node and two racks, R1 and R2, in a data center D1.
Each rack has 4 nodes, uniquely identified as R1N1, R1N2, and so on.
2. The replication factor is 3.
3. The HDFS block size is 64 MB (see the configuration sketch below for the corresponding property names).
4. This cluster is used as the running example to explain the concepts.
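These two settings correspond to standard HDFS configuration properties. Below is a minimal, illustrative Java sketch that ties the example cluster's numbers to those property names; in a real cluster they are set in hdfs-site.xml, and the block-size property is dfs.block.size in older Hadoop releases (dfs.blocksize in newer ones).

import org.apache.hadoop.conf.Configuration;

public class ExampleClusterSettings {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Normally configured in hdfs-site.xml; set here only to show the mapping
        // between the example cluster's numbers and their HDFS property names.
        conf.setInt("dfs.replication", 3);                  // replication factor 3
        conf.setLong("dfs.block.size", 64L * 1024 * 1024);  // 64 MB block size ("dfs.blocksize" in Hadoop 2+)
        System.out.println("replication=" + conf.get("dfs.replication")
                + ", blockSize=" + conf.get("dfs.block.size"));
    }
}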
FACTS TO BE KNOWN
1. The name node saves part of the HDFS metadata, such as file locations and permissions, in
files called the namespace image and the edit log. Files are stored in HDFS as blocks, but
the block locations are not persisted in any file. Instead, they are gathered from the data
nodes every time the cluster is started, and this information is kept in the name node’s memory.
2. Replica placement: assuming a replication factor of 3, when a file is written from a data
node (say R1N1), Hadoop attempts to save the first replica on that same data node (R1N1).
The second replica is written to a node (R2N2) in a different rack (R2). The third replica is
written to another node (R2N1) in the same rack (R2) where the second replica was saved.
3. Hadoop takes a simple approach in which the network is represented as a tree and the
distance between two nodes is the sum of their distances to their closest common ancestor.
The levels of the tree are “Data Center” > “Rack” > “Node”. For example, ‘/d1/r1/n1’
represents a node named n1 on rack r1 in data center d1. Distance calculation has 4 possible
scenarios, listed below (a small sketch of the calculation follows the list):

1. distance(/d1/r1/n1, /d1/r1/n1) = 0 [processes on the same node]
2. distance(/d1/r1/n1, /d1/r1/n2) = 2 [different nodes on the same rack]
3. distance(/d1/r1/n1, /d1/r2/n3) = 4 [nodes on different racks in the same data center]
4. distance(/d1/r1/n1, /d2/r3/n4) = 6 [nodes in different data centers]
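To make the tree-distance rule concrete, here is a small, purely illustrative Java sketch (not Hadoop's actual NetworkTopology implementation) that computes the distance for '/datacenter/rack/node' location strings; the distance() helper is introduced only for this example and assumes both locations have the same depth.

public class TopologyDistance {

    // Distance = levels from a up to the closest common ancestor
    //          + levels from b up to the closest common ancestor.
    static int distance(String a, String b) {
        String[] pa = a.split("/");   // "/d1/r1/n1" -> ["", "d1", "r1", "n1"]
        String[] pb = b.split("/");
        int levels = pa.length - 1;   // data center, rack, node -> 3 levels
        int common = 0;               // count of matching leading components
        while (common + 1 < pa.length && common + 1 < pb.length
                && pa[common + 1].equals(pb[common + 1])) {
            common++;
        }
        return (levels - common) * 2;
    }

    public static void main(String[] args) {
        System.out.println(distance("/d1/r1/n1", "/d1/r1/n1")); // 0
        System.out.println(distance("/d1/r1/n1", "/d1/r1/n2")); // 2
        System.out.println(distance("/d1/r1/n1", "/d1/r2/n3")); // 4
        System.out.println(distance("/d1/r1/n1", "/d2/r3/n4")); // 6
    }
}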
ANATOMY OF FILE READ – HAPPY PATH
[Diagram: placement of the blocks of sfo_crimes.csv across the cluster, with the name node metadata showing B1 on (R1N1, R2N1, R2N2), B2 on (R1N1, R2N3, R2N4), and B3 on (R1N1, R2N1, R2N3).]

Let’s assume a file named “sfo_crimes.csv” of size 192 MB is saved in this cluster.
Also assume that the file was written from node R1N1.
The metadata is written to the name node.
The file is split into 3 blocks of 64 MB each, and each block is replicated 3 times in the cluster (see the short calculation below).
Along with the data, a checksum is saved with each block. It is used to ensure that the data read from the block is read without error.
When the cluster is started, the metadata looks as shown in the diagram above.
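As a quick check of the numbers above, here is a tiny illustrative Java sketch of how the block count follows from the file size and block size (192 MB and 64 MB are the assumptions of this example):

public class BlockCountExample {
    public static void main(String[] args) {
        long fileSize    = 192L * 1024 * 1024;                 // 192 MB file
        long blockSize   = 64L * 1024 * 1024;                  // 64 MB HDFS block size
        int  replication = 3;
        long blocks = (fileSize + blockSize - 1) / blockSize;  // ceiling division -> 3 blocks
        System.out.println(blocks + " blocks, " + blocks * replication + " block replicas in the cluster");
    }
}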
[Diagram: the HDFS client, running in R1N2's JVM, calls open() on DistributedFileSystem, which makes an RPC call to the name node to get the first few blocks of the file. The name node metadata lists B1 (R1N1, R2N1, R2N2), B2 (R1N1, R2N3, R2N4), and B3 (R1N1, R2N1, R2N3). The block locations are stored in a DFSInputStream, which is wrapped in the FSDataInputStream handed back to the client.]
• When the cluster is up and running, the name node's metadata looks as shown in the diagram above.
• Let’s say we are trying to read the “sfo_crimes.csv” file from R1N2.
• So an HDFS client program will run on R1N2’s JVM.
• First the HDFS client program calls the method open() on the Java class DistributedFileSystem (a subclass of FileSystem).
• DistributedFileSystem makes an RPC call to the name node, which returns the first few blocks of the file. For each block, the name node returns the data node addresses ORDERED by distance from the node where the read is being performed.
• The block information is saved in a DFSInputStream, which is wrapped in an FSDataInputStream.
• In response to ‘FileSystem.open()’, the HDFS client receives this FSDataInputStream (a minimal code sketch follows).
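A minimal sketch of this step using the standard Hadoop Java API; the path '/sfo_crimes.csv' is just the example file from these slides, and the Configuration is assumed to point at this cluster:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OpenExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);       // a DistributedFileSystem when the default FS is HDFS
        // open() makes the RPC call to the name node for the first few block locations
        FSDataInputStream in = fs.open(new Path("/sfo_crimes.csv"));
        // ... read from 'in' (next slides), then close it
        in.close();
    }
}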
[Diagram: the HDFS client on R1N2 calls read() on the FSDataInputStream; the DFSInputStream connects to R1N1 to read block B1, and the data is streamed to the client directly from the data node.]

• From now on the HDFS client deals with the FSDataInputStream (FSDIS), which wraps the DFSInputStream (DFSIS).
• The HDFS client invokes read() on the stream.
• Blocks are read in order. DFSIS connects to the closest node (R1N1) to read block B1.
• Data is streamed from the data node to the client, which calls read() repeatedly on the stream. DFSIS verifies checksums for the data transferred to the client.
• When the block has been read completely, DFSIS closes the connection (a minimal read-loop sketch follows).
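A minimal sketch of the read loop using the standard Hadoop Java API; IOUtils.copyBytes simply calls read() on the stream in a loop, and System.out is used here only as an example destination:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataInputStream in = null;
        try {
            in = fs.open(new Path("/sfo_crimes.csv"));
            // Bytes are streamed directly from the closest data node holding each block;
            // checksums are verified by the stream as the data arrives.
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);   // the client calls close() when done
        }
    }
}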
[Diagram: the client keeps calling read(); the DFSInputStream closes the connection for B1 and connects to R1N1 to read block B2, with data again streamed to the client directly from the data node.]

• Next, DFSIS attempts to read block B2. As mentioned earlier, the previous connection is closed and a fresh connection is made to the closest node (R1N1) holding block B2.
[Diagram: the DFSInputStream makes a second RPC call to the name node, which returns B3 (R1N1, R2N1, R2N3); DFSIS connects to R1N1 to read block B3, and the client finally calls close() on the FSDataInputStream.]

• Now DFSIS has read all the blocks returned by the first RPC call (B1 and B2), but the file has not been read completely; in our case there is one more block to read.
• DFSIS calls the name node to get the data node locations for the next batch of blocks, as needed.
• After the complete file has been read, the HDFS client calls close() on the stream.
ANATOMY OF FILE READ – DATA NODE CONNECTION ERROR
[Diagram: while DFSIS is connecting to R1N1 to read block B2, the connection fails; DFSIS falls back to R2N3, and data is streamed to the client directly from that data node.]

• Let’s say there is an error while connecting to R1N1.
• DFSIS remembers this, so it won’t try to read from R1N1 for future blocks. It then tries to connect to the next closest node (R2N3), as sketched below.
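This fallback behaviour can be pictured with a small, purely illustrative sketch; the class and method names below are hypothetical and only mimic the idea of DFSInputStream remembering failed data nodes and choosing the next closest replica:

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReplicaFailoverSketch {
    // Hypothetical helpers; the real logic lives inside DFSInputStream.
    private final Set<String> deadNodes = new HashSet<>();

    String chooseDataNode(List<String> replicasOrderedByDistance) {
        for (String node : replicasOrderedByDistance) {
            if (!deadNodes.contains(node)) {
                return node;                       // closest replica not known to be bad
            }
        }
        throw new IllegalStateException("no live replica for this block");
    }

    void onConnectionError(String node) {
        deadNodes.add(node);                       // skip this node for future blocks too
    }

    public static void main(String[] args) {
        ReplicaFailoverSketch s = new ReplicaFailoverSketch();
        List<String> b2Replicas = List.of("R1N1", "R2N3", "R2N4");  // ordered by distance from R1N2
        s.onConnectionError("R1N1");                                // connection to R1N1 failed
        System.out.println(s.chooseDataNode(b2Replicas));           // -> R2N3, the next closest node
    }
}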
ANATOMY OF FILE READ – DATA NODE CHECKSUM ERROR
[Diagram: while reading block B2 from R1N1, DFSIS detects a checksum error; it informs the name node that the replica on R1N1 is corrupt and falls back to R2N3.]

• Let’s say there is a checksum error. This means the block replica on R1N1 is corrupt.
• Information about the corrupt replica is sent to the name node, and DFSIS then tries to connect to the next closest node (R2N3); a small sketch follows.
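Similarly, the checksum recovery path can be sketched with hypothetical helper names; reportCorruptReplica, fetchFrom and checksumMatches are illustrative stand-ins, not Hadoop's actual client API:

import java.util.List;

public class ChecksumFallbackSketch {
    interface NameNodeClient {                     // stand-in for the real name node RPC interface
        void reportCorruptReplica(String block, String dataNode);
    }

    static byte[] readBlock(String block, List<String> replicasOrderedByDistance,
                            NameNodeClient nameNode) {
        for (String node : replicasOrderedByDistance) {
            byte[] data = fetchFrom(node, block);             // hypothetical block transfer
            if (checksumMatches(data)) {
                return data;                                  // good copy, done
            }
            nameNode.reportCorruptReplica(block, node);       // name node can re-replicate later
        }
        throw new RuntimeException("all replicas of " + block + " are corrupt");
    }

    // Hypothetical stubs so the sketch is self-contained.
    static byte[] fetchFrom(String node, String block) { return new byte[0]; }
    static boolean checksumMatches(byte[] data) { return true; }
}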
THE END

SORRY FOR MY POOR ENGLISH.
PLEASE SEND YOUR VALUABLE FEEDBACK TO
RAJESH_1290K@YAHOO.COM
