Understanding HDFS

HDFS (Hadoop Distributed File System)
Thiru

Agenda
• Typical workflow
• Writing a file to HDFS
• Reading a file from HDFS
• Rack awareness
• Planning for a cluster
• Q&A

Hadoop Server Roles
Client
Masters
• HDFS: {Name Node}, {Secondary Name Node}
• Map Reduce: {Job Tracker}
Slaves
• Each slave machine runs a Data Node and a Task Tracker

Hadoop Cluster
[Diagram: Hadoop Client, Name Node, Job Tracker, and Secondary NN, plus many slave machines, each running DN + TT.]


Sample HDFS Workflow
• Write data in to cluster (HDFS)
• Analyze the data (Map Reduce)
• Store the result in to cluster (HDFS)
• Read the result from cluster (HDFS)

Sample scenario: How many times did customers call customer care to enquire about a recently launched product? Compare that against the ad campaign on television. Correlate both and find the best time to run the ad.
[Diagram labels: CRM data entry, SQOOP, HDFS, Map Reduce, Result]

Write data in to cluster (HDFS)
Hadoop Client: I want to write a file. File size is 200 MB.
Name Node: Ok! Block size is 64 MB. Split the file in to 4 blocks (three of 64 MB and one of 8 MB) and write in to nodes 1, 4, 5.
[Diagram: Hadoop Client, Name Node, Data Node 1 to Data Node 6]

1. Client consults the Name node
2. Client writes data to one data node
3. Data node replicates as per the replication factor and notifies the Name node
4. Cycle repeats for every block
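The write flow above can be sketched in a few lines of Python. This is only an illustrative model, not Hadoop code: `split_into_blocks` and `write_block` are hypothetical helper names, and the target list 1, 4, 5 is taken from the slide. It shows that a 200 MB file with a 64 MB block size needs four blocks, the last one only 8 MB.

```python
MB = 1024 * 1024

def split_into_blocks(file_size, block_size=64 * MB):
    """Return the size in bytes of each block a file is split into."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block only holds what is left
    return sizes

def write_block(block_id, targets, storage):
    """Toy pipeline: the client sends the block to the first target and
    each data node forwards it to the next one in the list."""
    for node in targets:
        storage.setdefault(node, []).append(block_id)
    return targets  # nodes that would report the block to the name node

blocks = split_into_blocks(200 * MB)
storage = {}
for block_id in range(len(blocks)):
    # Assumption: the name node answers 1, 4, 5 for every block, as on the slide.
    write_block(block_id, [1, 4, 5], storage)

print(len(blocks), [size // MB for size in blocks])  # 4 [64, 64, 64, 8]
```

In a real cluster the name node may return a different target list for each block; the fixed list here just keeps the sketch short.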

Rack Awareness
[Diagram: Name Node above 12 Data Nodes (DN 1 to DN 12) grouped in racks, holding replicas of blocks A, B, and C.]
Rack Aware (as recorded by the Name Node):
Rack 1: Data node 1, Data node 2
Rack 2: Data node 5

• Never lose data when a rack is down
• Keep bulky flows within a rack when possible
• Assumption: traffic within a rack has higher bandwidth and lower latency
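A minimal sketch of rack-aware placement, assuming three replicas per block: keep the first replica on the writer's node and put the other two on two different nodes of one other rack. That way a single rack failure never removes every copy, and the second-to-third transfer stays inside a rack. The rack layout and the function names are hypothetical; the slides do not say which DN sits in which rack.

```python
import random

# Hypothetical layout, chosen only for illustration.
RACKS = {
    "rack1": [1, 2, 3, 4],
    "rack2": [5, 6, 7, 8],
    "rack3": [9, 10, 11, 12],
}

def rack_of(node, racks):
    """Look up the rack that contains a data node."""
    for rack, nodes in racks.items():
        if node in nodes:
            return rack
    raise KeyError(node)

def place_replicas(first_node, racks, rng=random):
    """Return three nodes: the first node plus two nodes on one other rack."""
    local_rack = rack_of(first_node, racks)
    candidates = [r for r in racks if r != local_rack and len(racks[r]) >= 2]
    remote_rack = rng.choice(candidates)
    second, third = rng.sample(racks[remote_rack], 2)
    return [first_node, second, third]

replicas = place_replicas(1, RACKS, random.Random(0))
racks_used = {rack_of(node, RACKS) for node in replicas}
print(replicas, sorted(racks_used))
```

Whatever nodes the random choice picks, the three replicas always span exactly two racks.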

Multi Block Replication
File.txt, file size 200 MB, drawn as blocks A, B, C
[Diagram: 12 Data Nodes (DN 1 to DN 12). Block A is written to one Data Node with the instruction "Replicate in 3,8"; copies of A then appear on DN 3 and DN 8.]

Name Node
• Data nodes send heartbeats
• Every 10th heartbeat is a block report
• Name node builds metadata from block reports
• If the name node is down, HDFS is down
• Missing heartbeats signify lost nodes
• Name node consults metadata and finds the affected data
• Name node consults the rack awareness script
• Name node tells a data node to replicate

File System Metadata:
File.txt = A0 {1,5,7}
           A1 {1,7,9}
           A2 {5,10,15}
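The failure-handling steps can be illustrated with the metadata shown on the slide. This is a toy model under stated assumptions: the timestamps and the 30-second timeout are made-up values, and `find_dead_nodes` and `under_replicated` are hypothetical names, not Hadoop APIs.

```python
def find_dead_nodes(last_heartbeat, now, timeout):
    """Nodes whose last heartbeat is older than the timeout count as lost."""
    return {node for node, seen in last_heartbeat.items() if now - seen > timeout}

def under_replicated(metadata, dead_nodes, replication=3):
    """Map each affected block to the replicas that are still alive."""
    affected = {}
    for block, nodes in metadata.items():
        live = [n for n in nodes if n not in dead_nodes]
        if len(live) < replication:
            affected[block] = live
    return affected

# Block-to-node metadata from the slide.
metadata = {"A0": [1, 5, 7], "A1": [1, 7, 9], "A2": [5, 10, 15]}

# Hypothetical heartbeat timestamps in seconds; node 7 has gone quiet.
last_heartbeat = {1: 100, 5: 100, 7: 40, 9: 100, 10: 100, 15: 100}

dead = find_dead_nodes(last_heartbeat, now=103, timeout=30)
todo = under_replicated(metadata, dead)
print(dead, todo)  # {7} {'A0': [1, 5], 'A1': [1, 9]}
```

For each entry in `todo`, the name node would pick one surviving replica as the source and a new target node, using the rack information, to restore three copies.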

Name Node & Secondary Name node
Secondary Name Node (to Primary Name Node): It's been 1 hr, give your data.

• Not a hot standby for the name node* (ZooKeeper)
• Connects to the name node every hour* (configurable)
• Housekeeping, backup of name node metadata
• Saved metadata can be used to rebuild the name node

Understanding Secondary name node house keeping
[Diagram: the Primary Name Node holds edits and fsimage and starts edits-new. The Secondary Name Node copies edits and fsimage, merges them into Fsimage.ckpt, and returns Fsimage.ckpt to the Primary Name Node.]
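The housekeeping step amounts to replaying a log of changes on top of a snapshot. The sketch below models that idea only: the dictionary layout and the "add"/"delete" operation names are assumptions made for illustration and do not reflect the real fsimage or edits file formats.

```python
def checkpoint(fsimage, edits):
    """Replay the edit log on a copy of fsimage and return the merged image."""
    ckpt = dict(fsimage)  # the original snapshot is left untouched
    for op, path, blocks in edits:
        if op == "add":
            ckpt[path] = blocks
        elif op == "delete":
            ckpt.pop(path, None)
    return ckpt

# Snapshot of the namespace at the last checkpoint (hypothetical content).
fsimage = {"/File.txt": ["A0", "A1", "A2"]}

# Changes logged since then (hypothetical content).
edits = [
    ("add", "/new.txt", ["B0"]),
    ("delete", "/File.txt", None),
]

fsimage_ckpt = checkpoint(fsimage, edits)
print(fsimage_ckpt)  # {'/new.txt': ['B0']}
```

Once the merged image is back on the name node, the old edit log can be discarded, which keeps the log short and restarts fast.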

Reading data from HDFS Cluster
Client: I want to read file file.txt
Name Node: Ok! File.txt = Blck a {1,5,6}, Blck b {8,1,2}, Blck c {5,8,9}
[Diagram: Data Node 1 to Data Node 9 holding replicas of blocks A, B, and C.]

1. Client consults the Name node
2. Client receives the DN list for each block
3. Client picks the first node of the list
4. Client reads data sequentially
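A small sketch of the read path, using the block locations from the slide. The block contents and the `read_file` helper are hypothetical; the point is only that the client takes the first data node of each list and reads the blocks in file order.

```python
def read_file(block_locations, datanodes):
    """Read each block from the first data node in its list, in file order."""
    parts, nodes_used = [], []
    for block, nodes in block_locations:
        first = nodes[0]
        parts.append(datanodes[first][block])
        nodes_used.append(first)
    return "".join(parts), nodes_used

# Block locations as returned by the name node on the slide.
block_locations = [("a", [1, 5, 6]), ("b", [8, 1, 2]), ("c", [5, 8, 9])]

# Hypothetical block contents, stored on every node that holds a replica.
contents = {"a": "Hello ", "b": "HDFS ", "c": "reader"}
datanodes = {}
for block, nodes in block_locations:
    for node in nodes:
        datanodes.setdefault(node, {})[block] = contents[block]

data, nodes_used = read_file(block_locations, datanodes)
print(data, nodes_used)  # Hello HDFS reader [1, 8, 5]
```

A fuller model would fall back to the next node in the list when the first one does not respond.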

Choosing right hardware
Master node
• Single point of failure
• Dual power supply for redundancy
• No commodity hardware
• RAM thumb rule: 1 GB per million blocks of data
• Regular data backup

# Tasks per node
• 1 core can run 1.5 Mappers or Reducers
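The two rules of thumb above translate directly into arithmetic. The sketch below is a rough estimate only: the 50 million block count is a made-up input, and the function names are hypothetical.

```python
import math

def namenode_ram_gb(total_blocks, gb_per_million_blocks=1):
    """Thumb rule from the slide: 1 GB of name node RAM per million blocks."""
    return math.ceil(total_blocks / 1_000_000) * gb_per_million_blocks

def task_slots(cores, tasks_per_core=1.5):
    """Thumb rule from the slide: one core can run 1.5 mappers or reducers."""
    return int(cores * tasks_per_core)

print(namenode_ram_gb(50_000_000))  # 50
print(task_slots(8))                # 12
```

For example, a slave node with two quad-core processors (8 cores) would be configured for about 12 concurrent map or reduce tasks.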

Practice at Yahoo!
• HDFS clusters at Yahoo! include about 3500 nodes
• A typical cluster node has:
  · 2 quad core Xeon processors @ 2.5 GHz
  · Red Hat Enterprise Linux Server Release 5.1
  · Sun Java JDK 1.6.0_13-b03
  · 4 directly attached SATA drives (one terabyte each)
  · 16 GB RAM
  · 1-gigabit Ethernet

• 70 percent of the disk space is allocated to HDFS. The remainder is reserved for the operating system (Red Hat Linux), logs, and space to spill the output of map tasks. (MapReduce intermediate data are not stored in HDFS.)
• For each cluster, the NameNode and the BackupNode hosts are specially provisioned with up to 64 GB RAM; application tasks are never assigned to those hosts.
• In total, a cluster of 3500 nodes has 9.8 PB of storage available as blocks that are replicated three times, yielding a net 3.3 PB of storage for user applications. As a convenient approximation, one thousand nodes represent one PB of application storage.
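The storage figures can be checked from the node specification: 3500 nodes with four 1 TB drives each, 70 percent of the disk given to HDFS, and three-way replication. The function below is a back-of-the-envelope sketch with a hypothetical name, assuming decimal units (1 PB = 1000 TB).

```python
def cluster_storage_pb(nodes, drives_per_node, tb_per_drive, hdfs_share, replication):
    """Return (HDFS capacity, net application capacity) in petabytes."""
    raw_tb = nodes * drives_per_node * tb_per_drive
    hdfs_tb = raw_tb * hdfs_share          # share of disk allocated to HDFS
    usable_tb = hdfs_tb / replication      # each block is stored `replication` times
    return hdfs_tb / 1000, usable_tb / 1000

hdfs_pb, usable_pb = cluster_storage_pb(3500, 4, 1, 0.70, 3)
print(round(hdfs_pb, 1), round(usable_pb, 1))  # 9.8 3.3
```

Dividing the net figure by 3.5 thousand nodes gives a little under 1 PB per thousand nodes, which matches the approximation quoted above.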

• Durability of data
  · Uncorrelated node failures: replication of data three times is a robust guard against loss of data due to uncorrelated node failures.
  · Correlated node failures (the failure of a rack or core switch): HDFS can tolerate losing a rack switch (each block has a replica on some other rack).
  · Loss of electrical power to the cluster: a large cluster will lose a handful of blocks during a power-on restart.

• Benchmarks
[Figure: NameNode Throughput benchmark]

Future work
• Automated failover
  Plan: use ZooKeeper, Yahoo's distributed consensus technology, to build an automated failover solution.
• Scalability of the NameNode
  Solution: Our near-term solution to scalability is to allow multiple namespaces (and NameNodes) to share the physical storage within a cluster.
  Drawbacks: The main drawback of multiple independent namespaces is the cost of managing them.
