Presentation on Bigdata (Energy Efficient Failure Recovery in Hadoop)
Presentation Layout
• Introduction (Big Data and Hadoop)
• Main components of HDFS
• Crisis in this field
• Motivation
• Goal
• Related work
• System model
• Proposed methodology
• Achievements of energy efficiency
• Limitations
• Future work
• Conclusion
Introduction
• What is Big Data?
• Lots of data (terabytes or petabytes).
• Big Data is the term for collections of data sets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, storage, search, sharing, transfer, analysis, and visualization.
• Systems and enterprises generate huge amounts of data, ranging from terabytes to petabytes of information. Facebook, for example, has offered some insight into how it handles the more than 300 petabytes of data it stores for its 1.19 billion monthly active users.
When do you have a big data problem?
Characteristics of Big Data: Volume, Velocity, Variety.
Introduction (cont.)
What is Hadoop?
• Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model (a minimal sketch of that model is shown below).
• It is an open-source data management framework with scale-out storage and distributed processing.
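As an illustration of that "simple programming model", the classic word-count job can be written as a mapper that emits a (word, 1) pair per token and a reducer that sums the counts. This is a generic sketch, not code from this presentation; the class names are our own.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: emit (word, 1) for every token in the input line.
public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reducer: sum the counts emitted for each word.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```

A driver class would wire these into a Job, set the HDFS input and output paths, and submit the job to the cluster.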
Main components of HDFS
• Namenode:
  • Master of the system.
  • Maintains and manages the blocks that are present on the datanodes.
• Datanodes:
  • Slaves deployed on each machine that provide the actual storage.
  • Responsible for serving read and write requests from clients.
A small block-lookup sketch follows this list.
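To make the namenode/datanode split concrete, the sketch below (our illustration, not part of the slides; the file path is hypothetical) asks the namenode for a file's block locations, i.e. the datanodes actually holding each block.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationLookup {
    public static void main(String[] args) throws Exception {
        // fs.defaultFS in core-site.xml should point at the cluster's namenode.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path(args[0]);      // e.g. /data/sample.txt (hypothetical path)
        FileStatus status = fs.getFileStatus(file);

        // This metadata query is answered by the namenode; the blocks themselves
        // live on the datanodes listed in each BlockLocation.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.printf("offset=%d length=%d datanodes=%s%n",
                    b.getOffset(), b.getLength(), String.join(",", b.getHosts()));
        }
        fs.close();
    }
}
```

The metadata answer comes from the namenode; reading the actual bytes would then go directly to the listed datanodes.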
Crisis in This Field
Because the amount of data to be processed grows day by day, server nodes are added rapidly, and the energy (power) required to run those servers becomes a great concern.
Energy efficiency is a great crisis.
Motivation
• Data centers are becoming critical in modern life.
• To maintain a data center, a large amount of energy needs to be consumed for both computing and cooling.
• As data centers continue to grow in size and number, some researchers estimate that in the future the cost of electricity for data centers could exceed the original capital investment.
Goal
Power has become a major concern in our society. To maintain a data center, a large amount of energy needs to be consumed for both computing and cooling. Surveys show that:
• Within three years, 50% of the total cost of owning a computer goes to power.
• 10-20 kW of power per rack and 10-20 MW per data center are needed for the power supply (a quick scale check is sketched below).
• By one calculation, upcoming data centers will need to be fed more than 200 MW of energy.
• As a result, our goal is to achieve energy efficiency for data centers.
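A quick sanity check of how the per-rack figure scales. Only the 10-20 kW per-rack range comes from the slide; the rack count here is a hypothetical assumption for illustration.

```java
// Back-of-the-envelope scale check of the quoted survey figures.
public class PowerScale {
    public static void main(String[] args) {
        double kwPerRack = 15.0;  // midpoint of the quoted 10-20 kW per rack
        int racks = 1000;         // assumed rack count for a large facility
        double totalMw = kwPerRack * racks / 1000.0;
        System.out.printf("%d racks x %.0f kW = %.1f MW%n", racks, kwPerRack, totalMw);
    }
}
```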
Related Work
• Saving energy is a big challenge due to many constraints.
• For Big Data processing in a Hadoop cluster, three replicas of each data block are randomly distributed in order to improve performance and fault tolerance.
• Previous research introduced a mechanism called the covering subset, which maintains a set of active nodes to ensure the immediate availability of data even when all nodes outside the covering subset are turned off (a greedy sketch of the idea follows).
• However, a node in the covering subset may fail.
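The covering-subset idea can be sketched as a greedy set cover over block replicas: keep turning on the node that covers the most still-uncovered blocks until every block has at least one live replica. The block placement below is invented for illustration; the cited paper's actual construction may differ.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Greedy sketch of a covering subset: all nodes outside it can be powered off
// without losing data availability.
public class CoveringSubset {
    public static void main(String[] args) {
        // block id -> nodes holding a replica of that block (invented placement)
        Map<String, Set<String>> replicas = new HashMap<>();
        replicas.put("blk_1", Set.of("n1", "n2", "n3"));
        replicas.put("blk_2", Set.of("n2", "n4", "n5"));
        replicas.put("blk_3", Set.of("n3", "n5", "n6"));

        Set<String> uncovered = new HashSet<>(replicas.keySet());
        Set<String> covering = new LinkedHashSet<>();
        Set<String> allNodes = new HashSet<>();
        replicas.values().forEach(allNodes::addAll);

        while (!uncovered.isEmpty()) {
            // choose the node that covers the most still-uncovered blocks
            String best = null;
            int bestCount = -1;
            for (String node : allNodes) {
                int count = 0;
                for (String blk : uncovered) {
                    if (replicas.get(blk).contains(node)) count++;
                }
                if (count > bestCount) { bestCount = count; best = node; }
            }
            covering.add(best);
            final String chosen = best;
            uncovered.removeIf(blk -> replicas.get(blk).contains(chosen));
        }
        System.out.println("Covering subset (kept on): " + covering);
    }
}
```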
Related Work (cont.)
In this scheme, data nodes are grouped into three sets: the Fundamental Set (FS), the Extended Set (ES), and the Waiting Set (WS).
Crisis Response:
• In replication-based failure recovery, when a node in the FS fails, the following recovery procedure is used (sketched in code below).
• The ES and WS are turned on.
• First, the replica blocks are searched for in the ES.
• If found, the data block from that ES node is copied to a WS node.
• During this time, the FS remains idle.
• Once the lost replica has been rebuilt, that WS node is added to the FS, replacing the failed FS node (the ES and WS are turned off again).
Limitations:
• The FS nodes need to stay idle while the copy of the data block is sent from the ES to the WS, which wastes power.
• For different tasks, the FS nodes need to work several times (e.g., read/write), which consumes both time and power.
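A minimal sketch of that baseline flow, with simplified node sets and placeholder cluster operations. This is not the paper's code; it only mirrors the steps listed above, including the idle period the first limitation refers to.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Set;

public class BaselineRecovery {

    static final Set<String> FS = new HashSet<>();        // Fundamental Set: always-on covering subset
    static final Set<String> ES = new HashSet<>();        // Extended Set: powered off until a failure
    static final ArrayDeque<String> WS = new ArrayDeque<>(); // Waiting Set: spares, powered off

    static void recover(String failedNode) {
        FS.remove(failedNode);

        powerOn(ES);                       // step 1: wake the Extended Set
        powerOn(WS);                       //         and the Waiting Set
        String replicaHolder = findReplicaIn(ES);      // step 2: locate the replica in ES
        String spare = WS.poll();          // step 3: pick a spare node from WS
        copyBlock(replicaHolder, spare);   // step 4: copy the block ES -> WS
                                           //         (the FS sits idle during this copy)
        FS.add(spare);                     // step 5: the spare replaces the failed node in FS
        powerOff(ES);                      // step 6: ES and the rest of WS go back to sleep
        powerOff(WS);
    }

    // Placeholders for cluster operations; real behaviour is out of scope here.
    static void powerOn(Iterable<String> nodes)  { System.out.println("power on  " + nodes); }
    static void powerOff(Iterable<String> nodes) { System.out.println("power off " + nodes); }
    static String findReplicaIn(Set<String> set) { return set.iterator().next(); }
    static void copyBlock(String from, String to) { System.out.println("copy " + from + " -> " + to); }

    public static void main(String[] args) {
        FS.add("fs-1"); FS.add("fs-2"); FS.add("fs-3");
        ES.add("es-1");
        WS.add("ws-1");
        recover("fs-2");
        System.out.println("FS after recovery: " + FS);
    }
}
```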
System Model
Existing Architecture
Proposed Methodology
• Implemented in the covering-subset environment.
• Recover the lost node as fast as possible.
• Maintain concurrent execution.
Crisis Response in the Proposed Methodology
• Normally the FS always remains online; the ES and WS remain turned off.
• When a failure occurs in the FS, the ES node holding the replica is first added to the FS, replacing the failed node. The rest of the ES remains turned off.
• So the FS nodes do not need to remain idle.
• In addition, rather than immediately initiating a copy of that block from the FS to a WS node, the FS waits for an opportunity for concurrent execution.
• The FS node waits a certain time (30 minutes) for any client request for the data on this node. If a request arrives, the FS sends that data to the client and, concurrently, one copy of the block is sent to a WS node (the WS is turned on at this point).
• After the replica on the WS node is complete, that WS node is added to the ES (the WS is turned off again). A sketch of this flow is given below.
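A minimal sketch of the proposed flow. The node names, helper methods, and the use of CompletableFuture for the concurrent serve-and-copy are our assumptions; the slides fix only the high-level steps and the 30-minute waiting window.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

public class ProposedRecovery {

    static final Set<String> FS = new HashSet<>(); // Fundamental Set, always online
    static final Set<String> ES = new HashSet<>(); // Extended Set, normally powered off
    static final Set<String> WS = new HashSet<>(); // Waiting Set, normally powered off

    static final long WAIT_WINDOW_MS = 30L * 60 * 1000; // 30-minute wait for a client request

    static void recover(String failedNode, String block) {
        FS.remove(failedNode);

        // Step 1: the ES node holding the replica joins the FS immediately,
        // so the FS never sits idle; the rest of the ES stays powered off.
        String replicaHolder = ES.iterator().next();
        ES.remove(replicaHolder);
        FS.add(replicaHolder);

        // Step 2: wait up to 30 minutes for a client request for this block.
        boolean requested = waitForClientRequest(block, WAIT_WINDOW_MS);
        if (requested) {
            // Step 3: serve the client and, concurrently, re-replicate to a WS node.
            String spare = WS.iterator().next();
            powerOn(spare);
            CompletableFuture<Void> serve = CompletableFuture.runAsync(() -> sendToClient(block));
            CompletableFuture<Void> copy  = CompletableFuture.runAsync(() -> copyBlock(replicaHolder, spare, block));
            CompletableFuture.allOf(serve, copy).join();

            // Step 4: the WS node that now holds the replica joins the ES and powers off.
            WS.remove(spare);
            ES.add(spare);
            powerOff(spare);
        }
    }

    // Placeholders standing in for real cluster operations.
    static boolean waitForClientRequest(String block, long timeoutMs) { return true; }
    static void sendToClient(String block)                  { System.out.println("serve " + block); }
    static void copyBlock(String from, String to, String b) { System.out.println("copy " + b + " " + from + " -> " + to); }
    static void powerOn(String node)  { System.out.println("power on  " + node); }
    static void powerOff(String node) { System.out.println("power off " + node); }

    public static void main(String[] args) {
        FS.add("fs-1"); FS.add("fs-2");
        ES.add("es-1");
        WS.add("ws-1");
        recover("fs-2", "blk_0001");
        System.out.println("FS: " + FS + "  ES: " + ES + "  WS: " + WS);
    }
}
```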
Proposed Architecture
Energy Efficiency Achievements
How it achieves energy efficiency:
• In the previous system, data was transferred from the ES to the WS, and in the meantime the (n-1) surviving FS nodes were idle. This methodology overcomes that problem by immediately adding the data node from the ES to the FS.
• Through concurrent execution, the workload on the server processor (CPU) increases, and the energy efficiency of a CPU depends mainly on its workload.
• Since two individual tasks are performed at the same time, it is also a time-efficient execution process. (A back-of-the-envelope comparison follows.)
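The idle-energy saving can be illustrated with a back-of-the-envelope calculation. The power, time, and cluster-size figures below are hypothetical assumptions for illustration only, not values from the paper.

```java
public class IdleEnergyComparison {
    public static void main(String[] args) {
        int fsNodes = 10;               // assumed size of the Fundamental Set
        double idlePowerWatts = 100.0;  // assumed idle power draw per node
        double copyTimeSeconds = 600.0; // assumed time to copy the lost block ES -> WS

        // Baseline: the (n-1) surviving FS nodes sit idle while the ES copies to the WS.
        double baselineIdleJoules = (fsNodes - 1) * idlePowerWatts * copyTimeSeconds;

        // Proposed: the ES node joins the FS immediately, so no FS node sits idle
        // during recovery; the copy to the WS piggybacks on a later client read.
        double proposedIdleJoules = 0.0;

        System.out.printf("Baseline idle energy: %.0f J%n", baselineIdleJoules);
        System.out.printf("Proposed idle energy: %.0f J%n", proposedIdleJoules);
        System.out.printf("Idle energy avoided:  %.0f J%n", baselineIdleJoules - proposedIdleJoules);
    }
}
```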
Calculation Discussion & Simulation
CPU Utilization Test:
Equation for Proposed System:
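As general context for the CPU utilization test (this is the standard linear server power model, an assumption on our part, not necessarily the equation on the slide), server power is often approximated as a linear function of CPU utilization $u$:

$P(u) \approx P_{\text{idle}} + (P_{\text{max}} - P_{\text{idle}})\, u, \qquad 0 \le u \le 1,$

and the energy consumed over a recovery interval of length $T$ is $E = \int_0^{T} P(u(t))\, dt$. Under such a model, shortening the time nodes spend at $u \approx 0$ (pure idle) directly reduces wasted energy, which is what the proposed flow targets.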
Limitations of This Paper
• Only single-node failure is considered; two- or three-node failures are not.
• Only total node failure is considered; power failure is not.
• Accuracy measurement is not performed.
Future Work
• A recovery process for a power failure of any node.
• A recovery process for a two-node failure within the same rack.
Conclusion
• This work mainly investigated an energy-efficient failure recovery process for a Hadoop cluster.
• To gain energy efficiency, the subset construction mechanism is used first, and then the waiting time for recovering the failed node (the idle time) is reduced.
• A CPU-utilization-based mechanism is used.
• All of these processes work on top of replication-based redundancy.
• The research idea of this paper is one of our approaches for gaining energy efficiency.