Presentation on Bigdata (Energy Efficient Failure Recovery in Hadoop)
Presentation Layout
• Introduction (Big Data and Hadoop)
• Main components of HDFS
• Crisis in this field
• Motivation
• Goal
• Related work
• System model
• Proposed methodology
• Achievements of energy efficiency
• Limitations
• Future work
• Conclusion
Introduction
• What is Big Data?
• Lots of data (terabytes or petabytes).
• Big Data is the term for collections of data sets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, storage, search, sharing, transfer, analysis, and visualization.
• Systems and enterprises generate huge amounts of data, ranging from terabytes to petabytes of information. Facebook, for example, has offered some insight into how it handles the more than 300 petabytes of data it stores for its 1.19 billion monthly active users.
When do you have a big data problem?
Characteristics of Big Data: Volume, Velocity, Variety.
Introduction (cont.)
What is Hadoop?
• Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model (a minimal sketch of that model is shown below).
• It is an open-source data management framework with scale-out storage and distributed processing.
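As an illustration of that "simple programming model", the classic word-count job can be written as a mapper that emits a (word, 1) pair per token and a reducer that sums the counts. This is a generic sketch, not code from this presentation; the class names are our own.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: emit (word, 1) for every token in the input line.
public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reducer: sum the counts emitted for each word.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```

A driver class would wire these into a Job, set the HDFS input and output paths, and submit the job to the cluster.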
Main components of HDFS
• Namenode:
  • Master of the system.
  • Maintains and manages the blocks that are present on the datanodes.
• Datanodes:
  • Slaves deployed on each machine that provide the actual storage.
  • Responsible for serving read and write requests from clients.
A small block-lookup sketch follows this list.
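To make the namenode/datanode split concrete, the sketch below (our illustration, not part of the slides; the file path is hypothetical) asks the namenode for a file's block locations, i.e. the datanodes actually holding each block.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationLookup {
    public static void main(String[] args) throws Exception {
        // fs.defaultFS in core-site.xml should point at the cluster's namenode.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path(args[0]);      // e.g. /data/sample.txt (hypothetical path)
        FileStatus status = fs.getFileStatus(file);

        // This metadata query is answered by the namenode; the blocks themselves
        // live on the datanodes listed in each BlockLocation.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation b : blocks) {
            System.out.printf("offset=%d length=%d datanodes=%s%n",
                    b.getOffset(), b.getLength(), String.join(",", b.getHosts()));
        }
        fs.close();
    }
}
```

The metadata answer comes from the namenode; reading the actual bytes would then go directly to the listed datanodes.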
Crisis in This Field
Because the amount of data to be processed grows day by day, server nodes are added rapidly, and the energy (power) required to run those servers becomes a great concern.
Energy efficiency is a great crisis.
Motivation
• Data centers are becoming critical in modern life.
• To maintain a data center, a large amount of energy needs to be consumed for both computing and cooling.
• As data centers continue to grow in size and number, some researchers estimate that in the future the cost of electricity for data centers could exceed the original capital investment.
Goal
Power has become a major concern in our society. To maintain a data center, a large amount of energy needs to be consumed for both computing and cooling. Surveys show that:
• Within three years, 50% of the total cost of owning a computer goes to power.
• 10-20 kW of power per rack and 10-20 MW per data center are needed for the power supply (a quick scale check is sketched below).
• By one calculation, upcoming data centers will need to be fed more than 200 MW of energy.
• As a result, our goal is to achieve energy efficiency for data centers.
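A quick sanity check of how the per-rack figure scales. Only the 10-20 kW per-rack range comes from the slide; the rack count here is a hypothetical assumption for illustration.

```java
// Back-of-the-envelope scale check of the quoted survey figures.
public class PowerScale {
    public static void main(String[] args) {
        double kwPerRack = 15.0;  // midpoint of the quoted 10-20 kW per rack
        int racks = 1000;         // assumed rack count for a large facility
        double totalMw = kwPerRack * racks / 1000.0;
        System.out.printf("%d racks x %.0f kW = %.1f MW%n", racks, kwPerRack, totalMw);
    }
}
```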
Related Work
• Saving energy is a big challenge due to many constraints.
• For Big Data processing in a Hadoop cluster, three replicas of each data block are randomly distributed in order to improve performance and fault tolerance.
• Previous research introduced a mechanism called the covering subset, which maintains a set of active nodes to ensure the immediate availability of data even when all nodes outside the covering subset are turned off (a greedy sketch of the idea follows).
• However, a node in the covering subset may fail.
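The covering-subset idea can be sketched as a greedy set cover over block replicas: keep turning on the node that covers the most still-uncovered blocks until every block has at least one live replica. The block placement below is invented for illustration; the cited paper's actual construction may differ.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Greedy sketch of a covering subset: all nodes outside it can be powered off
// without losing data availability.
public class CoveringSubset {
    public static void main(String[] args) {
        // block id -> nodes holding a replica of that block (invented placement)
        Map<String, Set<String>> replicas = new HashMap<>();
        replicas.put("blk_1", Set.of("n1", "n2", "n3"));
        replicas.put("blk_2", Set.of("n2", "n4", "n5"));
        replicas.put("blk_3", Set.of("n3", "n5", "n6"));

        Set<String> uncovered = new HashSet<>(replicas.keySet());
        Set<String> covering = new LinkedHashSet<>();
        Set<String> allNodes = new HashSet<>();
        replicas.values().forEach(allNodes::addAll);

        while (!uncovered.isEmpty()) {
            // choose the node that covers the most still-uncovered blocks
            String best = null;
            int bestCount = -1;
            for (String node : allNodes) {
                int count = 0;
                for (String blk : uncovered) {
                    if (replicas.get(blk).contains(node)) count++;
                }
                if (count > bestCount) { bestCount = count; best = node; }
            }
            covering.add(best);
            final String chosen = best;
            uncovered.removeIf(blk -> replicas.get(blk).contains(chosen));
        }
        System.out.println("Covering subset (kept on): " + covering);
    }
}
```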
Related Work (cont.)
In this scheme, data nodes are grouped into three sets: the Fundamental Set (FS), the Extended Set (ES), and the Waiting Set (WS).
Crisis Response:
• In replication-based failure recovery, when a node in the FS fails, the following recovery procedure is used (sketched in code below).
• The ES and WS are turned on.
• First, the replica blocks are searched for in the ES.
• If found, the data block from that ES node is copied to a WS node.
• During this time, the FS remains idle.
• Once the lost replica has been rebuilt, that WS node is added to the FS, replacing the failed FS node (the ES and WS are turned off again).
Limitations:
• The FS nodes need to stay idle while the copy of the data block is sent from the ES to the WS, which wastes power.
• For different tasks, the FS nodes need to work several times (e.g., read/write), which consumes both time and power.
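A minimal sketch of that baseline flow, with simplified node sets and placeholder cluster operations. This is not the paper's code; it only mirrors the steps listed above, including the idle period the first limitation refers to.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Set;

public class BaselineRecovery {

    static final Set<String> FS = new HashSet<>();        // Fundamental Set: always-on covering subset
    static final Set<String> ES = new HashSet<>();        // Extended Set: powered off until a failure
    static final ArrayDeque<String> WS = new ArrayDeque<>(); // Waiting Set: spares, powered off

    static void recover(String failedNode) {
        FS.remove(failedNode);

        powerOn(ES);                       // step 1: wake the Extended Set
        powerOn(WS);                       //         and the Waiting Set
        String replicaHolder = findReplicaIn(ES);      // step 2: locate the replica in ES
        String spare = WS.poll();          // step 3: pick a spare node from WS
        copyBlock(replicaHolder, spare);   // step 4: copy the block ES -> WS
                                           //         (the FS sits idle during this copy)
        FS.add(spare);                     // step 5: the spare replaces the failed node in FS
        powerOff(ES);                      // step 6: ES and the rest of WS go back to sleep
        powerOff(WS);
    }

    // Placeholders for cluster operations; real behaviour is out of scope here.
    static void powerOn(Iterable<String> nodes)  { System.out.println("power on  " + nodes); }
    static void powerOff(Iterable<String> nodes) { System.out.println("power off " + nodes); }
    static String findReplicaIn(Set<String> set) { return set.iterator().next(); }
    static void copyBlock(String from, String to) { System.out.println("copy " + from + " -> " + to); }

    public static void main(String[] args) {
        FS.add("fs-1"); FS.add("fs-2"); FS.add("fs-3");
        ES.add("es-1");
        WS.add("ws-1");
        recover("fs-2");
        System.out.println("FS after recovery: " + FS);
    }
}
```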
System Model
Existing Architecture
Proposed Methodology
• Implemented in the covering-subset environment.
• Recover the lost node as fast as possible.
• Maintain concurrent execution.
Crisis Response in the Proposed Methodology
• Normally the FS always remains online; the ES and WS remain turned off.
• When a failure occurs in the FS, the ES node holding the replica is first added to the FS, replacing the failed node. The rest of the ES remains turned off.
• So the FS nodes do not need to remain idle.
• In addition, rather than immediately initiating a copy of that block from the FS to a WS node, the FS waits for an opportunity for concurrent execution.
• The FS node waits a certain time (30 minutes) for any client request for the data on this node. If a request arrives, the FS sends that data to the client and, concurrently, one copy of the block is sent to a WS node (the WS is turned on at this point).
• After the replica on the WS node is complete, that WS node is added to the ES (the WS is turned off again). A sketch of this flow is given below.
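A minimal sketch of the proposed flow. The node names, helper methods, and the use of CompletableFuture for the concurrent serve-and-copy are our assumptions; the slides fix only the high-level steps and the 30-minute waiting window.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

public class ProposedRecovery {

    static final Set<String> FS = new HashSet<>(); // Fundamental Set, always online
    static final Set<String> ES = new HashSet<>(); // Extended Set, normally powered off
    static final Set<String> WS = new HashSet<>(); // Waiting Set, normally powered off

    static final long WAIT_WINDOW_MS = 30L * 60 * 1000; // 30-minute wait for a client request

    static void recover(String failedNode, String block) {
        FS.remove(failedNode);

        // Step 1: the ES node holding the replica joins the FS immediately,
        // so the FS never sits idle; the rest of the ES stays powered off.
        String replicaHolder = ES.iterator().next();
        ES.remove(replicaHolder);
        FS.add(replicaHolder);

        // Step 2: wait up to 30 minutes for a client request for this block.
        boolean requested = waitForClientRequest(block, WAIT_WINDOW_MS);
        if (requested) {
            // Step 3: serve the client and, concurrently, re-replicate to a WS node.
            String spare = WS.iterator().next();
            powerOn(spare);
            CompletableFuture<Void> serve = CompletableFuture.runAsync(() -> sendToClient(block));
            CompletableFuture<Void> copy  = CompletableFuture.runAsync(() -> copyBlock(replicaHolder, spare, block));
            CompletableFuture.allOf(serve, copy).join();

            // Step 4: the WS node that now holds the replica joins the ES and powers off.
            WS.remove(spare);
            ES.add(spare);
            powerOff(spare);
        }
    }

    // Placeholders standing in for real cluster operations.
    static boolean waitForClientRequest(String block, long timeoutMs) { return true; }
    static void sendToClient(String block)                  { System.out.println("serve " + block); }
    static void copyBlock(String from, String to, String b) { System.out.println("copy " + b + " " + from + " -> " + to); }
    static void powerOn(String node)  { System.out.println("power on  " + node); }
    static void powerOff(String node) { System.out.println("power off " + node); }

    public static void main(String[] args) {
        FS.add("fs-1"); FS.add("fs-2");
        ES.add("es-1");
        WS.add("ws-1");
        recover("fs-2", "blk_0001");
        System.out.println("FS: " + FS + "  ES: " + ES + "  WS: " + WS);
    }
}
```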
Proposed Architecture
Energy Efficiency Achievements
How it achieves energy efficiency:
• In the previous system, data was transferred from the ES to the WS, and in the meantime the (n-1) surviving FS nodes were idle. This methodology overcomes that problem by immediately adding the data node from the ES to the FS.
• Through concurrent execution, the workload on the server processor (CPU) increases, and the energy efficiency of a CPU depends mainly on its workload.
• Since two individual tasks are performed at the same time, it is also a time-efficient execution process. (A back-of-the-envelope comparison follows.)
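The idle-energy saving can be illustrated with a back-of-the-envelope calculation. The power, time, and cluster-size figures below are hypothetical assumptions for illustration only, not values from the paper.

```java
public class IdleEnergyComparison {
    public static void main(String[] args) {
        int fsNodes = 10;               // assumed size of the Fundamental Set
        double idlePowerWatts = 100.0;  // assumed idle power draw per node
        double copyTimeSeconds = 600.0; // assumed time to copy the lost block ES -> WS

        // Baseline: the (n-1) surviving FS nodes sit idle while the ES copies to the WS.
        double baselineIdleJoules = (fsNodes - 1) * idlePowerWatts * copyTimeSeconds;

        // Proposed: the ES node joins the FS immediately, so no FS node sits idle
        // during recovery; the copy to the WS piggybacks on a later client read.
        double proposedIdleJoules = 0.0;

        System.out.printf("Baseline idle energy: %.0f J%n", baselineIdleJoules);
        System.out.printf("Proposed idle energy: %.0f J%n", proposedIdleJoules);
        System.out.printf("Idle energy avoided:  %.0f J%n", baselineIdleJoules - proposedIdleJoules);
    }
}
```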
Calculation Discussion & Simulation
CPU Utilization Test:
Equation for Proposed System:
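As general context for the CPU utilization test (this is the standard linear server power model, an assumption on our part, not necessarily the equation on the slide), server power is often approximated as a linear function of CPU utilization $u$:

$P(u) \approx P_{\text{idle}} + (P_{\text{max}} - P_{\text{idle}})\, u, \qquad 0 \le u \le 1,$

and the energy consumed over a recovery interval of length $T$ is $E = \int_0^{T} P(u(t))\, dt$. Under such a model, shortening the time nodes spend at $u \approx 0$ (pure idle) directly reduces wasted energy, which is what the proposed flow targets.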
Limitations of This Paper
• Only single-node failure is considered; two- or three-node failures are not.
• Only total node failure is considered; power failure is not.
• Accuracy measurement is not performed.
Future Work
• A recovery process for a power failure of any node.
• A recovery process for a two-node failure within the same rack.
Conclusion
• This work mainly investigated an energy-efficient failure recovery process for a Hadoop cluster.
• To gain energy efficiency, the subset construction mechanism is used first, and then the waiting time for recovering the failed node (the idle time) is reduced.
• A CPU-utilization-based mechanism is used.
• All of these processes work on top of replication-based redundancy.
• The research idea of this paper is one of our approaches for gaining energy efficiency.