By
Yogeshwari.M
Samayapriya.B
In Grid Computing
• In a grid computing environment, resources are made available across geographically distributed locations.
• Because the resources are distributed, there is a chance of failure.
• Techniques for handling failures are therefore needed.
Abstract:
• To reduce the probability of failures, a checkpoint strategy is introduced.
• The checkpoint strategy works along with a timestamp.
• A delay in the timestamp indicates that a failure has occurred; the checkpoint then triggers a rollback mechanism to identify the origin and cause of the failure.
• The failure is rectified and the process is rescheduled to the correct node using SMF (Select Most Fitting Resource).
Peer-to-Peer (P2P) computing
• The robust availability of distributed resources is a very important issue.
• A P2P grid system has to guarantee correctness when faults occur.
• To improve reliability, a fault-tolerant mechanism for such systems is mandatory.
• Although extensive fault tolerance policies have been proposed, the underlying mechanisms were seldom taken into consideration.
Fault Tolerance Policy on Check Point Monitoring (FTCPM)
• Here we propose a fault tolerance policy, FTCPM.
• To improve system reliability, FTCPM duplicates jobs onto different sites to tolerate failures.
• Checkpointing is the process of saving the state of a running application to stable storage, so that the saved state can be used to resume execution of the application after a failure.
• Resuming from a checkpoint instead of restarting the job thus reduces the execution time.
Previous Work:
Fault Tolerance policy on Dynamic Load Balance:
• Many schemes have been proposed to provide uninterrupted service.
• One fault tolerance mechanism is the redundant strategy, which uses redundant tasks to carry out concurrent computations on different nodes in order to keep availability high.
• When a failure occurs, the system takes a redundant copy of the failed job from another node.
• Such an approach wastes resources.
In the P2P grid system:
• Each site has a backup storage and two queues, named Ready Queue (REQ) and Running Queue (RQ).
• Each site regularly sends "heartbeat" messages to its neighbour site.
• When a neighbour site fails, the fault tolerance policy is triggered.
Proposed System:
Check pointing Techniques:
• A new approach to the checkpoint technique is introduced.
• The checkpoint strategy helps to keep the job’s turnaround time stable.
• When a timestamp is delayed, the rollback process takes place.
• The checkpointing mechanism reschedules the job to the correct node using the SMF (Select Most Fitting Resource) algorithm.
• This improves system reliability, achieves a better job execution and job completion rate, and thereby reduces the failure probability.
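
The timestamp-delay check and rollback described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not say how a delay is measured, so the `interval` and `tolerance` values, the `Job` and `Checkpoint` classes, and the `pick_node` callback (a stand-in for the SMF selection) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    timestamp: float   # time at which the state was saved
    state: dict        # saved job state (hypothetical representation)


@dataclass
class Job:
    job_id: str
    node: str
    interval: float                      # expected time between checkpoints (assumed)
    checkpoints: list = field(default_factory=list)

    def save_checkpoint(self, now: float, state: dict) -> None:
        # Store a copy so later changes to the live state do not alter the checkpoint.
        self.checkpoints.append(Checkpoint(now, dict(state)))


def is_delayed(job: Job, now: float, tolerance: float) -> bool:
    """True when the next checkpoint timestamp is overdue (assumed failure signal)."""
    if not job.checkpoints:
        return False
    last = job.checkpoints[-1].timestamp
    return now - last > job.interval + tolerance


def roll_back_and_reschedule(job: Job, pick_node) -> dict:
    """Restore the last saved state and move the job to the node chosen by pick_node."""
    restored = dict(job.checkpoints[-1].state)
    job.node = pick_node(job)            # stand-in for the SMF selection
    return restored


job = Job("j1", node="n1", interval=10.0)
job.save_checkpoint(0.0, {"step": 3})
if is_delayed(job, now=13.0, tolerance=2.0):
    state = roll_back_and_reschedule(job, pick_node=lambda j: "n2")
    print(job.node, state)
```

The sketch only restores state and reassigns the node; identifying the origin and cause of the failure, which the slides also mention, is not modelled here.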
System Architecture:
• The system is organized in a layered architecture.
• The job allocation testing data
• The job allocation services are MM, EM, configure, FT, SMF, FT,
• Finding the fault-tolerant node is based on 3 mechanisms.
CP (check pointing) Algorithm:
• The algorithm schedules the checkpointing interval dynamically:
1. Schedule the initial checkpoint a short time after the job starts.
2. The next schedule depends on the machine's mean failure interval; the interval is topped up by I_new = I_old + I_init.
3. If the condition (remaining execution time < mean failure interval) and (I_new < a × execution time) is false, the new checkpoint interval is decreased to I_new = I_old − I_init.
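
A minimal sketch of the interval update in steps 2 and 3, assuming that `a` is a tuning coefficient supplied by the caller (the slides do not define it) and that the interval should never drop below I_init. That lower bound is an added assumption, not part of the slides.

```python
def next_checkpoint_interval(i_old, i_init, remaining_time,
                             mean_failure_interval, execution_time, a):
    """Return the next checkpoint interval following steps 2 and 3 above."""
    i_new = i_old + i_init  # step 2: top up the interval
    condition_holds = (remaining_time < mean_failure_interval
                       and i_new < a * execution_time)
    if condition_holds:
        return i_new
    # Step 3: the condition is false, so shrink the interval instead.
    # Assumption: clamp at i_init so the interval stays positive.
    return max(i_init, i_old - i_init)


# Condition holds: the interval grows from 10 to 15.
grown = next_checkpoint_interval(10, 5, remaining_time=100,
                                 mean_failure_interval=200,
                                 execution_time=300, a=0.5)

# Condition fails (remaining time exceeds the mean failure interval):
# the interval shrinks from 20 to 15.
shrunk = next_checkpoint_interval(20, 5, remaining_time=300,
                                  mean_failure_interval=200,
                                  execution_time=300, a=0.5)
print(grown, shrunk)
```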
SMF Algorithm (Select Most Fitting Resource)
1. Resources register with the GIS (Grid Information Service); machines are grouped into L discrete levels based on node speed.
2. The user submits jobs with a resource requirement and a closeness factor.
3. Starting from the smallest level, it searches for an available node that meets the needs.
4. If a node is found, the job is assigned. Otherwise it searches upward for a free node, from level rj+1, rj+2, rj+3 to rL, and assigns the job to the first free node found.
5. If a node is found, the job is assigned. Otherwise it searches downward, from level rj-1, rj-2, rj-3 to r1, and assigns the job to the first free node found.
6. If the job is still not assigned, it waits in the queue.
7. When the job starts running, the initial checkpoint is scheduled after a short time period.
8. During the first checkpointing interval, the executing job image is sent to the resource’s storage machine. The storage saves the job image and informs the resource upon completion.
9. The algorithm schedules the next checkpoint based on the remaining execution time and the machine failure interval information kept at the GIS.
12. During subsequent intervals, instead of the whole job image, the resource sends only the updated attributes/properties of the job to storage. The storage updates the job.
13. When the job is completed, it notifies the storage machine to remove the saved checkpoint and free storage space.
14. On machine failure, or when the resource is out of order, the storage machine asks the job’s user whether to load the saved checkpoint.
15. When the user receives the load-checkpoint request, it uses SMF to choose a fitting free machine, in the same resource or in another, to resume execution of the job.
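
The node search in steps 3 to 6 might look like the sketch below. It assumes that rj is the level matching the job's requirement, that each node is a small dictionary with hypothetical `name` and `free` fields, and it does not model the closeness factor, which the slides do not define.

```python
from collections import deque


def select_most_fitting(levels, rj):
    """Pick the first free node, searching level rj, then upward, then downward.

    levels: list of node lists, index 0 holding level r1 (slowest nodes).
    rj:     1-based level that matches the job's requirement (assumed).
    """
    level_order = [rj, *range(rj + 1, len(levels) + 1), *range(rj - 1, 0, -1)]
    for level in level_order:
        for node in levels[level - 1]:
            if node["free"]:
                node["free"] = False     # the node is now busy with this job
                return node["name"]
    return None


def submit(job_id, rj, levels, wait_queue):
    """Assign the job to a node, or park it in the wait queue (step 6)."""
    node = select_most_fitting(levels, rj)
    if node is None:
        wait_queue.append((job_id, rj))
    return node


levels = [
    [{"name": "slow-1", "free": True}],    # level r1
    [{"name": "mid-1", "free": False}],    # level r2
    [{"name": "fast-1", "free": True}],    # level r3
]
wait_queue = deque()
first = submit("job-a", 2, levels, wait_queue)   # level 2 is busy, so search upward
second = submit("job-b", 2, levels, wait_queue)  # nothing upward, so search downward
third = submit("job-c", 2, levels, wait_queue)   # no free node left, job waits
print(first, second, third, list(wait_queue))
```

The same function can serve step 15, where a fitting free machine has to be chosen to resume a job from its saved checkpoint.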
• Gridlets creation
• Job Allocation (Testing Data)
• Check point Controlling
• Applying Rollback Mechanism: the checkpoint triggers a rollback mechanism to identify the cause and origin of the failure.
• Finding Fault Tolerant Node in the Network
[Diagram boxes: Job Allocation (Testing Data); Time Stamp Event Tracking; Memory Resource Allocation; Fault Tolerant File Transfer; Check Point Control; SMF (Select Most Fitting Resource); Find Fault Tolerant & Rollback; OUTPUT (Task Execution)]
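
As an illustration of what the checkpoint control module has to manage, the storage side of the algorithm (a full job image first, then only updated attributes, removal on completion, loading on failure) can be sketched as below. The `CheckpointStore` class and its method names are hypothetical, and the job image is simplified to a dictionary of attributes.

```python
class CheckpointStore:
    """In-memory stand-in for the resource's storage machine (hypothetical)."""

    def __init__(self):
        self._images = {}

    def save_full(self, job_id, image):
        """First interval: store the whole job image and acknowledge it."""
        self._images[job_id] = dict(image)
        return True

    def apply_update(self, job_id, changed):
        """Later intervals: merge only the updated attributes."""
        self._images[job_id].update(changed)

    def load(self, job_id):
        """On failure: hand back a copy of the saved checkpoint."""
        return dict(self._images[job_id])

    def remove(self, job_id):
        """On completion: free the storage used by the checkpoint."""
        self._images.pop(job_id, None)

    def has(self, job_id):
        return job_id in self._images


store = CheckpointStore()
acked = store.save_full("j1", {"step": 1, "data": [1, 2]})
store.apply_update("j1", {"step": 2})        # only the changed attribute is sent
restored = store.load("j1")                  # what a resumed job would start from
store.remove("j1")                           # job finished, checkpoint dropped
remaining = store.has("j1")
print(acked, restored, remaining)
```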
Thank you!!!
• Thank you for giving us this GOLDEN opportunity!!!
Queries???

p2 p grid
