International Journal of Grid Computing & Applications (IJGCA) Vol.7, No.3/4, December 2016
DOI:10.5121/ijgca.2016.7401
SERVICE LEVEL AGREEMENT BASED FAULT
TOLERANT WORKLOAD SCHEDULING IN
CLOUD COMPUTING ENVIRONMENT
Manpreet Singh Gill¹ and Dr. R. K. Bawa²
¹Research Scholar, Department of Computer Science, Punjabi University, Patiala, Punjab, India
²Professor, Department of Computer Science, Punjabi University, Patiala, Punjab, India
ABSTRACT
Cloud computing is a concept of providing user and application oriented services in a virtual environment.
Users can use the various cloud services dynamically as per their requirements. Different users have
different requirements in terms of application reliability, performance and fault tolerance. Static and rigid
fault tolerance policies provide a consistent degree of fault tolerance as well as a consistent overhead. In
this research work we have proposed a method to implement dynamic fault tolerance considering customer
requirements. The cloud users have been classified into subclasses as per their fault tolerance requirements.
Their jobs have also been classified into compute intensive and data intensive categories. A varying
degree of fault tolerance, consisting of replication and input buffering, has been applied. From the
simulation based experiments we have found that the proposed dynamic method performs better than the
existing methods.
KEYWORDS:
Fault Tolerance, User Classification, Job Classification, Replication, Buffering, Input Buffer
1. INTRODUCTION
Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a
shared pool of configurable computing resources which can be quickly provisioned and released
with minimum management effort. It is a new computing model which comes from distributed
computing, grid computing, parallel computing, utility computing, virtualization technology, and
other computer technologies. Nowadays cloud computing is widely accepted, because many
organizations want to get rid of large storage devices. In cloud computing, data is stored
permanently not on a personal PC or server but on a remote server which is connected to the
Internet. The cloud environment provides the following services to the cloud users.
• Infrastructure as a Service (IaaS)
• Software as a Service (SaaS)
• Platform as a Service (PaaS)
Due to its scale and complexity, a cloud computing environment is prone to various types of
faults. Such faults and failures lead to cloud job failures, which can decrease cloud performance
in terms of efficiency and throughput. At the same time, they may violate the Service Level
Agreement (SLA) agreed between the cloud service user and the provider to ensure the
Quality of Service (QoS).
2. FAULT TOLERANCE
A cloud environment uses various fault tolerance strategies and countermeasures to deal with
these issues. Fault tolerance is the property of a system which enables it to continue operating
properly in the event of the failure of some of its components.
In a life-critical system, the existence of a fault-tolerant control system is essential. One of
its important functions is to steer the process to a safe state whenever unwanted events such as
faults occur. To fulfil this role, the availability and reliability of the fault-tolerant control
system have to be high. To attain a high degree of availability in the presence of random failures,
one has to resort to redundancy. Moreover, to avoid common-cause failures, there are distinct
requirements on the redundancy, such as independence, reliability, diversity and separation.
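As a brief illustration of why redundancy raises availability, assume (hypothetically, the symbols are not from the paper) that each of $n$ redundant components is available with probability $a$ and that the components fail independently, which is what the independence requirement is meant to secure. The system is then unavailable only if all components are down at the same time:

```latex
A_n = 1 - (1 - a)^n
```

For example, with $a = 0.9$ and $n = 2$, $A_2 = 1 - 0.1^2 = 0.99$. Common-cause failures break the independence assumption, so the gain obtained in practice may be smaller.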
3. CLOUD FAULT TOLERANCE
In a cloud environment, faults and failures can occur at various levels, including Virtual
Machine failure, host failure and network failure. These failures can be due to hardware or
software faults. The following techniques are used for fault tolerance in a cloud computing
environment.
a) Reactive Fault Tolerance
Reactive fault tolerance strategies lessen the influence of failures on application execution
once a failure has actually occurred. The following approaches fall under the reactive
fault tolerance category.
• Replay: Replay is a fault-tolerant strategy in which the execution of the failed job is
restarted on the same machine or on a different machine. This re-execution is
initiated by the cloud service, not the cloud user.
• Retry: In this approach, the failed job is resubmitted by the cloud user for execution. This
is the simplest and most widely used method in public clouds.
• Job Migration: If a job fails during execution, it is moved to a different
machine and its execution is restarted.
• User Defined Exception Handling: In case of failure, the procedure defined by the user
to handle exceptions is initiated to either recover the execution or to fix the workflow to
avoid another instance of failure.
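The retry and job migration approaches listed above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `JobFailed` exception, the function name and the machine identifiers are hypothetical, and a job is modelled as a plain callable that receives the name of the machine it runs on.

```python
from typing import Callable, Sequence


class JobFailed(Exception):
    """Raised by a (hypothetical) machine when a job fails during execution."""


def run_with_retry_and_migration(
    job: Callable[[str], str],
    machines: Sequence[str],
    retries_per_machine: int = 2,
) -> str:
    """Retry the job on a machine, then migrate it to the next machine."""
    for machine in machines:  # job migration: move on to a different machine
        for _attempt in range(retries_per_machine):  # retry: resubmit on the same machine
            try:
                return job(machine)
            except JobFailed:
                continue
    raise JobFailed("job failed on every available machine")
```

Each machine is tried `retries_per_machine` times before the job is moved on, so the extra cost is paid only when a failure actually occurs, which is the defining property of a reactive strategy.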
b) Proactive Fault Tolerance
The principle of proactive FT strategies is to avoid errors, faults and failures by predicting
them and proactively replacing the suspected components with other working components. Some
of the methods based on proactive fault tolerance strategies are software rejuvenation,
migration and load balancing.
• Checkpointing: Checkpointing is a mechanism in which the partial results of a job
execution are stored from time to time on stable storage. These partial results can be
used to resume the execution of a failed job. Checkpoints can be taken at regular time
intervals, called periodic checkpoints, or at irregular time intervals, called
aperiodic checkpoints. Checkpointing is the most widely used fault tolerance approach in
distributed environments.
• Replication: In this approach, multiple copies of the same job are executed in parallel. If
one of the Virtual Machines (VMs) fails, the job can still be completed by the alternate
VM. Replication provides more reliability, but at the added cost of the replicated job.
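A minimal sketch of periodic checkpointing is given below, under stated assumptions: the "job" is simply a running sum over a list, the checkpoint is a JSON file standing in for stable storage, and the `fail_at` parameter is a hypothetical hook used only to simulate a VM failure. None of these names come from the paper.

```python
import json
import os
import tempfile


def sum_with_checkpoints(values, checkpoint_path, interval=3, fail_at=None):
    """Sum `values`, saving the partial result every `interval` items.

    If a checkpoint file exists, execution resumes from it instead of
    starting over. `fail_at` simulates a crash before that index is processed.
    """
    state = {"next_index": 0, "total": 0}
    if os.path.exists(checkpoint_path):  # resume from the last checkpoint
        with open(checkpoint_path) as f:
            state = json.load(f)
    for i in range(state["next_index"], len(values)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated VM failure")
        state["total"] += values[i]
        state["next_index"] = i + 1
        if state["next_index"] % interval == 0:  # periodic checkpoint
            with open(checkpoint_path, "w") as f:
                json.dump(state, f)
    return state["total"]
```

After a failure, only the items processed since the last checkpoint are repeated. A production system would additionally write the checkpoint atomically, which this sketch omits.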
4. PROPOSED METHODOLOGY
As discussed earlier, the cloud computing environment is based on a distributed architecture to
provide users with robust and scalable service. Using a dynamic design, the cloud environment is
flexible enough to incorporate changing user requirements from time to time. The fault-tolerant
solutions discussed in the literature each address a specific kind of failure in a specific
predefined way to make the cloud environment more reliable. These methods are very rigid in
accommodating the varying or changing requirements of cloud users, and they treat the user
workload in a static manner. They can provide a varying degree of fault tolerance and performance
from one user to another, but they are not able to apply this approach dynamically within the
workload of a cloud user. In the following sections we discuss the design and implementation of
the proposed method, which can provide a dynamic fault-tolerant solution to the different kinds
of jobs within the workload of one specific user.
Research Gap
The various fault tolerance methods studied in this research are based on coarse granularity, which
makes them rigid and unable to support flexible fault tolerance as per the user requirements.
When we consider the cloud computing environment and the various users who have hosted their
business logic applications on it, we cannot say that the requirements of every cloud user are the
same. The priority of these users may vary according to their business requirements. For example,
the fault tolerance and reliability requirements of a banking application would be higher than
those of an online shopping website.
A cloud computing environment may be hosting all these kinds of businesses and their
corresponding applications. A very strict fault tolerant job scheduling policy can prevent or
recover from most failures, but it also leads to unnecessary overhead on the cloud resources,
which ultimately decreases resource utilization. On the other hand, a passive fault tolerant
scheduling policy will not be able to deal with faults and failures, and will lead to an increased
number of job failures and delayed job execution. In this case, the service level agreement between
the service provider and the service user will be violated. This affects the future reputation of
the cloud service provider and also leads to financial penalties.
The motivation behind this research is to find an intermediate solution which can handle cloud
applications with an adaptive fault tolerance approach. The proposed method should be able to
switch between the various fault tolerant solutions as per the user classification. To make the
proposed method even more fine-grained, it should be able to provide adaptive fault tolerance
to the individual jobs within the cloud user application.
5. OBJECTIVES
The following are the objectives of the proposed SLA based fault tolerant scheduling approach:
• To propose an adaptive fault tolerance method for cloud job scheduling which can adjust
the degree of fault tolerance for a particular user and workload based on the user and
workload classification. This will help in providing more reliable service to
high-end customers while keeping the fault tolerance overhead to a minimum for low-end
services.
• To increase the probability of a cloud job being executed within the set SLA guidelines,
while keeping the cost and turnaround time to a minimum. The primary goal is to decrease
or eliminate the SLA violations for cloud job execution.
• To quantify the performance, the proposed method will be compared with the existing
fault tolerance solutions based on cost and SLA metrics.
6. ASSUMPTIONS AND CONSIDERATIONS FOR THE PROPOSED
METHOD
The following research assumptions and definitions have been considered for the proposed fault
tolerant solution for SLA based scheduling.
• Infrastructure Provider: An entity which provides the basic infrastructure required for the
installation and deployment of cloud resources for providing various cloud services.
• Cloud User: The cloud user is the one who uses the services provided by the infrastructure
provider in order to deploy his/her customised business applications.
• Fault Tolerant Service: The fault tolerant service provides fault tolerant solutions to the
cloud user through various application programming interfaces and programming extensions.
• Service Provider Roles: We have assumed that the cloud service provider has consistent
and accurate access to the availability and failure state of the cloud resources. The
availability and failure information can be accessed uniformly throughout the cloud
infrastructure.
• Fault Model: The fault model defines and sets the boundaries for the design and operation
of the fault tolerant solution. It consists of the various mechanisms which are to be
applied before or after a failure in order to provide a satisfactory service to the cloud user.
• Fault Tolerance Overhead: The fault tolerance overhead is measured according to the cost and
amount of resources consumed for implementing a particular fault tolerant approach. The
overhead increases with the degree or extent of fault tolerance.
• Fault Tolerance Performance: This is used to measure the effectiveness of the applied
fault-tolerant solution. It is evaluated in terms of successful versus failed job executions. The
job waiting time and the overall turnaround time are also part of the fault tolerance
performance metrics.
• Resource Manager: The resource manager service keeps track of the resource states. The
state of a resource may vary from idle to busy. This service also keeps track of the resources
which are being used for implementing the replication service. It also keeps track of the
machine attributes, such as the serial number, the hardware architecture in terms of
processor speed, and the storage capacity and type, along with the main memory details. The
information related to the busy/idle state of the machine processor cores is also
maintained in the database of the resource manager service.
• Replica Manager: The replica manager service keeps track of the number of active replicas
of a client job, along with their physical location and their synchronisation in
case of an interactive workload.
• Fault Detection Service: This service keeps track of the availability of cloud VMs by using
heartbeats.
• Buffer Manager: The buffer manager service manages the process or job-related data buffer
to facilitate job processing. The data stored in this service can be used to migrate or resume
job execution.
• Recovery Manager: This service is responsible for recovering the processing of a failed
job by using the recovery methods decided in the SLA.
• Input Buffer: The input buffer service provides buffering capability to store the job states
and input data. This data can be used to restart job execution upon VM failure without
transferring the input data once again. This is less costly than job replication.
• We have assumed that the cloud resources are homogeneous in terms of operating
system, hardware and network resources, and that job migration from one cloud resource to
another does not lead to any compatibility issues.
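The heartbeat idea used by the fault detection service can be sketched as follows. The paper does not specify a timeout or an interface, so the `HeartbeatMonitor` class, its method names and the timeout value are assumptions made purely for illustration. Timestamps are passed in explicitly so that the behaviour is deterministic.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Treats a VM as failed when no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, vm_id: str, now: float) -> None:
        # Record the time of the most recent heartbeat from this VM.
        self.last_seen[vm_id] = now

    def failed_vms(self, now: float) -> List[str]:
        # A VM is suspected to have failed once its last heartbeat is too old.
        return sorted(
            vm for vm, seen in self.last_seen.items() if now - seen > self.timeout
        )
```

A recovery manager could poll `failed_vms` periodically and trigger the recovery method agreed in the SLA for the jobs running on each reported VM.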
7. CLOUD USER CLASSIFICATION
Cloud user classification is one of the basic foundations of the proposed fault tolerant solution.
As already discussed, different users may have different requirements for their business
applications. We cannot treat all cloud users with a single fault tolerant or job scheduling
policy and expect quality of service and user satisfaction. To achieve this, the fault-tolerant
solution should be able to cater to the needs of different users. To invoke the fault-tolerant
solution for a particular class of users, these users should first be classified into various
classes as per their fault-tolerance requirements. Each user class is assigned a particular priority.
This priority is used to map the user to the corresponding SLA for fault tolerance.
For the proposed method, we have considered the following user classes.
• Gold Users: This premium user class has the highest fault tolerance priority over
the other user classes. When requests compete for fault tolerance resources, the jobs of
premium users are given priority over the jobs of other users.
• Silver Users: This user class is given less priority than the premium user class.
The workload jobs of this user class will be delayed in case of a clash with
premium user class jobs.
• Bronze Users: This is the class of normal users. This class is provided with all
the basic services and has the least priority in terms of fault tolerance
implementations.
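The priority ordering between the three user classes can be sketched with a priority queue. The class and method names below are hypothetical, and the first-in, first-out ordering within one class is an assumption that the paper does not state.

```python
import heapq
from enum import IntEnum
from itertools import count


class UserClass(IntEnum):
    GOLD = 0    # lower value means higher fault tolerance priority
    SILVER = 1
    BRONZE = 2


class FaultToleranceQueue:
    """Serves fault tolerance requests by user class, then by arrival order."""

    def __init__(self) -> None:
        self._heap = []
        self._seq = count()  # tie-breaker: keeps arrival order within a class

    def submit(self, user_class: UserClass, job_id: str) -> None:
        heapq.heappush(self._heap, (int(user_class), next(self._seq), job_id))

    def next_job(self) -> str:
        _, _, job_id = heapq.heappop(self._heap)
        return job_id
```

With this ordering, a Silver or Bronze request is delayed whenever a Gold request is waiting, matching the behaviour described for clashes between classes.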
8. CLOUD JOB CLASSIFICATION AND SLA SPECIFICATION
Similar to the cloud user classification, the cloud jobs of a particular user have been further
classified into the following two classes.
• Compute Intensive Jobs: Compute intensive jobs require a lot of CPU capacity for
processing. For a relatively small amount of input data, a large number of analytical or
predictive steps are performed. Scientific experiments and DNA sequencing are two examples
of compute intensive jobs.
• Data Intensive Jobs: Data intensive jobs spend most of their time in input/output rather
than processing. These jobs require a lot of data (terabytes) to be transferred for
processing. The fault tolerance requirements of data intensive jobs are different from
those of compute intensive jobs.
• SLA Specification: The SLA for cloud users has been considered in terms of the number of
job failures and the job turnaround time.
Users whose workload is not a real-time application can afford a slight delay in execution
and also some degree of job failure. This was the basic idea behind the proposed technique:
to provide requirement based fault tolerance while minimizing the fault tolerance overhead
in terms of replication and input storage buffer.
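The job classification and the SLA specification described above can be sketched as follows. The paper gives no thresholds, so the rule "more I/O time than CPU time means data intensive" and the field names of the `SLA` record are assumptions chosen for illustration; seconds are an assumed unit.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class JobClass(Enum):
    COMPUTE_INTENSIVE = "compute"
    DATA_INTENSIVE = "data"


def classify_job(cpu_seconds: float, io_seconds: float) -> JobClass:
    """Assumed rule: a job that spends more time in I/O than on the CPU is data intensive."""
    if io_seconds > cpu_seconds:
        return JobClass.DATA_INTENSIVE
    return JobClass.COMPUTE_INTENSIVE


@dataclass(frozen=True)
class SLA:
    max_job_failures: int           # tolerated number of failed jobs
    max_turnaround_seconds: float   # tolerated turnaround time per job


def sla_violated(sla: SLA, job_failures: int, turnaround_seconds: Sequence[float]) -> bool:
    """The SLA is violated by too many failures or by any job that finishes too late."""
    too_many_failures = job_failures > sla.max_job_failures
    too_slow = any(t > sla.max_turnaround_seconds for t in turnaround_seconds)
    return too_many_failures or too_slow
```

A non-real-time workload would be given looser limits in its `SLA` record, so that a slight delay or an occasional failure does not count as a violation.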
9. SLA AND CLOUD USER MAPPING FOR FAULT TOLERANCE
IMPLEMENTATION
A rigid fault tolerance policy induces a static fault tolerance overhead on the whole workload. It
leads to increased turnaround time and execution cost. To avoid this, the proposed adaptive
method balances execution cost and turnaround time against the fault tolerance implementation
requirements. The fault tolerance implementation for the cloud users has been proposed as
follows.
• Gold Users: Gold users are considered high-end cloud users who have no
restriction in terms of cost; fault tolerance is the utmost requirement for these users. For
this user class, the SLA policy is oriented towards minimizing the number of job
failures. To do that, the proposed policy replicates all the jobs of Gold class users. This
adds to the cost but provides a high degree of fault tolerance to the user jobs. Due to
replication, if one VM fails, its failed job can be completed on the other
VM, which was executing the redundant copy.
• Silver Users: For this user class, the job subclass is also considered. As already
discussed, the subclasses are of two types: compute intensive and data intensive. For silver
users, the fault tolerance requirement of a compute intensive job is different from that of a
data intensive job. Compute intensive jobs are replicated, whereas data intensive jobs
are backed up by the input buffer. If a compute intensive job fails, the result of
the parallel running copy of the same job is accepted as the final result. In the case of a
data intensive job, the input buffer is used to resume the job execution, which avoids
re-transfer of the data.
• Bronze Users: In this case the job subclass is not considered. All the jobs submitted by
this user class use the input buffer service for fault tolerance.
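The mapping from user class and job class to a fault tolerance mechanism described above can be written as a small decision function. The rules are taken from the text; the enum and function names are hypothetical and only serve this sketch.

```python
from enum import Enum


class UserClass(Enum):
    GOLD = "gold"
    SILVER = "silver"
    BRONZE = "bronze"


class JobClass(Enum):
    COMPUTE_INTENSIVE = "compute"
    DATA_INTENSIVE = "data"


class Mechanism(Enum):
    REPLICATION = "replication"
    INPUT_BUFFER = "input_buffer"


def select_mechanism(user: UserClass, job: JobClass) -> Mechanism:
    """Pick the fault tolerance mechanism for one job of one user."""
    if user is UserClass.GOLD:
        return Mechanism.REPLICATION  # all Gold jobs are replicated
    if user is UserClass.SILVER:
        # Silver: the job subclass decides between replication and buffering
        if job is JobClass.COMPUTE_INTENSIVE:
            return Mechanism.REPLICATION
        return Mechanism.INPUT_BUFFER
    return Mechanism.INPUT_BUFFER  # Bronze: job subclass is not considered
```

For example, `select_mechanism(UserClass.SILVER, JobClass.DATA_INTENSIVE)` returns `Mechanism.INPUT_BUFFER`, so the more expensive replication is reserved for the cases where the SLA calls for it.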
Figure 1: Proposed Architecture
10. CONCLUSION
Fault tolerance is one of the most crucial issues faced by cloud users and cloud service
providers. If poorly handled, it can lead to increased waiting time, increased job turnaround
time and, in the worst case, increased job failures. Strict fault tolerance policies provide a
static fault tolerance but induce additional overhead. Considering this, we have proposed an
adaptive fault tolerant job scheduling method which should vary the degree of fault tolerance
as per the user requirements. The cloud users have been classified into various classes based
on the application requirements along with the job classification.
AUTHORS
1. Manpreet Singh Gill
2. Dr. R. K. Bawa

Call for Papers - 9th International Conference on Cryptography and Informatio...
Call for Papers - 9th International Conference on Cryptography and Informatio...
Call for Papers - 4th International Conference on Machine learning and Cloud ...
Call for Papers - 11th International Conference on Data Mining & Knowledge Ma...
Call for Papers - 4th International Conference on Blockchain and Internet of ...
Call for Papers - International Conference IOT, Blockchain and Cryptography (...
Call for Paper - 4th International Conference on Cloud, Big Data and Web Serv...

Recently uploaded (20)

PDF
RMMM.pdf make it easy to upload and study
PDF
Complications of Minimal Access Surgery at WLH
PPTX
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
PDF
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PPTX
PPH.pptx obstetrics and gynecology in nursing
PDF
Basic Mud Logging Guide for educational purpose
PPTX
human mycosis Human fungal infections are called human mycosis..pptx
PDF
Anesthesia in Laparoscopic Surgery in India
PDF
2.FourierTransform-ShortQuestionswithAnswers.pdf
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PDF
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PDF
TR - Agricultural Crops Production NC III.pdf
PPTX
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
PDF
O7-L3 Supply Chain Operations - ICLT Program
PDF
Classroom Observation Tools for Teachers
PDF
Microbial disease of the cardiovascular and lymphatic systems
PDF
Mark Klimek Lecture Notes_240423 revision books _173037.pdf
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
RMMM.pdf make it easy to upload and study
Complications of Minimal Access Surgery at WLH
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
Abdominal Access Techniques with Prof. Dr. R K Mishra
PPH.pptx obstetrics and gynecology in nursing
Basic Mud Logging Guide for educational purpose
human mycosis Human fungal infections are called human mycosis..pptx
Anesthesia in Laparoscopic Surgery in India
2.FourierTransform-ShortQuestionswithAnswers.pdf
O5-L3 Freight Transport Ops (International) V1.pdf
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
TR - Agricultural Crops Production NC III.pdf
Introduction_to_Human_Anatomy_and_Physiology_for_B.Pharm.pptx
O7-L3 Supply Chain Operations - ICLT Program
Classroom Observation Tools for Teachers
Microbial disease of the cardiovascular and lymphatic systems
Mark Klimek Lecture Notes_240423 revision books _173037.pdf
school management -TNTEU- B.Ed., Semester II Unit 1.pptx


International Journal of Grid Computing & Applications (IJGCA) Vol.7, No.3/4, December 2016
DOI:10.5121/ijgca.2016.7401

SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COMPUTING ENVIRONMENT

Manpreet Singh Gill1 and Dr. R. K. Bawa2
1 Research Scholar, Department of Computer Science, Punjabi University, Patiala, Punjab, India
2 Professor, Department of Computer Science, Punjabi University, Patiala, Punjab, India

ABSTRACT
Cloud computing is a concept of providing user and application oriented services in a virtual environment. Users can use the various cloud services dynamically as per their requirements. Different users have different requirements in terms of application reliability, performance and fault tolerance. Static and rigid fault tolerance policies provide a consistent degree of fault tolerance, but also a consistent overhead. In this research work we have proposed a method to implement dynamic fault tolerance considering customer requirements. The cloud users have been classified into sub classes as per their fault tolerance requirements. Their jobs have also been classified into compute intensive and data intensive categories. A varying degree of fault tolerance, consisting of replication and input buffer, has been applied. From the simulation based experiments we have found that the proposed dynamic method performs better than the existing methods.

KEYWORDS:
Fault Tolerance, User Classification, Job Classification, Replication, Buffering, Input Buffer

1. INTRODUCTION
Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources which can be quickly provisioned and released with minimal management effort. Cloud computing is a new computing model which comes from distributed computing, grid computing, parallel computing, utility computing, virtualization technology, and other computer technologies.
Nowadays cloud computing is widely accepted, because every organization wants to get rid of large storage devices. In cloud computing, data is stored permanently not on a personal PC or server but on a remote server which is connected to the Internet. The cloud environment provides the following services to cloud users.
• Infrastructure as a Service (IaaS)
• Software as a Service (SaaS)
• Platform as a Service (PaaS)
Due to its scale and complexity, a cloud computing environment is prone to various types of faults. Such faults and failures lead to cloud job failures, which can decrease cloud performance in terms of efficiency and throughput. At the same time, they may violate the Service Level Agreement (SLA) which was agreed between the cloud service user and provider to ensure the Quality of Service (QoS).

2. FAULT TOLERANCE
Cloud environments use various fault tolerance strategies and countermeasures to deal with these issues. Fault tolerance is the property of a system which enables it to continue operating properly when some of its components fail. In a life-critical system, the existence of a fault-tolerant control system is essential. One of its important functions is to steer the process to a safe state whenever unwanted events such as faults occur. To fulfil this role, the availability and reliability of the fault-tolerant control system have to be high. To attain a high degree of availability in the face of random failures, one has to resort to redundancy. Moreover, to avoid common failures, there are distinct requirements on the redundancy, such as independence, reliability, diversity and separation.

3. CLOUD FAULT TOLERANCE
In a cloud environment, faults and failures can occur at various levels, including Virtual Machine (VM) failure, host failure and network failure. These failures can be due to hardware or software faults. The following techniques are used for fault tolerance in cloud computing environments.

a) Reactive Fault Tolerance
Reactive fault tolerance strategies reduce the influence of failures on application execution after the failure has occurred. The following approaches fall under the reactive fault tolerance category.
• Replay: Replay is a fault-tolerant strategy in which the execution of the failed job is restarted on the same machine or on a different machine. This re-execution is initiated by the cloud service, not the cloud user.
• Retry: In this approach the failed job is resubmitted by the cloud user for execution. This is the simplest and most widely used method in public clouds.
• Job Migration: If a job fails during execution, it is moved to a different machine and its execution is restarted.
• User Defined Exception Handling: In case of failure, the procedure defined by the user to handle exceptions is initiated, either to recover the execution or to fix the workflow to avoid another instance of failure.

b) Proactive Fault Tolerance
The principle of proactive fault tolerance strategies is to avoid errors, faults and failures by predicting them and proactively replacing the suspected components with other working components. Methods based on proactive fault tolerance strategies include software rejuvenation, migration and load balancing.
• Checkpointing: Checkpointing is a mechanism in which the partial results of a job execution are stored from time to time on stable storage. These partial results can be used to resume the execution of a failed job. Checkpoints can be taken at regular time intervals, called periodic checkpoints, or at irregular time intervals, called aperiodic checkpoints. Checkpointing is the most widely used fault tolerance approach in distributed environments.
• Replication: In this approach, multiple copies of the same job are executed in parallel. If one of the Virtual Machines (VMs) fails, the job can still be completed by the alternate VM. Replication provides more reliability, but at the added cost of the replicated job.

4. PROPOSED METHODOLOGY
As discussed earlier, the cloud computing environment is based on a distributed architecture to provide users with a robust and scalable service. Using a dynamic design, the cloud environment is flexible enough to incorporate changing user requirements from time to time. The fault-tolerant solutions discussed in the literature review each address a specific kind of failure in a specific predefined way to make the cloud environment more reliable. These methods are too rigid to accommodate the varying or changing requirements of cloud users, and they treat the user workload in a static manner. They provide a varying degree of fault tolerance and performance from one user to another, but they are not able to vary the approach dynamically within the workload of a cloud user. In the following sections we discuss the design and implementation of the proposed method, which can provide a dynamic fault-tolerant solution to the different kinds of jobs within the workload of one specific user.

Research Gap
The fault tolerance methods studied in this research are based on coarse granularity, which makes them rigid and unable to support flexible fault tolerance as per the user requirements. Considering the cloud computing environment and the various users who have hosted their business logic applications on it, we cannot say that the requirements of every cloud user are the same. The priority of these users may vary according to their business requirements. For example, the fault tolerance and reliability requirements of a banking application would be higher than those of an online shopping website. A cloud computing environment may be hosting all these kinds of businesses and their corresponding applications. A very strict fault tolerant job scheduling policy can no doubt prevent and recover from failures, but it will also lead to unnecessary overhead on the cloud resources, which will ultimately decrease resource utilization. On the other hand, a passive fault tolerant scheduling policy will not be able to deal with faults and failures, so it will lead to an increased number of job failures and delayed job execution. In this case, the service level agreement between the service provider and the service user will be violated. This affects the future reputation of the cloud service provider and also leads to financial penalties. The motivation behind this research is to find an intermediate solution which can handle cloud applications with an adaptive fault tolerance approach. The proposed method should be able to switch between the various fault tolerant solutions as per the user classification. To make the proposed method even more fine-grained, it should be able to provide adaptive fault tolerance to the individual jobs within the cloud user application.

5. OBJECTIVES
The objectives of the proposed SLA based fault tolerant scheduling approach are as follows:
• To propose an adaptive fault tolerance method for cloud job scheduling which can adjust the degree of fault tolerance for a particular user and workload, based on the user and workload classification. This will help in providing a more reliable service to high-end customers while keeping the fault tolerance overhead to a minimum for low-end services.
• To increase the probability of a cloud job being executed within the set SLA guidelines, while keeping the cost and turnaround time to a minimum. The primary goal is to reduce or eliminate SLA violations for cloud job execution.
• To quantify the performance, the proposed method will be compared with the existing fault tolerance solutions based on cost and SLA metrics.

6. ASSUMPTIONS AND CONSIDERATIONS FOR THE PROPOSED METHOD
The following research assumptions have been considered for the proposed fault tolerant solution for SLA based scheduling.
• Infrastructure Provider: The infrastructure provider is an entity which provides the basic infrastructure required for the installation and deployment of cloud resources for providing various cloud services.
• Cloud User: The cloud user is the one who uses the services provided by the infrastructure provider in order to deploy his/her customised business applications.
• Fault Tolerant Service: The fault tolerant service provides the fault tolerant solutions to the cloud user through various application programming interfaces and programming extensions.
• Service Provider Roles: We have assumed that the cloud service provider has consistent and accurate access to the availability and failure state of the cloud resources. The availability and failure information can be accessed uniformly throughout the cloud infrastructure.
• Fault Model: The fault model defines and sets the boundaries for the design and operation of the fault tolerant solution. It consists of the various mechanisms which are to be applied before or after a failure in order to provide a satisfactory service to the cloud user.
• Fault Tolerance Overhead: The fault tolerance overhead is measured according to the cost and amount of resources consumed for implementing a particular fault tolerant approach. The overhead increases with the degree or extent of fault tolerance.
• Fault Tolerance Performance: This is used to measure the effectiveness of the applied fault-tolerant solution. It is evaluated in terms of successful versus failed job executions. The job waiting time and the overall turnaround time are also part of the fault tolerance performance metrics.
• Resource Manager: The resource manager service keeps track of the resource states. The state of a resource may vary from idle to busy. This service also keeps track of the resources which are being used for implementing the replication service. It also keeps track of the machine attributes: serial number, hardware architecture in terms of processor speed, storage capacity and type, and main memory details. The information related to the busy/idle state of the machine processor cores is also maintained in the database of the resource manager service.
• Replica Manager: The replica manager service keeps track of the details related to the number of active replicas of a client job, along with their physical location and synchronisation in case of interactive workload.
• Fault Detection Service: This service keeps track of the availability of cloud VMs by using the concept of heartbeats.
• Buffer Manager: The buffer manager service manages the process or job-related data buffer to facilitate job processing. The data stored in this service can be used to migrate or resume job execution.
• Recovery Manager: This service is responsible for recovering the processing of the failed job by using the recovery methods decided in the SLA.
• Input Buffer: The input buffer service provides buffering capability to store the job states and input data. This data can be used to restart job execution upon VM failure without transferring the input data once again. This is less costly than job replication.
• We have assumed that the cloud resources are homogeneous in terms of operating system, hardware and network resources, and that job migration from one cloud resource to another does not lead to any compatibility issues.

7. CLOUD USER CLASSIFICATION
Cloud user classification is one of the basic foundations of the proposed fault tolerant solution. As already discussed, different users may have different requirements for their business applications. We cannot treat all cloud users with a single fault tolerance or job scheduling policy and still expect quality of service and user satisfaction. To achieve this, the fault-tolerant solution should be able to cater to the needs of different users. To invoke the fault-tolerant solution for a particular class of users, the users should first be classified into various classes as per their fault tolerance requirements.
A particular user class will be assigned a particular priority. This priority will be used to map the user to the corresponding SLA for fault tolerance. For the proposed method, we have considered the following user classes.
• Gold Users: This premium user class has the highest fault tolerance priority over the other user classes. When requests for fault tolerance compete, the premium users' jobs will be given priority over the other users' jobs.
• Silver Users: This user class is given less priority than the premium user class. The workload jobs of this user class will be delayed in case of a clash with the premium user class jobs.
• Bronze Users: This user class is the class of normal users. This class is provided with all the basic services. This user class has the least priority in terms of fault tolerance implementations.

8. CLOUD JOB CLASSIFICATION AND SLA SPECIFICATION
Similar to the cloud user classification, the cloud jobs of a particular user have further been classified into the following two classes.
• Compute Intensive: Compute intensive jobs require a lot of CPU capacity for processing. For a smaller size of input data, a lot of analytical or predictive steps are performed. Scientific experiments and DNA sequencing are two examples of compute intensive jobs.
• Data Intensive Jobs: Data intensive jobs spend most of their time in input/output rather than processing. These jobs require a lot of data (terabytes) to be transferred for processing. The fault tolerance requirements of data intensive jobs are different from those of compute intensive jobs.
• SLA Specification: The SLA for cloud users has been considered in terms of the number of job failures and the job turnaround time. Users whose workload is not a real-time application can afford a slight delay in execution and also some degree of job failures. This was the basic idea behind the proposed technique: to provide requirement based fault tolerance while minimizing the fault tolerance overhead in terms of replication and input storage buffer.

9. SLA AND CLOUD USER MAPPING FOR FAULT TOLERANCE IMPLEMENTATION
A rigid fault tolerance policy induces a static fault tolerance overhead on all of the workload. It leads to increased turnaround time and execution cost. To avoid this, the proposed adaptive method balances execution cost and turnaround time against the fault tolerance requirements. The fault tolerance implementation for the cloud users has been proposed as follows.
• Gold Users: Gold users have been considered as high-end cloud users who have no restriction in terms of cost; fault tolerance is the utmost requirement for these users. So for this user class, the SLA policy is oriented towards minimizing the number of job failures. To do that, the proposed policy replicates all the jobs of Gold class users. This adds to the cost but provides a high degree of fault tolerance to the user jobs. Due to replication, if one VM fails, its failed job can be completed on the other VM which was executing the redundant copy.
• Silver Users: For this user class, the job sub-class has also been considered. As already discussed, the sub-classes are of two types: compute intensive and data intensive. For silver users, the fault tolerance requirement for a compute intensive job is different from that for a data intensive job. Compute intensive jobs are replicated, while data intensive jobs are backed up by the input buffer. If a compute intensive job fails, the result of the parallel running copy of the same job is accepted as the final result. In the case of a data intensive job, the input buffer is used to resume the job execution and avoid re-transfer of data.
• Bronze Users: In this case the job sub-class has not been considered. All the jobs submitted by this user class use the input buffer service for fault tolerance.
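
The mapping described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `UserClass`, `JobClass`, `Strategy`, `select_strategy` and `order_by_priority` are hypothetical, and the numeric priorities are assumed only to express the Gold > Silver > Bronze ordering from Section 7.

```python
from enum import Enum


class UserClass(Enum):
    # Assumed numeric priorities: a lower value means a higher priority.
    GOLD = 1
    SILVER = 2
    BRONZE = 3


class JobClass(Enum):
    COMPUTE_INTENSIVE = "compute"
    DATA_INTENSIVE = "data"


class Strategy(Enum):
    REPLICATION = "replication"
    INPUT_BUFFER = "input_buffer"


def select_strategy(user: UserClass, job: JobClass) -> Strategy:
    """Pick a fault tolerance strategy from the user class and job class."""
    if user is UserClass.GOLD:
        # Gold: every job is replicated, whatever its job class.
        return Strategy.REPLICATION
    if user is UserClass.SILVER:
        # Silver: compute intensive jobs are replicated,
        # data intensive jobs fall back to the input buffer.
        if job is JobClass.COMPUTE_INTENSIVE:
            return Strategy.REPLICATION
        return Strategy.INPUT_BUFFER
    # Bronze: the job class is ignored, the input buffer is always used.
    return Strategy.INPUT_BUFFER


def order_by_priority(jobs):
    """Order (user_class, job_class) pairs so higher-priority users come first.

    sorted() is stable, so jobs of the same user class keep their order.
    """
    return sorted(jobs, key=lambda pair: pair[0].value)


if __name__ == "__main__":
    workload = [
        (UserClass.BRONZE, JobClass.COMPUTE_INTENSIVE),
        (UserClass.SILVER, JobClass.DATA_INTENSIVE),
        (UserClass.GOLD, JobClass.DATA_INTENSIVE),
    ]
    for user, job in order_by_priority(workload):
        print(user.name, job.name, select_strategy(user, job).name)
```

Keeping the policy in a single function makes the trade-off explicit: replication is chosen only where the SLA justifies its cost, and every other case uses the cheaper input buffer.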

Figure 1: Proposed Architecture

10. CONCLUSION
Fault tolerance is one of the most crucial issues faced by cloud users and cloud service providers. If poorly handled, it can lead to increased waiting time, increased job turnaround time and, in the worst case, increased job failures. Strict fault tolerance policies provide a static fault tolerance but induce additional overhead. Considering this, we have proposed an adaptive fault tolerant job scheduling method which varies the degree of fault tolerance as per the user requirements. The cloud users have been classified into various classes based on the application requirements along with the job classification.

REFERENCES
[1] P. Kumar, G. Raj, and A. K. Rai, “A novel high adaptive fault tolerance model in real time cloud computing,” in 2014 5th International Conference - Confluence The Next Generation Information Technology Summit (Confluence), 2014, pp. 138–143.
[2] S. Limam and G. Belalem, “A Migration Approach for Fault Tolerance in Cloud Computing,” Int. J. Grid High Perform. Comput., vol. 6, no. 2, pp. 24–37, 2014.
[3] A. Ganesh, M. Sandhya, and S. Shankar, “A study on fault tolerance methods in Cloud Computing,” in 2014 IEEE International Advance Computing Conference (IACC), 2014, pp. 844–849.
[4] I. P. Egwutuoha, S. Chen, D. Levy, and B. Selic, “A fault tolerance framework for high performance computing in cloud,” in Proceedings - 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2012, 2012, pp. 709–710.
[5] D. Sun, G. Chang, C. Miao, and X. Wang, “Analyzing, modeling and evaluating dynamic adaptive fault tolerance strategies in cloud computing environments,” J. Supercomput., vol. 66, no. 1, pp. 193–228, 2013.
[6] P. Das and P. M. Khilar, “VFT: A virtualization and fault tolerance approach for cloud computing,” in 2013 IEEE Conference on Information and Communication Technologies, ICT 2013, 2013, pp. 473–478.
[7] M. Armbrust, A. Fox, R. Griffith, A. Joseph, and RH, “Above the clouds: A Berkeley view of cloud computing,” Univ. California, Berkeley, Tech. Rep. UCB , pp. 07–013, 2009.
[8] R. Rajavel and T. Mala, “Achieving service level agreement in cloud environment using job prioritization in hierarchical scheduling,” in Advances in Intelligent and Soft Computing, 2012, vol. 132 AISC, pp. 547–554.
[9] S. Fu, “Failure-aware resource management for high-availability computing clusters with distributed virtual machines,” J. Parallel Distrib. Comput., vol. 70, no. 4, pp. 384–393, 2010.
[10] K. Lu, R. Yahyapour, P. Wieder, C. Kotsokalis, E. Yaqub, and A. I. Jehangiri, “QoS-aware VM placement in multi-domain service level agreements scenarios,” in IEEE International Conference on Cloud Computing, CLOUD, 2013, pp. 661–668.
[11] L. Wu, S. K. Garg, and R. Buyya, “SLA-Based Resource Allocation for Software as a Service Provider (SaaS) in Cloud Computing Environments,” 2011 11th IEEE/ACM Int. Symp. Clust. Cloud Grid Comput., pp. 195–204, 2011.
[12] M. M. Hassan, B. Song, M. S. Hossain, and A. Alamri, “QoS-aware Resource Provisioning for Big Data Processing in Cloud Computing Environment,” in Computational Science and Computational Intelligence (CSCI), 2014 International Conference on, 2014, vol. 2, pp. 107–112.
[13] S. Malik and F. Huet, “Adaptive fault tolerance in real time cloud computing,” in Proceedings - 2011 IEEE World Congress on Services, SERVICES 2011, 2011, pp. 280–287.

AUTHORS
1. Manpreet Singh Gill
2. Dr. R. K. Bawa