International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 04 | Apr-2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 791
COST EFFECTIVE WORKFLOW SCHEDULING IN BIGDATA
V.M. PAVITHRA1, Dr. S.M. JAGATHEESAN2
1M.Phil Research Scholar, Dept. of Computer Science, Gobi Arts & Science College, Gobichettipalayam, Tamil Nadu, India
2Associate Professor, Dept. of Computer Science, Gobi Arts & Science College, Gobichettipalayam, Tamil Nadu, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Computational science workflows have been successfully run on traditional High Performance Computing (HPC) systems like clusters and Grids for many years. Nowadays, users are interested in executing their workflow applications in the Cloud to exploit the economic and technical benefits of this emerging technology. The deployment and management of workflows over the existing heterogeneous and not yet interoperable Cloud providers is still a challenging task for workflow developers. The Pointer Gossip Content Addressable Network Montage Framework allows an automatic selection of the target clouds, uniform access to the clouds, and workflow data management with respect to user Service Level Agreement (SLA) requirements. Consequently, a number of studies focusing on different aspects have emerged in the literature. This comparative review of workflow scheduling algorithms in the cloud environment provides solutions for these problems. Based on the analysis, the authors also highlight some research directions for future investigation. The previous results offer benefits to users by executing workflows with the expected performance and service quality at the lowest cost.
Key Words: Montage framework, Pointer gossip, Content addressable network, Scheduling.
1. INTRODUCTION
Since 2007, the term cloud has become one of the biggest buzzwords in the IT industry. Many researchers have tried to define cloud computing from different application aspects, but there is no consensus definition. Among the many definitions, three are widely quoted. The first: "A large-scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualized, dynamically-scalable, managed computing power, storage, platforms and services are delivered on demand to external customers over the Internet." As an academic representative, Foster focuses on several technical features that differentiate cloud computing from other distributed computing paradigms. For example, computing entities are virtualized and delivered as services, and these services are dynamically driven by economies of scale.
The second definition describes "a style of computing where scalable and elastic IT capabilities are provided as a service to multiple external customers using internet technologies." Gartner is an IT consulting company, so it examines the qualities of cloud mostly from the point of view of industry. Functional characteristics are emphasized in this definition, such as whether cloud computing is scalable, elastic, service-offering and Internet-based.
"Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction." Compared with the other two definitions, the U.S. National Institute of Standards and Technology provides a relatively more objective and specific definition, which not only defines the cloud concept overall but also specifies the essential characteristics of cloud computing and its delivery and deployment models.
2. REVIEW OF LITERATURE
A. Greenberg [1] has observed that wide-area transfer of large data sets is still a major challenge despite the deployment of high-bandwidth networks with speeds reaching 100 Gbps. Most users fail to obtain even a fraction of the theoretical speeds promised by these networks. Effective usage of the available network capacity has become increasingly important for wide-area data movement. The authors developed a "data transfer scheduling and optimization system as a Cloud-hosted service", Stork Cloud, which mitigates the large-scale end-to-end data transfer bottleneck by efficiently utilizing the underlying networks and effectively scheduling and optimizing data transfers. The paper presents the initial design and prototype performance of Stork Cloud, and demonstrates its efficiency in large dataset transfer across geographically distant storage sites, data centres, and collaborating institutions.
A. Thakar [2] has noted that today's continuously growing cloud infrastructures provide support for processing ever-increasing amounts of scientific data. Cloud resources for computation and storage are spread among globally distributed data centres. Thus, to leverage the full computation power of the clouds, global data processing across multiple sites has to be fully enabled. However, managing data across geographically distributed data centres is not trivial, as it involves high and variable latencies among sites, which come at a high monetary cost. In this work, the author proposes a uniform data management system for scientific applications running across geographically distributed sites. The solution is environment-aware, as it monitors and models the global cloud infrastructure and offers predictable data handling performance for transfer cost and time. In terms of efficiency, it provides the applications with the possibility to set tradeoffs between money and time and optimizes the transfer strategy accordingly. The system was validated on Microsoft's Azure cloud across six EU and US data centres. The experiments were conducted on hundreds of nodes using both synthetic benchmarks and the real-life A-Brain application. The results show that the system is able to model and predict the cloud performance well and to leverage this into efficient data dissemination. The approach reduces the monetary costs and transfer time by up to 3 times.
B. Da Mota [3] has presented the e-Science Central (e-SC) cloud data processing system and its application to a number of e-Science projects. e-SC provides both Software and Platform as a Service (SaaS/PaaS) for scientific data management, analysis and collaboration. It is a portable system and can be deployed on both private (e.g. Eucalyptus) and public clouds (Amazon AWS and Microsoft Windows Azure). The SaaS application allows scientists to upload data, edit and run workflows and share results in the cloud using only a web browser. It is underpinned by a scalable cloud platform consisting of a set of components designed to support the needs of scientists. The platform is exposed to developers so that they can easily upload their own analysis services into the system and make these available to other users. A REST-based API is also provided so that external applications can leverage the platform's functionality, making it easier to build scalable, secure cloud-based applications. The paper describes the design of e-SC, its API and its use in three different case studies: spectral data visualisation, medical data capture and analysis, and chemical property prediction.
B. E. A. Calder [4] has described Dryad, a general-purpose distributed execution engine for coarse-grain data-parallel applications. A Dryad application combines computational "vertices" with communication "channels" to form a dataflow graph. Dryad runs the application by executing the vertices of the graph on a set of available computers, communicating as appropriate through files, TCP pipes and shared-memory FIFOs. The vertices provided by the application developer are quite simple and are usually written as sequential programs with no thread creation or locking. Concurrency arises from Dryad scheduling vertices to run simultaneously on multiple computers, or on multiple CPU cores within a computer. The application can discover the size and placement of data at run time, and modify the graph as the computation progresses to make efficient use of the available resources. Dryad is designed to scale from powerful multi-core single computers, through small clusters of computers, to data centres with thousands of computers. The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.
C. Guo [5] has observed that the widely discussed scientific data deluge creates not only a need to computationally scale an application from a local desktop or cluster to a supercomputer, but also the need to cope with variable data loads over time. Cloud computing offers a scalable, economic, on-demand model well matched to evolving eScience needs. Yet cloud computing creates gaps that must be crossed to move science applications to the cloud. In this article, the authors propose a Generic Worker framework to deploy and invoke science applications in the Cloud with minimal user effort and predictable, cost-effective performance. Their framework is an evolution of the Grid computing application factory pattern and addresses the distinct challenges posed by the Cloud, such as efficient data transfers to and from the Cloud and the transient nature of its VMs. The authors present an implementation of the Generic Worker for the Microsoft Azure Cloud and evaluate its use in a genome sequencing application pipeline. The results show that the user overhead to port and run the application seamlessly across the desktop and the cloud can be substantially reduced without significant performance penalties, while providing on-demand scalability.
Brad Calder [6] has described Windows Azure Storage (WAS), a cloud storage system that provides customers the ability to store seemingly limitless amounts of data for any duration of time. WAS customers have access to their data from anywhere at any time and only pay for what they use and store. In WAS, data is stored durably using both local and geographic replication to facilitate disaster recovery. Currently, WAS storage comes in the form of Blobs (files), Tables (structured storage), and Queues (message delivery). The paper describes the WAS architecture, global namespace, and data model, as well as its resource provisioning, load balancing, and replication systems.
3. METHODOLOGY
The resource scheduling approach is motivated by the many risks found in running a PG-CAN on multiple clouds. There are still many practical and challenging issues in current multi-cloud environments. The authors present a Montage-based Pointer Gossip Content Addressable Network Montage Framework for running workflows in a multi-Cloud environment. The framework allows an automatic selection of the target Clouds, a uniform access to the Clouds, and workflow data management with respect to user Service Level Agreement (SLA) requirements. Following a simulation approach, they evaluated the framework with a real scientific workflow application in different deployment scenarios. The results show that the proposed Pointer Gossip Content Addressable Network Montage Framework offers benefits to users by executing workflows with the expected performance and service quality at the lowest cost [12].
Those issues include relatively limited cross-cloud network bandwidth and the lack of cloud standards among cloud providers. The approach relies on the assumption that all qualified nodes must satisfy the inequalities in the existing system. All the reprojection jobs can be added to a pool of tasks and performed by as many processors as are available, exploiting the parallelization inherent in the Montage architecture.
The authors show how the Montage application can be described in terms of an abstract workflow so that a planning tool such as Pegasus can derive an executable workflow that can be run in the Grid environment. The execution of the workflow is performed by the DAG workflow manager and the associated scheduling graphs. To meet this requirement, the authors design a resource discovery protocol, namely the Montage Approach, to find these qualified nodes.

The authors propose PG-CAN, a workflow scheduling system that minimizes the monetary cost of executing workflows in IaaS clouds. The main components of PG-CAN are illustrated in Figure 3. When a user has specified the probabilistic deadline requirement for a workflow, WaaS providers schedule the workflow by choosing the cost-effective instance type for each task in the workflow. The overall functionality of the PG-CAN optimizations is to determine the suitable instance configuration for each task of a workflow so that the monetary cost is minimized while the probabilistic performance requirement is satisfied. The authors formulate the optimization process as a search problem, and develop a two-step approach to find the solution efficiently. The instance configurations of the two steps are illustrated in Figure 3. They first adopt an A*-based instance configuration approach to select the on-demand instance type for each task of the workflow, in order to minimize the monetary cost while satisfying the probabilistic deadline guarantee. Second, starting from the on-demand instance configuration, they adopt the hybrid instance configuration refinement to consider using a hybrid of both on-demand and spot instances for executing tasks, in order to further reduce cost.
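To make the first optimization step concrete, the following is a minimal best-first (A*-style) search sketch in Python. It is not the authors' implementation: the instance catalog, the prices, the speed factors and the deterministic deadline check are illustrative stand-ins for the calibrated performance distributions and the probabilistic deadline guarantee that PG-CAN actually uses.

import heapq

# Hypothetical instance catalog: name -> (price per hour, relative speed).
CATALOG = {"m1.small": (0.06, 1.0), "m1.medium": (0.12, 2.0),
           "m1.large": (0.24, 3.9), "m1.xlarge": (0.48, 7.5)}

def astar_config(work, deadline):
    """Best-first search over per-task instance choices.

    work: task sizes in m1.small-hours, run in sequence here (a DAG
    would use its critical path). Minimizes cost subject to total
    time <= deadline, a deterministic stand-in for the probabilistic
    deadline check of the paper.
    """
    n = len(work)
    cheapest = min(p / s for p, s in CATALOG.values())  # best $/small-hour
    fastest = max(s for _, s in CATALOG.values())       # best speedup
    rem = [sum(work[i:]) for i in range(n + 1)]         # remaining work
    frontier = [(cheapest * rem[0], 0.0, 0.0, ())]      # (f, cost, time, plan)
    while frontier:
        _, cost, time, plan = heapq.heappop(frontier)
        k = len(plan)
        if k == n:
            return cost, list(plan)                     # cheapest feasible plan
        for name, (price, speed) in CATALOG.items():
            t = work[k] / speed
            if time + t + rem[k + 1] / fastest > deadline:
                continue                                # deadline unreachable
            new_cost = cost + t * price                 # ignores hourly rounding
            f = new_cost + cheapest * rem[k + 1]        # admissible lower bound
            heapq.heappush(frontier, (f, new_cost, time + t, plan + (name,)))
    return None                                         # deadline infeasible

print(astar_config([2.0, 6.0, 1.0], deadline=3.0))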
After the two optimization steps, the tasks of the workflow are scheduled to execute on the cloud according to their hybrid instance configuration. At runtime, the system maintains a pool of spot instances and on-demand instances, organized in lists according to the different instance types. Instance acquisition/release operations are performed in an auto-scaling manner. Instances that do not have any task and are approaching multiples of full instance hours are released and removed from the pool. Tasks are scheduled to instances in an earliest-deadline-first manner. When a task with a deadline residual of zero requests an instance and the task cannot be consolidated onto an existing instance in the pool, a new instance is acquired from the cloud provider and added to the pool. In the experiments, for example, Amazon EC2 poses a capacity limit of 200 instances. If this cap is reached, no new instances can be acquired until some instances are released [9].
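The runtime policy can be illustrated as follows; the pool structure, the 200-instance cap and the release rule near full instance hours follow the description above, while the class layout, the labels and the sequential driver loop are hypothetical simplifications.

import heapq
import math

class InstancePool:
    """Sketch of the runtime instance pool; acquisition and release
    stand in for real provider calls (e.g. Amazon EC2)."""

    CAP = 200  # provider-side limit on concurrent instances

    def __init__(self):
        self.idle = {}    # instance type -> list of (acquired_at, label)
        self.count = 0    # instances currently held

    def get(self, itype, now):
        if self.idle.get(itype):
            return self.idle[itype].pop()   # consolidate onto an idle instance
        if self.count >= self.CAP:
            raise RuntimeError("cap reached; wait for a release")
        self.count += 1                     # acquire from the provider
        return (now, f"{itype}-{self.count}")

    def put_back(self, itype, inst, now):
        acquired_at, _ = inst
        used = now - acquired_at
        # Release idle instances approaching a multiple of a full billed
        # hour; otherwise keep them in the pool for reuse.
        if math.ceil(used) - used < 0.05:
            self.count -= 1                 # release to the provider
        else:
            self.idle.setdefault(itype, []).append(inst)

def run_edf(tasks):
    """tasks: (deadline_residual, instance_type, duration_hours) triples,
    served earliest-deadline-first; sequential for simplicity."""
    pool, now = InstancePool(), 0.0
    heapq.heapify(tasks)
    while tasks:
        _, itype, duration = heapq.heappop(tasks)
        inst = pool.get(itype, now)
        now += duration
        pool.put_back(itype, inst, now)

run_edf([(0.0, "m1.small", 0.5), (1.5, "m1.large", 2.0)])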
The authors choose the Pointer Gossip Content Addressable Network because, in this system, each node works as a duty node: under PG-CAN it is responsible for a unique multidimensional range zone randomly selected when it joins the overlay.
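As an illustration of such random zone assignment, here is a toy CAN-style join in Python: the new node picks a random point in the d-dimensional key space and takes half of the zone that contains it. This is a generic CAN sketch under simplifying assumptions, not the PG-CAN protocol itself.

import random

def can_join(zones, dims=2):
    """zones: list of zones, each a list of (lo, hi) ranges per dimension.
    Splits the zone owning a random point; returns the new node's zone."""
    point = [random.random() for _ in range(dims)]
    owner = next(z for z in zones
                 if all(lo <= x < hi for x, (lo, hi) in zip(point, z)))
    d = max(range(dims), key=lambda i: owner[i][1] - owner[i][0])  # widest dim
    lo, hi = owner[d]
    new_zone = list(owner)
    owner[d] = (lo, (lo + hi) / 2)      # existing node keeps the lower half
    new_zone[d] = ((lo + hi) / 2, hi)   # joining node takes the upper half
    zones.append(new_zone)
    return new_zone

zones = [[(0.0, 1.0), (0.0, 1.0)]]      # one node owns the whole key space
print(can_join(zones))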
4. RESULTS AND DISCUSSIONS
The system has two sets of experiments: first, calibrating the cloud dynamics from Amazon EC2 as the input of the optimization system; second, running scientific workflows on Amazon EC2 and on a cloud simulator with the compared algorithms for evaluation. The authors measure the performance of CPU, I/O and network for four frequently used instance types, namely m1.small, m1.medium, m1.large and m1.xlarge. They find that CPU performance is rather stable, which is consistent with previous studies. Thus, the experiments focus on the calibration of I/O and network performance. In particular, the performance measurement is repeated on each kind of instance 10,000 times (once every minute for 7 days). When an instance has been acquired for a full hour, it is released and a new instance of the same type is created to continue the measurement [13].
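A sketch of this calibration loop follows; launch, terminate and the benchmark callable are hypothetical placeholders for the provider API and the measurement tools (such as the hdparm test mentioned below).

import time

def launch(itype):        # placeholder for the provider API
    return {"type": itype, "acquired": time.time()}

def terminate(vm):        # placeholder for the provider API
    pass

def calibrate(itype, benchmark, samples_wanted=10_000):
    """One measurement per minute; recycle the instance once a full
    billed hour has been used, as described above."""
    samples, vm = [], launch(itype)
    while len(samples) < samples_wanted:
        samples.append(benchmark(vm))             # e.g. a sequential-read test
        if time.time() - vm["acquired"] >= 3600:  # full hour consumed
            terminate(vm)
            vm = launch(itype)                    # fresh instance, same type
        time.sleep(60)                            # once every minute
    terminate(vm)
    return samples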
The measurement results are used to model the probabilistic distributions of I/O and network performance. The authors measure both sequential and random I/O performance for local disks. Sequential I/O read performance is measured with hdparm. Random I/O performance is measured by generating random I/O reads of 512 bytes each. Reads and writes have similar performance results, and they are not distinguished in this study. The authors measure the uploading and downloading bandwidth between the different types of instances and Amazon S3. The bandwidth is measured by uploading and downloading a file to/from S3.
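The measurements can then be turned into empirical distributions. A minimal sketch, assuming a list of measured bandwidths or I/O rates:

import bisect
import random

class EmpiricalDist:
    """Empirical distribution of a measured quantity (I/O or bandwidth)."""

    def __init__(self, samples):
        self.samples = sorted(samples)

    def cdf(self, x):
        """P(measurement <= x), read directly off the sorted samples."""
        return bisect.bisect_right(self.samples, x) / len(self.samples)

    def draw(self):
        """Resample one value, e.g. to drive a cloud simulator."""
        return random.choice(self.samples)

bw = EmpiricalDist([52.1, 48.3, 50.7, 61.2, 47.9])  # MB/s, made-up numbers
print(bw.cdf(50.0), bw.draw())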
The authors acquire the four measured instance types using the created AMI. The hourly costs of the on-demand instances for these instance types are shown in the table.
Those four instance types have also been used in previous studies. As for the instance acquisition time (lag), the experiments show that each on-demand instance acquisition takes 2 minutes and each spot instance acquisition takes 7 minutes on average. This is consistent with existing studies. The deadline of a workflow is an important factor for the candidate space when determining the instance configuration. Two deadline settings are of particular interest: Dmin and Dmax, the expected execution time of all the tasks on the critical path of the workflow when run entirely on m1.xlarge and m1.small instances, respectively. By default, the deadline is set to Dmin + Dmax.
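The sketch below shows how Dmin and Dmax can be computed as critical-path lengths of the workflow DAG; the diamond-shaped workflow, the runtimes and the 7.5x m1.xlarge speedup are illustrative assumptions, not values from the paper.

from functools import lru_cache

def critical_path(succ, runtime):
    """Longest expected-runtime path through a DAG. succ: task -> list
    of successors; runtime: task -> hours on a fixed instance type."""
    @lru_cache(maxsize=None)
    def longest_from(t):
        return runtime[t] + max((longest_from(s) for s in succ.get(t, [])),
                                default=0.0)
    return max(longest_from(t) for t in runtime)

succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}       # toy diamond DAG
on_small = {"a": 2.0, "b": 4.0, "c": 1.0, "d": 2.0}    # hours on m1.small
on_xlarge = {t: h / 7.5 for t, h in on_small.items()}  # assumed speedup
d_max = critical_path(succ, on_small)    # every task on m1.small
d_min = critical_path(succ, on_xlarge)   # every task on m1.xlarge
deadline = d_min + d_max                 # the paper's default setting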
The authors assume there are many workflows submitted by the users to the WaaS provider. In each experiment, 100 jobs of the same workflow structure are submitted to the cloud. Job arrivals are assumed to conform to a Poisson distribution. The parameter λ of the Poisson distribution affects the chance of virtual machine reuse. By default, λ is set to 0.1. As for metrics, the authors study the average monetary cost and elapsed time of a workflow. All the metrics in Figure 4.1 are normalized to those of Static.
Given the probabilistic deadline requirement, the compared algorithms are run multiple times on the cloud, and their monetary cost and execution time are recorded.
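For instance, the 100 arrival times can be generated with exponential inter-arrival gaps, which is what a Poisson process with rate λ = 0.1 amounts to (a sketch, not the authors' experimental harness):

import random

def poisson_arrivals(n_jobs=100, lam=0.1, seed=1):
    """Arrival times of a Poisson process with rate lam: independent
    exponential gaps with mean 1/lam. A smaller lam spreads jobs out
    in time, reducing the chance of virtual machine reuse."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n_jobs):
        t += rng.expovariate(lam)
        times.append(t)
    return times

arrivals = poisson_arrivals()   # 100 workflow jobs, lambda = 0.1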
This module uses the dynamic optimal proportional-share (DOPS) resource allocation method, which leverages the proportional-share model. The key idea is to redistribute available resources among running tasks dynamically, such that these tasks can use up the maximum capacity of each resource in a node, while each task's execution time is further minimized in a fair way. DOPS consists of two main procedures (a sketch of the slice handler follows the list):
1) Slice handler: activated periodically to scale the amount of resources allocated to tasks, such that each running task can acquire additional resources proportional to its demand along each resource dimension.

2) Event handler: responsible for resource redistribution upon the events of task arrival and completion.
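A minimal sketch of the slice handler's redistribution along a single resource dimension, assuming per-task demands are known; the real DOPS procedure repeats this per node and per resource dimension.

def proportional_share(capacity, demands):
    """Split one resource among running tasks in proportion to their
    demands, capping each task at its demand and recycling leftover
    capacity to the still-unsatisfied tasks."""
    alloc = {t: 0.0 for t in demands}
    active = set(demands)
    remaining = capacity
    while active and remaining > 1e-9:
        total = sum(demands[t] for t in active)
        satisfied = []
        for t in active:
            share = remaining * demands[t] / total    # proportional slice
            give = min(share, demands[t] - alloc[t])  # never exceed demand
            alloc[t] += give
            if alloc[t] >= demands[t] - 1e-9:
                satisfied.append(t)
        remaining = capacity - sum(alloc.values())
        if not satisfied:        # all shares proportional, below demand
            break
        active.difference_update(satisfied)
    return alloc

print(proportional_share(10.0, {"t1": 4.0, "t2": 8.0, "t3": 2.0}))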
Figure 4.1: Average monetary cost and elapsed time of the compared algorithms, normalized to Static.
5. CONCLUSION AND FUTURE WORK
The proposed approach processes not just one application but several applications combined to form a workflow to reach a certain goal. The existing approach does not work with data of widely different sizes arriving at different speeds; this work focuses on applications whose execution and resource needs also vary at runtime. These are called dynamic workflows. One might argue that more and more resources can simply be added during runtime [10].
The approach is applied to efficiently schedule computation jobs among processing resources in cloud data centres in a way that reduces execution time by spreading the jobs over the available resources. The proposed work creates power-efficient clusters in cloud data centres, and the cluster creation helps decrease power consumption compared to other available algorithms. The authors explored the performance variation under different workloads and supply plans. They also evaluated a couple of data-aware scheduling policies with positive results. Future work should focus on this problem of economic power dispatch to obtain a system that is fast and more robust.
REFERENCES
[1] A. Greenberg, J. Hamilton, D. A. Maltz, and P. Patel,
“The cost of a cloud: research problems in data centre
networks,” SIGCOMM Comput. Commun. Rev., vol. 39,
no. 1, pp. 68–73, Dec. 2008.
[2] A. Thakar and A. Szalay, “Migrating a (Large)
Science Database to the Cloud”, 1st Workshop on
Scientific Cloud Computing, June 2010.
[3] B. Da Mota, R. Tudoran, A. Costan, G. Varoquaux, G.
Brasche, P. Conrod, H. Lemaitre, T. Paus, M. Rietschel, V.
Frouin, J.-B. Poline, G. Antoniu, and B. Thirion, “Generic
machine learning pattern for neuroimaging-genetic
studies in the cloud," Frontiers in Neuroinformatics, vol. 8, no. 31, 2014.
[4] B. E. A. Calder, "Windows Azure storage: a highly available cloud storage service with strong consistency," in Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, ser. SOSP '11, 2011, pp. 143–157.
[5] C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang, and S. Lu, "DCell: a scalable and fault-tolerant network structure for data centers," in SIGCOMM, 2008.
[6] C. Raiciu, C. Pluntke, S. Barre, A. Greenhalgh, D.
Wischik, and M. Handley, “Data centre networking with
multipath tcp,” in Proceedings of the 9th ACM SIGCOMM
Workshop on Hot Topics in Networks, ser. Hotnets-IX,
2010, pp. 10:1–10:6.
[7] E. S. Ogasawara, J. Dias, V. Silva, F. S. Chirigati, D. de Oliveira, F. Porto, P. Valduriez, and M. Mattoso, "Chiron: a parallel engine for algebraic scientific workflows," Concurrency and Computation: Practice and Experience, vol. 25, no. 16, pp. 2327–2341, 2013. [Online]. Available: http://dx.doi.org/10.1002/cpe.3032.
[8] E. Yildirim, J. Kim, and T. Kosar, "Optimizing the sample size for a cloud-hosted data scheduling service," in Proc. of CCSA Workshop, 2012.
[9] G. Khanna, U. Catalyurek, T. Kurc, R. Kettimuthu, P. Sadayappan, and J. Saltz, "A dynamic scheduling approach for coordinated wide-area data transfers using GridFTP," in Parallel and Distributed Processing (IPDPS 2008), 2008, pp. 1–12.
[10] H. Hiden, S. Woodman, P. Watson, and J. Cała, "Developing cloud applications using the e-Science Central platform," in Proceedings of the Royal Society A, 2012.
[11] J. Dean and S. Ghemawat, “Mapreduce: simplified
data processing on large clusters,” Commun. ACM, vol.
51, no. 1, pp. 107–113, Jan. 2008.
[12] J. Yu, R. Buyya, and R. Kotagiri, "Workflow scheduling algorithms for grid computing," in Metaheuristics for Scheduling in Distributed Computing Environments, pp. 173–214, Springer, Berlin, Germany, 2008.
[13] K. Jackson, L. Ramakrishnan, K. Runge, R. Thomas,
“Seeking Supernovae in the Clouds: A Performance
Study”, 1st Workshop on Scientific Cloud Computing,
June 2010.