Publication of the 2019 World Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE'19)
July 29 - August 01, 2019 | Las Vegas, Nevada, USA
https://americancse.org/events/csce2019
Copyright © 2019 CSREA Press
GCC’19
PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON
GRID, CLOUD, & CLUSTER COMPUTING
Editors
Hamid R. Arabnia
Leonidas Deligiannidis, Fernando G. Tinetti
WORLDCOMP’19
ISBN 9781601324993
U.S. $49.95
American Council on Science and Education (ACSE)
Copyright © 2019 CSREA Press
ISBN: 1-60132-499-5
Printed in the United States of America
https://americancse.org/events/csce2019/proceedings
This volume contains papers presented at the 2019 International Conference on Grid, Cloud, & Cluster
Computing. Their inclusion in this publication does not necessarily constitute endorsements by editors or by the
publisher.
Copyright and Reprint Permission
Copying without a fee is permitted provided that the copies are not made or distributed for direct
commercial advantage, and credit to source is given. Abstracting is permitted with credit to the source.
Please contact the publisher for other copying, reprint, or republication permission.
Foreword
It gives us great pleasure to introduce this collection of papers to be presented at the 2019 International
Conference on Grid, Cloud, and Cluster Computing (GCC’19), July 29 – August 1, 2019, at Luxor Hotel (a
property of MGM Resorts International), Las Vegas, USA. The preliminary edition of this book (available
in July 2019 for distribution on site at the conference) includes only a small subset of the accepted research
articles. The final edition (available in August 2019) will include all accepted research articles. This is due
to deadline extension requests received from most authors who wished to continue enhancing the write-up
of their papers (by incorporating the referees’ suggestions). The final edition of the proceedings will be
made available at https://americancse.org/events/csce2019/proceedings .
An important mission of the World Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE (the federated congress with which this conference is affiliated), includes "Providing a unique platform for a diverse community of constituents composed of scholars, researchers, developers, educators, and practitioners. The Congress makes a concerted effort to reach out to participants affiliated with diverse entities (such as universities, institutions, corporations, government agencies, and research centers/labs) from all over the world. The congress also attempts to connect participants from institutions that have teaching as their main mission with those who are affiliated with institutions that have research as their main mission. The congress uses a quota system to achieve its institution and geography diversity objectives." By any definition of diversity, this congress is among the most diverse scientific meetings in the USA. We are proud to report that this federated congress has authors and participants from 67 different nations, representing a variety of personal and scientific experiences that arise from differences in culture and values. As can be seen below, the program committee of this conference, as well as the program committees of all other tracks of the federated congress, are as diverse as the authors and participants.
The program committee would like to thank all those who submitted papers for consideration. About 70% of the submissions were from outside the United States. Each submitted paper was peer-reviewed by two experts in the field for originality, significance, clarity, impact, and soundness. In cases of contradictory recommendations, a member of the conference program committee was charged with making the final decision; often, this involved seeking help from additional referees. In addition, papers whose authors included a member of the conference program committee were evaluated using the double-blind review process. One exception to the above evaluation process was for papers that were submitted directly to chairs/organizers of pre-approved sessions/workshops; in these cases, the chairs/organizers were responsible for the evaluation of such submissions. The overall acceptance rate for regular papers was 18%; 20% of the remaining papers were accepted as poster papers. (At the time of this writing, we had not yet received the acceptance rates for a couple of individual tracks.)
We are very grateful to the many colleagues who offered their services in organizing the conference. In particular, we would like to thank the members of the Program Committee of GCC'19, members of the congress Steering Committee, and members of the committees of federated congress tracks that have topics within the scope of GCC. Many individuals listed below will be asked after the conference to provide their expertise and services in selecting papers for publication (extended versions) in journal special issues as well as in a set of research books (to be prepared for publishers including Springer, Elsevier, BMC journals, and others).
- Prof. Emeritus Nizar Al-Holou (Congress Steering Committee); Professor and Chair, Electrical and Computer Engineering Department; Vice Chair, IEEE/SEM-Computer Chapter; University of Detroit Mercy, Detroit, Michigan, USA
- Prof. Hamid R. Arabnia (Congress Steering Committee); Graduate Program Director (PhD, MS, MAMS); The University of Georgia, USA; Editor-in-Chief, Journal of Supercomputing (Springer); Editor-in-Chief, Transactions of Computational Science & Computational Intelligence (Springer); Fellow, Center of Excellence in Terrorism, Resilience, Intelligence & Organized Crime Research (CENTRIC)
- Prof. Dr. Juan-Vicente Capella-Hernandez; Universitat Politecnica de Valencia (UPV), Department of Computer Engineering (DISCA), Valencia, Spain
- Prof. Emeritus Kevin Daimi (Congress Steering Committee); Director, Computer Science and Software Engineering Programs, Department of Mathematics, Computer Science and Software Engineering, University of Detroit Mercy, Detroit, Michigan, USA
- Prof. Leonidas Deligiannidis (Congress Steering Committee); Department of Computer Information Systems, Wentworth Institute of Technology, Boston, Massachusetts, USA; Visiting Professor, MIT, USA
- Prof. Mary Mehrnoosh Eshaghian-Wilner (Congress Steering Committee); Professor of Engineering Practice, University of Southern California, California, USA; Adjunct Professor, Electrical Engineering, University of California, Los Angeles (UCLA), California, USA
- Prof. Louie Lolong Lacatan; Chairperson, Computer Engineering Department, College of Engineering, Adamson University, Manila, Philippines; Senior Member, International Association of Computer Science and Information Technology (IACSIT), Singapore; Member, International Association of Online Engineering (IAOE), Austria
- Prof. Hyo Jong Lee; Director, Center for Advanced Image and Information Technology, Division of Computer Science and Engineering, Chonbuk National University, South Korea
- Dr. Ali Mostafaeipour; Industrial Engineering Department, Yazd University, Yazd, Iran
- Dr. Houssem Eddine Nouri; Informatics Applied in Management, Institut Superieur de Gestion de Tunis, University of Tunis, Tunisia
- Prof. Dr. Eng. Robert Ehimen Okonigene (Congress Steering Committee); Department of Electrical & Electronics Engineering, Faculty of Engineering and Technology, Ambrose Alli University, Edo State, Nigeria
- Ashu M. G. Solo (Publicity); Fellow of British Computer Society; Principal/R&D Engineer, Maverick Technologies America Inc.
- Prof. Fernando G. Tinetti (Congress Steering Committee); School of Computer Science, Universidad Nacional de La Plata, La Plata, Argentina; also at Comision Investigaciones Cientificas de la Prov. de Bs. As., Argentina
- Prof. Layne T. Watson (Congress Steering Committee); Fellow of IEEE; Fellow of The National Institute of Aerospace; Professor of Computer Science, Mathematics, and Aerospace and Ocean Engineering, Virginia Polytechnic Institute & State University, Blacksburg, Virginia, USA
- Prof. Jane You (Congress Steering Committee); Associate Head, Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
- Dr. Farhana H. Zulkernine; Coordinator of the Cognitive Science Program, School of Computing, Queen's University, Kingston, ON, Canada
We would like to extend our appreciation to the referees and the members of the program committees of individual sessions, tracks, and workshops; their names do not appear in this document but are listed on the web sites of the individual tracks.
As sponsors-at-large, partners, and/or organizers, each of the following (separated by semicolons) provided help for at least one track of the Congress: Computer Science Research, Education, and Applications Press (CSREA); US Chapter of World Academy of Science; American Council on Science & Education & Federated Research Council (http://www.americancse.org/). In addition, a number of university faculty members and their staff (names appear on the cover of the set of proceedings), several publishers of computer science and computer engineering books and journals, chapters and/or task forces of computer science associations/organizations from three regions, and developers of high-performance machines and systems provided significant help in organizing the conference as well as providing some resources. We are grateful to them all.
We express our gratitude to the keynote, invited, individual conference/track, and tutorial speakers; the list of speakers appears on the conference web site. We would also like to thank the following: UCMSS (Universal Conference Management Systems & Support, California, USA) for managing all aspects of the conference; Dr. Tim Field of APC for coordinating and managing the printing of the proceedings; and the staff of the Luxor Hotel (Convention Department) in Las Vegas for the professional service they provided. Last but not least, we would like to thank the Co-Editors of GCC'19: Prof. Hamid R. Arabnia, Prof. Leonidas Deligiannidis, and Prof. Fernando G. Tinetti.
We present the proceedings of GCC’19.
Steering Committee, 2019
http://americancse.org/
Contents

SESSION: HIGH-PERFORMANCE COMPUTING - CLOUD COMPUTING
  The Design and Implementation of Astronomical Data Analysis System on HPC Cloud ........ 3
  Jaegyoon Hahm, Ju-Won Park, Hyeyoung Cho, Min-Su Shin, Chang Hee Ree

SESSION: HIGH-PERFORMANCE COMPUTING - HADOOP FRAMEWORK
  A Speculation and Prefetching Model for Efficient Computation of MapReduce Tasks on Hadoop HDFS System ........ 9
  Lan Yang

SESSION: LATE BREAKING PAPER: CLOUD MIGRATION
  Critical Risk Management Practices to Mitigate Cloud Migration Misconfigurations ........ 15
  Michael Atadika, Karen Burke, Neil Rowe
SESSION: HIGH-PERFORMANCE COMPUTING - CLOUD COMPUTING
Chair(s): TBA
The Design and Implementation of Astronomical Data
Analysis System on HPC Cloud
Jaegyoon Hahm(1), Ju-Won Park(1), Hyeyoung Cho(1), Min-Su Shin(2), and Chang Hee Ree(2)
(1) Supercomputing Infrastructure Center, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
(2) Galaxy Evolution Research Group, Korea Astronomy and Space Science Institute, Daejeon, Republic of Korea
Abstract - Astronomy is a representative data-intensive science that can take advantage of cloud computing, because it requires flexible infrastructure services for variable workloads and various data analysis tools. The purpose of this study is to show the usefulness of cloud computing as a research environment for analyzing large-scale scientific data, as in astronomy. We implemented an OpenStack cloud and a Kubernetes-based orchestration service for scientific data analysis. On the cloud, we have successfully constructed data analysis systems with a task scheduler and an in-memory database tool to support the task processing and data I/O environments required in astronomical research. Furthermore, we aim to construct a high-performance cloud service for data-intensive research in more scientific fields.

Keywords: cloud computing, astronomical data analysis, data analysis platform, OpenStack, Kubernetes
1 Introduction
Recently, in the field of science and technology, more
and more data is generated through advanced data-capturing
sources [1]. And naturally, researchers are increasingly using
cutting-edge data analysis techniques, such as big data
analysis and machine learning. Astronomy is a typical field
of collecting and analyzing large amounts of data through
various observation tools, such as astronomical telescopes,
and data growth rate will increase rapidly in the near future.
As a notable example, Large Synoptic Survey Telescope
(LSST) will start to produce large volume of datasets up to
20TB per day from observing large area of the sky in full
operations from 2023. Total database for ten years is
expected to be 60 PB for the raw data, and 15 PB for the
catalog database [2]. As another big data project, Square
Kilometer Array (SKA), which will be constructed as the
largest in the world radio telescope until 2024, is also
projected to generate and archive 130-300PB per year [3].
In this era of data deluge, there is a growing demand for utilizing cloud computing in data-intensive sciences. In particular, astronomical research needs the ability to acquire resources for simulation-driven numerical experiments or mass data analysis in an immediate and dynamic way. The types of cloud service expected by astronomical researchers are therefore Infrastructure as a Service (IaaS), providing flexible resources for running existing software and research methodologies, and Platform as a Service (PaaS), for applying new data analysis tools.
In this paper, we propose a methodology for, and demonstrate the feasibility of, cloud computing that focuses on flexible use of resources and on the problems astronomical researchers face when using cloud services. Section 2 introduces related work, and Section 3 describes the features and requirements of the target application. In Section 4 we describe the implementation of the data analysis system for the target application. Finally, in Section 5 we provide conclusions and future plans.
2 Related Work

There have been several examples of cloud applications for astronomical research. The Gemini Observatory has been building a new archive using EC2, EBS, S3, and Glacier from the Amazon Web Services (AWS) cloud to replace the existing Gemini Science Archive (GSA) [4]. In addition, Williams et al. (2018) conducted studies to reduce the Panchromatic Hubble Andromeda Treasury (PHAT) photometric data set using Amazon EC2 [5].

Unlike these public cloud cases, there are also studies that build a private cloud environment to perform astronomical research. AstroCloud [6] is a distributed cloud platform which integrates many data management and processing tasks for the Chinese Virtual Observatory (China-VO). In addition, Hahm et al. (2012) developed a platform for constructing virtual machine-based Condor clusters for analyzing astronomical time-series data in a private cloud [7]. The purpose of that study was to confirm the possibility of constructing a cluster-type analysis platform for mass astronomical data analysis in a cloud environment.
3 Application Requirements and Design

The application used in this study is the MAGPHYS SED fitting code, which reads and analyzes the brightness and color data of galaxies to estimate their physical properties. The data used is the large-scale survey data of Galaxy And Mass Assembly (GAMA), a project to exploit the latest generation of ground-based and space-borne survey facilities to study cosmology and galaxy formation and evolution [8]. On a single processor, MAGPHYS typically takes 10 minutes to run for a single galaxy. In Figure 1, the data analysis in the application starts with the data obtained by preprocessing the original image data collected from the telescope. The preprocessed data is a text-file DB, which is the input data for the analysis. The application extracts the data one line at a time from the input file, submits it to the task queue together with the spectral analysis code, and creates a new DB by storing the analyzed results in the output file. In a traditional research environment, this analysis would be done by building a dedicated cluster for data analysis or through a job scheduler on a shared cluster.

Fig. 1. Data Analysis Workflow

The main technical challenges of the application are to achieve faster data I/O and to use a dedicated task queue for convenience. The GAMA dataset has information on approximately 197,000 galaxies. However, file-based I/O is too slow and hard to manage for a dataset of this size. Therefore, a fast data I/O tool and a job scheduler for high-throughput batch processing are required.
To satisfy these requirements, we designed two types of lightweight data analysis systems. In the first, data is read through file I/O as usual, and the data processing environment is configured with an asynchronous task scheduler for the analysis work (see Figure 2). In this case, we need a shared file system that can be accessed by the large number of workers performing analysis tasks. In the second, as shown in Figure 3, data input is performed through an in-memory DB instead of file reading for faster I/O, and the output of the analysis is also stored in the in-memory DB.
4 Cloud Data Analysis System Implementation
4.1 KISTI HPC Infrastructure as a Service

Korea Institute of Science and Technology Information (KISTI) is building a high-performance cloud service in order to support data-intensive research in various science and technology fields. This is because emerging data-centric sciences require a more flexible and dynamic computing environment than traditional HPC services; big data and deep learning research in particular needs customized HPC resources in a flexible manner. The KISTI cloud will therefore be a service providing customizable high-performance computing and storage resources, such as a supercomputer, GPU cluster, etc.
In the first stage, the cloud service will be implemented on a supercomputer. KISTI's newly introduced supercomputer NURION is a national strategic research infrastructure to support R&D in various fields; in particular, there is a plan to utilize it for data-intensive computing and artificial intelligence. In order to build such a service environment, we will leverage cloud computing technologies.

Fig. 2. Data Analysis with Task Scheduler and File I/O
Fig. 3. Data Analysis with In-memory DB
Fig. 4. OpenStack Cloud Testbed
In order to design the service and verify the required skills, we constructed an OpenStack-based IaaS cloud testbed on a computational cluster (see Figure 4). OpenStack [9] is a cloud deployment platform that is used as a de facto standard in industry and research, and it is well suited to cloud deployments for high-performance computing. The cluster used for the testbed has thirteen Intel Xeon-based servers: one deployment node, one controller node, three storage nodes, and compute nodes for the rest. The OpenStack services implemented here are Nova (Compute), Glance (Image), Neutron (Network), Cinder (Block Storage), Swift (Object Storage), Keystone (Identity), Heat (Orchestration), Horizon (Dashboard), Manila (Network Filesystem), and Magnum (Container). For storage, Ceph [10] was configured using three servers and used as a backend for the Glance, Cinder, Swift, and Manila services. Apart from this, we configured a Docker-based Kubernetes orchestration environment using Magnum. Kubernetes is an open source platform for automating Linux container operations [11]. In this study, it is composed of one Kubernetes master and four workers.
4.2 Implementation of the Data Analysis Platform in the Cloud

The data analysis system constructed in this study focuses on how to configure the task scheduler and the data I/O environment for task processing. We describe the architecture of the analysis system in Figure 5. First, the task scheduler should efficiently distribute and process individual tasks asynchronously. We adopted a lightweight task scheduler that can be dynamically configured and used independently, unlike the shared job schedulers, such as PBS and Slurm, in conventional HPC systems. In particular, tasks for astronomical data analysis, which require long runtimes, often call for asynchronous rather than synchronous processing. In the experiments, we used Dask [12] and Celery [13] as task schedulers; both are readily available to scientific and technological researchers and are likely to be used in a common data analysis environment. The structure of the scheduler is very simple, consisting of a scheduler and workers. We wrote Python code to submit tasks to the scheduler and manage data. The difference between Dask and Celery is that Dask allocates and monitors tasks in its own scheduler module, whereas Celery workers' tasks are assigned from a separate message queue, such as RabbitMQ.
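As a brief sketch of the Dask variant (our illustration: run_magphys is a hypothetical stand-in for one MAGPHYS fit, and the scheduler address matches the one configured in the Figure 6 template), client-side submission looks like this:

    from dask.distributed import Client

    def run_magphys(row):
        # Hypothetical stand-in for one MAGPHYS SED fit on a single catalog
        # row; the real system would invoke the fitting code on the photometry.
        galaxy_id, *photometry = row.split()
        return galaxy_id, sum(float(v) for v in photometry)

    client = Client("dask-scheduler:8786")      # scheduler started by the Heat template
    rows = ["G0001 1.2 3.4", "G0002 2.2 1.9"]   # lines extracted from the input text DB
    futures = client.map(run_magphys, rows)     # asynchronous submission to workers
    results = client.gather(futures)            # collected outputs for the new output DB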
For the data I/O environment, an initial experiment was conducted by configuring a shared file system using OpenStack Manila for the file-based I/O used in the existing analysis environment. However, file I/O is significantly slower than computation, which causes severe performance degradation when analyzing the entire dataset. In order to remove this bottleneck and improve overall performance, we used an in-memory DB tool called Redis [14]. Redis is a memory-based key-value store that is known to handle more than 100,000 transactions per second.
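To illustrate the in-memory design of Figure 3 (a sketch under assumed key names, not the authors' exact schema), a preprocessing step can load input records into Redis, and workers can pop them and write results back, bypassing the shared file system:

    import redis

    r = redis.Redis(host="redis-server", port=6379, decode_responses=True)

    # Preprocessing: push each line of the input text DB onto a Redis list.
    for line in ["G0001 1.2 3.4", "G0002 2.2 1.9"]:
        r.rpush("gama:inputs", line)

    # Worker loop: pop records, analyze, and store results keyed by galaxy id.
    while (record := r.lpop("gama:inputs")) is not None:
        galaxy_id, *photometry = record.split()
        result = sum(float(v) for v in photometry)  # stand-in for the SED fit
        r.hset("gama:results", galaxy_id, result)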
heat_template_version: queens
…
parameters:
  worker_num:
    default: 16
  …
…
resources:
  scheduler:
    type: OS::Nova::Server
    properties:
      name: dask-scheduler
      image: Ubuntu 18.04 Redis with Dask
      …
      template: |
        #!/bin/bash
        pip3 install dask distributed --upgrade
        …
        dask-scheduler &
  workers:
    type: OS::Heat::ResourceGroup
    properties:
      count: { get_param: worker_num }
      resource_def:
        type: OS::Nova::Server
        properties:
          name: dask-worker%index%
          image: Ubuntu 18.04 Redis with Dask
          …
          template: |
            #!/bin/bash
            apt-get install redis-server -y
            pip3 install dask distributed --upgrade
            …
            dask-worker dask-scheduler:8786 &
outputs:
  instance_name:
    …
  instance_ip:
    …

Fig. 6. Heat Template for Analysis Platform with Redis & Dask
Fig. 5. Data Analytics System Architecture
A combination of task scheduler and data I/O environment can be created and configured automatically in an orchestration environment through OpenStack Heat or Kubernetes in our self-contained cloud. Figure 6 shows the structure of one of the Heat templates used in this experiment. The template is structured with parameters and resources. The resources comprise the scheduler and the workers, and the required software is installed and configured on each scheduler and worker after boot-up.
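For instance, such a stack could be launched programmatically; the sketch below uses the openstacksdk cloud layer under assumed names (the file name dask_redis.yaml and the stack name are our labels, not the paper's):

    import openstack

    # Connect using credentials from clouds.yaml or OS_* environment variables.
    conn = openstack.connect()

    # Launch the analysis platform from the Heat template shown in Figure 6,
    # overriding the worker_num value declared in its parameters section.
    stack = conn.create_stack(
        "gcc19-analysis",
        template_file="dask_redis.yaml",  # hypothetical file holding the Fig. 6 template
        wait=True,                        # block until the stack reaches CREATE_COMPLETE
        worker_num=16,                    # forwarded to Heat as a stack parameter
    )
    print(stack.id)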
5 Conclusion and Future Work

Through experiments, we successfully analyzed the brightness and color data of about 5,300 galaxies in a parallel distributed processing environment consisting of Dask or Celery with Redis. Figure 7 shows one example galaxy from the GAMA data with the result of the MAGPHYS analysis in the cloud. With the OpenStack-based cloud, we confirmed that the research environment, especially a data analysis system with tools such as a task scheduler and an in-memory DB, can be automatically configured and put to good use. In addition, we confirmed the availability of an elastic service environment through the cloud to meet volatile demand for large-scale data analysis.
Fig. 7. An example result of the MAGPHYS analysis on the cloud
In this study, we identified some useful aspects of the cloud for data-driven research. First, we confirmed that it is easy to build an independent execution environment that provides the necessary software stack for research through the cloud. Also, in a cloud environment, researchers can easily reuse the same research environment and share research experience by reusing virtual machines or container images deployed by the research community.

In the next step, we will configure an environment for real-time processing of in-memory cache data. For practical real-time data processing, it is necessary to construct an optimal environment for data I/O as well as memory-based stream data processing, and various experiments need to be performed on the cloud. Based on the experience of building an astronomical big data processing environment in this study, we will provide a more flexible, high-performance cloud service and let researchers utilize it in various fields of data-centric research.
6 References

[1] T. Hey, S. Tansley and K. Tolle, The Fourth Paradigm: Data-intensive Scientific Discovery, Microsoft Research, 2009.
[2] LSST Corporation. About LSST: Data Management. [Online]. Available from: https://www.lsst.org/about/dm/ 2019.03.10
[3] P. Diamond, SKA Community Briefing. [Online]. Available from: https://www.skatelescope.org/ska-community-briefing-18jan2017/ 2019.03.10
[4] P. Hirst and R. Cardenes, "The new Gemini Observatory archive: a fast and low cost observatory data archive running in the cloud", Proc. SPIE 9913, Software and Cyberinfrastructure for Astronomy IV, 99131E (8 August 2016); doi: 10.1117/12.2231833
[5] B. F. Williams, K. Olsen, R. Khan, D. Pirone and K. Rosema, "Reducing and analyzing the PHAT survey with the cloud", The Astrophysical Journal Supplement Series, Volume 236, Number 1
[6] C. Cui et al., "AstroCloud: a distributed cloud computing and application platform for astronomy", Proc. WCSN2016
[7] J. Hahm et al., "Astronomical time series data analysis leveraging science cloud", Proc. Embedded and Multimedia Computing Technology and Service, pp. 493-500, 2012
[8] S. P. Driver et al., "Galaxy And Mass Assembly (GAMA): Panchromatic Data Release (far-UV-far-IR) and the low-z energy budget", MNRAS 455, 3911-3942, 2016.
[9] OpenStack Foundation. OpenStack Overview. [Online]. Available from: https://www.openstack.org/software/ 2019.03.10
[10] Red Hat Inc. Ceph Introduction. [Online]. Available from: https://ceph.com/ceph-storage/ 2019.03.10
[11] The Kubernetes Authors. What is Kubernetes?. [Online]. Available from: https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/ 2019.03.10
[12] Dask Core Developers. Why Dask?. [Online]. Available from: https://docs.dask.org/en/latest/why.html 2019.03.10
[13] A. Solem. Celery - Distributed Task Queue. [Online]. Available from: http://docs.celeryproject.org/en/latest/index.html 2019.03.10
[14] S. Sanfilippo. Introduction to Redis. [Online]. Available from: https://redis.io/topics/introduction 2019.03.10
SESSION: HIGH-PERFORMANCE COMPUTING - HADOOP FRAMEWORK
Chair(s): TBA
A Speculation and Prefetching Model for Efficient
Computation of MapReduce Tasks on Hadoop HDFS
System
Lan Yang
Computer Science Department
California State Polytechnic University, Pomona
Pomona, CA 91768, USA
Abstract - The MapReduce programming model and the Hadoop software framework are keys to big data processing on high-performance computing (HPC) clusters. The Hadoop Distributed File System (HDFS) is designed to stream large data sets at high bandwidth. However, Hadoop suffers from a set of drawbacks, particularly issues with small files as well as dynamic datasets. In this research we target big data applications working with many on-demand datasets of varying sizes. We propose a speculation model that prefetches anticipated datasets for upcoming tasks in support of efficient big data processing on HPC clusters.

Keywords: Prefetching, Speculation, Hadoop, MapReduce, High-performance computing cluster.
1 Introduction

Along with the emerging technology of cloud computing, Google proposed the MapReduce programming model [1], which allows for massive scalability of unstructured data across hundreds or thousands of high-performance computing nodes. Hadoop is an open source software framework that performs distributed processing of huge data sets across clusters of commodity servers [2]. Now distributed as Apache Hadoop [3], it underlies the big data solutions of many cloud services such as AWS, Cloudera, HortonWorks, and IBM InfoSphere Insights. The Hadoop Distributed File System (HDFS) [2], inspired by the Google File System (GFS) [4], is Hadoop's reliable filesystem, designed for storing very large files on a cluster of commodity hardware. To process big data in Apache Hadoop, the client submits data and a program to Hadoop; HDFS stores the data while MapReduce processes it.

While Hadoop is a powerful tool for processing massive data, it suffers from a set of drawbacks, including issues with small files, no real-time data processing, and support for batch processing only [5]. Apache Spark [6] partially solved Hadoop's real-time and batch processing problems by introducing in-memory processing [7]. As a component of the Hadoop ecosystem, Spark does not have its own distributed filesystem, though it can use HDFS. Hadoop does not suit small data because HDFS, with its high-capacity design, lacks the ability to efficiently support random reading of small files. Small files are a major problem in HDFS.

In this research, we study a special type of iterative MapReduce task working on HDFS with input datasets coming dynamically, i.e., on demand, from many small files. We propose a data prefetching speculation model aimed at improving the performance and flexibility of big data processing on Hadoop HDFS for that special type of MapReduce task.
2 Background

2.1 Description of a special type of MapReduce tasks

In today's big data world, the MapReduce programming model and the Hadoop software framework remain popular tools for big data processing. Based on a number of big data applications run on Hadoop, we observed the following:

(1) An HDFS file splits into chunks, typically 64-128 MB in size. To benefit from Hadoop's parallel processing ability, an HDFS file must be large enough to be divided into multiple chunks. Therefore, a file is considered small if it is significantly smaller than the HDFS chunk size.

(2) While many big data applications use large data files that can be pushed to the HDFS input directory prior to task execution, some applications use many small datasets distributed across a wide range.
(3) With the increasing demand for big data processing, more and more applications now require multiple rounds (or iterations) of processing, with each round requiring new datasets determined by the outcome of the previous computation. For example, in a data processing application for a legal system, the first round of MapReduce computation uses prestored case documents, while the second round might require access to certain assets or utilities datasets based on the case outcomes resulting from the first-round analysis. The assets or utilities datasets could consist of hundreds to thousands of files ranging from 1 KB to 10 MB, with only dozens of files relevant depending on the outcome of the first round. It would be very inefficient and inflexible if we had to divide these two rounds into separate client requests. Also, if we could overlap computation and data access time by speculating and prefetching data, we could reduce the overall processing time significantly. Here we refer to big data applications with one or more of the above characteristics (i.e., requiring iterative or multiple passes of MapReduce computation, using many small files to form an HDFS chunk, and dynamic datasets that depend on the outcome of previous rounds of computation) as a special type of MapReduce tasks.
2.2 Observation: execution time and HDFS chunks

We conducted several dozen big data application runs using Hadoop on a high-performance computing cluster. Table 1 summarizes the MapReduce performance of three relatively large big data analytics tasks.

[Table 1: Performance data for some big data applications (*requires multi-phase analysis); table body not recoverable]
2.3 Computation time vs. data fetch time

In this research, we first tested and analyzed data access times for sizes ranging from 1 KB to 16 MB on an HPC cluster consisting of 2 DL360 management nodes, 20 DL160 compute nodes, 3.3 TB RAM, 40 Gbit InfiniBand, and a 10 Gbit external Ethernet connection, with overall system throughput of 36.6 Tflops in double precision mode and 149.6 Tflops. The Slurm job scheduler [8] is the primary software we use for our testing. The performance data shown in Figure 1 serve as the basis for deriving the performance of our speculation algorithms.
Figure 1: Data Access Performance Base
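A baseline of this kind can be collected with a simple harness; the sketch below (our illustration with hypothetical file paths, not the authors' test code) times reads from 1 KB up to 16 MB:

    import os
    import time

    def time_read(path):
        """Wall-clock seconds to read a file once from start to finish."""
        start = time.perf_counter()
        with open(path, "rb") as f:
            f.read()
        return time.perf_counter() - start

    # Hypothetical test files from 1 KB (2^10) to 16 MB (2^24), doubling each step.
    for size in (2 ** k for k in range(10, 25)):
        path = f"/tmp/testfile_{size}"
        with open(path, "wb") as f:
            f.write(os.urandom(size))
        print(f"{size:>10} bytes: {time_read(path):.6f} s")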
3 Speculation and Prefetching Models

3.1 Speculation model

We establish a connection graph (CG) to represent the relations of commonly used tasks, with tasks as nodes and edges as links between tasks. For example, a birthday party planning task is linked to restaurant reservation tasks as well as entertainment or recreation tasks, and an address change task is linked with moving or furniture shopping tasks. The links in the CG are prioritized; for example, for the birthday task, the restaurant task is initially set with higher priority than the movie ticketing task. Priorities are in the 0.0 to 1.0 range and are dynamically updated based on the outcomes of our predictions. For example, based on the connections in the CG and the priorities of the links, we predict that the top two tasks following the birthday task are, in order, the restaurant task and the movie task. If for a particular application the movie task turns out to be the correct choice, we increase its priority by a small fraction, say 0.1, capped at a 1.0 maximum.
3.2 Prefetching algorithm

Our prefetching concept is inspired by compiler-directed instruction/data prefetching techniques that speculate and prefetch instructions for multiprocessing [9][10]. Our basic strategy is: overlapping with the computation of the current task, we prefetch the associated datasets for the next round of computation based on the task speculation.

The association between tasks and data files can be represented as a many-to-many relation. Each task is pre-associated with a list of files in the order of established ranks. For example, the restaurant task could be associated with pizza delivery files, restaurant location files, etc. The ranks are initialized based on the popularity of the service, with values in the 0.0 to 1.0 range and higher values for the most popular or most recommended services. The ranks are then adjusted based on the network distance of the file locations, with priority given to local or nearby files. Again, after task execution, if a prefetched file turned out to be irrelevant (i.e., the whole file was filtered out at an early MapReduce stage), the rank of that file with regard to that task is reduced.

Based on the system configuration, we also preset two constant values K and S, with K as the optimized/recommended number of containers and S as the size of each container (we suggest S be the HDFS chunk size and K the desired number of chunks with regard to the requested compute nodes). When prefetching datasets for a set of speculated tasks, the prefetching process repeatedly reads files until it fills up all the containers.
4 Simulation Design

We used a Python dictionary to implement the connection graph CG, with each distinct task name as a key. The value for a key is a list of task links sorted in descending order of priority. The task-data relations are also represented as a Python dictionary, with task names as keys and lists of data file names sorted in descending order of rank as values. Currently we simulate the effectiveness of prefetching using parallel processes created by Slurm directly. Once the approaches are validated, we will test them on Hadoop/HDFS.
4.1 Speculation Model

For any current task t, the simulated speculation model always fetches the top candidate task from the CG dictionary, i.e., CG[t][0], as p and starts the prefetching process. When t completes, it chooses the next task t'. If t' is the same as p, let t be p and the process continues. If t' is different from p, we restart the prefetching process, reduce the priority of p by one level (currently 0.1) but not below 0.0, and increase the priority of t' by 0.1 (capped at 1.0) if it is already in t's connection link, or add it to t's connection link with a randomly assigned priority (between 0.1 and 0.5) if it is not in t's connection link yet.
4.2 Prefetching Model

(1) Configuration: one file node N (i.e., a process that only reads data in and writes it to a certain shared location) and four shared storages (arrays or dictionaries) representing the containers, C1 to C4. Initially all Ci are empty, and each container has a current capacity and a maximum capacity (all containers may have the same maximum capacity). This is easily extendable to multiple file nodes and a larger number of containers.

(2) Assume the task p selected by the speculation scheme is associated with n small files, say F1, ..., Fn. Files are read in the order F1, ..., Fn. For each file read in, we record its size as sj, then search for a container whose current capacity + sj <= maximum capacity, lock it once found, and push the content in. If no available container is found, the file content is set aside and we increase the failure count by 1 (the failure count is initially 0). We continue to fetch the next file until we reach the condition spelled out in (3).

(3) The prefetching process ends when all containers reach a certain percentage full (e.g., at least 80% full) or when the failure count reaches a certain number (say 3). Note: one failure does not mean the containers are full; it could be that we fetched a very large dataset that could not fit into any of the current containers. In that case we may still fetch the files next in the list, as these might be smaller; a code sketch of this container-filling loop follows.
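The sketch below is our reconstruction of that loop (names and the 80%/3-failure thresholds follow the description above; capacity corresponds to S and n_containers to the C1..C4 configuration):

    def prefetch(ranked_files, capacity, n_containers=4,
                 full_fraction=0.8, max_failures=3):
        """First-fit packing of prefetched files into fixed-capacity containers.

        ranked_files -- list of (file_name, size_in_bytes) for the speculated
                        task, in descending rank order (F1, ..., Fn)
        """
        used = [0] * n_containers                  # current fill of C1..C4
        placed = {i: [] for i in range(n_containers)}
        failures = 0
        for name, size in ranked_files:
            if all(u >= full_fraction * capacity for u in used):
                break                              # all containers full enough
            if failures >= max_failures:
                break                              # too many files that fit nowhere
            for i in range(n_containers):          # first-fit container search
                if used[i] + size <= capacity:
                    used[i] += size                # real system: lock, then copy content
                    placed[i].append(name)
                    break
            else:
                failures += 1                      # set aside; later files may be smaller
        return placed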
5 Conclusions

In this research work, we studied the possibility of congregating small datasets dynamically to form large data chunks suitable for MapReduce tasks on Hadoop HDFS. We proposed task speculation and file prefetching models to speed up overall processing. We have set up a primitive simulation test suite to assess
the feasibility of the speculation and prefetching models. Since we are currently designing the schemes in a Slurm multiprocess environment without using HDFS, no performance gain could be measured yet. Our future (and ongoing) work is to port the designed schemes from HPC Slurm processes onto the Hadoop HDFS system and measure their effectiveness using real-world big data applications.
6 References

[1] Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, Google Research, https://research.google.com/archive/mapreduce-osdi04.pdf
[2] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler, The Hadoop Distributed File System, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST)
[3] Apache Hadoop, https://hadoop.apache.org/
[4] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, The Google File System, https://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf
[5] DATAFLAIR Team, 13 Big Limitations of Hadoop & Solution To Hadoop Drawbacks, https://data-flair.training/blogs/13-limitations-of-hadoop/, March 7, 2019.
[6] Apache Spark, https://spark.apache.org/
[7] Matei Zaharia, Mosharaf Chowdhury, Michael Franklin, Scott Shenker, Ion Stoica, Spark: Cluster Computing with Working Sets, Proceedings of the 2nd USENIX conference on Hot topics in cloud computing, 2010.
[8] Slurm job scheduler, https://slurm.schedmd.com/
[9] Seung Woo Son, Mahmut Kandemir, Mustafa Karakoy, Dhruva Chakrabarti, A compiler-directed data prefetching scheme for chip multiprocessors, Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP '09)
[10] Ricardo Bianchini, Beng-Hong Lim, Evaluating the Performance of Multithreading and Prefetching in Multiprocessors, https://doi.org/10.1006/jpdc.1996.0109
efforts and donations can help, see Sections 3 and 4 and the
Foundation information page at www.gutenberg.org.
Section 3. Information about the Project
Gutenberg Literary Archive Foundation
The Project Gutenberg Literary Archive Foundation is a non-
profit 501(c)(3) educational corporation organized under the
laws of the state of Mississippi and granted tax exempt status
by the Internal Revenue Service. The Foundation’s EIN or
federal tax identification number is 64-6221541. Contributions
to the Project Gutenberg Literary Archive Foundation are tax
deductible to the full extent permitted by U.S. federal laws and
your state’s laws.
The Foundation’s business office is located at 809 North 1500
West, Salt Lake City, UT 84116, (801) 596-1887. Email contact
links and up to date contact information can be found at the
Foundation’s website and official page at
www.gutenberg.org/contact
Section 4. Information about Donations to
the Project Gutenberg Literary Archive
Foundation
Project Gutenberg™ depends upon and cannot survive without
widespread public support and donations to carry out its mission
of increasing the number of public domain and licensed works
that can be freely distributed in machine-readable form
accessible by the widest array of equipment including outdated
equipment. Many small donations ($1 to $5,000) are particularly
important to maintaining tax exempt status with the IRS.
The Foundation is committed to complying with the laws
regulating charities and charitable donations in all 50 states of
the United States. Compliance requirements are not uniform
and it takes a considerable effort, much paperwork and many
fees to meet and keep up with these requirements. We do not
solicit donations in locations where we have not received written
confirmation of compliance. To SEND DONATIONS or determine
the status of compliance for any particular state visit
www.gutenberg.org/donate.
While we cannot and do not solicit contributions from states
where we have not met the solicitation requirements, we know
of no prohibition against accepting unsolicited donations from
donors in such states who approach us with offers to donate.
International donations are gratefully accepted, but we cannot
make any statements concerning tax treatment of donations
received from outside the United States. U.S. laws alone swamp
our small staff.
Please check the Project Gutenberg web pages for current
donation methods and addresses. Donations are accepted in a
number of other ways including checks, online payments and
credit card donations. To donate, please visit:
www.gutenberg.org/donate.
Section 5. General Information About
Project Gutenberg™ electronic works
Professor Michael S. Hart was the originator of the Project
Gutenberg™ concept of a library of electronic works that could
be freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg™ eBooks with only a loose
network of volunteer support.
Project Gutenberg™ eBooks are often created from several
printed editions, all of which are confirmed as not protected by
copyright in the U.S. unless a copyright notice is included. Thus,
we do not necessarily keep eBooks in compliance with any
particular paper edition.
Most people start at our website which has the main PG search
facility: www.gutenberg.org.
This website includes information about Project Gutenberg™,
including how to make donations to the Project Gutenberg
Literary Archive Foundation, how to help produce our new
eBooks, and how to subscribe to our email newsletter to hear
about new eBooks.
Printed in the United States of America
https://americancse.org/events/csce2019/proceedings

This volume contains papers presented at the 2019 International Conference on Grid, Cloud, & Cluster Computing. Their inclusion in this publication does not necessarily constitute an endorsement by the editors or by the publisher.

Copyright and Reprint Permission
Copying without a fee is permitted provided that the copies are not made or distributed for direct commercial advantage, and credit to the source is given. Abstracting is permitted with credit to the source. Please contact the publisher for other copying, reprint, or republication permission.
Foreword

It gives us great pleasure to introduce this collection of papers to be presented at the 2019 International Conference on Grid, Cloud, and Cluster Computing (GCC'19), July 29 – August 1, 2019, at Luxor Hotel (a property of MGM Resorts International), Las Vegas, USA.

The preliminary edition of this book (available in July 2019 for distribution on site at the conference) includes only a small subset of the accepted research articles. The final edition (available in August 2019) will include all accepted research articles. This is due to deadline-extension requests received from most authors, who wished to continue enhancing the write-up of their papers (by incorporating the referees' suggestions). The final edition of the proceedings will be made available at https://americancse.org/events/csce2019/proceedings .

An important mission of the World Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE (the federated congress with which this conference is affiliated), includes "Providing a unique platform for a diverse community of constituents composed of scholars, researchers, developers, educators, and practitioners. The Congress makes concerted effort to reach out to participants affiliated with diverse entities (such as: universities, institutions, corporations, government agencies, and research centers/labs) from all over the world. The congress also attempts to connect participants from institutions that have teaching as their main mission with those who are affiliated with institutions that have research as their main mission. The congress uses a quota system to achieve its institution and geography diversity objectives." By any definition of diversity, this congress is among the most diverse scientific meetings in the USA. We are proud to report that this federated congress had authors and participants from 67 different nations, representing a variety of personal and scientific experiences that arise from differences in culture and values. As can be seen below, the program committee of this conference, as well as the program committees of all other tracks of the federated congress, are as diverse as its authors and participants.

The program committee would like to thank all those who submitted papers for consideration. About 70% of the submissions were from outside the United States. Each submitted paper was peer-reviewed by two experts in the field for originality, significance, clarity, impact, and soundness. In cases of contradictory recommendations, a member of the conference program committee was charged to make the final decision; often, this involved seeking help from additional referees. In addition, papers whose authors included a member of the conference program committee were evaluated using the double-blind review process. One exception to the above evaluation process was for papers that were submitted directly to chairs/organizers of pre-approved sessions/workshops; in these cases, the chairs/organizers were responsible for the evaluation of such submissions. The overall acceptance rate for regular papers was 18%; 20% of the remaining papers were accepted as poster papers (at the time of this writing, we had not yet received the acceptance rates for a couple of individual tracks).

We are very grateful to the many colleagues who offered their services in organizing the conference.
In particular, we would like to thank the members of the Program Committee of GCC'19, the members of the congress Steering Committee, and the members of the committees of the federated congress tracks that have topics within the scope of GCC. Many of the individuals listed below will be asked after the conference to provide their expertise and services in selecting papers for publication (extended versions) in journal special issues, as well as for publication in a set of research books (to be prepared for publishers including Springer, Elsevier, BMC journals, and others).

• Prof. Emeritus Nizar Al-Holou (Congress Steering Committee); Professor and Chair, Electrical and Computer Engineering Department; Vice Chair, IEEE/SEM-Computer Chapter; University of Detroit Mercy, Detroit, Michigan, USA
• Prof. Hamid R. Arabnia (Congress Steering Committee); Graduate Program Director (PhD, MS, MAMS); The University of Georgia, USA; Editor-in-Chief, Journal of Supercomputing (Springer); Editor-in-Chief, Transactions of Computational Science & Computational Intelligence (Springer); Fellow, Center of Excellence in Terrorism, Resilience, Intelligence & Organized Crime Research (CENTRIC)
• Prof. Dr. Juan-Vicente Capella-Hernandez; Universitat Politecnica de Valencia (UPV), Department of Computer Engineering (DISCA), Valencia, Spain
• Prof. Emeritus Kevin Daimi (Congress Steering Committee); Director, Computer Science and Software Engineering Programs, Department of Mathematics, Computer Science and Software Engineering, University of Detroit Mercy, Detroit, Michigan, USA
• Prof. Leonidas Deligiannidis (Congress Steering Committee); Department of Computer Information Systems, Wentworth Institute of Technology, Boston, Massachusetts, USA; Visiting Professor, MIT, USA
• Prof. Mary Mehrnoosh Eshaghian-Wilner (Congress Steering Committee); Professor of Engineering Practice, University of Southern California, California, USA; Adjunct Professor, Electrical Engineering, University of California, Los Angeles (UCLA), California, USA
• Prof. Louie Lolong Lacatan; Chairperson, Computer Engineering Department, College of Engineering, Adamson University, Manila, Philippines; Senior Member, International Association of Computer Science and Information Technology (IACSIT), Singapore; Member, International Association of Online Engineering (IAOE), Austria
• Prof. Hyo Jong Lee; Director, Center for Advanced Image and Information Technology, Division of Computer Science and Engineering, Chonbuk National University, South Korea
• Dr. Ali Mostafaeipour; Industrial Engineering Department, Yazd University, Yazd, Iran
• Dr. Houssem Eddine Nouri; Informatics Applied in Management, Institut Superieur de Gestion de Tunis, University of Tunis, Tunisia
• Prof. Dr., Eng. Robert Ehimen Okonigene (Congress Steering Committee); Department of Electrical & Electronics Engineering, Faculty of Engineering and Technology, Ambrose Alli University, Edo State, Nigeria
• Ashu M. G. Solo (Publicity); Fellow of British Computer Society; Principal/R&D Engineer, Maverick Technologies America Inc.
• Prof. Fernando G. Tinetti (Congress Steering Committee); School of Computer Science, Universidad Nacional de La Plata, La Plata, Argentina; also at Comision Investigaciones Cientificas de la Prov. de Bs. As., Argentina
• Prof. Layne T. Watson (Congress Steering Committee); Fellow of IEEE; Fellow of The National Institute of Aerospace; Professor of Computer Science, Mathematics, and Aerospace and Ocean Engineering, Virginia Polytechnic Institute & State University, Blacksburg, Virginia, USA
• Prof. Jane You (Congress Steering Committee); Associate Head, Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
• Dr. Farhana H. Zulkernine; Coordinator of the Cognitive Science Program, School of Computing, Queen's University, Kingston, ON, Canada

We would like to extend our appreciation to the referees and the members of the program committees of individual sessions, tracks, and workshops; their names do not appear in this document but are listed on the web sites of the individual tracks. As sponsors-at-large, partners, and/or organizers, each of the following (separated by semicolons) provided help for at least one track of the Congress: Computer Science Research, Education, and Applications Press (CSREA); US Chapter of World Academy of Science; American Council on Science & Education & Federated Research Council (http://www.americancse.org/).
In addition, a number of university faculty members and their staff (names appear on the cover of the set of proceedings), several publishers of computer science and computer engineering books and journals, chapters and/or task forces of computer science associations/organizations from 3 regions, and developers of high-performance machines and systems provided significant help in organizing the conference as well as providing some resources. We are grateful to them all.

We express our gratitude to the keynote, invited, individual conference/track, and tutorial speakers; the list of speakers appears on the conference web site. We would also like to thank the following: UCMSS (Universal Conference Management Systems & Support, California, USA) for managing all aspects of the
conference; Dr. Tim Field of APC for coordinating and managing the printing of the proceedings; and the staff of Luxor Hotel (Convention department) at Las Vegas for the professional service they provided. Last but not least, we would like to thank the Co-Editors of GCC'19: Prof. Hamid R. Arabnia, Prof. Leonidas Deligiannidis, and Prof. Fernando G. Tinetti.

We present the proceedings of GCC'19.

Steering Committee, 2019
http://americancse.org/
Contents

SESSION: HIGH-PERFORMANCE COMPUTING - CLOUD COMPUTING
The Design and Implementation of Astronomical Data Analysis System on HPC Cloud ......... 3
Jaegyoon Hahm, Ju-Won Park, Hyeyoung Cho, Min-Su Shin, Chang Hee Ree

SESSION: HIGH-PERFORMANCE COMPUTING - HADOOP FRAMEWORK
A Speculation and Prefetching Model for Efficient Computation of MapReduce Tasks on Hadoop HDFS System ......... 9
Lan Yang

SESSION: LATE BREAKING PAPER: CLOUD MIGRATION
Critical Risk Management Practices to Mitigate Cloud Migration Misconfigurations ......... 15
Michael Atadika, Karen Burke, Neil Rowe
SESSION: HIGH-PERFORMANCE COMPUTING - CLOUD COMPUTING
Chair(s): TBA
The Design and Implementation of Astronomical Data Analysis System on HPC Cloud

Jaegyoon Hahm (1), Ju-Won Park (1), Hyeyoung Cho (1), Min-Su Shin (2), and Chang Hee Ree (2)
(1) Supercomputing Infrastructure Center, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
(2) Galaxy Evolution Research Group, Korea Astronomy and Space Science Institute, Daejeon, Republic of Korea

Abstract - Astronomy is a representative data-intensive science that can take advantage of cloud computing because it requires flexible infrastructure services for variable workloads and various data analysis tools. The purpose of this study is to show the usefulness of cloud computing as a research environment for analyzing large-scale scientific data, as in astronomy. We implemented an OpenStack cloud and a Kubernetes-based orchestration service for scientific data analysis. On the cloud, we successfully constructed data analysis systems with a task scheduler and an in-memory database tool to support the task processing and data I/O environment required in astronomical research. Furthermore, we aim to construct a high-performance cloud service for data-intensive research in more scientific fields.

Keywords: cloud computing, astronomical data analysis, data analysis platform, openstack, kubernetes

1 Introduction

Recently, in the field of science and technology, more and more data is generated through advanced data-capturing sources [1], and researchers increasingly use cutting-edge data analysis techniques such as big data analysis and machine learning. Astronomy is a typical field that collects and analyzes large amounts of data through various observation tools, such as astronomical telescopes, and its data growth rate will increase rapidly in the near future. As a notable example, the Large Synoptic Survey Telescope (LSST) will start to produce large volumes of data, up to 20TB per day, from observing a large area of the sky in full operations from 2023. The total database for ten years is expected to be 60PB for the raw data and 15PB for the catalog database [2]. As another big data project, the Square Kilometer Array (SKA), to be constructed as the world's largest radio telescope by 2024, is projected to generate and archive 130-300PB per year [3].

In this era of data deluge, there is a growing demand for utilizing cloud computing for data-intensive sciences. In particular, astronomical research demands the ability to acquire resources for simulation-driven numerical experiments or mass data analysis in an immediate and dynamic way. Therefore, the type of cloud service expected by astronomical researchers will be an Infrastructure as a Service (IaaS) providing flexible resources for running existing software and research methodologies, and a Platform as a Service (PaaS) on which new data analytic tools can be applied.

In this paper, we propose a methodology for, and demonstrate the feasibility of, cloud computing that focuses on flexible use of resources and on the problems astronomical researchers face when using cloud services. Section 2 introduces related research, and Section 3 describes the features and requirements of the target application. In Section 4 we describe the implementation of the data analysis system for the target application. Finally, in Section 5 we provide conclusions and future plans.

2 Related Works

There have been several examples of cloud applications for astronomical research.
The Gemini Observatory has been building a new archive using EC2, EBS, S3 and Glacier from the Amazon Web Services (AWS) cloud to replace the existing Gemini Science Archive (GSA) [4]. In addition, Williams et al. (2018) conducted studies to reduce the Panchromatic Hubble Andromeda Treasury (PHAT) photometric data set using Amazon EC2 [5]. Unlike these public-cloud cases, there are also studies that build private cloud environments to perform astronomical research. AstroCloud [6] is a distributed cloud platform which integrates many data management and processing tasks for the Chinese Virtual Observatory (China-VO). In addition, Hahm et al. (2012) developed a platform for constructing virtual-machine-based Condor clusters for analyzing astronomical time-series data in a private cloud [7]. The purpose of that study was to confirm the possibility of constructing a cluster-type analysis platform to perform mass astronomical data analysis in a cloud environment.
3 Application Requirements and Design

The application used in this study is the MAGPHYS SED fitting code, which reads and analyzes data on the brightness and color of galaxies to estimate their physical properties. The data used is the large-scale survey data of Galaxy And Mass Assembly (GAMA), a project to exploit the latest generation of ground-based and space-borne survey facilities to study cosmology and galaxy formation and evolution [8]. On a single processor, MAGPHYS typically takes 10 minutes to run for a single galaxy.

As shown in Figure 1, data analysis in the application starts from data obtained by preprocessing the original image data collected from the telescope. The preprocessed data is a text-file DB, which is the input data for the analysis. The application extracts the data one line at a time from the input file, submits it to the task queue together with the spectral analysis code, and creates a new DB by storing the analyzed results in the output file. In a traditional research environment, the analysis would be done by building a dedicated cluster for data analysis, or through a job scheduler on a shared cluster.

[Fig. 1. Data Analysis Workflow]

The main technical challenge of the application is to achieve faster data I/O and to use its own task queue for convenience. The GAMA dataset contains information on approximately 197,000 galaxies. At this dataset size, however, file-based I/O is too slow and hard to manage. Therefore, fast data I/O tools and a job scheduler for high-throughput batch processing are required. To satisfy these requirements, we designed two types of lightweight data analysis systems. First, data is read through file I/O as usual, and the data processing environment is configured using an asynchronous task scheduler for the analysis work (see Figure 2). In this case, we need a shared file system that can be accessed by a large number of workers performing analysis tasks. Second, as shown in Figure 3, data input is performed through an in-memory DB instead of file reading for faster I/O, and the output of the analysis is also stored in the in-memory DB.

[Fig. 2. Data Analysis with Task Scheduler and File I/O]
[Fig. 3. Data Analysis with In-memory DB]
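To make the first design concrete, the following is a minimal sketch (our illustration, not the authors' code) of the file-based workflow of Figure 2: galaxy records are read from the preprocessed text-file DB, submitted to an asynchronous worker pool, and the fitted results are written to a new output DB. The run_magphys() wrapper and the input.db/output.db file names are hypothetical placeholders; Section 4 below describes the actual schedulers (DASK and CELERY) that play the role of the worker pool here.

from concurrent.futures import ProcessPoolExecutor, as_completed

def run_magphys(record):
    # Placeholder: invoke the MAGPHYS SED-fitting code on one galaxy
    # record and return the estimated physical properties as a text line.
    return record

with open("input.db") as fin, open("output.db", "w") as fout:
    with ProcessPoolExecutor() as pool:
        # One task per input line (one galaxy per line in the text-file DB).
        futures = [pool.submit(run_magphys, line.rstrip()) for line in fin]
        for done in as_completed(futures):    # results arrive asynchronously
            fout.write(done.result() + "\n")  # build the new output DB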
4 Cloud Data Analysis System Implementation

4.1 KISTI HPC Infrastructure as a Service

Korea Institute of Science and Technology Information (KISTI) is building a high-performance cloud service in order to support data-intensive research in various science and technology fields, because emerging data-centric sciences require a more flexible and dynamic computing environment than traditional HPC services. Big data and deep learning research in particular needs customized HPC resources in a flexible manner. The KISTI cloud will therefore be a service providing customizable high-performance computing and storage resources, such as a supercomputer, GPU cluster, etc. In the first stage, the cloud service will be implemented on a supercomputer. KISTI's newly introduced supercomputer NURION is a national strategic research infrastructure to support R&D in various fields; in particular, there is a plan to utilize it for data-intensive computing and artificial intelligence. In order to build such a service environment, we will leverage cloud computing technologies.

In order to design the service and verify the required skills, we constructed an OPENSTACK-based IaaS cloud testbed on a computational cluster (see Figure 4). OPENSTACK [9] is a cloud deployment platform that is used as a de facto standard in industry and research, and is well suited to cloud deployments for high-performance computing too. The cluster used for the testbed has thirteen Intel Xeon-based servers: one deployment node, one controller node, three storage nodes, and the remaining eight as compute nodes. The OPENSTACK services implemented here are NOVA (Compute), GLANCE (Image), NEUTRON (Network), CINDER (Block Storage), SWIFT (Object Storage), KEYSTONE (Identity), HEAT (Orchestration), HORIZON (Dashboard), MANILA (Network Filesystem) and MAGNUM (Container). For storage, CEPH [10] was configured using three servers and used as a backend for the GLANCE, CINDER, SWIFT, and MANILA services. Apart from this, we configured a Docker-based KUBERNETES orchestration environment using MAGNUM. KUBERNETES is an open source platform for automating Linux container operations [11]. In this study, it is composed of one KUBERNETES master and four workers.

[Fig. 4. OpenStack Cloud Testbed]

4.2 Implementation of Data Analysis Platform in the Cloud

The data analysis system constructed in this study focuses on how to configure the task scheduler and the data I/O environment for task processing. The architecture of the analysis system is shown in Figure 5.

First, the task scheduler should efficiently distribute and process individual tasks asynchronously. We adopted a lightweight task scheduler that can be dynamically configured and used independently, unlike shared job schedulers such as PBS and SLURM in conventional HPC systems. In particular, tasks for astronomical data analysis, which require long run times, often call for asynchronous rather than synchronous processing. In the experiments, we used DASK [12] and CELERY [13] as task schedulers; both are readily available to scientific researchers and are likely to be used in a common data analysis environment. The structure of the scheduler is very simple, consisting of a scheduler and workers. We wrote Python code to submit tasks to the scheduler and manage data. The difference between DASK and CELERY is that DASK allocates and monitors tasks in its own scheduler module, whereas CELERY workers' tasks are assigned from a separate message queue, such as RABBITMQ.

In the data I/O environment configuration, the initial experiment was conducted by configuring a shared file system using OPENSTACK MANILA for the file-based I/O processing used in the existing analysis environment. However, file I/O is significantly slower than computation, which causes severe performance degradation when analyzing the entire dataset. In order to remove this bottleneck and improve overall performance, we used an in-memory DB tool called REDIS [14]. REDIS is a memory-based key-value store that is known to handle more than 100,000 transactions per second.
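As an illustration of how these pieces fit together, the sketch below combines DASK and REDIS in the in-memory-DB design of Figure 3. It assumes a dask-scheduler reachable at dask-scheduler:8786 and a redis-server at redis-host:6379 (as provisioned by the HEAT template in Figure 6), with the redis Python module installed on the workers; the gama:in/gama:out key naming and the fit_galaxy() body are our own placeholders, not the authors' code.

import redis
from dask.distributed import Client

def fit_galaxy(gal_id):
    # Each worker task reads its input row from REDIS and writes the
    # result back, avoiding the slow shared-file-system I/O path.
    r = redis.Redis(host="redis-host", port=6379)
    record = r.get(f"gama:in:{gal_id}")       # input seeded by a loader step
    if record is not None:
        result = record                       # placeholder for the MAGPHYS fit
        r.set(f"gama:out:{gal_id}", result)
    return gal_id

client = Client("dask-scheduler:8786")        # connect to the DASK scheduler
futures = client.map(fit_galaxy, range(197000))  # one task per GAMA galaxy
client.gather(futures)                        # block until all fits finish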
[Fig. 5. Data Analytics System Architecture]

heat_template_version: queens
…
parameters:
  worker_num:
    default: 16
    …
…
resources:
  scheduler:
    type: OS::Nova::Server
    properties:
      name: dask-scheduler
      image: Ubuntu 18.04 Redis with Dask
      …
      template: |
        #!/bin/bash
        pip3 install dask distributed --upgrade
        …
        dask-scheduler &
  workers:
    type: OS::Heat::ResourceGroup
    properties:
      count: { get_param: worker_num }
      resource_def:
        type: OS::Nova::Server
        properties:
          name: dask-worker%index%
          image: Ubuntu 18.04 Redis with Dask
          …
          template: |
            #!/bin/bash
            apt-get install redis-server -y
            pip3 install dask distributed --upgrade
            …
            dask-worker dask-scheduler:8786 &
outputs:
  instance_name: …
  instance_ip: …

[Fig. 6. HEAT Template for Analysis Platform with REDIS & DASK]
A combination of task scheduler and data I/O environment can be created and configured automatically through OPENSTACK HEAT or KUBERNETES orchestration in our self-contained cloud. Figure 6 shows the structure of one of the HEAT templates used in this experiment. The template is structured with parameters and resources. The resources comprise the scheduler and the workers, and the required software is installed and configured on each scheduler and worker after boot-up.

5 Conclusion and Future Work

Through experiments, we successfully analyzed brightness and color data for about 5,300 galaxies in a parallel distributed processing environment consisting of DASK or CELERY with REDIS. Figure 7 shows an example galaxy from the GAMA data with a result of the MAGPHYS analysis in the cloud. With the OPENSTACK-based cloud, we confirmed that the research environment, especially a data analysis system with tools like a task scheduler and an in-memory DB, can be automatically configured and well utilized. In addition, we confirmed the availability of an elastic service environment through the cloud to meet the demand for large-scale data analysis with volatile workloads.

[Fig. 7. An example result of the MAGPHYS analysis on the cloud]

In this study, we identified some useful aspects of the cloud for data-driven research. First, we confirmed that it is easy to build an independent execution environment that provides the necessary software stack for research through the cloud. Also, in a cloud environment, researchers can easily reuse the same research environment and share research experience by reusing virtual machines or container images deployed by the research community.

In the next step, we will configure an environment for real-time processing of in-memory cache data. For practical real-time data processing, it is necessary to construct an optimal environment for data I/O as well as memory-based stream data processing, and various experiments need to be performed on the cloud. Based on the experience of building an astronomical big data processing environment in this study, we will provide a more flexible and high-performance cloud service and let researchers utilize it in various fields of data-centric research.

6 References

[1] T. Hey, S. Tansley and K. Tolle, The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, 2009.
[2] LSST Corporation. About LSST: Data Management. [Online]. Available: https://www.lsst.org/about/dm/ (accessed 2019.03.10).
[3] P. Diamond, SKA Community Briefing. [Online]. Available: https://www.skatelescope.org/ska-community-briefing-18jan2017/ (accessed 2019.03.10).
[4] P. Hirst and R. Cardenes, "The new Gemini Observatory archive: a fast and low cost observatory data archive running in the cloud", Proc. SPIE 9913, Software and Cyberinfrastructure for Astronomy IV, 99131E (8 August 2016); doi: 10.1117/12.2231833.
[5] B. F. Williams, K. Olsen, R. Khan, D. Pirone and K. Rosema, "Reducing and analyzing the PHAT survey with the cloud", The Astrophysical Journal Supplement Series, Volume 236, Number 1.
[6] C. Cui et al., "AstroCloud: a distributed cloud computing and application platform for astronomy", Proc. WCSN2016.
[7] J. Hahm et al., "Astronomical time series data analysis leveraging science cloud", Proc. Embedded and Multimedia Computing Technology and Service, pp. 493-500, 2012.
[8] S. P. Driver et al., "Galaxy And Mass Assembly (GAMA): Panchromatic Data Release (far-UV-far-IR) and the low-z energy budget", MNRAS 455, 3911-3942, 2016.
[9] OpenStack Foundation. OpenStack Overview. [Online]. Available: https://www.openstack.org/software/ (accessed 2019.03.10).
[10] Red Hat Inc. Ceph Introduction. [Online]. Available: https://ceph.com/ceph-storage/ (accessed 2019.03.10).
[11] The Kubernetes Authors. What is Kubernetes?. [Online]. Available: https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/ (accessed 2019.03.10).
[12] Dask Core Developers. Why Dask?. [Online]. Available: https://docs.dask.org/en/latest/why.html (accessed 2019.03.10).
[13] A. Solem. Celery - Distributed Task Queue. [Online]. Available: http://docs.celeryproject.org/en/latest/index.html (accessed 2019.03.10).
[14] S. Sanfilippo. Introduction to Redis. [Online]. Available: https://redis.io/topics/introduction (accessed 2019.03.10).
SESSION: HIGH-PERFORMANCE COMPUTING - HADOOP FRAMEWORK
Chair(s): TBA
A Speculation and Prefetching Model for Efficient Computation of MapReduce Tasks on Hadoop HDFS System

Lan Yang
Computer Science Department, California State Polytechnic University, Pomona, Pomona, CA 91768, USA

Abstract - The MapReduce programming model and the Hadoop software framework are keys to big data processing on high-performance computing (HPC) clusters. The Hadoop Distributed File System (HDFS) is designed to stream large data sets at high bandwidth. However, Hadoop suffers from a set of drawbacks, particularly issues with small files as well as dynamic datasets. In this research we target big data applications working with many on-demand datasets of varying sizes. We propose a speculation model that prefetches anticipated datasets for upcoming tasks in support of efficient big data processing on HPC clusters.

Keywords: Prefetching, Speculation, Hadoop, MapReduce, High performance computing cluster.

1 Introduction

Along with the emerging technology of cloud computing, Google proposed the MapReduce programming model [1], which allows for massive scalability of unstructured data across hundreds or thousands of high-performance computing nodes. Hadoop is an open source software framework that performs distributed processing of huge data sets across clusters of commodity servers [2]. Now distributed as Apache Hadoop [3], it is employed by many cloud services, such as AWS, Cloudera, HortonWorks, and IBM InfoSphere Insights, to offer big data solutions. The Hadoop Distributed File System (HDFS) [2], inspired by the Google File System (GFS) [4], is a reliable filesystem of Hadoop designed for storing very large files on a cluster of commodity hardware. To process big data in Apache Hadoop, the client submits data and a program to Hadoop; HDFS stores the data while MapReduce processes it.

While Hadoop is a powerful tool for processing massive data, it suffers from a set of drawbacks, including issues with small files, no real-time data processing, and suitability for batch processing only [5]. Apache Spark [6] partially solved Hadoop's real-time and batch processing problems by introducing in-memory processing [7]. As part of the Hadoop ecosystem, Spark does not have its own distributed filesystem, though it can use HDFS. Hadoop does not suit small data because HDFS, with its high-capacity design, lacks the ability to efficiently support random reading of small files. Small files are the major problem in HDFS. In this research, we study a special type of iterative MapReduce task working on HDFS with input datasets coming from many small files dynamically, i.e., on demand. We propose a data prefetching speculation model aimed at improving the performance and flexibility of big data processing on Hadoop HDFS for that special type of MapReduce task.

2 Background

2.1 Description of a special type of MapReduce tasks

In today's big data world, the MapReduce programming model and the Hadoop software framework remain popular tools for big data processing. Based on a number of big data applications performed on Hadoop, we observed the following:

(1) An HDFS file splits into chunks, typically of 64-128MB in size. To benefit from Hadoop's parallel processing ability, an HDFS file must be large enough to be divided into multiple chunks. Therefore, a file is considered small if it is significantly smaller than the HDFS chunk size.
(2) While many big data applications use large data files that can be pushed to the HDFS input directory prior to task execution, some applications use many small datasets distributed across a wide range.
(3) With the increasing demand for big data processing, more and more applications now require multiple rounds (or iterations) of processing, with each round requiring new datasets determined by the outcome of the previous computation. For example, in a data processing application for a legal system, the first round of MapReduce computation uses prestored case documents, while the second round might require access to certain assets or utilities datasets based on the case outcomes resulting from the first-round analysis. The assets or utilities datasets could consist of hundreds to thousands of files ranging from 1KB to 10MB, with only dozens of files relevant depending on the outcome of the first round. It would be very inefficient and inflexible if we had to divide these two rounds into separate client requests. Also, if we could overlap computation and data access time by speculating and prefetching data, we could reduce the overall processing time significantly.

Here we refer to big data applications with one or more of the above characteristics (i.e., requiring iterative or multiple passes of MapReduce computation, using many small files to form an HDFS chunk, or using dynamic datasets that depend on the outcome of previous rounds of computation) as a special type of MapReduce task.

2.2 Observation: execution time and HDFS chunks

We ran several dozen big data applications using Hadoop on a high-performance computing cluster. Table 1 summarizes the MapReduce performance of three relatively large big data analytics tasks.

[Table 1: Performance data for some big data applications (*requires multi-phase analysis); the tabular values are not recoverable from the source.]

2.3 Computation time vs. data fetch time

In this research, we first tested and analyzed data access times for datasets ranging from 1KB to 16MB on an HPC cluster consisting of 2 DL360 management nodes, 20 DL160 compute nodes, 3.3TB RAM, 40Gbit InfiniBand, and a 10Gbit external Ethernet connection, with overall system throughput of 36.6 Tflp in double-precision mode and 149.6 Tflp. The Slurm job scheduler [8] is the primary software we use for our testing. The performance data shown in Figure 1 serve as our basis for deriving the performance of our speculation algorithms.

[Figure 1: Data Access Performance Base]

3 Speculation and Prefetching Models

3.1 Speculation model

We establish a connection graph (CG) to represent the relations of commonly used tasks, with tasks as nodes and edges as links between tasks. For example, a birthday party planning task links to restaurant reservation tasks as well as entertainment or recreation tasks; an address change task links to moving or furniture shopping tasks. The links in the CG are prioritized; for example, for the birthday task, the restaurant task is initially set with a higher priority than the movie ticketing task. The priorities are in the 0.0 to 1.0 range and are dynamically updated based on the outcome of our prediction. For example, based on the connections in the CG and the priorities of the links, we predict that the top two tasks following the birthday task are, in order, the restaurant task and the movie task. If for that particular application the movie task turns out to be the correct choice, we increase its priority by a small fraction, say 0.1, capped at a maximum of 1.0.
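This update rule can be summarized in a few lines of Python. The sketch below is our reconstruction of the scheme just described (using the dictionary representation given later in Section 4), not code from the paper: CG maps each task name to a list of [next_task, priority] pairs kept sorted in descending order of priority.

import random

CG = {"birthday": [["restaurant", 0.8], ["movie", 0.6]]}  # toy example

def find(links, name):
    # Return the [name, priority] pair for `name`, or None if absent.
    return next((pair for pair in links if pair[0] == name), None)

def speculate(t):
    links = CG.get(t, [])
    return links[0][0] if links else None     # top candidate p = CG[t][0]

def feedback(t, p, actual, step=0.1):
    # Adjust link priorities once the actual next task is known.
    links = CG.setdefault(t, [])
    if p is not None and p != actual:
        miss = find(links, p)
        if miss:
            miss[1] = max(0.0, miss[1] - step)        # demote the wrong guess
    hit = find(links, actual)
    if hit:
        hit[1] = min(1.0, hit[1] + step)              # promote the actual task
    else:
        links.append([actual, random.uniform(0.1, 0.5)])  # add a new link
    links.sort(key=lambda pair: pair[1], reverse=True)    # keep sorted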
3.2 Prefetching algorithm

The prefetching concept is inspired by compiler-directed instruction/data prefetching techniques that speculate on and prefetch instructions for multiprocessing [9][10]. Our basic fetching strategy is: overlapping with the computation of the current task, we prefetch the associated datasets for the next round of computation based on the task speculation. The association between tasks and data files can be represented as a many-to-many relation. Each task is pre-associated with a list of files in the order of established ranks; for example, the restaurant task could be associated with pizza delivery files, restaurant location files, etc. The ranks are initialized based on the popularity of the service, with values in the 0.0 to 1.0 range and higher values for the most popular or most recommended services. The ranks are then adjusted based on the network distance of the file locations, with priority given to local or nearby files. Again, after task execution, if a prefetched file turned out to be irrelevant (i.e., the whole file was filtered out at an early MapReduce stage), the rank of that file with regard to that task is reduced. Based on the system configuration, we also preset two constants K and S, with K as the optimized/recommended number of containers and S as the size of each container (we suggest S be the HDFS chunk size and K the desired number of chunks with regard to the requested compute nodes). When prefetching datasets for a set of speculated tasks, the prefetching process repeatedly reads files until it fills up all the containers.

4 Simulation Design

We used a Python dictionary to implement the connection graph CG, with each distinct task name as a key. The value for a key is a list of task links sorted in descending order of priority. The task and data relations are also represented as a Python dictionary, with task names as keys and lists of data file names sorted in descending order of rank as values. Currently we simulate the effectiveness of prefetching by using parallel processes created by Slurm directly. Once the approaches are validated, we will test them on Hadoop/HDFS.

4.1 Speculation Model

For any current task t, the simulated speculation model always fetches the top candidate task p from the CG dictionary, i.e., CG[t][0], and starts the prefetching process. When t completes, it chooses the next task t'. If t' is the same as p, we let t be p and the process continues. If t' is different from p, we restart the prefetching process, reduce the priority of p by one level (currently 0.1) but not below 0.0, and increase the priority of t' by 0.1 (capped at 1.0) if it is already in t's connection links, or add it to t's connection links with a randomly assigned priority (between 0.1 and 0.5) if it is not there yet.

4.2 Prefetching Model

(1) Configuration: one file node N (i.e., a process that only reads data in and writes to a certain shared location) and four shared storages (arrays or dictionaries) representing the containers, C1 to C4. Initially all Ci are empty, and each container has a current capacity and a maximum capacity (all containers may have the same maximum capacity). This is easily extendable to multiple file nodes and larger numbers of containers.

(2) Assume the task p selected by the speculation scheme is associated with n small files, say F1, ..., Fn. Read in the files in the order F1, ..., Fn.
For each file read in, record its size as sj, then search for a container whose current capacity plus sj does not exceed its maximum capacity; lock it once found and push the content in. If no available container is found, the file content is set aside and the failure count is increased by 1 (the failure count is initially 0). Continue to fetch the next file until the stopping condition spelled out in (3) is reached.

(3) The prefetching process ends when all containers reach a certain percentage full (e.g., at least 80%) or when the failure count reaches a certain number (say 3). Note: one failure does not mean the containers are full; it could be that we fetched a very large dataset that could not fit into any of the current containers. In that case we may still fetch the files next in the list, as these might be smaller.
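The following is a minimal sketch of steps (1)-(3), under stated assumptions: containers are modeled as byte buffers with a common maximum capacity, read_file() stands in for the file node N, and no locking is shown since the sketch is single-threaded. The parameter defaults (k=4 containers, 80% fill threshold, failure limit 3) mirror the values suggested above.

def read_file(name):
    # Stand-in for the file node N: read one small file's content.
    with open(name, "rb") as f:
        return f.read()

def prefetch(files, k=4, cap=128 * 2**20, fail_limit=3, fill=0.8):
    containers = [bytearray() for _ in range(k)]  # C1..Ck, each at most `cap`
    failures = 0
    for name in files:                            # F1..Fn in rank order
        data = read_file(name)
        slot = next((c for c in containers
                     if len(c) + len(data) <= cap), None)
        if slot is None:
            failures += 1                         # file fits in no container
        else:
            slot.extend(data)
        if failures >= fail_limit:                # stop condition: failures
            break
        if all(len(c) >= fill * cap for c in containers):
            break                                 # stop condition: all ~full
    return containers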
5 Conclusions

In this work we studied the possibility of dynamically aggregating small datasets to form data chunks large enough for MapReduce tasks on Hadoop HDFS. We proposed task speculation and file prefetching models to speed up overall processing, and we have set up a preliminary simulation test suite to assess the feasibility of the speculation and prefetching models. Since we are currently designing the schemes in a Slurm multiprocess environment without HDFS, no performance gain could be measured yet. Our future (and ongoing) work is to port the design from HPC Slurm processes onto a Hadoop HDFS system and measure its effectiveness using real-world big-data applications.

6 References

[1] Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, Google Research, https://research.google.com/archive/mapreduce-osdi04.pdf
[2] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler, The Hadoop Distributed File System, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).
[3] Apache Hadoop, https://hadoop.apache.org/
[4] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, The Google File System, https://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf
[5] DataFlair Team, 13 Big Limitations of Hadoop & Solution to Hadoop Drawbacks, https://data-flair.training/blogs/13-limitations-of-hadoop/, March 7, 2019.
[6] Apache Spark, https://spark.apache.org/
[7] Matei Zaharia, Mosharaf Chowdhury, Michael Franklin, Scott Shenker, and Ion Stoica, Spark: Cluster Computing with Working Sets, Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, 2010.
[8] Slurm job scheduler, https://slurm.schedmd.com/
[9] Seung Woo Son, Mahmut Kandemir, Mustafa Karakoy, and Dhruva Chakrabarti, A compiler-directed data prefetching scheme for chip multiprocessors, Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '09).
[10] Ricardo Bianchini and Beng-Hong Lim, Evaluating the Performance of Multithreading and Prefetching in Multiprocessors, https://doi.org/10.1006/jpdc.1996.0109