Publication of the 2019 World Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE'19)
July 29 - August 01, 2019 | Las Vegas, Nevada, USA
https://americancse.org/events/csce2019
Copyright © 2019 CSREA Press
GCC’19
PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON
GRID, CLOUD, & CLUSTER COMPUTING
Editors
Hamid R. Arabnia
Leonidas Deligiannidis, Fernando G. Tinetti
WORLDCOMP’19
ISBN 9781601324993
U.S. $49.95
American Council on Science and Education (ACSE)
Copyright © 2019 CSREA Press
ISBN: 1-60132-499-5
Printed in the United States of America
https://americancse.org/events/csce2019/proceedings
This volume contains papers presented at the 2019 International Conference on Grid, Cloud, & Cluster
Computing. Their inclusion in this publication does not necessarily constitute endorsements by editors or by the
publisher.
Copyright and Reprint Permission
Copying without a fee is permitted provided that the copies are not made or distributed for direct
commercial advantage, and credit to source is given. Abstracting is permitted with credit to the source.
Please contact the publisher for other copying, reprint, or republication permission.
Foreword
It gives us great pleasure to introduce this collection of papers to be presented at the 2019 International
Conference on Grid, Cloud, and Cluster Computing (GCC’19), July 29 – August 1, 2019, at Luxor Hotel (a
property of MGM Resorts International), Las Vegas, USA. The preliminary edition of this book (available
in July 2019 for distribution on site at the conference) includes only a small subset of the accepted research
articles. The final edition (available in August 2019) will include all accepted research articles. This is due
to deadline extension requests received from most authors who wished to continue enhancing the write-up
of their papers (by incorporating the referees’ suggestions). The final edition of the proceedings will be
made available at https://americancse.org/events/csce2019/proceedings .
An important mission of the World Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE (the federated congress with which this conference is affiliated), includes "Providing a unique platform for a diverse community of constituents composed of scholars, researchers, developers, educators, and practitioners. The Congress makes a concerted effort to reach out to participants affiliated with diverse entities (such as universities, institutions, corporations, government agencies, and research centers/labs) from all over the world. The congress also attempts to connect participants from institutions that have teaching as their main mission with those who are affiliated with institutions that have research as their main mission. The congress uses a quota system to achieve its institution and geography diversity objectives." By any definition of diversity, this congress is among the most diverse scientific meetings in the USA. We are proud to report that this federated congress has authors and participants from 67 different nations, representing a variety of personal and scientific experiences that arise from differences in culture and values. As can be seen below, the program committee of this conference, as well as the program committees of all other tracks of the federated congress, are as diverse as the authors and participants.
The program committee would like to thank all those who submitted papers for consideration. About 70% of the submissions were from outside the United States. Each submitted paper was peer-reviewed by two experts in the field for originality, significance, clarity, impact, and soundness. In cases of contradictory recommendations, a member of the conference program committee was charged with making the final decision; often, this involved seeking help from additional referees. In addition, papers whose authors included a member of the conference program committee were evaluated using the double-blind review process. One exception to the above evaluation process was for papers that were submitted directly to chairs/organizers of pre-approved sessions/workshops; in these cases, the chairs/organizers were responsible for the evaluation of such submissions. The overall acceptance rate for regular papers was 18%; 20% of the remaining papers were accepted as poster papers. (At the time of this writing, we had not yet received the acceptance rates for a couple of individual tracks.)
We are very grateful to the many colleagues who offered their services in organizing the conference. In particular, we would like to thank the members of the Program Committee of GCC'19, members of the congress Steering Committee, and members of the committees of federated congress tracks that have topics within the scope of GCC. Many individuals listed below will be asked after the conference to provide their expertise and services in selecting papers for publication (extended versions) in journal special issues as well as in a set of research books (to be prepared for publishers including Springer, Elsevier, BMC journals, and others).
- Prof. Emeritus Nizar Al-Holou (Congress Steering Committee); Professor and Chair, Electrical and Computer Engineering Department; Vice Chair, IEEE/SEM-Computer Chapter; University of Detroit Mercy, Detroit, Michigan, USA
- Prof. Hamid R. Arabnia (Congress Steering Committee); Graduate Program Director (PhD, MS, MAMS); The University of Georgia, USA; Editor-in-Chief, Journal of Supercomputing (Springer); Editor-in-Chief, Transactions of Computational Science & Computational Intelligence (Springer); Fellow, Center of Excellence in Terrorism, Resilience, Intelligence & Organized Crime Research (CENTRIC)
- Prof. Dr. Juan-Vicente Capella-Hernandez; Universitat Politecnica de Valencia (UPV), Department of Computer Engineering (DISCA), Valencia, Spain
- Prof. Emeritus Kevin Daimi (Congress Steering Committee); Director, Computer Science and Software Engineering Programs, Department of Mathematics, Computer Science and Software Engineering, University of Detroit Mercy, Detroit, Michigan, USA
- Prof. Leonidas Deligiannidis (Congress Steering Committee); Department of Computer Information Systems, Wentworth Institute of Technology, Boston, Massachusetts, USA; Visiting Professor, MIT, USA
- Prof. Mary Mehrnoosh Eshaghian-Wilner (Congress Steering Committee); Professor of Engineering Practice, University of Southern California, California, USA; Adjunct Professor, Electrical Engineering, University of California, Los Angeles (UCLA), California, USA
- Prof. Louie Lolong Lacatan; Chairperson, Computer Engineering Department, College of Engineering, Adamson University, Manila, Philippines; Senior Member, International Association of Computer Science and Information Technology (IACSIT), Singapore; Member, International Association of Online Engineering (IAOE), Austria
- Prof. Hyo Jong Lee; Director, Center for Advanced Image and Information Technology, Division of Computer Science and Engineering, Chonbuk National University, South Korea
- Dr. Ali Mostafaeipour; Industrial Engineering Department, Yazd University, Yazd, Iran
- Dr. Houssem Eddine Nouri; Informatics Applied in Management, Institut Superieur de Gestion de Tunis, University of Tunis, Tunisia
- Prof. Dr. Eng. Robert Ehimen Okonigene (Congress Steering Committee); Department of Electrical & Electronics Engineering, Faculty of Engineering and Technology, Ambrose Alli University, Edo State, Nigeria
- Ashu M. G. Solo (Publicity); Fellow of British Computer Society; Principal/R&D Engineer, Maverick Technologies America Inc.
- Prof. Fernando G. Tinetti (Congress Steering Committee); School of Computer Science, Universidad Nacional de La Plata, La Plata, Argentina; also at Comision Investigaciones Cientificas de la Prov. de Bs. As., Argentina
- Prof. Layne T. Watson (Congress Steering Committee); Fellow of IEEE; Fellow of The National Institute of Aerospace; Professor of Computer Science, Mathematics, and Aerospace and Ocean Engineering, Virginia Polytechnic Institute & State University, Blacksburg, Virginia, USA
- Prof. Jane You (Congress Steering Committee); Associate Head, Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
- Dr. Farhana H. Zulkernine; Coordinator of the Cognitive Science Program, School of Computing, Queen's University, Kingston, ON, Canada
We would like to extend our appreciation to the referees and the members of the program committees of individual sessions, tracks, and workshops; their names do not appear in this document but are listed on the web sites of the individual tracks.
As sponsors-at-large, partners, and/or organizers, each of the following (separated by semicolons) provided help for at least one track of the Congress: Computer Science Research, Education, and Applications Press (CSREA); US Chapter of World Academy of Science; American Council on Science & Education & Federated Research Council (http://www.americancse.org/). In addition, a number of university faculty members and their staff (names appear on the cover of the set of proceedings), several publishers of computer science and computer engineering books and journals, chapters and/or task forces of computer science associations/organizations from three regions, and developers of high-performance machines and systems provided significant help in organizing the conference as well as providing some resources. We are grateful to them all.
We express our gratitude to the keynote, invited, individual conference/track, and tutorial speakers; the list of speakers appears on the conference web site. We would also like to thank the following: UCMSS (Universal Conference Management Systems & Support, California, USA) for managing all aspects of the conference; Dr. Tim Field of APC for coordinating and managing the printing of the proceedings; and the staff of the Luxor Hotel (Convention Department) in Las Vegas for the professional service they provided. Last but not least, we would like to thank the Co-Editors of GCC'19: Prof. Hamid R. Arabnia, Prof. Leonidas Deligiannidis, and Prof. Fernando G. Tinetti.
We present the proceedings of GCC’19.
Steering Committee, 2019
http://americancse.org/
Contents

SESSION: HIGH-PERFORMANCE COMPUTING - CLOUD COMPUTING
  The Design and Implementation of Astronomical Data Analysis System on HPC Cloud ........ 3
  Jaegyoon Hahm, Ju-Won Park, Hyeyoung Cho, Min-Su Shin, Chang Hee Ree

SESSION: HIGH-PERFORMANCE COMPUTING - HADOOP FRAMEWORK
  A Speculation and Prefetching Model for Efficient Computation of MapReduce Tasks on Hadoop HDFS System ........ 9
  Lan Yang

SESSION: LATE BREAKING PAPER: CLOUD MIGRATION
  Critical Risk Management Practices to Mitigate Cloud Migration Misconfigurations ........ 15
  Michael Atadika, Karen Burke, Neil Rowe
SESSION: HIGH-PERFORMANCE COMPUTING - CLOUD COMPUTING
Chair(s): TBA
The Design and Implementation of Astronomical Data
Analysis System on HPC Cloud
Jaegyoon Hahm(1), Ju-Won Park(1), Hyeyoung Cho(1), Min-Su Shin(2), and Chang Hee Ree(2)
(1) Supercomputing Infrastructure Center, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
(2) Galaxy Evolution Research Group, Korea Astronomy and Space Science Institute, Daejeon, Republic of Korea
Abstract - Astronomy is a representative data-intensive science that can take advantage of cloud computing, because it requires flexible infrastructure services for variable workloads and various data analysis tools. The purpose of this study is to show the usefulness of cloud computing as a research environment for analyzing large-scale scientific data, as in astronomy. We implemented an OpenStack cloud and a Kubernetes-based orchestration service for scientific data analysis. On the cloud, we have successfully constructed data analysis systems with a task scheduler and an in-memory database tool to support the task processing and data I/O environments required in astronomical research. Furthermore, we aim to construct a high-performance cloud service for data-intensive research in more scientific fields.

Keywords: cloud computing, astronomical data analysis, data analysis platform, OpenStack, Kubernetes
1 Introduction
Recently, in the field of science and technology, more
and more data is generated through advanced data-capturing
sources [1]. And naturally, researchers are increasingly using
cutting-edge data analysis techniques, such as big data
analysis and machine learning. Astronomy is a typical field
of collecting and analyzing large amounts of data through
various observation tools, such as astronomical telescopes,
and data growth rate will increase rapidly in the near future.
As a notable example, Large Synoptic Survey Telescope
(LSST) will start to produce large volume of datasets up to
20TB per day from observing large area of the sky in full
operations from 2023. Total database for ten years is
expected to be 60 PB for the raw data, and 15 PB for the
catalog database [2]. As another big data project, Square
Kilometer Array (SKA), which will be constructed as the
largest in the world radio telescope until 2024, is also
projected to generate and archive 130-300PB per year [3].
In this era of data deluge, there is a growing demand for utilizing cloud computing in data-intensive sciences. In particular, astronomical research needs the ability to acquire resources for simulation-driven numerical experiments or mass data analysis in an immediate and dynamic way. The types of cloud service expected by astronomical researchers are therefore Infrastructure as a Service (IaaS), providing flexible resources for running existing software and research methodologies, and Platform as a Service (PaaS), for applying new data analysis tools.
In this paper, we propose a methodology for, and demonstrate the feasibility of, cloud computing that focuses on flexible use of resources and on the problems astronomical researchers face when using cloud services. Section 2 introduces related work, and Section 3 describes the features and requirements of the target application. In Section 4 we describe the implementation of the data analysis system for the target application. Finally, in Section 5 we provide conclusions and future plans.
2 Related Work

There have been several examples of cloud applications for astronomical research. The Gemini Observatory has been building a new archive using EC2, EBS, S3, and Glacier from the Amazon Web Services (AWS) cloud to replace the existing Gemini Science Archive (GSA) [4]. In addition, Williams et al. (2018) conducted studies to reduce the Panchromatic Hubble Andromeda Treasury (PHAT) photometric data set using Amazon EC2 [5].

Unlike these public cloud cases, there are also studies that build a private cloud environment to perform astronomical research. AstroCloud [6] is a distributed cloud platform which integrates many data management and processing tasks for the Chinese Virtual Observatory (China-VO). In addition, Hahm et al. (2012) developed a platform for constructing virtual machine-based Condor clusters for analyzing astronomical time-series data in a private cloud [7]. The purpose of that study was to confirm the possibility of constructing a cluster-type analysis platform for mass astronomical data analysis in a cloud environment.
3 Application Requirements and Design

The application used in this study is the MAGPHYS SED fitting code, which reads and analyzes the brightness and color data of galaxies to estimate their physical properties. The data used is the large-scale survey data of Galaxy And Mass Assembly (GAMA), a project to exploit the latest generation of ground-based and space-borne survey facilities to study cosmology and galaxy formation and evolution [8]. On a single processor, MAGPHYS typically takes 10 minutes to run for a single galaxy. In Figure 1, the data analysis in the application starts with the data obtained by preprocessing the original image data collected from the telescope. The preprocessed data is a text-file DB, which is the input data for the analysis. The application extracts the data one line at a time from the input file, submits it to the task queue together with the spectral analysis code, and creates a new DB by storing the analyzed results in the output file. In a traditional research environment, this analysis would be done by building a dedicated cluster for data analysis or through a job scheduler on a shared cluster.

Fig. 1. Data Analysis Workflow

The main technical challenges of the application are to achieve faster data I/O and to use a dedicated task queue for convenience. The GAMA dataset has information on approximately 197,000 galaxies. However, file-based I/O is too slow and hard to manage for a dataset of this size. Therefore, a fast data I/O tool and a job scheduler for high-throughput batch processing are required.
To satisfy these requirements, we designed two types of lightweight data analysis systems. In the first, data is read through file I/O as usual, and the data processing environment is configured with an asynchronous task scheduler for the analysis work (see Figure 2). In this case, we need a shared file system that can be accessed by the large number of workers performing analysis tasks. In the second, as shown in Figure 3, data input is performed through an in-memory DB instead of file reading for faster I/O, and the output of the analysis is also stored in the in-memory DB.
4 Cloud Data Analysis System Implementation
4.1 KISTI HPC Infrastructure as a Service

Korea Institute of Science and Technology Information (KISTI) is building a high-performance cloud service in order to support data-intensive research in various science and technology fields. This is because emerging data-centric sciences require a more flexible and dynamic computing environment than traditional HPC services; big data and deep learning research in particular needs customized HPC resources in a flexible manner. The KISTI cloud will therefore be a service providing customizable high-performance computing and storage resources, such as a supercomputer, GPU cluster, etc.
In the first stage, the cloud service will be implemented on a supercomputer. KISTI's newly introduced supercomputer NURION is a national strategic research infrastructure to support R&D in various fields; in particular, there is a plan to utilize it for data-intensive computing and artificial intelligence. In order to build such a service environment, we will leverage cloud computing technologies.

Fig. 2. Data Analysis with Task Scheduler and File I/O
Fig. 3. Data Analysis with In-memory DB
Fig. 4. OpenStack Cloud Testbed
In order to design the service and verify the required skills, we constructed an OpenStack-based IaaS cloud testbed on a computational cluster (see Figure 4). OpenStack [9] is a cloud deployment platform that is used as a de facto standard in industry and research, and it is well suited to cloud deployments for high-performance computing. The cluster used for the testbed has thirteen Intel Xeon-based servers: one deployment node, one controller node, three storage nodes, and compute nodes for the rest. The OpenStack services implemented here are Nova (Compute), Glance (Image), Neutron (Network), Cinder (Block Storage), Swift (Object Storage), Keystone (Identity), Heat (Orchestration), Horizon (Dashboard), Manila (Network Filesystem), and Magnum (Container). For storage, Ceph [10] was configured using three servers and used as a backend for the Glance, Cinder, Swift, and Manila services. Apart from this, we configured a Docker-based Kubernetes orchestration environment using Magnum. Kubernetes is an open source platform for automating Linux container operations [11]. In this study, it is composed of one Kubernetes master and four workers.
4.2 Implementation of the Data Analysis Platform in the Cloud

The data analysis system constructed in this study focuses on how to configure the task scheduler and the data I/O environment for task processing. We describe the architecture of the analysis system in Figure 5. First, the task scheduler should efficiently distribute and process individual tasks asynchronously. We adopted a lightweight task scheduler that can be dynamically configured and used independently, unlike the shared job schedulers, such as PBS and Slurm, in conventional HPC systems. In particular, tasks for astronomical data analysis, which require long runtimes, often call for asynchronous rather than synchronous processing. In the experiments, we used Dask [12] and Celery [13] as task schedulers; both are readily available to scientific and technological researchers and are likely to be used in a common data analysis environment. The structure of the scheduler is very simple, consisting of a scheduler and workers. We wrote Python code to submit tasks to the scheduler and manage data. The difference between Dask and Celery is that Dask allocates and monitors tasks in its own scheduler module, whereas Celery workers' tasks are assigned from a separate message queue, such as RabbitMQ.
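As a brief sketch of the Dask variant (our illustration: run_magphys is a hypothetical stand-in for one MAGPHYS fit, and the scheduler address matches the one configured in the Figure 6 template), client-side submission looks like this:

    from dask.distributed import Client

    def run_magphys(row):
        # Hypothetical stand-in for one MAGPHYS SED fit on a single catalog
        # row; the real system would invoke the fitting code on the photometry.
        galaxy_id, *photometry = row.split()
        return galaxy_id, sum(float(v) for v in photometry)

    client = Client("dask-scheduler:8786")      # scheduler started by the Heat template
    rows = ["G0001 1.2 3.4", "G0002 2.2 1.9"]   # lines extracted from the input text DB
    futures = client.map(run_magphys, rows)     # asynchronous submission to workers
    results = client.gather(futures)            # collected outputs for the new output DB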
For the data I/O environment, an initial experiment was conducted by configuring a shared file system using OpenStack Manila for the file-based I/O used in the existing analysis environment. However, file I/O is significantly slower than computation, which causes severe performance degradation when analyzing the entire dataset. In order to remove this bottleneck and improve overall performance, we used an in-memory DB tool called Redis [14]. Redis is a memory-based key-value store that is known to handle more than 100,000 transactions per second.
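To illustrate the in-memory design of Figure 3 (a sketch under assumed key names, not the authors' exact schema), a preprocessing step can load input records into Redis, and workers can pop them and write results back, bypassing the shared file system:

    import redis

    r = redis.Redis(host="redis-server", port=6379, decode_responses=True)

    # Preprocessing: push each line of the input text DB onto a Redis list.
    for line in ["G0001 1.2 3.4", "G0002 2.2 1.9"]:
        r.rpush("gama:inputs", line)

    # Worker loop: pop records, analyze, and store results keyed by galaxy id.
    while (record := r.lpop("gama:inputs")) is not None:
        galaxy_id, *photometry = record.split()
        result = sum(float(v) for v in photometry)  # stand-in for the SED fit
        r.hset("gama:results", galaxy_id, result)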
heat_template_version: queens
…
parameters:
  worker_num:
    default: 16
  …
…
resources:
  scheduler:
    type: OS::Nova::Server
    properties:
      name: dask-scheduler
      image: Ubuntu 18.04 Redis with Dask
      …
      template: |
        #!/bin/bash
        pip3 install dask distributed --upgrade
        …
        dask-scheduler &
  workers:
    type: OS::Heat::ResourceGroup
    properties:
      count: { get_param: worker_num }
      resource_def:
        type: OS::Nova::Server
        properties:
          name: dask-worker%index%
          image: Ubuntu 18.04 Redis with Dask
          …
          template: |
            #!/bin/bash
            apt-get install redis-server -y
            pip3 install dask distributed --upgrade
            …
            dask-worker dask-scheduler:8786 &
outputs:
  instance_name:
    …
  instance_ip:
    …

Fig. 6. Heat Template for Analysis Platform with Redis & Dask
Fig. 5. Data Analytics System Architecture
A combination of task scheduler and data I/O environment can be created and configured automatically in an orchestration environment through OpenStack Heat or Kubernetes in our self-contained cloud. Figure 6 shows the structure of one of the Heat templates used in this experiment. The template is structured with parameters and resources. The resources comprise the scheduler and the workers, and the required software is installed and configured on each scheduler and worker after boot-up.
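For instance, such a stack could be launched programmatically; the sketch below uses the openstacksdk cloud layer under assumed names (the file name dask_redis.yaml and the stack name are our labels, not the paper's):

    import openstack

    # Connect using credentials from clouds.yaml or OS_* environment variables.
    conn = openstack.connect()

    # Launch the analysis platform from the Heat template shown in Figure 6,
    # overriding the worker_num value declared in its parameters section.
    stack = conn.create_stack(
        "gcc19-analysis",
        template_file="dask_redis.yaml",  # hypothetical file holding the Fig. 6 template
        wait=True,                        # block until the stack reaches CREATE_COMPLETE
        worker_num=16,                    # forwarded to Heat as a stack parameter
    )
    print(stack.id)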
5 Conclusion and Future Work

Through experiments, we successfully analyzed the brightness and color data of about 5,300 galaxies in a parallel distributed processing environment consisting of Dask or Celery with Redis. Figure 7 shows one example galaxy from the GAMA data with the result of the MAGPHYS analysis in the cloud. With the OpenStack-based cloud, we confirmed that the research environment, especially a data analysis system with tools such as a task scheduler and an in-memory DB, can be automatically configured and put to good use. In addition, we confirmed the availability of an elastic service environment through the cloud to meet volatile demand for large-scale data analysis.
Fig. 7. An example result of the MAGPHYS analysis on the cloud
In this study, we identified some useful aspects of the cloud for data-driven research. First, we confirmed that it is easy to build an independent execution environment that provides the necessary software stack for research through the cloud. Also, in a cloud environment, researchers can easily reuse the same research environment and share research experience by reusing virtual machines or container images deployed by the research community.

In the next step, we will configure an environment for real-time processing of in-memory cache data. For practical real-time data processing, it is necessary to construct an optimal environment for data I/O as well as memory-based stream data processing, and various experiments need to be performed on the cloud. Based on the experience of building an astronomical big data processing environment in this study, we will provide a more flexible, high-performance cloud service and let researchers utilize it in various fields of data-centric research.
6 References

[1] T. Hey, S. Tansley and K. Tolle, The Fourth Paradigm: Data-intensive Scientific Discovery, Microsoft Research, 2009.
[2] LSST Corporation. About LSST: Data Management. [Online]. Available from: https://www.lsst.org/about/dm/ 2019.03.10
[3] P. Diamond, SKA Community Briefing. [Online]. Available from: https://www.skatelescope.org/ska-community-briefing-18jan2017/ 2019.03.10
[4] P. Hirst and R. Cardenes, "The new Gemini Observatory archive: a fast and low cost observatory data archive running in the cloud", Proc. SPIE 9913, Software and Cyberinfrastructure for Astronomy IV, 99131E (8 August 2016); doi: 10.1117/12.2231833
[5] B. F. Williams, K. Olsen, R. Khan, D. Pirone and K. Rosema, "Reducing and analyzing the PHAT survey with the cloud", The Astrophysical Journal Supplement Series, Volume 236, Number 1
[6] C. Cui et al., "AstroCloud: a distributed cloud computing and application platform for astronomy", Proc. WCSN2016
[7] J. Hahm et al., "Astronomical time series data analysis leveraging science cloud", Proc. Embedded and Multimedia Computing Technology and Service, pp. 493-500, 2012
[8] S. P. Driver et al., "Galaxy And Mass Assembly (GAMA): Panchromatic Data Release (far-UV-far-IR) and the low-z energy budget", MNRAS 455, 3911-3942, 2016.
[9] OpenStack Foundation. OpenStack Overview. [Online]. Available from: https://www.openstack.org/software/ 2019.03.10
[10] Red Hat Inc. Ceph Introduction. [Online]. Available from: https://ceph.com/ceph-storage/ 2019.03.10
[11] The Kubernetes Authors. What is Kubernetes?. [Online]. Available from: https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/ 2019.03.10
[12] Dask Core Developers. Why Dask?. [Online]. Available from: https://docs.dask.org/en/latest/why.html 2019.03.10
[13] A. Solem. Celery - Distributed Task Queue. [Online]. Available from: http://docs.celeryproject.org/en/latest/index.html 2019.03.10
[14] S. Sanfilippo. Introduction to Redis. [Online]. Available from: https://redis.io/topics/introduction 2019.03.10
SESSION: HIGH-PERFORMANCE COMPUTING - HADOOP FRAMEWORK
Chair(s): TBA
A Speculation and Prefetching Model for Efficient
Computation of MapReduce Tasks on Hadoop HDFS
System
Lan Yang
Computer Science Department
California State Polytechnic University, Pomona
Pomona, CA 91768, USA
Abstract - The MapReduce programming model and the Hadoop software framework are keys to big data processing on high-performance computing (HPC) clusters. The Hadoop Distributed File System (HDFS) is designed to stream large data sets at high bandwidth. However, Hadoop suffers from a set of drawbacks, particularly issues with small files as well as dynamic datasets. In this research we target big data applications working with many on-demand datasets of varying sizes. We propose a speculation model that prefetches anticipated datasets for upcoming tasks in support of efficient big data processing on HPC clusters.

Keywords: Prefetching, Speculation, Hadoop, MapReduce, High-performance computing cluster.
1 Introduction

Along with the emerging technology of cloud computing, Google proposed the MapReduce programming model [1], which allows for massive scalability of unstructured data across hundreds or thousands of high-performance computing nodes. Hadoop is an open source software framework that performs distributed processing of huge data sets across clusters of commodity servers [2]. Now distributed as Apache Hadoop [3], it underlies the big data solutions of many cloud services such as AWS, Cloudera, HortonWorks, and IBM InfoSphere Insights. The Hadoop Distributed File System (HDFS) [2], inspired by the Google File System (GFS) [4], is Hadoop's reliable filesystem, designed for storing very large files on a cluster of commodity hardware. To process big data in Apache Hadoop, the client submits data and a program to Hadoop; HDFS stores the data while MapReduce processes it.

While Hadoop is a powerful tool for processing massive data, it suffers from a set of drawbacks, including issues with small files, no real-time data processing, and support for batch processing only [5]. Apache Spark [6] partially solved Hadoop's real-time and batch processing problems by introducing in-memory processing [7]. As a component of the Hadoop ecosystem, Spark does not have its own distributed filesystem, though it can use HDFS. Hadoop does not suit small data because HDFS, with its high-capacity design, lacks the ability to efficiently support random reading of small files. Small files are a major problem in HDFS.

In this research, we study a special type of iterative MapReduce task working on HDFS with input datasets coming dynamically, i.e., on demand, from many small files. We propose a data prefetching speculation model aimed at improving the performance and flexibility of big data processing on Hadoop HDFS for that special type of MapReduce task.
2 Background

2.1 Description of a special type of MapReduce tasks

In today's big data world, the MapReduce programming model and the Hadoop software framework remain popular tools for big data processing. Based on a number of big data applications run on Hadoop, we observed the following:

(1) An HDFS file splits into chunks, typically 64-128 MB in size. To benefit from Hadoop's parallel processing ability, an HDFS file must be large enough to be divided into multiple chunks. Therefore, a file is considered small if it is significantly smaller than the HDFS chunk size.

(2) While many big data applications use large data files that can be pushed to the HDFS input directory prior to task execution, some applications use many small datasets distributed across a wide range.
(3) With the increasing demand for big data processing, more and more applications now require multiple rounds (or iterations) of processing, with each round requiring new datasets determined by the outcome of the previous computation. For example, in a data processing application for a legal system, the first round of MapReduce computation uses prestored case documents, while the second round might require access to certain assets or utilities datasets based on the case outcomes resulting from the first-round analysis. The assets or utilities datasets could consist of hundreds to thousands of files ranging from 1 KB to 10 MB, with only dozens of files relevant depending on the outcome of the first round. It would be very inefficient and inflexible if we had to divide these two rounds into separate client requests. Also, if we could overlap computation and data access time by speculating and prefetching data, we could reduce the overall processing time significantly. Here we refer to big data applications with one or more of the above characteristics (i.e., requiring iterative or multiple passes of MapReduce computation, using many small files to form an HDFS chunk, and dynamic datasets that depend on the outcome of previous rounds of computation) as a special type of MapReduce tasks.
2.2 Observation: execution time and HDFS chunks

We conducted several dozen big data application runs using Hadoop on a high-performance computing cluster. Table 1 summarizes the MapReduce performance of three relatively large big data analytics tasks.

[Table 1: Performance data for some big data applications (*requires multi-phase analysis); table body not recoverable]
2.3 Computation time vs. data fetch time

In this research, we first tested and analyzed data access times for sizes ranging from 1 KB to 16 MB on an HPC cluster consisting of 2 DL360 management nodes, 20 DL160 compute nodes, 3.3 TB RAM, 40 Gbit InfiniBand, and a 10 Gbit external Ethernet connection, with overall system throughput of 36.6 Tflops in double precision mode and 149.6 Tflops. The Slurm job scheduler [8] is the primary software we use for our testing. The performance data shown in Figure 1 serve as the basis for deriving the performance of our speculation algorithms.
Figure 1: Data Access Performance Base
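A baseline of this kind can be collected with a simple harness; the sketch below (our illustration with hypothetical file paths, not the authors' test code) times reads from 1 KB up to 16 MB:

    import os
    import time

    def time_read(path):
        """Wall-clock seconds to read a file once from start to finish."""
        start = time.perf_counter()
        with open(path, "rb") as f:
            f.read()
        return time.perf_counter() - start

    # Hypothetical test files from 1 KB (2^10) to 16 MB (2^24), doubling each step.
    for size in (2 ** k for k in range(10, 25)):
        path = f"/tmp/testfile_{size}"
        with open(path, "wb") as f:
            f.write(os.urandom(size))
        print(f"{size:>10} bytes: {time_read(path):.6f} s")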
3 Speculation and Prefetching Models

3.1 Speculation model

We establish a connection graph (CG) to represent the relations of commonly used tasks, with tasks as nodes and edges as links between tasks. For example, a birthday party planning task is linked to restaurant reservation tasks as well as entertainment or recreation tasks, and an address change task is linked with moving or furniture shopping tasks. The links in the CG are prioritized; for example, for the birthday task, the restaurant task is initially set with higher priority than the movie ticketing task. Priorities are in the 0.0 to 1.0 range and are dynamically updated based on the outcomes of our predictions. For example, based on the connections in the CG and the priorities of the links, we predict that the top two tasks following the birthday task are, in order, the restaurant task and the movie task. If for a particular application the movie task turns out to be the correct choice, we increase its priority by a small fraction, say 0.1, capped at a 1.0 maximum.
3.2 Prefetching algorithm

Our prefetching concept is inspired by compiler-directed instruction/data prefetching techniques that speculate and prefetch instructions for multiprocessing [9][10]. Our basic strategy is: overlapping with the computation of the current task, we prefetch the associated datasets for the next round of computation based on the task speculation.

The association between tasks and data files can be represented as a many-to-many relation. Each task is pre-associated with a list of files in the order of established ranks. For example, the restaurant task could be associated with pizza delivery files, restaurant location files, etc. The ranks are initialized based on the popularity of the service, with values in the 0.0 to 1.0 range and higher values for the most popular or most recommended services. The ranks are then adjusted based on the network distance of the file locations, with priority given to local or nearby files. Again, after task execution, if a prefetched file turned out to be irrelevant (i.e., the whole file was filtered out at an early MapReduce stage), the rank of that file with regard to that task is reduced.

Based on the system configuration, we also preset two constant values K and S, with K as the optimized/recommended number of containers and S as the size of each container (we suggest S be the HDFS chunk size and K the desired number of chunks with regard to the requested compute nodes). When prefetching datasets for a set of speculated tasks, the prefetching process repeatedly reads files until it fills up all the containers.
4 Simulation Design

We used a Python dictionary to implement the connection graph CG, with each distinct task name as a key. The value for a key is a list of task links sorted in descending order of priority. The task-data relations are also represented as a Python dictionary, with task names as keys and lists of data file names sorted in descending order of rank as values. Currently we simulate the effectiveness of prefetching using parallel processes created by Slurm directly. Once the approaches are validated, we will test them on Hadoop/HDFS.
4.1 Speculation Model

For any current task t, the simulated speculation model always fetches the top candidate task from the CG dictionary, i.e., CG[t][0], as p and starts the prefetching process. When t completes, it chooses the next task t'. If t' is the same as p, let t be p and the process continues. If t' is different from p, we restart the prefetching process, reduce the priority of p by one level (currently 0.1) but not below 0.0, and increase the priority of t' by 0.1 (capped at 1.0) if it is already in t's connection link, or add it to t's connection link with a randomly assigned priority (between 0.1 and 0.5) if it is not in t's connection link yet.
4.2 Prefetching Model

(1) Configuration: one file node N (i.e., a process that only reads data in and writes it to a certain shared location) and four shared storages (arrays or dictionaries) representing the containers, C1 to C4. Initially all Ci are empty, and each container has a current capacity and a maximum capacity (all containers may have the same maximum capacity). This is easily extendable to multiple file nodes and a larger number of containers.

(2) Assume the task p selected by the speculation scheme is associated with n small files, say F1, ..., Fn. Files are read in the order F1, ..., Fn. For each file read in, we record its size as sj, then search for a container whose current capacity + sj <= maximum capacity, lock it once found, and push the content in. If no available container is found, the file content is set aside and we increase the failure count by 1 (the failure count is initially 0). We continue to fetch the next file until we reach the condition spelled out in (3).

(3) The prefetching process ends when all containers reach a certain percentage full (e.g., at least 80% full) or when the failure count reaches a certain number (say 3). Note: one failure does not mean the containers are full; it could be that we fetched a very large dataset that could not fit into any of the current containers. In that case we may still fetch the files next in the list, as these might be smaller; a code sketch of this container-filling loop follows.
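The sketch below is our reconstruction of that loop (names and the 80%/3-failure thresholds follow the description above; capacity corresponds to S and n_containers to the C1..C4 configuration):

    def prefetch(ranked_files, capacity, n_containers=4,
                 full_fraction=0.8, max_failures=3):
        """First-fit packing of prefetched files into fixed-capacity containers.

        ranked_files -- list of (file_name, size_in_bytes) for the speculated
                        task, in descending rank order (F1, ..., Fn)
        """
        used = [0] * n_containers                  # current fill of C1..C4
        placed = {i: [] for i in range(n_containers)}
        failures = 0
        for name, size in ranked_files:
            if all(u >= full_fraction * capacity for u in used):
                break                              # all containers full enough
            if failures >= max_failures:
                break                              # too many files that fit nowhere
            for i in range(n_containers):          # first-fit container search
                if used[i] + size <= capacity:
                    used[i] += size                # real system: lock, then copy content
                    placed[i].append(name)
                    break
            else:
                failures += 1                      # set aside; later files may be smaller
        return placed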
5 Conclusions

In this research work, we studied the possibility of congregating small datasets dynamically to form large data chunks suitable for MapReduce tasks on Hadoop HDFS. We proposed task speculation and file prefetching models to speed up overall processing. We have set up a primitive simulation test suite to assess
the feasibility of the speculation and prefetching models. Since we are currently designing the schemes in a Slurm multiprocess environment without using HDFS, no performance gain could be measured yet. Our future (and ongoing) work is to port the designed schemes from HPC Slurm processes onto the Hadoop HDFS system and measure their effectiveness using real-world big data applications.
6 References

[1] Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, Google Research, https://research.google.com/archive/mapreduce-osdi04.pdf
[2] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler, The Hadoop Distributed File System, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST)
[3] Apache Hadoop, https://hadoop.apache.org/
[4] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, The Google File System, https://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf
[5] DATAFLAIR Team, 13 Big Limitations of Hadoop & Solution To Hadoop Drawbacks, https://data-flair.training/blogs/13-limitations-of-hadoop/, March 7, 2019.
[6] Apache Spark, https://spark.apache.org/
[7] Matei Zaharia, Mosharaf Chowdhury, Michael Franklin, Scott Shenker, Ion Stoica, Spark: Cluster Computing with Working Sets, Proceedings of the 2nd USENIX conference on Hot topics in cloud computing, 2010.
[8] Slurm job scheduler, https://slurm.schedmd.com/
[9] Seung Woo Son, Mahmut Kandemir, Mustafa Karakoy, Dhruva Chakrabarti, A compiler-directed data prefetching scheme for chip multiprocessors, Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP '09)
[10] Ricardo Bianchini, Beng-Hong Lim, Evaluating the Performance of Multithreading and Prefetching in Multiprocessors, https://doi.org/10.1006/jpdc.1996.0109
efforts and donations can help, see Sections 3 and 4 and the
Foundation information page at www.gutenberg.org.
Section 3. Information about the Project
Gutenberg Literary Archive Foundation
The Project Gutenberg Literary Archive Foundation is a non-
profit 501(c)(3) educational corporation organized under the
laws of the state of Mississippi and granted tax exempt status
by the Internal Revenue Service. The Foundation’s EIN or
federal tax identification number is 64-6221541. Contributions
to the Project Gutenberg Literary Archive Foundation are tax
deductible to the full extent permitted by U.S. federal laws and
your state’s laws.
The Foundation’s business office is located at 809 North 1500
West, Salt Lake City, UT 84116, (801) 596-1887. Email contact
links and up to date contact information can be found at the
Foundation’s website and official page at
www.gutenberg.org/contact
Section 4. Information about Donations to
the Project Gutenberg Literary Archive
Foundation
Project Gutenberg™ depends upon and cannot survive without
widespread public support and donations to carry out its mission
of increasing the number of public domain and licensed works
that can be freely distributed in machine-readable form
accessible by the widest array of equipment including outdated
equipment. Many small donations ($1 to $5,000) are particularly
important to maintaining tax exempt status with the IRS.
The Foundation is committed to complying with the laws
regulating charities and charitable donations in all 50 states of
the United States. Compliance requirements are not uniform
and it takes a considerable effort, much paperwork and many
fees to meet and keep up with these requirements. We do not
solicit donations in locations where we have not received written
confirmation of compliance. To SEND DONATIONS or determine
the status of compliance for any particular state visit
www.gutenberg.org/donate.
While we cannot and do not solicit contributions from states
where we have not met the solicitation requirements, we know
of no prohibition against accepting unsolicited donations from
donors in such states who approach us with offers to donate.
International donations are gratefully accepted, but we cannot
make any statements concerning tax treatment of donations
received from outside the United States. U.S. laws alone swamp
our small staff.
Please check the Project Gutenberg web pages for current
donation methods and addresses. Donations are accepted in a
number of other ways including checks, online payments and
credit card donations. To donate, please visit:
www.gutenberg.org/donate.
Section 5. General Information About
Project Gutenberg™ electronic works
Professor Michael S. Hart was the originator of the Project
Gutenberg™ concept of a library of electronic works that could
be freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg™ eBooks with only a loose
network of volunteer support.
Project Gutenberg™ eBooks are often created from several
printed editions, all of which are confirmed as not protected by
copyright in the U.S. unless a copyright notice is included. Thus,
we do not necessarily keep eBooks in compliance with any
particular paper edition.
Most people start at our website which has the main PG search
facility: www.gutenberg.org.
This website includes information about Project Gutenberg™,
including how to make donations to the Project Gutenberg
Literary Archive Foundation, how to help produce our new
eBooks, and how to subscribe to our email newsletter to hear
about new eBooks.
Printed in the United States of America
https://americancse.org/events/csce2019/proceedings

This volume contains papers presented at the 2019 International Conference on Grid, Cloud, & Cluster Computing. Their inclusion in this publication does not necessarily constitute an endorsement by the editors or by the publisher.

Copyright and Reprint Permission
Copying without a fee is permitted provided that the copies are not made or distributed for direct commercial advantage, and credit to the source is given. Abstracting is permitted with credit to the source. Please contact the publisher for other copying, reprint, or republication permission.
Foreword

It gives us great pleasure to introduce this collection of papers to be presented at the 2019 International Conference on Grid, Cloud, and Cluster Computing (GCC'19), July 29 – August 1, 2019, at Luxor Hotel (a property of MGM Resorts International), Las Vegas, USA.

The preliminary edition of this book (available in July 2019 for distribution on site at the conference) includes only a small subset of the accepted research articles. The final edition (available in August 2019) will include all accepted research articles. This is due to deadline-extension requests received from most authors, who wished to continue enhancing the write-up of their papers (by incorporating the referees' suggestions). The final edition of the proceedings will be made available at https://americancse.org/events/csce2019/proceedings .

An important mission of the World Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE (the federated congress with which this conference is affiliated), includes "Providing a unique platform for a diverse community of constituents composed of scholars, researchers, developers, educators, and practitioners. The Congress makes concerted effort to reach out to participants affiliated with diverse entities (such as: universities, institutions, corporations, government agencies, and research centers/labs) from all over the world. The congress also attempts to connect participants from institutions that have teaching as their main mission with those who are affiliated with institutions that have research as their main mission. The congress uses a quota system to achieve its institution and geography diversity objectives." By any definition of diversity, this congress is among the most diverse scientific meetings in the USA. We are proud to report that this federated congress had authors and participants from 67 different nations, representing a variety of personal and scientific experiences that arise from differences in culture and values. As can be seen below, the program committee of this conference, as well as the program committees of all other tracks of the federated congress, are as diverse as its authors and participants.

The program committee would like to thank all those who submitted papers for consideration. About 70% of the submissions were from outside the United States. Each submitted paper was peer-reviewed by two experts in the field for originality, significance, clarity, impact, and soundness. In cases of contradictory recommendations, a member of the conference program committee was charged to make the final decision; often, this involved seeking help from additional referees. In addition, papers whose authors included a member of the conference program committee were evaluated using the double-blind review process. One exception to the above evaluation process was for papers that were submitted directly to chairs/organizers of pre-approved sessions/workshops; in these cases, the chairs/organizers were responsible for the evaluation of such submissions. The overall acceptance rate for regular papers was 18%; 20% of the remaining papers were accepted as poster papers (at the time of this writing, we had not yet received the acceptance rates for a couple of individual tracks).

We are very grateful to the many colleagues who offered their services in organizing the conference.
In particular, we would like to thank the members of the Program Committee of GCC'19, the members of the congress Steering Committee, and the members of the committees of the federated congress tracks that have topics within the scope of GCC. Many of the individuals listed below will be asked after the conference to provide their expertise and services in selecting papers for publication (extended versions) in journal special issues, as well as for publication in a set of research books (to be prepared for publishers including Springer, Elsevier, BMC journals, and others).

• Prof. Emeritus Nizar Al-Holou (Congress Steering Committee); Professor and Chair, Electrical and Computer Engineering Department; Vice Chair, IEEE/SEM-Computer Chapter; University of Detroit Mercy, Detroit, Michigan, USA
• Prof. Hamid R. Arabnia (Congress Steering Committee); Graduate Program Director (PhD, MS, MAMS); The University of Georgia, USA; Editor-in-Chief, Journal of Supercomputing (Springer); Editor-in-Chief, Transactions of Computational Science & Computational Intelligence (Springer); Fellow, Center of Excellence in Terrorism, Resilience, Intelligence & Organized Crime Research (CENTRIC)
• Prof. Dr. Juan-Vicente Capella-Hernandez; Universitat Politecnica de Valencia (UPV), Department of Computer Engineering (DISCA), Valencia, Spain
• Prof. Emeritus Kevin Daimi (Congress Steering Committee); Director, Computer Science and Software Engineering Programs, Department of Mathematics, Computer Science and Software Engineering, University of Detroit Mercy, Detroit, Michigan, USA
• Prof. Leonidas Deligiannidis (Congress Steering Committee); Department of Computer Information Systems, Wentworth Institute of Technology, Boston, Massachusetts, USA; Visiting Professor, MIT, USA
• Prof. Mary Mehrnoosh Eshaghian-Wilner (Congress Steering Committee); Professor of Engineering Practice, University of Southern California, California, USA; Adjunct Professor, Electrical Engineering, University of California, Los Angeles (UCLA), California, USA
• Prof. Louie Lolong Lacatan; Chairperson, Computer Engineering Department, College of Engineering, Adamson University, Manila, Philippines; Senior Member, International Association of Computer Science and Information Technology (IACSIT), Singapore; Member, International Association of Online Engineering (IAOE), Austria
• Prof. Hyo Jong Lee; Director, Center for Advanced Image and Information Technology, Division of Computer Science and Engineering, Chonbuk National University, South Korea
• Dr. Ali Mostafaeipour; Industrial Engineering Department, Yazd University, Yazd, Iran
• Dr. Houssem Eddine Nouri; Informatics Applied in Management, Institut Superieur de Gestion de Tunis, University of Tunis, Tunisia
• Prof. Dr., Eng. Robert Ehimen Okonigene (Congress Steering Committee); Department of Electrical & Electronics Engineering, Faculty of Engineering and Technology, Ambrose Alli University, Edo State, Nigeria
• Ashu M. G. Solo (Publicity); Fellow of British Computer Society; Principal/R&D Engineer, Maverick Technologies America Inc.
• Prof. Fernando G. Tinetti (Congress Steering Committee); School of Computer Science, Universidad Nacional de La Plata, La Plata, Argentina; also at Comision Investigaciones Cientificas de la Prov. de Bs. As., Argentina
• Prof. Layne T. Watson (Congress Steering Committee); Fellow of IEEE; Fellow of The National Institute of Aerospace; Professor of Computer Science, Mathematics, and Aerospace and Ocean Engineering, Virginia Polytechnic Institute & State University, Blacksburg, Virginia, USA
• Prof. Jane You (Congress Steering Committee); Associate Head, Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
• Dr. Farhana H. Zulkernine; Coordinator of the Cognitive Science Program, School of Computing, Queen's University, Kingston, ON, Canada

We would like to extend our appreciation to the referees and the members of the program committees of individual sessions, tracks, and workshops; their names do not appear in this document but are listed on the web sites of the individual tracks. As sponsors-at-large, partners, and/or organizers, each of the following (separated by semicolons) provided help for at least one track of the Congress: Computer Science Research, Education, and Applications Press (CSREA); US Chapter of World Academy of Science; American Council on Science & Education & Federated Research Council (http://www.americancse.org/).
In addition, a number of university faculty members and their staff (names appear on the cover of the set of proceedings), several publishers of computer science and computer engineering books and journals, chapters and/or task forces of computer science associations/organizations from 3 regions, and developers of high-performance machines and systems provided significant help in organizing the conference as well as providing some resources. We are grateful to them all.

We express our gratitude to the keynote, invited, individual conference/track, and tutorial speakers; the list of speakers appears on the conference web site. We would also like to thank the following: UCMSS (Universal Conference Management Systems & Support, California, USA) for managing all aspects of the
conference; Dr. Tim Field of APC for coordinating and managing the printing of the proceedings; and the staff of Luxor Hotel (Convention department) at Las Vegas for the professional service they provided. Last but not least, we would like to thank the Co-Editors of GCC'19: Prof. Hamid R. Arabnia, Prof. Leonidas Deligiannidis, and Prof. Fernando G. Tinetti.

We present the proceedings of GCC'19.

Steering Committee, 2019
http://americancse.org/
Contents

SESSION: HIGH-PERFORMANCE COMPUTING - CLOUD COMPUTING
The Design and Implementation of Astronomical Data Analysis System on HPC Cloud ......... 3
Jaegyoon Hahm, Ju-Won Park, Hyeyoung Cho, Min-Su Shin, Chang Hee Ree

SESSION: HIGH-PERFORMANCE COMPUTING - HADOOP FRAMEWORK
A Speculation and Prefetching Model for Efficient Computation of MapReduce Tasks on Hadoop HDFS System ......... 9
Lan Yang

SESSION: LATE BREAKING PAPER: CLOUD MIGRATION
Critical Risk Management Practices to Mitigate Cloud Migration Misconfigurations ......... 15
Michael Atadika, Karen Burke, Neil Rowe
SESSION: HIGH-PERFORMANCE COMPUTING - CLOUD COMPUTING
Chair(s): TBA
The Design and Implementation of Astronomical Data Analysis System on HPC Cloud

Jaegyoon Hahm (1), Ju-Won Park (1), Hyeyoung Cho (1), Min-Su Shin (2), and Chang Hee Ree (2)
(1) Supercomputing Infrastructure Center, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
(2) Galaxy Evolution Research Group, Korea Astronomy and Space Science Institute, Daejeon, Republic of Korea

Abstract - Astronomy is a representative data-intensive science that can take advantage of cloud computing because it requires flexible infrastructure services for variable workloads and various data analysis tools. The purpose of this study is to show the usefulness of cloud computing as a research environment for analyzing large-scale scientific data, as in astronomy. We implemented an OpenStack cloud and a Kubernetes-based orchestration service for scientific data analysis. On the cloud, we successfully constructed data analysis systems with a task scheduler and an in-memory database tool to support the task processing and data I/O environment required in astronomical research. Furthermore, we aim to construct a high-performance cloud service for data-intensive research in more scientific fields.

Keywords: cloud computing, astronomical data analysis, data analysis platform, openstack, kubernetes

1 Introduction

Recently, in the field of science and technology, more and more data is generated through advanced data-capturing sources [1], and researchers increasingly use cutting-edge data analysis techniques such as big data analysis and machine learning. Astronomy is a typical field that collects and analyzes large amounts of data through various observation tools, such as astronomical telescopes, and its data growth rate will increase rapidly in the near future. As a notable example, the Large Synoptic Survey Telescope (LSST) will start to produce large volumes of data, up to 20TB per day, from observing a large area of the sky in full operations from 2023. The total database for ten years is expected to be 60PB for the raw data and 15PB for the catalog database [2]. As another big data project, the Square Kilometer Array (SKA), to be constructed as the world's largest radio telescope by 2024, is projected to generate and archive 130-300PB per year [3].

In this era of data deluge, there is a growing demand for utilizing cloud computing for data-intensive sciences. In particular, astronomical research demands the ability to acquire resources for simulation-driven numerical experiments or mass data analysis in an immediate and dynamic way. Therefore, the type of cloud service expected by astronomical researchers will be an Infrastructure as a Service (IaaS) providing flexible resources for running existing software and research methodologies, and a Platform as a Service (PaaS) on which new data analytic tools can be applied.

In this paper, we propose a methodology for, and demonstrate the feasibility of, cloud computing that focuses on flexible use of resources and on the problems astronomical researchers face when using cloud services. Section 2 introduces related research, and Section 3 describes the features and requirements of the target application. In Section 4 we describe the implementation of the data analysis system for the target application. Finally, in Section 5 we provide conclusions and future plans.

2 Related Works

There have been several examples of cloud applications for astronomical research.
The Gemini Observatory has been building a new archive using EC2, EBS, S3 and Glacier from the Amazon Web Services (AWS) cloud to replace the existing Gemini Science Archive (GSA) [4]. In addition, Williams et al. (2018) conducted studies to reduce the Panchromatic Hubble Andromeda Treasury (PHAT) photometric data set using Amazon EC2 [5]. Unlike these public-cloud cases, there are also studies that build private cloud environments to perform astronomical research. AstroCloud [6] is a distributed cloud platform which integrates many data management and processing tasks for the Chinese Virtual Observatory (China-VO). In addition, Hahm et al. (2012) developed a platform for constructing virtual-machine-based Condor clusters for analyzing astronomical time-series data in a private cloud [7]. The purpose of that study was to confirm the possibility of constructing a cluster-type analysis platform to perform mass astronomical data analysis in a cloud environment.
3 Application Requirements and Design

The application used in this study is the MAGPHYS SED fitting code, which reads and analyzes data on the brightness and color of galaxies to estimate their physical properties. The data used is the large-scale survey data of Galaxy And Mass Assembly (GAMA), a project to exploit the latest generation of ground-based and space-borne survey facilities to study cosmology and galaxy formation and evolution [8]. On a single processor, MAGPHYS typically takes 10 minutes to run for a single galaxy.

As shown in Figure 1, data analysis in the application starts from data obtained by preprocessing the original image data collected from the telescope. The preprocessed data is a text-file DB, which is the input data for the analysis. The application extracts the data one line at a time from the input file, submits it to the task queue together with the spectral analysis code, and creates a new DB by storing the analyzed results in the output file. In a traditional research environment, the analysis would be done by building a dedicated cluster for data analysis, or through a job scheduler on a shared cluster.

[Fig. 1. Data Analysis Workflow]

The main technical challenge of the application is to achieve faster data I/O and to use its own task queue for convenience. The GAMA dataset contains information on approximately 197,000 galaxies. At this dataset size, however, file-based I/O is too slow and hard to manage. Therefore, fast data I/O tools and a job scheduler for high-throughput batch processing are required. To satisfy these requirements, we designed two types of lightweight data analysis systems. First, data is read through file I/O as usual, and the data processing environment is configured using an asynchronous task scheduler for the analysis work (see Figure 2). In this case, we need a shared file system that can be accessed by a large number of workers performing analysis tasks. Second, as shown in Figure 3, data input is performed through an in-memory DB instead of file reading for faster I/O, and the output of the analysis is also stored in the in-memory DB.

[Fig. 2. Data Analysis with Task Scheduler and File I/O]
[Fig. 3. Data Analysis with In-memory DB]
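To make the first design concrete, the following is a minimal sketch (our illustration, not the authors' code) of the file-based workflow of Figure 2: galaxy records are read from the preprocessed text-file DB, submitted to an asynchronous worker pool, and the fitted results are written to a new output DB. The run_magphys() wrapper and the input.db/output.db file names are hypothetical placeholders; Section 4 below describes the actual schedulers (DASK and CELERY) that play the role of the worker pool here.

from concurrent.futures import ProcessPoolExecutor, as_completed

def run_magphys(record):
    # Placeholder: invoke the MAGPHYS SED-fitting code on one galaxy
    # record and return the estimated physical properties as a text line.
    return record

with open("input.db") as fin, open("output.db", "w") as fout:
    with ProcessPoolExecutor() as pool:
        # One task per input line (one galaxy per line in the text-file DB).
        futures = [pool.submit(run_magphys, line.rstrip()) for line in fin]
        for done in as_completed(futures):    # results arrive asynchronously
            fout.write(done.result() + "\n")  # build the new output DB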
4 Cloud Data Analysis System Implementation

4.1 KISTI HPC Infrastructure as a Service

Korea Institute of Science and Technology Information (KISTI) is building a high-performance cloud service in order to support data-intensive research in various science and technology fields, because emerging data-centric sciences require a more flexible and dynamic computing environment than traditional HPC services. Big data and deep learning research in particular needs customized HPC resources in a flexible manner. The KISTI cloud will therefore be a service providing customizable high-performance computing and storage resources, such as a supercomputer, GPU cluster, etc. In the first stage, the cloud service will be implemented on a supercomputer. KISTI's newly introduced supercomputer NURION is a national strategic research infrastructure to support R&D in various fields; in particular, there is a plan to utilize it for data-intensive computing and artificial intelligence. In order to build such a service environment, we will leverage cloud computing technologies.

In order to design the service and verify the required skills, we constructed an OPENSTACK-based IaaS cloud testbed on a computational cluster (see Figure 4). OPENSTACK [9] is a cloud deployment platform that is used as a de facto standard in industry and research, and is well suited to cloud deployments for high-performance computing too. The cluster used for the testbed has thirteen Intel Xeon-based servers: one deployment node, one controller node, three storage nodes, and the remaining eight as compute nodes. The OPENSTACK services implemented here are NOVA (Compute), GLANCE (Image), NEUTRON (Network), CINDER (Block Storage), SWIFT (Object Storage), KEYSTONE (Identity), HEAT (Orchestration), HORIZON (Dashboard), MANILA (Network Filesystem) and MAGNUM (Container). For storage, CEPH [10] was configured using three servers and used as a backend for the GLANCE, CINDER, SWIFT, and MANILA services. Apart from this, we configured a Docker-based KUBERNETES orchestration environment using MAGNUM. KUBERNETES is an open source platform for automating Linux container operations [11]. In this study, it is composed of one KUBERNETES master and four workers.

[Fig. 4. OpenStack Cloud Testbed]

4.2 Implementation of Data Analysis Platform in the Cloud

The data analysis system constructed in this study focuses on how to configure the task scheduler and the data I/O environment for task processing. The architecture of the analysis system is shown in Figure 5.

First, the task scheduler should efficiently distribute and process individual tasks asynchronously. We adopted a lightweight task scheduler that can be dynamically configured and used independently, unlike shared job schedulers such as PBS and SLURM in conventional HPC systems. In particular, tasks for astronomical data analysis, which require long run times, often call for asynchronous rather than synchronous processing. In the experiments, we used DASK [12] and CELERY [13] as task schedulers; both are readily available to scientific researchers and are likely to be used in a common data analysis environment. The structure of the scheduler is very simple, consisting of a scheduler and workers. We wrote Python code to submit tasks to the scheduler and manage data. The difference between DASK and CELERY is that DASK allocates and monitors tasks in its own scheduler module, whereas CELERY workers' tasks are assigned from a separate message queue, such as RABBITMQ.

In the data I/O environment configuration, the initial experiment was conducted by configuring a shared file system using OPENSTACK MANILA for the file-based I/O processing used in the existing analysis environment. However, file I/O is significantly slower than computation, which causes severe performance degradation when analyzing the entire dataset. In order to remove this bottleneck and improve overall performance, we used an in-memory DB tool called REDIS [14]. REDIS is a memory-based key-value store that is known to handle more than 100,000 transactions per second.
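As an illustration of how these pieces fit together, the sketch below combines DASK and REDIS in the in-memory-DB design of Figure 3. It assumes a dask-scheduler reachable at dask-scheduler:8786 and a redis-server at redis-host:6379 (as provisioned by the HEAT template in Figure 6), with the redis Python module installed on the workers; the gama:in/gama:out key naming and the fit_galaxy() body are our own placeholders, not the authors' code.

import redis
from dask.distributed import Client

def fit_galaxy(gal_id):
    # Each worker task reads its input row from REDIS and writes the
    # result back, avoiding the slow shared-file-system I/O path.
    r = redis.Redis(host="redis-host", port=6379)
    record = r.get(f"gama:in:{gal_id}")       # input seeded by a loader step
    if record is not None:
        result = record                       # placeholder for the MAGPHYS fit
        r.set(f"gama:out:{gal_id}", result)
    return gal_id

client = Client("dask-scheduler:8786")        # connect to the DASK scheduler
futures = client.map(fit_galaxy, range(197000))  # one task per GAMA galaxy
client.gather(futures)                        # block until all fits finish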
[Fig. 5. Data Analytics System Architecture]

heat_template_version: queens
…
parameters:
  worker_num:
    default: 16
    …
…
resources:
  scheduler:
    type: OS::Nova::Server
    properties:
      name: dask-scheduler
      image: Ubuntu 18.04 Redis with Dask
      …
      template: |
        #!/bin/bash
        pip3 install dask distributed --upgrade
        …
        dask-scheduler &
  workers:
    type: OS::Heat::ResourceGroup
    properties:
      count: { get_param: worker_num }
      resource_def:
        type: OS::Nova::Server
        properties:
          name: dask-worker%index%
          image: Ubuntu 18.04 Redis with Dask
          …
          template: |
            #!/bin/bash
            apt-get install redis-server -y
            pip3 install dask distributed --upgrade
            …
            dask-worker dask-scheduler:8786 &
outputs:
  instance_name: …
  instance_ip: …

[Fig. 6. HEAT Template for Analysis Platform with REDIS & DASK]
A combination of task scheduler and data I/O environment can be created and configured automatically through OPENSTACK HEAT or KUBERNETES orchestration in our self-contained cloud. Figure 6 shows the structure of one of the HEAT templates used in this experiment. The template is structured with parameters and resources. The resources comprise the scheduler and the workers, and the required software is installed and configured on each scheduler and worker after boot-up.

5 Conclusion and Future Work

Through experiments, we successfully analyzed brightness and color data for about 5,300 galaxies in a parallel distributed processing environment consisting of DASK or CELERY with REDIS. Figure 7 shows an example galaxy from the GAMA data with a result of the MAGPHYS analysis in the cloud. With the OPENSTACK-based cloud, we confirmed that the research environment, especially a data analysis system with tools like a task scheduler and an in-memory DB, can be automatically configured and well utilized. In addition, we confirmed the availability of an elastic service environment through the cloud to meet the demand for large-scale data analysis with volatile workloads.

[Fig. 7. An example result of the MAGPHYS analysis on the cloud]

In this study, we identified some useful aspects of the cloud for data-driven research. First, we confirmed that it is easy to build an independent execution environment that provides the necessary software stack for research through the cloud. Also, in a cloud environment, researchers can easily reuse the same research environment and share research experience by reusing virtual machines or container images deployed by the research community.

In the next step, we will configure an environment for real-time processing of in-memory cache data. For practical real-time data processing, it is necessary to construct an optimal environment for data I/O as well as memory-based stream data processing, and various experiments need to be performed on the cloud. Based on the experience of building an astronomical big data processing environment in this study, we will provide a more flexible and high-performance cloud service and let researchers utilize it in various fields of data-centric research.

6 References

[1] T. Hey, S. Tansley and K. Tolle, The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, 2009.
[2] LSST Corporation. About LSST: Data Management. [Online]. Available: https://www.lsst.org/about/dm/ (accessed 2019.03.10).
[3] P. Diamond, SKA Community Briefing. [Online]. Available: https://www.skatelescope.org/ska-community-briefing-18jan2017/ (accessed 2019.03.10).
[4] P. Hirst and R. Cardenes, "The new Gemini Observatory archive: a fast and low cost observatory data archive running in the cloud", Proc. SPIE 9913, Software and Cyberinfrastructure for Astronomy IV, 99131E (8 August 2016); doi: 10.1117/12.2231833.
[5] B. F. Williams, K. Olsen, R. Khan, D. Pirone and K. Rosema, "Reducing and analyzing the PHAT survey with the cloud", The Astrophysical Journal Supplement Series, Volume 236, Number 1.
[6] C. Cui et al., "AstroCloud: a distributed cloud computing and application platform for astronomy", Proc. WCSN2016.
[7] J. Hahm et al., "Astronomical time series data analysis leveraging science cloud", Proc. Embedded and Multimedia Computing Technology and Service, pp. 493-500, 2012.
[8] S. P. Driver et al., "Galaxy And Mass Assembly (GAMA): Panchromatic Data Release (far-UV-far-IR) and the low-z energy budget", MNRAS 455, 3911-3942, 2016.
[9] OpenStack Foundation. OpenStack Overview. [Online]. Available: https://www.openstack.org/software/ (accessed 2019.03.10).
[10] Red Hat Inc. Ceph Introduction. [Online]. Available: https://ceph.com/ceph-storage/ (accessed 2019.03.10).
[11] The Kubernetes Authors. What is Kubernetes?. [Online]. Available: https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/ (accessed 2019.03.10).
[12] Dask Core Developers. Why Dask?. [Online]. Available: https://docs.dask.org/en/latest/why.html (accessed 2019.03.10).
[13] A. Solem. Celery - Distributed Task Queue. [Online]. Available: http://docs.celeryproject.org/en/latest/index.html (accessed 2019.03.10).
[14] S. Sanfilippo. Introduction to Redis. [Online]. Available: https://redis.io/topics/introduction (accessed 2019.03.10).
SESSION: HIGH-PERFORMANCE COMPUTING - HADOOP FRAMEWORK
Chair(s): TBA
A Speculation and Prefetching Model for Efficient Computation of MapReduce Tasks on Hadoop HDFS System

Lan Yang
Computer Science Department, California State Polytechnic University, Pomona, Pomona, CA 91768, USA

Abstract - The MapReduce programming model and the Hadoop software framework are keys to big data processing on high-performance computing (HPC) clusters. The Hadoop Distributed File System (HDFS) is designed to stream large data sets at high bandwidth. However, Hadoop suffers from a set of drawbacks, particularly issues with small files as well as dynamic datasets. In this research we target big data applications working with many on-demand datasets of varying sizes. We propose a speculation model that prefetches anticipated datasets for upcoming tasks in support of efficient big data processing on HPC clusters.

Keywords: Prefetching, Speculation, Hadoop, MapReduce, High performance computing cluster.

1 Introduction

Along with the emerging technology of cloud computing, Google proposed the MapReduce programming model [1], which allows for massive scalability of unstructured data across hundreds or thousands of high-performance computing nodes. Hadoop is an open source software framework that performs distributed processing of huge data sets across clusters of commodity servers [2]. Now distributed as Apache Hadoop [3], it is employed by many cloud services, such as AWS, Cloudera, HortonWorks, and IBM InfoSphere Insights, to offer big data solutions. The Hadoop Distributed File System (HDFS) [2], inspired by the Google File System (GFS) [4], is a reliable filesystem of Hadoop designed for storing very large files on a cluster of commodity hardware. To process big data in Apache Hadoop, the client submits data and a program to Hadoop; HDFS stores the data while MapReduce processes it.

While Hadoop is a powerful tool for processing massive data, it suffers from a set of drawbacks, including issues with small files, no real-time data processing, and suitability for batch processing only [5]. Apache Spark [6] partially solved Hadoop's real-time and batch processing problems by introducing in-memory processing [7]. As part of the Hadoop ecosystem, Spark does not have its own distributed filesystem, though it can use HDFS. Hadoop does not suit small data because HDFS, with its high-capacity design, lacks the ability to efficiently support random reading of small files. Small files are the major problem in HDFS. In this research, we study a special type of iterative MapReduce task working on HDFS with input datasets coming from many small files dynamically, i.e., on demand. We propose a data prefetching speculation model aimed at improving the performance and flexibility of big data processing on Hadoop HDFS for that special type of MapReduce task.

2 Background

2.1 Description of a special type of MapReduce tasks

In today's big data world, the MapReduce programming model and the Hadoop software framework remain popular tools for big data processing. Based on a number of big data applications performed on Hadoop, we observed the following:

(1) An HDFS file splits into chunks, typically of 64-128MB in size. To benefit from Hadoop's parallel processing ability, an HDFS file must be large enough to be divided into multiple chunks. Therefore, a file is considered small if it is significantly smaller than the HDFS chunk size.
(2) While many big data applications use large data files that can be pushed to the HDFS input directory prior to task execution, some applications use many small datasets distributed across a wide range.
(3) With the increasing demand for big data processing, more and more applications now require multiple rounds (or iterations) of processing, with each round requiring new datasets determined by the outcome of the previous computation. For example, in a data processing application for a legal system, the first round of MapReduce computation uses prestored case documents, while the second round might require access to certain assets or utilities datasets based on the case outcomes resulting from the first-round analysis. The assets or utilities datasets could consist of hundreds to thousands of files ranging from 1KB to 10MB, with only dozens of files relevant depending on the outcome of the first round. It would be very inefficient and inflexible if we had to divide these two rounds into separate client requests. Also, if we could overlap computation and data access time by speculating and prefetching data, we could reduce the overall processing time significantly.

Here we refer to big data applications with one or more of the above characteristics (i.e., requiring iterative or multiple passes of MapReduce computation, using many small files to form an HDFS chunk, or using dynamic datasets that depend on the outcome of previous rounds of computation) as a special type of MapReduce task.

2.2 Observation: execution time and HDFS chunks

We ran several dozen big data applications using Hadoop on a high-performance computing cluster. Table 1 summarizes the MapReduce performance of three relatively large big data analytics tasks.

[Table 1: Performance data for some big data applications (*requires multi-phase analysis); the tabular values are not recoverable from the source.]

2.3 Computation time vs. data fetch time

In this research, we first tested and analyzed data access times for datasets ranging from 1KB to 16MB on an HPC cluster consisting of 2 DL360 management nodes, 20 DL160 compute nodes, 3.3TB RAM, 40Gbit InfiniBand, and a 10Gbit external Ethernet connection, with overall system throughput of 36.6 Tflp in double-precision mode and 149.6 Tflp. The Slurm job scheduler [8] is the primary software we use for our testing. The performance data shown in Figure 1 serve as our basis for deriving the performance of our speculation algorithms.

[Figure 1: Data Access Performance Base]

3 Speculation and Prefetching Models

3.1 Speculation model

We establish a connection graph (CG) to represent the relations of commonly used tasks, with tasks as nodes and edges as links between tasks. For example, a birthday party planning task links to restaurant reservation tasks as well as entertainment or recreation tasks; an address change task links to moving or furniture shopping tasks. The links in the CG are prioritized; for example, for the birthday task, the restaurant task is initially set with a higher priority than the movie ticketing task. The priorities are in the 0.0 to 1.0 range and are dynamically updated based on the outcome of our prediction. For example, based on the connections in the CG and the priorities of the links, we predict that the top two tasks following the birthday task are, in order, the restaurant task and the movie task. If for that particular application the movie task turns out to be the correct choice, we increase its priority by a small fraction, say 0.1, capped at a maximum of 1.0.
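This update rule can be summarized in a few lines of Python. The sketch below is our reconstruction of the scheme just described (using the dictionary representation given later in Section 4), not code from the paper: CG maps each task name to a list of [next_task, priority] pairs kept sorted in descending order of priority.

import random

CG = {"birthday": [["restaurant", 0.8], ["movie", 0.6]]}  # toy example

def find(links, name):
    # Return the [name, priority] pair for `name`, or None if absent.
    return next((pair for pair in links if pair[0] == name), None)

def speculate(t):
    links = CG.get(t, [])
    return links[0][0] if links else None     # top candidate p = CG[t][0]

def feedback(t, p, actual, step=0.1):
    # Adjust link priorities once the actual next task is known.
    links = CG.setdefault(t, [])
    if p is not None and p != actual:
        miss = find(links, p)
        if miss:
            miss[1] = max(0.0, miss[1] - step)        # demote the wrong guess
    hit = find(links, actual)
    if hit:
        hit[1] = min(1.0, hit[1] + step)              # promote the actual task
    else:
        links.append([actual, random.uniform(0.1, 0.5)])  # add a new link
    links.sort(key=lambda pair: pair[1], reverse=True)    # keep sorted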
3.2 Prefetching algorithm

The prefetching concept is inspired by compiler-directed instruction/data prefetching techniques that speculate on and prefetch instructions for multiprocessing [9][10]. Our basic fetching strategy is: overlapping with the computation of the current task, we prefetch the associated datasets for the next round of computation based on the task speculation. The association between tasks and data files can be represented as a many-to-many relation. Each task is pre-associated with a list of files in the order of established ranks; for example, the restaurant task could be associated with pizza delivery files, restaurant location files, etc. The ranks are initialized based on the popularity of the service, with values in the 0.0 to 1.0 range and higher values for the most popular or most recommended services. The ranks are then adjusted based on the network distance of the file locations, with priority given to local or nearby files. Again, after task execution, if a prefetched file turned out to be irrelevant (i.e., the whole file was filtered out at an early MapReduce stage), the rank of that file with regard to that task is reduced. Based on the system configuration, we also preset two constants K and S, with K as the optimized/recommended number of containers and S as the size of each container (we suggest S be the HDFS chunk size and K the desired number of chunks with regard to the requested compute nodes). When prefetching datasets for a set of speculated tasks, the prefetching process repeatedly reads files until it fills up all the containers.

4 Simulation Design

We used a Python dictionary to implement the connection graph CG, with each distinct task name as a key. The value for a key is a list of task links sorted in descending order of priority. The task and data relations are also represented as a Python dictionary, with task names as keys and lists of data file names sorted in descending order of rank as values. Currently we simulate the effectiveness of prefetching by using parallel processes created by Slurm directly. Once the approaches are validated, we will test them on Hadoop/HDFS.

4.1 Speculation Model

For any current task t, the simulated speculation model always fetches the top candidate task p from the CG dictionary, i.e., CG[t][0], and starts the prefetching process. When t completes, it chooses the next task t'. If t' is the same as p, we let t be p and the process continues. If t' is different from p, we restart the prefetching process, reduce the priority of p by one level (currently 0.1) but not below 0.0, and increase the priority of t' by 0.1 (capped at 1.0) if it is already in t's connection links, or add it to t's connection links with a randomly assigned priority (between 0.1 and 0.5) if it is not there yet.

4.2 Prefetching Model

(1) Configuration: one file node N (i.e., a process that only reads data in and writes to a certain shared location) and four shared storages (arrays or dictionaries) representing the containers, C1 to C4. Initially all Ci are empty, and each container has a current capacity and a maximum capacity (all containers may have the same maximum capacity). This is easily extendable to multiple file nodes and larger numbers of containers.

(2) Assume the task p selected by the speculation scheme is associated with n small files, say F1, ..., Fn. Read in the files in the order F1, ..., Fn.
For each file read in, record its size as sj, then search for a container whose current capacity plus sj does not exceed its maximum capacity; lock it once found and push the content in. If no available container is found, the file content is set aside and the failure count is increased by 1 (the failure count is initially 0). Continue to fetch the next file until the stopping condition spelled out in (3) is reached.

(3) The prefetching process ends when all containers reach a certain percentage full (e.g., at least 80%) or when the failure count reaches a certain number (say 3). Note: one failure does not mean the containers are full; it could be that we fetched a very large dataset that could not fit into any of the current containers. In that case we may still fetch the files next in the list, as these might be smaller.
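The following is a minimal sketch of steps (1)-(3), under stated assumptions: containers are modeled as byte buffers with a common maximum capacity, read_file() stands in for the file node N, and no locking is shown since the sketch is single-threaded. The parameter defaults (k=4 containers, 80% fill threshold, failure limit 3) mirror the values suggested above.

def read_file(name):
    # Stand-in for the file node N: read one small file's content.
    with open(name, "rb") as f:
        return f.read()

def prefetch(files, k=4, cap=128 * 2**20, fail_limit=3, fill=0.8):
    containers = [bytearray() for _ in range(k)]  # C1..Ck, each at most `cap`
    failures = 0
    for name in files:                            # F1..Fn in rank order
        data = read_file(name)
        slot = next((c for c in containers
                     if len(c) + len(data) <= cap), None)
        if slot is None:
            failures += 1                         # file fits in no container
        else:
            slot.extend(data)
        if failures >= fail_limit:                # stop condition: failures
            break
        if all(len(c) >= fill * cap for c in containers):
            break                                 # stop condition: all ~full
    return containers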
5 Conclusions

In this work we studied the possibility of dynamically aggregating small datasets to form data chunks large enough for MapReduce tasks on Hadoop HDFS. We proposed task speculation and file prefetching models to speed up overall processing, and we have set up a preliminary simulation test suite to assess the feasibility of the speculation and prefetching models. Since we are currently designing the schemes in a Slurm multiprocess environment without HDFS, no performance gain could be measured yet. Our future (and ongoing) work is to port the design from HPC Slurm processes onto a Hadoop HDFS system and measure its effectiveness using real-world big-data applications.

6 References

[1] Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, Google Research, https://research.google.com/archive/mapreduce-osdi04.pdf
[2] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler, The Hadoop Distributed File System, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).
[3] Apache Hadoop, https://hadoop.apache.org/
[4] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, The Google File System, https://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf
[5] DataFlair Team, 13 Big Limitations of Hadoop & Solution to Hadoop Drawbacks, https://data-flair.training/blogs/13-limitations-of-hadoop/, March 7, 2019.
[6] Apache Spark, https://spark.apache.org/
[7] Matei Zaharia, Mosharaf Chowdhury, Michael Franklin, Scott Shenker, and Ion Stoica, Spark: Cluster Computing with Working Sets, Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, 2010.
[8] Slurm job scheduler, https://slurm.schedmd.com/
[9] Seung Woo Son, Mahmut Kandemir, Mustafa Karakoy, and Dhruva Chakrabarti, A compiler-directed data prefetching scheme for chip multiprocessors, Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '09).
[10] Ricardo Bianchini and Beng-Hong Lim, Evaluating the Performance of Multithreading and Prefetching in Multiprocessors, https://doi.org/10.1006/jpdc.1996.0109