PERFORMING INITIATIVE DATA PREFETCHING IN
DISTRIBUTED FILE SYSTEMS FOR CLOUD COMPUTING
ABSTRACT
This paper presents an initiative data prefetching scheme on the storage servers in distributed file
systems for cloud computing. In this prefetching technique, the client machines are not
substantially involved in the process of data prefetching, but the storage servers can directly
prefetch the data after analyzing the history of disk I/O access events, and then send the
prefetched data to the relevant client machines proactively. To put this technique to work, the
information about client nodes is piggybacked onto the real client I/O requests, and then
forwarded to the relevant storage server. Next, two prediction algorithms have been proposed to
forecast future block access operations for directing what data should be fetched on storage
servers in advance. Finally, the prefetched data can be pushed to the relevant client machine
from the storage server. Through a series of evaluation experiments with a collection of
application benchmarks, we have demonstrated that our presented initiative prefetching
technique can benefit distributed file systems for cloud environments to achieve better I/O
performance. In particular, resource-limited client machines in the cloud are not responsible
for predicting I/O access operations, which relieves them of that overhead and helps their
system performance.
Proposed system:
The proposed mechanism first analyzes disk I/O traces to predict future disk I/O accesses so
that the storage servers can fetch data in advance, and then forwards the prefetched data to the
relevant client file systems for potential future use. In short, this paper makes the following
two contributions:
1) Chaotic time series prediction and linear regression prediction to forecast disk I/O access.
We have modeled the disk I/O access operations, and classified them into two kinds of access
patterns, i.e. the random access pattern and the sequential access pattern. Therefore, in order to
predict the future I/O access that belongs to the different access patterns as accurately as possible
(note that the future I/O access indicates what data will be requested in the near future), two
prediction algorithms including the chaotic time series prediction algorithm and the linear
regression prediction algorithm have been proposed respectively.
2) Initiative data prefetching on storage servers.
Client file systems do not intervene in prefetching, except for piggybacking their information
onto the relevant I/O requests sent to the storage servers. The storage servers log disk I/O
accesses and classify the access patterns after modeling the disk I/O events. Next, by applying
the two proposed prediction algorithms, the storage servers predict future disk I/O accesses to
guide data prefetching. Finally, the storage servers proactively forward the prefetched data to the
relevant client file systems to satisfy future application requests.
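The classify-then-predict step described in the two contributions can be sketched as follows. This is only an illustration under stated assumptions: the `DiskAccess` record, the contiguity heuristic, and the 0.8 threshold are hypothetical and are not taken from the paper, which does not spell out its classifier here. The only part grounded in the text is the pairing of the sequential pattern with linear regression and the random pattern with chaotic time series prediction.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DiskAccess:
    """One logged physical read (hypothetical record layout)."""
    offset: int  # starting block address
    size: int    # number of blocks read


def classify_pattern(trace: List[DiskAccess], threshold: float = 0.8) -> str:
    """Label a window of the disk trace 'sequential' or 'random'.

    Assumed heuristic: an access is contiguous when it starts exactly
    where the previous one ended; the window counts as sequential when
    the share of contiguous accesses reaches `threshold`.
    """
    if len(trace) < 2:
        return "random"  # too little history to claim a sequence
    contiguous = sum(
        1
        for prev, cur in zip(trace, trace[1:])
        if cur.offset == prev.offset + prev.size
    )
    ratio = contiguous / (len(trace) - 1)
    return "sequential" if ratio >= threshold else "random"


def choose_predictor(trace: List[DiskAccess]) -> str:
    """Pick the prediction algorithm that matches the access pattern."""
    if classify_pattern(trace) == "sequential":
        return "linear_regression"
    return "chaotic_time_series"
```

For example, a window of four back-to-back 8-block reads would be labeled sequential and routed to the linear regression predictor, while reads scattered across the disk would be routed to the chaotic time series predictor.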
MODULES
ASSUMPTIONS IN APPLICATION CONTEXTS
PIGGYBACKING CLIENT INFORMATION
I/O ACCESS PREDICTION
INITIATIVE DATA PREFETCHING
ASSUMPTIONS IN APPLICATION CONTEXTS
This newly presented prefetching mechanism does not work well for every workload in the real
world; its target application contexts must meet two assumptions:
Assumption 1:
Resource-limited client machines. This newly proposed prefetching mechanism is intended
primarily for clouds that have many resource-limited client machines, not for generic cloud
environments.
This is a reasonable assumption for mobile cloud computing, which employs powerful
cloud infrastructures to offer computing and storage services on demand in order to relieve
resource utilization on mobile devices.
Assumption 2:
On-Line Transaction Processing (OLTP) applications. Prefetching schemes in
distributed file systems make sense only for a limited number of read-intensive applications,
such as database-related OLTP and server-like applications.
PIGGYBACKING CLIENT INFORMATION
Most of the I/O tracing approaches proposed by other researchers focus on the logical I/O access
events that occur on the client file systems, which can be useful for identifying an application's
I/O access patterns. Nevertheless, without the relevant information about physical I/O access, it is
difficult to connect the applications with the distributed file system well enough to
improve I/O performance substantially.
In this newly presented initiative prefetching approach, the data is prefetched by the storage
servers after they analyze disk I/O traces, and the data is then proactively pushed to the relevant
client file system to satisfy potential application requests. The storage servers therefore need
information about the client file systems and applications. To this end, we leverage
a piggybacking mechanism, which is illustrated in Figure 1, to transfer the related information from
the client node to the storage servers, where it contributes to modeling disk I/O access patterns
and to forwarding the prefetched data.
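A minimal sketch of the piggybacking idea is given below. The field names (`client_id`, `file_id`, `logical_offset`) and the class names are assumptions made for illustration; the text only states that client information travels with the real I/O request and is used both for modeling access patterns and for routing the prefetched data back.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ClientInfo:
    """Information piggybacked by the client (hypothetical fields)."""
    client_id: str       # which client node to push prefetched data to
    file_id: int         # logical file the request belongs to
    logical_offset: int  # offset of the request within that file


@dataclass(frozen=True)
class ReadRequest:
    """A physical read request that carries the piggybacked info."""
    block_addr: int
    size: int
    piggyback: Optional[ClientInfo] = None


@dataclass
class TracingServer:
    """Storage server side: logs each disk access with its client."""
    trace: List[Tuple[Optional[str], int, int]] = field(default_factory=list)

    def handle_read(self, req: ReadRequest) -> None:
        # Record who asked for which blocks, so that the trace can later
        # be modeled per client and prefetched data routed back.
        client = req.piggyback.client_id if req.piggyback else None
        self.trace.append((client, req.block_addr, req.size))

    def history_for(self, client_id: str) -> List[Tuple[int, int]]:
        """Return the (block_addr, size) history of one client."""
        return [(addr, size) for cid, addr, size in self.trace if cid == client_id]
```

Because the client information rides on requests that are sent anyway, this design needs no extra message from client to server.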
I/O ACCESS PREDICTION
Many heuristic algorithms have been proposed to guide the distribution of file data on disk
storage, so that data stripes that are expected to be used together are located close to one another.
Other work proposes an automatic time series modeling and prediction framework to direct
prefetching adaptively; it employs ARIMA (Autoregressive Integrated Moving Average) time series
models to forecast future I/O accesses from the temporal patterns of I/O requests.
However, none of these techniques aims to predict physical access by
analyzing disk I/O traces, since two neighboring I/O operations in the raw disk trace might not have
any dependency. This section describes how to predict disk I/O
operations, including the modeling of disk I/Os and the two prediction algorithms adopted by our
newly presented initiative prefetching mechanism.
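For the sequential access pattern, the linear regression idea can be sketched as an ordinary least-squares fit over the recent offsets. This is a generic textbook formulation, not the paper's exact algorithm: the function name `predict_next_offset`, the use of the access index as the regressor, and the rounding of the result are all assumptions.

```python
from typing import Sequence


def predict_next_offset(offsets: Sequence[int]) -> int:
    """Extrapolate the next block offset from recent sequential accesses.

    Fits offset = intercept + slope * i by ordinary least squares, where
    i is the position of the access in the window, then evaluates the
    fitted line at the next position.
    """
    n = len(offsets)
    if n < 2:
        raise ValueError("need at least two past accesses")
    mean_i = (n - 1) / 2
    mean_y = sum(offsets) / n
    sxx = sum((i - mean_i) ** 2 for i in range(n))
    sxy = sum((i - mean_i) * (y - mean_y) for i, y in enumerate(offsets))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_i
    return round(intercept + slope * n)


# A client that read offsets 0, 8, 16, 24 is predicted to read 32 next.
print(predict_next_offset([0, 8, 16, 24]))  # prints 32
```

The predicted offset is what the storage server would read from disk ahead of the actual request.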
INITIATIVE DATA PREFETCHING
The scheme of initiative data prefetching is the novel idea presented in this paper. The
architecture of this scheme is illustrated for the case of read requests (the assumed synopsis
of a read operation is read(int fildes, size_t size, off_t off)). The storage server can predict the
future read operation by analyzing the history of disk I/Os, so that it can directly issue a physical
read request to fetch the data in advance.
The most attractive idea in the figure is that the prefetched data is forwarded to the relevant
client file system proactively, while the client file system is involved in neither the prediction
nor the prefetching procedure.
Finally, the client file system can answer a read request from the application that hits the
buffered data directly from its buffer. As a result, the read latency on the application side can
be reduced significantly. Furthermore, the client machine, which might have limited computing
power and energy supply, can focus on its own work rather than on prediction-related tasks.
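The push-then-hit flow described above can be sketched as a small simulation. Everything here is hypothetical scaffolding: the class names, the in-memory `disk`, and the placeholder prediction (the server simply assumes the adjacent range is read next) stand in for the real storage server and its prediction algorithms. The buffer key follows the (fildes, size, off) order of the read synopsis.

```python
from typing import Dict, Tuple

Key = Tuple[int, int, int]  # (fildes, size, off), as in the read synopsis


class ClientFileSystem:
    """Client side: buffers pushed data, does no prediction itself."""

    def __init__(self) -> None:
        self.buffer: Dict[Key, bytes] = {}
        self.remote_reads = 0  # reads that had to go to the server

    def receive_push(self, fildes: int, size: int, off: int, data: bytes) -> None:
        self.buffer[(fildes, size, off)] = data

    def read(self, fildes: int, size: int, off: int, server: "PrefetchingServer") -> bytes:
        data = self.buffer.pop((fildes, size, off), None)
        if data is not None:
            return data  # hit: served from the buffered, pushed data
        self.remote_reads += 1
        return server.read(fildes, size, off, client=self)


class PrefetchingServer:
    """Server side: serves the read, then pushes the predicted next range."""

    def __init__(self, disk: bytes) -> None:
        self.disk = disk

    def read(self, fildes: int, size: int, off: int, client: ClientFileSystem) -> bytes:
        data = self.disk[off:off + size]
        # Placeholder prediction (assumption): the next read is taken to
        # be the adjacent range; a real server would use its predictors.
        nxt = off + size
        if nxt < len(self.disk):
            client.receive_push(fildes, size, nxt, self.disk[nxt:nxt + size])
        return data
```

In this sketch a client that reads two adjacent ranges contacts the server only once; the second read is answered from the buffer, which is where the reduction in read latency comes from.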
