Application-Level Optimization of Big Data Transfers through Pipelining, Parallelism and Concurrency

Abstract:
In end-to-end data transfers, several factors affect throughput: network
characteristics (e.g., network bandwidth, round-trip time, background traffic);
end-system characteristics (e.g., NIC capacity, number of CPU cores and their
clock rate, number of disk drives and their I/O rate); and dataset
characteristics (e.g., average file size, dataset size, file size
distribution). Optimization of big data transfers over inter-cloud and
intra-cloud networks is a challenging task that requires joint consideration of
all of these parameters. This optimization task becomes even more challenging when
transferring datasets composed of heterogeneous file sizes (i.e., a mix of large
and small files). Previous work in this area focuses only on end-system and
network characteristics and does not provide models for dataset
characteristics. In this study, we analyze the effects of the three most important
transfer parameters that are used to enhance data transfer throughput:
pipelining, parallelism and concurrency. We provide models and guidelines to set
the best values for these parameters and present two different transfer
optimization algorithms that use these models. Tests conducted over high-speed
networking and cloud testbeds show that our algorithms outperform the most
popular data transfer tools, such as Globus Online and UDT, in the majority of
cases.
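
To make the three parameters concrete, here is a minimal sketch of how they might be chosen from network and dataset characteristics. It is not the model or algorithm from this paper, which the abstract does not spell out. It assumes the meanings these terms usually have in GridFTP-style tools: pipelining is the number of transfer commands kept in flight on one channel, parallelism is the number of TCP streams per file, and concurrency is the number of files transferred at once. The bandwidth-delay-product heuristic, the `suggest_params` function, and all of its argument names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class TransferParams:
    pipelining: int   # transfer commands kept in flight on one channel
    parallelism: int  # TCP streams used for a single file
    concurrency: int  # files transferred at the same time


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product: bytes in flight needed to fill the link."""
    return bandwidth_bps * rtt_ms // 8000  # bits -> bytes, ms -> s


def suggest_params(avg_file_size: int, file_count: int, bandwidth_bps: int,
                   rtt_ms: int, tcp_buffer: int,
                   max_concurrency: int = 4) -> TransferParams:
    """Hypothetical heuristic, not the paper's model."""
    bdp = bdp_bytes(bandwidth_bps, rtt_ms)
    # Small files: queue enough requests to cover one BDP of data.
    pipelining = max(1, ceil_div(bdp, avg_file_size))
    # Large files: add streams when one TCP buffer cannot hold a full BDP.
    parallelism = max(1, ceil_div(min(bdp, avg_file_size), tcp_buffer))
    # Never open more channels than there are files to send.
    concurrency = max(1, min(max_concurrency, file_count))
    return TransferParams(pipelining, parallelism, concurrency)


# 10 Gbps link, 40 ms RTT, 4 MB TCP buffer (illustrative values)
small = suggest_params(1_000_000, 1000, 10_000_000_000, 40, 4_000_000)
large = suggest_params(10_000_000_000, 2, 10_000_000_000, 40, 4_000_000)
print(small)
print(large)
```

Under these assumed values, the many-small-files dataset leans on pipelining while the few-large-files dataset leans on parallelism, which is the kind of dataset-dependent behaviour the abstract argues a transfer optimizer has to account for.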
