Application-Level Optimization of Big Data Transfers through Pipelining,
Parallelism and Concurrency
Abstract:
In end-to-end data transfers, several factors affect throughput: network
characteristics (e.g., bandwidth, round-trip time, background traffic);
end-system characteristics (e.g., NIC capacity, number of CPU cores and their
clock rate, number of disk drives and their I/O rate); and dataset
characteristics (e.g., average file size, dataset size, file size distribution).
Optimizing big data transfers over inter-cloud and intra-cloud networks is a
challenging task that requires joint consideration of all of these parameters.
The task becomes even more challenging when the dataset comprises heterogeneous
file sizes (i.e., a mix of large and small files). Previous work in this area
focuses only on end-system and network characteristics and does not provide
models for dataset characteristics. In this study, we analyze the effects of the
three most important transfer parameters used to enhance data transfer
throughput: pipelining, parallelism, and concurrency. We provide models and
guidelines for setting the best values of these parameters and present two
transfer optimization algorithms that use the models developed. Tests conducted
over high-speed networking and cloud testbeds show that our algorithms
outperform the most popular data transfer tools, such as Globus Online and UDT,
in the majority of cases.
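
As a rough illustration of how the three parameters relate to the network and dataset characteristics listed above, the following minimal sketch may help. The abstract does not define the parameters or give the models, so everything here is an assumption: the parameter meanings follow common GridFTP-style usage, the names `TransferParams`, `bandwidth_delay_product`, and `split_by_size` are hypothetical, and using the bandwidth-delay product as the small/large file threshold is only one plausible heuristic, not the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferParams:
    """Hypothetical container for the three tunable parameters."""
    pipelining: int   # transfer requests queued back-to-back without waiting for replies
    parallelism: int  # data streams opened for a single file
    concurrency: int  # files transferred at the same time


def bandwidth_delay_product(bandwidth_bps: int, rtt_ms: int) -> float:
    """Bytes that can be in flight on the path: bandwidth (bits/s) x RTT (ms)."""
    return bandwidth_bps * rtt_ms / 8000  # /8 for bits->bytes, /1000 for ms->s


def split_by_size(file_sizes, threshold_bytes):
    """Partition a mixed dataset into small and large files.

    Assumption: files below the threshold are 'small'; everything else is 'large'.
    """
    small = [s for s in file_sizes if s < threshold_bytes]
    large = [s for s in file_sizes if s >= threshold_bytes]
    return small, large


if __name__ == "__main__":
    bdp = bandwidth_delay_product(10_000_000_000, 40)  # 10 Gbps link, 40 ms RTT
    small, large = split_by_size([1_000_000, 200_000_000, 50_000_000], bdp)
    print(bdp, small, large)
```

Splitting a heterogeneous dataset this way would let a tool assign different parameter values to each group, for example deeper pipelining for small files and more parallel streams for large ones; the actual values would come from the models the paper develops.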
