Adaptive Transfer Adjustment in Efficient Bulk
Data Transfer Management for Climate
Datasets
Alex Sim, Mehmet Balman, Dean Williams,
Arie Shoshani, Vijaya Natarajan
Lawrence Berkeley National Laboratory
Lawrence Livermore National Laboratory
The 22nd IASTED International Conference on Parallel and Distributed Computing and Systems
PDCS2010 - Nov 9th, 2010
Earth System Grid
ESG (Earth System Grid)
Supports the infrastructure for climate research
Provides technology to access, distribute, transport, and catalog climate
simulation data
In production since 2004
ESG-I (1999-2001)
ESG-II (2002-2006)
ESG-CET(2006-present)
ANL, LANL, LBNL, LLNL,
NCAR, ORNL, NERSC, …
NCAR CCSM ESG portal
237 TB of data at four locations: (NCAR, LBNL, ORNL, LANL)
965,551 files
Includes the past 7 years of joint DOE/NSF climate modeling experiments
LLNL CMIP-3 (IPCC AR4) ESG portal
35 TB of data at one location
83,337 files
model data from 13 countries
Generated by a modeling campaign coordinated by the
Intergovernmental Panel on Climate Change (IPCC)
Over 565 scientific peer-reviewed publications
ESG (Earth System Grid) / Climate Simulation Data
ESG web portals distributed worldwide
Over 2,700 sites
120 countries
25,000 users
Over 1 PB downloaded
Climate Data: ever-increasing sizes
Early 1990’s (e.g., AMIP1, PMIP, CMIP1)
modest collection of monthly mean 2D files: ~1 GB
Late 1990’s (e.g., AMIP2)
large collection of monthly mean and 6-hourly 2D and 3D fields:
~500 GB
2000’s (e.g., IPCC/CMIP3)
fairly comprehensive output from both ocean and atmospheric
components; monthly, daily, and 3-hourly fields: ~35 TB
2011:
The IPCC 5th Assessment Report (AR5) in 2011: expected 5 to 15 PB
The Climate Science Computational End Station (CCES) project at
ORNL: expected around 3 PB
The North American Regional Climate Change Assessment Program
(NARCCAP): expected around 1 PB
The Cloud Feedback Model Intercomparison Project (CFMIP) archives:
expected to be 0.3 PB
ESG (Earth System Grid) / Climate Simulation Data
Results from the Parallel Climate Model (PCM)
depicting wind vectors, surface pressure, sea
surface temperature, and sea ice concentration.
Prepared from data published in the ESG using the
FERRET analysis tool by Gary Strand, NCAR.
Massive data collections:
● shared by thousands of
researchers
● distributed among many data
nodes around the world
Replication of published core
collection (for scalability and
availability)
Bulk Data Movement in ESG
● Move terabytes to petabytes (many thousands of files)
● Extreme variance in file sizes
● Reliability and Robustness
● Asynchronous long-lasting operation
● Recovery from transient failures and automatic restart
● Support for checksum verification
● On-demand transfer request status
● Estimation of request completion time
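As an illustration of the checksum-verification requirement only (not BDM's actual mechanism; the hash algorithm and chunk size here are assumptions), a minimal Python sketch of streaming file digests compared between source and destination:

import hashlib

def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Stream the file through a hash so very large files never sit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

def verify_transfer(source_path, destination_path):
    """Compare digests; a mismatch would trigger a retry of this file."""
    return file_checksum(source_path) == file_checksum(destination_path)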
Replication Use-case
(Architecture diagram: clients and compute nodes access data nodes through ESG gateways worldwide (NCAR, LLNL, ORNL, LBNL, Canada, UK, Germany, Japan, Australia), with data flowing over the WAN. Each data node runs BeStMan with a load balancing module, a parallelism/concurrency module, and a file replication module in front of multiple transfer servers and a distributed file system; hot files are replicated across the transfer servers.)
Faster Data Transfers
End-to-end bulk data transfer (latency wall)
● TCP-based solutions: FAST TCP, Scalable TCP, etc.
● UDP-based solutions: RBUDP, UDT, etc.
● Most of these solutions require kernel-level changes
● Not preferred by most domain scientists
Application-Level Tuning
● Take an application-level transfer protocol (e.g., GridFTP) and tune it for better performance:
  ● Using multiple (parallel) streams
  ● Tuning buffer size
(efficient utilization of available network capacity)
Level of Parallelism in End-to-end Data Transfer
● number of parallel data streams connected to a data transfer service, to increase the utilization of network bandwidth
● number of concurrent data transfer operations initiated at the same time, for better utilization of system resources
Parallel TCP Streams
● Instead of a single connection at a time, multiple TCP streams are opened to a single data transfer service on the destination host.
● We gain higher TCP bandwidth, especially on networks with low packet loss rates; parallel connections better utilize the TCP buffers available to the transfer, so that N connections might be N times faster than a single connection.
● However, multiple TCP streams put extra overhead on the system.
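A minimal Python sketch of the idea, not the GridFTP/BDM implementation: open N TCP streams to a single (hypothetical) transfer endpoint and send contiguous byte ranges of one file concurrently. The host, port, and chunking scheme are assumptions.

import socket
import threading

HOST, PORT = "transfer.example.org", 2811   # hypothetical endpoint
N_STREAMS = 4
CHUNK = 1 << 20                              # 1 MiB read size

def send_range(path, offset, length):
    """Send bytes [offset, offset+length) of the file over its own TCP stream."""
    with socket.create_connection((HOST, PORT)) as sock, open(path, "rb") as f:
        f.seek(offset)
        remaining = length
        while remaining > 0:
            data = f.read(min(CHUNK, remaining))
            if not data:
                break
            sock.sendall(data)
            remaining -= len(data)

def parallel_send(path, size):
    """Split the file into N contiguous ranges, one per parallel stream."""
    per_stream = size // N_STREAMS
    threads = []
    for i in range(N_STREAMS):
        offset = i * per_stream
        length = size - offset if i == N_STREAMS - 1 else per_stream
        t = threading.Thread(target=send_range, args=(path, offset, length))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()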
Bulk Data Mover (BDM)
● Monitoring and statistics collection
● Support for multiple protocols (GridFTP, HTTP)
● Load balancing between multiple servers
● Data channel connection caching, pipelining
● Parallel TCP streams, concurrent data transfer operations
● Multi-threaded transfer queue management
● Adaptability to the available bandwidth
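A minimal sketch of the multi-threaded transfer queue idea, assuming nothing about BDM internals beyond this feature list: a fixed pool of workers drains a shared queue so the number of concurrent transfers stays constant; transfer_one is a placeholder for a single file transfer.

import queue
import threading

def run_transfer_queue(files, transfer_one, concurrency=8):
    """transfer_one(path) stands in for one file transfer (e.g., over GridFTP)."""
    work = queue.Queue()
    for path in files:
        work.put(path)

    def worker():
        # Each worker keeps pulling files until the queue is empty, so the
        # concurrency level stays at `concurrency` while work remains.
        while True:
            try:
                path = work.get_nowait()
            except queue.Empty:
                return
            try:
                transfer_one(path)
            finally:
                work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(concurrency)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()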
Bulk Data Mover (BDM)
(Architecture diagram: files listed in the DB/local storage are fed by a DB/storage queue manager into the transfer queue; a transfer queue monitor and manager, a transfer control manager, and a network connection and concurrency manager drive the concurrent connections over the WAN.)
Transfer Queue Management
* Plots generated from NetLogger
The number of concurrent transfers in the left column stays consistent over time for the well-managed transfers shown in the bottom row, compared with the poorly managed or unmanaged data connections shown in the top row. This leads to the higher overall throughput shown in the lower-right panel.
Adaptive Transfer Management
Bulk Data Mover (BDM)
● number of parallel TCP streams?
● number of concurrent data transfer operations?
● Adaptability to the available bandwidth
Parallel Stream vs Concurrent Transfer
16x2, 32x1, 4x8
• Same number of total streams, but different number of concurrent connections
Parameter Estimation
● Can we predict this behavior?
● Yes, we can come up with a good estimation for the parallelism level:
  ● Network statistics
  ● Extra measurement
  ● Historical data
Parallel Stream Optimization
Single stream: theoretical calculation of throughput based on MSS, RTT, and packet loss rate.
n streams gain as much as the total throughput of n single streams (not correct).
A better model: a relation is established between RTT, p, and the number of streams n.
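The formulas referenced on this slide are embedded as images and are not in the extracted text; as context only, the standard forms from the TCP throughput literature that these bullets appear to describe are sketched below (the constants and the particular multi-stream curve-fit are assumptions, not the authors' exact equations):

% Single-stream upper bound (Mathis et al.), in terms of MSS, RTT, loss rate p,
% and a constant C:
Th \le \frac{MSS \cdot C}{RTT \cdot \sqrt{p}}

% Naive n-stream extension (the "not correct" bullet): n streams assumed to
% scale the single-stream bound n-fold:
Th_n \le n \cdot \frac{MSS \cdot C}{RTT \cdot \sqrt{p}}

% A better model ties RTT, the loss rate, and n together, since loss grows as
% streams are added; one common curve-fit form, with coefficients a, b, c
% obtained from a few sample transfers, is:
Th_n \approx \frac{n}{\sqrt{a\,n^2 + b\,n + c}}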
Parameter Estimation
● Might not reflect the best possible current settings (dynamic environment)
● What if the network condition changes?
● Requires three sample transfers (curve fitting)
● Needs to probe the system and make measurements with external profilers
● Does require a complex model for parameter optimization
Adaptive Tuning
● Instead of predictive sampling, use data from the actual transfer
● Transfer data in chunks (partial transfers) and set control parameters on the fly
● Measure throughput for every transferred data chunk
● Gradually increase the number of parallel streams until it reaches an equilibrium point
Adaptive Tuning
● No need to probe the system and make measurements with external profilers
● Does not require any complex model for parameter optimization
● Adapts to the changing environment
But: overhead in changing the parallelism level
Fast start: exponentially increase the number of parallel streams
Adaptive Tuning
● Start with a single stream (n = 1)
● Measure instant throughput for every data chunk transferred (fast start)
  ● Increase the number of parallel streams (n = n * 2)
  ● Transfer the data chunk
  ● Measure instant throughput
● If the current throughput is better than the previous one, continue
● Otherwise, set n back to the old value and gradually increase the parallelism level (n = n + 1)
● If there is no throughput gain from increasing the number of streams (the equilibrium point has been found):
  ● Increase the chunk size (delay the measurement period)
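A minimal Python sketch of this loop, assuming a hypothetical transfer_chunk(n_streams, chunk_size) call that performs one partial transfer and returns its measured throughput:

def adaptive_tuning(transfer_chunk, chunk_size, done):
    """Fast start doubles the stream count; after the first regression it probes
    linearly (n + 1); once no further gain is seen, the chunk size grows so that
    measurements happen less often."""
    n = 1
    best = transfer_chunk(n, chunk_size)          # start with a single stream
    fast_start = True
    while not done():
        candidate = n * 2 if fast_start else n + 1
        throughput = transfer_chunk(candidate, chunk_size)
        if throughput > best:
            n, best = candidate, throughput       # gain: adopt the higher level
        else:
            if fast_start:
                fast_start = False                # switch to gradual (n + 1) increases
            else:
                chunk_size *= 2                   # equilibrium: delay the next measurement
            best = transfer_chunk(n, chunk_size)  # keep transferring at the old level n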
Dynamic Tuning Algorithm
Parallel Streams (estimate starting point)
Log-log scale: can we predict this behavior?
Power-law model
Achievable throughput (in percent) over the number of streams with low/medium/high RTT:
(a) RTT=1ms, (b) RTT=5ms, (c) RTT=10ms, (d) RTT=30ms, (e) RTT=70ms, (f) RTT=140ms
(c=100, (n/c)<1, k=300 (max RTT))
Power-law model
T = (n / c)^(RTT / k)
80-20 rule (Pareto distribution):
0.8 = (n / c)^(RTT / k)  ⇒  n = e^(k · ln 0.8 / RTT) · c
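A small worked example, evaluating the slide's last formula with the constants given above (c=100, k=300); values below 1 mean a single stream already reaches about 80% of the achievable throughput:

import math

# n = c * exp(k * ln(0.8) / RTT): streams estimated to reach ~80% of the
# achievable throughput under the power-law model, with c=100 and k=300.
def streams_for_80_percent(rtt_ms, c=100, k=300):
    return c * math.exp(k * math.log(0.8) / rtt_ms)

for rtt in (1, 5, 10, 30, 70, 140):
    print(f"RTT={rtt:>3} ms -> n ≈ {streams_for_80_percent(rtt):.1f}")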
Extended power-law model:
Unlike other models in the literature (which try to find an approximation model for the relationship between multiple streams and throughput), this model focuses only on the initial behavior of the transfer performance. When RTT is low, the achievable throughput starts high with a low number of streams and quickly approaches the optimal throughput. When RTT is high, more streams are needed to reach higher achievable throughput.
Future Work
Get prepared for next-generation networks:
- 100 Gbps
- RDMA
The 22nd IASTED International Conference on Parallel and Distributed Computing and Systems
PDCS2010 - Nov 9th, 2010
Earth System Grid
Bulk Data Mover
http://sdm.lbl.gov/bdm/
Earth System Grid
http://www.earthsystemgrid.org
http://esg-pcmdi.llnl.gov/
Support emails
esg-support@earthsystemgrid.org
srm@lbl.gov