High Performance Parallel Computing with Clouds and Cloud Technologies
CloudComp 09, Munich, Germany
Jaliya Ekanayake, Geoffrey Fox
{jekanaya,gcf}@indiana.edu
School of Informatics and Computing
Pervasive Technology Institute
Indiana University Bloomington
Acknowledgements to:
  • Joe Rinkovsky and Jenett Tillotson at IU UITS
  • SALSA Team - Pervasive Technology Institute, Indiana University
      • Scott Beason
      • Xiaohong Qiu
      • Thilina Gunarathne
Computing in Clouds
  • Commercial clouds: Amazon EC2, 3Tera, GoGrid
  • Private clouds: Eucalyptus (open source), Nimbus, Xen
Some benefits:
  • On-demand allocation of resources (pay per use)
  • Customizable virtual machines (VMs)
      • Any software configuration
      • Root/administrative privileges
  • Provisioning happens in minutes, compared to hours in traditional job queues
  • Better resource utilization
      • No need to allocate a whole 24-core machine to perform a single-threaded R analysis
  • Access to computation power is no longer a barrier
Cloud Technologies/Parallel Runtimes
  • Cloud technologies
      • E.g. Apache Hadoop (MapReduce), Microsoft DryadLINQ, MapReduce++ (earlier known as CGL-MapReduce)
      • Moving computation to data
      • Distributed file systems (HDFS, GFS)
      • Better quality of service (QoS) support
      • Simple communication topologies
  • Most HPC applications use MPI
      • Variety of communication topologies
      • Typically use fast (or dedicated) network settings
Applications & Different Interconnection Patterns
[Figure: interconnection patterns of the application classes. Diagram labels: input, map, reduce, output, iterations, Pij. Regions: "Domain of MapReduce and Iterative Extensions" and "MPI".]
MapReduce++ (earlier known as CGL-MapReduce)
  • In-memory MapReduce
  • Streaming-based communication
      • Avoids file-based communication mechanisms
  • Cacheable map/reduce tasks
      • Static data remains in memory
  • Combine phase to combine reductions
  • Extends the MapReduce programming model to iterative MapReduce applications
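The iterative pattern described on this slide can be illustrated with a small, single-process Python sketch. It is not the MapReduce++ API: the function names (`make_map_task`, `reduce_task`, `combine`, `iterative_mapreduce`) and the one-dimensional K-means example are assumptions made for illustration. Each map task is created once and keeps its static data partition in memory, only the small variable data (the centers) is passed in on every iteration, and a combine step merges the reduce outputs into the input of the next iteration.

```python
from collections import defaultdict


def make_map_task(points):
    """Create a map task that keeps its static data partition in memory."""
    def map_task(centers):
        # Emit {center index: [partial sum, count]} for the cached points.
        partial = defaultdict(lambda: [0.0, 0])
        for x in points:
            idx = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            partial[idx][0] += x
            partial[idx][1] += 1
        return dict(partial)
    return map_task


def reduce_task(partials):
    """Sum the partial [sum, count] pairs emitted for one center."""
    return sum(p[0] for p in partials), sum(p[1] for p in partials)


def combine(reduced, old_centers):
    """Merge all reduce outputs into the center list for the next iteration."""
    return [reduced[i][0] / reduced[i][1] if i in reduced else old_centers[i]
            for i in range(len(old_centers))]


def iterative_mapreduce(partitions, centers, max_iter=100, tol=1e-9):
    # Map tasks are built once and reused, so static data is loaded only once.
    map_tasks = [make_map_task(p) for p in partitions]
    for _ in range(max_iter):
        grouped = defaultdict(list)
        for task in map_tasks:
            for key, value in task(centers).items():
                grouped[key].append(value)
        reduced = {key: reduce_task(values) for key, values in grouped.items()}
        new_centers = combine(reduced, centers)
        if max(abs(a - b) for a, b in zip(new_centers, centers)) < tol:
            return new_centers
        centers = new_centers
    return centers
```

For example, `iterative_mapreduce([[1.0, 2.0, 10.0], [3.0, 11.0, 12.0]], [0.0, 15.0])` runs two iterations over two cached partitions and settles on one center per group of points.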
What I will present next
  • Our experience in applying cloud technologies to:
      • CAP3, an EST (Expressed Sequence Tag) sequence assembly program
      • HEP: processing large columns of physics data using ROOT
      • K-means clustering
      • Matrix multiplication
  • Performance analysis of MPI applications using a private cloud environment
Cluster Configurations
[Table: cluster configurations, with columns "DryadLINQ" and "Hadoop / MPI / Eucalyptus"]
Pleasingly Parallel Applications
[Figures: High Energy Physics; CAP3; Performance of CAP3; Performance of HEP]
Iterative Computations
[Figures: K-means; Matrix Multiplication; Performance of K-Means; Parallel Overhead, Matrix Multiplication]
Performance analysis of MPI applications using a private cloud environment
  • Eucalyptus and Xen based private cloud infrastructure
      • Eucalyptus version 1.4 and Xen version 3.0.3
      • Deployed on 16 nodes, each with 2 quad-core Intel Xeon processors and 32 GB of memory
      • All nodes are connected via 1 gigabit connections
  • Bare-metal and VMs use exactly the same software configurations
      • Red Hat Enterprise Linux Server release 5.2 (Tikanga) operating system
      • OpenMPI version 1.3.2 with gcc version 4.1.2
Different Hardware/VM configurations
  • Invariant used in selecting the number of MPI processes:
      Number of MPI processes = Number of CPU cores used
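A minimal Python sketch of this invariant follows. The helper name `mpi_processes` is hypothetical, and the only configurations exercised are ones stated elsewhere in the slides (16 nodes with 8 cores each, 1 VM per node with 8 processes, 8 VMs per node with 1 process each). Bare metal is modeled here, as an assumption, as a single "VM" that owns all 8 cores.

```python
CORES_PER_NODE = 2 * 4  # 2 quad-core processors per node, per the testbed slide


def mpi_processes(nodes, vms_per_node, cores_per_vm):
    """Number of MPI processes = number of CPU cores used."""
    cores_used_per_node = vms_per_node * cores_per_vm
    if cores_used_per_node > CORES_PER_NODE:
        raise ValueError("configuration uses more cores than a node has")
    return nodes * cores_used_per_node
```

With all 16 nodes in use, both `mpi_processes(16, 1, 8)` and `mpi_processes(16, 8, 1)` give 128 processes, so the different VM layouts are compared at the same process count.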
MPI Applications
[Figure: diagrams of the MPI applications, labeled with dimensions n, d, C, and 1]
Matrix Multiplication
[Figures: Performance, 64 CPU cores; Speedup, fixed matrix size (5184x5184)]
  • Implements Cannon's Algorithm [1]
  • Exchanges large messages
  • More susceptible to bandwidth than latency
  • At least 14% reduction in speedup between bare-metal and 1-VM per node
[1] S. Johnsson, T. Harris, and K. Mathur, "Matrix multiplication on the connection machine," in Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Reno, Nevada, United States, November 12-17, 1989). Supercomputing '89. ACM, New York, NY, 326-332. DOI= http://doi.acm.org/10.1145/76263.76298
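To make the communication pattern of Cannon's algorithm concrete, here is a single-process Python sketch that simulates the q x q process grid with nested lists. It is an illustration under stated assumptions, not the MPI code used in the study: the function names are hypothetical, block "shifts" stand in for the messages exchanged between neighboring processes, and the message-size helper assumes double-precision elements and an 8 x 8 grid for the 64-core runs.

```python
def matmul(a, b):
    """Plain triple-loop product of two square matrices (lists of lists)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def split(m, q):
    """Split an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    s = len(m) // q
    return [[[row[j * s:(j + 1) * s] for row in m[i * s:(i + 1) * s]]
             for j in range(q)] for i in range(q)]


def cannon(a, b, q):
    """Multiply a and b with Cannon's algorithm on a simulated q x q grid."""
    n = len(a)
    assert n % q == 0, "matrix size must be divisible by the grid size"
    s = n // q
    A, B = split(a, q), split(b, q)
    # Initial alignment: shift block row i of A left by i, block column j of B up by j.
    A = [[A[i][(j + i) % q] for j in range(q)] for i in range(q)]
    B = [[B[(i + j) % q][j] for j in range(q)] for i in range(q)]
    C = [[[[0] * s for _ in range(s)] for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        # Every "process" (i, j) multiplies the two blocks it currently holds.
        for i in range(q):
            for j in range(q):
                product = matmul(A[i][j], B[i][j])
                for r in range(s):
                    for c in range(s):
                        C[i][j][r][c] += product[r][c]
        # Each process passes its A block to the left and its B block upward.
        A = [[A[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        B = [[B[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    # Reassemble the block grid into one n x n matrix.
    return [[C[i // s][j // s][i % s][j % s] for j in range(n)] for i in range(n)]


def block_bytes(n, q, itemsize=8):
    """Size in bytes of one block, i.e. one message per shift (itemsize assumed)."""
    return (n // q) ** 2 * itemsize
```

Under those assumptions, `block_bytes(5184, 8)` is 3,359,232 bytes, roughly 3.2 MiB per block per shift, which is consistent with the slide's point that this application is more sensitive to bandwidth than to latency.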
Kmeans Clustering
[Figures: Performance, 128 CPU cores; Overhead]
  • Overhead = (P * T(P) - T(1)) / T(1)
  • Up to 40 million 3D data points
  • Amount of communication depends only on the number of cluster centers
  • Amount of communication << computation and the amount of data processed
  • At the highest granularity, VMs show at least ~33% of total overhead
  • Extremely large overheads for smaller grain sizes
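The overhead formula on this slide can be written as a small Python helper. The function names and the timings in the usage note are hypothetical and are not measurements from the study. Here P is the number of processes, T(P) the run time on P processes, and T(1) the sequential run time, so an overhead of 0 means perfect scaling.

```python
def parallel_overhead(p, t_p, t_1):
    """Overhead = (P * T(P) - T(1)) / T(1)."""
    if p < 1 or t_p <= 0 or t_1 <= 0:
        raise ValueError("p must be >= 1 and both times must be positive")
    return (p * t_p - t_1) / t_1


def speedup(t_p, t_1):
    """Speedup = T(1) / T(P); the overhead above equals P / speedup - 1."""
    return t_1 / t_p
```

As a made-up example, a job that takes 1280 s sequentially and 12.5 s on 128 processes has `parallel_overhead(128, 12.5, 1280.0) == 0.25`, that is, 25% of the total processor time is spent on something other than useful work.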
Concurrent Wave Equation Solver
[Figures: Performance, 64 CPU cores; Overhead]
  • Overhead = (P * T(P) - T(1)) / T(1)
  • Clear difference in performance and overheads between VMs and bare-metal
  • Very small messages (the message size in each MPI_Sendrecv() call is only 8 bytes)
  • More susceptible to latency
  • At 40560 data points, at least ~37% of total overhead in VMs
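The following Python sketch shows why the messages in a solver of this kind are so small. It is a single-process simulation, not the MPI program that was benchmarked: the function names, the stencil constant `C2`, and the fixed zero boundary values are assumptions. The points are divided among the ranks, and at every time step each rank needs only one value from each neighbor. The 8-byte message size matches the slide if that value is one double-precision number, which is an assumption here.

```python
import math
import struct

C2 = 0.09                             # (c * dt / dx) ** 2, hypothetical value
MESSAGE_BYTES = struct.calcsize("d")  # one double per neighbor exchange


def step(cur, old, left, right):
    """Advance one chunk by one time step, given one ghost value per side."""
    padded = [left] + cur + [right]
    return [2.0 * padded[i] - old[i - 1]
            + C2 * (padded[i - 1] - 2.0 * padded[i] + padded[i + 1])
            for i in range(1, len(padded) - 1)]


def serial(u0, steps):
    """Reference solver: the whole domain is a single chunk."""
    old, cur = u0[:], u0[:]
    for _ in range(steps):
        old, cur = cur, step(cur, old, 0.0, 0.0)
    return cur


def partitioned(u0, steps, ranks):
    """Same solver with the points split evenly across simulated ranks."""
    assert len(u0) % ranks == 0, "points must divide evenly among ranks"
    size = len(u0) // ranks
    cur = [u0[r * size:(r + 1) * size] for r in range(ranks)]
    old = [chunk[:] for chunk in cur]
    for _ in range(steps):
        # Stand-in for MPI_Sendrecv: each rank receives one value per neighbor.
        left = [0.0 if r == 0 else cur[r - 1][-1] for r in range(ranks)]
        right = [0.0 if r == ranks - 1 else cur[r + 1][0] for r in range(ranks)]
        new = [step(cur[r], old[r], left[r], right[r]) for r in range(ranks)]
        old, cur = cur, new
    return [x for chunk in cur for x in chunk]
```

Because each exchange carries a single value, the per-message cost is dominated by latency rather than bandwidth, which is the behavior the slide attributes to this application.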
Higher latencies - 1
  • 1 VM per node: 8 MPI processes inside the VM
  • 8 VMs per node: 1 MPI process inside each VM
  • domUs (VMs that run on top of Xen para-virtualization) are not capable of performing I/O operations
  • dom0 (privileged OS) schedules and executes I/O operations on behalf of domUs
  • More VMs per node => more scheduling => higher latencies
Higher latencies - 2
[Figure: Kmeans Clustering]
  • Lack of support for in-node communication => "sequentializing" parallel communication
  • Better support for in-node communication in OpenMPI
      • sm BTL (shared memory byte transfer layer)
  • Both OpenMPI and LAM-MPI perform equally well in the 8-VMs per node configuration
Conclusions and Future Works
  • Cloud technologies work for most pleasingly parallel applications
  • Runtimes such as MapReduce++ extend MapReduce to the iterative MapReduce domain
  • MPI applications experience moderate to high performance degradation (10% ~ 40%) in a private cloud
      • Dr. Edward Walker noticed 40% ~ 1000% performance degradations in commercial clouds [1]
  • Applications sensitive to latencies experience higher overheads
  • Bandwidth does not seem to be an issue in private clouds
  • More VMs per node => higher overheads
  • In-node communication support is crucial
  • Applications such as MapReduce may perform well on VMs?
[1] Walker, E.: Benchmarking Amazon EC2 for high-performance scientific computing, http://www.usenix.org/publications/login/2008-10/openpdfs/walker.pdf
Questions?
Thank You!
