ACCELERATE RESEARCH
NVIDIA TESLA
Lift the Barriers of HPC
   Faster / More Research:               Faster, more discovery, higher accuracy
   Maximum Performance:                  More performance per dollar
   Greater Budget & Power Efficiencies:  More performance per watt
GPU Impact to Computational Research

More Research + Maximum Performance + Efficient Power

More Research
   88 ns/day, 6x faster
   JAC simulation time, 23,558-atom DHFR
   Axel Kohlmeyer: Temple University

Maximum Performance
   318% higher performance for 54% added cost
   AMBER 11
   CPU: dual-socket Intel Xeon X5670, 2.93 GHz (12 cores)

Efficient Power
   2.5x Flops / Watt
   Tianhe-1A: CPU + GPU; Jaguar: CPU only
   Tianhe-1A: #2 Top500; Jaguar: #3 Top500
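
The price-performance claim on this slide can be checked with a little arithmetic. The sketch below is illustrative only: the function name is hypothetical, and because the slide does not say whether "318% higher" means 4.18x or 3.18x the CPU-only baseline, both readings are shown as labeled assumptions.

```python
def perf_per_dollar(perf_multiplier: float, cost_multiplier: float) -> float:
    """Relative performance per dollar versus a CPU-only baseline of 1.0."""
    return perf_multiplier / cost_multiplier


# Slide figure: 54% added cost, so the GPU node costs 1.54x the baseline.
cost = 1.0 + 0.54

# Reading A (assumption): "318% higher" means 4.18x the baseline throughput.
reading_a = perf_per_dollar(1.0 + 3.18, cost)

# Reading B (assumption): the GPU node delivers 318% of baseline, i.e. 3.18x.
reading_b = perf_per_dollar(3.18, cost)

print(f"performance per dollar, reading A: {reading_a:.2f}x")
print(f"performance per dollar, reading B: {reading_b:.2f}x")
```

Under either reading, the GPU node delivers at least twice the performance per dollar of the CPU-only node.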
GPU Computing by Numbers

                      2008       2012
   Universities         60        583
   CUDA Downloads     150K       1.5M
   Academic Papers   4,000     22,500
   Supercomputers        1         52
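
To put the 2008 and 2012 figures on a common footing, the growth factor for each metric can be computed directly. This is a minimal sketch using only the numbers on the slide; the dictionary and variable names are made up for illustration.

```python
# (2008 value, 2012 value) for each metric, as printed on the slide.
adoption = {
    "Universities": (60, 583),
    "CUDA downloads": (150_000, 1_500_000),
    "Academic papers": (4_000, 22_500),
    "Supercomputers": (1, 52),
}

# Growth factor = 2012 value divided by 2008 value.
growth = {name: later / earlier for name, (earlier, later) in adoption.items()}

for name, factor in growth.items():
    print(f"{name}: {factor:.1f}x")
```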
UCLA
Department of Physics and Astronomy
Challenge
   Accelerate Plasma Research with innovative Particle-in-Cell (PIC) Simulations
   Overcome space and power constraints in data centers
   Integrate into shared computing strategy across institutes and centers at UCLA

Solution
    GPU cluster
       96 server nodes
       288 NVIDIA Tesla GPUs
    Upgraded GPUs to NVIDIA Tesla M2090s (from M2070)
Impact
   Upgrades resulted in 20% higher performance with same power cost
   GPUs extended to new groups within department for greatly accelerated modeling
   Solves faster performance requirements within limited space and power constraints
   #235 on prestigious Top500 list with only 6 Racks
Add GPUs: Accelerate Science Applications

       CPU                 GPU
207 GPU-Accelerated Applications
              www.nvidia.com/appscatalog
3 Ways to Accelerate Applications

Applications

Libraries: “drop-in” acceleration
   Thrust
   BLAS, LAPACK
   FFT
   NPP
   Sparse
   Imaging
   RNG

OpenACC Directives: easily accelerate applications
   PGI Accelerator
   CAPS HMPP
   Cray

Programming Languages: maximum flexibility
   C
   C++
   Fortran
   OpenCL
   DirectCompute
   Java
   Python
GPU-Accelerated MATLAB Results

   10x speedup in data clustering via K-means clustering algorithm
   14x speedup in template matching routine (part of cancer cell image analysis)
   3x speedup in estimating 7.6 million contract prices using Black-Scholes model
   17x speedup in simulating the movement of 3072 celestial objects
   4x speedup in adaptive filtering routine (part of acoustic tracking algorithm)
   4x speedup in wave equation solving (part of seismic data processing algorithm)
AMBER 12 - Extreme Performance with K20

Benchmark: DHFR JAC, 23K atoms (NVE)
Running AMBER 12 GPU Support Revision 12.1, SPFP with CUDA 4.2.9, ECC off

[Bar chart: nanoseconds/day on a single node, DHFR]
   1 CPU node (blue; 2x Intel E5-2687W CPUs, 8 cores per CPU): 12.47 ns/day
   1 CPU+GPU node (green; 2x Intel E5-2687W CPUs, 8 cores per CPU, plus 2x NVIDIA K20 GPUs): 95.59 ns/day

Gain > 7.5X throughput/performance by adding just 2 K20 GPUs when compared to dual CPU performance
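
The "> 7.5X" figure follows from the two throughput numbers on this slide. The sketch below reproduces it and also shows what the difference means in wall-clock time. The 1,000 ns target is a hypothetical workload chosen for illustration, not a number from the deck.

```python
CPU_NS_PER_DAY = 12.47  # dual-CPU node, from the slide
GPU_NS_PER_DAY = 95.59  # same node plus 2x K20 GPUs, from the slide


def days_to_simulate(target_ns: float, ns_per_day: float) -> float:
    """Wall-clock days needed to simulate target_ns at a given throughput."""
    return target_ns / ns_per_day


speedup = GPU_NS_PER_DAY / CPU_NS_PER_DAY

# Hypothetical target: one microsecond (1,000 ns) of simulated time.
TARGET_NS = 1000.0
cpu_days = days_to_simulate(TARGET_NS, CPU_NS_PER_DAY)
gpu_days = days_to_simulate(TARGET_NS, GPU_NS_PER_DAY)

print(f"speedup: {speedup:.2f}x")
print(f"CPU only: {cpu_days:.1f} days, CPU+GPU: {gpu_days:.1f} days")
```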
NAMD 2.9

Outstanding Strong Scaling with Multi-STMV
Benchmark: 100 STMV (concatenation of 100 Satellite Tobacco Mosaic Virus systems) on hundreds of nodes
Running NAMD version 2.9
Each blue XE6 CPU node contains 1x AMD 1600 Opteron (16 cores per CPU).
Each green XK6 CPU+GPU node contains 1x AMD 1600 Opteron (16 cores per CPU) and an additional 1x NVIDIA X2090 GPU.

[Line chart: nanoseconds/day (0 to 1.2) versus number of nodes (32, 64, 128, 256, 512, 640, 768) for Fermi XK6 and CPU XK6; speedup annotations on the chart: 3.8x, 3.6x, 2.9x, 2.7x]

Accelerate your science by 2.7-3.8x when compared to CPU-based supercomputers
Try NVIDIA GPUs

   Available Applications: Applications Catalog
      www.nvidia.com/appscatalog
   Quick Application Acceleration: OpenACC Directives
      www.nvidia.com/gpudirectives
   Easy & Free GPU Test Drive: GPU Test Drive Cluster
      www.nvidia.com/gputestdrive
THANK YOU


GPU Computing In Higher Education And Research


Editor's Notes

  • #2: Welcome. Today I am excited to show you how NVIDIA Tesla GPU solutions are having a profound impact on science by breaking new barriers in computing performance. Researchers all over the world have embraced computing as the third pillar of science. Now, with Tesla GPU computing, explosive performance gains are allowing academic researchers to discover new theories, build more robust models, and publish more papers. I will share highlights of successful academic institutions and researchers achieving their goals of faster, better science while staying within academic budget constraints.
  • #3: With the growing need to use computing to reach new frontiers in science and research, we quickly identified the barriers standing in the way. First of all, we need to enable researchers and scientists to make faster and more frequent discoveries with higher accuracy. We also need to do that with maximum performance per dollar, because we all have budgets. And we need to do it in the most efficient manner, whether that is efficiency of power or of space.
  • #4: It’s exciting to show that GPU computing can address all of the most important barriers to delivering game-changing capability in computational research. For example, AMBER, a very popular computational chemistry application, lets researchers see 6x more simulation data per day: 88 nanoseconds in a day, which would take about a week to simulate on CPUs alone. Now let’s see how much that actually costs: by adding just 50% to the cost of a system, you get a performance gain of over 300%. And finally, GPUs are very power efficient. The #2 and #3 most powerful supercomputers in the world are a great example. China’s Tianhe-1A, in the #2 spot, is 2.5x more power efficient than Oak Ridge’s Jaguar, a CPU-only system.
  • #5: We have certainly reached the inflection point of broad adoption of GPU computing. Over 580 universities are teaching GPU computing as part of their regular curriculum. In fact, this year the Chinese Ministry of Education will require 200 of its higher education institutions to make NVIDIA’s CUDA parallel programming part of the curriculum. There is also a growing trend of government funding being awarded to GPU projects by the NIH, NSF, and DOE. This covers not only large projects, like Oak Ridge’s Titan, which incorporates some 18 thousand GPUs, but also university infrastructure grants and department or research grants to develop GPU computing applications, which are being awarded regularly.
  • #6: UCLA faced many of the challenges, or barriers, of HPC. They needed to accelerate a new, innovative plasma simulation, and they also needed to overcome space and power constraints. Their solution was a cluster with 96 nodes and 288 NVIDIA Tesla GPUs. The impact was considerable. The GPU upgrade resulted in 20% higher performance at the same power cost. Additionally, GPU access was extended to new groups within the department for greatly accelerated modeling. So they were able to offer faster performance while fitting within the budget they had for both space and power.
  • #8: NVIDIA’s GPU-accelerated application footprint is growing exponentially year over year. Computational scientists and developers have realized that the future is in parallel computing. Native GPU acceleration has now made its way into the most widely used and most widely published scientific applications. This breadth of applications enables the domain scientists in each school and department, specifically those who aren’t programmers, to reap the benefits of GPU acceleration.
  • #9: Just as important as ready-made applications for domain scientists, we have been developing easier and easier approaches to building your own applications for GPUs. The fastest and easiest approach is our “drop-in” libraries. Many scientific applications make wide use of standard template or math libraries. NVIDIA makes the most commonly used ones freely available, such as Thrust, a templated library, and many math libraries covering BLAS, FFT, and sparse matrices. Another extremely non-invasive way to get application acceleration is to apply OpenACC directives to your existing application. It takes only a few lines of code to get a 2-10 times speedup in a matter of hours or days. Finally, if you are a developer and need the maximum amount of performance, we support you in your native programming language.
  • #10: Engineers and scientists worldwide rely on MATLAB to accelerate the pace of discovery, innovation, and development in disciplines such as automotive, aerospace, electronics, financial services, biotech, and many other industries. Engineers and scientists are successfully employing GPU technology to accelerate their discipline-specific calculations. With minimal effort and without extensive knowledge of GPUs, you can now use the power of GPUs with MATLAB.
  • #11: (Previous script from the AMBER 11 benchmarks; the slide shows K20 results.) I briefly spoke about AMBER’s price performance in our opening. Now that you have seen how easy it is for researchers and scientists to benefit from GPU computing, with ready-to-go applications or easy-to-implement developer approaches such as directives, we should revisit price performance. Again, adding 2 GPUs to a single node increases the node cost by roughly 50%, yet we get much more than a 50% performance improvement. In fact, with this application we achieve more than 300% higher performance, making GPUs a clear winning investment. Additional information on the K20 slide: 1 CPU node (dual CPUs) = 12.47 ns/day; 1 CPU+GPU node (dual CPUs and 2 GPUs) = 95.59 ns/day.
  • #12: NAMD, another extremely popular molecular dynamics package, gets a 2.7x to 3.8x speedup with GPUs here. We benchmarked it with a typical STMV benchmark, which is 1 million atoms. So this is a very large system. But these are the systems and simulation times researchers need to make breakthroughs in science. Benchmark data by number of nodes:

        Nodes             32           64           128          256          512          640          768
        s/step GPU XK6    1.2414       0.660887     0.342743     0.199465     0.10837      0.089752     0.0774948
        s/step CPU XK6    4.62633      2.36707      1.19722      0.609124     0.314745     0.255016     0.209511
        ns/day Fermi XK6  0.069599     0.13073339   0.252084     0.433159     0.797269     0.962655     1.114913517
        ns/day CPU XK6    0.018676     0.03650082   0.072167     0.141843     0.274508     0.338802     0.412388848
  • #13: Today, more than ever, it’s easy for researchers, scientists, and academic institutions to benefit from GPU computing. We have ready-to-go GPU-accelerated applications (see the Applications Catalog). We are continuously investing in creating the easiest approaches to quickly accelerating your own applications, with OpenACC directives being our latest development. And finally, the GPU Test Drive cluster is the ideal way to easily test how a particular application accelerates with GPUs. The GPU Test Drive cluster is also pre-configured for easy purchase and installation.
  • #14: Thank you for following along. I hope we have shown you that GPU computing is making extraordinary contributions to science and research. Now is the time to reach your next scientific computing achievements by investing in NVIDIA Tesla GPUs, which have worldwide adoption and world-class developer support.
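
The NAMD numbers in note #12 can be cross-checked against each other. The sketch below converts seconds per step into nanoseconds per day and derives the GPU speedup at each node count. The 1 fs timestep is an assumption inferred from the data (it is the value that makes the s/step and ns/day rows agree) and is not stated anywhere in the deck. The function and variable names are illustrative.

```python
NODES = [32, 64, 128, 256, 512, 640, 768]
S_PER_STEP_GPU = [1.2414, 0.660887, 0.342743, 0.199465, 0.10837, 0.089752, 0.0774948]
S_PER_STEP_CPU = [4.62633, 2.36707, 1.19722, 0.609124, 0.314745, 0.255016, 0.209511]

SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000
TIMESTEP_FS = 1.0  # assumption: inferred from the data, not stated in the deck


def ns_per_day(seconds_per_step: float, timestep_fs: float = TIMESTEP_FS) -> float:
    """Simulated nanoseconds per wall-clock day for a given step time."""
    steps_per_day = SECONDS_PER_DAY / seconds_per_step
    return steps_per_day * timestep_fs / FS_PER_NS


# Speedup of the GPU nodes over the CPU-only nodes at each node count.
speedups = {
    n: cpu / gpu for n, gpu, cpu in zip(NODES, S_PER_STEP_GPU, S_PER_STEP_CPU)
}

for n in NODES:
    gpu_rate = ns_per_day(S_PER_STEP_GPU[NODES.index(n)])
    print(f"{n:>4} nodes: {gpu_rate:.4f} ns/day on GPU, {speedups[n]:.2f}x vs CPU")
```

The speedup shrinks as nodes are added, which is the usual pattern in strong scaling: the problem size is fixed, so each node has less work to offload as the node count grows.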