GROMACS 4.6 Pre-Beta and 4.6 Beta
Benefits of GPU Accelerated Computing
     Faster than CPU-only systems in all tests

     Large performance boost with marginal price increase

     Energy usage cut by more than half

     GPUs scale well within a node and over multiple nodes

     K20 GPU is our fastest and lowest-power high-performance GPU yet

       Try GPU-accelerated GROMACS for free – www.nvidia.com/GPUTestDrive
Great Scaling in Small Systems
     [Bar chart: Nanoseconds / Day vs. Number of Nodes; series: CPU Only, With GPU]
     With GPU: 8.36, 13.01 and 21.68 ns/day as the node count increases
     (3.7x, 3.6x and 3.2x the CPU-only result, respectively)

     Running GROMACS 4.6 pre-beta with CUDA 4.1
     Each blue node contains 1x Intel X5550 CPU (95W TDP, 4 Cores per CPU)
     Each green node contains 1x Intel X5550 CPU (95W TDP, 4 Cores per CPU) and 1x NVIDIA M2090 (225W TDP per GPU)
     Benchmark system: RNase in water with 16,816 atoms in a truncated dodecahedron box

     Get up to 3.7x performance compared to CPU-only nodes
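The speedup labels on this chart can be reproduced from the raw throughput figures. The sketch below is illustrative only: the ns/day values are the ones recorded in the editor's note for this slide (which lists the third data point under 4 nodes), and the function names are hypothetical, not part of GROMACS.

```python
# Throughput in ns/day, as recorded in the editor's note for this slide.
# Keys are the node counts listed in that note.
CPU_ONLY = {1: 2.26, 2: 3.58, 4: 6.7}
WITH_GPU = {1: 8.36, 2: 13.01, 4: 21.68}


def speedup(cpu_ns_per_day: float, gpu_ns_per_day: float) -> float:
    """Return how many times faster the GPU node is than the CPU-only node."""
    return gpu_ns_per_day / cpu_ns_per_day


def speedups_by_nodes(cpu: dict, gpu: dict) -> dict:
    """Map each node count to its GPU-over-CPU speedup."""
    return {n: speedup(cpu[n], gpu[n]) for n in sorted(cpu)}


if __name__ == "__main__":
    for nodes, factor in speedups_by_nodes(CPU_ONLY, WITH_GPU).items():
        print(f"{nodes} node(s): {factor:.1f}x")
```

Rounded to one decimal place, these ratios match the 3.7x, 3.6x and 3.2x labels on the chart.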
Additional Strong Scaling on Larger System
     128K Water Molecules
     [Bar chart: Nanoseconds / Day vs. Number of Nodes (8, 16, 32, 64, 128); series: CPU Only, With GPU]
     Labeled speedups over CPU only: 3.1x, 2.8x and 2x, decreasing as the node count grows

     Running GROMACS 4.6 pre-beta with CUDA 4.1
     Each blue node contains 1x Intel X5670 (95W TDP, 6 Cores per CPU)
     Each green node contains 1x Intel X5670 (95W TDP, 6 Cores per CPU) and 1x NVIDIA M2070 (225W TDP per GPU)

     Up to 128 nodes, NVIDIA GPU-accelerated nodes deliver 2-3x performance when compared to CPU-only nodes
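One way to read the shrinking speedup on this chart is through parallel efficiency: how much of the ideal gain from adding nodes each series actually keeps. The sketch below is an illustration, not part of the original deck. It uses the ns/day values recorded in the editor's note for this slide, takes the 8-node run as the baseline (an assumption, since no smaller run is reported), and uses a hypothetical helper name.

```python
# Node counts and throughput (ns/day) as recorded in the editor's note.
NODES = [8, 16, 32, 64, 128]
CPU_ONLY = [6.613, 11.282, 23.067, 42.284, 72.694]
WITH_GPU = [20.335, 37.016, 63.876, 96.628, 144.424]


def parallel_efficiency(nodes, ns_per_day):
    """Efficiency relative to the smallest run.

    For each run: (throughput gain over baseline) / (node-count gain over baseline).
    A value of 1.0 means perfect strong scaling from the baseline.
    """
    base_nodes, base_perf = nodes[0], ns_per_day[0]
    return [
        (perf / base_perf) / (n / base_nodes)
        for n, perf in zip(nodes, ns_per_day)
    ]


if __name__ == "__main__":
    cpu_eff = parallel_efficiency(NODES, CPU_ONLY)
    gpu_eff = parallel_efficiency(NODES, WITH_GPU)
    for n, c, g in zip(NODES, cpu_eff, gpu_eff):
        print(f"{n:>3} nodes: CPU-only {c:.0%}, with GPU {g:.0%}")
```

With these figures the GPU series keeps roughly 44% of ideal scaling at 128 nodes against roughly 69% for CPU only, which is consistent with the speedup label falling from 3.1x to 2x while the GPU nodes remain faster in absolute terms.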
Replace 3 Nodes with 2 GPUs
     ADH in Water (134K Atoms)
     [Bar chart: Nanoseconds/Day (left axis) and Cost in dollars (right axis)]
     4 CPU Nodes: 6.7 ns/day, $8,000
     1 Node + 2x M2090: 8.36 ns/day, $6,500

     Running GROMACS 4.6 pre-beta with CUDA 4.1
     The blue node contains 2x Intel X5550 CPUs (95W TDP, 4 Cores, $1000 per CPU)
     The green node contains 2x Intel X5550 CPUs (95W TDP, 4 Cores, $1000 per CPU) and 2x NVIDIA M2090 GPUs (225W TDP, $2000 per GPU)

     Save thousands of dollars and perform 25% faster
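The cost and speed comparison can be sketched from the component prices listed on the slide. This is an illustration only: it counts CPUs and GPUs and nothing else, so it yields $6,000 for the GPU node (matching the editor's note arithmetic) rather than the $6,500 shown on the chart, whose extra $500 is not explained in the source. Function names are hypothetical.

```python
CPU_PRICE = 1000  # dollars per Intel X5550, as listed on the slide
GPU_PRICE = 2000  # dollars per NVIDIA M2090, as listed on the slide


def component_cost(nodes: int, cpus_per_node: int, gpus_per_node: int) -> int:
    """Cost of CPUs and GPUs only; chassis, memory and network are ignored."""
    return nodes * (cpus_per_node * CPU_PRICE + gpus_per_node * GPU_PRICE)


def percent_faster(baseline_ns_per_day: float, new_ns_per_day: float) -> float:
    """Throughput gain of the new configuration over the baseline, in percent."""
    return (new_ns_per_day / baseline_ns_per_day - 1.0) * 100.0


if __name__ == "__main__":
    cpu_cost = component_cost(nodes=4, cpus_per_node=2, gpus_per_node=0)
    gpu_cost = component_cost(nodes=1, cpus_per_node=2, gpus_per_node=2)
    print(f"4 CPU nodes:       ${cpu_cost}, 6.7 ns/day")
    print(f"1 node + 2x M2090: ${gpu_cost}, 8.36 ns/day")
    print(f"GPU node is {percent_faster(6.7, 8.36):.0f}% faster")
```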
Greener Science
     ADH in Water (134K Atoms)
     [Bar chart: Energy Expended (KiloJoules Consumed), lower is better; bars: 4 Nodes (760 Watts) and 1 Node + 2x M2090 (640 Watts)]
     Energy Expended = Power x Time

     Running GROMACS 4.6 with CUDA 4.1
     The blue nodes contain 2x Intel X5550 CPUs (95W TDP, 4 Cores per CPU)
     The green node contains 2x Intel X5550 CPUs (95W TDP, 4 Cores per CPU) and 2x NVIDIA M2090 GPUs (225W TDP per GPU)

     In simulating each nanosecond, the GPU-accelerated system uses 33% less energy
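The "Energy Expended = Power x Time" relation can be worked through for one simulated nanosecond. The sketch below is illustrative: it assumes the ADH throughput figures from the cost comparison (6.7 ns/day for 4 CPU nodes, 8.36 ns/day for 1 node with 2x M2090) apply here, and it treats the wattages on the chart as constant draw. The helper name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def joules_per_ns(watts: float, ns_per_day: float) -> float:
    """Energy in joules to simulate one nanosecond: power x wall-clock time."""
    seconds_per_ns = SECONDS_PER_DAY / ns_per_day
    return watts * seconds_per_ns


if __name__ == "__main__":
    cpu = joules_per_ns(watts=760, ns_per_day=6.7)   # 4 CPU-only nodes
    gpu = joules_per_ns(watts=640, ns_per_day=8.36)  # 1 node + 2x M2090
    saving = 1.0 - gpu / cpu
    print(f"CPU only: {cpu / 1e6:.1f} MJ per ns")
    print(f"With GPU: {gpu / 1e6:.1f} MJ per ns")
    print(f"Energy saved per ns: {saving:.0%}")
```

Under these assumptions the results come out at about 9.8 MJ and 6.6 MJ per nanosecond, in line with the editor's note for this slide and the 33% figure in the caption.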
The Power of Kepler
     RNase Solvated Protein 24k Atoms
     [Bar chart; series: M2090, K20X; configurations: 1 CPU + 1 GPU, 1 CPU + 2 GPU, 2 CPU + 1 GPU, 2 CPU + 2 GPU]

     Running GROMACS version 4.6 beta
     The grey nodes contain 1 or 2 E5-2687W CPUs (150W each, 8 Cores per CPU) and 1 or 2 NVIDIA M2090s.
     The green nodes contain 1 or 2 E5-2687W CPUs (8 Cores per CPU) and 1 or 2 NVIDIA K20X GPUs (235W each).

     Upgrading an M2090 to a K20X increases performance 10-45%
K20X – Fast
     RNase Solvated Protein 24k Atoms
     [Bar chart: Nanoseconds / Day; series: CPU Only, With 1 K20X; configurations: 1 CPU, 2 CPUs]

     Running GROMACS version 4.6 beta
     The blue nodes contain 1 or 2 E5-2687W CPUs (150W each, 8 Cores per CPU).
     The green nodes contain 1 or 2 E5-2687W CPUs (8 Cores per CPU) and 1 or 2 NVIDIA K20X GPUs (235W each).

     Adding a K20X increases performance by up to 3x
K20X, the Fastest Yet
     192K Water Molecules
     [Bar chart: Nanoseconds / Day; configurations: CPU, CPU + K20X, CPU + 2x K20X]

     Running GROMACS version 4.6-beta2 and CUDA 5.0.35
     The blue node contains 2 E5-2687W CPUs (150W each, 8 Cores per CPU).
     The green nodes contain 2 E5-2687W CPUs (8 Cores per CPU) and 1 or 2 NVIDIA K20X GPUs (235W each).

     Using K20X nodes increases performance by 2.5x
Recommended GPU Node Configuration for GROMACS Computational Chemistry

     Workstation or Single Node Configuration
     # of CPU sockets                  2
     Cores per CPU socket              6+
     CPU speed (GHz)                   2.66+
     System memory per socket (GB)     32
     GPUs                              Kepler K10, K20, K20X
                                       Fermi M2090, M2075, C2075
     # of GPUs per CPU socket          1x
                                       Kepler-based GPUs (K20X, K20 or K10): need fast Sandy Bridge or perhaps the very fastest Westmeres, or high-end AMD Opterons
     GPU memory preference (GB)        6
     GPU to CPU connection             PCIe 2.0 or higher
     Server storage                    500 GB or higher
     Network configuration             Gemini, InfiniBand

     Scale to multiple nodes with same single node configuration
GPU Test Drive
     Experience GPU Acceleration for Computational Chemistry Researchers, Biophysicists

     Preconfigured with Molecular Dynamics Apps

     Remotely Hosted GPU Servers

     Free & Easy – Sign up, Log in and See Results

     www.nvidia.com/gputestdrive


GROMACS Molecular Dynamics on GPU


Editor's Notes

  • #4: Nodes / CPU only / GPU (ns/day): 1 / 2.26 / 8.36; 2 / 3.58 / 13.01; 4 / 6.7 / 21.68
  • #5: Nodes / CPU / GPU (ns/day): 8 / 6.613 / 20.335; 16 / 11.282 / 37.016; 32 / 23.067 / 63.876; 64 / 42.284 / 96.628; 128 / 72.694 / 144.424
  • #6: Nanoseconds/day: 8x X5550: 6.7; 2x M2090 + 2x X5550: 8.36. CPU nodes: 4 x 2 x $1000 = $8000. CPU + GPU node: 1 x 2 x $1000 + 2 x $2000 = $6000.
  • #7: GPU: 640 (watts) * 10,334 (seconds/nanosecond) = 6.6 MegaJoules. CPU: 760 (watts) * 12,895 (seconds/nanosecond) = 9.8 MegaJoules.
  • #12: Before we end this session, I would like to tell you about GPU Test Drive. It is an excellent resource for computational chemistry researchers such as yourself to evaluate the benefits of GPU computing in speeding up your simulations. Most importantly, it is free. NVIDIA, along with its partners, is offering access to a remotely hosted GPU cluster. You can run applications such as AMBER and NAMD to find out how your models speed up. You can also try code that you have developed to run on GPUs and see how it scales on an 8-GPU cluster. All you need to do is sign up and log in; it is really that easy! We have several partners who are demonstrating the GPU Test Drive on the GTC show floor. Please plan on visiting them. Sign-up forms have been given out. If you are interested, please fill them out and return them to me.