GPU Computing In Higher Education And Research

ACCELERATE RESEARCH
NVIDIA TESLA

Lift the Barriers of HPC
Faster / More Research: Faster, More Discovery, Higher Accuracy
Maximum Performance: More Performance per dollar
Greater Budget & Power Efficiencies: More Performance per watt

GPU Impact to Computational Research

More Research + Maximum Performance + Efficient Power

More Research: 88 ns/day, 6x faster JAC simulation time (23,558 atoms, DHFR, AMBER 11)
Maximum Performance: 318% higher performance, 54% added cost
Efficient Power: 2.5x Flops / Watt, Tianhe-1A (CPU + GPU, #2 Top500) compared with Jaguar (CPU only, #3 Top500)
CPU: Dual socket Intel Xeon X5670, 2.93 GHz (12 cores)
Axel Kohlmeyer: Temple University
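The "318% higher performance" and "54% added cost" figures can be combined into a single performance-per-dollar ratio. The sketch below is illustrative arithmetic only. It assumes that "318% higher" means 4.18x the baseline performance and that "54% added cost" means 1.54x the baseline system cost; the slide states neither explicitly. The function name is hypothetical.

```python
# Illustrative arithmetic only. The reading of the slide's percentages
# (gain and cost both relative to a CPU-only baseline) is an assumption.

def performance_per_dollar(perf_gain_pct: float, added_cost_pct: float) -> float:
    """Return performance per dollar relative to the baseline system."""
    relative_performance = 1.0 + perf_gain_pct / 100.0  # 318% higher -> 4.18x
    relative_cost = 1.0 + added_cost_pct / 100.0        # 54% added  -> 1.54x
    return relative_performance / relative_cost

ratio = performance_per_dollar(318.0, 54.0)
print(f"Performance per dollar vs. baseline: {ratio:.2f}x")
```

Under these assumptions the ratio comes to roughly 2.7x, which is one way to read the "More Performance per dollar" claim.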

GPU Computing by Numbers

Universities: 60 in 2008, 583 in 2012
CUDA Downloads: 150K in 2008, 1.5M in 2012
Academic Papers: 4,000 in 2008, 22,500 in 2012
Supercomputers: 1 in 2008, 52 in 2012
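The 2008 and 2012 figures can be turned into growth factors with a few lines of arithmetic. This is a minimal sketch; it assumes "150K" and "1.5M" mean 150,000 and 1,500,000, and the variable names are illustrative.

```python
# Growth factors between the 2008 and 2012 figures on the slide.
# "150K" and "1.5M" are assumed to mean 150,000 and 1,500,000.
figures = {
    "Universities": (60, 583),
    "CUDA downloads": (150_000, 1_500_000),
    "Academic papers": (4_000, 22_500),
    "Supercomputers": (1, 52),
}

# Ratio of the 2012 value to the 2008 value for each metric.
growth = {name: later / earlier for name, (earlier, later) in figures.items()}

for name, factor in growth.items():
    print(f"{name}: {factor:.1f}x")
```

On these numbers, universities and CUDA downloads grew about tenfold, academic papers between five and six times, and supercomputers 52 times.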

UCLA
Department of Physics and Astronomy
Challenge
Accelerate Plasma Research with innovative Particle-in-Cell (PIC) Simulations
Overcome space and power constraints in data centers
Integrate into shared computing strategy across institutes and centers at UCLA

Solution
GPU cluster
96 server nodes
288 NVIDIA Tesla GPUs
Upgraded GPUs to NVIDIA Tesla M2090s (from M2070)

Impact
Upgrade delivered 20% higher performance at the same power cost
GPUs extended to new groups within the department for greatly accelerated modeling
Meets higher performance requirements within limited space and power
Ranked #235 on the prestigious Top500 list with only 6 racks

Add GPUs: Accelerate Science Applications

[Figure: CPU and GPU]

207 GPU-Accelerated Applications
www.nvidia.com/appscatalog

3 Ways to Accelerate Applications

Applications

Libraries: “Drop-in” Acceleration
OpenACC Directives: Easily Accelerate Applications
Programming Languages: Maximum Flexibility

Libraries: Thrust, BLAS, LAPACK, FFT, NPP, Sparse, Imaging, RNG
OpenACC Directives: PGI Accelerator, CAPS HMPP, Cray
Programming Languages: C, C++, Fortran, OpenCL, DirectCompute, Java, Python

GPU-Accelerated MATLAB Results

10x speedup in data clustering via K-means clustering algorithm
14x speedup in template matching routine (part of cancer cell image analysis)
3x speedup in estimating 7.6 million contract prices using Black-Scholes model
17x speedup in simulating the movement of 3072 celestial objects
4x speedup in adaptive filtering routine (part of acoustic tracking algorithm)
4x speedup in wave equation solving (part of seismic data processing algorithm)

AMBER 12 - Extreme Performance with K20
DHFR JAC 23K Atoms (NVE), running AMBER 12 GPU Support Revision 12.1
SPFP with CUDA 4.2.9, ECC off

[Bar chart: Nanoseconds / Day for DHFR, 1 node each]
Blue node (2x Intel E5-2687W CPUs, 8 cores per CPU): 12.47 ns/day
Green node (2x Intel E5-2687W CPUs, 8 cores per CPU, plus 2x NVIDIA K20 GPUs): 95.59 ns/day

Gain > 7.5X throughput/performance by adding just 2 K20 GPUs
when compared to dual CPU performance
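The "> 7.5X" claim can be checked from the two bar values on the chart. The sketch below assumes 12.47 ns/day belongs to the CPU-only node and 95.59 ns/day to the node with two K20 GPUs, which is how the chart reads. The 1,000 ns target is a hypothetical workload added for illustration and does not come from the slide.

```python
# Throughput values read from the AMBER 12 chart (assumed bar assignment).
CPU_NODE_NS_PER_DAY = 12.47   # 2x Intel E5-2687W, no GPU
GPU_NODE_NS_PER_DAY = 95.59   # same CPUs plus 2x NVIDIA K20

speedup = GPU_NODE_NS_PER_DAY / CPU_NODE_NS_PER_DAY
print(f"Speedup from adding 2 K20 GPUs: {speedup:.2f}x")

# Hypothetical workload: wall-clock days needed to simulate 1,000 ns.
TARGET_NS = 1000.0
days_cpu = TARGET_NS / CPU_NODE_NS_PER_DAY
days_gpu = TARGET_NS / GPU_NODE_NS_PER_DAY
print(f"CPU-only node: {days_cpu:.1f} days, CPU+GPU node: {days_gpu:.1f} days")
```

On these values the ratio is about 7.7x, consistent with the slide's "> 7.5X" statement.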

NAMD 2.9
Outstanding Strong Scaling with Multi-STMV
Running NAMD version 2.9
100 STMV on Hundreds of Nodes (concatenation of 100 Satellite Tobacco Mosaic Virus)

Each blue XE6 CPU node contains 1x AMD 1600 Opteron (16 Cores per CPU).
Each green XK6 CPU+GPU node contains 1x AMD 1600 Opteron (16 Cores per CPU) and an additional 1x NVIDIA X2090 GPU.

[Chart: Nanoseconds / Day (0 to 1.2) vs. # of Nodes (32, 64, 128, 256, 512, 640, 768); series: Fermi XK6, CPU XK6; speedup labels on chart: 2.7x, 2.9x, 3.6x, 3.8x]

Accelerate your science by 2.7-3.8x when compared to CPU-based supercomputers

Try NVIDIA GPUs

Available Applications: Applications Catalog
www.nvidia.com/appscatalog

Quick Application Acceleration: OpenACC Directives
www.nvidia.com/gpudirectives

Easy & Free GPU Test Drive: GPU Test Drive Cluster
www.nvidia.com/gputestdrive
