This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-
AC52-07NA27344. Lawrence Livermore National Security, LLC
Sierra: Science Unleashed
HPC Advisory Council
Stanford University
Presenter: Rob Neely
Weapons Simulation and Computing Program Coordinator for Computing Environments
Feb 14, 2019
Outline
• LLNL’s Mission for Stockpile Stewardship
• Sierra system overview
• Motivation for Sierra
• Application preparation with the “Sierra Center of Excellence”
• Exemplar unclassified NNSA application areas
• Open Science Period Results
• Earthquake modeling with SW4
• Biological modeling of Cancer
• Neutron lifetime
• Extreme-scale RANS modeling
• Future directions: AI/ML in scientific computing
LLNL’s Mission Requirements for
Extreme-scale Computing
Lawrence Livermore Nat’l Lab (LLNL) is a
multidisciplinary national security laboratory
§ Established in 1952
§ Approximately 7,000 employees
§ 1 square mile, 684 facilities
Located in the SF Bay Area
Nuclear Security; Other National Security
Science and Technology: Bio-Science and Bio-Engineering; Earth and Atmospheric Science; High-Energy-Density Science; Lasers and Optical S&T; Advanced Materials and Manufacturing; Nuclear, Chemical, and Isotopic S&T; High-Performance Computing, Simulation, and Data Science
Nuclear security and the Stockpile Stewardship Program
§ Assess annually the safety, security, reliability, and effectiveness of the stockpile
§ Extend life of stockpile warheads;
adapting safety and security features
to evolving requirements
§ Strengthen underpinning science,
technology, and engineering
§ Nuclear nonproliferation and
counterterrorism
The Stockpile Stewardship Program has successfully maintained
the nuclear stockpile without explosive nuclear testing since 1992
Life extension programs can improve safety and security while providing no new military
capabilities (consistent with policy)
Legacy System
Improved safety and
security
Still cannot take it on the
track
Remanufacture as close as
possible to original
Remanufacture
approach
Improve safety and
security approach
Nuclear security is more than nuclear weapons
Non-proliferation
§ Verification monitoring
§ Sampling and analysis
§ Safety and security of special nuclear material
Treaty monitoring
§ Monitoring and detection for nuclear test ban treaty agreements
§ Analysis and characterization of nuclear tests
Counterterrorism
§ Forensics
§ Detection and diagnostics
§ Incident response
Providing simulation capabilities to the end user involves a large,
integrated ecosphere
[Figure: network diagram of the classified (SCF) computing environment from the ASCI era, showing the ASCI White and SKY platforms, PCA/PCB Linux clusters, visualization and login servers, NFS file servers, ATM/Fastlane encryptors, and the DisCom2 WAN and SecureNet links to SNL and LANL.]
The integrated ecosphere spans development and debugging tools, computers, facilities, filesystems and networks, security access controls, visualization, and codes.
High-performance computing (HPC) is central to sustaining the
nuclear stockpile in the absence of additional nuclear tests
What drives super-exascale requirements is 3D high resolution,
enhanced physics calculations done in an uncertainty quantification
regime
Advanced physics and algorithms (new codes?)
Higher fidelity and resolution in 3D (new platforms, scalable codes)
Predictive simulation through quantification of uncertainties (1000's of managed simulations and data analytics)
Our mission demands require
progress along all three axes to
advance our predictive capabilities
LLNL has a long history of fielding Top500 systems
Courtesy of Erich Strohmaier (LBL) SC’17 Invited talk
https://www.top500.org/static/media/uploads/presentations/top500_ppt_invited_201711.pdf
Normalized HPL performance
of deployed systems over the
first 50 Top500 lists (25 years)
What is not captured here is the derivative. Like a “moving average”, this will change
over the next decade as non-US/DOE sites dominate the top 3 positions.
Sierra System Overview
Sierra is a 125 petaflop peak system at the Department of Energy's Lawrence Livermore National Laboratory, supporting the national security mission and advancing science in the public interest
System performance
• Peak performance of 125
petaflops for modeling and
simulation
Each node has
• 2 IBM POWER9 processors
• 4 NVIDIA Tesla V100 GPUs
• 320 GiB of fast memory
• 256 GiB DDR4
• 64 GiB HBM2
• 1.6 TB of NVMe memory
The system includes
• 4,320 nodes
• 2:1 tapered Mellanox EDR InfiniBand
tree topology (50% global bandwidth)
with dual-port HCA per node
• 154 PB IBM Spectrum Scale
file system with 1.54 TB/s R/W
bandwidth
System Overview
What makes Sierra so effective?
GPUs: Sierra links more
than 17,000 deep-learning optimized
NVIDIA GPUs with the potential to
deliver exascale-level performance (a
billion-billion calculations per second)
for AI applications.
High-Speed Data Movement: High-speed
Mellanox interconnect and NVLink high-
bandwidth technology built into all of
Sierra’s processors supply the next-
generation information superhighways.
Memory: Sierra’s sizable memory gives
researchers a convenient launching point for
data-intensive tasks, an asset that allows for
greatly improved application performance and
algorithmic accuracy as well as AI training.
CPUs: IBM Power9 processors to rapidly
execute serial code, run storage and I/O
services, and manage data so the compute is
done in the right place.
IBM Power9 Processor
• Up to 24 cores
• Sierra’s P9s have 22 cores for yield optimization on
first processors
• PCI-Express 4.0
• Twice as fast as PCIe 3.0
• NVLink 2.0
• Coherent, high-bandwidth links to GPUs
• 14nm FinFET SOI technology
• 8 billion transistors
• Cache
• L1I: 32 KiB per core, 8-way set associative
• L1D: 32 KiB per core, 8-way
• L2: 256 KiB per core
• L3: 120 MiB eDRAM, 20-way
NVIDIA Volta Details
Performance with NVIDIA GPU Boost
• Tesla V100 for NVLink: 7.8 TeraFLOPS double precision, 15.7 TeraFLOPS single precision, 125 TeraFLOPS deep learning
• Tesla V100 for PCIe: 7 TeraFLOPS double precision, 14 TeraFLOPS single precision, 112 TeraFLOPS deep learning
Interconnect bandwidth (bi-directional): 300 GB/s NVLink; 32 GB/s PCIe
Memory: 16 GB CoWoS-stacked HBM2, 900 GB/s bandwidth
Lassen is an unclassified system similar to
Sierra, but smaller in size.
Its peak performance is 20.9 petaflops vs.
Sierra's 125 petaflops peak performance.
Multiple programs and institutional users
will use Lassen for leading-edge science,
such as predictive biology, computer-
aided drug discovery, or exploring novel
materials, on a world-class system.
Lassen System Details (subject to change
on final install)
Lassen
Zone CZ (Unclassified)
Nodes 720
POWER9 Processors per Node 2
GV100 (Volta) GPUs per Node 4
Node Peak (tFLOP/s) 29.1
System Peak (pFLOP/s) 20.9
Node Memory (GiB) 320
System Memory (PiB) 0.225
Interconnect 2x IB EDR
Off-Node Aggregate b/w (GB/s) 45.5
Compute Racks 40
Network and Infrastructure Racks 4
Storage Racks 4
Total Racks 48
Peak Power (MW) ~1.8
LLNL‘s Sierra System Installation
The Need to Pivot Toward Exascale
Advancements in HPC have happened over three major
epochs, and we’re settling on a fourth (heterogeneity)
Each of these eras defines a new common programming model and
transitions are taking longer; DOE/NNSA is entering a fourth era.
[Chart: peak performance (flops) of NNSA/LLNL systems from 1952 through ~2025, spanning the kiloscale through the exascale, annotated with the serial, vector, distributed-memory, many-core, and heterogeneous eras, the code-transition periods between them, and systems including ASCI Red, Blue Pacific, White, Purple, Red Storm, BlueGene/L, Roadrunner, Cielo, Sequoia, Trinity, Sierra, Crossroads, ATS-4, and ATS-5.]
Transitions are essential for progress: between 1952 and 2012 (Sequoia), NNSA saw a factor of ~1 trillion improvement in peak speed.
After a decade or so of uncertainty, history will most likely refer to this next era as the heterogeneous era.
Power
• Unreasonable
operating costs
• Today's tech =
$100M/year –
exceeds capital
investments
Chip /
processor
efficiency
• GPUs /
accelerators
• Simpler cores
• Unreliability at
near-threshold
voltages –
compounded by
scale of systems
On-node
"inscaling"
challenges
• Complex
hardware
• Massive on-
node
concurrency
requirements
• New
programming
and memory
models
Disruptive
Changes
• Code
evolution
• Algorithm
rewrites
• Platform
uncertainty
The push for more capable high-end computing is driving disruption in HPC application development
Meanwhile, "Big Data" and AI are driving industry trends and investments
• Social Networking, Machine/Deep Learning, Cloud Computing, …
• Software innovation in data-centric HPC is thriving
• Traditional HPC is facing strong headwinds if it can’t adapt
The trickle down (assuming continued increases in computational power are desired): external constraints → hardware designs → software supports → applications react.
Processor trends tell the parallelism story….
[Chart: microprocessor trend data, annotated: "Moore's Law – alive and well!"; "Barely hanging on (dynamic clock management, IPC tricks)"; "Flat (or even slightly down)"; "Mostly flat"; "Exponential growth?"]
The LLNL/ASC procurement strategy relies on
long-lived partnerships with HPC vendors
Lifecycle from concept to retirement (9 – 10 years):
• Pre-competitive discussions with potential bidders
• Establish representative evaluation benchmarks
• Request for Proposals with target requirements
• Contract selection
• Nail down target requirements as final deliverables
• Non-recurring engineering contracts (make the system better, not bigger)
• Establish Center of Excellence for Application Development
• Early delivery system(s)
• Machine delivery
• Machine acceptance
• ~5 years of effective use; begin next procurement cycle
• Machine retirement
• Long lead times in procurements allow vendor roadmaps to be clear enough to
bid, but still fungible
• Use of target requirements reduces risk to vendor, and thus cost to the Program
• Testbed architectures make up the unofficial 3rd tier of the ASC platform strategy:
evaluation of competing and potential future technologies at smaller scale
Predicting the life cycle behavior of nuclear warheads requires simulation at a wide range of spatial and temporal scales
[Figure: length scales from 10⁻¹⁰ m to 1 m, spanning nuclear physics, atomic physics, the molecular level, turbulence, continuum models, and the full system.]
Large, integrated multi-physics codes provide the basic simulation capabilities for a broad range of application domains
[Chart: source code size over time in a representative integrated code, now roughly 900 kloc, with an average of 60 kloc/year of additional code to support.]
§ Long life-time projects
§ 15+ years of development by large
teams (10 – 20+ people, ~50% CS &
50% Physicists)
§ Algorithms tuned for minimal turn-
around time based on today’s
hardware!
Example applications: HE cookoff, Navy railguns, fracture and failure, blast on structures, inertial confinement fusion, additive manufacturing
Sierra Represents a Major Shift for Our Mission-Critical Applications on Advanced Architectures
2004 – present
• IBM BlueGene (BG/L, BG/P, BG/Q), along with copious amounts of Linux-based cluster technology, dominates the HPC landscape at LLNL.
• Apps scale to an unprecedented 1.6M MPI ranks, but are still largely MPI-everywhere on "lean" cores
2014
• Heterogeneous GPGPU-based computing begins to look attractive with NVLINK and Unified Memory (P9/Volta)
• Apps must undertake a 180° turn from BlueGene toward heterogeneity and large-memory nodes
2015 – 20??
• Heterogeneous computing is here to stay, and Sierra gives us a solid leg-up in preparing for exascale and beyond
Application enablement through the Center of
Excellence – the original (and enduring) concept
Vendor
Application /
Platform
Expertise
Staff (code
developers)
Applications
and Key
Libraries
Hands-on collaborations
On-site vendor expertise
Customized training
Early delivery platform access
Application bottleneck analysis
Both portable and vendor-specific solutions
Technical areas: algorithms, compilers,
programming models, resilience, tools, …
Codes tuned for
next-gen
platforms
Rapid feedback
to vendors on
early issues
Publications,
early science
Steering
committee
The LLNL/Sierra
CoE is jointly
funded by the ASC
Program and the
LLNL Institution
Improved
Software Tools
and
Environment
Performance
Portability
High level goals of the Center of Excellence
Specific strategies for each of these goals were captured in an “Execution Strategy” document
and used to guide our implementation
To train ASC and LLNL institutional application teams in the art and science of developing high
performance software on the Sierra architecture
To have our flagship applications utilizing advanced features of Sierra as soon as the machine
is generally available, and fully optimized (4-6x improvement over Sequoia) for the
architecture within 1-2 years of deployment
To develop performance-portable solutions that allow application teams to focus on
maintaining a single source code that will be effective on platforms deployed at our sister
laboratories
To build strong ties with our vendor partners as a key element of our co-design strategy –
leveraging their deep knowledge of the hardware and algorithmic approaches specific to their
platform, and informing them of our long-term application requirements as they will impact
future machine designs
High resolution ICF simulations help us understand yield
degradation mechanisms
Lasers
heat
inside
surface of
hohlraum
Radiation
drives
capsule
inward
DT ignites
producing
fusion
energy
Basics of
Inertial
Confinement
Fusion
Chris Clouse
LLNL
Until recently, 3D simulations were often run as a double
check on routinely employed 2D approximations. These 3D
simulations were computationally expensive and, as a
result, were run as sparingly as possible. Design and analysis in LLNL's National Ignition Facility (NIF) Program has until now relied primarily on 2D approximations because 3D simulations could not be turned around quickly enough to make them a useful routine design tool.
Sierra’s architecture, which is expected to bring speed-ups
on the order of 10X for many of our 3D applications, will be
able to process these crucial simulations efficiently,
changing the way users compute by making the use of 3D
routine.
Taking Inertial Confinement Fusion (ICF) simulations to a higher level
“Sierra provided an unprecedented (98 billion cell)
simulation of two-fluid mixing in a spherical geometry.
Understanding hydrodynamic instability and the
transition to turbulence process is important in inertial
confinement fusion and High Energy Density (HED)
Physics.”
—Chris Clouse
Rob Rieben
LLNL
To maximize the full potential of new computer
architectures like Sierra’s, we are developing new
algorithms that play to the strengths of the machine—
specifically the high intensity of compute operations
available to data that remain local.
We are also developing next-generation simulation codes
for inertial confinement fusion and nuclear weapons
analysis that employ high-order, compute-intensive
algorithms that maximize the amount of computing done
for each piece of data retrieved from memory. These
schemes are very robust and should significantly improve
the overall analysis workflow for users. These advanced
simulation tools enabled by Sierra will improve throughput
along two axes: faster turnaround and less user
intervention.
Developing the next generation of
algorithms and codes
“The combination of our next generation,
compute intensive, high-order algorithms with
the processing power of Sierra will deliver an
exciting new era in multi-physics simulations.”
—Rob Rieben
Early Science Applications
Early Science - That time between when the system is first booted up (~3/1/18) and when it is transitioned to
the classified network for Programmatic use (~1/20/19) when friendly users help “shake down” the system.
Characteristics: Buggy or incomplete software stack, random bad hardware, rebooted on a whim, nursing
problems along through the night, etc…
Benefits… when it’s up, you can sometimes get the whole system to yourself for as long as you can keep things
running. Invaluable feedback to the facility and vendors, and a one-time shot for open science apps
Arthur Rodgers
LLNL
The character of strong earthquake shaking near a fault is
highly variable and poorly constrained by empirical data.
Supercomputers enable simulation of earthquake motions
to investigate the hazard and risk to buildings and
infrastructure before damaging events occur. The large
scale (~100 km) and fine detail (~10 m) of high frequency
waves (>5 Hz) from damaging earthquakes require today’s
most powerful computers.
Using SW4-RAJA, a 3D seismic simulation code ported to
GPU hardware, LLNL researchers are able to increase the
resolution of earthquake simulations to span most
frequencies of engineering interest on regional domains
with rapid throughput to enable sampling of various
rupture scenarios and sub-surface models.
Sierra advances resolution of
earthquake simulations
“Sierra’s thousands of GPU-accelerated nodes allow
SW4-RAJA to compute earthquake ground motions
with 100’s of billions of grid points in shorter run
times so we can resolve high-frequency waves and
investigate different rupture scenarios or earth
models.”
—Arthur Rodgers
Modeling a 7.0 quake on the Hayward Fault (SF Bay Area)
Courtesy: Artie Rodgers, LLNL
Computational Domain and Rupture
• 120 km x 80 km x 35 km
• Centered on:
• San Leandro
• hypocenter (green star)
• Rupture: MW 7.0
• Dimensions: 77 x 13 km
• Draped onto 3D geometry of
the Hayward Fault
National Lab HPC resources we run on:
Cori-II at LBNL/NERSC and Sierra at LC/LLNL
Cori Phase-II at LBNL/NERSC
#10 on Top 500
27 PetaFLOPS (peak)
68 Intel Xeon Phi (CPUs) per node
Sierra at LLNL
#3 on Top 500
119 PetaFLOPS (peak)
2 IBM POWER (CPU) per node and
4 NVIDIA GPUs per node
SW4 port to GPU
platforms supported by 3-
year LLNL internal project
Future platforms will rely heavily on GPUs (a minimal RAJA porting sketch follows below)
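The SW4-RAJA port mentioned above wraps SW4's loops in RAJA so that one source tree can target both CPUs and GPUs. The fragment below is a minimal sketch of that pattern under stated assumptions; it is not SW4 source, and the kernel body and names are illustrative only.

```cpp
// Minimal RAJA portability sketch (illustrative; not SW4 code).
// One loop body, selectable execution policy: CUDA on the V100 GPUs,
// or a host policy when CUDA is not enabled.
#include <RAJA/RAJA.hpp>

#if defined(RAJA_ENABLE_CUDA)
using exec_policy = RAJA::cuda_exec<256>;   // 256 threads per CUDA block
#else
using exec_policy = RAJA::seq_exec;         // serial fallback (or an OpenMP policy)
#endif

void scale_add(double* u, const double* v, double a, RAJA::Index_type n)
{
  // The same kernel source runs on CPU or GPU depending on exec_policy.
  RAJA::forall<exec_policy>(RAJA::RangeSegment(0, n),
    [=] RAJA_HOST_DEVICE (RAJA::Index_type i) {
      u[i] += a * v[i];   // stand-in for a real SW4 stencil update
    });
}
```

With the CUDA policy the arrays would need to live in GPU-accessible (e.g. unified) memory; memory management is omitted here to keep the sketch short.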
SW4 (CPU), fmax = 5.0 Hz on Cori-II:
• vs,min = 500 m/s; PPW = 8; T = 90 s; hmin = 12.5 m
• Npoints = 25.8 billion; Ntime-steps = 63,063; checkpointing every 4000 time-steps
• Ran on 8192 nodes (85%) of Cori-II (524,288 CPU cores); solver time 10.3 hours
SW4-RAJA (GPU), fmax = 5.0 Hz on Sierra:
• Same setup: vs,min = 500 m/s; PPW = 8; T = 90 s; hmin = 12.5 m; 25.8 billion points; 63,063 time-steps; no checkpointing
• Ran on 256 nodes (6%) of Sierra (1024 GPUs); solver time 10.3 hours
Summer 2018
Hayward Fault MW 7.0 runs
Verification of SW4 and SW4-RAJA:
Peak Ground Velocity maps
HF MW7.0 3DTOPO
fmax = 5 Hz
hmin = 12.5 m
Two codes, different platforms PGV maps
agree to machine precision!
Cori-II: PGV = 2.1797349453 m/s
Sierra: PGV = 2.1797349453 m/s
Verification of SW4 and SW4-RAJA:
Waveforms agree to single precision (SAC files)
SW4 on Cori-II 5 Hz
SW4-RAJA on Sierra 5 Hz Difference ~1e-7
SW4-RAJA on Sierra 5 Hz
SW4 on Cori-II 5 Hz
overplotted
Two codes, different platforms waveforms
agree to machine precision!
Comparison of near-fault accelerations for a
range of resolutions (fmax = maximum frequency)
As frequency content increases,
amplitudes increase and are
more consistent with
observations
Doubling fmax requires 16x more computational effort (see the scaling note below)
Sierra run Aug. 2018
fmax = 10.0 Hz
Livermore 37.6 km
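A back-of-the-envelope scaling argument for the 16x figure (my arithmetic, not taken from the slide): resolving frequencies up to $f_{\max}$ at a fixed number of points per wavelength sets the grid spacing $h \propto 1/f_{\max}$ in all three dimensions, and the explicit time step shrinks in proportion, so

$$\text{cost} \;\propto\; \left(\tfrac{1}{h}\right)^{3} \cdot \tfrac{1}{\Delta t} \;\propto\; f_{\max}^{4}, \qquad 2^{4} = 16 .$$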
Ground-motion intensity measures (GMIMs) versus distance, compared with the ASK (2014) ground-motion model (GMM)
HF MW 7.0, 10.0 Hz - PGV
Perspective on previous regional-scale Hayward Fault runs on
various machines
Joint Design of Advanced Computing Solutions for Cancer
A collaboration between DOE and NCI
Pilot 1: Pre-clinical Model Development
Pilot 2: RAS Biology in Membranes
Pilot 3: Precision Oncology Surveillance
Pilot leads: Yvonne Evrard (FNLCR), Rick Stevens (ANL), Dwight Nissley (FNLCR), Fred Streitz (LLNL), Lynn Penberthy (NCI), Gina Tourassi (ORNL)
Courtesy: Fred Streitz, Jim Glosli, LLNL
Oncogenic RAS is responsible for many human cancers
93% of all pancreatic
42% of all colorectal
33% of all lung cancers
1 million deaths/year world-wide
No effective inhibitors
Pathway transmits
signals
RAS is a switch
oncogenic RAS is “on”
RAS localizes to the
plasma membrane
RAS binds effectors (RAF) to activate growth (Simanshu et al., Cell 170, 2017)
Essential strategy: utilize appropriate scale methodology
for each component
Membrane evolves on ms
time frame across µm…
…while protein interactions
involve µs across nm
A problem of length
and time scale:
Essential strategy: utilize appropriate scale methodology
for each component
Model membrane
with RAS at micron
(continuum) scale
Model protein behavior
at molecular scale
• Use variational autoencoder to
learn reduced order model (ROM)
of system
• Map every region of simulation
into this ROM
• Rank every patch in the simulation by distance from others ("uniqueness") in ROM
• Identify the most unique patches as the most interesting (a minimal sketch of this ranking step follows below)
From continuum to molecular modeling
1. Identify regions of interest
using AI techniques
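The "uniqueness" ranking above can be illustrated in a few lines. This is a hedged, self-contained sketch: the latent vectors stand in for the output of the trained variational autoencoder, which is not shown, and the scoring rule (mean pairwise distance) is one reasonable choice rather than the project's actual metric.

```cpp
// Illustrative sketch: rank simulation patches by how "unique" their
// latent-space (ROM) encodings are. Encodings are placeholders for the
// output of the trained variational autoencoder.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

using Latent = std::vector<double>;   // one encoded patch

double distance(const Latent& a, const Latent& b) {
  double s = 0.0;
  for (std::size_t k = 0; k < a.size(); ++k) s += (a[k] - b[k]) * (a[k] - b[k]);
  return std::sqrt(s);
}

// Indices sorted from most to least "unique": uniqueness here is the mean
// distance of a patch's encoding to every other patch's encoding.
std::vector<std::size_t> rank_by_uniqueness(const std::vector<Latent>& z) {
  std::vector<double> score(z.size(), 0.0);
  for (std::size_t i = 0; i < z.size(); ++i) {
    for (std::size_t j = 0; j < z.size(); ++j)
      if (i != j) score[i] += distance(z[i], z[j]);
    if (z.size() > 1) score[i] /= static_cast<double>(z.size() - 1);
  }
  std::vector<std::size_t> order(z.size());
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(),
            [&](std::size_t a, std::size_t b) { return score[a] > score[b]; });
  return order;   // order.front() is the patch promoted to molecular resolution
}
```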
1. Identify region
From continuum to molecular modeling
• Lipid positions are randomly generated consistent with composition
• Ras proteins, if present, are inserted with appropriate conformation
• System is equilibrated to "smooth edges" (a rough illustration of the placement step follows below)
2. Generate particle positions
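For illustration only, a hypothetical sketch of the first part of this step: drawing lipid types from prescribed composition fractions and scattering them over a patch. The composition values, patch size, and helper names below are placeholders; conformations, force fields, and equilibration are not represented.

```cpp
// Hypothetical sketch of seeding a membrane patch: each lipid gets a type
// drawn from the composition fractions and a random in-plane position.
#include <cstddef>
#include <random>
#include <string>
#include <utility>
#include <vector>

struct Lipid { std::string type; double x_nm, y_nm; };

std::vector<Lipid> seed_patch(double length_nm, std::size_t n_lipids,
                              const std::vector<std::pair<std::string, double>>& composition,
                              unsigned seed = 42) {
  std::mt19937 rng(seed);
  std::uniform_real_distribution<double> pos(0.0, length_nm);
  std::vector<double> weights;
  for (const auto& c : composition) weights.push_back(c.second);
  std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());

  std::vector<Lipid> patch;
  patch.reserve(n_lipids);
  for (std::size_t i = 0; i < n_lipids; ++i)
    patch.push_back({composition[pick(rng)].first, pos(rng), pos(rng)});
  return patch;   // positions are later relaxed ("equilibrated to smooth edges")
}

// Example call (illustrative fractions only):
// auto patch = seed_patch(50.0, 20000,
//   {{"CHOL", 0.30}, {"DIPE", 0.20}, {"PAPS", 0.25}, {"PAP6", 0.25}});
```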
1. Identify region
From continuum to molecular modeling
3. Simulate at molecular resolution
2. Generate particle positions
1. Identify region
From continuum to molecular modeling
3. Simulate at molecular resolution
4. Feedback molecular information to
continuum model
2. Generate particle positions
Feedback
Preliminary Results
• 50nm X 50nm high-res study w/ 2 RAS
proteins (40 μs)
• Investigate phenomena witnessed
originally in μm X μm scale simulation
• Aggregation/repulsion of charged lipids
(PAP6, PAPS, DIPE, CHOL) following
“collision” of RAS
• Results demonstrate importance of
time and length-scale for simulation
❑ The lifetime of a neutron is ~15 min. (~881 sec), and its value has a profound effect on the mass composition of the universe.
❑ These two methods result in measurements of the lifetime that
have a 99% probability of being incompatible - why?
❑ Is there an unaccounted for systematic?
❑ Or more exciting, is there new physics causing them to
disagree?
❑ Which method is consistent with the prediction from the
Standard Model?
❑ Answering this question requires access to leadership class
supercomputers
Bottle experiments trap neutrons in a “bottle” and
measure how many are left after a period of time. If
neutrons also decay to other new unknown particles (?
dark matter ?) results will not agree with the known
theory of physics (QCD).
In the known theory of physics (QCD) neutrons decay to
protons. Beam experiments count how many protons
emerge from a beam of neutrons.
❑ Using Sierra the team simulated the fundamental theory of Quantum
Chromodynamics (QCD) and calculated the lifetime to an
unprecedented theoretical accuracy (blue bar). This preliminary
result has an uncertainty that is 4 times smaller than the next best
result in the field (equivalent to a factor of 16 times more data).
❑ This early science time allowed us to form a concrete plan to further
reduce the uncertainty and achieve a discriminating level of
precision in the theoretical prediction.
The Neutron Lifetime on Sierra Early Science
Pavlos Vranas LLNL, André Walker-Loud LBNL, et al. (CalLat Collaboration)
Beam: τ = 887.3 ± 3.1 s
Bottle: τ = 879.4 ± 0.6 s
(See the arithmetic note at the end of this slide.)
Courtesy: Pavlos Vranas, LLNL
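For context (my arithmetic, not the slide's): the quoted beam and bottle values differ by about 7.9 s, while the combined uncertainty is only about 3.2 s, i.e.

$$\frac{\tau_{\text{beam}} - \tau_{\text{bottle}}}{\sqrt{\sigma_{\text{beam}}^{2} + \sigma_{\text{bottle}}^{2}}} = \frac{887.3 - 879.4}{\sqrt{3.1^{2} + 0.6^{2}}} \approx \frac{7.9}{3.16} \approx 2.5\,\sigma,$$

which is consistent with the stated ~99% probability that the two measurements are incompatible.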
The Neutron Lifetime on Sierra Early Science
Pavlos Vranas LLNL, André Walker-Loud LBNL, et al. (CalLat Collaboration)
❑ The vertical gray band denotes the physical value of pion mass - these points are significantly more expensive than the rest to compute
- but the most valuable for the final predictions
❑ The green point in our publication cost as much computing time as all the other points combined
❑ The green point from Sierra has 10x more statistics than our publication
❑ The red point from our publication was not useful
❑ The red point from Sierra came from an entirely new calculation and is now very useful
❑ The blue point from Sierra was entirely unattainable from previous computers (it still needs more statistics to be useful)
[Figure, two panels: the published result (Nature 558 (2018) no. 7708, 91-94) and the preliminary Sierra Early Science result. The nucleon axial coupling gA is plotted against ε_π = m_π/(4πF_π) from 0 to 0.30 for lattice spacings a ≈ 0.15, 0.12, and 0.09 fm, with the model-average continuum extrapolation g_A(ε_π, a = 0) compared against g_A^PDG = 1.2723(23); the vertical axis spans gA ≈ 1.10 to 1.35.]
Stanford PSAAP Center Results
Extreme-Scale HPC with the Soleil-X Multiphysics Solver
M. Papadakis, L. Jofre, G. Iaccarino & A. Aiken (Stanford University)
▪ Research Objectives:
- Performance & scalability at extreme scales
- Demonstrate the potential of Soleil-X
▪ Performance & Scalability Analysis:
- Strong and weak scaling
- Cost distribution across the different physics
- Time-to-solution
- Task-mapping (CPU/GPU) studies
▪ Physics modeling:
- Compressible flow formulation
- Lagrangian particle tracking
- Algebraic & DOM radiation models
▪ Computational approach:
- Written in Regent
- Targets the Legion runtime
- GASNet library for communication
- Realm for OpenMP
- CUDA toolkit for GPU
[Plot: weak-scaling efficiency of the total Soleil-X solver versus node count (1 to 2048 nodes), with the maximum-communication pattern overlaid; a runtime bottleneck appears at the largest node counts and is under investigation.]
Conclusions & Ongoing Work
- At present, Soleil-X is on average 1.7 times faster than Soleil-MPI in terms of time-to-solution
- Soleil-MPI shows very good weak-scaling efficiency on Titan up to 256 nodes
- Soleil-X shows very good weak-scaling efficiency on both Titan and Sierra up to 256 nodes
- Ongoing investigation of the runtime bottleneck observed at 512 nodes
(The efficiency metric used in these plots is recalled below.)
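As a reminder of the metric behind these scaling plots (the standard definition, not anything specific to Soleil-X): with the work per node held fixed, weak-scaling efficiency on $N$ nodes is

$$E_{\text{weak}}(N) = \frac{T(1)}{T(N)},$$

so $E_{\text{weak}} = 1$ means the runtime is unchanged as nodes (and total problem size) grow.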
§ A high resolution simulation of two-fluid mixing in a spherical geometry was run to gain insight into the growth of a Rayleigh-Taylor instability.
§ High resolution simulations of instability growth are
not practical for routine use, so high resolution
simulations like this help guide the development of
sub-grid models that capture instability effects with
much less computational cost, which are used for
Inertial Confinement Fusion (ICF) calculations.
We ran a massive turbulent fluid mixing simulation on Sierra in
October 2018
In-situ Visualization of Mixing Layer (Courtesy: Cyrus Harrison, LLNL)
Highlights:
§ The 97.8 billion element simulation ran across 16,384 GPUs on 4,096 Sierra Compute
Nodes
§ The simulation application used CUDA via RAJA to run on the GPUs
§ Time-varying evolution of the mixing was visualized in-situ using Ascent, also
leveraging 16,384 GPUs
§ Ascent leveraged VTK-m to run visualization algorithms on the GPUs
§ The last time step was exported to the parallel file system for detailed post-hoc
visualization using VisIt
This simulation was the first to utilize over 16,000 GPUs on
Sierra
This successful simulation is the result of many years of preparation!
§ Running physics algorithms on the GPUs allows us to
scale and turn around massive simulations in a power
efficient way
§ For this simulation, the physics calculations alone (excluding setup, I/O, etc.) took 60 hours of compute time on Sierra's V100 GPUs
§ We estimate that the same problem would take 30 days of physics compute time using the entire Sequoia system
Porting has unlocked impressive performance on Sierra and
positions us well for future architectures
Compared node to node with other HPC architectures, we expect Sierra will provide 6x to 12x faster simulation turnaround time
Similarly, DOE’s visualization community has invested over many
years to create tools for in-situ and GPU-enabled visualization
[Timeline, 2010-2018: data-parallel visualization research (EAVL, DAX, PISTON) and distributed-memory work; the ASCAC Subcommittee on Exascale Computing; Strawman (ISAV15, SC16 performance modeling); ECP begins; Ascent (ISAV17); 2018: in-situ visualization of the 97.8 billion element simulation on Sierra using Ascent.]
On Sierra, computation far outpaces our ability to save data to the parallel file system for detailed post-hoc visualization.
Knowing this, we used a mix of in-situ (in-memory) and post-hoc (file-based) visualization:
§ In-situ: We used Ascent (https://github.com/alpine-dav/ascent) to visualize the time-varying evolution of the mixing layer and export the final mesh data to compressed HDF5 files for post-hoc exploration
—Ascent used VTK-m (http://m.vtk.org/) to run visualization algorithms on the GPUs
§ Post-hoc: We used VisIt (https://visit.llnl.gov) to explore details of the final simulation state
In-situ visualization of the time-varying details of the simulation avoided terabytes of I/O versus post-hoc visualization (a minimal usage sketch of the Ascent API follows below)
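For illustration, a hedged sketch of how a simulation hands its mesh to Ascent each cycle. The field name, image prefix, and one-call-per-cycle structure are illustrative assumptions, not the production integration used for this run.

```cpp
// Minimal Ascent in-situ sketch (illustrative; not the production integration).
// The mesh is described as a Conduit "Blueprint" node and published each cycle.
#include <ascent.hpp>
#include <conduit.hpp>

void render_cycle(const conduit::Node& mesh)   // Blueprint-conforming mesh + fields
{
  ascent::Ascent a;
  a.open();            // default options; rendering runs on the GPUs via VTK-m
  a.publish(mesh);     // hand the simulation's mesh and fields to Ascent

  conduit::Node actions;
  conduit::Node& add = actions.append();
  add["action"] = "add_scenes";
  // Pseudocolor plot of a hypothetical "mixing_fraction" field.
  add["scenes/s1/plots/p1/type"]  = "pseudocolor";
  add["scenes/s1/plots/p1/field"] = "mixing_fraction";
  add["scenes/s1/image_prefix"]   = "mixing_layer";   // one image per call

  a.execute(actions);
  a.close();
}
```

In a long-running code the Ascent instance would normally be opened once and reused across cycles; it is created locally here only to keep the sketch self-contained.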
We ran a massive turbulent fluid mixing simulation on Sierra in
October 2018
Post-hoc Visualization of
Mixing Layer
Conclusion: This successful simulation is the result of many years
of preparation in both our simulation and visualization tools
Highlights:
§ The 97.8 billion element simulation ran
across 16,384 GPUs on 4,096 Sierra
Compute Nodes
§ The simulation application used CUDA via
RAJA to run on the GPUs
§ Time-varying evolution of the mixing was
visualized in-situ using Ascent, also
leveraging 16,384 GPUs
§ Ascent leveraged VTK-m to run visualization
algorithms on the GPUs
§ The last time step was exported to the
parallel file system for detailed post-hoc
visualization using VisIt
Links:
§ LLNL's Sierra Supercomputer: https://computation.llnl.gov/computers/sierra
§ RAJA: https://github.com/llnl/raja
§ Ascent: https://github.com/alpine-dav/ascent
§ VTK-m: http://m.vtk.org/
§ VisIt: https://visit.llnl.gov
§ Conduit: https://github.com/llnl/conduit
VisIt
Future simulations for complex systems will
be guided by machine learning
[Diagram: experimental data and a complex design model feed a simulation-ensemble pipeline; on-line machine learning consumes the ensemble to produce model predictions and uncertainties, and suggests the next simulation and the next experiment.]
Cognitive simulation learns an approximate model for simulation state dynamics and output as it runs …
We’re developing an integrated roadmap for NNSA and LLNL
in cognitive simulation
The Virtuous Circle of HPC Simulation and Machine Learning
AI and machine learning are driven by large amounts of training data, which get funneled down to a small amount of data useful for decision analysis
HPC Simulation starts with a small
amount of data as initial conditions
And generates huge amounts of data
suitable for AI training
Cognitive simulation integrates machine
learning with simulation at multiple levels
Learning in the
simulation
loop
Learning on the
simulation loop
Steering the
simulation
loop
[Figure: simulation snapshots at T0, T100, and T6000.]
Molecular dynamics systems that decide when
and where to switch resolution
100X longer
simulation time
spans
Simulation systems that can predict
and correct mesh tangling
Greatly reduced
time on mesh
tuning
Learned models integrating
simulation and experiment
Improved
prediction,
Efficient UQ,
New designs
Scientific computing challenges differ from industry challenges
• Different type of scaling
• Data augmentation
techniques
• Image classification defined
by fine features
• Laws of physics
Goal is to leverage industry advancements and specialize when needed.
Validation needs to be a part of this!
Open Research Questions for Machine Learning in Science
• Physics Constraints: results must honor conservation laws (e.g., energy, mass)
• Sparse Data: it is sometimes intractable to generate sufficient training data
• Explainability: interpretability of results is necessary for predictive science
• Data Collection: lots of data with no standards for specification or sharing
These are all critical research areas to pursue if we are to demonstrate the
value of AI techniques in the pursuit of predictive science
Intelligence built into HPC simulations will change how we tackle
problems that have overwhelming amounts of data
Exascale Approach:
• Higher-fidelity subgrid models run inline with multi-physics integrated codes
• Analysis is performed by knowing where to look for interesting features in the data
• Performance optimized in applications to be best for the average case
AI-based Approach:
• AI mimics high fidelity models at a fraction of the computational cost
• Smart integrated AI identifies features of interest in throngs of data
• On-the-fly recompilation of code with optimal settings for a specific run
Expected Benefits:
• Applications run faster; we can increase the number of simulations for UQ and validation
• Insights from exascale simulations aren't lost in a data deluge
• More effective use of complex exascale architectures
Intelligent Simulation complements our exascale investments. Sierra will allow us to research and explore the possibilities.
§ Multi-scale problems are difficult to model
—Often solved using low-fidelity table lookups
—More accurate models are user-intensive and/or computationally expensive
§ Surrogate models can be trained on high-fidelity physics models
—Expensive calculations may be reduced to function calls (a hedged sketch of this idea follows after the scale ladder below)
Surrogate models to increase simulation coupling for
multi-scale problems
Scales bridged, from ab-initio to continuum:
• Ab-initio: inter-atomic forces, EOS, excited states
• Atoms: defects and interfaces, nucleation
• Long-time: defects and defect structures
• Microstructure: meso-scale multi-phase, multi-grain evolution
• Dislocation: meso-scale strength
• Crystal: meso-scale material response
• Continuum: macro-scale material response
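As a hedged illustration of "expensive calculations reduced to function calls" (the interface, names, and surrogate form below are hypothetical, not from any LLNL code): a fine-scale model behind a common interface can be replaced by a trained surrogate without changing the calling multi-physics code.

```cpp
// Hypothetical sketch: an expensive fine-scale evaluation and a trained
// surrogate share one interface; the host code just calls evaluate().
#include <cmath>
#include <vector>

struct FineScaleModel {
  virtual ~FineScaleModel() = default;
  virtual double evaluate(double density, double temperature) const = 0;
};

// Stand-in for the expensive path (e.g., an iterative fine-scale solve).
struct ExpensiveModel : FineScaleModel {
  double evaluate(double rho, double T) const override {
    double p = rho * T;
    for (int i = 0; i < 100000; ++i)                 // pretend this is costly
      p = 0.5 * (p + rho * T / (1.0 + 1.0e-6 * p));
    return p;
  }
};

// Stand-in for a trained surrogate: fitted coefficients applied to simple
// features; in practice this could be a small neural network.
struct SurrogateModel : FineScaleModel {
  std::vector<double> c{0.0, 1.0, 0.0, 0.0};         // "loaded from training"
  double evaluate(double rho, double T) const override {
    return c[0] + c[1] * rho * T + c[2] * rho + c[3] * std::log(1.0 + T);
  }
};

// The multi-physics host code is agnostic to which model it holds.
double pressure_update(const FineScaleModel& m, double rho, double T) {
  return m.evaluate(rho, T);
}
```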
The ATOM partnership is exploring new active
learning approaches to accelerate drug design
Approach: An open public-
private partnership
-Lead with computation
supported and validated by
targeted experiments
-Data-sharing to build models
using everyone’s data
Partners: LLNL, GSK, NCI,
UCSF
Product: An open-source
framework of tools and
capabilities
25 FTEs in shared Mission Bay (San Francisco) space
R&D started March 2018
ADAPD: Advanced Data Analytics for Proliferation
Detection
Indirect Data
New and responsive analytics
capabilities at the Labs
Focused Data
Science R&D
SME process and
physics models
NNSA R&D Testbeds
Establishing the next
generation of proliferation
analytics expertise
• Predictive Models
• MultiModal Detection
• Crossing Facilities and
Scenarios
DOE/NNSA
Partnerships
Student and University
Programs
Focus on Hard
Problems
Leveraging Growing
Industry Investments
Innovation in
low-profile
nuclear
proliferation
detection
LLNL’s Data Science Institute (established 2017)
Big Machines. Big Data. Big Ideas.
The Data Science Institute acts as the central hub for all data science activity at LLNL,
working to help lead, build, and strengthen the data science workforce, research, and
outreach to advance the state of the art of LLNL’s data science capabilities.
https://data-science.llnl.gov/
Data Science
Council
§ Inform S&T
investments
§ Communication
Workforce
Development
§ Reading groups
§ Courses
§ Summer Program
Web Presence
§ External/Internal
website
§ Mailing list
§ Slack
Outreach
§ Academic
engagement
§ Info sessions
§ Recruiting
Seminar
Series/
Workshop
§ Invited speakers
§ UC/LLNL multi-
day workshop
§ Strengthen and sustain LLNL’s data science workforce
§ Enhance internal coordination and communication of the data science
workforce across disciplines and programs
§ Promote and expand the visibility of data science work at LLNL
§ Evolve strategic data science vision and guide S&T investments
Strengthening the LLNL Data Science Workforce
Fostering a sense of community
Data Science Summer Institute
The Focus is the Future
1000+ applicants in FY18
12-week flexible internship focused on data science
Students funded 50% to work on solutions to challenge problems,
attend courses and seminars, and strengthen skills.
50 total students in FY17+FY18
§ 26 students in FY18
§ 24 students in FY17
Visiting Faculty
§ James Flegal, UC Riverside
§ Robert Gramacy, Virginia Tech
Challenge Problems
§ Topology Optimization
§ Machine Vision
§ Multimodal Data Exploration
§ Cyber
§ The 125 PF Sierra system is currently being stood up for use in the Stockpile
Stewardship mission of the NNSA
§ Preparation for the pivot to heterogeneous computing was difficult, but is demonstrating significant benefits in computational results
§ The Open Science period for Sierra allowed open applications significant allocations to
demonstrate cutting-edge science results
§ LLNL is pursuing an application strategy that will encompass both traditional exascale
computing needs, as well as the burgeoning AI/ML trends.
Conclusions
Disclaimer
This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United
States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or
implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus,
product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific
commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or
imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security,
LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government
or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.
More Related Content

PDF
The Sierra Supercomputer: Science and Technology on a Mission
PPTX
Introduction to Kafka Cruise Control
PDF
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
PDF
Network Programming: Data Plane Development Kit (DPDK)
PDF
Using eBPF for High-Performance Networking in Cilium
PDF
SRv6 Network Programming: deployment use-cases
PDF
LSFMM 2019 BPF Observability
PDF
Spark, ou comment traiter des données à la vitesse de l'éclair
The Sierra Supercomputer: Science and Technology on a Mission
Introduction to Kafka Cruise Control
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Network Programming: Data Plane Development Kit (DPDK)
Using eBPF for High-Performance Networking in Cilium
SRv6 Network Programming: deployment use-cases
LSFMM 2019 BPF Observability
Spark, ou comment traiter des données à la vitesse de l'éclair

What's hot (20)

PPTX
Log analysis using Logstash,ElasticSearch and Kibana
PDF
Architectural Overview of MapR's Apache Hadoop Distribution
PDF
Ceph issue 해결 사례
PDF
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...
PDF
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
PDF
Introduction to Hadoop
PDF
Monitoring End User Experience with Endpoint Agent
PDF
AMD EPYC™ Microprocessor Architecture
 
PDF
IMS-PSTN Interworking Flow
PDF
Red Hat OpenShift Container Platform Overview
PDF
Why Splunk Chose Pulsar_Karthik Ramasamy
PDF
Java Performance Analysis on Linux with Flame Graphs
PPTX
Better Place
PDF
Intel DPDK Step by Step instructions
PDF
Container Performance Analysis
PPTX
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
PPTX
The Power of HPC with Next Generation Supermicro Systems
PDF
MPLS Tutorial
PDF
EBPF and Linux Networking
PDF
Room 2 - 6 - Đinh Tuấn Phong - Migrate opensource database to Kubernetes easi...
Log analysis using Logstash,ElasticSearch and Kibana
Architectural Overview of MapR's Apache Hadoop Distribution
Ceph issue 해결 사례
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...
InfluxDB IOx Tech Talks: Query Processing in InfluxDB IOx
Introduction to Hadoop
Monitoring End User Experience with Endpoint Agent
AMD EPYC™ Microprocessor Architecture
 
IMS-PSTN Interworking Flow
Red Hat OpenShift Container Platform Overview
Why Splunk Chose Pulsar_Karthik Ramasamy
Java Performance Analysis on Linux with Flame Graphs
Better Place
Intel DPDK Step by Step instructions
Container Performance Analysis
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
The Power of HPC with Next Generation Supermicro Systems
MPLS Tutorial
EBPF and Linux Networking
Room 2 - 6 - Đinh Tuấn Phong - Migrate opensource database to Kubernetes easi...
Ad

Similar to Sierra Supercomputer: Science Unleashed (20)

PPTX
Sierra overview
PPTX
Panel Presentation - Tom DeFanti with Larry Smarr and Frank Wuerthwein - Naut...
PDF
The U.S. Exascale Computing Project: Status and Plans
PPT
From NCSA to the National Research Platform
PPTX
Integrated Optical Fiber/Wireless Systems for Environmental Monitoring
PDF
Accelerators at ORNL - Application Readiness, Early Science, and Industry Impact
PDF
The Cell at Los Alamos: From Ray Tracing to Roadrunner
PDF
Toward an Open and Unified Model for Heterogeneous and Accelerated Multicore ...
PDF
MVAPICH: How a Bunch of Buckeyes Crack Tough Nuts
PPTX
The Increasing Use of the National Research Platform by the CSU Campuses
PPTX
Toward a National Research Platform to Enable Data-Intensive Computing
PDF
05 Preparing for Extreme Geterogeneity in HPC
PDF
NSCC Training Introductory Class
PPTX
Toward a National Research Platform to Enable Data-Intensive Open-Source Sci...
PDF
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
PPTX
HPC Top 5 Stories: January 12, 2018
PPTX
The Pacific Research Platform:a Science-Driven Big-Data Freeway System
PDF
3 rd International Conference on Signal Processing, VLSI Design & Communicati...
PDF
Cyber Infrastructure as a Service to Empower Multidisciplinary, Data-Driven S...
Sierra overview
Panel Presentation - Tom DeFanti with Larry Smarr and Frank Wuerthwein - Naut...
The U.S. Exascale Computing Project: Status and Plans
From NCSA to the National Research Platform
Integrated Optical Fiber/Wireless Systems for Environmental Monitoring
Accelerators at ORNL - Application Readiness, Early Science, and Industry Impact
The Cell at Los Alamos: From Ray Tracing to Roadrunner
Toward an Open and Unified Model for Heterogeneous and Accelerated Multicore ...
MVAPICH: How a Bunch of Buckeyes Crack Tough Nuts
The Increasing Use of the National Research Platform by the CSU Campuses
Toward a National Research Platform to Enable Data-Intensive Computing
05 Preparing for Extreme Geterogeneity in HPC
NSCC Training Introductory Class
Toward a National Research Platform to Enable Data-Intensive Open-Source Sci...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
HPC Top 5 Stories: January 12, 2018
The Pacific Research Platform:a Science-Driven Big-Data Freeway System
3 rd International Conference on Signal Processing, VLSI Design & Communicati...
Cyber Infrastructure as a Service to Empower Multidisciplinary, Data-Driven S...
Ad

More from inside-BigData.com (20)

PDF
Major Market Shifts in IT
PDF
Preparing to program Aurora at Exascale - Early experiences and future direct...
PPTX
Transforming Private 5G Networks
PDF
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
PDF
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
PDF
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
PDF
HPC Impact: EDA Telemetry Neural Networks
PDF
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
PDF
Machine Learning for Weather Forecasts
PPTX
HPC AI Advisory Council Update
PDF
Fugaku Supercomputer joins fight against COVID-19
PDF
Energy Efficient Computing using Dynamic Tuning
PDF
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
PDF
State of ARM-based HPC
PDF
Versal Premium ACAP for Network and Cloud Acceleration
PDF
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
PDF
Scaling TCO in a Post Moore's Era
PDF
CUDA-Python and RAPIDS for blazing fast scientific computing
PDF
Introducing HPC with a Raspberry Pi Cluster
PDF
Overview of HPC Interconnects
Major Market Shifts in IT
Preparing to program Aurora at Exascale - Early experiences and future direct...
Transforming Private 5G Networks
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
HPC Impact: EDA Telemetry Neural Networks
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Machine Learning for Weather Forecasts
HPC AI Advisory Council Update
Fugaku Supercomputer joins fight against COVID-19
Energy Efficient Computing using Dynamic Tuning
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
State of ARM-based HPC
Versal Premium ACAP for Network and Cloud Acceleration
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Scaling TCO in a Post Moore's Era
CUDA-Python and RAPIDS for blazing fast scientific computing
Introducing HPC with a Raspberry Pi Cluster
Overview of HPC Interconnects

Recently uploaded (20)

PDF
Spectral efficient network and resource selection model in 5G networks
PPTX
PA Analog/Digital System: The Backbone of Modern Surveillance and Communication
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PDF
cuic standard and advanced reporting.pdf
PDF
GDG Cloud Iasi [PUBLIC] Florian Blaga - Unveiling the Evolution of Cybersecur...
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PPTX
Cloud computing and distributed systems.
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PDF
Advanced IT Governance
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PDF
Modernizing your data center with Dell and AMD
PDF
Approach and Philosophy of On baking technology
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
NewMind AI Monthly Chronicles - July 2025
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
NewMind AI Weekly Chronicles - August'25 Week I
Spectral efficient network and resource selection model in 5G networks
PA Analog/Digital System: The Backbone of Modern Surveillance and Communication
Mobile App Security Testing_ A Comprehensive Guide.pdf
cuic standard and advanced reporting.pdf
GDG Cloud Iasi [PUBLIC] Florian Blaga - Unveiling the Evolution of Cybersecur...
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
20250228 LYD VKU AI Blended-Learning.pptx
Diabetes mellitus diagnosis method based random forest with bat algorithm
Cloud computing and distributed systems.
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Advanced IT Governance
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Modernizing your data center with Dell and AMD
Approach and Philosophy of On baking technology
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
NewMind AI Monthly Chronicles - July 2025
Dropbox Q2 2025 Financial Results & Investor Presentation
“AI and Expert System Decision Support & Business Intelligence Systems”
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
NewMind AI Weekly Chronicles - August'25 Week I

Sierra Supercomputer: Science Unleashed

  • 1. 1 LLNL-PRES-767803 LLNL-PRES-767803 This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE- AC52-07NA27344. Lawrence Livermore National Security, LLC Sierra: Science Unleashed HPC Advisory Council Stanford University Presenter: Rob Neely Weapons Simulation and Computing Program Coordinator for Computing Environments Feb 14, 2019
  • 2. 2 LLNL-PRES-767803 Outline • LLNL’s Mission for Stockpile Stewardship • Sierra system overview • Motivation for Sierra • Application preparation with the “Sierra Center of Excellence” • Exemplar unclassified NNSA application areas • Open Science Period Results • Earthquake modeling with SW4 • Biological modeling of Cancer • Neutron lifetime • Extreme-scale RANS modeling • Future directions: AI/ML in scientific computing
  • 4. 4 LLNL-PRES-767803 Lawrence Livermore Nat’l Lab (LLNL) is a multidisciplinary national security laboratory § Established in 1952 § Approximately 7,000 employees § 1 square mile, 684 facilities 4 Located in the SF Bay Area
  • 5. 5 LLNL-PRES-767803 Nuclear Security Other National Security Bi0-Science and Bio-Engineering Earth and Atmospheric Science High-Energy-Density Science Lasers and Optical S&T Advanced Materials and Manufacturing Nuclear, Chemical, and Isotopic S&T High-Performance Computing, Simulation, and Data Science ScienceandTechnology
  • 6. 6 LLNL-PRES-767803 Nuclear security and the Stockpile Stewardship Program § Assess annually the safety, security, reliability and, effectiveness of the stockpile § Extend life of stockpile warheads; adapting safety and security features to evolving requirements § Strengthen underpinning science, technology, and engineering § Nuclear nonproliferation and counterterrorism The Stockpile Stewardship Program has successfully maintained the nuclear stockpile without explosive nuclear testing since 1992
  • 7. 7 LLNL-PRES-767803 Life extension programs can improve safety and security while providing no new military capabilities (consistent with policy) Legacy System Improved safety and security Still cannot take it on the track Remanufacture as close as possible to original Remanufacture approach Improve safety and security approach
  • 8. 8 LLNL-PRES-767803 Nuclear security is more than nuclear weapons Non-proliferation Treaty monitoring Counterterrorism § Verificationmonitoring § Samplingandanalysis § Safetyandsecurityof special nuclear material § Monitoringanddetection for nuclear test bantreaty agreements § Analysisandcharacterizationof nuclear tests § Forensics § Detectionanddiagnostics § Incident response
  • 9. 9 LLNL-PRES-767803 Providing simulation capabilities to the end user involves a large, integrated ecosphere Development and debugging tools PCA Linux Cluster (128) PCB Linux Cluster (88) ASCI Platform White 4 Visualization Servers DisCom2 WAN Cisco 8540 Router SCF Network 4-way striping on jumbo frame GigE 4 OC-12c Fastlane ATM Encryptors ASCI Platform SKY HPGN Routers OC-48c DisCom2 WAN (from SNL/NM & LANL) TBMX Switch Frame 1 OC-48c to SNL/CA Frame 2 Frame 3 Frame 4 SecureNet TidalWave EdgeWater SecureNet Router NES Encryptor Fastlane Encryptor gw1 gw2 NES Router LLNL ATM Switch Open LabNetESnet ATM Switch Cisco 8540 ATM switch Ice 32 VIEWS nodes GigE channel B451 Area B113 Area 4 32 4 4 4 2 4 7 8 2 4 4 4 4 LOGIN nodes 4 LOGIN nodes 4 LOGIN nodes 4 subnets 4 subnets 4 subnets 4 subnets 2 2 NFS 4 4 4 NFS R5 NFS MTD NFS R2 NFS D2 NFS E2 22 2 NFS Maple NFS Poplar 22 6 6 Alpha 255 Forest 5 5 5 Forest Cluster (6) SC Cluster (5) ICF Cluster (8) 8 8 1 Sun Ultra Denali (X Server) CLN#2 CLN#1 GigE, External/Internet, std frame GigE, Internal, std frame GigE, Internal, jumbo frame CLN#3 To SCF Server Network FDDI Switch To NSF Servers D2, E3, R2, R5, MTD J90 Router To SCF Server1 Network To LC Servers 2 Computers Facilities Filesystems and Networks Security access controls Visualization Codes
  • 10. 10 LLNL-PRES-767803 High-performance computing (HPC) is central to sustaining the nuclear stockpile in the absence of additional nuclear tests
  • 11. 11 LLNL-PRES-767803 What drives super-exascale requirements is 3D high resolution, enhanced physics calculations done in an uncertainty quantification regime 11 Advanced physics and algorithms (new codes?) Higher fidelity and resolution in 3D (new platforms, scalable codes) Predictive simulation through Quantification of Uncertainties (1000’s of managed simulations and data analytics) Our mission demands require progress along all three axes to advance our predictive capabilities
  • 12. 12 LLNL-PRES-767803 LLNL has a long history of fielding Top500 systems Courtesy of Erich Strohmaier (LBL) SC’17 Invited talk https://www.top500.org/static/media/uploads/presentations/top500_ppt_invited_201711.pdf Normalized HPL performance of deployed systems over the first 50 Top500 lists (25 years) What is not captured here is the derivative. Like a “moving average”, this will change over the next decade as non-US/DOE sites dominate the top 3 positions.
  • 14. 14 LLNL-PRES-767803 Sierra is a 125 Petaflop peak system based in the Department of Energy’s Lawrence Livermore National Laboratory supporting national security mission advancing science in the public interest
  • 15. 15 LLNL-PRES-767803 15 System performance • Peak performance of 125 petaflops for modeling and simulation Each node has • 2 IBM POWER9 processors • 4 NVIDIA Tesla V100 GPUs • 320 GiB of fast memory • 256 GiB DDR4 • 64 GiB HBM2 • 1.6 TB of NVMe memory The system includes • 4,320 nodes • 2:1 tapered Mellanox EDR InfiniBand tree topology (50% global bandwidth) with dual-port HCA per node • 154 PB IBM Spectrum Scale file system with 1.54 TB/s R/W bandwidth System Overview
  • 16. 16 LLNL-PRES-767803 What makes Sierra so effective? GPUs: Sierra links more than 17,000 deep-learning optimized NVIDIA GPUs with the potential to deliver exascale-level performance (a billion-billion calculations per second) for AI applications. High-Speed Data Movement: High-speed Mellanox interconnect and NVLink high- bandwidth technology built into all of Sierra’s processors supply the next- generation information superhighways. Memory: Sierra’s sizable memory gives researchers a convenient launching point for data-intensive tasks, an asset that allows for greatly improved application performance and algorithmic accuracy as well as AI training. CPUs: IBM Power9 processors to rapidly execute serial code, run storage and I/O services, and manage data so the compute is done in the right place.
  • 17. 17 LLNL-PRES-767803 IBM Power9 Processor: up to 24 cores (Sierra's P9s have 22 cores for yield optimization on first processors); PCI-Express 4.0 (twice as fast as PCIe 3.0); NVLink 2.0 (coherent, high-bandwidth links to GPUs); 14nm FinFET SOI technology with 8 billion transistors. Cache: L1I 32 KiB per core (8-way set associative), L1D 32 KiB per core (8-way), L2 256 KiB per core, L3 120 MiB eDRAM (20-way).
  • 18. 18 LLNL-PRES-767803 NVIDIA Volta Details (performance with NVIDIA GPU Boost). Tesla V100 for NVLink: 7.8 TeraFLOPS double precision, 15.7 TeraFLOPS single precision, 125 TeraFLOPS deep learning; 300 GB/s bi-directional NVLink interconnect bandwidth. Tesla V100 for PCIe: 7 TeraFLOPS double precision, 14 TeraFLOPS single precision, 112 TeraFLOPS deep learning; 32 GB/s PCIe interconnect bandwidth. Memory (both): 16 GB CoWoS stacked HBM2 at 900 GB/s bandwidth.
  • 19. 19 LLNL-PRES-767803 Lassen is an unclassified system similar to Sierra, but smaller in size: its peak performance is 20.9 petaflops vs. Sierra's 125 petaflops. Multiple programs and institutional users will use Lassen for leading-edge science, such as predictive biology, computer-aided drug discovery, or exploring novel materials, on a world-class system. Lassen system details (subject to change on final install): zone CZ (unclassified); 720 nodes; 2 POWER9 processors and 4 GV100 (Volta) GPUs per node; node peak 29.1 TFLOP/s; system peak 20.9 PFLOP/s; node memory 320 GiB; system memory 0.225 PiB; interconnect 2x IB EDR; off-node aggregate bandwidth 45.5 GB/s; 40 compute racks, 4 network and infrastructure racks, and 4 storage racks (48 total); peak power ~1.8 MW.
  • 21. 21 LLNL-PRES-767803 The Need to Pivot Toward Exascale
  • 22. 22 LLNL-PRES-767803 Advancements in HPC have happened over three major epochs, and we're settling on a fourth (heterogeneity). Each of these eras defines a new common programming model, and transitions are taking longer; DOE/NNSA is entering a fourth era. [Slide shows peak flops (kiloscale through exascale) of systems from 1952 to ~2025 across the Serial Era, Vector Era, Distributed Memory Era, and Many Core Era, with systems including ASCI Red, Blue Pacific, White, Purple, Red Storm, BlueGene/L, Roadrunner, Cielo, Sequoia, Trinity, Sierra, Crossroads, ATS-4 (exascale), and ATS-5, and code transition periods marked.] Transitions are essential for progress: between 1952 and 2012 (Sequoia), NNSA saw a factor of ~1 trillion improvement in peak speed. After a decade or so of uncertainty, history will most likely refer to this next era as the heterogeneous era.
  • 23. 23 LLNL-PRES-767803 The drive for more capable high-end computing is driving disruption in HPC application development. Power: unreasonable operating costs; with today's technology, roughly $100M/year, exceeding capital investments. Chip/processor efficiency: GPUs/accelerators; simpler cores; unreliability at near-threshold voltages, compounded by the scale of systems. On-node "inscaling" challenges: complex hardware; massive on-node concurrency requirements; new programming and memory models. Disruptive changes: code evolution; algorithm rewrites; platform uncertainty. Meanwhile, "Big Data" and AI are driving industry trends and investments (social networking, machine/deep learning, cloud computing, ...); software innovation in data-centric HPC is thriving; traditional HPC faces strong headwinds if it can't adapt. The trickle down: external constraints shape hardware designs, software supports, and applications react. Assume: continued increases in computational power are desired.
  • 24. 24 LLNL-PRES-767803 Processor trends tell the parallelism story. [Slide shows the familiar microprocessor trend chart with annotations on its curves: Moore's Law – alive and well! Barely hanging on (dynamic clock management, IPC tricks); flat (or even slightly down); mostly flat; exponential growth?]
  • 25. 25 LLNL-PRES-767803 The LLNL/ASC procurement strategy relies on long-lived partnerships with HPC vendors. Concept to retirement spans 9 – 10 years: pre-competitive discussions with potential bidders, establishing representative evaluation benchmarks, a request for proposals with target requirements, contract selection, nailing down target requirements as final deliverables, non-recurring engineering contracts (make the system better, not bigger), early delivery system(s), establishing a Center of Excellence for application development, machine delivery and acceptance, ~5 years of effective use, beginning the next procurement cycle, and machine retirement. Long lead times in procurements allow vendor roadmaps to be clear enough to bid, but still fungible; use of target requirements reduces risk to the vendor, and thus cost to the Program; testbed architectures make up the unofficial 3rd tier of the ASC platform strategy: evaluation of competing and potential future technologies at smaller scale.
  • 26. 26 LLNL-PRES-767803 Predicting the life cycle behavior of nuclear warheads requires simulation at a wide range of spatial and temporal scales, spanning distances from roughly 10^-10 m to 1 m: nuclear physics, atomic physics, molecular level, turbulence, continuum models, and the full system.
  • 27. 27 LLNL-PRES-767803 Large, integrated multi-physics codes provide the basic simulation capabilities for a broad range of application domains: HE cookoff, Navy railguns, fracture and failure, blast on structures, inertial confinement fusion, and additive manufacturing. These are long-lifetime projects with 15+ years of development by large teams (10 – 20+ people, ~50% computer scientists and ~50% physicists), averaging 60 Kloc/year of additional code to support in a representative integrated code (source code size over time approaching 900 kloc), with algorithms tuned for minimal turn-around time based on today's hardware!
  • 28. 28 LLNL-PRES-767803 Sierra represents a major shift for our mission-critical applications on advanced architectures. 2004 – present: IBM BlueGene (BG/L, BG/P, BG/Q), along with copious amounts of Linux-based cluster technology, dominates the HPC landscape at LLNL; apps scale to an unprecedented 1.6M MPI ranks, but are still largely MPI-everywhere on "lean" cores. 2014: heterogeneous GPGPU-based computing begins to look attractive with NVLINK and Unified Memory (P9/Volta); apps must undertake a 180-degree turn from BlueGene toward heterogeneity and large-memory nodes. 2015 – 20??: heterogeneous computing is here to stay, and Sierra gives us a solid leg up in preparing for exascale and beyond.
  • 29. 29 LLNL-PRES-767803 Application enablement through the Center of Excellence – the original (and enduring) concept: vendor application/platform expertise paired with our staff (code developers) and our applications and key libraries. Elements include hands-on collaborations, on-site vendor expertise, customized training, early-delivery platform access, application bottleneck analysis, both portable and vendor-specific solutions, and a steering committee, across technical areas such as algorithms, compilers, programming models, resilience, and tools. Outcomes: codes tuned for next-gen platforms, rapid feedback to vendors on early issues, publications and early science, improved software tools and environment, and performance portability. The LLNL/Sierra CoE is jointly funded by the ASC Program and the LLNL Institution.
  • 30. 30 LLNL-PRES-767803 High level goals of the Center of Excellence Specific strategies for each of these goals were captured in an “Execution Strategy” document and used to guide our implementation To train ASC and LLNL institutional application teams in the art and science of developing high performance software on the Sierra architecture To have our flagship applications utilizing advanced features of Sierra as soon as the machine is generally available, and fully optimized (4-6x improvement over Sequoia) for the architecture within 1-2 years of deployment To develop performance-portable solutions that allow application teams to focus on maintaining a single source code that will be effective on platforms deployed at our sister laboratories To build strong ties with our vendor partners as a key element of our co-design strategy – leveraging their deep knowledge of the hardware and algorithmic approaches specific to their platform, and informing them of our long-term application requirements as they will impact future machine designs
  • 31. 31 LLNL-PRES-767803 High resolution ICF simulations help us understand yield degradation mechanisms. Basics of Inertial Confinement Fusion: lasers heat the inside surface of the hohlraum; radiation drives the capsule inward; DT ignites, producing fusion energy.
  • 32. 32 LLNL-PRES-767803 Chris Clouse LLNL Taking Inertial Confinement Fusion (ICF) simulations to a higher level. Until recently, 3D simulations were often run as a double check on routinely employed 2D approximations. These 3D simulations were computationally expensive and, as a result, were run as sparingly as possible. Design and analysis in LLNL's National Ignition Facility (NIF) Program has until now relied primarily on 2D approximations because 3D simulations could not be turned around quickly enough to make them a useful routine design tool. Sierra's architecture, which is expected to bring speed-ups on the order of 10X for many of our 3D applications, will be able to process these crucial simulations efficiently, changing the way users compute by making 3D simulation routine. "Sierra provided an unprecedented (98 billion cell) simulation of two-fluid mixing in a spherical geometry. Understanding hydrodynamic instability and the transition to turbulence process is important in inertial confinement fusion and High Energy Density (HED) Physics." —Chris Clouse
  • 33. 33 LLNL-PRES-767803 Rob Rieben LLNL To maximize the full potential of new computer architectures like Sierra’s, we are developing new algorithms that play to the strengths of the machine— specifically the high intensity of compute operations available to data that remain local. We are also developing next-generation simulation codes for inertial confinement fusion and nuclear weapons analysis that employ high-order, compute-intensive algorithms that maximize the amount of computing done for each piece of data retrieved from memory. These schemes are very robust and should significantly improve the overall analysis workflow for users. These advanced simulation tools enabled by Sierra will improve throughput along two axes: faster turnaround and less user intervention. Developing the next generation of algorithms and codes “The combination of our next generation, compute intensive, high-order algorithms with the processing power of Sierra will deliver an exciting new era in multi-physics simulations.” —Rob Rieben
  • 34. 34 LLNL-PRES-767803 Early Science Applications. Early Science is the time between when the system is first booted up (~3/1/18) and when it is transitioned to the classified network for programmatic use (~1/20/19), during which friendly users help "shake down" the system. Characteristics: buggy or incomplete software stack, random bad hardware, reboots on a whim, nursing problems along through the night, etc. Benefits: when it's up, you can sometimes get the whole system to yourself for as long as you can keep things running; invaluable feedback to the facility and vendors, and a one-time shot for open science apps.
  • 35. 35 LLNL-PRES-767803 Arthur Rodgers LLNL The character of strong earthquake shaking near a fault is highly variable and poorly constrained by empirical data. Supercomputers enable simulation of earthquake motions to investigate the hazard and risk to buildings and infrastructure before damaging events occur. The large scale (~100 km) and fine detail (~10 m) of high frequency waves (>5 Hz) from damaging earthquakes require today’s most powerful computers. Using SW4-RAJA, a 3D seismic simulation code ported to GPU hardware, LLNL researchers are able to increase the resolution of earthquake simulations to span most frequencies of engineering interest on regional domains with rapid throughput to enable sampling of various rupture scenarios and sub-surface models. Sierra advances resolution of earthquake simulations “Sierra’s thousands of GPU-accelerated nodes allow SW4-RAJA to compute earthquake ground motions with 100’s of billions of grid points in shorter run times so we can resolve high-frequency waves and investigate different rupture scenarios or earth models.” —Arthur Rodgers
  • 36. 36 LLNL-PRES-767803 Modeling a 7.0 quake on the Hayward Fault (SF Bay Area). Courtesy: Artie Rodgers, LLNL
  • 37. 37 LLNL-PRES-767803 Computational Domain and Rupture: 120 km x 80 km x 35 km domain centered on San Leandro; hypocenter (green star); rupture MW 7.0, dimensions 77 x 13 km, draped onto the 3D geometry of the Hayward Fault.
  • 38. 38 LLNL-PRES-767803 National Lab HPC resources we run on: Cori-II at LBNL/NERSC and Sierra at LC/LLNL. Cori Phase-II at LBNL/NERSC: #10 on the Top500, 27 PetaFLOPS (peak), 68 Intel Xeon Phi cores (CPU) per node. Sierra at LLNL: #3 on the Top500, 119 PetaFLOPS (peak), 2 IBM POWER CPUs and 4 NVIDIA GPUs per node. The SW4 port to GPU platforms was supported by a 3-year LLNL internal project. Future platforms will rely heavily on GPUs.
  • 39. 39 LLNL-PRES-767803 Summer 2018 Hayward Fault MW 7.0 runs. SW4 (CPU) on Cori-II vs. SW4-RAJA (GPU) on Sierra, both with fmax = 5.0 Hz, vSmin = 500 m/s, PPW = 8, T = 90 s, hmin = 12.5 m, Npoints = 25.8 billion, and Ntime-steps = 63,063. Cori-II run: checkpointing every 4000 time-steps, on 8192 nodes (85% of Cori-II, 524,288 CPU cores), solver time 10.3 hours. Sierra run: no checkpointing, on 256 nodes (6% of Sierra, 1024 GPUs), solver time 10.3 hours.
  • 40. 40 LLNL-PRES-767803 Verification of SW4 and SW4-RAJA: Peak Ground Velocity maps HF MW7.0 3DTOPO fmax = 5 Hz hmin = 12.5 m Two codes, different platforms PGV maps agree to machine precision! Cori-II: PGV = 2.1797349453 m/s Sierra: PGV = 2.1797349453 m/s
  • 41. 41 LLNL-PRES-767803 Verification of SW4 and SW4-RAJA: Waveforms agree to single precision (SAC files) SW4 on Cori-II 5 Hz SW4-RAJA on Sierra 5 Hz Difference ~1e-7 SW4-RAJA on Sierra 5 Hz SW4 on Cori-II 5 Hz overplotted Two codes, different platforms waveforms agree to machine precision!
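The waveform comparison above amounts to differencing the two codes' synthetic seismograms sample by sample. A minimal sketch of that check in Python, using placeholder arrays rather than the actual SAC-file tooling; the signal shapes and file handling here are illustrative assumptions:

```python
import numpy as np

def max_relative_difference(wave_a, wave_b):
    """Largest absolute sample difference, normalized by the peak amplitude."""
    wave_a, wave_b = np.asarray(wave_a), np.asarray(wave_b)
    return np.max(np.abs(wave_a - wave_b)) / np.max(np.abs(wave_a))

# Hypothetical arrays standing in for the Cori-II and Sierra seismograms
# (in practice these would be read from the exported SAC files).
t = np.linspace(0.0, 90.0, 9001)
cori = np.sin(2 * np.pi * 0.5 * t)
sierra = cori + 1e-7 * np.random.default_rng(0).standard_normal(t.size)

print(f"max relative difference: {max_relative_difference(cori, sierra):.1e}")  # ~1e-7
```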
  • 42. 42 LLNL-PRES-767803 Comparison of near-fault accelerations for a range of resolutions (fmax = maximum frequency). As frequency content increases, amplitudes increase and are more consistent with observations. Doubling fmax requires 16x more computational effort, as the sketch below illustrates. Sierra run, Aug. 2018: fmax = 10.0 Hz, Livermore station at 37.6 km.
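The 16x figure follows from the refinement needed to resolve higher frequencies: halving the grid spacing in all three dimensions multiplies the point count by 8, and the stability-limited time step roughly halves, doubling the number of steps. A small sketch of that bookkeeping (the exponents are the standard scaling argument, not measured SW4 numbers):

```python
def relative_cost(fmax_ratio):
    """Relative solver cost when the maximum resolved frequency grows by fmax_ratio.
    Grid spacing shrinks by the same ratio in x, y, z (8x points for a doubling),
    and the time step shrinks proportionally (2x more steps for a doubling)."""
    points = fmax_ratio ** 3      # finer grid in three dimensions
    steps = fmax_ratio            # proportionally smaller time step
    return points * steps

print(relative_cost(2.0))   # 16.0 -- doubling fmax costs ~16x more
print(relative_cost(4.0))   # 256.0
```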
  • 43. 43 LLNL-PRES-767803 Ground motion intensity measures (GMIMs) versus distance, compared with the ASK (2014) ground motion model (GMM): HF MW 7.0, 10.0 Hz, PGV.
  • 44. 44 LLNL-PRES-767803 Perspective on previous regional-scale Hayward Fault runs on various machines
  • 45. 45 LLNL-PRES-767803 Joint Design of Advanced Computing Solutions for Cancer, a collaboration between DOE and NCI. Pilot 1: Pre-clinical Model Development (Yvonne Evrard, FNLCR; Rick Stevens, ANL). Pilot 2: RAS Biology in Membranes (Dwight Nissley, FNLCR; Fred Streitz, LLNL). Pilot 3: Precision Oncology Surveillance (Lynn Penberthy, NCI; Gina Tourassi, ORNL). Courtesy: Fred Streitz, Jim Glosli, LLNL
  • 46. 46 LLNL-PRES-767803 Oncogenic RAS is responsible for many human cancers: 93% of all pancreatic, 42% of all colorectal, and 33% of all lung cancers; 1 million deaths/year worldwide; no effective inhibitors. The pathway transmits signals: RAS is a switch, and oncogenic RAS is "on"; RAS localizes to the plasma membrane; RAS binds effectors (RAF) to activate growth (Simanshu, Cell 170, 2017).
  • 47. 47 LLNL-PRES-767803 Essential strategy: utilize the appropriate scale of methodology for each component. It is a problem of length and time scale: the membrane evolves on a ms time frame across µm, while protein interactions involve µs across nm.
  • 48. 48 LLNL-PRES-767803 Essential strategy: utilize appropriate scale methodology for each component Model membrane with RAS at micron (continuum) scale Model protein behavior at molecular scale
  • 49. 49 LLNL-PRES-767803 From continuum to molecular modeling, step 1: identify regions of interest using AI techniques. Use a variational autoencoder to learn a reduced order model (ROM) of the system; map every region of the simulation into this ROM; rank every patch in the simulation by its distance from others ("uniqueness") in the ROM; identify the most unique patches as the most interesting (see the sketch below).
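A minimal sketch of the patch-ranking step, assuming a trained encoder is available; the `encode` function, the array shapes, and the use of mean pairwise Euclidean distance as the "uniqueness" score are illustrative assumptions, not the project's actual implementation:

```python
import numpy as np

def rank_patches_by_uniqueness(latent):
    """Rank patches by how far they sit from all others in the reduced-order
    (latent) space; the farthest-out patches are the 'most unique'."""
    # latent: (n_patches, latent_dim), e.g. means from a trained VAE encoder
    diffs = latent[:, None, :] - latent[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)     # pairwise Euclidean distances
    uniqueness = dists.mean(axis=1)            # mean distance to every other patch
    return np.argsort(uniqueness)[::-1]        # most unique first

# Hypothetical usage: 'encode' is the trained VAE encoder, 'patches' the lipid fields.
# z = encode(patches)                 # (n_patches, latent_dim)
# order = rank_patches_by_uniqueness(z)
# candidates = order[:k]              # patches promoted to molecular-resolution runs
```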
  • 50. 50 LLNL-PRES-767803 From continuum to molecular modeling, step 2: generate particle positions for the identified region. Lipid positions are randomly generated consistent with the composition; RAS proteins, if present, are inserted with the appropriate conformation; the system is equilibrated to "smooth edges".
  • 51. 51 LLNL-PRES-767803 From continuum to molecular modeling, step 3: simulate the generated particle configuration at molecular resolution.
  • 52. 52 LLNL-PRES-767803 From continuum to molecular modeling, step 4: feed molecular information back to the continuum model.
  • 53. 53 LLNL-PRES-767803 Preliminary Results: a 50 nm x 50 nm high-resolution study with 2 RAS proteins (40 μs), investigating phenomena witnessed originally in the μm x μm scale simulation; aggregation/repulsion of charged lipids (PAP6, PAPS, DIPE, CHOL) following a "collision" of RAS. Results demonstrate the importance of time and length scale for simulation.
  • 54. 54 LLNL-PRES-767803 The Neutron Lifetime on Sierra Early Science. Pavlos Vranas LLNL, André Walker-Loud LBNL, et al. (CalLat Collaboration). The lifetime of a neutron is ~15 min (~881 sec), and its value has a profound effect on the mass composition of the universe. In the known theory of physics (QCD), neutrons decay to protons. Beam experiments count how many protons emerge from a beam of neutrons (τ_beam = 887.3 ± 3.1 s). Bottle experiments trap neutrons in a "bottle" and measure how many are left after a period of time (τ_bottle = 879.4 ± 0.6 s); if neutrons also decay to other new unknown particles (dark matter?), results will not agree with the known theory of physics (QCD). These two methods result in measurements of the lifetime that have a 99% probability of being incompatible: why? Is there an unaccounted-for systematic? Or, more exciting, is there new physics causing them to disagree? Which method is consistent with the prediction from the Standard Model? Answering this question requires access to leadership-class supercomputers. Using Sierra, the team simulated the fundamental theory of Quantum Chromodynamics (QCD) and calculated the lifetime to an unprecedented theoretical accuracy (blue bar); this preliminary result has an uncertainty that is 4 times smaller than the next best result in the field (equivalent to a factor of 16 times more data). This early science time allowed us to form a concrete plan to further reduce the uncertainty and achieve a discriminating level of precision in the theoretical prediction. Courtesy: Pavlos Vranas, LLNL
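For context, the quoted beam and bottle values already convey the tension on their own; a quick back-of-the-envelope check (assuming independent Gaussian uncertainties, which is a simplification) reproduces the ~99% incompatibility statement:

```python
import math

# Quick check of the ~99% incompatibility claim from the quoted beam/bottle values.
tau_beam, sig_beam = 887.3, 3.1      # seconds (beam experiments)
tau_bottle, sig_bottle = 879.4, 0.6  # seconds (bottle experiments)

diff = tau_beam - tau_bottle
sigma = math.hypot(sig_beam, sig_bottle)        # combined uncertainty
n_sigma = diff / sigma                          # ~2.5 sigma apart
p_agree = math.erfc(n_sigma / math.sqrt(2))     # two-sided probability of agreement
print(f"{n_sigma:.1f} sigma apart; ~{100 * (1 - p_agree):.0f}% probability of being incompatible")
```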
  • 55. 56 LLNL-PRES-767803 The Neutron Lifetime on Sierra Early Science. Pavlos Vranas LLNL, André Walker-Loud LBNL, et al. (CalLat Collaboration). The vertical gray band denotes the physical value of the pion mass; these points are significantly more expensive than the rest to compute, but the most valuable for the final predictions. The green point in our publication cost as much computing time as all the other points combined; the green point from Sierra has 10x more statistics than our publication. The red point from our publication was not useful; the red point from Sierra came from an entirely new calculation and is now very useful. The blue point from Sierra was entirely unattainable on previous computers (it still needs more statistics to be useful). Nature 558 (2018) no. 7708, 91-94; Sierra Early Science (preliminary). [Plot: the nucleon axial coupling gA versus epsilon_pi = m_pi/(4*pi*F_pi) at lattice spacings a of roughly 0.15, 0.12, and 0.09 fm, extrapolated to a = 0, with the model average compared against gA(PDG) = 1.2723(23).]
  • 56. 57 LLNL-PRES-767803 Stanford PSAAP Center Results: Extreme-Scale HPC with the Soleil-X Multiphysics Solver. M. Papadakis1, L. Jofre1, G. Iaccarino1 & A. Aiken1. Research objectives: performance and scalability at extreme scales; demonstrate the potential of Soleil-X. Performance and scalability analysis: strong and weak scaling; cost distribution across the different physics; time-to-solution; task-mapping (CPU/GPU) studies. Physics modeling: compressible flow formulation; Lagrangian particle tracking; algebraic and DOM radiation models. Computational approach: written in Regent; targets the Legion runtime; GASNet library for communication; Realm for OpenMP; CUDA toolkit for GPUs. [Weak-scaling efficiency plot for the total solver, 1 to 2048 nodes, showing the maximum communication pattern and a runtime bottleneck under investigation at the largest node counts.] Conclusions and ongoing work: at present, Soleil-X is (on average) 1.7 times faster than Soleil-MPI in terms of time-to-solution; Soleil-MPI presents very good weak-scaling efficiency on Titan up to 256 nodes; Soleil-X presents very good weak-scaling efficiency on both Titan and Sierra up to 256 nodes; ongoing investigation of the runtime bottleneck observed at 512 nodes.
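Weak-scaling efficiency of the kind plotted above is simply the smallest-run time divided by the time at N nodes, since the work per node is held fixed. A small illustration with made-up timings (not Soleil-X measurements):

```python
# Weak-scaling efficiency as typically reported: problem size per node is fixed,
# so ideal behavior is constant runtime and an efficiency of 1.0.
def weak_scaling_efficiency(t_base, t_n):
    """Efficiency of an N-node run relative to the base (e.g. 1-node) run."""
    return t_base / t_n

# Hypothetical timings (seconds per step) purely to illustrate the calculation.
timings = {1: 10.0, 64: 10.4, 256: 10.9, 512: 14.8}
for nodes, t in timings.items():
    print(f"{nodes:4d} nodes: efficiency {weak_scaling_efficiency(timings[1], t):.2f}")
```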
  • 57. 58 LLNL-PRES-767803 We ran a massive turbulent fluid mixing simulation on Sierra in October 2018. A high-resolution simulation of two-fluid mixing in a spherical geometry was run to gain insight into the growth of a Rayleigh-Taylor instability. High-resolution simulations of instability growth are not practical for routine use, so simulations like this help guide the development of sub-grid models that capture instability effects at much less computational cost and are used for Inertial Confinement Fusion (ICF) calculations. In-situ Visualization of Mixing Layer. Courtesy: Cyrus Harrison, LLNL
  • 58. 59 LLNL-PRES-767803 Highlights: § The 97.8 billion element simulation ran across 16,384 GPUs on 4,096 Sierra Compute Nodes § The simulation application used CUDA via RAJA to run on the GPUs § Time-varying evolution of the mixing was visualized in-situ using Ascent, also leveraging 16,384 GPUs § Ascent leveraged VTK-m to run visualization algorithms on the GPUs § The last time step was exported to the parallel file system for detailed post-hoc visualization using VisIt This simulation was the first to utilize over 16,000 GPUs on Sierra This successful simulation is the result of many years of preparation!
  • 59. 60 LLNL-PRES-767803 Porting has unlocked impressive performance on Sierra and positions us well for future architectures. Running physics algorithms on the GPUs allows us to scale and turn around massive simulations in a power-efficient way. For this simulation, the physics calculations alone (excluding setup, I/O, etc.) took 60 hours of compute time on Sierra's V100 GPUs. We estimate that using the entire Sequoia system, the same problem would take 30 days of physics compute time. Compared node to node to other HPC architectures, we expect Sierra will provide 6x to 12x faster simulation turnaround time.
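The Sequoia comparison implies roughly a 12x turnaround advantage for this particular problem, consistent with the 6x to 12x node-to-node expectation; a one-line check using only the figures quoted above (which are themselves estimates):

```python
# Turnaround comparison implied by the quoted figures (estimates, not benchmarks).
sierra_hours = 60.0            # physics compute time on Sierra's V100 GPUs
sequoia_hours = 30 * 24.0      # estimated physics compute time using all of Sequoia
print(f"~{sequoia_hours / sierra_hours:.0f}x faster turnaround on Sierra")   # ~12x
```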
  • 60. 61 LLNL-PRES-767803 Similarly, DOE's visualization community has invested over many years to create tools for in-situ and GPU-enabled visualization. [Timeline, 2010–2018: data-parallel visualization research (EAVL, DAX, PISTON) and distributed-memory work; the ISAV15 Strawman paper; SC16 performance modeling (Strawman); the ASCAC Subcommittee on Exascale Computing; ECP begins; the ISAV17 Ascent paper; and, in 2018, in-situ visualization of the 97.8 billion element simulation on Sierra using Ascent.]
  • 61. 62 LLNL-PRES-767803 On Sierra, computation far outpaces our ability to save data to the parallel file system for detailed post-hoc visualization. Knowing this, we used a mix of in-situ (in-memory) and post-hoc (file-based) visualization. In-situ: we used Ascent (https://github.com/alpine-dav/ascent) to visualize the time-varying evolution of the mixing layer and export the final mesh data to compressed HDF5 files for post-hoc exploration; Ascent used VTK-m (http://m.vtk.org/) to run visualization algorithms on the GPUs. Post-hoc: we used VisIt (https://visit.llnl.gov) to explore details of the final simulation state. In-situ visualization of the time-varying details of the simulation avoided terabytes of I/O versus post-hoc visualization.
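For readers unfamiliar with Ascent, its in-situ pattern is to publish the simulation's mesh (described with the Conduit Blueprint conventions) each cycle and execute a small set of actions against it. A minimal serial Python sketch of that pattern, using Conduit's built-in example mesh and its "braid" field rather than anything from the actual mixing-layer run:

```python
import conduit
import conduit.blueprint
import ascent

# Build a small example mesh in Conduit Blueprint form.
# In the real simulation this node would wrap the code's own mesh and fields.
mesh = conduit.Node()
conduit.blueprint.mesh.examples.braid("hexs", 32, 32, 32, mesh)

a = ascent.Ascent()
a.open()
a.publish(mesh)                      # hand the current cycle's data to Ascent

# Describe what to do in-situ: render a pseudocolor image of one field.
actions = conduit.Node()
add_scenes = actions.append()
add_scenes["action"] = "add_scenes"
add_scenes["scenes/s1/plots/p1/type"] = "pseudocolor"
add_scenes["scenes/s1/plots/p1/field"] = "braid"   # example field name from the braid mesh
add_scenes["scenes/s1/image_prefix"] = "mixing_layer"

a.execute(actions)
a.close()
```

In a real MPI-parallel run the same pattern applies, with the MPI-enabled Ascent module and a communicator passed to open(); the sketch above keeps to the serial case for brevity.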
  • 62. 63 LLNL-PRES-767803 We ran a massive turbulent fluid mixing simulation on Sierra in October 2018 Post-hoc Visualization of Mixing Layer
  • 63. 64 LLNL-PRES-767803 Conclusion: this successful simulation is the result of many years of preparation in both our simulation and visualization tools. Highlights: the 97.8 billion element simulation ran across 16,384 GPUs on 4,096 Sierra compute nodes; the simulation application used CUDA via RAJA to run on the GPUs; time-varying evolution of the mixing was visualized in-situ using Ascent, also leveraging 16,384 GPUs; Ascent leveraged VTK-m to run visualization algorithms on the GPUs; the last time step was exported to the parallel file system for detailed post-hoc visualization using VisIt. Links: LLNL's Sierra supercomputer: https://computation.llnl.gov/computers/sierra; RAJA: https://github.com/llnl/raja; Ascent: https://github.com/alpine-dav/ascent; VTK-m: http://m.vtk.org/; VisIt: https://visit.llnl.gov; Conduit: https://github.com/llnl/conduit
  • 64. 65 LLNL-PRES-767803 Future simulations for complex systems will be guided by machine learning. [Diagram: experimental data and a complex design model feed a simulation ensemble pipeline; on-line machine learning produces model predictions and uncertainties; cognitive simulation learns an approximate model for simulation state dynamics and output as it runs, suggesting the next simulation and the next experiment.] We're developing an integrated roadmap for NNSA and LLNL in cognitive simulation.
  • 65. 66 LLNL-PRES-767803 The Virtuous Circle of HPC Simulation and Machine Learning: AI and machine learning are driven by large amounts of training data, which gets funneled down to a small amount of data useful for decision analysis; HPC simulation starts with a small amount of data as initial conditions and generates huge amounts of data suitable for AI training.
  • 66. 67 LLNL-PRES-767803 Cognitive simulation integrates machine learning with simulation at multiple levels. Learning in the simulation loop: molecular dynamics systems that decide when and where to switch resolution, giving 100X longer simulation time spans. Learning on the simulation loop: simulation systems that can predict and correct mesh tangling, greatly reducing time spent on mesh tuning. Steering the simulation loop: learned models integrating simulation and experiment, giving improved prediction, efficient UQ, and new designs.
  • 67. 68 LLNL-PRES-767803 Scientific computing challenges differ from industry challenges • Different type of scaling • Data augmentation techniques • Image classification defined by fine features • Laws of physics Goal is to leverage industry advancements and specialize when needed. Validation needs to be a part of this!
  • 68. 69 LLNL-PRES-767803 Open Research Questions for Machine Learning in Science. Physics constraints: results must honor conservation laws (e.g., energy, mass). Sparse data: it is sometimes intractable to generate sufficient training data. Explainability: interpretability of results is necessary for predictive science. Data collection: lots of data with no standards for specification or sharing. These are all critical research areas to pursue if we are to demonstrate the value of AI techniques in the pursuit of predictive science.
  • 69. 70 LLNL-PRES-767803 Intelligence built into HPC simulations will change how we tackle problems that have overwhelming amounts of data. Exascale approach: higher-fidelity subgrid models run inline with multi-physics integrated codes; analysis is performed by knowing where to look for interesting features in the data; performance is optimized in applications to be best for the average case. AI-based approach: AI mimics high-fidelity models at a fraction of the computational cost; smart integrated AI identifies features of interest in throngs of data; on-the-fly recompilation of code with optimal settings for a specific run. Expected benefits: applications run faster, so we can increase the number of simulations for UQ and validation; insights from exascale simulations aren't lost in a data deluge; more effective use of complex exascale architectures. Intelligent simulation complements our exascale investments; Sierra will allow us to research and explore the possibilities.
  • 70. 71 LLNL-PRES-767803 Surrogate models to increase simulation coupling for multi-scale problems. Multi-scale problems are difficult to model: they are often solved using low-fidelity table lookups, and more accurate models are user-intensive and/or computationally expensive. Surrogate models can be trained on high-fidelity physics models, so expensive calculations may be reduced to function calls (see the sketch below). [Scale hierarchy: ab-initio (inter-atomic forces, EOS, excited states); atoms (defects and interfaces, nucleation); long-time (defects and defect structures); microstructure (meso-scale multi-phase, multi-grain evolution); dislocation (meso-scale strength); crystal (meso-scale material response); continuum (macro-scale material response).]
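A minimal sketch of the "expensive calculation becomes a function call" idea: fit a small regressor to samples from a stand-in high-fidelity model and call it inline. The model form, variable names, and the sklearn choice are illustrative assumptions, not an actual LLNL surrogate workflow:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Generate training samples from a stand-in "high-fidelity" pressure model.
# In practice these would come from expensive ab-initio or meso-scale calculations.
rng = np.random.default_rng(0)
rho_T = rng.uniform([0.5, 300.0], [20.0, 5000.0], size=(5000, 2))        # density, temperature
pressure = 2.0 * rho_T[:, 0] * rho_T[:, 1] * (1 + 0.01 * rho_T[:, 0])    # illustrative closed form

# Train the surrogate once, offline.
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
surrogate.fit(rho_T, pressure)

def eos_pressure(rho, T):
    """Inline call used by the host simulation in place of the expensive model."""
    return surrogate.predict([[rho, T]])[0]

print(eos_pressure(5.0, 1200.0))
```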
  • 71. 72 LLNL-PRES-767803 The ATOM partnership is exploring new active learning approaches to accelerate drug design. Approach: an open public-private partnership; lead with computation, supported and validated by targeted experiments; data-sharing to build models using everyone's data. Partners: LLNL, GSK, NCI, UCSF. Product: an open-source framework of tools and capabilities. 25 FTEs in shared Mission Bay, SF space; R&D started March 2018.
  • 72. 73 LLNL-PRES-767803 ADAPD: Advanced Data Analytics for Proliferation Detection, innovation in low-profile nuclear proliferation detection. [Diagram: indirect data, focused data science R&D, SME process and physics models, and NNSA R&D testbeds feed new and responsive analytics capabilities at the Labs (predictive models, multimodal detection, crossing facilities and scenarios), supported by DOE/NNSA partnerships, student and university programs, a focus on hard problems, and leveraging growing industry investments to establish the next generation of proliferation analytics expertise.]
  • 73. 74 LLNL-PRES-767803 LLNL's Data Science Institute (established 2017): Big Machines. Big Data. Big Ideas. The Data Science Institute acts as the central hub for all data science activity at LLNL, working to help lead, build, and strengthen the data science workforce, research, and outreach to advance the state of the art of LLNL's data science capabilities. https://data-science.llnl.gov/
  • 74. 75 LLNL-PRES-767803 Data Science Council § Inform S&T investments § Communication Workforce Development § Reading groups § Courses § Summer Program Web Presence § External/Internal website § Mailing list § Slack Outreach § Academic engagement § Info sessions § Recruiting Seminar Series/ Workshop § Invited speakers § UC/LLNL multi- day workshop § Strengthen and sustain LLNL’s data science workforce § Enhance internal coordination and communication of the data science workforce across disciplines and programs § Promote and expand the visibility of data science work at LLNL § Evolve strategic data science vision and guide S&T investments Strengthening the LLNL Data Science Workforce Fostering a sense of community
  • 75. 76 LLNL-PRES-767803 Data Science Summer Institute The Focus is the Future 1000+ applicants in FY18 12-week flexible internship focused on data science Students funded 50% to work on solutions to challenge problems, attend courses and seminars, and strengthen skills. 50 total students in FY17+FY18 § 26 students in FY18 § 24 students in FY17 Visiting Faculty § James Flegal, UC Riverside § Robert Gramacy, Virginia Tech Challenge Problems § Topology Optimization § Machine Vision § Multimodal Data Exploration § Cyber
  • 76. 77 LLNL-PRES-767803 Conclusions: The 125 PF Sierra system is currently being stood up for use in the Stockpile Stewardship mission of the NNSA. Preparation for the pivot to heterogeneous computing was difficult, but is demonstrating significant benefits in computational results. The Open Science period for Sierra gave open applications significant allocations to demonstrate cutting-edge science results. LLNL is pursuing an application strategy that will encompass both traditional exascale computing needs and the burgeoning AI/ML trends.
  • 77. Disclaimer This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.