Trinity: Advanced Technology System for the ASC Program
Manuel Vigil, Trinity Project Director
High Performance Computing Division, Los Alamos National Laboratory
HPC User Forum, September 17, 2014
LA-UR-14-27024
NERSC-8 and Trinity team activities
• Market surveys: conducted by the NERSC-8 and Trinity teams, including joint market surveys
• Creating requirements and releasing the joint RFP
• Vendor selection: some joint negotiations; common base SOW language?
• Negotiations: conducted separately, leading to separate SOWs
• The teams continue to work together for the Cori and Trinity deployments
Topics
• Trinity Status
• ASC Computing Strategy
• Trinity Project Drivers & Mission Need
• The Trinity System
  – High-level architecture overview
  – High-level capabilities
  – Schedule
• Summary
Trinity Status
• Formal Design Review and Independent Project Review completed in 2013
• Trinity/NERSC-8 RFP released August 2013
• Technical evaluation of the proposals completed September 2013
• Initial negotiations for both systems completed November 2013
• Initial Trinity system procurement completed in late 2013/early 2014
  – Before final approval, Trinity went back to the proposing vendors for a Best and Final Offer (BAFO)
  – Target delivery date of September 2015 is unchanged
• Trinity Best and Final Offer RFP released to vendors March 2014
• Trinity proposal evaluations and negotiations completed April 2014
• Trinity Procurement Approval – Notice of Consent from NNSA received May 16, 2014
• Trinity Independent Cost Review completed May 2014
• Trinity CD-2/3b approved July 3, 2014
• Trinity contract awarded to Cray, Inc. on July 9, 2014
ASC Computing Strategy
• Approach: Two classes of systems 
– Advanced Technology: First-of-a-kind systems that identify and foster 
technical capabilities and features that are beneficial to ASC 
applications 
– Commodity Technology: Robust, cost-effective systems to meet the 
day-to-day simulation workload needs of the program 
• Advanced Technology Systems 
– Leadership-class platforms 
– Pursue promising new technology paths 
– These systems are to meet unique mission needs and to help prepare 
the program for future system designs 
– Includes Non-Recurring Engineering (NRE) funding to enable delivery of 
leading-edge platforms 
– Acquire right-sized platforms to meet the mission needs 
– Trinity will be deployed by ACES (New Mexico Alliance for Computing at 
Extreme Scale, i.e., Los Alamos & Sandia) 
ASC Platform Timeline (FY12–FY21)
• Advanced Technology Systems (ATS): Cielo (LANL/SNL), Sequoia (LLNL), ATS 1 – Trinity (LANL/SNL), ATS 2 – (LLNL), ATS 3 – (LANL/SNL)
• Commodity Technology Systems (CTS): Tri-lab Linux Capacity Cluster II (TLCC II), CTS 1, CTS 2
• Each system moves through development & deployment, use, and retirement phases across the fiscal-year timeline, with system deliveries staggered over FY12–FY21
• The ASC Computing Strategy includes application code transition for all platforms
Trinity Project Drivers and Mission Need
• Satisfy the mission need for more capable platforms
  – Trinity is designed to support the largest, most demanding ASC applications
  – Increases in geometric and physics fidelities while satisfying analysts' time-to-solution expectations
• Mission Need developed with tri-lab input
• Trinity will support the tri-lab applications community at LLNL, SNL, and LANL
• Mission Need requirements are primarily driving memory capacity
  – Over 2 PB of aggregate main memory
  – Trinity is sized to run several jobs using about 750 TBytes of memory
Overview of Trinity Award
• Subcontractor – Cray, Inc.
• Firm Fixed Price subcontract:
  – Trinity platform (including file system)
  – Burst Buffer
  – 2 application regression test systems
  – 1 system development test system
  – On-site system and application analysts
  – Center of Excellence for Application Transition Support
  – Advanced power management
  – Trinity system maintenance
Trinity Platform
• Trinity is a single system that contains both Intel Haswell and Knights Landing processors
  – Haswell partition satisfies FY15 mission needs (well suited to existing codes)
  – KNL partition, delivered in FY16, results in a system significantly more capable than current platforms and provides application developers with an attractive next-generation target (and significant challenges)
  – Aries interconnect with the Dragonfly network topology
• Based on the mature Cray XC30 architecture, with Trinity introducing new architectural features
  – Intel Knights Landing (KNL) processors
  – Burst Buffer storage nodes
  – Advanced power management system software enhancements
Trinity Platform
• Trinity is enabling new architecture features in a production computing environment
  – Trinity's architecture will introduce new challenges for code teams: transition from multi-core to many-core, a high-speed on-chip memory subsystem, and wider SIMD/vector units
  – Tightly coupled solid-state storage serves as a "burst buffer" for checkpoint/restart file I/O and data analytics, enabling improved time-to-solution efficiencies (a checkpoint-redirection sketch follows this slide)
  – Advanced power management features enable measurement and control at the system, node, and component levels, allowing exploration of application performance/watt and reducing total cost of ownership
• Managed risk
  – Cray XC30 architecture minimizes system software risk and provides a mature high-speed interconnect
  – Haswell partition is low-risk technology, available in Fall CY14
  – KNL is higher risk due to new technology, but provides a reasonable path, and resource, for code teams to transition to the many-core architecture
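To make the burst-buffer idea concrete, here is a minimal sketch of how a code might redirect its checkpoint writes to burst-buffer storage rather than the parallel file system. It is illustrative only: the environment variable name (BB_PATH) and the fallback scratch path are assumptions, since the production burst-buffer interface was still being defined at the time of this talk.

```c
/* Minimal sketch: write a checkpoint to burst-buffer storage if a
 * (hypothetical) BB_PATH variable names its mount point, otherwise
 * fall back to a parallel-file-system scratch directory. */
#include <stdio.h>
#include <stdlib.h>

static void write_checkpoint(const char *dir, int rank,
                             const double *state, size_t n)
{
    char path[4096];
    snprintf(path, sizeof(path), "%s/ckpt_rank%04d.dat", dir, rank);

    FILE *f = fopen(path, "wb");
    if (!f) { perror("checkpoint open"); return; }
    fwrite(state, sizeof(double), n, f);  /* bulk write absorbed by the burst buffer */
    fclose(f);
}

int main(void)
{
    const char *bb  = getenv("BB_PATH");            /* assumed variable name */
    const char *dir = bb ? bb : "/lustre/scratch";  /* illustrative PFS fallback */

    size_t n = (size_t)1 << 20;
    double *state = calloc(n, sizeof(double));
    if (!state) return 1;

    write_checkpoint(dir, 0, state, n);
    free(state);
    return 0;
}
```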
Trinity Architecture Overview

Metric                                Trinity          Haswell Partition   KNL Partition
Node architecture                     KNL + Haswell    Haswell             KNL
Memory capacity                       2.11 PB          >1 PB               >1 PB
Memory bandwidth                      >7 PB/s          >1 PB/s             >1 PB/s + >4 PB/s
Peak FLOPS                            42.2 PF          11.5 PF             30.7 PF
Number of nodes                       19,000+          >9,500              >9,500
Number of cores                       >760,000         >190,000            >570,000
Number of cabinets (incl. I/O & BB)   112
PFS capacity (usable)                 82 PB
PFS bandwidth (sustained)             1.45 TB/s
BB capacity (usable)                  3.7 PB
BB bandwidth (sustained)              3.3 TB/s
Compute Node Specifications

Metric                            Haswell             Knights Landing
Memory capacity (DDR)             2 x 64 = 128 GB     Comparable to Intel Xeon processor
Memory bandwidth (DDR)            136.5 GB/s          Comparable to Intel Xeon processor
Sockets per node                  2                   N/A
Cores                             2 x 16 = 32         60+ cores
Core frequency (GHz)              2.3                 N/A
Memory channels                   2 x 4 = 8           N/A
Memory technology                 2133 MHz DDR4       MCDRAM & DDR4
Threads per core                  2                   4
Vector units & width (per core)   1 x 256-bit AVX2    AVX-512
On-chip MCDRAM                    N/A                 Up to 16 GB at launch, over 5x STREAM vs. DDR4
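A practical consequence of the MCDRAM/DDR4 split is that bandwidth-critical arrays may need to be placed explicitly. The sketch below uses the memkind library's hbw_malloc interface, one mechanism Intel has described for allocating KNL high-bandwidth memory; treating it as Trinity's eventual programming interface is an assumption, and the array size and fallback policy are purely illustrative.

```c
/* Illustrative sketch (assumed, not confirmed, Trinity interface):
 * place a bandwidth-critical array in KNL MCDRAM using the memkind
 * library's hbw_malloc, falling back to ordinary DDR4 if no
 * high-bandwidth memory is available. */
#include <stdio.h>
#include <stdlib.h>
#include <hbwmalloc.h>   /* memkind high-bandwidth-memory interface */

int main(void)
{
    size_t n = (size_t)1 << 24;          /* 128 MB of doubles, well under 16 GB of MCDRAM */
    int in_hbw = (hbw_check_available() == 0);

    double *field = in_hbw ? hbw_malloc(n * sizeof(double))
                           : malloc(n * sizeof(double));
    if (!field) return 1;

    for (size_t i = 0; i < n; i++)       /* streaming access is where MCDRAM bandwidth pays off */
        field[i] = (double)i;
    printf("last element: %f (MCDRAM: %s)\n", field[n - 1], in_hbw ? "yes" : "no");

    if (in_hbw) hbw_free(field); else free(field);
    return 0;
}
```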
Trinity Capabilities
• Each partition will accommodate 1 to 2 large mission problems
• Capability relative to Cielo
  – 8x to 12x improvement in fidelity, physics, and performance
  – >30x increase in peak FLOPS
  – >2x increase in node-level parallelism
  – >6x increase in cores
  – >20x increase in threads
• Capability relative to Sequoia
  – 2x increase in peak FLOPS
  – Similar complexity relative to core- and thread-level parallelism
The Trinity Center of Excellence & Application Transition Challenges
• Center of Excellence
  – Work with select NW application code teams to ensure the KNL partition is used effectively upon initial deployment
  – Nominally one application per laboratory (SNL, LANL, LLNL)
  – Applications chosen such that they impact the NW program in FY17
  – Facilitate the transition to next-generation ATS platforms and work through code migration issues
  – This is NOT a benchmarking effort
• Intel Knights Landing processor
  – From multi-core to many-core
  – >10x increase in thread-level parallelism
  – A reduction in per-core throughput (1/4 to 1/3 the performance of a Xeon core)
  – MCDRAM: fast but limited capacity (~5x the bandwidth, ~1/5 the capacity of DDR4 memory)
  – Dual AVX-512 SIMD units: does your code vectorize? (see the sketch after this slide)
• Burst Buffer
  – Data analytics use cases need to be developed and/or deployed into production codes
  – Checkpoint/restart should "just work", although advanced features may require code changes
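Because KNL leans on wide AVX-512 SIMD units, whether a kernel vectorizes becomes a first-order performance question. The fragment below is a generic illustration of the kind of loop a compiler can vectorize cleanly, not an excerpt from any ASC application; the pragma is standard OpenMP 4.0 simd, and the kernel and sizes are invented for the example.

```c
/* Generic vectorization example (not from an ASC code): a unit-stride,
 * dependence-free loop that maps onto AVX2 today and AVX-512 on KNL.
 * Compile with, e.g., -O3 -fopenmp so the omp simd pragma takes effect. */
#include <stdio.h>

#define N 4096

static void saxpy(float a, const float *restrict x, float *restrict y)
{
    #pragma omp simd
    for (int i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];   /* unit stride, no loop-carried dependence */
}

int main(void)
{
    static float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy(3.0f, x, y);
    printf("y[0] = %f\n", y[0]);  /* expect 5.0 */
    return 0;
}
```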
Trinity will be located at the Los Alamos Nicholas C. Metropolis Center for Modeling and Simulation

Nicholas C. Metropolis Center for Modeling and Simulation
• Classified computing
• 15 MW total: 12 MW water, 3 MW air
• 42" subfloor, 300 lbs/sq ft
• 80' x 100' (8,000 sq ft)

Trinity Power and Cooling
• At least 80% of the platform will be water cooled
• First large water-cooled platform at Los Alamos
• Concerns
  – Idle power efficiency
  – Rapid ramp-up / ramp-down load on the power grid of over 2 MW
Trinity Platform Schedule Highlights, 2014–2016
Challenges and Opportunities
• Application transition work using next-generation technologies (for Trinity, ATS-2, ATS-3, ...)
  – Many-core, hierarchical memory, burst buffer
• Operating a large supercomputer using liquid cooling technology
• Operating Trinity using a mix of Haswell and KNL nodes
• The Burst Buffer concepts and technology for improving application efficiency and exploring other use cases
• On the road to Exascale...
Trinity System Summary
• Trinity is the first instantiation of the ASC's ATS platforms
• Trinity meets or exceeds the goals set by the ACES Design Team
• Relative to Cielo, Trinity will require applications to transition to an MPI+X programming environment and requires increases in thread- and vector-level parallelism to be exposed (a hybrid MPI+OpenMP skeleton follows this slide)
• Trinity introduces Active Power Management, Burst Buffer storage acceleration, and the concept of a Center of Excellence to ASC production platforms
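The MPI+X transition mentioned above usually means adding node-level threading beneath each MPI rank. The skeleton below uses MPI plus OpenMP as the "X"; it is a generic pattern rather than a Trinity code, and the rank/thread decomposition, array size, and work loop are placeholder assumptions.

```c
/* Generic MPI+OpenMP skeleton: MPI handles inter-node parallelism,
 * OpenMP threads expose the on-node concurrency that many-core
 * parts such as KNL expect. Problem size and kernel are placeholders. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N (1 << 20)        /* per-rank slice of a distributed array */

static double local[N];

int main(int argc, char **argv)
{
    int provided, rank, nranks;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double local_sum = 0.0, global_sum = 0.0;

    /* Thread-level (and, within the loop body, SIMD-level) parallelism per rank. */
    #pragma omp parallel for reduction(+ : local_sum)
    for (int i = 0; i < N; i++) {
        local[i] = (double)(rank + i);
        local_sum += local[i];
    }

    /* Inter-node communication stays in MPI. */
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d threads/rank=%d sum=%e\n",
               nranks, omp_get_max_threads(), global_sum);

    MPI_Finalize();
    return 0;
}
```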
Special thanks to the following for contribution of slides used in this talk
• Doug Doerfler – SNL
• Thuc Hoang – NNSA ASC
• And a cast of others...
Questions?
• For more information, visit the Trinity web site: trinity.lanl.gov
