Alan F. Sill, PhD
Managing Director, High Performance Computing Center, Texas Tech University
On behalf of the TTU IT Division and HPCC Staff
AMD HPC User Forum — September 15-17, 2020
Design Considerations, Installation, and
Commissioning of the RedRaider
Cluster at the Texas Tech University
High Performance Computing Center
Outline of this talk
HPCC Staff and Students
Previous clusters
• History, performance, usage patterns, and experience
Motivation for Upgrades
• Compute Capacity Goals
• Related Considerations
Installation and Benchmarks
Conclusions and Q&A
HPCC Staff and Students (Fall 2020)
These people provide and support the TTU HPCC resources.
Staff members:
Alan Sill, PhD — Managing Director
Eric Rees, PhD — Assistant Managing Director
Chris Turner, PhD — Research Associate
Tom Brown, PhD — Research Associate
Amanda McConnell, BA — Administrative Business Assistant
Amy Wang, MSc — Programmer Analyst IV (Lead System Administrator)
Nandini Ramanathan, MSc — Programmer Analyst III
Sowmith Lakki-Reddy, MSc — System Administrator III
Graduate students:
Misha Ahmadian, Graduate Research Assistant
Undergraduate students:
Arthur Jones, Student Assistant
Nhi Nguyen, Student Assistant
Travis Turner, Student Assistant
Quanah I cluster (Commissioned March 2017)
4 racks, 243 nodes, 36 cores/node, 8,748 total Intel Broadwell cores.
100 Gbps non-blocking Omni-Path fabric. Benchmarked at 253 TF.
Quanah II Cluster (As Upgraded Nov. 2017)
• 10 racks, 467 Dell™ C6320 nodes
- 36 CPU cores/node Intel Xeon E5-2695
v4 (two 18-core sockets per node)
- 192 GB of RAM per node
- 16,812 worker node cores total
- Compute power: ~1 Tflop/s per node
- Benchmarked at 485 TF
• Operating System:
- CentOS 7.4.1708, 64-bit, Kernel 3.10
• High Speed Fabric:
- Intel™ Omni-Path, 100 Gbps/connection
- Non-blocking fat tree topology
- 12 core switches, 48 ports/switch
- 57.6 Tbit/s core throughput capacity
• Management/Control Network:
- Ethernet, 10 Gbps, sequential chained
switches, 36 ports per switch
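As a quick sanity check (not from the slide itself), the quoted 57.6 Tbit/s core throughput follows directly from the switch and port counts above; a minimal sketch, assuming every port on every core switch runs at the full 100 Gbps:

```python
# Core throughput of the Quanah Omni-Path fabric from the counts listed above.
core_switches = 12
ports_per_switch = 48
link_gbps = 100  # Omni-Path, 100 Gbps per connection

total_tbps = core_switches * ports_per_switch * link_gbps / 1000
print(f"Core throughput capacity: {total_tbps:.1f} Tbit/s")  # 57.6 Tbit/s
```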
Uptime and Utilization - Previous Cluster (Quanah)
[Chart: monthly uptime and utilization history, spanning the Quanah I period through the Quanah II upgrade.]
Job Size Patterns - Previous Cluster (Quanah)
[Charts: histograms of the most recent month's jobs binned by core count (1, 2-10, 11-100, 101-1000, 1001+). Left: "Jobs in Range", log scale up to ~100,000 jobs. Right: "Slots Taken in Range", linear scale up to ~600,000 slots.]
Typical usage pattern for jobs:
• Charts above show most recent month of
job activity
• Large number of small jobs (note log scale)
• Most jobs below 1000 cores
• Not unusual to see requests for several
thousand cores
Typical usage pattern for queue slots:
• Most cores consumed by jobs in the middle
(11-1000 cores/job) range
• Scheduling a job of more than 2000 cores
allocates ~1/8 of the cluster
• Some evidence of users self-limiting job
sizes to avoid long scheduling queue waits
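For reference, a minimal sketch of how histograms like these could be produced from scheduler accounting data. It assumes Slurm's sacct; the Quanah-era charts came from that cluster's own scheduler logs, so the command and parsing below are illustrative assumptions, not the HPCC's actual reporting tool.

```python
#!/usr/bin/env python3
"""Bin one month of completed jobs by allocated core count, mirroring the two
histograms on this slide ("Jobs in Range" and "Slots Taken in Range")."""
import subprocess
from collections import Counter

BINS = [(1, 1, "1"), (2, 10, "2-10"), (11, 100, "11-100"),
        (101, 1000, "101-1000"), (1001, float("inf"), "1001+")]

def bin_label(cores):
    for lo, hi, label in BINS:
        if lo <= cores <= hi:
            return label
    return None

# One line per completed allocation: "JobID|AllocCPUS"
raw = subprocess.run(
    ["sacct", "-a", "-X", "-n", "-P", "-S", "now-30days",
     "--state=COMPLETED", "--format=JobID,AllocCPUS"],
    capture_output=True, text=True, check=True).stdout

jobs, slots = Counter(), Counter()
for line in raw.splitlines():
    _, alloc = line.split("|")
    cores = int(alloc or 0)
    label = bin_label(cores)
    if label:
        jobs[label] += 1        # contributes to "Jobs in Range"
        slots[label] += cores   # contributes to "Slots Taken in Range"

for _, _, label in BINS:
    print(f"{label:>10}: {jobs[label]:>8} jobs  {slots[label]:>10} slots")
```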
RedRaider Design Goals
1. Add at least 1 petaflops total computing capacity beyond existing Quanah cluster.
2. Fit within existing cooling capacity and recently expanded power limits, which
means that approximately two-thirds of the new power used by the cluster must be
removed through direct liquid cooling to stay within room air cooling limits.
3. Coalesce operation of the existing Quanah and older Ivy Bridge and Community
Cluster nodes with the addition above, to be operated as a single cluster.
4. Streamline storage network utilizing LNet routers to connect all components to
expanded central storage based on 200 Gbps Mellanox HDR fabric.
5. Chosen path: Addition of 240 nodes (30,720 cores) of dual 64-core AMD
Rome processors + 20 GPU nodes with 2 Nvidia GPU accelerators per node.
This new cluster will eventually include the new AMD Rome CPU, NVidia GPU, previous Intel Broadwell cluster, and other previous specialty queues operated as Slurm partitions.
RedRaider cluster (Delivered July 2020)
CPUs: 256 physical cores per rack unit; 200 Gbps non-blocking HDR200 Infiniband fabric
GPUs: 2 NVidia V100s per two rack units; 100 Gbps HDR100 Infiniband to HDR200 core
Why non-blocking HDR200?
• Previous experience with Quanah and earlier clusters shows that being able to place jobs anywhere,
without scheduling into islands in the fabric, keeps scheduling simple and allows a high degree of
utilization of the cluster.
• The increase in density puts high demands on the fabric in terms of bandwidth per core.
This figure of merit is actually lower for the RedRaider Nocona CPUs than for the previous
Quanah cluster (illustrated in the sketch after this list).
• Simple fat-tree non-blocking arrangements with multiple core-to-leaf links per switch
provide redundancy and resilience in the event of cable or connector failures.
• User community has sometimes asked for jobs of many thousands of cores.
• Compare and contrast to Expanse design: RedRaider uses full non-blocking fabric;
Expanse is non-blocking within racks with modest oversubscription between rack groups.
Overall cost at our scale is not that different.
• Given the density of our racks and relatively small size of the overall Infiniband fabric,
reaching full non-blocking did not add very much to the cost.
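Illustrative numbers behind two of the bullets above. The per-core bandwidth figures use the node specs quoted elsewhere in this deck; the fat-tree sizing is only a sketch that assumes generic 40-port HDR switches (the actual RedRaider switch and cable counts are not given on this slide).

```python
import math

# Fabric bandwidth per core is lower on RedRaider Nocona than on Quanah.
quanah_gbps_per_core = 100 / 36    # Omni-Path 100 Gbps, 36 cores/node
nocona_gbps_per_core = 200 / 128   # HDR200, 128 cores/node
print(f"Quanah : {quanah_gbps_per_core:.2f} Gbps/core")   # ~2.78
print(f"Nocona : {nocona_gbps_per_core:.2f} Gbps/core")   # ~1.56

# Two-level non-blocking fat tree sized for the 240 CPU nodes alone.
nodes, switch_ports = 240, 40        # 40-port HDR switches (assumption)
down_per_leaf = switch_ports // 2    # non-blocking: equal down- and up-links
leaves = math.ceil(nodes / down_per_leaf)
uplinks = leaves * down_per_leaf
spines = math.ceil(uplinks / switch_ports)
print(f"{leaves} leaf + {spines} spine switches, {uplinks} core-to-leaf links")
```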
RedRaider cluster initial installation
Photos: front view; back view of CPU racks
20 GPU nodes (40 GPUs total): Dell R740, 2 GPUs/node, 256 GB main memory/node, air cooled
240 CPU nodes: Dell C6525, two Rome 7702s and 512 GB memory/node, liquid cooled
RedRaider cluster initial installation - close-ups
Photos: secondary cooling line installation under the floor; back-view close-up of cooling lines in a CPU rack; interior of an example 1/2-U C6525 CPU worker node
RedRaider final installation w/ cold-aisle enclosure
Benchmark (GPUs): 226 Tflops across 20 nodes, 80.6% efficiency
Benchmark (CPUs): 804 Tflops across 240 nodes, 81.4% efficiency
Total cluster performance: 1030 Tflops
(A rough cross-check of these figures appears below.)
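A rough cross-check of the benchmark figures above. Node and core counts come from the cluster summary slides; the 16 double-precision FLOPs/cycle for Rome and the ~7.0 Tflops V100 double-precision peak are assumptions not stated in this deck, so the computed efficiencies (~81.8% and ~80.7%) agree with the quoted 81.4% and 80.6% only to within a percent (the official Rpeak values presumably used slightly different clocks).

```python
# CPU partition: 240 nodes x 128 cores at 2.0 GHz, assumed 16 DP FLOPs/cycle.
cpu_cores, cpu_ghz, flops_per_cycle = 240 * 128, 2.0, 16
cpu_rpeak_tf = cpu_cores * cpu_ghz * flops_per_cycle / 1000   # ~983 Tflops
cpu_rmax_tf = 804
print(f"CPU: Rpeak ~{cpu_rpeak_tf:.0f} TF, efficiency ~{cpu_rmax_tf / cpu_rpeak_tf:.1%}")

# GPU partition: 20 nodes x 2 V100s, assumed ~7.0 Tflops DP peak per GPU.
gpus, v100_dp_tf = 20 * 2, 7.0
gpu_rpeak_tf = gpus * v100_dp_tf                              # ~280 Tflops
gpu_rmax_tf = 226
print(f"GPU: Rpeak ~{gpu_rpeak_tf:.0f} TF, efficiency ~{gpu_rmax_tf / gpu_rpeak_tf:.1%}")

print(f"Total benchmarked: {cpu_rmax_tf + gpu_rmax_tf} Tflops")  # 1030 Tflops
```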
Software Environment (in progress, starting deployment)
Operating System
• CentOS 8.1 *
Cluster Management Software
• OpenHPC 2.0
Infiniband Drivers
• Mellanox OFED 5.0-2.1.8.0 *
Cluster-Wide File System
• Lustre 2.12.5 *
BMC Firmware
• Dell iDRAC 4.10.10.10 *
Image and Node Provisioning
• Warewulf 3.9.0
• rpmbuild 4.14.2
Job Resource Manager (batch scheduler)
• Slurm 20.02.3-1
Package Build Environment
• Spack 0.15.4
Software Deployment Environment
• LMod 8.2.10
Other Conditions and Tools
• Single-instance Slurm DBD and job status area (investigating shared-mount NVMe-oF for job status)
• Dual NFS 2.3.3 (HA mode) for applications
• GNU compilers made available through Spack and LMod. Others (NVidia HPC-X, AOCC) also loadable as alternatives through LMod.
• Cluster will also have Open OnDemand access.
* Had to fall back to previous version in each starred case above to get consistent deployable conditions
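A hypothetical commissioning-style check that a node reports the versions listed above. The target strings are copied from this slide; the probe commands and expected output substrings are assumptions and would need adjusting to the actual deployment.

```python
#!/usr/bin/env python3
"""Compare versions reported on a node against the stack listed on this slide."""
import subprocess

CHECKS = {
    "CentOS":        (["cat", "/etc/centos-release"], "8.1"),
    "Mellanox OFED": (["ofed_info", "-s"],            "5.0-2.1.8.0"),
    "Lustre client": (["lfs", "--version"],           "2.12.5"),
    "Slurm":         (["scontrol", "--version"],      "20.02.3"),
    "Spack":         (["spack", "--version"],         "0.15.4"),
}

for name, (cmd, want) in CHECKS.items():
    try:
        out = subprocess.run(cmd, capture_output=True, text=True).stdout.strip()
    except FileNotFoundError:
        out = "(command not found)"
    status = "OK" if want in out else "??"
    print(f"{status}  {name:<14} expected {want:<12} got: {out}")
```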
Total Compute Capacity versus Fiscal Year: 2011 - 2021 (All Clusters)
[Chart 1: HPCC Total Theoretical Capacity by Cluster (Teraflops), TTU-Owned and Researcher-Owned Systems, FY2011 through FY2021. Clusters shown: Campus Grid; Janus, Weland; Hrothgar Westmere; Hrothgar Ivy Bridge; Quanah (public); Lonestar 4/5; RedRaider Nocona CPU (public); RedRaider Matador GPU; RedRaider Nocona CPU (researchers); Hrothgar CC; Quanah HEP; Realtime/Realtime2; Antaeus HEP; Nepag; Discfarm. Yearly totals (TF): 105.0, 105.1, 109.9, 132.2, 150.6, 152.4, 417.5, 641.0, 629.0, 581.0, 1536.0.]
[Chart 2: HPCC Total Usable Capacity (Teraflops, 80% of Theoretical Peak), FY2011 through FY2021, showing the successive additions of Hrothgar, Ivy Bridge, Ivy + CC, Quanah I, Quanah II, and RedRaider, with gradual retirement of Hrothgar Westmere.]
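The lower chart is simply the theoretical-capacity series scaled by the HPCC's 80%-of-peak rule for usable capacity; a minimal sketch of that conversion, using the yearly totals read off the upper chart:

```python
# Usable capacity = 80% of theoretical peak, per the chart caption above.
theoretical_tf = [105.0, 105.1, 109.9, 132.2, 150.6, 152.4,
                  417.5, 641.0, 629.0, 581.0, 1536.0]          # FY2011-FY2021
for year, tf in zip(range(2011, 2022), theoretical_tf):
    print(f"FY{year}: {0.80 * tf:7.1f} TF usable")
```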
Design goals for the RedRaider cluster:
• Add at least 1 PF of overall compute capacity
• Allow retirement of older Hrothgar cluster
• Merge operation of primary clusters
• Support GPU computing
Practical restrictions:
• < 300 kVA power usage
• Fit within 8 racks of available floor space
• Stay within existing cooling capacity
• Commission by early FY 2021
Questions for this group
We look to this forum to help with the following community topics and issues:
• Application building
• Spack, EasyBuild recipes
• Compatible compilers and libraries
• Benchmarking and Workload Optimization
• Processor Settings
• NUMA and Memory Settings
• I/O and PCIe Settings
• On-the-fly versus fixed permanent choice settings
• Safe conditions for liquid vs. air-cooled operation
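On the NUMA and processor-settings question: a minimal, hypothetical sketch of recording the topology a dual-socket Rome node actually presents (which changes with BIOS NPS and SMT choices) before each benchmark run, so results can be tied to a known configuration. The commands used (lscpu, numactl) are standard tools, but this is illustrative only, not the HPCC's tooling.

```python
#!/usr/bin/env python3
"""Snapshot CPU and NUMA topology of a node ahead of a benchmark run."""
import platform
import subprocess

def capture(cmd):
    try:
        return subprocess.run(cmd, capture_output=True, text=True).stdout
    except FileNotFoundError:
        return f"({cmd[0]} not available)\n"

report = {
    "host": platform.node() + "\n",
    "lscpu": capture(["lscpu"]),                 # sockets, cores, SMT threads
    "numa": capture(["numactl", "--hardware"]),  # NUMA node count per NPS setting
}

for key, value in report.items():
    print(f"===== {key} =====")
    print(value)
```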
Conclusions
We have designed, installed, and commissioned a cluster that delivers the desired
performance level of >1 PF in less than 7 racks, with space left over for future expansion.
The cluster was designed to allow an increase in the average job size for large jobs while
still delivering good performance for small and medium-size jobs, with simple scheduling
due to non-blocking fat tree Infiniband.
Overall, adding a new cluster based on AMD Rome CPUs and NVidia GPUs allowed us to put more
than twice as much computing capacity in two-thirds of the space of our existing cluster while
staying within our desired power and cooling profile. Since we also retain and still run the
previous cluster, the overall capacity has nearly tripled as a result of this addition.
Future expansions are possible.
Backup slides
Primary Cluster Utilization
Hrothgar cluster averaged about 80% utilization before addition of Quanah phase I,
commissioned in April 2017.
Addition of Quanah I roughly doubled the total number of utilized cores to ~16,000, with a
total number of available cores of ~18,000 across all of HPCC.
Addition of Quanah II in November 2017 required decreasing size of Hrothgar to make
everything fit, but still led to over 20,000 cores in regular use. Power limits and campus
research network bandwidth restrictions prevented running all of former Hrothgar.
Power limits in ESB solved with new UPS and generator upgrade in FY 2019. (Campus
research network limits will be addressed in upcoming upgrades to be discussed here.)
Quanah utilization has been extremely good, beginning with Quanah I in early 2017 and
extending on to current state with nearly 95% calendar uptime and near-100% usage.
Former Hrothgar systems exceeded expected end of life (oldest components 8 years old).
Primary HPCC Clusters With RedRaider Expansion
Quanah Cluster - Omni CPU
• 16,812 cores (467 nodes, 36 cores/node, 192 GB/node), Intel Xeon E5-2695 v4 @ 2.1 GHz
• Dell C6300 4-node 2U enclosures
• Benchmarked 486 Teraflop/s (464/467 nodes)
• Omni-Path non-blocking 100 Gbit/s fabric for MPI communication & storage
• 10 Gb/s Ethernet network for cluster management
• Total power drawn by Quanah cluster: ~200 kW
RedRaider Cluster - Nocona CPU
• 30,720 cores (240 nodes, 128 cores/node, 512 GB/node), AMD Rome 7702 @ 2.0 GHz
• Dell C6525 4-node 2U enclosures, direct liquid cooled CPUs
• Benchmarked 804 Teraflop/s (240 nodes)
• Mellanox non-blocking 200 Gbit/s fabric for MPI communication & storage
• 25 Gb/s Ethernet network for cluster management
• Total power drawn: ~150 kW
RedRaider Cluster - Matador GPU
• 40 NVidia V100 GPUs; 640 CPU + 25,600 tensor + 204,800 CUDA cores total
• Dell R740 2U host nodes, 2 GPUs/node
• Benchmarked 226 Teraflop/s (20 nodes)
• Mellanox non-blocking 100 Gbit/s fabric for MPI communication & storage
• 25 Gb/s Ethernet network for cluster management
• Total power drawn: ~21 kW
• Total power for RedRaider cluster including LNet routers: ~180 kW
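A back-of-the-envelope power budget from the figures above. The kW numbers are taken from this slide; the power factor used to compare against the 300 kVA design limit is an assumption.

```python
# Sum the RedRaider partitions and compare against the < 300 kVA design limit.
nocona_kw, matador_kw, redraider_total_kw = 150, 21, 180
lnet_and_misc_kw = redraider_total_kw - nocona_kw - matador_kw   # ~9 kW
power_factor = 0.95                                              # assumed
print(f"LNet routers and misc.: ~{lnet_and_misc_kw} kW")
print(f"RedRaider: ~{redraider_total_kw} kW, ~{redraider_total_kw / power_factor:.0f} kVA "
      f"(design limit was < 300 kVA)")
print(f"Running alongside Quanah (~200 kW): ~{redraider_total_kw + 200} kW combined")
```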
