Rack Cluster Deployment for SDSC
AI Supercomputer
Shawn Strande, San Diego Supercomputer Center
Sree Ganesan, Habana Labs
Thomas Jorgensen, Supermicro
11/10/2021
© 2021 Supermicro
SDSC: Thirty-Five Years of Excellence in High
Performance and Data Intensive Computing
• Established as a national supercomputer
resource center in 1985 by NSF
• Serves the national, UC San Diego, UC
System, and State of California research
communities.
• Supports research in all domains, incl. life
sciences, physics, materials science, social
sciences, and others.
• Design, deployment, and operations of
large-scale, innovative supercomputer and
data resources.
• Operates a state-of-the-art data center on
the UC San Diego campus
• Strong connections to the local tech sector
NSF Award 2005369
PIs: Amit Majumdar (PI), Rommie Amaro,
Javier Duarte, Mai Nguyen, Bob Sinkovits
SDSC/UCSD
Presenter: Shawn Strande, SDSC Deputy Director and Voyager
Project Manager
Voyager deployment is underway now at SDSC!
• Supermicro handover to SDSC
complete
• SDSC performing systems and
application installs
• Early access in Jan ’22
• Formal operations est. Feb ’22
• 3 years as a focused testbed
• 2 years with wider access
offered through NSF
allocations
• Opportunities for access and
collaboration with and by
industry
Voyager System and Software
• 42x Training Nodes, each with 8 Habana Gaudi processors (336 total); 3rd Generation Intel® Xeon® Scalable processors; 6 TB node-local NVMe
• 2x Inference Nodes, each with 8x Habana Goya processors (16 total); 2nd Generation Intel® Xeon® Scalable processors; 3 TB node-local NVMe storage
• 36x Intel x86 two-socket compute nodes
• Gaudi network: 400GbE Arista switches; RDMA over Converged Ethernet (RoCE) integrated on-chip
• 2 PB storage system, with the potential to experiment with various parallel file systems (Ceph, Lustre); connectivity to compute via 25GbE
• 200 TB HFS; connectivity to compute via 25GbE
• DL frameworks: TensorFlow and PyTorch, plus the Habana SynapseAI software development tools
• https://habana.ai/ (white papers on Gaudi and Goya and other information)
Science application characteristics
Application domain | AI techniques | ML frameworks | Training vs inference
Astronomy | NN | TensorFlow | Mostly T
Atmospheric science | NN | TensorFlow | Mostly T
Chemistry, Biophysics | NN | Custom, PyTorch | Both T & I
Chemistry, Materials | NN | Custom, PyTorch | Mostly I
Computer science | Reinforcement learning, RNN | TensorFlow | Mostly T
Human microbiome | mmvec, GAN | TensorFlow, PyTorch | Mostly T
Particle physics | CNN, GAN, GNN, RNN, NN, VAE | TensorFlow, PyTorch | Both T & I
Population genetics | CNN | TensorFlow | Mostly T
Satellite image analysis | U-Net, CNN, GAN, cluster analysis, PCA | TensorFlow | Mostly T
Systems biology | CNN, SVM | TensorFlow, PyTorch | Both T & I
Key: CNN=Convolutional Neural Network, GAN=Generative Adversarial Network, GNN=Graph Neural Network, I=Inference, NN=Dense Neural Network, PCA=Principal Components Analysis, RNN=Recurrent Neural Network, SVM=Support Vector Machine, T=Training, VAE=Variational Auto-Encoder
High energy physics application – Javier Duarte, UCSD
Data processing pipeline for Higgs boson to bottom quark event processing can benefit from Voyager's inference processors to filter data coming out of the detector, and from Voyager's training processors to process data that passes the high-level trigger. Credit: Javier Duarte
• The LHC at CERN (the machine responsible for the discovery of the Higgs boson) generates a massive amount of data
• More than 99% of events are discarded immediately by the trigger system
• The remaining petabytes of data are further analyzed
• Duarte and collaborators use ML for triggering, event reconstruction, and data analysis
• For triggering, ML improves signal selection efficiency
• For data analysis, various ML algorithms (including dense, convolutional, recurrent, and graph neural networks) are used to classify each event as signal or background and to identify particle signatures (such as Higgs boson decay candidates); a minimal classifier sketch follows this list
• GNNs on Gaudi to improve particle identification and event reconstruction
• Goya to test the software-based triggering step of the data processing pipeline
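To make the signal-versus-background classification step concrete, here is a minimal, hypothetical sketch of a dense binary classifier in TensorFlow/Keras. It is not Duarte's actual pipeline: the feature count, synthetic data, and network size are illustrative assumptions only; on Voyager the same model definition would run on Gaudi after loading the Habana module as shown later in this deck.

import numpy as np
import tensorflow as tf

# Hypothetical setup: each collision event is summarized by a fixed-length
# feature vector (e.g., jet kinematics); labels are 1 = signal, 0 = background.
NUM_FEATURES = 16  # illustrative assumption, not the real feature count
x_train = np.random.rand(10_000, NUM_FEATURES).astype("float32")  # stand-in data
y_train = np.random.randint(0, 2, size=(10_000,))                 # stand-in labels

# Small dense network that scores each event as signal vs. background.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_FEATURES,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # probability of signal
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="binary_crossentropy",
    metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
)

# Train briefly on the stand-in data; real work would use collision datasets
# and the richer architectures (CNNs, RNNs, GNNs) noted above.
model.fit(x_train, y_train, epochs=3, batch_size=256, validation_split=0.1)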
Satellite image analysis – Mai Nguyen, Ilkay Altintas and
collaborators, SDSC
• Applying DL to image analysis, disaster management, NLP, and other areas
• A Voyager project: DL algorithms on satellite images to determine land cover across different areas in the context of wildfire management
• WIFIRE: WIFIRE HOME | WIFIRE (ucsd.edu)
• Goal is to combine AI models with fire science models and fire science expertise
• Study and simulate fire behavior under different conditions
• Algorithms developed in the TensorFlow framework will be ported to Voyager
• Easy transition of DL models to Habana expected
[Figure: Satellite image processing pipeline. Data preparation steps (crop satellite imagery tiles, pan-sharpen, reproject, create RGB, downsample) feed machine learning steps (CNN feature extraction, PCA, clustering, and sorting into ordered clusters, followed by histogram and map generation). ML model training and inference will be accomplished using Voyager's processors.]
• For land-cover map generation, U-Nets and CNNs will be trained on Gaudi processors for segmentation and classification (a minimal sketch follows below)
• DL models are used to extract features from satellite images for
o Region-of-interest detection to locate schools in rural areas
o Demographic analysis to understand the organization of a city and refugee camp formation
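As a rough illustration of the classification workload described above, below is a minimal, hypothetical TensorFlow/Keras CNN that classifies satellite image tiles into land-cover classes. The tile size, class count, and synthetic data are assumptions for illustration; the actual WIFIRE models (including U-Nets for segmentation) are more elaborate.

import numpy as np
import tensorflow as tf

# Hypothetical tile dimensions and land-cover classes (illustrative only).
TILE_SIZE = 64
NUM_CLASSES = 5  # e.g., forest, grassland, urban, water, bare soil (assumed)

# Stand-in data: RGB tiles with integer land-cover labels.
x = np.random.rand(2_000, TILE_SIZE, TILE_SIZE, 3).astype("float32")
y = np.random.randint(0, NUM_CLASSES, size=(2_000,))

# Small CNN for per-tile land-cover classification.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(TILE_SIZE, TILE_SIZE, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Brief training on stand-in tiles; on Voyager the same script would target
# Gaudi (the HPU device) after loading the Habana module shown later in this deck.
model.fit(x, y, epochs=3, batch_size=64, validation_split=0.1)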
Our partnership with Supermicro and Intel Habana Labs is
allowing us to deploy a cutting-edge AI supercomputer for
research
• Ability and willingness to engage in a project with innovative
technology for advanced computing and AI in science and
engineering research
• Deep technical collaboration with Supermicro and Intel Habana Labs
in advanced AI processors, high performance networking, and
systems integration
• Rigorous pre-delivery testing for reliability and performance
• Onsite installation and 5-year support
A little about Habana
• Founded in 2016 to develop purpose-built AI processors
• Launched inference processor in 2018, training processor in 2019
• Acquired by Intel in late-2019
• Fully leveraging Intel’s scale, resources and infrastructure
• Accessing Intel ecosystem and customer partnerships
• Gaudi AI processor is now available on AWS; DL1 is the first non-GPU instance on AWS
• Continuing with our mission to build AI processors optimized for data center and cloud
performance and efficiency
Rack Cluster Deployment for SDSC Supercomputer
Gaudi: architected for efficiency
Designed to optimize AI performance, delivering higher
efficiency than traditional CPUs & GPUs
• Heterogeneous compute architecture
- Configurable centralized GEMM engine (MME)
- Fully programmable, AI-customized Tensor Processing Cores
• Software-managed memory architecture
- 32 GB of HBM2 memory
• Natively integrated 10 x 100Gb Ethernet RoCE for scaling
Designed for flexible and easy model migration

Ease of use
• Integrated with TensorFlow and PyTorch; minimal code changes to get started
• SynapseAI maps the model topology onto Gaudi devices
• Developers can enjoy the same abstraction they are accustomed to today

Customization
• SynapseAI TPC SDK facilitates development of custom kernels
• Developers can customize models to extract the best performance

Balanced compute & memory
• 32GB HBM2 memory, similar to GPUs, so existing DL models will fit into Gaudi memory
• Developers can spend less effort to port their models to Gaudi
Designed for Scaling Efficiency
The industry’s FIRST:
Native integration of 10 x 100 Gigabit Ethernet RoCE ports onto every Gaudi
• Eliminates network bottlenecks
• Standard Ethernet inside the server and across nodes
• Eliminates lock-in with proprietary interfaces
• Lowers total system cost and power by reducing discrete components
Scaling Within A Gaudi Server
• 8 Gaudi OCP OAM cards
• 24 x 100GbE RDMA RoCE for scale-out
• Non-blocking, all-to-all internal interconnect across Gaudi AI processors
• Separate PCIe ports for external Host CPU traffic
Example of an integrated server with eight Gaudi AI processors, two Xeon CPUs, and multiple Ethernet interfaces
Rack And Pod Level Scaling
Easily build rack- and pod-scale training systems with off-the-shelf standard Ethernet switches
Example of a rack configuration with four Gaudi servers (eight Gaudi processors per server) connected to a single Ethernet switch
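Scale-out over the integrated RoCE ports is typically driven by a data-parallel framework; Habana's reference models use a Horovod-based flow. The sketch below is a generic Horovod/TensorFlow data-parallel example, not Habana-specific code: it assumes a working Horovod installation compatible with the Keras API shown (Habana distributes its own fork), reuses the MNIST model from the SynapseAI example later in this deck, and omits per-rank data sharding for brevity.

import tensorflow as tf
import horovod.tensorflow.keras as hvd  # assumes Horovod is installed

# Initialize Horovod: one process per Gaudi card, across servers in the rack/pod.
hvd.init()

# Stand-in model and data, mirroring the MNIST example later in this deck.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(10),
])

# Scale the learning rate with the number of workers and wrap the optimizer
# so gradients are averaged across processors over the Ethernet fabric.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01 * hvd.size())
optimizer = hvd.DistributedOptimizer(optimizer)

model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

callbacks = [
    # Broadcast initial weights from rank 0 so all workers start identically.
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),
]

# Each worker trains on the full dataset here for simplicity; a real run
# would shard the data per rank. Only rank 0 prints verbose output.
model.fit(
    x_train, y_train,
    epochs=5, batch_size=128,
    callbacks=callbacks,
    verbose=1 if hvd.rank() == 0 else 0,
)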
SynapseAI® Software Suite:
designed for performance and ease of use
Driving end-user efficiency for
model build and migration
• Train deep learning models on Gaudi with
minimal code changes
• Integrated with TensorFlow & PyTorch
• Habana Developer Site & GitHub
• Support with reference models, kernel
libraries, documentation and “how tos”
• Advanced users can write their own custom
kernels
[Diagram: SynapseAI software stack components: Framework Integration Layer, Graph Compiler, Habana Communication Libraries, Habana Kernel Library, Customer Kernel Library, TPC Programming Tools, Debugging & Profiling Tools, User Mode Driver, Kernel Mode Driver.]
Getting Started with TensorFlow on Gaudi

import tensorflow as tf
from TensorFlow.common.library_loader import load_habana_module

# Load the Habana module so the Gaudi (HPU) device is registered in TensorFlow.
load_habana_module()

# Standard Keras MNIST example; no other Gaudi-specific changes are needed.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(10),
])

loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

# Training and evaluation run on the HPU wherever the ops are supported.
model.fit(x_train, y_train, epochs=5, batch_size=128)
model.evaluate(x_test, y_test)

Load the Habana libraries needed to use Gaudi, aka the HPU device.
Once loaded, the HPU device is registered in TensorFlow and prioritized over the CPU.
When an op is available for both CPU and HPU, the op is assigned to the HPU.
When an op is not supported on the HPU, it runs on the CPU.

Habana Developer Platform
developer.habana.ai
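The op placement behavior described in the notes above can be inspected with standard TensorFlow device-placement logging. This is ordinary TensorFlow, not Habana-specific code, and the exact name under which the Gaudi device is registered is an assumption here; run it after load_habana_module() as in the example above.

import tensorflow as tf

# Ask TensorFlow to log which device each op runs on (CPU vs. HPU).
tf.debugging.set_log_device_placement(True)

# List the devices TensorFlow can see; after load_habana_module() the Gaudi
# card is expected to appear as an HPU device (device name is an assumption).
for dev in tf.config.list_logical_devices():
    print(dev.name, dev.device_type)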
SDSC Voyager Supercomputer powered by Supermicro
Habana AI processors
• Supermicro X12 8-Gaudi Server
powering Voyager
• 16 Goya inference processors in
8-card server from Supermicro
SMC Executive Summary
• Founded and Headquartered in San Jose, USA since 1993
• Top-ranked Server & Storage Solutions Provider Worldwide
• Industry’s Broadest Range of Application-Optimized Products
• Product Time-to-Market Leadership for Every Refresh Cycle
• The Best Product Performance Per-watt/Per-sq. ft./Value
• Green Computing Leader Offering the Lowest TCO and TCE
Charles Liang, CEO/Founder, has been the chief designer and architect of Supermicro, the fastest-growing server hardware solutions company in the USA, for the past 27 years.
Supermicro Habana Gaudi AI Training Solution
AI Training Accelerator
Unified software stack
Optimized for AI
models & workloads
Software Solution
X12 Gaudi AI System
+ SuperCloud Composer
X12 Gaudi AI Training System Overview
[Photos: SYS-420GH-TNGR system front and rear views]
Specifications
CPU: Dual Socket, Intel® Xeon® Scalable Processors (Ice Lake), TDP up to 270W
Memory: 32 DIMM slots, up to 8TB Registered ECC DDR4 3200MHz SDRAM
Drives: 4 hot-swap bays, 4x 2.5" SAS/SATA/NVMe hybrid
Expansion: 1x PCIe Gen 4 x16 (FHHL), 2x AIOM PCIe Gen 4 x16
Networking: 1x 10GbE Base-T, 6x 400Gb QSFP-DD
Power Supply: 4x 3000W redundant, Titanium Level
• Key Features
• 4U System for 8x Habana Gaudi HL-205 AI Processors
• Purpose Built for AI/Deep Learning Training
• Lower system cost with built-in 100GbE Ethernet ports
• 24 x 100GbE RDMA (6 QSFP-DDs) for scale-out
Up to 40% better price performance than existing AI Training Solutions
Purpose-designed for data center AI Training efficiency
• Cost-efficient AI Training
• Usability to ease model migration
• Hardware and software architected for scalability
A new class of AI Training: X12 Gaudi AI System
AI training workloads targeted: Computer Vision, NLP and Recommendation
SuperCloud Composer: Single Pane of Glass Management (Gaudi system, storage, and networking integration)
Solution Brief and Rack Level Reference Design Available
Success Story: SDSC’s Voyager Research Program
• Support research conducted across a range of science and engineering domains
- Astronomy, climate sciences, chemistry, and particle physics
• 336 Gaudi Training accelerators with native RoCE scaling
DISCLAIMER
Super Micro Computer, Inc. may make changes to specifications and product descriptions at any time, without notice. The
information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions
and typographical errors. Any performance tests and ratings are measured using systems that reflect the approximate
performance of Super Micro Computer, Inc. products as measured by those tests. Any differences in software or hardware
configuration may affect actual performance, and Super Micro Computer, Inc. does not control the design or implementation of
third party benchmarks or websites referenced in this document. The information contained herein is subject to change and may
be rendered inaccurate for many reasons, including but not limited to any changes in product and/or roadmap, component and
hardware revision changes, new model and/or product releases, software changes, firmware changes, or the like. Super Micro
Computer, Inc. assumes no obligation to update or otherwise correct or revise this information.
SUPER MICRO COMPUTER, INC. MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE
CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT
MAY APPEAR IN THIS INFORMATION.
SUPER MICRO COMPUTER, INC. SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR
FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL SUPER MICRO COMPUTER, INC. BE LIABLE TO ANY
PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF
ANY INFORMATION CONTAINED HEREIN, EVEN IF SUPER MICRO COMPUTER, Inc. IS EXPRESSLY ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2021 Super Micro Computer, Inc. All rights reserved.
www.supermicro.com
Thank You