A cluster of Dell™ PowerEdge™ R7615 servers featuring AMD EPYC processors achieved much stronger performance on multi-GPU, multi‑node operations using Broadcom 100GbE NICs than the same cluster using 10GbE NICs
Dell PowerEdge R7615 servers with Broadcom BCM57508 NICs can accelerate your AI fine‑tuning tasks
We tested a two-node cluster of Dell PowerEdge R7615 servers with AMD EPYC™ 9374F processors and NVIDIA® L40 GPUs in two networking configurations:
• one with Broadcom® 100GbE BCM57508 NetXtreme-E network interface cards (NICs) that support remote direct memory access (RDMA) over Converged Ethernet (RoCE)
• one with 10GbE NICs
LLM training and inference frameworks deployed on distributed GPUs use low-level algorithms to move data
between GPUs, operate on that data, and share the results with other GPUs. Our testing focused on three of
these algorithms as implemented in the NVIDIA Collective Communications Library (NCCL): all-reduce,
reduce-scatter, and send-receive. This library, which many AI frameworks use, can send data over RoCE network
paths or ordinary Ethernet network paths, and can perform RDMA transfers between distributed NVIDIA GPUs.
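NCCL chooses at run time between its RDMA (IB/RoCE) transport and its TCP socket transport, and documented environment variables can steer that choice. The sketch below is a minimal, hypothetical illustration of that switch, not the configuration used in the test: the interface name `ens1f0np0`, the RDMA device name `bnxt_re0`, and GID index 3 are placeholder assumptions that differ from system to system.

```python
def nccl_env(use_roce: bool,
             ip_ifname: str = "ens1f0np0",   # hypothetical Ethernet interface name
             rdma_device: str = "bnxt_re0",  # hypothetical RDMA device name
             gid_index: int = 3) -> dict[str, str]:  # hypothetical RoCE GID index
    """Return NCCL environment settings for a RoCE run or a plain-socket run."""
    env = {
        "NCCL_DEBUG": "INFO",             # log which network transport NCCL selects
        "NCCL_SOCKET_IFNAME": ip_ifname,  # IP interface for bootstrap/socket traffic
    }
    if use_roce:
        env["NCCL_IB_HCA"] = rdma_device           # RDMA device(s) NCCL may use
        env["NCCL_IB_GID_INDEX"] = str(gid_index)  # GID index used in RoCE mode
    else:
        env["NCCL_IB_DISABLE"] = "1"  # disable IB/RoCE so NCCL falls back to IP sockets
    return env


if __name__ == "__main__":
    for roce in (True, False):
        print("RoCE" if roce else "sockets", nccl_env(roce))
```

One might merge the returned dictionary into the environment of each launched process; with `NCCL_DEBUG=INFO`, NCCL's startup log typically names the network transport it picked (for example `NET/IB` or `NET/Socket`).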
For each configuration, we studied three multi-GPU, multi-node AI computations from the NCCL test suite at
different packet sizes and measured the time to complete the task, latency, and the effective bandwidth of the
network during the operation. The cluster with 100GbE networking dramatically outperformed the cluster with
10GbE networking across all packet sizes and tasks without increasing power usage.
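The NCCL test suite (nccl-tests) reports two bandwidth figures per operation: algorithm bandwidth (data size divided by operation time) and bus bandwidth, which scales algorithm bandwidth by an operation-specific factor so results can be compared with link speed. The sketch below follows the factors described in the nccl-tests performance notes. Whether the report's "effective bandwidth" corresponds to either figure is an assumption, and the sample inputs are hypothetical. nccl-tests itself prints GB/s; this sketch uses Gbit/s to match the infographic.

```python
# Ratio busbw / algbw for each operation, as a function of rank count n.
BUS_FACTORS = {
    "all_reduce": lambda n: 2 * (n - 1) / n,
    "reduce_scatter": lambda n: (n - 1) / n,
    "sendrecv": lambda n: 1.0,
}


def alg_bw_gbps(size_bytes: int, time_us: float) -> float:
    """Algorithm bandwidth in Gbit/s: bits per microsecond is Mbit/s, so divide by 1e3."""
    return size_bytes * 8 / (time_us * 1e3)


def bus_bw_gbps(op: str, size_bytes: int, time_us: float, n_ranks: int) -> float:
    """Bus bandwidth: algorithm bandwidth scaled by the operation's factor."""
    return alg_bw_gbps(size_bytes, time_us) * BUS_FACTORS[op](n_ranks)


if __name__ == "__main__":
    # Hypothetical run: 1 GB all-reduce across 4 GPUs finishing in 200,000 µs.
    print(alg_bw_gbps(10**9, 200_000))                   # 40.0
    print(bus_bw_gbps("all_reduce", 10**9, 200_000, 4))  # 60.0
```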
Please note that these tests do not send enough data between servers to overwhelm the networking link. Rather,
these tests comprise a sequence of computational steps on each GPU, where a given step may require data from
other GPUs. In such cases, a GPU can only start the next computational step once it has the data from those
other GPUs, even if that data is as small as a single byte. The operational bandwidth depends on the timely
transfer of data between GPUs on different servers.
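The point about small transfers can be made concrete with a simple latency-plus-bandwidth (alpha-beta) model, in which each message costs a fixed per-message latency plus its size divided by the link rate. This is a textbook approximation, not the report's method, and the 20 µs latency is a hypothetical value held equal on both links so that only raw link speed differs.

```python
def transfer_time_us(size_bytes: int, link_gbps: float, latency_us: float) -> float:
    """Fixed per-message latency plus serialization time on the link."""
    bits_per_us = link_gbps * 1e3          # e.g. 100 Gbit/s = 100,000 bits per µs
    serialization_us = size_bytes * 8 / bits_per_us
    return latency_us + serialization_us


if __name__ == "__main__":
    for size in (4, 48, 256 * 10**6):      # bytes; 20 µs latency is hypothetical
        t100 = transfer_time_us(size, 100, latency_us=20)
        t10 = transfer_time_us(size, 10, latency_us=20)
        print(f"{size:>11} B: 100GbE {t100:12.3f} µs, 10GbE {t10:12.3f} µs")
```

Under these assumptions, a 4-byte message takes effectively the same time on both links (its serialization term is a fraction of a nanosecond), while a 256 MB transfer takes roughly 20.5 ms at 100 Gbps versus roughly 204.8 ms at 10 Gbps. Within this model, large latency gaps at tiny message sizes would have to come from differences in the fixed per-message term, such as NIC and software-stack overhead, rather than from link speed.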
We tested these three multi-GPU, multi-node NCCL primitive operations for AI:
• all-reduce: Operate on the entire dataset, distributed across all GPUs in the cluster, and store the single result on each GPU.
• reduce-scatter: Divide the data on every GPU into logical chunks, and operate on each chunk across the cluster to form partial results. Then send one partial result to each GPU and store it there.
• send-receive: Send data from one GPU to another GPU on the second server, and return a response.
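To make the collective definitions concrete, here is a toy, single-process Python model of what all-reduce and reduce-scatter compute, using summation as the reduction. Each inner list stands in for one GPU's buffer and nothing is sent over a network; the function names are illustrative, not NCCL's API. Send-receive is a plain point-to-point copy, so it is left out. The example also checks a well-known identity: an all-reduce yields the same result as a reduce-scatter followed by an all-gather, which is how ring-based implementations commonly structure it.

```python
from typing import List

# Toy model: each inner list is one GPU's buffer; "communication" is just
# list manipulation inside a single Python process.
Buffers = List[List[float]]


def all_reduce(buffers: Buffers) -> Buffers:
    """Every rank ends up with the element-wise sum of all ranks' buffers."""
    total = [sum(vals) for vals in zip(*buffers)]
    return [list(total) for _ in buffers]


def reduce_scatter(buffers: Buffers) -> Buffers:
    """Element-wise sum, then rank r keeps only the r-th equal-sized chunk."""
    n = len(buffers)
    total = [sum(vals) for vals in zip(*buffers)]
    if len(total) % n:
        raise ValueError("buffer length must be divisible by the rank count")
    chunk = len(total) // n
    return [total[r * chunk:(r + 1) * chunk] for r in range(n)]


def all_gather(chunks: Buffers) -> Buffers:
    """Every rank receives all ranks' chunks concatenated in rank order."""
    gathered = [x for chunk in chunks for x in chunk]
    return [list(gathered) for _ in chunks]


if __name__ == "__main__":
    gpus = [[1, 2, 3, 4], [10, 20, 30, 40]]  # two hypothetical GPUs
    print(all_reduce(gpus))                  # [[11, 22, 33, 44], [11, 22, 33, 44]]
    print(reduce_scatter(gpus))              # [[11, 22], [33, 44]]
    print(all_gather(reduce_scatter(gpus)) == all_reduce(gpus))  # True
```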
For full testing details and results, read our report.
Learn more at https://facts.pt/QAauY1Y
Up to 83% less time to complete multi-GPU, multi‑node operations*
Up to 66% lower latency on multi-GPU, multi‑node operations*
Up to 6.1x the bandwidth on multi-GPU, multi‑node operations*
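For a fixed amount of data, bandwidth is data size divided by time, so "x% less time" and "N times the bandwidth" are two views of the same ratio. The sketch below converts between them. It assumes, which the infographic does not state, that the time and bandwidth headlines describe the same measurement.

```python
def speedup_from_time_reduction(reduction: float) -> float:
    """0.83 (83% less time) -> old time / new time, i.e. the bandwidth ratio."""
    return 1.0 / (1.0 - reduction)


def time_reduction_from_speedup(speedup: float) -> float:
    """6.1 (6.1x the bandwidth) -> fraction of time saved at the same data size."""
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    print(f"{speedup_from_time_reduction(0.83):.2f}x")  # 5.88x
    print(f"{time_reduction_from_speedup(6.1):.1%}")    # 83.6%
```

Under that assumption, 6.1x the bandwidth corresponds to about 83.6% less time, which is in line with the "up to 83%" figure.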
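Latency on multi-GPU, multi‑node operations (microseconds; lower is better) and percentage reduction with the 100GbE configuration (higher is better):
• all-reduce (packet size: 4 B): 100GbE configuration 40 µs; 10GbE configuration 123 µs; percentage reduction 67.4%
• reduce-scatter (packet size: 4 B): 100GbE configuration 29 µs; 10GbE configuration 85 µs; percentage reduction 65.8%
• send-receive (packet size: 48 B): 100GbE configuration 41 µs; 10GbE configuration 56 µs; percentage reduction 26.7%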
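The percentage reduction in latency is (10GbE latency − 100GbE latency) / 10GbE latency. The sketch below recomputes it from the rounded latencies shown above; it gives 67.5%, 65.9%, and 26.8%, each within about 0.1 percentage point of the reported figures. The small gap presumably comes from rounding in the displayed latencies or percentages, which is an assumption rather than something the infographic states.

```python
# operation: (100GbE latency µs, 10GbE latency µs, reported reduction %)
LATENCY_US = {
    "all-reduce (4 B)": (40, 123, 67.4),
    "reduce-scatter (4 B)": (29, 85, 65.8),
    "send-receive (48 B)": (41, 56, 26.7),
}


def pct_reduction(new: float, old: float) -> float:
    """Percentage by which `new` is lower than `old`."""
    return (old - new) / old * 100


if __name__ == "__main__":
    for op, (fast, slow, reported) in LATENCY_US.items():
        print(f"{op}: recomputed {pct_reduction(fast, slow):.1f}%, reported {reported}%")
```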
[Chart: Send-receive performance: Time to complete task (lower is better). Operation time (microseconds) vs. data size (MB) for the 100GbE and 10GbE configurations.]
[Chart: Send-receive bandwidth (higher is better). Bandwidth (Gbps) vs. data size (MB) for the 100GbE and 10GbE configurations.]
*cluster of Dell PowerEdge R7615 servers featuring AMD EPYC 9374F processors and
Broadcom 100GbE BCM57508 NetXtreme-E NICs vs. the same cluster with 10GbE NICs.
Copyright 2024 Principled Technologies, Inc. Based on “Dell PowerEdge R7615 servers with Broadcom 100GbE NICs
can deliver lower-latency, higher‑throughput networking to speed your AI fine‑tuning tasks,” a Principled Technologies
report, December 2024. Principled Technologies® is a registered trademark of Principled Technologies, Inc. All other
product names are the trademarks of their respective owners.
