Architecting the Right System for Your AI Application—without the Vendor Fluff
Brett Newman
VP Marketing & Customer Engagement
Microway, Inc.
wespeakhpc@microway.com
Where We’re Headed
1. Before You Start
• What do you know: Datasets, Algorithms, Collaborators
2. How to Select A System
• Common training, mixed workloads, datasets too large, don’t know
3. Collaborating with Vendors
• Who, where, and what to look for
Who is This For?
End Users Who:
1. Don’t know where to start
2. Need a “checklist”
3. Are afraid of, or hate, working with vendors
4. Hate being sold to
Not for:
1. AI Framework Writers
2. 10+ year ninja GPU coders
Before You Start
What Do You Know?
About Your Dataset:
○ Size – overall
○ Chunkable? (batch size)
○ Size – individual datum
[Figure: Oversimplified Example, illustrated with the Mona Lisa. Sizes shown: Overall: 128GB; 32GB + 32GB + 32GB + 32GB; 16GB; 8GB. System labels: 1 multi-GPU server; POWER9 w/NVLink or pre-process; various Tesla V100 systems.]
Image Credit: By Leonardo da Vinci - Cropped and relevelled from File:Mona Lisa, by Leonardo da Vinci, from C2RMF.jpg. Originally C2RMF: Galerie de tableaux en très haute définition: image page, Public Domain, https://commons.wikimedia.org/w/index.php?curid=15442524
Visual Idea Inspiration Credit: Scott Soutter, IBM
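The three dataset questions can be turned into rough arithmetic. The sketch below is a hypothetical helper, not a sizing tool: it assumes a fixed per-GPU memory (32GB, as on the larger Tesla V100) and an assumed `overhead_gb` allowance for model weights, activations, and framework memory, which the slide's example effectively treats as zero. Real training streams data through the GPUs, so "GPUs to hold the dataset" mirrors the slide's oversimplified picture rather than a hard requirement.

```python
import math
from dataclasses import dataclass


@dataclass
class FitReport:
    gpus_to_hold_dataset: int  # GPUs whose combined usable memory covers the dataset
    datum_fits_one_gpu: bool   # False is the slide's "non-chunkable" case
    max_batch_size: int        # whole data per GPU batch (0 if one datum does not fit)


def check_fit(dataset_gb: float, datum_gb: float,
              gpu_mem_gb: float = 32.0, overhead_gb: float = 0.0) -> FitReport:
    """Oversimplified sizing in the spirit of the slide.

    overhead_gb is an assumed allowance for model weights, activations and
    framework memory; the slide's example effectively treats it as zero.
    """
    usable_gb = gpu_mem_gb - overhead_gb
    if usable_gb <= 0 or datum_gb <= 0:
        raise ValueError("need positive usable GPU memory and datum size")
    fits = datum_gb <= usable_gb
    return FitReport(
        gpus_to_hold_dataset=math.ceil(dataset_gb / usable_gb),
        datum_fits_one_gpu=fits,
        max_batch_size=math.floor(usable_gb / datum_gb) if fits else 0,
    )


if __name__ == "__main__":
    # The slide's example: a 128GB dataset on 32GB GPUs.
    for datum_gb in (8.0, 16.0, 128.0):
        print(datum_gb, check_fit(128.0, datum_gb))
```

With the defaults, the 128GB example comes out as 4 GPUs (the 32GB + 32GB + 32GB + 32GB split), an 8GB datum allows a batch of 4 per GPU, and a single 128GB datum does not fit at all, which is the case the talk routes to POWER9 w/NVLink or pre-processing. Setting a non-zero `overhead_gb` shows how quickly the picture changes once the model itself takes memory.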
What Do You Know?
About Your Algorithm
○ Standard Framework vs. Custom Algorithm
○ Have You Run Any Profilers/Tools?
Tool Examples: NVProf, Allinea Perf Tools, Intel Visual Profiler
[Diagram options: PCI-E Switching OR CPU:GPU NVLink; Denser, NVLink Interconnected (+10-20% on training); Mixed Workload (Ex: Molecular Dynamics + AI Simulation Refinement)]
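Before reaching for GPU tools such as NVProf, even a coarse profile shows whether time goes to loading data or to computing on it, which is the question behind the PCI-E versus NVLink choice. A minimal sketch using Python's built-in `cProfile` and `pstats` modules; `load_batch` and `train_step` are hypothetical stand-ins for your own data loading and training code.

```python
import cProfile
import io
import pstats


def load_batch(n: int) -> list:
    # Hypothetical stand-in for reading and preprocessing one batch.
    return [i * 0.5 for i in range(n)]


def train_step(batch: list) -> float:
    # Hypothetical stand-in for the compute part of a training step.
    return sum(x * x for x in batch)


def run(steps: int = 50, batch_size: int = 10_000) -> float:
    total = 0.0
    for _ in range(steps):
        total += train_step(load_batch(batch_size))
    return total


def profile_report(top: int = 10) -> str:
    """Profile run() and return the top entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    run()
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report())
```

Roughly: if the loading side dominates cumulative time, storage, CPU, and CPU:GPU bandwidth deserve attention before GPU count; if the compute side dominates, GPU count and GPU-to-GPU interconnect matter more. A CPU-side profile like this is only a first look; the GPU tools listed on the slide show what happens on the device.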
What Do You Know?
About Your Collaborators
○ Running on what HW?
○ Using Larger facilities?
Ex: Summit @ ORNL
Basic Guidance for Architecting Your AI System
Algorithm: Solely AI Training, Common Frameworks
• Primary: NVLink-connected systems, with GPU count scaled to dataset size and budget
• Secondary: switched PCI-E systems, with GPU count scaled to dataset size and budget
[Chart: as dataset size grows (with batches under 32GB): 4 GPUs with NVLink, then 8 GPUs with NVLink, then 16 GPUs with NVLink]
NVLink: 10-20% training performance increase
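Reading the 10-20% figure as a throughput gain (an assumption; the slide only says "performance increase"), wall-clock time shrinks by a little less than the headline percentage, because time scales as baseline / (1 + gain). A minimal sketch with a hypothetical 100-hour training job:

```python
def hours_with_gain(baseline_hours: float, throughput_gain: float) -> float:
    """Wall-clock hours after a fractional throughput gain (0.10 means +10%)."""
    if throughput_gain <= -1.0:
        raise ValueError("throughput_gain must be greater than -1")
    return baseline_hours / (1.0 + throughput_gain)


if __name__ == "__main__":
    base = 100.0  # hypothetical job: 100 hours on a PCI-E system
    for gain in (0.10, 0.20):
        new = hours_with_gain(base, gain)
        print(f"+{gain:.0%} throughput: {new:.1f} h (saves {base - new:.1f} h)")
```

For that hypothetical job this gives about 90.9 hours at +10% and 83.3 hours at +20%, so roughly 9 to 17 hours saved per run; multiplied across many runs, that is the value to weigh against the price difference.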
Greatest Ease of Use with Performance, AI Training
• DGX-Station (4 GPUs)
• DGX-1 (8 GPUs)
• DGX-2 (16 GPUs)
Mixed Workloads or Small Datasets
• Balanced systems (2 CPU sockets, fully or half populated with 2-4 GPUs)
• Greatest flexibility & expandability
Dataset: Too Large/Non “Chunkable”
• POWER9 systems with CPU:GPU coherency + CPU:GPU NVLink (5X the bandwidth of PCI-E)
• Switched PCI-E tree + custom algorithms with Unified Memory
[Images: POWER9 with NVLink | 8 GPUs with Switches]
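When data cannot be chunked to fit in GPU memory, it has to stream between host memory and the GPU, so CPU:GPU bandwidth sets a floor on the time for each pass over the data. A back-of-the-envelope sketch: the PCI-E figure is an assumed ~16 GB/s (roughly the theoretical per-direction peak of a PCI-E 3.0 x16 link), the NVLink figure simply applies the slide's 5X to that assumption, and the 128GB, 10-pass workload is hypothetical. Sustained rates on real systems are lower and should be measured.

```python
def transfer_seconds(data_gb: float, bandwidth_gb_s: float, passes: int = 1) -> float:
    """Lower bound on the time to move data_gb across a link `passes` times."""
    if bandwidth_gb_s <= 0:
        raise ValueError("bandwidth must be positive")
    return data_gb * passes / bandwidth_gb_s


PCIE_GB_S = 16.0              # assumed PCI-E 3.0 x16 peak, per direction
NVLINK_GB_S = 5 * PCIE_GB_S   # the slide's "5X" applied to that assumption

if __name__ == "__main__":
    dataset_gb, passes = 128.0, 10  # hypothetical workload
    for name, bw in (("PCI-E", PCIE_GB_S), ("CPU:GPU NVLink", NVLINK_GB_S)):
        secs = transfer_seconds(dataset_gb, bw, passes)
        print(f"{name}: {secs:.0f} s of pure transfer")
```

The absolute numbers matter less than the ratio: the more of each iteration a code spends paging data that does not fit, the more a 5X faster link pays off.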
Don’t Know, Can’t Find Out
1. Test it, if at all possible!
Upgrading from Fermi or Kepler GPUs matters more than most system architecture choices.
2. No Matter Your Choice…
GPU acceleration beats CPU-only systems (5X-50X).
Good, Better, Best
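The selection guidance above can be read as a short decision procedure fed by the “What Do You Know?” questions. A minimal sketch, assuming three yes/no answers (with `None` meaning “don’t know”); the function name and the order in which the checks are applied are illustrative assumptions, not rules stated in the talk.

```python
from typing import Optional


def select_system(datum_fits_gpu: Optional[bool],
                  mixed_or_small: Optional[bool],
                  standard_framework: Optional[bool]) -> str:
    """Map answers to the dataset/algorithm questions onto the slides' guidance.

    None means "don't know". The order of the checks is an assumption.
    """
    if None in (datum_fits_gpu, mixed_or_small, standard_framework):
        return "Test it if at all possible; any current GPU system beats CPU-only"
    if not datum_fits_gpu:
        return ("POWER9 with coherent CPU:GPU NVLink, or a switched PCI-E tree "
                "with custom code using Unified Memory")
    if mixed_or_small:
        return "Balanced 2-socket system with 2-4 GPUs, for flexibility and expansion"
    if standard_framework:
        return ("NVLink-connected system (4, 8 or 16 GPUs by dataset size and budget); "
                "secondary choice: switched PCI-E")
    return "Custom algorithm: profile it first, then weigh PCI-E switching vs. CPU:GPU NVLink"


if __name__ == "__main__":
    print(select_system(datum_fits_gpu=True, mixed_or_small=False, standard_framework=True))
```

Putting the “doesn’t fit” check first reflects that a non-chunkable dataset constrains the hardware regardless of workload mix; a different ordering is a reasonable choice if your priorities differ.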
Collaborating with Vendors
Vendors: Who to Look For?
People & Titles
○ Technical Sales
○ Solution Engineer
○ Anyone who proves they know something
○ Anyone with proven access to hardware
Vendors: Who to Look For?
In Tier 1 Vendors
○ Find: HPC or AI Groups, exclusively (hard)
○ Avoid: general sellers, laptop/networking guy
In Tier 2 Vendors
○ Find: Established AI/HPC Vendors
○ Avoid: parts resellers/limited integration shops
○ Find: NVIDIA NPN Elite Deep Learning Partners
Vendors: What to Look For/Signals
Signals:
○ Ask for testing/benchmarking
○ Ask to see the HW architecture of the solution (back of napkin OK)
○ Are they spending time with you by phone, email, or in person?
Don’t work with someone who doesn’t understand what you’re talking about!
Vendors: Strategies For a Better Engagement
Overshare
○ Every piece of information: about your data, algorithm/code, and goals
○ About what is working/isn’t working today
○ About what you own
Discuss Collaborators
○ What do they own?
○ Need to plan to run together?
State Realistic Plans for Flexibility/Expansion
Review
What we Talked About
1. Before You Start
• What do you know: Datasets, Algorithms, Collaborators
2. How to Select A System
• Datasets too large, common training, mixed workloads, don’t know
3. Collaborating with Vendors
• Who, where, and what to look for
Real Experts, Real Deliveries
So, Less Confused?
Gain confidence to Solve the AI HW Puzzle
The Best Vendors are Partners & Here to Help!
microway.com/gpu-test-drive/
microway.com/configure-your-solution
calendly.com/microway/schedule-a-consulation
GPU Solutions Guide
Microway designs and builds fully-integrated clusters, servers, and
workstations. For 35 years, we have delivered high-performance
systems for data analytics, cognitive systems, research, and AI.
Leverage our expertise – We Speak HPC & AI
© Copyright 2019 Microway. All Rights Reserved.
Experts in High Performance Computing
http://www.microway.com
508-746-7341

Editor's Notes

  • About Your Dataset: What is the overall size of your whole dataset? Does it fit on a single GPU, or does it definitely need several GPUs? Does it span multiple systems? “Chunkable”: the professional term is whether you can set a reasonable batch size. Does your data split into chunks the size of a GPU (or a portion of one)? Individual datum: sometimes a single datum is so large it won’t fit at all. That calls for specialized code or specialized hardware to compensate: write your code to manage data with CUDA Unified Memory or, better yet, purchase a POWER9 system with NVLink. Similarly, if you are working with fairly large images (or, more likely, a large batch of smaller ones), that is likely a case for a 32GB Tesla GPU.
  • About Your Algorithm: PCI-E switching. Why CPU:GPU NVLink? If you can’t write efficiently
  • About Your Collaborators: End users underweight this. They are so focused on the concrete hardware value (how much it costs, their complicated price/performance calculation) that they miss the efficacy metric. If you and a primary collaborator must substantially change your ETL steps, or even your runtime instructions, to perform similar runs, then you are getting far less productive time out of your expensive hardware. Matching each other is hugely important. Similarly, if you have the opportunity for larger runs or dedicated time on a larger machine, matching that machine is critical.