MLPerf: An Industry Standard Performance
Benchmark Suite for Machine Learning
(Inference)
Christine Cheng
Intel, MLPerf Inference Co-chair
christine.cheng@intel.com
(Work by many people in MLPerf community)
Edge AI and Vision Innovation Forum – Invited Talk – July 2020
Agenda
● Why do we need MLPerf Inference?
● What’s in MLPerf inference?
● Typical Submission Workflow
● Impact of MLPerf inference 2019
● What’s in MLPerf inference 2020?
Why Benchmark Machine Learning Systems?
• Machine learning needs the entire software stack and hardware
working seamlessly together
• Exponential growth in research & innovations
• Need MLPerf to level the playing field
Source: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8259424
[Chart: ML arXiv papers per year; 50+ publications every single day!]
Let’s imagine…
• If you have new AI HW that you are trying to sell…
• Market promotion
• Claim performance against competition
• Demonstrate value to customers
[Image: The Cost of Latency in High-Frequency Trading]
State of ML Benchmarking Today

System X vs. System Y:
What task?
What model?
What dataset?
What batch size?
What quantization?
What software libraries?
…

MLPerf is an ML performance benchmarking effort with wide industry and academic support.
[Logos: researchers from supporting companies and universities]
MLPerf Goals
• Enforce replicability to ensure reliable results
• Use representative workloads, reflecting production use-cases
• Encourage innovation to improve the state-of-the-art of ML
• Accelerate progress in ML via fair and useful measurement
• Serve both the commercial and research communities
• Keep benchmarking affordable (so that all can play)
Agenda
● Why do we need MLPerf Inference?
● What’s in MLPerf inference?
● Typical Submission Workflow
● Impact of MLPerf inference 2019
● What’s in MLPerf inference 2020?
MLPerf inference measures rate of inference

Input (e.g. an image) → Trained model (e.g. ResNet) → Result (e.g. "cat", with required quality, e.g. 75.1%)

Do you specify the model? The Closed division does; the Open division does not.
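To make "rate of inference with required quality" concrete, here is a minimal throughput sketch for the ResNet example, assuming PyTorch and torchvision are installed. It is illustrative only; the official reference implementations live in the mlcommons/inference repository.

    import time
    import torch
    import torchvision

    # Illustrative sketch, not the official MLPerf reference code.
    model = torchvision.models.resnet50(pretrained=True).eval()
    batch = torch.randn(8, 3, 224, 224)  # stand-in for preprocessed images

    with torch.no_grad():
        model(batch)  # warm-up
        start = time.time()
        for _ in range(10):
            model(batch)
        elapsed = time.time() - start

    print(f"throughput ~ {10 * batch.shape[0] / elapsed:.1f} inferences/s")
    # A real submission must also keep top-1 accuracy on the ImageNet
    # validation set at or above the quality target (e.g. 75.1%).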
MLPerf v0.5 Inference Workloads (Datacenter / Edge Inference)

Use Case   Neural Network
Vision     ResNet-50 v1.5
           SSD ResNet-34
           SSD MobileNet v1 (edge only)
Speech     (none)
Language   (none)
Commerce   (none)

Minimal viable set for initial launch (v0.5)
MLPerf v0.7 Inference Workloads (Datacenter / Edge Inference)

Use Case   Neural Network
Vision     ResNet-50 v1.5
           SSD ResNet-34
           SSD MobileNet v1 (edge only)
           3D UNET (new in v0.7)
Speech     RNN-T (new in v0.7)
Language   BERT Large (new in v0.7)
Commerce   DLRM (datacenter only; new in v0.7)

We evolved from a minimum benchmark set to a broad suite (v0.7)
MLPerf v0.7 Inference Workloads

Datacenter / Edge Inference: as on the previous slide (new in v0.7: 3D UNET, RNN-T, BERT Large, DLRM)

Mobile Inference
Use Case   Neural Network
Vision     MobileNetEdgeTPU
           SSD-MobileNet v2
           DeepLabv3
Language   Mobile-BERT

We evolved from a minimum benchmark set to a broad suite (v0.7)
(Cat by Alvesgaspar, Dog by December21st2012Freak)
Four scenarios to handle different use cases
• Single stream (e.g. cell phone augmented vision)
• Multiple stream (e.g. multiple camera driving assistance)
• Server (e.g. translation app)
• Offline (e.g. photo sorting app)
Different metric for each scenario

Scenario          Example                               Metric
Single stream     cell phone augmented vision           Latency
Multiple stream   multiple camera driving assistance    Number of streams subject to latency bound
Server            translation site                      QPS subject to latency bound
Offline           photo sorting                         Throughput
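As intuition for how these metrics differ, the sketch below computes them from one set of hypothetical per-query latencies. The numbers and the 15 ms bound are assumptions, and the official metrics are computed by the load generator, not like this.

    import numpy as np

    rng = np.random.default_rng(0)
    latencies = rng.exponential(scale=0.004, size=10_000)  # hypothetical, seconds
    LATENCY_BOUND_S = 0.015  # assumed; real bounds are set per benchmark

    single_stream = np.percentile(latencies, 90)  # single stream: tail latency
    offline = len(latencies) / latencies.sum()    # offline: pure throughput,
                                                  # assuming serialized queries
    p99 = np.percentile(latencies, 99)            # server: QPS counts only if
    server_ok = p99 <= LATENCY_BOUND_S            # the tail meets the bound

    print(f"single-stream p90 latency: {1e3 * single_stream:.2f} ms")
    print(f"offline throughput: {offline:.0f} samples/s")
    print(f"server latency bound {'met' if server_ok else 'violated'} "
          f"(p99 = {1e3 * p99:.2f} ms)")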
Inference Submitters' Implementations
• Even greater range of software and hardware solutions
• So, allow submitters to reimplement, subject to inference rules
• Use a standard set of pre-trained weights for the Closed division
• Use a standard C++ "load generator" that handles scenarios and metrics

[Diagram: the load generator generates queries for the system under test (SUT), times the responses, and validates results; the SUT must use the common weights]
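Here is a sketch of how a submitter's system under test hooks into LoadGen, using the Python bindings (mlperf_loadgen) that ship with the mlcommons/inference repository. Constructor and callback signatures have shifted slightly between LoadGen versions, so treat the exact calls as illustrative.

    import mlperf_loadgen as lg

    def issue_queries(query_samples):
        # Run inference on each sample here, then tell LoadGen we are done;
        # LoadGen timestamps completions to compute the scenario's metric.
        responses = [lg.QuerySampleResponse(qs.id, 0, 0) for qs in query_samples]
        lg.QuerySamplesComplete(responses)

    def flush_queries():
        pass

    def load_samples(indices):    # stage samples into memory
        pass

    def unload_samples(indices):
        pass

    settings = lg.TestSettings()
    settings.scenario = lg.TestScenario.Offline  # or SingleStream / MultiStream / Server
    settings.mode = lg.TestMode.PerformanceOnly

    sut = lg.ConstructSUT(issue_queries, flush_queries)
    qsl = lg.ConstructQSL(1024, 1024, load_samples, unload_samples)
    lg.StartTest(sut, qsl, settings)
    lg.DestroyQSL(qsl)
    lg.DestroySUT(sut)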
Not a quantization contest!
• Quantization is key to efficient inference, but we do not want a quantization contest
• Can the Closed division quantize?
  • Yes, but it must be principled: describe a reproducible method
• Can the Closed division calibrate?
  • Yes, but it must use a fixed set of calibration data
• Can the Closed division retrain?
  • No, it is not a retraining contest. But we provide retrained 8-bit models.

[Diagram: FP32 weights → FP/INTx weights?]
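For intuition about what calibrating on a fixed dataset means, here is a generic max-calibration sketch that derives a per-tensor INT8 scale from a fixed batch of values. This is one common technique, not MLPerf's prescribed procedure.

    import numpy as np

    rng = np.random.default_rng(0)
    calibration_batch = rng.normal(size=(256, 1024)).astype(np.float32)  # fixed set

    # Max calibration: choose a scale so the observed range maps onto int8.
    scale = np.abs(calibration_batch).max() / 127.0

    def quantize(x):
        return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

    def dequantize(q):
        return q.astype(np.float32) * scale

    x = rng.normal(size=(8, 1024)).astype(np.float32)
    err = np.abs(dequantize(quantize(x)) - x).mean()
    print(f"scale = {scale:.5f}, mean abs quantization error = {err:.5f}")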
Agenda
● Why do we need MLPerf Inference?
● What’s in MLPerf inference?
● Typical Submission Workflow
● Impact of MLPerf inference 2019
● What’s in MLPerf inference 2020?
Typical Workflow by week number

[MLPerf timeline, weeks -12 through +5 relative to the week-0 submission deadline: the benchmark list freezes first, then code and rules freeze closer to the deadline; after submission, results are reviewed by the committee and submitters, ending in result publication.]
Typical Workflow by week number

[Combined timeline, weeks -12 through +5. MLPerf track: benchmark list freeze, then code & rule freeze, submission deadline at week 0, result review by committee & submitters, then result publication. Submitter track: sign the CLA; propose models and read the current rules; attend weekly submitter meetings to discuss details on rules, models, and implementations; develop SW and clarify rules; tune SW for HW and pass the compliance checker; after submission, marketing preparation and, ideally, a SW release.]
Agenda
● Why do we need MLPerf Inference?
● What’s in MLPerf inference?
● Typical Submission Workflow
● Impact of MLPerf inference 2019
● What’s in MLPerf inference 2020?
Inference Results: Diverse Systems & Applications
● 600+ inference results
● Over 30 systems submitted
● 10,000x difference in performance
MLPerf is increasing market transparency
ML hardware is projected to be a ~$60B industry in 2025.
(Tractica.com: $66.3B; Marketsandmarkets.com: $59.2B)

"What gets measured gets improved." — Peter Drucker

Benchmarking aligns research with development, engineering with marketing, and competitors across the industry in pursuit of the same clear objective.
Agenda
● Why do we need MLPerf Inference?
● What’s in MLPerf inference?
● Typical Submission Workflow
● Impact of MLPerf inference 2019
● What’s in MLPerf inference 2020?
Quarterly Cadence (post COVID)

[Timeline, May through December: four rounds roughly one quarter apart, each with a submission deadline followed by result publication: Training 0.7, Inference 0.7, Training 0.8, Inference 0.8.]
Evolving benchmark suite

Vision / Image classification: ResNet (training, inference)
Vision / Object detection: SSD-MobileNet v1 (inference), SSD-ResNet34 (training, inference)
Vision / Image segmentation: Mask R-CNN (training)
Vision / Medical Imaging: 3D UNET (inference)
Speech / Speech-to-text: RNN-T (inference)
Language / Translation: GNMT, Transformer (training)
Language / NLP: BERT (training, inference)
Commerce / Recommendation: DLRM (training, inference)
Research / Reinforcement Learning: MiniGo (training)

(Version legend in the original slide: v0.6, v0.7, v0.8, N/A)
Evolving benchmark suite (now including Mobile)

Vision / Image classification: ResNet (training, inference); MobileNetEdgeTPU (mobile)
Vision / Object detection: SSD-MobileNet v1 (inference), SSD-ResNet34 (training, inference); SSD-MN v2 (mobile)
Vision / Image segmentation: Mask R-CNN (training); DeepLab-MN v3 (mobile)
Vision / Medical Imaging: 3D UNET (inference)
Speech / Speech-to-text: RNN-T (inference)
Language / Translation: GNMT, Transformer (training)
Language / NLP: BERT (training, inference); MobileBERT (mobile)
Commerce / Recommendation: DLRM (training, inference)
Research / Reinforcement Learning: MiniGo (training)

(Version legend in the original slide: v0.6, v0.7, v0.8, N/A)
Evolving benchmark suite (planned v0.8 changes)

Vision / Image classification: ResNet (training, inference); MobileNetEdgeTPU (mobile)
Vision / Object detection: SSD-MobileNet v1 (inference; upgrade in v0.8), SSD-ResNet34 (training, inference; upgrade in v0.8); SSD-MN v2 (mobile)
Vision / Image segmentation: Mask R-CNN (training); DeepLab-MN v3 (mobile)
Vision / Medical Imaging: 3D UNET (inference)
Speech / Speech-to-text: RNN-T (inference)
Language / Translation: GNMT (training; remove in v0.8), Transformer (training)
Language / NLP: BERT (training, inference); MobileBERT (mobile)
Commerce / Recommendation: DLRM (training, inference)
Research / Reinforcement Learning: MiniGo (training)

(Version legend in the original slide: v0.6, v0.7, v0.8, N/A)
Challenges in 2020
● Evolve the benchmark suites fairly
● Improve efficiency information
● Reduce result sparsity
● Reduce benchmarking cost
Evolve the benchmark suites fairly
● What does fair mean?
○ Reflect the most impactful industry and research needs
● What is most impactful?
○ Convene Advisory Boards of 3-5 industry users + 3-5 academics, and ask
○ Existing: Recommendation, Medical Imaging
○ Forming: Automotive, Speech
Improve efficiency information

The problem: inference in particular is infinitely scalable. Benchmark results alone are not enough; we need more information to determine efficiency.

Now: number of chips. Crude, but better than nothing.
Future: power, cloud cost, others? Not simple.

System   Offline Inference: ResNet   Chips   Power
Foo      800 ips                     1       1200 W
Bar      1000 ips                    4       400 W
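The table makes the point: the snippet below recomputes it under the two lenses (per chip, today's crude normalization, and per watt, one possible future metric), and the "winner" flips.

    # Illustrative numbers from the table above.
    systems = {
        "Foo": {"ips": 800, "chips": 1, "watts": 1200},
        "Bar": {"ips": 1000, "chips": 4, "watts": 400},
    }

    for name, s in systems.items():
        per_chip = s["ips"] / s["chips"]   # today: normalize by chip count
        per_watt = s["ips"] / s["watts"]   # future: normalize by power
        print(f"{name}: {per_chip:.0f} ips/chip, {per_watt:.2f} ips/W")

    # Foo: 800 ips/chip, 0.67 ips/W
    # Bar: 250 ips/chip, 2.50 ips/W  -> the winner depends on the metric.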
Reduce result sparsity

The problem:
          Benchmark X   Benchmark Y
Chip A    23            (no result)
Chip B    (no result)   17,583

Why do we allow sparse results?
• Specialized chips
• Vendors need to focus investment

Ways to make results denser?
• Prune benchmarks x scenarios?
• Make a small number required?
Reduce benchmarking cost
● Why is good ML benchmarking relatively expensive?
a. We benchmark real user value
b. Software stack and system diversity; need to allow reimplementation
c. Strict requirements on inference model equivalence to the reference
● Ways to reduce cost:
a. Out-of-the-box code division?
b. Improve our reference code and best practices
Good ML benchmarking is hard.
We welcome ideas and contributions.
How to get involved?
mlperf.org/get-involved
info@mlperf.org
Backup
MLPerf is the work of many
Aaron Zhong David Patterson Jared Duke Peter Bailis
Abid Muslim Debajyoti Pal Jeff Jiao Peter Baldwin
Andrew Hock Debo Dutta Jeffery Liao Peter Mattson
Ankur Ankur Deepak Narayanan Jonah Alben Ramesh Chukka
Anton Lokhmotov Dehao Chen Jonathan Cohen Sachin Idgunji
Arun Rajan Dilip Sequeira Kim Hazelwood Sam Davis
Ashish Sirasao Ephrem Wu Koichi Yamada Sarah Bird
Atsushi Ike Fei Sun Lillian Pentecost Sergey Serebryakov
Bill Jia Francisco Massa Lingjie Xu Steve Farrell
Bing Yu Frank Wei Mark Charlebois Taylor Robie
Brian Anderson Gennady Pekhimenko Masafumi Yamazaki Tayo Oguntebi
Carole-Jean Wu George Yuan Matei Zaharia Thomas B. Jablin
Christine Cheng Greg Diamos Maximilien Breughe Tom St. John
Cliff Young Gu-Yeon Wei Michael Thomson Tsuguchika Tabaru
Cody Coleman Guenther Schmuelling Naveen Kumar Udit Gupta
Colin Osborne Guokai Ma Pan Deng Victor Bittorf
Daniel Kang Hanlin Tang Pankaj Kanwar Vijay Janapa Reddi
Dave Fick Itay Hubara Paulius Micikevicious William Chou
David Brooks J. Scott Gardner Peizhao Zhang Xinyuan Huang
David Kanter Jacob Balma Peng Meng Yuchen Zhou
Agenda
● Why do we need MLPerf Inference?
● What’s in MLPerf inference?
● Typical Submission Workflow
● Impact of MLPerf inference 2019
● What’s in MLPerf inference 2020?
● How else can we make ML better?
We are creating a non-profit called
MLCommons with a mission to
“Accelerate ML innovation.”
Recipe for accelerated innovation

Benchmarks + Large public datasets + Best practices + Outreach

Photo credits (left to right): Simon A. Eugster CC BY-SA 3.0; Riksantikvarieämbetet / Pål-Nils Nilsson CC BY 2.5 SE; Public Domain; Public Domain
Best practice example: MLBox

Need a lightweight way to share models for benchmarking or experiments.
Basic idea: a Docker with a standard file system interface.
Goal: a shipping container for ML models.

MLBox (a Docker) layout, running on a Platform (HW + OS):
  datasets/           (directory)
  params              (file)
  platform_spec       (file)
  platform_instance   (file)
  trained_model/      (directory)
  outputs/            (directory)
  logs/               (directory)

[Photo: shipping container; Wikimedia, https://commons.wikimedia.org/wiki/File:Container_01_KMJ.jpg]
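As a sketch of the idea (MLBox was an early proposal, so the helper below and its names are hypothetical, derived only from the layout on the slide):

    from pathlib import Path

    # Hypothetical sketch of the MLBox standard file-system interface.
    LAYOUT = {
        "datasets": "dir", "trained_model": "dir", "outputs": "dir", "logs": "dir",
        "params": "file", "platform_spec": "file", "platform_instance": "file",
    }

    def scaffold(root: Path) -> None:
        """Create the standard layout under `root`."""
        root.mkdir(parents=True, exist_ok=True)
        for name, kind in LAYOUT.items():
            path = root / name
            path.mkdir(exist_ok=True) if kind == "dir" else path.touch()

    def validate(root: Path) -> bool:
        """Check that a shared MLBox exposes the expected interface."""
        return all(
            (root / n).is_dir() if k == "dir" else (root / n).is_file()
            for n, k in LAYOUT.items()
        )

    scaffold(Path("my_mlbox"))
    print(validate(Path("my_mlbox")))  # True once the layout is complete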
Dataset example: People's Speech

● Public datasets fuel innovation
  ○ ~80% of a sample of large tech company research papers cite public datasets [study by Dr. Vijay Janapa Reddi, Harvard]
● "People's Speech" dataset
  ○ Goal:
    ■ 100,000+ hours of transcribed speech in diverse languages by diverse speakers
    ■ Public-use license
  ○ Why?
    ■ Smart speakers / assistants expected to reach the entire Earth population by 2025
    ■ 1000+ languages with 1M+ speakers

[Map: https://commons.wikimedia.org/wiki/File:List_of_languages_by_number_of_native_speakers.png]