Running deep learning on
heterogeneous hardware
Vincent Delaitre
CTO @ deepomatic
Some context
• We build a platform that makes it
easy to train and deploy custom
computer vision algorithms
• Our customers are mostly large
industrial groups
• Many different use-cases, one
methodology
« Lean AI »
[Diagram: AI deployment, image analysis and feedback loop, across embedded devices (Jetson, Movidius, etc.), on-premise & private cloud, and training in the cloud]
My shopping list for deployment
Running locally
Reliability constraints
Bandwidth constraints
Privacy constraints
Various hardware requirements
High vs Low throughput
May need to be powered with 24V
May need to be as cheap as possible
State of the art
Various meta-architectures:
Caffe / TF / Darknet / ?Pytorch?
Deployment: frameworks vs hardware
Frameworks: Caffe, TensorFlow, Darknet, PyTorch
Hardware: GPU, Jetson, CPU, Myriad, FPGA
Solution: use runtimes!
Frameworks: Darknet, TensorFlow, Caffe, PyTorch
Exchange format: ONNX
Intel OpenVINO: CPU, Myriad, FPGA
Nvidia TensorRT: GPU, Jetson
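One way to read the two slides above is as a counting argument: without a shared format, every framework has to be integrated with every hardware target, whereas with ONNX in the middle each framework needs one exporter and each runtime one importer. The sketch below only does that count, using the names from the slides. It assumes, as a simplification, that every listed framework can export to ONNX and that both runtimes can import it.

```python
from itertools import product

# Names taken from the slides.
FRAMEWORKS = ["Caffe", "TensorFlow", "Darknet", "PyTorch"]
RUNTIMES = {
    "Intel OpenVINO": ["CPU", "Myriad", "FPGA"],
    "Nvidia TensorRT": ["GPU", "Jetson"],
}
HARDWARE = [hw for targets in RUNTIMES.values() for hw in targets]

# Without a common format: one integration per (framework, hardware) pair.
direct_integrations = list(product(FRAMEWORKS, HARDWARE))

# With ONNX in the middle (assumption: one exporter per framework,
# one importer per runtime).
via_onnx = len(FRAMEWORKS) + len(RUNTIMES)

print(len(direct_integrations), "direct integrations vs", via_onnx, "via ONNX")
```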
Deployment logic
Current architecture: Caffe models, TensorFlow models; CPU and GPU worker
Target architecture: ONNX models; TensorRT worker, OpenVINO worker
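A minimal sketch of what such deployment logic could look like: given a model format and a hardware target, pick a worker. The worker names, the `Model` class and `pick_worker` are hypothetical, and the rule that Caffe and TensorFlow models stay on the CPU and GPU worker is an assumption. The hardware-to-runtime mapping follows the runtimes slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    fmt: str  # "caffe", "tensorflow" or "onnx"


# Hypothetical worker names; the slides only name the worker types.
CURRENT_WORKER = "cpu-gpu-worker"
TARGET_WORKERS = {
    "GPU": "tensorrt-worker",
    "Jetson": "tensorrt-worker",
    "CPU": "openvino-worker",
    "Myriad": "openvino-worker",
    "FPGA": "openvino-worker",
}


def pick_worker(model: Model, hardware: str) -> str:
    """Route a model to a worker type for the requested hardware."""
    if model.fmt in ("caffe", "tensorflow"):
        # Current architecture (assumed rule): one worker, CPU or GPU only.
        if hardware not in ("CPU", "GPU"):
            raise ValueError(f"{model.fmt} models cannot target {hardware}")
        return CURRENT_WORKER
    if model.fmt == "onnx":
        # Target architecture: the runtime is chosen by the hardware.
        if hardware not in TARGET_WORKERS:
            raise ValueError(f"no worker listed for {hardware!r}")
        return TARGET_WORKERS[hardware]
    raise ValueError(f"unknown model format {model.fmt!r}")


print(pick_worker(Model("detector", "onnx"), "Jetson"))
print(pick_worker(Model("classifier", "caffe"), "GPU"))
```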
Why does it matter?
[Bar chart: speed ratio w.r.t. vanilla TF (axis 0 to 4) for Vanilla TF (FP32), TensorRT (FP32), TensorRT (FP16), TensorRT (INT8)]
Speed-up
• From vanilla TF to INT8: x3 speedup
• INT8 uses 4 times less memory, so use batches of 4: another x3 speedup
Total: x9 speedup
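The arithmetic behind the total, spelled out: the two gains multiply, and the factor of 4 in memory follows from the storage size of each value. The x3 figures are the ones quoted on the slide. The byte sizes are the standard widths of these number formats.

```python
# Storage size of one value, in bytes.
BYTES_PER_VALUE = {"FP32": 4, "FP16": 2, "INT8": 1}

# INT8 needs 4 times less memory than FP32, hence batches 4 times larger.
memory_ratio = BYTES_PER_VALUE["FP32"] // BYTES_PER_VALUE["INT8"]

precision_speedup = 3  # vanilla TF (FP32) to TensorRT (INT8), from the slide
batching_speedup = 3   # batches of 4 instead of 1, from the slide

# Independent gains multiply.
total_speedup = precision_speedup * batching_speedup

print(f"memory x{memory_ratio} smaller, total speedup x{total_speedup}")
```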
INT8, really?
http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf
• Needs a representative and diverse validation set
• Increases top-1 error by 0.4% on average
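To illustrate why representative data is needed, here is a minimal sketch of symmetric INT8 quantization where the scale is derived from sample values. It uses the simplest possible rule (map the largest magnitude seen onto 127), which is a stand-in chosen for brevity and not the calibration method TensorRT uses. All function names are hypothetical.

```python
def calibrate_scale(samples):
    """Max-abs calibration: map the largest magnitude seen onto 127."""
    max_abs = max(abs(x) for x in samples)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values, scale):
    """Round to the nearest INT8 step and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(quantized, scale):
    """Map INT8 steps back to floating point."""
    return [q * scale for q in quantized]


# Toy calibration data standing in for activations on a validation set.
calibration_set = [-1.5, -0.2, 0.0, 0.7, 2.54]
scale = calibrate_scale(calibration_set)

# 10.0 lies outside the calibrated range and saturates at 127.
quantized = quantize([0.7, -1.5, 10.0], scale)
restored = dequantize(quantized, scale)
print(quantized, restored)
```

Values inside the calibrated range come back with an error of at most half a step. Values outside it saturate, which is what happens when the calibration data does not cover the inputs seen in production.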
Why does it matter?
[Bar chart: speed ratio w.r.t. CPU and price (axis 0 to 20) for CPU, Myriad 2, Myriad X, Jetson TX2, Jetson Xavier, GPU (Titan X), FPGA (Arria 10)]
Flexibility
• Low throughput: CPU
• Moderate throughput or embedded: Myriad X or Jetson X
• High throughput: GPU
What’s next? Workflows!
[Diagram, build 1: Image, Detector, then for each detection (Box 1, Label 1) to (Box N, Label N): Crop, a classifier for that label, and a second label (Label 1 bis to Label N bis)]
[Diagram, build 2: same, with several classifiers per crop and a Fuse stage]
[Diagram, build 3: same for a generic detection (Box K, Label K), with a Jitter stage per classifier]
[Diagram, build 4: same, with a Tile stage, several (Box K, Label K) detections and a Regroup stage]
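A minimal sketch of such a workflow, assuming the stages chain as detect, crop, jitter, classify, fuse. The detector and classifier are toy stand-ins, every name is hypothetical, and the Tile and Regroup stages are left out to keep the example short.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]          # toy grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive


@dataclass
class Detection:
    box: Box
    label: str


def crop(image: Image, box: Box) -> Image:
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def jitter(patch: Image) -> List[Image]:
    """Toy jitter: the patch plus a horizontally flipped copy."""
    return [patch, [row[::-1] for row in patch]]


def fuse(labels: List[str]) -> str:
    """Majority vote; on a tie the label seen first wins."""
    return Counter(labels).most_common(1)[0][0]


def run_workflow(
    image: Image,
    detector: Callable[[Image], List[Detection]],
    classifiers: Dict[str, Callable[[Image], str]],
) -> List[Tuple[Detection, str]]:
    results = []
    for det in detector(image):
        classify = classifiers.get(det.label)
        if classify is None:
            # No second-stage classifier for this label: keep it as is.
            results.append((det, det.label))
            continue
        votes = [classify(p) for p in jitter(crop(image, det.box))]
        results.append((det, fuse(votes)))
    return results


# Toy stand-ins for real models.
def toy_detector(image: Image) -> List[Detection]:
    return [Detection((0, 0, 2, 2), "car"), Detection((2, 0, 4, 2), "sign")]


def toy_car_classifier(patch: Image) -> str:
    mean = sum(map(sum, patch)) / (len(patch) * len(patch[0]))
    return "dark car" if mean < 128 else "light car"


image = [[10, 20, 200, 210],
         [30, 40, 220, 230]]
results = run_workflow(image, toy_detector, {"car": toy_car_classifier})
print([label for _, label in results])
```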
Facebook Tensor Comprehensions
C := Ax
C := a.AB + bC
https://arxiv.org/pdf/1802.04730.pdf
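For reference, the two formulas on the slide are a matrix-vector product and a general matrix multiply with scaling factors a and b. The sketch below writes them as plain Python loops over nested lists. It is not Tensor Comprehensions syntax, and the function names are hypothetical.

```python
def matvec(A, x):
    """C := Ax for a matrix A (list of rows) and a vector x."""
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in A]


def gemm(a, A, B, b, C):
    """C := a.AB + bC, returned as a new matrix."""
    n_inner = len(B)
    return [
        [
            a * sum(A[i][k] * B[k][j] for k in range(n_inner)) + b * C[i][j]
            for j in range(len(B[0]))
        ]
        for i in range(len(A))
    ]


A = [[1, 2], [3, 4]]
identity = [[1, 0], [0, 1]]
ones = [[1, 1], [1, 1]]

print(matvec(A, [1, 1]))
print(gemm(2, A, identity, 1, ones))
```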
Facebook Tensor Comprehensions
https://arxiv.org/pdf/1802.04730.pdf
Computation graph → Intermediate representation → Backend code (CUDA, cuDNN, etc.)
• Graph optimisation
• Operation scheduling
• Operation placement
• Just in time compilation + Autotuning
• For training and inference
Running deep learning on heterogeneous hardware
Thank you!
PS: deepomatic is hiring!
Right this way: http://careers.deepomatic.com :)
