Efficient Deep Learning
Amir Alush, PhD
DEEP Neural Networks on Edge Devices
● State-of-the-art in many AI applications
● High computational complexity
● Focus: inference efficiency, not training
● On the edge, not in the cloud, and not on a pricey GPU
● Maintain accuracy while staying fast and slim
DEEP Learning Stack
HARDWARE
GPU, CPU, FPGA, ASIC
Deep Learning Libraries
cuDNN, MKL, BLAS, NNPACK, SNAPPY, Core ML
Deep Learning Frameworks
TF, Caffe, PyTorch, MXNet, Theano
Algorithms
NN Architectures, Meta-Architectures
Deep Learning Hardware & Libraries
● Multiply-and-Accumulate (MAC) operations dominate (see the sketch below)
● Highly parallelized by DL libraries:
○ GPU → cuBLAS/cuDNN
○ CPU → MKL/BLAS/NNPACK
○ ARM CPU → ARM CL, Qualcomm SNAPPY
● AI accelerators (ASIC/FPGA) are more energy-efficient!
[Figure: CONV and FC layers]
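To make the MAC structure concrete, here is a minimal NumPy sketch (my own illustration, not from the talk) of how a convolution reduces to im2col plus one large matrix multiply (GEMM), which is exactly the pattern that cuBLAS/cuDNN-style libraries parallelize; an FC layer is already a plain matrix multiply.

```python
import numpy as np

def conv2d_as_gemm(x, w):
    """Toy im2col + GEMM convolution (stride 1, no padding).

    x: input feature map, shape (C_in, H, W)
    w: filters, shape (C_out, C_in, K, K)
    Returns output of shape (C_out, H-K+1, W-K+1).
    """
    c_in, h, width = x.shape
    c_out, _, k, _ = w.shape
    h_out, w_out = h - k + 1, width - k + 1

    # im2col: each output position becomes one column of K*K*C_in values
    cols = np.empty((c_in * k * k, h_out * w_out))
    for i in range(h_out):
        for j in range(w_out):
            cols[:, i * w_out + j] = x[:, i:i + k, j:j + k].ravel()

    # GEMM: one big matrix multiply = C_out * C_in * K * K * H_out * W_out MACs
    out = w.reshape(c_out, -1) @ cols
    return out.reshape(c_out, h_out, w_out)

x = np.random.randn(3, 8, 8)
w = np.random.randn(16, 3, 3, 3)
print(conv2d_as_gemm(x, w).shape)  # (16, 6, 6)
```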
Deep Learning Frameworks
● Allow rapid development and research of algorithms and of algorithm efficiency
● Hardware and libraries are transparent to the user
● Mostly optimized for training, not for inference or the edge
Deep Learning Algorithms
Algorithms play a crucial role in efficiency, since they define the model’s complexity and size.
Evolution of CNN Architectures
LeNet-5 (1998, LeCun)
● 4 layers: 2 Conv, 2 FC
● Convolution (5x5)→ pooling → nonlinearity (sigmoid)
● 60K weights, 341K MACs per image
● Convolutional Layers: 2.6K weights, 282K MACs
● Fully Connected Layers: 58K weights, 58K MACs
“Gradient-based learning applied to document recognition”, LeCun et al. 1998
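The weight and MAC counts above follow directly from the layer shapes. A small helper (my own, hypothetical, for illustration) shows the arithmetic:

```python
def conv_cost(k, c_in, c_out, h_out, w_out):
    """Weights and MACs for one conv layer (ignoring bias)."""
    weights = k * k * c_in * c_out
    macs = weights * h_out * w_out  # each output pixel reuses all filter weights
    return weights, macs

def fc_cost(n_in, n_out):
    """A fully connected layer does exactly one MAC per weight."""
    return n_in * n_out, n_in * n_out

# LeNet-5's first conv: 5x5, 1 input channel, 6 filters, 28x28 output
print(conv_cost(5, 1, 6, 28, 28))  # (150, 117600)
# AlexNet's first FC layer: 6*6*256 = 9216 inputs, 4096 outputs
print(fc_cost(9216, 4096))         # (~37.7M weights, ~37.7M MACs)
```

This is why FC layers dominate the weight counts while conv layers dominate the MAC counts in the breakdowns on these slides.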
AlexNet (2012, Krizhevsky)
● 8 layers: 5 Conv, 3 FC
● Convolution (11x11 down to 3x3) → pooling → nonlinearity (ReLU)
● 61M weights, 724M MACs per image
● More weights, more computations!
● Convolutional Layers: 2.3M weights, 666M MACs
● Fully Connected Layers: 58.6M weights, 58.6M MACs
“ImageNet Classification with Deep Convolutional Neural Networks”, Krizhevsky et al. 2012
Image Source: Kaiming He, CVPR 2017 Tutorial
● 16/19 layers: 13 Conv, 3 FC
● Conv → relu → conv → relu → … → pooling
● 3x3 filters only (stacking for a 5x5 receptive field)
● 138M weights, 15.5G MACs per image
● Convolutional Layers: 14.7M weights, 15.3G MACs
● Fully Connected Layers: 124M weights, 124M MACs
VGG16/19 (2014, Simonyan)
Stacking two 3x3 convs gives a 5x5 receptive field.
“Very Deep Convolutional Networks for Large-Scale Image Recognition”, Simonyan et al. 2014
Image Source: Kaiming He, CVPR 2017 Tutorial & A. Karpathy
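A quick check of the parameter saving, assuming C channels in and out:

```latex
% Two stacked 3x3 convs vs. one 5x5 conv, C channels in and out:
\underbrace{2 \cdot (3^2 C^2)}_{\text{two stacked } 3\times 3} = 18C^2
\;<\;
\underbrace{5^2 C^2}_{\text{one } 5\times 5} = 25C^2
```

Same 5x5 receptive field, roughly 28% fewer weights, plus an extra nonlinearity between the two convs.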
GoogLeNet (2014, Szegedy)
“Going deeper with convolutions”, Szegedy et al. 2014
Image source: “Efficient Processing of Deep Neural Networks: A Tutorial and Survey”, Sze et al. 2017
[Figure: GoogLeNet architecture: 3 convolutions, 9 inception modules, 1 fully connected layer]
● 21 layers deep: 57 Conv layers, 1 FC layer
● Inception modules:
○ Multi-branching with different filter sizes: 1x1, 3x3, 5x5
○ Shortcuts
○ 1x1 convs “bottleneck” used to reduce #channels
● 7M weights, 1.43G MACs per image
● Convolutional Layers: 6M weights, 1.43G MACs
● Fully Connected Layers: 1M weights, 1M MACs
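As an illustration of the 1x1 "bottleneck" idea, here is a hedged PyTorch sketch of one 3x3 branch (not GoogLeNet's exact module); the 1x1 conv cuts the channel count the expensive 3x3 sees:

```python
import torch
import torch.nn as nn

class InceptionBranch(nn.Module):
    """One 3x3 branch of an inception-style module: a 1x1 conv first
    reduces the channel count, so the 3x3 conv sees fewer channels."""
    def __init__(self, c_in, c_mid, c_out):
        super().__init__()
        self.reduce = nn.Conv2d(c_in, c_mid, kernel_size=1)       # bottleneck
        self.conv = nn.Conv2d(c_mid, c_out, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.conv(self.relu(self.reduce(x))))

# 256 -> 64 -> 128: the 3x3 costs 3*3*64*128 MACs/pixel instead of 3*3*256*128
branch = InceptionBranch(256, 64, 128)
print(branch(torch.randn(1, 256, 32, 32)).shape)  # torch.Size([1, 128, 32, 32])
```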
Inception V1-V3 (Szegedy)
● Inception V1:
○ 30 layers deep
○ 5x5 convs replaced by 2 3x3 convs
○ 9M weights, 1.86G MACs
○ Introduced Batch Normalization
● Inception V2:
○ 42 layers deep
○ 2.86G MACs
○ Incorporated pooling in convolution
● Inception V3:
○ 25M weights, 5G MACs (+200%)
“Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”, Ioffe & Szegedy 2015
“Rethinking the Inception Architecture for Computer Vision”, Szegedy et al. 2015
Residual Networks (2016, He)
“Deep Residual Learning for Image Recognition”, He et al. 2016
Residual building block
● Demonstrated at more than 1000 layers
● Residual connections: more accurate, easier to train, deeper
● Bottleneck blocks allow going deeper at the same complexity
● ResNet 34: 3.6G MACs
● ResNet 50: 3.8G MACs, 25M weights
● ResNet 152: 11.3G MACs
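A minimal PyTorch sketch of a bottleneck residual block (batch norm omitted for brevity; real ResNets include it):

```python
import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    """ResNet-style bottleneck sketch: 1x1 reduce -> 3x3 -> 1x1 expand,
    with the input added back through a shortcut (identity) connection."""
    def __init__(self, channels, bottleneck):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, bottleneck, 1), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, bottleneck, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, channels, 1),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.body(x))  # residual connection

block = BottleneckBlock(256, 64)
print(block(torch.randn(1, 256, 14, 14)).shape)  # torch.Size([1, 256, 14, 14])
```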
Densely Connected Convolutional Networks (2017, Huang)
● Shortcuts: inspired by previous architectures (Inception, ResNet), allowing data flow from early layers to later layers
● Connects each layer to every other layer (with matching feature sizes)
● Needs fewer parameters: no need to relearn features!
● Increases data flow and gradient flow = easier to train
● ~2x fewer parameters and MACs than ResNets
“Densely Connected Convolutional Networks”, Huang et al. 2016
ResNeXt (2017, Xie)
● Inspired by Inception and ResNet
● Introduced cardinality as a new dimension besides depth and width
● Keeps run-time complexity and #parameters on par with ResNets while improving accuracy
● Shortcuts, bottlenecks & multi-branching
“Aggregated Residual Transformations for Deep Neural Networks”, Xie et al. 2016
Architectures Thus Far...
● Accuracy is the highest priority for most researchers: even when computations could be reduced, deeper and more complex models are used!
● CNN complexity increases
● MACs increase
“An Analysis of Deep Neural Network Models for Practical Applications”, Canziani et al. 2017
Fitting to Hardware
Reduce Model Size & Number of Operations
● Pruning redundant weights and retraining (a.k.a. “Brain Damage”):
○ According to some criterion: impact on training loss, energy
○ Or simply removing small weights (see the sketch after the references)
● Custom hardware to support sparse matrix multiplications: e.g. EIE
”Optimal Brain Damage”, LeCun et al. 1990
“Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning “, Yang et al. 2017
“Learning both weights and connections for efficient neural networks”, Han et al. 2015
“EIE: efficient inference engine on compressed deep neural network”, Han et al. 2016
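A naive magnitude-pruning sketch (my own helper, assuming a PyTorch model), illustrating the "remove small weights" criterion; retraining afterwards recovers accuracy:

```python
import torch

def prune_small_weights(model, fraction=0.5):
    """Zero out the smallest `fraction` of each weight tensor by magnitude.
    A naive sketch: real pipelines retrain afterwards to recover accuracy,
    and need sparse-aware kernels/hardware (e.g. EIE) to actually run faster."""
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() < 2:          # skip biases
                continue
            k = int(p.numel() * fraction)
            if k == 0:
                continue
            threshold = p.abs().flatten().kthvalue(k).values
            p.mul_((p.abs() > threshold).float())

model = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.Linear(64, 10))
prune_small_weights(model, fraction=0.5)
print(sum((p == 0).sum().item() for p in model.parameters()))  # count of zeroed weights
```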
Reduce Model Size & Number of Operations
● Structured pruning: needs no special hardware
● Low-rank approximations: e.g. Tucker decomposition
● Compact networks: refactoring convolutions, e.g. MobileNets (see the sketch after the references)
● Knowledge distillation: student-teacher networks
“Distilling the Knowledge in a Neural Network”, Hinton et al. 2015
“Learning structured sparsity in deep neural networks”, Wen et al. 2016
“Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications”, Kim et al. 2016
“MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications”, Howard et al. 2017
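As an illustration of refactoring convolutions, here is a MobileNets-style depthwise-separable block (a sketch, not the paper's full architecture):

```python
import torch
import torch.nn as nn

class DepthwiseSeparable(nn.Module):
    """MobileNets-style factorization sketch: a per-channel (depthwise) 3x3
    followed by a 1x1 (pointwise) conv, instead of one dense 3x3 conv.
    Cost drops from 9*C_in*C_out to 9*C_in + C_in*C_out MACs per pixel."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in)
        self.pointwise = nn.Conv2d(c_in, c_out, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

block = DepthwiseSeparable(256, 256)
print(block(torch.randn(1, 256, 32, 32)).shape)  # torch.Size([1, 256, 32, 32])
```

For a 3x3 layer with 256 channels in and out, this is roughly an 8-9x MAC reduction (589,824 vs. 67,840 MACs per pixel).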
Reduce Precision (quantization) of Weights & Activations
● 32-bit float → 16/8/4/2/1-bit fixed point
● Quantizing weights/activations reduces storage and computation
● Different schemes: linear, non-linear, clustering (weight sharing)
● Can be fixed or variable (depending on the distribution of weights, activations, layers, channels)
● Reduces processing time!
● May decrease accuracy; re-training helps (see the sketch below)
”Efficient Processing of Deep Neural Networks: A Tutorial and Survey”, Sze et al. 2017
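A minimal sketch of symmetric linear quantization (per-tensor scale is a simplifying assumption; real schemes are often per-channel and may be non-linear):

```python
import numpy as np

def quantize_linear(w, bits=8):
    """Symmetric linear quantization sketch: map float weights onto
    evenly spaced integer levels, then dequantize with one scale factor.
    (int8 storage here assumes bits <= 8.)"""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for int8
    scale = np.abs(w).max() / qmax        # one scale per tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale                       # inference uses q; w ~= q * scale

w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_linear(w)
print(np.abs(w - q * s).max())  # worst-case quantization error
```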
Brodmann17
Research vs Real Life
Research | Real-Life
Flickr/Google Images | In the wild
Large objects | Small/medium/large objects
Center location | All over the frame
Closed class distribution (1 of many) | Unconstrained (1 vs. infinity)
Balanced dataset (pos/neg) | Highly unbalanced
Unlimited run-time resources | Tight memory/storage/run-time
Real-life Applications On Edge Devices Checklist
1. Low memory footprint
2. High throughput
3. High Recall
4. FPR → 0
General Deep Learning Computer Vision Recipe
Recipe:
1. CNN as a powerful feature extractor
2. Specialized NN on top of 1 (classification/regression/segmentation…)
3. Deep meta algorithm for applying 1 + 2
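A toy end-to-end sketch of steps 1 and 2 of the recipe (my own minimal backbone and head, assuming a classification task; step 3 would be the surrounding pipeline, e.g. a detection meta-algorithm):

```python
import torch
import torch.nn as nn

class RecipeModel(nn.Module):
    """Recipe sketch: (1) a CNN backbone extracts features,
    (2) a small task-specific head maps them to outputs."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(             # step 1: feature extractor
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, num_classes)     # step 2: specialized head

    def forward(self, x):
        f = self.backbone(x).flatten(1)
        return self.head(f)

print(RecipeModel()(torch.randn(2, 3, 64, 64)).shape)  # torch.Size([2, 10])
```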
Current Approaches vs. Our Technology
● Current approaches: train a large, redundant CNN, then compress, approximate, and code-butcher it
● Our technology: train a non-redundant CNN from the start
Object Detection (what? + where?)
● Much more time-consuming than classification models
● Detection CNN = CNN feature extractor + Regression/Classification NN
● Numerous popular algorithms exist today:
”Deep Learning for Objects and Scenes, CVPR2017 Tutorial”, Girshick 2017
”Speed/Accuracy Tradeoffs for Modern Convolutional Object Detectors ”, Huang et al. 2017
Popular Detection Algorithms - run time
Speed depends on:
● Image resolution / object size
● Network complexity
Popular Detection Algorithms - object size
”Speed/Accuracy Tradeoffs for Modern Convolutional Object Detectors ”, Huang et al. 2017
[Chart: accuracy by object size (higher is better)]
Case Study

Method                 | DR @ 0.1 FPPI | DR @ 0.01 FPPI | FPS (Titan X GPU)
Brodmann17             | 89.25%        | 81.88%         | 200
DeepIR                 | 88.45%        | 82.19%         | <=1
Xiaomi (Faster R-CNN)  | 87.82%        | 77.99%         | 2?
Faceness               | 86.04%        | 79.67%         | 1
Hyperface              | 85.63%        | 80.68%         | 0.33
DP2MFD                 | 85.57%        | 76.73%         | <0.05
FDDB: 2845 images, 5171 faces
http://vis-www.cs.umass.edu/fddb/results.html
Looking for brilliant researchers
cv@brodmann17.com
Nir, Netanell, Ben, Ben, Yossi, Shai
30 FPS on a single ARM Cortex-A72 core!