Dov Nimratz, Roman Chobik "Embedded artificial intelligence"

Confidential
Embedded Artificial Intelligence
Dov Nimratz & Roman Chobik
Solution Architect
March 2019

● 30+ years in R&D
● 17 years in Israeli high-tech
● ECI, Telrad, RAD, Audiocodes companies
● HW, SW, Mechanical design engineer
● Project & Product Manager
● Business developer for EMEA & CIS countries
● Solution Architect
● 22 publications, US patent
● Counseling & SW development teaching
About us
● Over 7 years of IT experience
● Embedded Linux programming
● IoT-related projects
● C, Python, BLE, Mesh networking, IoT, Embedded, Linux, ZeroMQ, nRF51, STM8, UART, SPI
● National Technical University of Ukraine Kiev Polytechnic Institute
● MS in Electronics Engineering

1. AI algorithms overview
2. Application examples and requirements for embedded deployment
3. Intel Neural Compute Stick overview
4. NCS demonstration for Classification & Detection problems
5. Hardware for Embedded AI
Agenda

AI algorithms overview

Image collection

(assume given set of discrete labels)
{dog, cat, truck, plane, ...}
Image classification - core task in ML vision
Cat
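A minimal sketch of what "classification over a given set of discrete labels" means in code: the network produces one score per label, and the predicted class is the label with the highest probability. The raw scores below are hypothetical values chosen for illustration, not output from a real model.

```python
import math

# Label set taken from the slide; a real model would have many more classes.
LABELS = ["dog", "cat", "truck", "plane"]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels=LABELS):
    """Return the most probable label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical raw network outputs for one image.
label, confidence = classify([1.2, 3.4, 0.3, -0.5])
print(label, round(confidence, 3))
```

The same post-processing step applies regardless of which accelerator runs the network; only the source of the score vector changes.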

Image classification

Convolutional network - CNN

Hardware for recognition

Classification examples on embedded device

• Secure access control
• Driving actuators for different animal types
• Counting animals
Security camera in yard

• Sorting garbage or waste
• Integrity control
• Completeness check
Industrial or retail sorting

• Intrusion detection
• Barrier integrity control
• Early warning alarm
Restricted area security

• Safer for employees
• Much cheaper
• Detects and measures better than a human
Construction inspection

• Power consumption
• Dimensions and weight
• Real time operation
• No network connections
Challenges for such applications
• Optimized model
• Special hardware

• Limit the number of input channels by adding an extra 1x1 convolution before the 3x3 and 5x5 convolutions
• Factorize each 5x5 convolution into two 3x3 convolutions to improve computational speed
Inception model - next level of engineering optimization
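The effect of both Inception optimizations can be checked with simple weight counting. This is a rough sketch: the channel counts are hypothetical, biases are ignored, and weight count is used as a stand-in for computational cost.

```python
def conv_weights(kernel, c_in, c_out):
    """Weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


# Hypothetical channel counts, chosen only for illustration.
C_IN, C_REDUCED, C_OUT = 192, 16, 32

# Optimization 1: a 1x1 convolution shrinks the input before the 5x5.
direct_5x5 = conv_weights(5, C_IN, C_OUT)
reduced_5x5 = conv_weights(1, C_IN, C_REDUCED) + conv_weights(5, C_REDUCED, C_OUT)

# Optimization 2: two stacked 3x3 convolutions cover the same 5x5
# receptive field with 18 weights per channel pair instead of 25.
C = 64  # hypothetical channel count, kept constant through the stack
one_5x5 = conv_weights(5, C, C)
two_3x3 = 2 * conv_weights(3, C, C)

print("5x5 direct vs. with 1x1 reduction:", direct_5x5, reduced_5x5)
print("one 5x5 vs. two 3x3:", one_5x5, two_3x3)
```

With these numbers the 1x1 reduction cuts the 5x5 branch by roughly a factor of ten, and the 3x3 factorization saves 28% of the weights for equal channel counts.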

1. Replace 3x3 filters with 1x1 filters - Fire layer
2. Decrease the number of input channels to 3x3 filters
3. Use a pooling layer in place of the FC layer at the end.
SqueezeNet - up to 510× smaller than AlexNet (with model compression)
Major principle - use CNN only where high input exists
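A rough sketch of why a Fire layer is cheap: a 1x1 "squeeze" convolution first reduces the number of channels, and only then does a mix of 1x1 and 3x3 "expand" filters run on the reduced input. The channel counts below are illustrative assumptions, and biases are ignored.

```python
def conv_weights(kernel, c_in, c_out):
    """Weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def fire_weights(c_in, squeeze, expand_1x1, expand_3x3):
    """Weights in one Fire layer: 1x1 squeeze, then 1x1 and 3x3 expand."""
    return (
        conv_weights(1, c_in, squeeze)           # squeeze: fewer channels
        + conv_weights(1, squeeze, expand_1x1)   # 1x1 filters instead of 3x3
        + conv_weights(3, squeeze, expand_3x3)   # 3x3 filters see few inputs
    )


# Illustrative sizes: 96 input channels, 128 output channels in total.
fire = fire_weights(c_in=96, squeeze=16, expand_1x1=64, expand_3x3=64)
plain = conv_weights(3, 96, 128)  # ordinary 3x3 layer with the same in/out size

print("Fire layer weights:", fire)
print("Plain 3x3 layer weights:", plain)
```

For these sizes the Fire layer needs about nine times fewer weights than the plain 3x3 layer with the same input and output channel counts.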

Intel Neural Compute Stick

Ultra-low power with over 1 TOPS
Deep neural network processing unit
VPU architecture which minimizes power by reducing data movement on-chip
Imaging and vision hardware accelerators based on VLIW vector processors
16 programmable 128-bit VLIW vector processors
16 configurable MIPI lanes
On-chip memory architecture allows for up to 400 GB/sec of internal bandwidth
Movidius VPU - Vision Processing Unit

Movidius Myriad X chip

Implementation on Intel Stick

Delivery limitations

The Inference Engine deployment process assumes you used the Model
Optimizer to convert your trained model to an Intermediate Representation.
Deployment Workflow

79 different topology models
https://github.com/opencv/open_model_zoo/tree/2018/model_downloader
Default configuration file is around 4.3 GB
List of Available topologies
densenet-121
densenet-161
densenet-169
densenet-201
squeezenet1.0
squeezenet1.1
mtcnn-p
mtcnn-r
mtcnn-o
mobilenet-ssd
vgg19
vgg16
ssd512
ssd300
inception-resnet-v2
dilation
googlenet-v1
googlenet-v2
googlenet-v4
alexnet
ssd_mobilenet_v2_coco
resnet-50
resnet-101
resnet-152
googlenet-v3
age-gender-recognition
emotions-recognition
face-detection-adas
face-detection-retail
face-reidentification
facial-landmarks
human-pose-estimation
landmarks-regression
license-plate-recognition-barrier
pedestrian-and-vehicle-detector-adas-0001
pedestrian-and-vehicle-detector-adas-0001-fp16
pedestrian-detection-adas-0002
pedestrian-detection-adas-0002-fp16
person-attributes-recognition-crossroad-0031
person-attributes-recognition-crossroad-0031-fp16
person-detection-action-recognition-0003
person-detection-action-recognition-0003-fp16
person-detection-retail-0001

A summary of the steps for optimizing and deploying a trained model:
• Configure the Model Optimizer for your framework.
- Caffe models
- TensorFlow models
- MXNet models
- ONNX models
- Kaldi models
• Convert a trained model to produce an optimized Intermediate Representation (IR)
- Produce a valid Intermediate Representation. (.xml and .bin)
- Produce an optimized Intermediate Representation: layers needed only for training, such as Dropout, are removed
• Test the model in the Intermediate Representation format using the Inference Engine
• Integrate the Inference Engine into your application to deploy the model in the target environment.
Model Optimizer
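The conversion step can be sketched as a small helper that builds the Model Optimizer command line and names the two IR files it is expected to produce. This is a sketch under assumptions: the script is taken to be reachable as mo.py and to accept the --input_model, --data_type and --output_dir options; the model file name and output directory are hypothetical; FP16 is chosen because that is the precision the Neural Compute Stick works with. The command is only built here, not executed.

```python
from pathlib import PurePosixPath


def model_optimizer_command(input_model, output_dir, data_type="FP16",
                            mo_script="mo.py"):
    """Build (but do not run) a Model Optimizer command line.

    mo_script is an assumed location; adjust it to the local installation.
    """
    return [
        "python3", mo_script,
        "--input_model", input_model,
        "--data_type", data_type,
        "--output_dir", output_dir,
    ]


def expected_ir_files(input_model, output_dir):
    """The IR is a pair of files: topology (.xml) and weights (.bin)."""
    base = PurePosixPath(input_model).stem
    out = PurePosixPath(output_dir)
    return str(out / (base + ".xml")), str(out / (base + ".bin"))


# Hypothetical file names, for illustration only.
cmd = model_optimizer_command("squeezenet1.1.caffemodel", "ir")
xml_path, bin_path = expected_ir_files("squeezenet1.1.caffemodel", "ir")
print(" ".join(cmd))
print(xml_path, bin_path)
```

The resulting .xml and .bin pair is what the Inference Engine loads in the test and integration steps.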

Supported networks:
Caffe*:
● AlexNet
● CaffeNet
● GoogleNet (Inception) v1, v2, v4
● VGG family (VGG16, VGG19)
● SqueezeNet v1.0, v1.1
● ResNet v1 family (18** ***, 50, 101, 152)
● MobileNet
● Inception ResNet v2
● DenseNet family** (121, 161, 169, 201)
● SSD-300, SSD-512, SSD-MobileNet, SSD-GoogleNet, SSD-SqueezeNet
MXNet*:
● AlexNet and CaffeNet
● DenseNet family** (121, 161, 169, 201)
● SqueezeNet v1.1
● MobileNet v1, v2
● NiN
● ResNet v1 (101, 152)
● VGG family (VGG16, VGG19)
● SSD-Inception-v3, SSD-MobileNet, SSD-ResNet-50, SSD-300
TensorFlow*:
● AlexNet
● Inception v1, v2, v3, v4
● Inception ResNet v2
● MobileNet v1, v2
● ResNet v1 family (50, 101, 152)
● SqueezeNet v1.0, v1.1
● VGG family (VGG16, VGG19)

Deploying a NN using the OpenVINO library

Inference engine structure

NCS demonstration

Main board hardware - Intel Up

Embedded Hardware

Next Step
Roadmap project - object classifier:
Integrate several sticks
The robot approaches a toy and plays the relevant sound:
● Cat
● Dog
● Car, etc.

Embedded World - March 2019, Nuremberg
Google enters the arena - Coral
USB Accelerator
A USB accessory featuring the Edge TPU that brings ML inferencing to existing systems.
● Supported OS: Debian Linux
● Compatible with Raspberry Pi boards
● Supported framework: TensorFlow Lite

Google ←→ Intel

Coral project
• GCP AI based on Coral
• TensorFlow Lite framework only
• Three types of pre-trained models:
  - Image classification
    • MobileNet V1/V2
    • Inception V1/V2/V3/V4
  - Object detection
    • MobileNet V1/V2
  - Embedding extractor (classification)
    • MobileNet V1
• Possibility to retrain only the last layer or the full network
• Two frequency modes

Real time object detection with Coral Dev Board
Edge TPU Performance Demo
The video demonstrates the real-time processing power of the Edge TPU by running a MobileNet SSD model that can identify and classify multiple objects.
The footage of the cars is a recording, but the MobileNet model is executing in real time on the Coral Dev Board, marking each detected car with a box (limited to 20 detected cars).
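The cap of 20 boxes mentioned above is a typical detection post-processing step. A minimal sketch, assuming each detection is a dictionary with a label, a score and a box; this data layout and the 0.5 score threshold are assumptions made for illustration, not the demo's actual code.

```python
def keep_top_detections(detections, max_count=20, min_score=0.5):
    """Drop weak detections, then keep at most max_count of the strongest."""
    confident = [d for d in detections if d["score"] >= min_score]
    confident.sort(key=lambda d: d["score"], reverse=True)
    return confident[:max_count]


# Hypothetical detections for one video frame: (x_min, y_min, x_max, y_max).
frame_detections = [
    {"label": "car", "score": 0.91, "box": (10, 20, 110, 90)},
    {"label": "car", "score": 0.42, "box": (200, 40, 260, 80)},
    {"label": "car", "score": 0.77, "box": (300, 25, 390, 95)},
]

for det in keep_top_detections(frame_detections):
    print(det["label"], det["score"], det["box"])
```

Only the boxes returned by this filter would be drawn on the frame.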

With:
Desktop CPU: 64-bit Intel(R) Xeon(R) E5-1650 v4 @ 3.60GHz
Embedded CPU: Quad-core Cortex-A53 @ 1.5GHz
Dev Board: Quad-core Cortex-A53 @ 1.5GHz + Edge TPU
Google performance test

Compare Intel - Google USB Accelerators
Intel:
- Async & sync calls
- Multiple sticks can be integrated via a hub
- OpenVINO library is an ML-framework-independent solution
- Requires OpenVINO installation
- User-friendly SDK
- No difference found between USB 2 and USB 3 for image classification
Google:
- 3 times lower power consumption in standby mode
- 4 times better performance with USB 3
- TensorFlow Lite framework only
- Quick training mode with a pretrained model
- Two operating clock modes
- Nothing to be installed

Image detection on video, power consumption:
• Intel Neural Compute Stick: 350 mA (1.75 W) with 140 ms detection time
• Google Coral USB Accelerator: 60 mA (300 mW) with 17 ms detection time
Power consumption and performance comparison
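Combining the two measurements gives the energy spent per detection, which is the figure that matters for a battery-powered device. A short worked calculation, assuming a 5 V USB supply (an assumption that is consistent with the wattages quoted above):

```python
USB_VOLTAGE = 5.0  # volts; assumed supply voltage


def power_w(current_ma):
    """Electrical power in watts for a given current in milliamps."""
    return USB_VOLTAGE * current_ma / 1000.0


def energy_per_detection_mj(current_ma, detection_ms):
    """Energy per detection in millijoules (watts x milliseconds)."""
    return power_w(current_ma) * detection_ms


intel_mj = energy_per_detection_mj(current_ma=350, detection_ms=140)
coral_mj = energy_per_detection_mj(current_ma=60, detection_ms=17)

print("Intel stick, mJ per detection:", round(intel_mj, 1))
print("Coral stick, mJ per detection:", round(coral_mj, 1))
print("Ratio:", round(intel_mj / coral_mj, 1))
```

With these figures the Intel stick uses about 245 mJ per detection and the Coral stick about 5.1 mJ, a difference of roughly 48 times.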

• Inference at the edge
• Offline Inference
• Minimal latency - true real time
• Privacy and security
What does it mean

The Terminator is born

Thank you
