Confidential © 2020 Arm Limited
AI Virtual Tech Talks Series
Kobus Marneweck, Product Manager, Arm
Anthony Huereca, Systems Engineer, NXP
Machine learning for embedded
systems at the edge
Arm and NXP
June 16, 2020
AI VIRTUAL TECH TALKS SERIES
Date | Title | Host
Today | Machine learning for embedded systems at the edge | Arm and NXP
June 30 | tinyML development with TensorFlow Lite for Microcontrollers and CMSIS-NN | Arm
July 14 | Demystify artificial intelligence on Arm MCUs | Cartesiam.ai
July 28 | Speech recognition on Arm Cortex-M | Fluent.ai
August 11 | Getting started with Arm Cortex-M software development and Arm Development Studio | Arm
August 25 | Efficient ML across Arm from Cortex-M to Web Assembly | Edge Impulse
Visit: developer.arm.com/solutions/machine-learning-on-arm/ai-virtual-tech-talks
SPEAKERS
Kobus Marneweck, Senior Product Manager
Arm
Anthony Huereca, Embedded Systems Engineer
NXP Semiconductor
AGENDA
• ML on the edge
• eIQ deployment
− Arm support for TFLµ
− TensorFlow
− Glow
− Getting started
• The future
• Wrap-up
CONFIDENTIAL & PROPRIETARY
NXP, THE NXP LOGO AND NXP SECURE CONNECTIONS FOR A SMARTER WORLD ARE TRADEMARKS OF NXP B.V.
ALL OTHER PRODUCT OR SERVICE NAMES ARE THE PROPERTY OF THEIR RESPECTIVE OWNERS. © 2020 NXP B.V.
Machine Learning on
the Edge
EXAMPLE EMBEDDED AI APPLICATIONS
Image Classification
• Identify what the camera is looking at
− Coffee pods
− Empty vs full trucks
− Factory defects on manufacturing line
− Produce on supermarket scale
• Personalization based on facial recognition
− Appliances
− Home
− Toys
− Auto
• Security Video Analysis
Audio Analysis
− Keyword actions
▪ “Alexa”/“Hey Google”
− Voice commands
− Alarm Analytics
▪ Breaking glass
▪ Crying baby
Anomaly Detection
− Identify factory issues before they become catastrophic
− Smartwatch health monitoring
− Motor performance monitoring
− Sensor Analysis
MACHINE LEARNING PROCESS
1. Training Phase: Collect and Prepare Data → Train Model → Test Model, iterating on parameters and algorithm to get the best model
2. Inference Phase: Input → Deployed Model → Prediction
INFERENCE ON THE EDGE
• Inference is using a model to make a prediction on new data
• Data can come from embedded camera, microphone, or sensors
Two possibilities:
Inference on the Edge
• Increased privacy and security
• Faster response time and throughput
• Lower Power
• Don’t need internet connectivity
Inference on the Cloud
• Requires network bandwidth
• Latency issues
• Cloud compute costs
NXP Enablement for
Machine Learning
NXP BROAD-BASED MACHINE LEARNING SOLUTIONS AND SUPPORT
eIQ™ ML Enablement
• eIQ (edge intelligence) for edge AI/ML inference enablement
• Based on open source technologies (TensorFlow Lite, Arm NN, Glow, ONNX, OpenCV)
• Support for i.MX 8 family, i.MX RT1050/1060/600
• Fully integrated into NXP development environments (MCUXpresso, Yocto/Linux)
• BYOM – Bring Your Own Model
Third Party SW and HW
• Coral Dev Board
• i.MX 8M Development Kit for Amazon® Alexa Voice Service w/ DSP Concepts
• Au-Zone Network Development Tools
• Arcturus video applications
• SensiML tools for sensor analysis
Turnkey Solutions
• Alexa Voice Services (AVS) solution: i.MX RT106A (kit – SLN-ALEXA-IOT)
• Local voice control solution: i.MX RT106L (kit – SLN-LOCAL-IOT)
• Face & emotion recognition solution: i.MX RT106F (kit – SLN-VIZN-IOT)
[Figure labels: DIY, Fully Tested, eIQ, Coral, … and more, SLN-ALEXA-IOT]
ARM CORTEX-M PORTFOLIO
Tiers: High performance; Performance efficiency; Lowest power & area
Architectures: Armv6-M, Armv7-M, Armv8-M (TrustZone)
• Cortex-M0: Lowest cost, low power
• Cortex-M0+: Highest energy efficiency
• Cortex-M3: Performance efficiency
• Cortex-M4: Mainstream control and DSP
• Cortex-M7: Maximum performance, control and DSP
• Cortex-M23: Smallest area, lowest power
• Cortex-M33: Flexibility, control and DSP
• Cortex-M55: Helium vector extensions, optimized for DSP & ML
Legend: Well suited for ML & DSP applications
CORTEX-M7: HIGHEST PERFORMANCE CORTEX-M
High performance – dual-issue processor
− Achieves 2.14 DMIPS/MHz, 5.01 CoreMark/MHz
− Achieves 1.4GHz in 16FFC (typ config with caches and FPU)
Retains all of the Cortex-M benefits
− Ease-of-use, low interrupt latency
Flexible memory interfaces
− Up to 16MB TCM for critical data and code
− Up to 64KB I-cache and D-cache
− AXI master interface
Performance
− Floating-point Unit (FPU) – Single precision (SP) and double precision (DP), sustained 2x 32bit or 2x 16bit MACs per cycle
− Digital signal processing (DSP) extension
CORTEX-M33: NEXT-GENERATION CORTEX-M WITH TRUSTZONE SECURITY
Industry-standard 32bit processor
− 3-stage pipeline, Harvard architecture
− Extremely flexible design configurations
Wide choice of options for differentiated products
− TrustZone security foundation with up to two memory protection units (MPUs)
− Digital signal processing (DSP) extension with SIMD, single-cycle MAC, saturating arithmetic
− Floating-point Unit (FPU)
− Coprocessor interface
− Arm Custom Instructions
− Powerful debug and non-intrusive real-time trace (ETM, MTB)
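The DSP extension bullet above mentions multiply-accumulate (MAC) and saturating arithmetic. As a rough illustration of what that arithmetic means, here is a portable sketch of a 16-bit dot product with a saturating 32-bit accumulator. The function names are hypothetical and this is plain C++, not an Arm intrinsic or library call; on a core with the DSP extension a compiler or hand-tuned kernel could map the inner loop onto hardware MAC instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Clamp a 64-bit value into the int32_t range (saturating arithmetic).
inline int32_t saturate_to_i32(int64_t v) {
    if (v > std::numeric_limits<int32_t>::max()) return std::numeric_limits<int32_t>::max();
    if (v < std::numeric_limits<int32_t>::min()) return std::numeric_limits<int32_t>::min();
    return static_cast<int32_t>(v);
}

// Dot product of two int16 vectors. Each step is one multiply-accumulate;
// the accumulator saturates instead of wrapping around on overflow.
inline int32_t dot_q15_sat(const int16_t* a, const int16_t* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
    int32_t acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const int64_t product = static_cast<int64_t>(a[i]) * b[i];
        acc = saturate_to_i32(static_cast<int64_t>(acc) + product);
    }
    return acc;
}
```

Saturation matters for signal and ML workloads because a wrapped accumulator flips sign, while a clamped one merely loses headroom.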
eIQ
MACHINE LEARNING PROCESS
Training: Collect and Prepare Data → Train Model → Test Model
Model Frameworks (definition): TensorFlow, Keras, Caffe, PyTorch, other…
Optimize (optional, framework dependent): Pruning, Quantization
Convert: tflite_convert.py, code_gen.py, model_compiler, custom script…
Inference (eIQ inference engines & ML examples): TensorFlow Lite, CMSIS-NN, Glow
Note: There is no unified method for converting neural networks from different frameworks to run on Arm Cortex-M products
i.MX RT eIQ inference engine options:
• CMSIS-NN – Can be used for several different model frameworks
• TensorFlow Lite – Used for TensorFlow model frameworks
• Glow – Machine Learning compiler for several different model frameworks (Coming in July)
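Quantization is listed above as an optional optimization step. A minimal sketch of one common scheme, affine int8 quantization, is shown below. The struct and function names are hypothetical, and the deck does not say which scheme each framework's tooling applies, so treat this as an illustration of the idea rather than any framework's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical parameter pair: real_value = scale * (q - zero_point).
struct QuantParams {
    float scale;
    int32_t zero_point;
};

// Derive parameters that map [min_val, max_val] onto the int8 range [-128, 127].
inline QuantParams choose_params(float min_val, float max_val) {
    assert(min_val < max_val);
    // Widen the range if needed so that 0.0 is always representable.
    min_val = std::min(min_val, 0.0f);
    max_val = std::max(max_val, 0.0f);
    const float scale = (max_val - min_val) / 255.0f;
    const int32_t zero_point =
        static_cast<int32_t>(std::lround(-128.0f - min_val / scale));
    return {scale, zero_point};
}

// Float to int8, clamping values that fall outside the calibrated range.
inline int8_t quantize(float x, QuantParams p) {
    const long q = std::lround(x / p.scale) + p.zero_point;
    return static_cast<int8_t>(std::max<long>(-128, std::min<long>(127, q)));
}

// int8 back to an approximate float.
inline float dequantize(int8_t q, QuantParams p) {
    return p.scale * static_cast<float>(q - p.zero_point);
}
```

Storing weights as int8 instead of 32-bit floats cuts their size by about 4x, which is the main reason the step is attractive on memory-constrained MCUs.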
eIQ – EDGE INTELLIGENCE
Collection of Libraries and Development Tools for Building Machine Learning Apps
Targeting NXP MCUs and App Processors
Deploying open-source inference engines
• Integration and optimization of neural net (NN) inference engines (Arm NN, Arm CMSIS-NN, OpenCV, TFLite, ONNX, etc.)
• End-to-end examples demonstrating customer use-cases (e.g. camera → inference engine)
• Support for emerging neural net compilers (e.g. Glow)
• Suite of classical ML algorithms such as support vector machine (SVM) and random forest
• BYOM – Bring Your Own Model
Integrated into Yocto Linux BSP and MCUXpresso SDK
• No separate SDK or release to download
• i.MX: New layer meta-imx-machinelearning in Yocto
• MCU: Integrated in MCUXpresso SDK middleware
Supporting materials for ease of use
• Documentation: eIQ White Paper, Release Notes, eIQ User’s Guide, Demo User’s Guide
• Guidelines for importing pretrained models based on popular NN frameworks (e.g. TensorFlow, Caffe)
• Training collateral for CAS, DFAEs and customers (e.g. lectures, hands-on, video)
eIQ DEMO
• Retrained a Mobilenet model written in TensorFlow to identify 5 different flower types
• Use eIQ to run model on i.MX RT1060 EVK
− Lab at https://community.nxp.com/docs/DOC-343827
− Lab steps can be used for any types of images you’re interested in
eIQ Deployment
Overview
ADDITIONAL FLAVORS of NXP eIQ™ MACHINE LEARNING DEVELOPMENT ENVIRONMENT
ML CLOUD TRAINING (Microsoft Azure, Google Cloud, Amazon Web Services): Untrained Model → Trained, Optimized, Quantized Model
Deployment stack: User Application with eIQ Deployment, NN Models → NXP eIQ Inference Engines & Libraries → Compute Engines
Compute Engines: Cortex-M, DSP, Cortex-A, GPU, ML Accelerator (NPU, µNPU)
Devices shown: i.MX RT600, i.MX RT1050, i.MX RT1060, i.MX RT1170, i.MX 8M Plus, i.MX 8QM, i.MX 8QXP, i.MX 8M Quad/Nano, i.MX 8M Mini, Future MCU with CMSIS-NN
eIQ ADVANTAGES
• eIQ implements performance enhancements with CMSIS-NN for Cortex-M cores and DSP
− Up to 2.4x improvement in inference time in TensorFlow Lite over original code
• eIQ inference engines work out-of-the-box and are already tested and optimized.
− Get up and running in minutes instead of weeks
NXP eIQ Enablement:
1. Import eIQ project
2. Click Compile button
3. Click Program button
4. Use model output
Roll Your Own:
1. Download source from GitHub
2. Figure out which files are needed for embedded inference engine
3. Set up cross-compiler for target device and create makefile
4. Successfully compile project, working through any known bugs
5. Create camera input code
6. Create LCD display code
7. Integrate camera/LCD code with inference engine code
8. Configure J-Link programming
9. Download to board using J-Link commands
10. Check output
11. Debug inference, camera, LCD, and integration code
12. Use model output
eIQ FOR I.MX RT
PC: Pre-trained Model → Optimizations (Quantization/Pruning, optional) → Convert
• Pre-trained model: customer’s or a third-party model trained on a CPU, GPU or in the Cloud
i.MX RT Device (eIQ): Input (Camera / Microphone / Sensor / Other) → Inference Engine → Prediction
Inference engines available with eIQ for i.MX RT:
• CMSIS-NN – Can be used for several different model frameworks
• TensorFlow Lite – Used for TensorFlow model frameworks
• Glow – Machine Learning compiler for several different model frameworks (Coming in July)
Arm support for TFLµ
CMSIS-NN INFERENCE
• Developed by Arm
• API to implement common model layers such as convolution, fully-connected, pooling, activation, etc., efficiently at a low level
• Conversion scripts (provided by Arm) to convert models into CMSIS-NN API calls
• CMSIS-NN optimizes the implementation of inference engines like TFLite Micro (https://www.tensorflow.org/lite/microcontrollers)
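To make the "common model layers" bullet concrete, the sketch below shows the arithmetic of a fully-connected layer followed by a ReLU activation in plain C++. This is deliberately not the CMSIS-NN API: the function name and the int8-in, int32-out signature are assumptions chosen for illustration, and the requantization step a real kernel would apply is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference fully-connected layer with ReLU.
// `weights` is row-major: one row of input.size() values per output.
// An optimized kernel has to produce the same numbers, only faster.
inline std::vector<int32_t> fully_connected_relu(const std::vector<int8_t>& input,
                                                 const std::vector<int8_t>& weights,
                                                 const std::vector<int32_t>& bias) {
    const std::size_t in_dim = input.size();
    const std::size_t out_dim = bias.size();
    assert(weights.size() == in_dim * out_dim);
    std::vector<int32_t> output(out_dim);
    for (std::size_t o = 0; o < out_dim; ++o) {
        int32_t acc = bias[o];
        for (std::size_t i = 0; i < in_dim; ++i) {
            acc += static_cast<int32_t>(weights[o * in_dim + i]) * input[i];
        }
        output[o] = acc > 0 ? acc : 0;  // ReLU: negative sums become zero
    }
    return output;
}
```

The inner loop is a chain of multiply-accumulates, which is why MAC throughput on the target core dominates inference time for layers like this.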
CMSIS-NN OPTIMIZED FOR PERFORMANCE
• Key ML function support
− Aiming for best-in-class performance for Cortex-M CPUs
(compared to other libraries)
− Available now through open source license
• Consistent interface to all Cortex-M CPUs
− Extending to Armv8-M
• Open-source, via Apache 2.0 license
− https://github.com/ARM-software/CMSIS_5
[Chart: Energy efficiency improvement (relative ops per joule): 4.9x higher efficiency]
CMSIS-NN: Optimised for Cortex-M CPUs (Armv7-M, Armv8.1-M)
[Chart: CNN runtime improvement (relative throughput): 4.6x higher performance]
TOOLS & TFLµ OPERATOR SUPPORT – CMSIS-NN AND ETHOS MICRONPU
[Diagram: a TensorFlow Lite .TF input file passes through offline optimization, producing a modified .TF input file with optimized custom operators for the microNPU. The TensorFlow Lite Micro runtime runs operators through reference kernels or CMSIS-NN optimized operators on Cortex-M (Armv6-M, Armv7-M, Armv8-M, Armv8.1-M with MVE), or through the Ethos microNPU driver on the Ethos-U microNPU.]
eIQ TensorFlow
TENSORFLOW LITE INFERENCE ENGINE
• Developed by Google
− TensorFlow → Training and Inference
− TensorFlow Lite eIQ → NXP’s implementation of TF Lite for MCUs
− TensorFlow Lite Micro → TensorFlow’s implementation of TF Lite for MCUs
• Can only be used with TensorFlow models
• Use tflite_convert utility (provided by TensorFlow) to convert a TensorFlow model to a .tflite binary
• TFLite flat buffer binary is read from memory by TFLite inference engine running on i.MX RT
TENSORFLOW LITE CONVERSION PROCESS
1. Transform a TensorFlow .pb model to a TFLite flat buffer file.
2. Convert the TFLite flat buffer file to a C array in a .h header.
3. Copy the .h header file into the eIQ TensorFlow Lite SDK example.
.pb → (tflite_convert) → .tflite → (xxd) → .h → (eIQ import) → i.MX RT
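Step 2 above embeds the flat buffer in a header as a C array. The sketch below shows roughly what that conversion produces, written as a small C++ function in the spirit of `xxd -i`. The function name is hypothetical and the exact text layout emitted by `xxd` differs slightly (for example, it does not add `const`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Emit C source that embeds `bytes` as an array named `name`,
// followed by a `<name>_len` variable holding the byte count.
inline std::string to_c_array(const std::vector<uint8_t>& bytes, const std::string& name) {
    assert(!name.empty());
    static const char kHex[] = "0123456789abcdef";
    std::ostringstream out;
    out << "const unsigned char " << name << "[] = {";
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        out << (i % 12 == 0 ? "\n  " : " ");  // 12 bytes per line
        out << "0x" << kHex[bytes[i] >> 4] << kHex[bytes[i] & 0x0f];
        if (i + 1 < bytes.size()) out << ",";
    }
    out << "\n};\n";
    out << "const unsigned int " << name << "_len = " << bytes.size() << ";\n";
    return out.str();
}
```

Calling it with the contents of a .tflite file and the name `mobilenet_model` would yield the `mobilenet_model` and `mobilenet_model_len` symbols that the code-flow slide passes to `BuildFromBuffer`.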
TENSORFLOW LITE CODE FLOW
• Import model
#include "mobilenet_model.h"
model = tflite::FlatBufferModel::BuildFromBuffer(mobilenet_model, mobilenet_model_len);
• Get input
/* Extract image from camera to data buffer. */
CSI2Image(data, Rec_w, Rec_h, pExtract, true);
/* Resize image to input tensor size. */
ResizeImage(interpreter->tensor(input), data, Rec_h, Rec_w, image_height, image_width, image_channels, &s);
• Run inference
interpreter->Invoke();
• Get Results
std::vector<std::pair<float, int>> top_results;
GetTopN<float>(interpreter->typed_output_tensor<float>(0), output_size, s->number_of_results, threshold, &top_results, true);
auto result = top_results.front(); //Get results
const float confidence = result.first; //Get confidence level
const int index = result.second; //Get highest class
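The "Get Results" step relies on a `GetTopN` helper whose body is not shown on the slide. As a hedged illustration of what such a helper does, the function below (a hypothetical `top_n`, not the SDK's implementation) keeps the highest-scoring classes above a threshold and returns them as (confidence, class index) pairs, matching the pair layout used in the slide code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Return up to `n` (score, class index) pairs with score >= threshold,
// ordered from highest to lowest score.
inline std::vector<std::pair<float, int>> top_n(const std::vector<float>& scores,
                                                std::size_t n, float threshold) {
    assert(n > 0);
    std::vector<std::pair<float, int>> candidates;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        if (scores[i] >= threshold) {
            candidates.emplace_back(scores[i], static_cast<int>(i));
        }
    }
    const std::size_t keep = std::min(n, candidates.size());
    // Only the first `keep` entries need to be in sorted order.
    std::partial_sort(candidates.begin(),
                      candidates.begin() + static_cast<std::ptrdiff_t>(keep),
                      candidates.end(),
                      [](const std::pair<float, int>& a, const std::pair<float, int>& b) {
                          return a.first > b.first;
                      });
    candidates.resize(keep);
    return candidates;
}
```

With the result in hand, `front().first` is the confidence and `front().second` is the winning class index, as in the slide.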
GEMMLOWP ASSEMBLY-CODED DSP OPTIMIZATION BENEFITS FOR TENSORFLOW LITE
DSP Optimized (-O2) Reference Kernel (-O2)
Label Image 186 ms 370 ms
CIFAR-10 61 ms 229 ms
IAR EW 8.32.3
DSP Optimized Original
Label Image 217 ms 307 ms
CIFAR-10 67 ms 159 ms
GCC Arm® 8-2018-q4
DSP Optimized Original
Label Image 178 ms 198 ms
CIFAR-10 64 ms 87 ms
Keil MDK 5.27
[Bar chart: DSP Optimized vs. Reference Kernel for GCC ARM 8, IAR 8, and MDK 5]
eIQ Glow
GLOW (COMING IN JULY)
• Developed by Facebook
• Glow is a compiler that turns a model into a machine-executable binary for the target device
− Both the model and the inference engine are compiled into the binary that is generated.
− Integrate the generated binary into an SDK software project
− Can make use of compiler optimizations
− Supports ONNX (universal model format) and Caffe2 models
• Cutting-edge inference technology
PERFORMANCE COMPARISON USING CIFAR-10 MODEL ON RT1050
[Bar chart: Glow w/CMSIS-NN vs. Optimized TensorFlow Lite]
OPTIMIZATIONS FOR GLOW
• NXP developed optimizations for Glow on i.MX RT devices
• Operations can be dispatched to the HiFi4 DSP on RT685
− HiFi4 DSP increases performance up to 34x
• Operations can also use CMSIS-NN library optimizations for all Glow supported devices
Glow Inference Time on RT685 (in milliseconds) | MNIST Model | CIFAR10 Model
Floating Point Model | 104.63 | 213.78
Floating Point Model using HiFi4 DSP | 3.02 | 13.36
Quantized Model | 59.77 | 165.37
Quantized Model using CMSIS-NN | 28.52 | 89.95
Quantized Model using CMSIS-NN + HiFi4 DSP | 2.50 | 6.70
[Chart: Floating Point MNIST Model, no optimizations vs. using HiFi4]
[Chart: Quantized MNIST Model, no optimizations vs. using CMSIS-NN vs. using CMSIS-NN + HiFi4 DSP]
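The "up to 34x" figure can be checked against the RT685 timings: the floating point MNIST model drops from 104.63 ms to 3.02 ms with the HiFi4 DSP, a ratio of roughly 34.6. A tiny helper (the name `speedup` is just for illustration) makes the same calculation repeatable for the other rows.

```cpp
#include <cassert>

// Speedup of an optimized run relative to a baseline, both in milliseconds.
inline double speedup(double baseline_ms, double optimized_ms) {
    assert(baseline_ms > 0.0 && optimized_ms > 0.0);
    return baseline_ms / optimized_ms;
}
```

Applied to the same timings, the floating point CIFAR10 model gains about 16x from the HiFi4 DSP (213.78 ms to 13.36 ms), so the headline number is model dependent.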
GLOW
1. Transform model to the universal ONNX format.
2. Optimize model with profiler to create profile.yml file for quantization
3. Compile with Glow model_compiler to generate compiled files and weights.
4. Copy binary files into eIQ Glow SDK example.
.pb → (tf2onnx) → .onnx
.onnx → (image_classifier) → .yml
.onnx + .yml → (model_compiler) → .inc, .o, .weights → (eIQ import) → i.MX RT
ADD COMPILED CODE TO PROJECT
• Add <network_name>.o compiled file to project settings
• Include <network_name>.h file
• Set input data
• Run model
• Get result
GLOW MEMORY USAGE
• Glow does not use dynamically allocated memory (heap).
• All the memory requirements of a compiled model can be found in the auto-generated header file.
// Memory sizes (bytes).
#define CIFAR10_CONSTANT_MEM_SIZE 34176 // Stores model weights. Can be stored in Flash or RAM.
#define CIFAR10_MUTABLE_MEM_SIZE 12352 // Stores model inputs/outputs. Must be in RAM.
#define CIFAR10_ACTIVATIONS_MEM_SIZE 71680 // Store model scratch memory required for intermediate computations. Must be in RAM.
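Because the sizes are compile-time constants, the three memory regions can be declared as static arrays, which is what makes a heap unnecessary. The sketch below reuses the sizes from the header excerpt above. The alignment value, the buffer names, and `run_model_stub` are assumptions for illustration: the stub stands in for the entry point of the compiled model, whose real name and signature come from the generated header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sizes copied from the generated-header excerpt above.
#define CIFAR10_CONSTANT_MEM_SIZE 34176
#define CIFAR10_MUTABLE_MEM_SIZE 12352
#define CIFAR10_ACTIVATIONS_MEM_SIZE 71680

// Assumed alignment for this sketch; take the real value from the generated header.
constexpr std::size_t kAssumedAlign = 64;

// Statically allocated regions, sized at compile time. No heap is involved.
alignas(kAssumedAlign) static uint8_t constant_mem[CIFAR10_CONSTANT_MEM_SIZE];       // weights (flash or RAM)
alignas(kAssumedAlign) static uint8_t mutable_mem[CIFAR10_MUTABLE_MEM_SIZE];         // inputs/outputs (RAM)
alignas(kAssumedAlign) static uint8_t activations_mem[CIFAR10_ACTIVATIONS_MEM_SIZE]; // scratch (RAM)

// Hypothetical stand-in for the compiled model's entry point.
inline int run_model_stub(uint8_t* constants, uint8_t* mutables, uint8_t* activations) {
    assert(constants != nullptr && mutables != nullptr && activations != nullptr);
    return 0;  // a real entry point would run inference here
}

// RAM that must be reserved when the weights are kept in flash.
constexpr std::size_t ram_bytes_required() {
    return CIFAR10_MUTABLE_MEM_SIZE + CIFAR10_ACTIVATIONS_MEM_SIZE;
}
```

Knowing the totals before flashing lets the linker, rather than a run-time allocation failure, tell you whether the model fits.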
Getting eIQ
eIQ IN MCUXPRESSO SDK
• eIQ for the i.MX RT family is included as part of MCUXpresso SDK (https://mcuxpresso.nxp.com/en/welcome)
• Make sure eIQ is selected in MCUXpresso SDK builder:
eIQ EXAMPLES
eIQ RT1060 SDK examples available:
• CIFAR-10: Classifies 32x32 image from camera input into one of 10 categories
• Keyword Spotting (KWS): Detects specific keywords from microphone input
• Label Image: Classifies 128x128 image from camera input into one of 1000 categories using Mobilenet model
• Anomaly Detection: Use FRDM-STBC-AGM01 sensor board for accelerometer anomaly analysis (Select “agm01” board)
Table rows: Description; TensorFlow Lite Example; CMSIS-NN Example
eIQ FOLDER STRUCTURE
• TensorFlow Lite Source Code
• Source Code for CMSIS-NN Examples
• Project Files for Examples
• CMSIS-NN Source Code
• Source Code for TensorFlow Lite Examples
• Project Files for TensorFlow Lite Library
eIQ APP NOTES
• Anomaly Detection with eIQ using K-Means clustering in TF-Lite (AN12766)
• Handwritten Digit Recognition using TensorFlow Lite (AN12603)
Coming Soon:
• Transfer Learning and Datasets
INFERENCE TIMES
• Benchmarking ongoing and optimizations still under development. Numbers subject to change
• Inference time heavily dependent on the particular model
− Different images (if same size) will not affect inference time
• Each eIQ example reports inference time
Image Classification (ms) | CIFAR-10 (32x32 input) | Mobilenet (128x128 input)
RT685 w/ HiFi4 Glow | 6.7 | 61
RT1060 Glow | 24 | 74
RT1060 TensorFlow Lite | 64 | 178
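Since inference time does not depend on image content, a handful of repeated runs is enough to get a stable figure. The sketch below averages wall-clock time over several calls. It assumes a hosted C++ environment where `std::chrono::steady_clock` is available; on a bare-metal target a hardware timer would take its place, and the function name is hypothetical rather than part of the eIQ examples.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Call `inference` `runs` times and return the mean time per call in milliseconds.
inline double mean_inference_ms(const std::function<void()>& inference, int runs) {
    assert(runs > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) {
        inference();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / runs;
}
```

In the code-flow example earlier in the deck, the callable would simply wrap `interpreter->Invoke()`.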
MEMORY REQUIREMENTS
• Flash: Model, inference engine code, and input data
• RAM: Intermediate products of the model layers
− Size depends on amount of data, size, and type of the layers and is very model dependent
Benchmark and optimizations ongoing. Numbers subject to change:
Model | Inference Engine | Flash | RAM
CIFAR-10 | CMSIS-NN | 110KB | 50KB
CIFAR-10 | TensorFlow Lite | 600KB (92KB model, 450KB engine) | 320KB
CIFAR-10 | Glow | 69KB | 131KB
Mobilenet v1 | TensorFlow Lite | 1.5MB (450KB model, 450KB engine) | 2.5MB
Mobilenet v1 | Glow | 507KB | 1MB
The future
PUSHING THE BOUNDARIES FOR REAL-TIME ON-DEVICE PROCESSING
[Chart: relative control code performance vs. relative ML and DSP performance]
• Cortex-M today: Cortex-M0, Cortex-M0+, Cortex-M1, Cortex-M3, Cortex-M4, Cortex-M7, Cortex-M23, Cortex-M33, Cortex-M35P
• New Cortex-M CPU enabled by Helium: Cortex-M55
• Arm microNPUs: Cortex-M55 + Ethos-U55 (multiple performance points available)
Regions: Signal conditioning and ML foundation; ML performance and efficiency
Legend: Well suited for ML & DSP applications
CORTEX-M55 & ETHOS-U55: TRANSFORMING CAPABILITIES OF THE SMALLEST DEVICES
Boosting signal processing and ML performance for millions of developers
Pipeline: Signal conditioning → Feature extraction → Decision algorithm (signal processing through machine learning)
• Cortex-M55: Up to 5x higher signal processing performance (CFFT in int32)
• Cortex-M55: Up to 15x higher ML performance* (matrix multiplication in int8)
• Cortex-M55 & Ethos-U55: Up to 480x higher ML performance* (matrix multiplication in int8)
*Compared to existing Armv8-M implementations
Summary
FURTHER READING
• NXP eIQ
• TensorFlow Lite
• Glow
• CMSIS-NN
Machine Learning Courses:
• Video series on Neural Network basics
• Arm Embedded Machine Learning for Dummies
• Google TensorFlow Lab
• Google Machine Learning Crash Course
• Google Image Classification Practical
• YouTube series on the basics of ML and TensorFlow (ML Zero to Hero Series)
Book:
You Look Like a Thing and I Love You: How Artificial Intelligence Works and Why It's Making the World a Weirder Place
GIT REPOS
• TensorFlow Lite
− https://github.com/tensorflow/tensorflow/tree/v1.13.1/tensorflow/lite
• TensorFlow Lite for Microcontrollers
− https://www.tensorflow.org/lite/microcontrollers
• CMSIS-NN
− https://github.com/ARM-software/CMSIS_5/tree/master/CMSIS/NN
− CIFAR-10: https://github.com/ARM-software/ML-examples/tree/master/cmsisnn-cifar10
− KWS: https://github.com/ARM-software/ML-KWS-for-MCU
• Glow
− https://github.com/pytorch/glow
NXP eIQ RESOURCES
• eIQ for i.MX RT is included in MCUXpresso SDK
− https://mcuxpresso.nxp.com
− TF-Lite and CMSIS-NN eIQ User Guides in SDK documents
• eIQ available for i.MX RT1050 and i.MX RT1060 today
− Can also run on i.MX RT1064: https://community.nxp.com/docs/DOC-344225
• eIQ available for i.MX RT685 in July
• Transfer Learning Lab: https://community.nxp.com/docs/DOC-343827
• Anomaly Detection App Note: https://www.nxp.com/docs/en/application-note/AN12766.pdf
• Handwritten Digit Recognition: https://www.nxp.com/docs/en/application-note/AN12603.pdf
Virtual Tech Talks Series
Thank You
Danke
Merci
谢谢
ありがとう
Gracias
Kiitos
감사합니다
धन्यवाद
شكرًا
תודה
Join our next virtual tech talk:
tinyML development with TensorFlow Lite for Microcontrollers and CMSIS-NN
Tuesday 30 June
Register here:
developer.arm.com/solutions/machine-learning-on-arm/ai-virtual-tech-talks
The Arm trademarks featured in this presentation are registered
trademarks or trademarks of Arm Limited (or its subsidiaries) in
the US and/or elsewhere. All rights reserved. All other marks
featured may be trademarks of their respective owners.
www.arm.com/company/policies/trademarks
ML for embedded systems at the edge - NXP and Arm - FINAL.pdf

  • 1. Confidential © 2020 Arm Limited AI Virtual Tech Talks Series Kobus Marneweck, Product Manager, Arm Anthony Huereca, Systems Engineer, NXP Machine learning for embedded systems at the edge Arm and NXP June 16, 2020
  • 2. 1 AI VIRTUAL TECH TALKS SERIES Date Title Host Today Machine learning for embedded systems at the edge Arm and NXP June, 30 tinyML development with Tensorflow Lite for Microcontrollers and CMSIS-NN Arm July, 14 Demystify artificial intelligence on Arm MCUs Cartesiam.ai July, 28 Speech recognition on Arm Cortex-M Fluent.ai August, 11 Getting started with Arm Cortex-M software development and Arm Development Studio Arm August, 25 Efficient ML across Arm from Cortex-M to Web Assembly Edge Impulse Visit: developer.arm.com/solutions/machine-learning-on-arm/ai-virtual-tech-talks
  • 3. 2 SPEAKERS Kobus Marneweck, Senior Product Manager Arm Anthony Huereca, Embedded Systems Engineer NXP Semiconductor
  • 4. 3 AGENDA • ML on the edge • eIQ deployment − Arm support for TFLµ − TensorFlow − Glow − Getting started • The future • Wrap-up
  • 5. 4 4 CONFIDENTIAL & PROPRIETARY NXP, THE NXP LOGO AND NXP SECURE CONNECTIONS FOR A SMARTER WORLD ARE TRADEMARKS OF NXP B.V. ALL OTHER PRODUCT OR SERVICE NAMES ARE THE PROPERTY OF THEIR RESPECTIVE OWNERS. © 2020 NXP B.V. Machine Learning on the Edge
  • 6. 5 EXAMPLE EMBEDDED AI APPLICATIONS Image Classification • Identify what camera is looking at − Coffee pods − Empty vs full trucks − Factory defects on manufacturing line − Produce on supermarket scale • Personalization based on facial recognition − Appliances − Home − Toys − Auto • Security Video Analysis Audio Analysis − Keyword actions § “Alexa”/“Hey Google” − Voice commands − Alarm Analytics § Breaking glass § Crying baby Anomaly Detection − Identify factory issues before they become catastrophic − Smartwatch health monitoring − Motor performance monitoring − Sensor Analysis !
  • 7. 6 MACHINE LEARNING PROCESS Collect and Prepare Data Train Model Test Model Deployed Model Iterate on parameters and algorithm to get best model Input Prediction Training Phase Inference Phase 1. Training Phase 2. Inference Phase
  • 8. 7 INFERENCE ON THE EDGE • Inference is using a model to make a prediction on new data • Data can come from embedded camera, microphone, or sensors Two possibilities: Inference on the Edge • Increased privacy and security • Faster response time and throughput • Lower Power • Don’t need internet connectivity Inference on the Cloud • Requires network bandwidth • Latency issues • Cloud compute costs
  • 9. 8 8 CONFIDENTIAL & PROPRIETARY NXP, THE NXP LOGO AND NXP SECURE CONNECTIONS FOR A SMARTER WORLD ARE TRADEMARKS OF NXP B.V. ALL OTHER PRODUCT OR SERVICE NAMES ARE THE PROPERTY OF THEIR RESPECTIVE OWNERS. © 2020 NXP B.V. NXP Enablement for Machine Learning
  • 10. NXP BROAD-BASED MACHINE LEARNING SOLUTIONS AND SUPPORT
      eIQ™ ML Enablement • eIQ (edge intelligence) for edge AI/ML inference enablement • Based on open source technologies (TensorFlow Lite, Arm NN, Glow, ONNX, OpenCV) • Support for i.MX 8 family, i.MX RT1050/1060/600 • Fully integrated into NXP development environments (MCUXpresso, Yocto/Linux) • BYOM – Bring Your Own Model
      Third Party SW and HW • Coral Dev Board • i.MX 8M Development Kit for Amazon® Alexa Voice Service w/ DSP Concepts • Au-Zone Network Development Tools • Arcturus video applications • SensiML tools for sensor analysis
      Turnkey Solutions • Alexa Voice Services (AVS) solution: i.MX RT106A (kit – SLN-ALEXA-IOT) • Local voice control solution: i.MX RT106L (kit – SLN-LOCAL-IOT) • Face & emotion recognition solution: i.MX RT106F (kit – SLN-VIZN-IOT)
      Figure labels: DIY; Fully Tested; eIQ; Coral; …. And more; SLN-ALEXA-IOT
  • 11. ARM CORTEX-M PORTFOLIO
      Tiers: High performance; Performance efficiency; Lowest power & area
      Armv6-M: Cortex-M0 (lowest cost, low power); Cortex-M0+ (highest energy efficiency)
      Armv7-M: Cortex-M3 (performance efficiency); Cortex-M4 (mainstream control and DSP); Cortex-M7 (maximum performance, control and DSP)
      Armv8-M (TrustZone): Cortex-M23 (smallest area, lowest power); Cortex-M33 (flexibility, control and DSP); Cortex-M55 (Helium vector extensions, optimized for DSP & ML)
      Figure callout: Well suited for ML & DSP applications
  • 12. CORTEX-M7: HIGHEST PERFORMANCE CORTEX-M • High performance – dual-issue processor − Achieves 2.14 DMIPS/MHz, 5.01 CoreMark/MHz − Achieves 1.4GHz in 16FFC (typ config with caches and FPU) • Retains all of the Cortex-M benefits − Ease-of-use, low interrupt latency • Flexible memory interfaces − Up to 16MB TCM for critical data and code − Up to 64KB I-cache and D-cache − AXI master interface • Performance − Floating-point Unit (FPU) – Single precision (SP) and double precision (DP), sustained 2x 32bit or 2x 16bit MACs per cycle − Digital signal processing (DSP) extension
  • 13. CORTEX-M33: NEXT-GENERATION CORTEX-M WITH TRUSTZONE SECURITY • Industry-standard 32bit processor − 3-stage pipeline, Harvard architecture − Extremely flexible design configurations • Wide choice of options for differentiated products − TrustZone security foundation with up to two memory protection units (MPUs) − Digital signal processing (DSP) extension with SIMD, single-cycle MAC, saturating arithmetic − Floating-point Unit (FPU) − Coprocessor interface − Arm Custom Instructions − Powerful debug and non-intrusive real-time trace (ETM, MTB)
  • 14. eIQ
  • 15. MACHINE LEARNING PROCESS: eIQ inference engines & ML examples
      Training: Collect and Prepare Data → Train Model → Test Model
      Model frameworks (definition): TensorFlow, Keras, Caffe, PyTorch, other…
      Optimize (optional, framework dependent): Pruning, Quantization
      Convert: tflite_convert.py, model_compiler, code_gen.py, custom script…
      Inference: TensorFlow Lite, CMSIS-NN, Glow
      Note: There is no unified method for converting neural networks from different frameworks to run on Arm Cortex-M products
      i.MX RT eIQ inference engine options: • CMSIS-NN – Can be used for several different model frameworks • TensorFlow Lite – Used for TensorFlow model frameworks • Glow – Machine Learning compiler for several different model frameworks (Coming in July)
  • 16. eIQ – EDGE INTELLIGENCE: Collection of Libraries and Development Tools for Building Machine Learning Apps Targeting NXP MCUs and App Processors
      Deploying open-source inference engines • Integration and optimization of neural net (NN) inference engines (Arm NN, Arm CMSIS-NN, OpenCV, TFLite, ONNX, etc.) • End-to-end examples demonstrating customer use-cases (e.g. camera → inference engine) • Support for emerging neural net compilers (e.g. Glow) • Suite of classical ML algorithms such as support vector machine (SVM) and random forest • BYOM – Bring Your Own Model
      Integrated into Yocto Linux BSP and MCUXpresso SDK • No separate SDK or release to download • i.MX: New layer meta-imx-machinelearning in Yocto • MCU: Integrated in MCUXpresso SDK middleware
      Supporting materials for ease of use • Documentation: eIQ White Paper, Release Notes, eIQ User’s Guide, Demo User’s Guide • Guidelines for importing pretrained models based on popular NN frameworks (e.g. TensorFlow, Caffe) • Training collateral for CAS, DFAEs and customers (e.g. lectures, hands-on, video)
  • 17. eIQ DEMO • Retrained a Mobilenet model written in TensorFlow to identify 5 different flower types • Use eIQ to run model on i.MX RT1060 EVK − Lab at https://community.nxp.com/docs/DOC-343827 − Lab steps can be used for any types of images you’re interested in
  • 18. eIQ Deployment Overview
  • 19. ADDITIONAL FLAVORS of NXP eIQ™ MACHINE LEARNING DEVELOPMENT ENVIRONMENT [Figure: ML cloud training (Microsoft Azure, Google Cloud, Amazon Web Services) turns an untrained model into a trained, optimized, quantized model. A user application with eIQ deployment runs NN models through the NXP eIQ inference engines & libraries on the compute engines: Cortex-M, DSP, Cortex-A, GPU, ML accelerator (NPU, µNPU). Devices shown: i.MX RT600, i.MX RT1050, i.MX RT1060, i.MX RT1170, i.MX 8M Plus, i.MX 8QM, i.MX 8QXP, i.MX 8M Quad/Nano, i.MX 8M Mini, Future MCU with CMSIS-NN.]
  • 20. eIQ ADVANTAGES • eIQ implements performance enhancements with CMSIS-NN for Cortex-M cores and DSP − Up to 2.4x improvement in inference time in TensorFlow Lite over original code • eIQ inference engines work out-of-the-box and are already tested and optimized − Get up and running in minutes instead of weeks
      NXP eIQ Enablement: Import eIQ Project → Click Compile Button → Click Program Button → Use Model Output
      Roll Your Own (steps as listed on the slide): Download source from GitHub; Figure out which files are needed for embedded inference engine; Set up cross-compiler for target device and create makefile; Successfully compile project, working through any known bugs; Create camera input code; Create LCD display code; Integrate camera/LCD code with inference engine code; Configure J-Link programming; Download to board using J-Link commands; Check output; Debug inference, camera, LCD, and integration code; Use Model Output
  • 21. eIQ FOR I.MX RT
      On the PC: Pre-trained Model (customer’s or a third-party model trained on a CPU, GPU or in the Cloud) → Optimizations (Quantization/Pruning, optional) → Convert
      On the i.MX RT device: Input (Camera / Microphone / Sensor / Other) → eIQ Inference Engine → Prediction
      Inference engines available with eIQ for i.MX RT: • CMSIS-NN – Can be used for several different model frameworks • TensorFlow Lite – Used for TensorFlow model frameworks • Glow – Machine Learning compiler for several different model frameworks (Coming in July)
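The optional quantization step named in this flow can be illustrated with a minimal Python sketch of affine int8 quantization. This is a generic illustration, not the eIQ or TensorFlow Lite implementation: the function names `quantize_int8` and `dequantize` are hypothetical, and real toolchains may differ in details such as per-channel scales or symmetric weight ranges.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float, int]:
    """Affine-quantize floats to int8; returns (quantized, scale, zero_point)."""
    # The represented range always includes 0.0 so that zero maps to an integer.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(round(-128 - lo / scale))
    quantized = [
        max(-128, min(127, int(round(v / scale)) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize(quantized: List[int], scale: float, zero_point: int) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [(q - zero_point) * scale for q in quantized]


if __name__ == "__main__":
    weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
    q, scale, zp = quantize_int8(weights)
    print(q, scale, zp)
    print(dequantize(q, scale, zp))
```

Each weight then occupies one byte instead of four, at the cost of a rounding error of roughly one quantization step or less per value.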
  • 22. Arm support for TFLµ
  • 23. CMSIS-NN INFERENCE • Developed by Arm • API to implement common model layers such as convolution, fully-connected, pooling, activation, etc., efficiently at a low level • Conversion scripts (provided by Arm) to convert models into CMSIS-NN API calls • CMSIS-NN optimizes the implementation of inference engines like TFLite Micro (https://www.tensorflow.org/lite/microcontrollers)
  • 24. CMSIS-NN OPTIMIZED FOR PERFORMANCE • Key ML function support − Aiming for best-in-class performance for Cortex-M CPUs (compared to other libraries) − Available now through open source license • Consistent interface to all Cortex-M CPUs − Extending to Armv8-M • Open-source, via Apache 2.0 license − https://github.com/ARM-software/CMSIS_5 [Charts, "CMSIS-NN Optimised for Cortex-M CPUs", legend Armv7-M and Armv8.1-M: energy efficiency improvement (relative ops per joule), 4.9x higher efficiency; CNN runtime improvement (relative throughput), 4.6x higher performance.]
  • 25. TOOLS & TFLµ OPERATOR SUPPORT – CMSIS-NN AND ETHOS MICRONPU [Figure: an offline optimization step takes the TensorFlow Lite .TF input file and produces a modified .TF input file with optimized custom operators for the microNPU. The TensorFlow Lite Micro runtime reads the TensorFlow Lite input file and runs reference kernels or CMSIS-NN optimized operators on Cortex-M (Armv6M, Armv7M, Armv8M, Armv8.1M with MVE), or uses the Ethos microNPU driver to run on the Ethos-U microNPU.]
  • 26. eIQ TensorFlow
  • 27. TENSORFLOW LITE INFERENCE ENGINE • Developed by Google − TensorFlow → Training and Inference − TensorFlow Lite eIQ → NXP’s implementation of TF Lite for MCUs − TensorFlow Lite Micro → TensorFlow’s implementation of TF Lite for MCUs • Can only be used with TensorFlow models • Use tflite_convert utility (provided by TensorFlow) to convert a TensorFlow model to a .tflite binary • TFLite flat buffer binary is read from memory by TFLite inference engine running on i.MX RT
  • 28. TENSORFLOW LITE CONVERSION PROCESS 1. Transform a TensorFlow .pb model to a TFLite flat buffer file. 2. Convert the TFLite flat buffer file to a C array in a .h header. 3. Copy the .h header file into the eIQ TensorFlow Lite SDK example. Flow: .pb → tflite_convert → .tflite → xxd → .h → eIQ import → i.MX RT
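Step 2 of the conversion process can be sketched in Python as a rough equivalent of what `xxd -i` produces: the raw flat buffer bytes rendered as a C array plus a length variable. The helper name `tflite_to_c_header` and the `const` qualifiers are assumptions made for illustration; the array names follow the `mobilenet_model` / `mobilenet_model_len` identifiers used in the code-flow slide.

```python
def tflite_to_c_header(data: bytes, array_name: str = "mobilenet_model",
                       per_line: int = 12) -> str:
    """Render raw model bytes as C source, similar in spirit to `xxd -i`."""
    lines = []
    for offset in range(0, len(data), per_line):
        chunk = data[offset:offset + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk))
    body = ",\n".join(lines)
    return (
        f"const unsigned char {array_name}[] = {{\n{body}\n}};\n"
        f"const unsigned int {array_name}_len = {len(data)};\n"
    )


if __name__ == "__main__":
    # Stand-in bytes only; a real run would pass the contents of the
    # .tflite file, e.g. open(path, "rb").read().
    fake_model = bytes([0x1C, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4C, 0x33])
    print(tflite_to_c_header(fake_model))
```

The generated text is what gets saved as the .h file and copied into the SDK example in step 3.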
  • 29. TENSORFLOW LITE CODE FLOW
      • Import model
          #include "mobilenet_model.h"
          model = tflite::FlatBufferModel::BuildFromBuffer(mobilenet_model, mobilenet_model_len);
      • Get input
          /* Extract image from camera to data buffer. */
          CSI2Image(data, Rec_w, Rec_h, pExtract, true);
          /* Resize image to input tensor size. */
          ResizeImage(interpreter->tensor(input), data, Rec_h, Rec_w, image_height, image_width, image_channels, &s);
      • Run inference
          interpreter->Invoke();
      • Get results
          std::vector<std::pair<float, int>> top_results;
          GetTopN<float>(interpreter->typed_output_tensor<float>(0), output_size, s->number_of_results, threshold, &top_results, true);
          auto result = top_results.front(); // Get results
          const float confidence = result.first; // Get confidence level
          const int index = result.second; // Get highest class
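The result-handling step of this code flow can be sketched on the host side in Python. This is a simplified illustration of what a top-N selection does, assuming float scores; the name `get_top_n` is hypothetical and the SDK's `GetTopN` may differ in details such as input scaling or ordering of ties.

```python
from typing import List, Tuple


def get_top_n(scores: List[float], num_results: int,
              threshold: float) -> List[Tuple[float, int]]:
    """Return up to num_results (confidence, class_index) pairs, best first.

    Scores below the threshold are dropped before ranking.
    """
    candidates = [
        (score, index) for index, score in enumerate(scores) if score >= threshold
    ]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return candidates[:num_results]


if __name__ == "__main__":
    output_tensor = [0.05, 0.70, 0.10, 0.15]  # made-up class scores
    top_results = get_top_n(output_tensor, num_results=2, threshold=0.1)
    confidence, index = top_results[0]  # mirrors result.first / result.second
    print(index, confidence)
```

As in the slide's C++ code, the first pair holds the confidence level and the index of the highest-scoring class.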
  • 30. GEMMLOWP ASSEMBLY-CODED DSP OPTIMIZATION BENEFITS FOR TENSORFLOW LITE
      Inference time, DSP optimized vs. reference kernel (original):
      IAR EW 8.32.3 (-O2): Label Image 186 ms vs. 370 ms; CIFAR-10 61 ms vs. 229 ms
      GCC Arm® 8-2018-q4: Label Image 217 ms vs. 307 ms; CIFAR-10 67 ms vs. 159 ms
      Keil MDK 5.27: Label Image 178 ms vs. 198 ms; CIFAR-10 64 ms vs. 87 ms
      [Bar chart: DSP Optimized vs. Reference Kernel for GCC ARM 8, IAR 8, MDK 5]
  • 31. eIQ Glow
  • 32. GLOW (COMING IN JULY) • Developed by Facebook • Glow is a compiler that turns a model into a machine-executable binary for the target device − Both the model and the inference engine are compiled into the binary that is generated − Integrate the generated binary into an SDK software project − Can make use of compiler optimizations − Supports ONNX (universal model format) and Caffe2 models • Cutting-edge inference technology
  • 33. PERFORMANCE COMPARISON USING CIFAR-10 MODEL ON RT1050 [Bar chart: Glow w/ CMSIS-NN vs. Optimized TensorFlow Lite; axis from 0 to 70]
  • 34. OPTIMIZATIONS FOR GLOW • NXP developed optimizations for Glow on i.MX RT devices • Operations can be dispatched to the HiFi4 DSP on RT685 − HiFi4 DSP increases performance up to 34x • Operations can also use CMSIS-NN library optimizations for all Glow supported devices
      Glow Inference Time on RT685 (in milliseconds), MNIST Model / CIFAR10 Model:
      Floating Point Model: 104.63 / 213.78
      Floating Point Model using HiFi4 DSP: 3.02 / 13.36
      Quantized Model: 59.77 / 165.37
      Quantized Model using CMSIS-NN: 28.52 / 89.95
      Quantized Model using CMSIS-NN + HiFi4 DSP: 2.50 / 6.70
      [Bar charts: Floating Point MNIST Model (No optimizations, Using HiFi4); Quantized MNIST Model (No optimizations, Using CMSIS-NN, Using CMSIS-NN + HiFi4 DSP)]
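The "up to 34x" figure can be checked against the RT685 table with a few lines of Python. The numbers are copied from the slide; the dictionary keys and the `speedup` helper are names chosen here for illustration.

```python
# Inference times in milliseconds, copied from the slide's RT685 table.
GLOW_RT685_MS = {
    "float": {"MNIST": 104.63, "CIFAR10": 213.78},
    "float + HiFi4": {"MNIST": 3.02, "CIFAR10": 13.36},
    "quantized": {"MNIST": 59.77, "CIFAR10": 165.37},
    "quantized + CMSIS-NN": {"MNIST": 28.52, "CIFAR10": 89.95},
    "quantized + CMSIS-NN + HiFi4": {"MNIST": 2.50, "CIFAR10": 6.70},
}


def speedup(baseline: str, optimized: str, model: str) -> float:
    """Ratio of baseline time to optimized time for one model."""
    return GLOW_RT685_MS[baseline][model] / GLOW_RT685_MS[optimized][model]


if __name__ == "__main__":
    for model in ("MNIST", "CIFAR10"):
        print(model,
              round(speedup("float", "float + HiFi4", model), 1),
              round(speedup("quantized", "quantized + CMSIS-NN", model), 1))
```

For the floating-point MNIST model the HiFi4 DSP ratio is 104.63 / 3.02, about 34.6, which matches the slide's headline number; the CIFAR10 ratios are smaller.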
  • 35. GLOW 1. Transform model to the universal ONNX format. 2. Optimize model with profiler to create profile.yml file for quantization. 3. Compile with Glow model_compiler to generate compiled files and weights. 4. Copy binary files into eIQ Glow SDK example. Flow: .pb → tf2onnx → .onnx → image_classifier → .yml → model_compiler → .inc, .o, .weights → eIQ import → i.MX RT
  • 36. ADD COMPILED CODE TO PROJECT • Add <network_name>.o compiled file to project settings • Include <network_name>.h file • Set input data • Run model • Get result
  • 37. GLOW MEMORY USAGE • Glow does not use dynamically allocated memory (heap). • All the memory requirements of a compiled model can be found in the auto-generated header file.
          // Memory sizes (bytes).
          #define CIFAR10_CONSTANT_MEM_SIZE 34176 // Stores model weights. Can be stored in Flash or RAM.
          #define CIFAR10_MUTABLE_MEM_SIZE 12352 // Stores model inputs/outputs. Must be in RAM.
          #define CIFAR10_ACTIVATIONS_MEM_SIZE 71680 // Stores model scratch memory required for intermediate computations. Must be in RAM.
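Because these sizes are fixed at compile time, the RAM budget for the model's buffers can be worked out before flashing. The Python sketch below reuses the three values from the header shown on the slide; the helper name `ram_bytes_needed` is hypothetical, and the result covers only these three buffers, not the rest of the application.

```python
# Values copied from the auto-generated header shown on the slide (bytes).
CIFAR10_CONSTANT_MEM_SIZE = 34176     # weights: Flash or RAM
CIFAR10_MUTABLE_MEM_SIZE = 12352      # inputs/outputs: RAM only
CIFAR10_ACTIVATIONS_MEM_SIZE = 71680  # scratch memory: RAM only


def ram_bytes_needed(constant: int, mutable: int, activations: int,
                     weights_in_flash: bool = True) -> int:
    """RAM used by the model buffers, depending on where the weights live."""
    ram = mutable + activations
    if not weights_in_flash:
        ram += constant
    return ram


if __name__ == "__main__":
    sizes = (CIFAR10_CONSTANT_MEM_SIZE, CIFAR10_MUTABLE_MEM_SIZE,
             CIFAR10_ACTIVATIONS_MEM_SIZE)
    print(ram_bytes_needed(*sizes, weights_in_flash=True))
    print(ram_bytes_needed(*sizes, weights_in_flash=False))
```

Keeping the weights in Flash leaves 12352 + 71680 = 84032 bytes of RAM for this model's buffers; copying them to RAM raises that to 118208 bytes.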
  • 38. Getting eIQ
  • 39. eIQ IN MCUXPRESSO SDK • eIQ for the i.MX RT family is included as part of MCUXpresso SDK (https://mcuxpresso.nxp.com/en/welcome) • Make sure eIQ is selected in the MCUXpresso SDK builder
  • 40. eIQ EXAMPLES
      eIQ RT1060 SDK examples available:
      CIFAR-10: Classifies 32x32 image from camera input into one of 10 categories
      Keyword Spotting (KWS): Detects specific keywords from microphone input
      Label Image: Classifies 128x128 image from camera input into one of 1000 categories using Mobilenet model
      Anomaly Detection: Use FRDM-STBC-AGM01 sensor board for accelerometer anomaly analysis (Select “agm01” board)
      Table rows also shown: TensorFlow Lite Example; CMSIS-NN Example
  • 41. eIQ FOLDER STRUCTURE [Screenshot labels: TensorFlow Lite Source Code; Source Code for TensorFlow Lite Examples; Project Files for TensorFlow Lite Library; CMSIS-NN Source Code; Source Code for CMSIS-NN Examples; Project Files for Examples]
  • 42. eIQ APP NOTES • Anomaly Detection with eIQ using K-Means clustering in TF-Lite (AN12766) • Handwritten Digit Recognition using TensorFlow Lite (AN12603) Coming Soon: • Transfer Learning and Datasets
  • 43. INFERENCE TIMES • Benchmarking ongoing and optimizations still under development. Numbers subject to change • Inference time heavily dependent on the particular model − Different images (if same size) will not affect inference time • Each eIQ example reports inference time
      Image Classification (ms), CIFAR-10 (32x32 input) / Mobilenet (128x128 input):
      RT685 w/ HiFi4, Glow: 6.7 / 61
      RT1060, Glow: 24 / 74
      RT1060, TensorFlow Lite: 64 / 178
  • 44. MEMORY REQUIREMENTS • Flash: Model, inference engine code, and input data • RAM: Intermediate products of the model layers − Size depends on amount of data, size, and type of the layers and is very model dependent
      Benchmark and optimizations ongoing. Numbers subject to change. Model, Inference Engine: Flash / RAM
      CIFAR-10, CMSIS-NN: 110KB / 50KB
      CIFAR-10, TensorFlow Lite: 600KB (92KB model, 450KB engine) / 320KB
      CIFAR-10, Glow: 69KB / 131KB
      Mobilenet v1, TensorFlow Lite: 1.5MB (450KB model, 450KB engine) / 2.5MB
      Mobilenet v1, Glow: 507KB / 1MB
  • 45. The future
  • 46. PUSHING THE BOUNDARIES FOR REAL-TIME ON-DEVICE PROCESSING [Chart: relative ML and DSP performance vs. relative control code performance. Cortex-M today: Cortex-M0, Cortex-M0+, Cortex-M1, Cortex-M3, Cortex-M4, Cortex-M7, Cortex-M23, Cortex-M33, Cortex-M35P. New Cortex-M CPU enabled by Helium: Cortex-M55 (signal conditioning and ML foundation). Arm microNPUs: Cortex-M55 + Ethos-U55 (ML performance and efficiency; multiple performance points available). Callout: Well suited for ML & DSP applications.]
  • 47. CORTEX-M55 & ETHOS-U55: TRANSFORMING CAPABILITIES OF THE SMALLEST DEVICES Boosting signal processing and ML performance for millions of developers. Stages: signal conditioning, feature extraction, decision algorithm (spanning signal processing and machine learning). • Cortex-M55: Up to 5x higher signal processing performance (CFFT in int32) • Cortex-M55: Up to 15x higher ML performance* (matrix multiplication in int8) • Cortex-M55 & Ethos-U55: Up to 480x higher ML performance* (matrix multiplication in int8) *Compared to existing Armv8-M implementations
  • 48. Summary
  • 49. FURTHER READING • NXP eIQ • TensorFlow Lite • Glow • CMSIS-NN Machine Learning Courses: • Video series on Neural Network basics • Arm Embedded Machine Learning for Dummies • Google TensorFlow Lab • Google Machine Learning Crash Course • Google Image Classification Practical • YouTube series on the basics of ML and TensorFlow (ML Zero to Hero Series) Book: You Look Like a Thing and I Love You: How Artificial Intelligence Works and Why It's Making the World a Weirder Place
  • 50. GIT REPOS • TensorFlow Lite − https://github.com/tensorflow/tensorflow/tree/v1.13.1/tensorflow/lite • TensorFlow Lite for Microcontrollers − https://www.tensorflow.org/lite/microcontrollers • CMSIS-NN − https://github.com/ARM-software/CMSIS_5/tree/master/CMSIS/NN − CIFAR-10: https://github.com/ARM-software/ML-examples/tree/master/cmsisnn-cifar10 − KWS: https://github.com/ARM-software/ML-KWS-for-MCU • Glow − https://github.com/pytorch/glow
  • 51. NXP eIQ RESOURCES • eIQ for i.MX RT is included in MCUXpresso SDK − https://mcuxpresso.nxp.com − TF-Lite and CMSIS-NN eIQ User Guides in SDK documents • eIQ available for i.MX RT1050 and i.MX RT1060 today − Can also run on i.MX RT1064: https://community.nxp.com/docs/DOC-344225 • eIQ available for i.MX RT685 in July • Transfer Learning Lab: https://community.nxp.com/docs/DOC-343827 • Anomaly Detection App Note: https://www.nxp.com/docs/en/application-note/AN12766.pdf • Handwritten Digit Recognition: https://www.nxp.com/docs/en/application-note/AN12603.pdf
  • 52. Virtual Tech Talks Series Thank You Danke Merci 谢谢 ありがとう Gracias Kiitos 감사합니다 धन्यवाद شكراً תודה
  • 53. Join our next virtual tech talk: tinyML development with TensorFlow Lite for Microcontrollers and CMSIS-NN, Tuesday 30 June. Register here: developer.arm.com/solutions/machine-learning-on-arm/ai-virtual-tech-talks
  • 54. The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. www.arm.com/company/policies/trademarks