TensorFlow Lite
and
Arm Compute Library
Kobe Yu
Why on-device ML?
● Lower latency, no server calls
● Works offline
● Data stays on device
● Power efficient
● All sensor data accessible on-device
On-device ML is hard
● Tight memory constraints
● Low energy usage to preserve batteries
● Little compute power
TensorFlow Lite
TensorFlow Lite size and speed
● Size
○ Core interpreter + all supported ops: ~400 KB
○ How?
■ compact interpreter and FlatBuffer parsing
■ minimal dependencies
■ selective registration of ops
● Speed
○ FlatBuffers allow direct access to data, without a parsing/unpacking step
○ pre-fused operations
○ hardware acceleration delegates (sketched below)
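To make the delegate bullet concrete, here is a minimal sketch using the TensorFlow Lite Python API. The delegate library name libdelegate.so and the model path are placeholders; the actual delegate depends on the target hardware (GPU, NNAPI, Edge TPU, ...).

```python
import tensorflow as tf

# Hypothetical delegate shared library; real names depend on the accelerator
# and on the platform the model is deployed to.
delegate = tf.lite.experimental.load_delegate("libdelegate.so")

# Ops the delegate supports are offloaded to the accelerator; the remaining
# ops fall back to the built-in CPU kernels.
interpreter = tf.lite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[delegate],
)
interpreter.allocate_tensors()
```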
TensorFlow Lite Design
● PC: trained model → Converter (to the TensorFlow Lite format)
● Mobile device: Interpreter core → operation kernels / hardware accelerator
https://heartbeat.fritz.ai/intro-to-machine-learning-on-android-how-to-convert-a-custom-model-to-tensorflow-lite-e07d2d9d50e3
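The converter step on the PC side can be sketched with the Python API; this is a minimal sketch assuming a TensorFlow install that provides tf.lite.TFLiteConverter, and the SavedModel directory saved_model/ is a placeholder.

```python
import tensorflow as tf

# Convert a trained SavedModel (placeholder path) into the TensorFlow Lite
# FlatBuffer format that the on-device interpreter consumes.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")
tflite_model = converter.convert()

# The result is a plain byte string: write it out as the .tflite file that
# is shipped to the mobile device.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```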
TensorFlow tools to optimize a model (optimize_for_inference.py)
There are several common transformations that can be applied to a GraphDef created for training that help reduce the amount of computation needed when the network is used only for inference. These include (a usage sketch follows the list):
- Removing training-only operations like checkpoint saving.
- Stripping out parts of the graph that are never reached.
- Removing debug operations like CheckNumerics.
- Folding batch normalization ops into the pre-calculated weights.
- Fusing common operations into unified versions.
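The optimize_for_inference.py script wraps optimize_for_inference_lib; here is a minimal sketch of calling the library directly. The file name frozen_graph.pb and the node names "input"/"output" are placeholders.

```python
import tensorflow as tf
from tensorflow.python.tools import optimize_for_inference_lib

# Load a frozen GraphDef (placeholder file name).
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("frozen_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Strip training-only and unreachable nodes, remove debug ops, and fold
# batch-norm constants, given the input/output node names (placeholders).
optimized = optimize_for_inference_lib.optimize_for_inference(
    graph_def,
    ["input"],                    # input node names
    ["output"],                   # output node names
    tf.float32.as_datatype_enum,  # dtype of the input placeholder
)

with tf.io.gfile.GFile("optimized_graph.pb", "wb") as f:
    f.write(optimized.SerializeToString())
```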
.tflite
TensorFlow Lite defines a new model file format, based on FlatBuffers. FlatBuffers is an open-source, efficient cross-platform serialization library.
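Because the file is a FlatBuffer, the raw bytes carry the TFLite file identifier and can be handed to the interpreter without a separate parsing step. A small sketch, where model.tflite is a placeholder path:

```python
import tensorflow as tf

# Read the serialized FlatBuffer. The TFLite schema declares the file
# identifier "TFL3", which FlatBuffers stores at bytes 4..8 of the buffer.
with open("model.tflite", "rb") as f:
    buf = f.read()
print(buf[4:8])  # b'TFL3'

# The same bytes can be passed straight to the interpreter.
interpreter = tf.lite.Interpreter(model_content=buf)
interpreter.allocate_tensors()
print(interpreter.get_input_details()[0]["shape"])
```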
FlatBuffer
FlatBuffers is an efficient cross-platform serialization library for C++, C#, C, Go,
Java, JavaScript, TypeScript, PHP, and Python. It was originally created at Google
for game development and other performance-critical applications.
FlatBuffer
class Person {
  String name;
  int friendshipStatus;
  Person spouse;
  List<Person> friends;
}
FlatBuffer
http://labs.gree.jp/blog/2015/11/14495/
TensorFlow Lite Design
● Converter (to the TensorFlow Lite format) → FlatBuffer-based model with pre-fused op kernels
● Interpreter core → operation kernels (specially optimized, e.g. for NEON on Arm) or hardware accelerator
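On the device side, running a converted model comes down to a handful of interpreter calls. A minimal Python sketch, with the model path as a placeholder and a zero-filled dummy input:

```python
import numpy as np
import tensorflow as tf

# Load the FlatBuffer model and allocate tensors once, up front.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy input with the shape and dtype the model expects.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)

# invoke() dispatches each op to its optimized kernel or to a delegate.
interpreter.invoke()
result = interpreter.get_tensor(output_details[0]["index"])
print(result.shape)
```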
ARM NN SDK
Arm NN bridges the gap between existing NN frameworks and the underlying IP. It enables efficient translation of existing neural network frameworks, such as TensorFlow and Caffe, allowing them to run efficiently – without modification – across Arm Cortex CPUs and Arm Mali GPUs.
ARM Compute Library
The Compute Library contains a comprehensive collection of software functions implemented for the Arm Cortex-A family of CPU processors (NEON) and the Arm Mali family of GPUs (OpenCL). It is a convenient repository of low-level optimized functions that developers can source individually or use as part of complex pipelines in order to accelerate their algorithms and applications.
ASUS Tinker Board
● CPU RK3288
○ Quad-core Cortex-A17 up to 1.8GHz
● GPU
○ ARM Mali™-T764
● Memory
○ 2GB LPDDR3
Run AlexNet on Tinker Board / PC
● ASUS Tinker Board (RK3288, quad-core Cortex-A17 up to 1.8 GHz, with NEON), Arm Compute Library:
○ real 0m5.499s, user 0m13.050s, sys 0m0.750s
● Lenovo PC (Intel(R) Core(TM) i7-6500U CPU @ 2.50 GHz), OpenVX:
○ real 0m16.067s, user 0m15.544s, sys 0m0.136s