TensorFlow Lite
and
Arm Compute Library
Kobe Yu
Why on-device ML?
● Lower latency, no server calls
● Works offline
● Data stays on device
● Power efficient
● All sensor data accessible on-device
On-device ML is hard
● Tight memory constraints
● Low energy usage to preserve batteries
● Little compute power
TensorFlow Lite
TensorFlow Lite size and speed
● Size
○ Core interpreter + all supported ops: ~400 KB
○ How?
■ compact interpreter and FlatBuffer parsing
■ minimal dependencies
■ selective registration of ops
● Speed
○ FlatBuffers allow direct access to data, without a parsing/unpacking step
○ pre-fused operations
○ hardware acceleration delegates (sketched below)
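To make the delegate bullet concrete, here is a minimal sketch using the TensorFlow Lite Python API. The delegate library name libdelegate.so and the model path are placeholders; the actual delegate depends on the target hardware (GPU, NNAPI, Edge TPU, ...).

```python
import tensorflow as tf

# Hypothetical delegate shared library; real names depend on the accelerator
# and on the platform the model is deployed to.
delegate = tf.lite.experimental.load_delegate("libdelegate.so")

# Ops the delegate supports are offloaded to the accelerator; the remaining
# ops fall back to the built-in CPU kernels.
interpreter = tf.lite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[delegate],
)
interpreter.allocate_tensors()
```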
TensorFlow Lite Design
● PC: trained model → Converter (to the TensorFlow Lite format)
● Mobile device: Interpreter core → operation kernels / hardware accelerator
https://heartbeat.fritz.ai/intro-to-machine-learning-on-android-how-to-convert-a-custom-model-to-tensorflow-lite-e07d2d9d50e3
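The converter step on the PC side can be sketched with the Python API; this is a minimal sketch assuming a TensorFlow install that provides tf.lite.TFLiteConverter, and the SavedModel directory saved_model/ is a placeholder.

```python
import tensorflow as tf

# Convert a trained SavedModel (placeholder path) into the TensorFlow Lite
# FlatBuffer format that the on-device interpreter consumes.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")
tflite_model = converter.convert()

# The result is a plain byte string: write it out as the .tflite file that
# is shipped to the mobile device.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```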
TensorFlow tools to optimize a model (optimize_for_inference.py)
There are several common transformations that can be applied to a GraphDef created for training that help reduce the amount of computation needed when the network is used only for inference. These include (a usage sketch follows the list):
- Removing training-only operations like checkpoint saving.
- Stripping out parts of the graph that are never reached.
- Removing debug operations like CheckNumerics.
- Folding batch normalization ops into the pre-calculated weights.
- Fusing common operations into unified versions.
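The optimize_for_inference.py script wraps optimize_for_inference_lib; here is a minimal sketch of calling the library directly. The file name frozen_graph.pb and the node names "input"/"output" are placeholders.

```python
import tensorflow as tf
from tensorflow.python.tools import optimize_for_inference_lib

# Load a frozen GraphDef (placeholder file name).
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("frozen_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Strip training-only and unreachable nodes, remove debug ops, and fold
# batch-norm constants, given the input/output node names (placeholders).
optimized = optimize_for_inference_lib.optimize_for_inference(
    graph_def,
    ["input"],                    # input node names
    ["output"],                   # output node names
    tf.float32.as_datatype_enum,  # dtype of the input placeholder
)

with tf.io.gfile.GFile("optimized_graph.pb", "wb") as f:
    f.write(optimized.SerializeToString())
```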
.tflite
TensorFlow Lite defines a new model file format, based on FlatBuffers. FlatBuffers is an open-source, efficient cross-platform serialization library.
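Because the file is a FlatBuffer, the raw bytes carry the TFLite file identifier and can be handed to the interpreter without a separate parsing step. A small sketch, where model.tflite is a placeholder path:

```python
import tensorflow as tf

# Read the serialized FlatBuffer. The TFLite schema declares the file
# identifier "TFL3", which FlatBuffers stores at bytes 4..8 of the buffer.
with open("model.tflite", "rb") as f:
    buf = f.read()
print(buf[4:8])  # b'TFL3'

# The same bytes can be passed straight to the interpreter.
interpreter = tf.lite.Interpreter(model_content=buf)
interpreter.allocate_tensors()
print(interpreter.get_input_details()[0]["shape"])
```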
FlatBuffer
FlatBuffers is an efficient cross-platform serialization library for C++, C#, C, Go,
Java, JavaScript, TypeScript, PHP, and Python. It was originally created at Google
for game development and other performance-critical applications.
FlatBuffer
class Person {
  String name;
  int friendshipStatus;
  Person spouse;
  List<Person> friends;
}
FlatBuffer
http://labs.gree.jp/blog/2015/11/14495/
TensorFlow Lite Design
● Converter (to the TensorFlow Lite format) → FlatBuffer-based model with pre-fused op kernels
● Interpreter core → operation kernels (specially optimized, e.g. for NEON on Arm) or hardware accelerator
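On the device side, running a converted model comes down to a handful of interpreter calls. A minimal Python sketch, with the model path as a placeholder and a zero-filled dummy input:

```python
import numpy as np
import tensorflow as tf

# Load the FlatBuffer model and allocate tensors once, up front.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy input with the shape and dtype the model expects.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)

# invoke() dispatches each op to its optimized kernel or to a delegate.
interpreter.invoke()
result = interpreter.get_tensor(output_details[0]["index"])
print(result.shape)
```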
ARM NN SDK
Arm NN bridges the gap between existing NN frameworks and the underlying IP. It enables efficient translation of existing neural network frameworks, such as TensorFlow and Caffe, allowing them to run efficiently – without modification – across Arm Cortex CPUs and Arm Mali GPUs.
ARM Compute Library
The Compute Library contains a comprehensive collection of software functions implemented for the Arm Cortex-A family of CPU processors (NEON) and the Arm Mali family of GPUs (OpenCL). It is a convenient repository of low-level optimized functions that developers can source individually or use as part of complex pipelines in order to accelerate their algorithms and applications.
ASUS Tinker Board
● CPU RK3288
○ Quad-core Cortex-A17 up to 1.8GHz
● GPU
○ ARM Mali™-T764
● Memory
○ 2GB LPDDR3
Run AlexNet on Tinker Board / PC
● ASUS Tinker Board (RK3288, quad-core Cortex-A17 up to 1.8 GHz, with NEON), Arm Compute Library:
○ real 0m5.499s, user 0m13.050s, sys 0m0.750s
● Lenovo PC (Intel(R) Core(TM) i7-6500U CPU @ 2.50 GHz), OpenVX:
○ real 0m16.067s, user 0m15.544s, sys 0m0.136s