"New Standards for Embedded Vision and Neural Networks," a Presentation from the Khronos Group

© Copyright Khronos Group 2016
Vision and Neural Net Update
November 2016
Neil Trevett
Vice President Developer Ecosystem, NVIDIA | President, Khronos
ntrevett@nvidia.com | @neilt3d

Khronos and Open Standards
Khronos is an industry consortium of over 100 companies connecting software to silicon
We create royalty-free, open standard APIs for hardware acceleration of graphics, parallel compute, neural networks and vision

Accelerated API Landscape
[Diagram: landscape of acceleration APIs. Vision frameworks, neural net libraries, the OpenVX Neural Net Extension, high-level language-based acceleration frameworks, explicit kernels and 3D graphics, targeting GPU, FPGA, DSP and dedicated hardware.]

OpenCL – Low-level Parallel Programming
• Low level programming of heterogeneous parallel compute resources
- One code tree can be executed on CPUs, GPUs, DSPs and FPGAs
• OpenCL C language to write kernel programs to execute on any compute device
- Platform Layer API - to query, select and initialize compute devices
- Runtime API - to build and execute kernel programs on multiple devices
• New in OpenCL 2.2 - OpenCL C++ kernel language - a static subset of C++14
- Adaptable and elegant sharable code – great for building libraries
- Templates enable meta-programming for highly adaptive software
- Lambdas used to implement nested/dynamic parallelism
[Diagram: OpenCL kernel code is compiled for the target devices (GPU, DSP, CPU, FPGA). The runtime API on the host CPU loads and executes kernels across devices.]

OpenCL Conformant Implementations
Specification releases:
- OpenCL 1.0: Dec08
- OpenCL 1.1: Jun10
- OpenCL 1.2: Nov11
- OpenCL 2.0: Nov13
- OpenCL 2.1: Nov15
[Timeline chart: first conformant implementation of each spec generation per vendor, grouped into Desktop, Mobile, Embedded and FPGA. Vendor names are not preserved in this transcript.]

SYCL for OpenCL
• Single-source heterogeneous programming using STANDARD C++
- Use C++ templates and lambda functions for host & device code
• Aligns the hardware acceleration of OpenCL with direction of the C++ standard
- C++14 with open source C++17 Parallel STL hosted by Khronos
• Developer Choice: the development of the two specifications is aligned so code can be easily shared between the two approaches
- C++ Kernel Language: low level control, with ‘GPGPU’-style separation of device-side kernel source code and host code
- Single-source C++: programmer familiarity, an approach also taken by C++ AMP and OpenMP

OpenVX – Low Power Vision Acceleration
• Targeted at vision acceleration in real-time, mobile and embedded platforms
- Precisely defined API for production deployment
• Higher abstraction than OpenCL for performance portability across diverse architectures
- Multi-core CPUs, GPUs, DSPs and DSP arrays, ISPs, Dedicated hardware…
• Extends portable vision acceleration to very low power domains
- Doesn’t require high-power CPU/GPU Complex or OpenCL precision
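[Diagram: application and middleware layered over a vision engine running on GPU, DSP or dedicated hardware.]
[Chart: power efficiency versus computation flexibility for multi-core CPU, GPU compute, vision DSPs and dedicated hardware, with vision processing efficiency levels marked X1, X10 and X100.]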

OpenVX Graphs
• OpenVX developers express a graph of image operations (‘Nodes’)
- Nodes can be on any hardware or processor coded in any language
• Graphs can execute almost autonomously
- Possible to minimize host interaction during frame-rate graph execution
• Graphs are the key to run-time optimization opportunities…
[Diagram: Feature Extraction Example Graph. OpenVX nodes (Color Conversion, Channel Extract, Image Pyramid, Harris Track, Optical Flow) connect camera input to rendering output through intermediate data: RGB frame, YUV frame, gray frame, pyramid Pyr_t, array of keypoints, and array of features Ftr_t-1.]
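
The graph idea above can be illustrated with a small, purely conceptual sketch. It is not the OpenVX C API: `ToyGraph`, `add_node`, `verify` and `process` are hypothetical names, and the two nodes are simplified stand-ins for Color Conversion and Channel Extract. The point it tries to show is that the graph is declared once, checked once, and then run per frame without the host touching intermediate data.

```python
# Conceptual sketch only: ToyGraph and its methods are hypothetical names,
# not part of the OpenVX API.

class ToyGraph:
    def __init__(self):
        self.nodes = {}   # node name -> (function, input names, output name)
        self.order = []   # execution order, filled in by verify()

    def add_node(self, name, func, inputs, output):
        self.nodes[name] = (func, tuple(inputs), output)

    def verify(self, graph_inputs):
        """Check that every input is produced somewhere and fix an order."""
        available = set(graph_inputs)
        pending = dict(self.nodes)
        order = []
        while pending:
            ready = [n for n, (_, ins, _) in pending.items()
                     if all(i in available for i in ins)]
            if not ready:
                raise ValueError(f"unsatisfied inputs for: {sorted(pending)}")
            for n in ready:
                available.add(pending.pop(n)[2])
                order.append(n)
        self.order = order

    def process(self, **graph_inputs):
        """Run one frame; intermediate data stays inside the graph."""
        data = dict(graph_inputs)
        for name in self.order:
            func, inputs, output = self.nodes[name]
            data[output] = func(*(data[i] for i in inputs))
        return data


def color_convert(rgb):
    # Simplified RGB -> (Y, U, V): integer luma weights, U and V as differences.
    out = []
    for r, g, b in rgb:
        y = (299 * r + 587 * g + 114 * b) // 1000
        out.append((y, b - y, r - y))
    return out


def channel_extract(yuv):
    # Keep only the luma channel to obtain a gray frame.
    return [pixel[0] for pixel in yuv]


graph = ToyGraph()
# Nodes may be declared in any order; verify() works out the dependencies.
graph.add_node("channel_extract", channel_extract, ["yuv"], "gray")
graph.add_node("color_convert", color_convert, ["rgb"], "yuv")
graph.verify({"rgb"})

result = graph.process(rgb=[(255, 255, 255), (0, 0, 0)])
```

Because the whole pipeline is known before the first frame arrives, an implementation is free to reorder, merge or relocate nodes during the verification step, which is where the optimizations on the next slide come from.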

OpenVX Efficiency through Graphs
• Memory Management: reuse pre-allocated memory for multiple intermediate data
- Less allocation overhead, more memory for other applications
• Kernel Merge: replace a sub-graph with a single faster node
- Better memory locality, less kernel launch overhead
• Graph Scheduling: split the graph execution across the whole system: CPU / GPU / dedicated HW
- Faster execution or lower power consumption
• Data Tiling: execute a sub-graph at tile granularity instead of image granularity
- Better use of data cache and local memory
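
As a rough illustration of data tiling, the sketch below runs the same two-step sub-graph at image granularity and at tile granularity. It uses pure-Python stand-ins rather than OpenVX calls; `brighten`, `threshold`, the tile size and the pixel values are all made up for the example. Both runs give the same output, but the tiled run never holds more than one tile of intermediate data.

```python
# Conceptual sketch only: pure-Python stand-ins, not OpenVX API calls.

def brighten(pixels):
    # Point-wise operation: add a constant and clamp to 8 bits.
    return [min(p + 10, 255) for p in pixels]


def threshold(pixels):
    # Point-wise operation: binarize around mid-gray.
    return [255 if p >= 128 else 0 for p in pixels]


def run_whole_image(image):
    """Image granularity: the intermediate buffer is as large as the image."""
    intermediate = brighten(image)
    return threshold(intermediate), len(intermediate)


def run_tiled(image, tile_size):
    """Tile granularity: the intermediate buffer never exceeds one tile."""
    out, peak = [], 0
    for start in range(0, len(image), tile_size):
        intermediate = brighten(image[start:start + tile_size])
        peak = max(peak, len(intermediate))
        out.extend(threshold(intermediate))
    return out, peak


image = list(range(0, 256, 4))          # 64 hypothetical pixel values
whole, whole_peak = run_whole_image(image)
tiled, tiled_peak = run_tiled(image, tile_size=8)
```

This simple split is only valid because both operations are point-wise. Filters that read a pixel neighborhood would need overlapping tile borders, which this sketch does not handle.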

Example Relative Performance
[Chart: relative performance by category for OpenCV (GPU accelerated) and OpenVX (GPU accelerated), vertical axis 0 to 10]
Values shown on the chart, in category order: Arithmetic 1.1, Analysis 2.9, Filter 8.7, Geometric 1.5, Overall 2.5
NVIDIA implementation experience: geometric mean of >2200 primitives, grouped into categories, running at different image sizes and parameter settings
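
For readers unfamiliar with the aggregation, this is how a geometric mean of per-primitive performance ratios can be computed. The ratios below are made up for illustration and are not the benchmark data behind the chart.

```python
import math


def geometric_mean(ratios):
    """n-th root of the product of the ratios, computed in log space."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-primitive speedups (not measured data).
hypothetical_speedups = [0.5, 2.0, 4.0, 8.0]

geo = geometric_mean(hypothetical_speedups)
arith = sum(hypothetical_speedups) / len(hypothetical_speedups)
```

The geometric mean is the usual choice for averaging ratios because one very large speedup pulls it up less than it would an arithmetic mean.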

Layered Vision Processing Ecosystem
[Diagram: a C/C++ application layered over programmable vision processors and dedicated vision hardware.]
• Implementers may use OpenCL or Compute Shaders to implement OpenVX nodes on programmable processors
• Developers can then use OpenVX to easily connect those nodes into a graph
• The OpenVX graph enables implementers to optimize execution across diverse hardware architectures and drive to lower power implementations
• OpenVX enables the graph to be extended to include hardware architectures that don’t support programmable APIs
AMD OpenVX
- Open source, highly optimized for x86 CPU and OpenCL for GPU
- “Graph Optimizer” looks at entire processing pipeline and removes/replaces/merges functions to improve performance and bandwidth
- Scripting for rapid prototyping, without re-compiling, at production performance levels
http://gpuopen.com/compute-product/amd-openvx/

OpenVX 1.0 Shipping, OpenVX 1.1 Released!
• Multiple OpenVX 1.0 implementations shipping – specification released in October 2014
- Open source sample implementation and conformance tests available
• OpenVX 1.1 Specification released 2nd May 2016
- Expands node functionality AND enhances graph framework
• OpenVX is EXTENSIBLE
- Implementers can add their own nodes at any time to meet customer and market needs

OpenVX Neural Net Extension
• Convolutional Neural Network topologies can be represented as OpenVX graphs
- Layers are represented as OpenVX nodes
- Layers connected by multi-dimensional tensor objects
- Layer types include convolution, activation, pooling, fully-connected, soft-max
- CNN nodes can be mixed with traditional vision nodes
• Import/Export Extension
- Efficient handling of network Weights/Biases or complete networks
• The specification is provisional
- Welcome feedback from the deep learning community
[Diagram: an OpenVX graph mixing CNN nodes with traditional vision nodes, fed by native camera control and followed by downstream application processing.]
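
A conceptual sketch of the mixing idea, in plain Python rather than the OpenVX Neural Net Extension API: one vision-style node feeds three layer nodes, and each node passes a tensor to the next. All function names, the weights `W` and the biases `B` are hypothetical, and the tensors are 1-D lists for brevity where the extension uses multi-dimensional tensor objects.

```python
# Conceptual sketch only: hypothetical names, not the OpenVX extension API.
import math


def scale_node(pixels):
    # Traditional vision-style node: map 8-bit pixels to the range [0, 1].
    return [p / 255.0 for p in pixels]


def fully_connected(weights, biases):
    # Layer node factory: the returned node computes weights * x + biases.
    def node(x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, biases)]
    return node


def relu(x):
    # Activation layer node.
    return [max(0.0, v) for v in x]


def softmax(x):
    # Soft-max layer node, shifted by the maximum for numerical stability.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def run(pipeline, tensor):
    # Each node consumes the tensor produced by the previous node.
    for node in pipeline:
        tensor = node(tensor)
    return tensor


# Hypothetical weights and biases, standing in for imported network data.
W = [[1.0, -1.0], [-1.0, 1.0]]
B = [0.0, 0.0]

pipeline = [scale_node, fully_connected(W, B), relu, softmax]
probs = run(pipeline, [255, 0])
```

Keeping layers and vision functions behind the same node interface is what lets a single graph cover pre-processing, inference and post-processing.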

NNEF - Neural Network Exchange Format
• NNEF encapsulates neural network structure, data formats, commonly used operations (such as convolution, pooling, normalization, etc.) and formal network semantics
• NNEF 1.0 is currently being defined
• OpenVX will import NNEF files
[Diagram: the Neural Net Exchange Format sits between NN authoring frameworks and Inference Engines 1, 2 and 3.]

Please Consider Joining Khronos!
• Influence how standards evolve!
• Access draft specs to build products faster!
• Understand early industry requirements!
• Ship products that conform to international standards!
Khronos is proven to RAPIDLY generate hardware API standards that create significant market opportunities
Any company or organization is welcome to join Khronos for a voice and a vote in any of its standards
www.khronos.org