© Copyright Khronos Group 2016 - Page 1
Vision and Neural Net Update
November 2016
Neil Trevett
Vice President Developer Ecosystem, NVIDIA | President, Khronos
ntrevett@nvidia.com | @neilt3d
Khronos and Open Standards
Software
Silicon
Khronos is an Industry Consortium of over 100 companies
We create royalty-free, open standard APIs for hardware acceleration of
Graphics, Parallel Compute, Neural Networks and Vision
Accelerated API Landscape
[Diagram labels: Vision Frameworks; Neural Net Libraries; OpenVX Neural Net Extension; High-level Language-based Acceleration Frameworks; Explicit Kernels; 3D Graphics. Target hardware: GPU, FPGA, DSP, Dedicated Hardware.]
OpenCL – Low-level Parallel Programming
• Low-level programming of heterogeneous parallel compute resources
- One code tree can be executed on CPUs, GPUs, DSPs and FPGAs
• OpenCL C language to write kernel programs to execute on any compute device
- Platform Layer API - to query, select and initialize compute devices
- Runtime API - to build and execute kernel programs on multiple devices
• New in OpenCL 2.2 - OpenCL C++ kernel language - a static subset of C++14
- Adaptable and elegant sharable code – great for building libraries
- Templates enable meta-programming for highly adaptive software
- Lambdas used to implement nested/dynamic parallelism
[Diagram: OpenCL kernel code is compiled for the target devices (GPU, DSP, CPU, FPGA). The Runtime API on the host CPU loads and executes kernels across devices.]
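The kernel execution model on this slide can be illustrated with a toy model in plain Python. This is a sketch only and does not use the OpenCL API: `vector_add` and `enqueue_nd_range` are hypothetical names, and the sequential loop stands in for work-items that a real device would run in parallel.

```python
# Toy model of the OpenCL execution model in plain Python (not the OpenCL API).

def vector_add(gid, a, b, out):
    """Kernel body: runs once per work-item, identified by its global id."""
    out[gid] = a[gid] + b[gid]


def enqueue_nd_range(kernel, global_size, *args):
    """Hypothetical stand-in for the runtime launching one work-item per index.

    A real device would execute these work-items in parallel; here they run
    one after another on the host.
    """
    for gid in range(global_size):
        kernel(gid, *args)


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(a)

enqueue_nd_range(vector_add, len(a), a, b, out)
print(out)  # [11.0, 22.0, 33.0, 44.0]
```

The point of the split is that the kernel body knows nothing about the device it runs on; only the host-side launch does.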
OpenCL Conformant Implementations
Specification releases: OpenCL 1.0 (Dec08), OpenCL 1.1 (Jun10), OpenCL 1.2 (Nov11), OpenCL 2.0 (Nov13), OpenCL 2.1 (Nov15)
[Timeline chart: vendor timelines show the first implementation of each spec generation, grouped into Desktop, Mobile, Embedded and FPGA, with entries dated from May09 to Mar16. Vendor logos omitted.]
SYCL for OpenCL
• Single-source heterogeneous programming using STANDARD C++
- Use C++ templates and lambda functions for host & device code
• Aligns the hardware acceleration of OpenCL with direction of the C++ standard
- C++14 with open source C++17 Parallel STL hosted by Khronos
Developer Choice
• C++ Kernel Language: low-level control
- ‘GPGPU’-style separation of device-side kernel source code and host code
• Single-source C++: programmer familiarity
- Approach also taken by C++ AMP and OpenMP
• The development of the two specifications is aligned so code can be easily shared between the two approaches
OpenVX – Low Power Vision Acceleration
• Targeted at vision acceleration in real-time, mobile and embedded platforms
- Precisely defined API for production deployment
• Higher abstraction than OpenCL for performance portability across diverse architectures
- Multi-core CPUs, GPUs, DSPs and DSP arrays, ISPs, Dedicated hardware…
• Extends portable vision acceleration to very low power domains
- Doesn’t require high-power CPU/GPU Complex or OpenCL precision
[Diagram labels: Application, Middleware, Vision Engine; hardware: GPU, DSP, Dedicated Hardware.]
[Chart: vision processing efficiency. Axes: power efficiency (X1, X10, X100) versus computation flexibility. Plotted: Multi-core CPU, GPU Compute, Vision DSPs, Dedicated Hardware.]
OpenVX Graphs
• OpenVX developers express a graph of image operations (‘Nodes’)
- Nodes can be on any hardware or processor coded in any language
• Graphs can execute almost autonomously
- Host interaction can be minimized during frame-rate graph execution
• Graphs are the key to run-time optimization opportunities…
[Figure: Feature Extraction Example Graph. OpenVX nodes: Color Conversion, Channel Extract, Image Pyramid, Harris Track, Optical Flow. Data objects: YUV Frame, RGB Frame, Gray Frame, Pyr_t, Array of Keypoints, Array of Features, Ftr_t-1. Endpoints: Camera Input, Rendering Output.]
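The idea of declaring a graph of nodes up front and letting the runtime decide execution order can be sketched in plain Python. This is a toy model under stated assumptions and not the OpenVX API: `run_graph`, the node table, and the `threshold` node are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def run_graph(nodes, source_value):
    """Execute a toy processing graph in dependency order.

    nodes maps a node name to (function, list of upstream node names).
    Nodes with no upstream nodes receive source_value as their input.
    """
    dependencies = {name: deps for name, (_, deps) in nodes.items()}
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        fn, deps = nodes[name]
        inputs = [results[d] for d in deps] if deps else [source_value]
        results[name] = fn(*inputs)
    return results


# Hypothetical nodes operating on a tiny one-row "image".
nodes = {
    "color_convert": (lambda frame: [(y, y, y) for y in frame], []),
    "channel_extract": (lambda rgb: [px[0] for px in rgb], ["color_convert"]),
    "threshold": (lambda gray: [1 if v > 100 else 0 for v in gray],
                  ["channel_extract"]),
}

results = run_graph(nodes, [50, 150, 200])
print(results["threshold"])  # [0, 1, 1]
```

Because the whole graph is known before execution starts, a runtime is free to reorder, merge, or relocate nodes, which is what makes the optimizations described in this deck possible.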
OpenVX Efficiency through Graphs
• Memory Management: reuse pre-allocated memory for multiple intermediate data
- Less allocation overhead, more memory for other applications
• Kernel Merge: replace a sub-graph with a single faster node
- Better memory locality, less kernel launch overhead
• Graph Scheduling: split the graph execution across the whole system: CPU / GPU / dedicated HW
- Faster execution or lower power consumption
• Data Tiling: execute a sub-graph at tile granularity instead of image granularity
- Better use of data cache and local memory
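The kernel merge idea can be illustrated with a small Python sketch. The functions below are hypothetical per-pixel operations and not OpenVX nodes; the sketch only shows that a merged node produces the same result in one pass without a full-size intermediate buffer.

```python
def scale(pixels, k):
    """Node 1: writes a full intermediate buffer."""
    return [p * k for p in pixels]


def offset(pixels, c):
    """Node 2: reads the intermediate buffer back."""
    return [p + c for p in pixels]


def scale_then_offset_merged(pixels, k, c):
    """Merged node: one pass over the data, no intermediate buffer."""
    return [p * k + c for p in pixels]


image = [1, 2, 3, 4]
unmerged = offset(scale(image, 2), 10)
merged = scale_then_offset_merged(image, 2, 10)

print(unmerged)  # [12, 14, 16, 18]
print(merged)    # [12, 14, 16, 18]
```

A graph optimizer can make this substitution automatically only because it sees both nodes and knows the intermediate result is not used elsewhere.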
Example Relative Performance
[Bar chart: relative performance of OpenCV (GPU accelerated) and OpenVX (GPU accelerated), y-axis 0 to 10. Data labels by category: Arithmetic 1.1, Analysis 2.9, Filter 8.7, Geometric 1.5, Overall 2.5.]
NVIDIA implementation experience. Geometric mean of >2200 primitives, grouped into categories, running at different image sizes and parameter settings.
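As a sketch of how such a summary can be computed, the following Python snippet takes the geometric mean of per-primitive speedups within each category and over all primitives. The speedup values are hypothetical and are not the slide's data, and whether the slide's overall figure averages primitives or categories is an assumption here.

```python
from statistics import geometric_mean

# Hypothetical per-primitive speedups, grouped by category.
# These numbers are illustrative only and are not taken from the slide.
speedups = {
    "filter": [4.0, 9.0],
    "arithmetic": [1.0, 1.21],
}

# Geometric mean within each category.
per_category = {name: geometric_mean(vals) for name, vals in speedups.items()}

# Assumption: the overall figure is the geometric mean over all primitives.
overall = geometric_mean([v for vals in speedups.values() for v in vals])

for name, value in per_category.items():
    print(f"{name}: {value:.2f}")
print(f"overall: {overall:.2f}")
```

The geometric mean is the usual choice for averaging ratios, because a 2x speedup and a 2x slowdown cancel to 1.0 rather than averaging to 1.25.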
Layered Vision Processing Ecosystem
[Diagram labels: Application (C/C++), Programmable Vision Processors, Dedicated Vision Hardware]
• Implementers may use OpenCL or Compute Shaders to implement OpenVX nodes on programmable processors
• Developers can then use OpenVX to easily connect those nodes into a graph
• The OpenVX graph enables implementers to optimize execution across diverse hardware architectures and drive toward lower-power implementations
• OpenVX enables the graph to be extended to include hardware architectures that don’t support programmable APIs
AMD OpenVX
- Open source, highly optimized for x86 CPU and OpenCL for GPU
- “Graph Optimizer” looks at the entire processing pipeline and removes/replaces/merges functions to improve performance and bandwidth
- Scripting for rapid prototyping, without re-compiling, at production performance levels
http://gpuopen.com/compute-product/amd-openvx/
OpenVX 1.0 Shipping, OpenVX 1.1 Released!
• Multiple OpenVX 1.0 implementations shipping – specification released in October 2014
- Open source sample implementation and conformance tests available
• OpenVX 1.1 Specification released 2nd May 2016
- Expands node functionality AND enhances graph framework
• OpenVX is EXTENSIBLE
- Implementers can add their own nodes at any time to meet customer and market needs
[Figure: vendor logos marking shipping implementations]
OpenVX Neural Net Extension
• Convolutional Neural Network topologies can be represented as OpenVX graphs
- Layers are represented as OpenVX nodes
- Layers are connected by multi-dimensional tensor objects
- Layer types include convolution, activation, pooling, fully-connected, soft-max
- CNN nodes can be mixed with traditional vision nodes
• Import/Export Extension
- Efficient handling of network Weights/Biases or complete networks
• The specification is provisional
- Welcome feedback from the deep learning community
[Figure: an OpenVX graph mixing CNN nodes with traditional vision nodes. Labels: Native Camera Control, Vision Nodes, CNN Nodes, Downstream Application Processing.]
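The layer types named on this slide can be sketched as plain Python functions chained through a tensor. This is a toy illustration and not the OpenVX Neural Net Extension API: all function names are hypothetical, the weights are arbitrary, and the tensors are one-dimensional lists where the extension uses multi-dimensional tensor objects.

```python
import math


def convolution(x, kernel):
    """1-D sliding-window sum of products, no padding."""
    n = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(n))
            for i in range(len(x) - n + 1)]


def activation(x):
    """ReLU: clamp negative values to zero."""
    return [max(0.0, v) for v in x]


def pooling(x, size=2):
    """Max pooling over non-overlapping windows."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def fully_connected(x, weights):
    """One output per weight row: dot product of the row with the input."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(x):
    """Normalize scores into probabilities that sum to 1."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


# Each layer is a node; the tensor passed between them is the edge.
layers = [
    lambda t: convolution(t, [1.0, -1.0]),
    activation,
    pooling,
    lambda t: fully_connected(t, [[1.0, 0.0], [0.0, 1.0]]),
    softmax,
]

tensor = [3.0, 1.0, 4.0, 1.0, 5.0]
for layer in layers:
    tensor = layer(tensor)

print(tensor)  # two class probabilities that sum to 1
```

Treating each layer as a node with tensor inputs and outputs is what lets CNN layers sit in the same graph as traditional vision nodes.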
NNEF - Neural Network Exchange Format
[Diagram, without an exchange format: NN Authoring Frameworks 1, 2 and 3 connect directly to Inference Engines 1, 2 and 3.]
[Diagram, with the Neural Net Exchange Format: NN Authoring Frameworks 1, 2 and 3 connect through NNEF to Inference Engines 1, 2 and 3.]
NNEF encapsulates neural network structure, data formats, commonly used operations
(such as convolution, pooling, normalization, etc.) and formal network semantics
NNEF 1.0 is currently being defined
OpenVX will import NNEF files
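One common reading of the two diagrams on this slide is that a shared exchange format reduces the number of converters that have to be written. The arithmetic can be sketched as follows; `converters_needed` is a hypothetical helper, and the counts assume one converter per framework-engine pair without a shared format.

```python
def converters_needed(frameworks, engines, use_exchange_format):
    """Count the converters required to connect every framework to every engine."""
    if use_exchange_format:
        # One exporter per framework plus one importer per engine.
        return frameworks + engines
    # One dedicated converter per (framework, engine) pair.
    return frameworks * engines


direct = converters_needed(3, 3, use_exchange_format=False)
via_format = converters_needed(3, 3, use_exchange_format=True)

print(direct)      # 9
print(via_format)  # 6
```

The gap widens as the ecosystem grows: with 10 frameworks and 10 engines the same calculation gives 100 pairwise converters against 20 exporters and importers.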
Please Consider Joining Khronos!
Influence how
standards evolve!
Access draft specs to
build products faster!
Understand early
industry requirements!
Ship products that conform to
international standards!
Khronos is proven to RAPIDLY generate hardware API standards
that create significant market opportunities
Any company or organization is welcome to join Khronos
for a voice and a vote in any of its standards
www.khronos.org


"New Standards for Embedded Vision and Neural Networks," a Presentation from the Khronos Group
