© Copyright Khronos Group 2016 - Page 1
The Vision API Maze
Options and Trade-offs
Embedded Vision Summit, May 2016
Neil Trevett | Khronos President
NVIDIA Vice President Developer Ecosystem
Accelerated Vision API Jungle
Vision Frameworks
Neural Net Libraries
Language-based Acceleration Frameworks
Explicit Kernels
Target hardware: GPU, FPGA, DSP, Dedicated Hardware
OpenGL ES
Chart: OpenGL ES 2.0 vs OpenGL ES 3.x mobile GPUs (source: http://hwstats.unity3d.com/mobile/gpu.html)
2003: OpenGL ES 1.0, fixed function pipeline
2004: OpenGL ES 1.1 (driver update)
2007: OpenGL ES 2.0 (silicon update), vertex and fragment shaders
2012: OpenGL ES 3.0 (silicon update), 32-bit integers and floats; NPOT, 3D/depth textures; texture arrays; multiple render targets
2014: OpenGL ES 3.1 (driver update), compute shaders
2015: OpenGL ES 3.2 (silicon update), tessellation and geometry shaders; ASTC texture compression; floating point render targets; debug and robustness for security
OpenGL ES 3.1 + AEP: OpenGL ES 3.1 and the Android Extension Pack, available since Android 5.0 (Lollipop)
Epic’s Rivalry demo using full Unreal Engine 4: https://www.youtube.com/watch?v=jRr-G95GdaM
OpenGL ES 2.0 provides vertex and fragment shaders; OpenGL ES 3.1 adds compute shaders
New Generation GPU APIs
Only Apple: Metal
- ‘Half way new gen’: retains the traditional binding model
- Mixes OpenGL ES 3.1/OpenCL 1.2 functionality
- C++11-based kernel language
- Objective-C or Swift host code
Only Windows 10: DirectX 12
Cross platform: Vulkan
- Vulkan 1.0 launched in February 2016
- Shipping now on Windows, Linux and Android platforms from multiple vendors
Vulkan Explicit GPU Control
Layered GPU control (traditional APIs):
- Application
- High-level driver abstraction: context management, memory allocation, full GLSL compiler, error detection
- Single thread per context
- GPU
Explicit GPU control (Vulkan):
- Application: memory allocation, thread management, synchronization, multi-threaded generation of command buffers
- Language front-end compilers (initially GLSL) produce SPIR-V pre-compiled shaders
- Loadable debug and validation layers
- Thin driver
- GPU
Vulkan 1.0 provides access to OpenGL ES 3.1 / OpenGL 4.x-class GPU functionality, but with increased performance and flexibility.
Vulkan Benefits
- Simpler drivers: improved efficiency/performance, reduced CPU bottlenecks, lower latency, increased portability
- Command buffers: command creation can be multi-threaded, so multiple CPU cores increase performance
- Graphics, compute and DMA queues: work dispatch flexibility
- Resource management in app code: fewer hitches and surprises
- SPIR-V pre-compiled shaders: no front-end compiler in the driver, future shading language flexibility
- Loadable layers: no error handling overhead in production code
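The command buffer benefit above can be illustrated with a small model: recording is spread across worker threads, while submission stays in one place and in an order the application chooses. This is a toy Python sketch, not Vulkan code; the command tuples and the list standing in for a queue are hypothetical. Real code would record with vkBeginCommandBuffer, vkCmd* calls and vkEndCommandBuffer, typically from a per-thread VkCommandPool (command pools are externally synchronized), and then submit with vkQueueSubmit.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model only: a "command buffer" is a list of command tuples and the
# "queue" is a list that receives whole buffers in submission order.

def record_command_buffer(chunk_id, draw_ids):
    """Record the commands for one chunk of the scene; runs on a worker thread."""
    commands = [("bind_pipeline", chunk_id)]
    for draw_id in draw_ids:
        commands.append(("draw", chunk_id, draw_id))
    return commands

def submit(queue, command_buffers):
    """Single submission point: buffers execute in the order the app chooses."""
    for buffer in command_buffers:
        queue.extend(buffer)

def render_frame(scene, workers=4):
    """Record one buffer per chunk in parallel, then submit them in chunk order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(record_command_buffer, chunk_id, draws)
                   for chunk_id, draws in enumerate(scene)]
        buffers = [f.result() for f in futures]  # gathered in chunk order
    queue = []
    submit(queue, buffers)
    return queue

if __name__ == "__main__":
    print(render_frame([[0, 1], [2], [3, 4]]))
```

Because each worker only touches its own buffer, no locking is needed during recording; ordering is decided once, at submission.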
NVIDIA CUDA
• The industry’s original dedicated GPU compute language
- C/C++11 language extensions for ‘single source’ programming
• Easy programmability and low-level access to the GPU
- Unified Memory, Virtual Addressing, Dynamic Parallelism, etc.
• Mature and optimized tools and compute/imaging libraries
- Thrust, NPP, cuFFT, cuBLAS, cuda-gdb, nvprof, etc.
• CUDA 7.5 released September 2015
- Added 16-bit floating point (FP16) data format support
• NVIDIA only, GPU only
CUDA 8 (coming soon)
- Support for Pascal Unified Memory
- Flexible mixing of CUDA (C++ language extensions), OpenACC (directive-based parallelism in standard C/C++) and Thrust (C++ parallel library)
OpenCL
• Heterogeneous parallel programming of diverse compute resources
- One code tree can be executed on CPUs, GPUs, DSPs and FPGAs
• OpenCL = two APIs and two kernel languages
- C Platform Layer API to query, select and initialize compute devices
- C Runtime API to build and execute kernels across multiple devices
- OpenCL C and OpenCL C++ kernel languages
• The OpenCL C++ kernel language is a static subset of C++14
- Adaptable and elegant sharable code: great for building libraries
- Templates enable meta-programming for highly adaptive software
- Lambdas used to implement nested/dynamic parallelism
Diagram: OpenCL kernel code is compiled for each device (GPU, DSP, CPU, FPGA); on the CPU host, the Runtime API loads and executes the kernels across the devices.
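To make the kernel/runtime split concrete, here is a minimal serial emulation of a 1-D NDRange launch, assuming a zero global offset. The Python `enqueue_nd_range` and `vadd` names are hypothetical stand-ins; a real host program would go through clCreateProgramWithSource, clBuildProgram, clCreateKernel, clSetKernelArg and clEnqueueNDRangeKernel, and work-items would run concurrently with no ordering guarantee. The divisibility check mirrors the OpenCL 1.x rule that the global size be a multiple of the work-group size (OpenCL 2.0 relaxed this).

```python
# OpenCL C kernel being modelled (for reference only):
#   __kernel void vadd(__global const float *a, __global const float *b,
#                      __global float *out) {
#       size_t gid = get_global_id(0);
#       out[gid] = a[gid] + b[gid];
#   }

def enqueue_nd_range(kernel, global_size, local_size, *args):
    """Serially emulate a 1-D NDRange launch with a zero global offset."""
    if local_size <= 0 or global_size % local_size != 0:
        raise ValueError("global_size must be a positive multiple of local_size")
    for group_id in range(global_size // local_size):
        for local_id in range(local_size):
            # get_global_id(0) == get_group_id(0) * get_local_size(0) + get_local_id(0)
            global_id = group_id * local_size + local_id
            kernel(global_id, *args)

def vadd(global_id, a, b, out):
    """Python stand-in for the vadd kernel body: one work-item, one element."""
    out[global_id] = a[global_id] + b[global_id]

if __name__ == "__main__":
    a, b, out = [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], [0.0] * 4
    enqueue_nd_range(vadd, 4, 2, a, b, out)  # two work-groups of two work-items
    print(out)
```

The kernel body only ever sees its own index; how indices are grouped and scheduled is the runtime's business, which is what lets the same kernel source be compiled for CPUs, GPUs, DSPs or FPGAs.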
OpenCL Conformant Implementations
Chart: each vendor’s first conformant implementation of each OpenCL spec generation, grouped into Desktop, Mobile, FPGA and Embedded (vendor names not recoverable).
Specification dates: OpenCL 1.0 Dec08, OpenCL 1.1 Jun10, OpenCL 1.2 Nov11, OpenCL 2.0 Nov13, OpenCL 2.1 Nov15
OpenCL 2.2 - Top to Bottom C++
OpenCL 1.0 specification: Dec08
OpenCL 1.1 specification: Jun10 (18 months later)
- 3-component vectors, additional image formats, multiple hosts and devices, buffer region operations, enhanced event-driven execution, additional OpenCL C built-ins, improved OpenGL data/event interop
OpenCL 1.2 specification: Nov11 (18 months later)
- Device partitioning, separate compilation and linking, enhanced image support, built-in kernels / custom devices, enhanced DX and OpenGL interop
OpenCL 2.0 specification: Nov13 (24 months later)
- Shared Virtual Memory, on-device dispatch, Generic Address Space, enhanced image support, C11 atomics, pipes, Android ICD
OpenCL 2.1 specification: Nov15 (24 months later)
- SPIR-V in core, subgroups into core, subgroup query operations, clCloneKernel, low-latency device timer queries
OpenCL 2.2 PROVISIONAL: May16 (7 months later)
- OpenCL C++14 kernel language into core
Top to bottom C++:
- Single-source C++ programming (SYCL): full support for features in the C++14 kernel language
- API and language specs (OpenCL 2.2): brings the C++14 kernel language into the core specification
- Portable kernel intermediate language (SPIR-V 1.1): support for the C++14 kernel language, e.g. constructors/destructors
SPIR-V Ecosystem
SPIR-V
• Khronos defined and controlled cross-API intermediate language
• Native support for graphics and parallel constructs
• 32-bit word stream
• Extensible and easily parsed
• Retains data object and control flow information for effective code generation and translation
Diagram: GLSL, OpenCL C, OpenCL C++ and third-party kernel and shader languages compile to SPIR-V, which is consumed by IHV driver runtimes; SPIR-V can also be translated to and from LLVM and other intermediate forms.
Tools: SPIR-V Validator, SPIR-V Tools, SPIR-V (Dis)Assembler, LLVM to SPIR-V bi-directional translator
Khronos has open sourced these tools and translators
Khronos plans to open source these tools soon
Open source C++ front-end released: https://github.com/KhronosGroup/SPIR/tree/spirv-1.1
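The "32-bit word stream, easily parsed" point can be shown directly. A SPIR-V module starts with a five-word header (magic number 0x07230203, version, generator, ID bound, reserved schema), and every instruction’s first word packs the word count in the high 16 bits and the opcode in the low 16 bits. The sketch below assumes the words are already decoded into Python integers (reading a file would first need the byte order, which the magic number reveals). It is an illustration, not SPIRV-Tools; tools such as spirv-dis do this properly.

```python
SPIRV_MAGIC = 0x07230203

def parse_spirv(words):
    """Split a SPIR-V word stream into its header and (opcode, operands) pairs."""
    if len(words) < 5 or words[0] != SPIRV_MAGIC:
        raise ValueError("not a SPIR-V module (bad magic number)")
    # Version word layout: 0 | major | minor | 0, one byte each.
    version = ((words[1] >> 16) & 0xFF, (words[1] >> 8) & 0xFF)
    header = {"version": version, "generator": words[2], "bound": words[3]}
    instructions = []
    i = 5  # the header is always five words; words[4] is the reserved schema
    while i < len(words):
        word_count = words[i] >> 16   # high 16 bits: total words in this instruction
        opcode = words[i] & 0xFFFF    # low 16 bits: the opcode
        if word_count == 0 or i + word_count > len(words):
            raise ValueError(f"malformed instruction at word {i}")
        instructions.append((opcode, list(words[i + 1:i + word_count])))
        i += word_count
    return header, instructions

if __name__ == "__main__":
    module = [
        SPIRV_MAGIC, 0x00010100, 0, 1, 0,  # header: SPIR-V 1.1, generator 0, bound 1
        (2 << 16) | 17, 1,                 # OpCapability Shader
        (3 << 16) | 14, 0, 1,              # OpMemoryModel Logical GLSL450
    ]
    print(parse_spirv(module))
```

Because the word count is always in the first word, a reader can skip instructions it does not understand, which is part of what makes the format extensible.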
SYCL for OpenCL
• Single-source heterogeneous programming using STANDARD C++
- Use C++ templates and lambda functions for host & device code
• Aligns the hardware acceleration of OpenCL with the direction of the C++ standard
- C++14, with an open source C++17 Parallel STL hosted by Khronos
Developer choice:
- C++ kernel language (OpenCL C++): low-level control; ‘GPGPU’-style separation of device-side kernel source code and host code
- Single-source C++ (SYCL): programmer familiarity; approach also taken by C++ AMP and OpenMP
The development of the two specifications is aligned so code can be easily shared between the two approaches
OpenCL Roadmap Discussions…
Embedded
- Use cases: signal and pixel processing
- Roadmap: arbitrary precision for power efficiency, hard real-time scheduling, async DMA
FPGAs
- Use cases: network and stream processing
- Roadmap: enhanced execution model, self-synchronized and self-scheduled graphs, fine-grained synchronization between kernels, DSL in C++
HPC, SciViz, Datacenter
- Use cases: numerical simulation, virtualization
- Roadmap: enhanced streaming processing, enhanced library support
Vulkan Compute can leverage OpenCL?
- Use cases: gaming compute, pixel processing, inference
- Fine grain graphics and compute (no interop needed)
- SPIR-V for shading language flexibility: C/C++
- Low-latency, fine grain run-time
- Google Android adoption
- Competes well with Metal (= C++/OpenCL 1.2)
- Roadmap: types, precision and accuracy; pointers and address spaces; execution model
Desktop
- Use cases: video and image processing, gaming compute
- Roadmap: Vulkan interop, arbitrary precision for increased performance, pre-emption, collective programming and improved execution model
Mobile
- Use cases: photo and vision processing
- Roadmap: arbitrary precision for inference engine and pixel processing efficiency, pre-emption and QoS scheduling for power efficiency
Possible learnings from Vulkan philosophy
1. Explicit: provide direct access to hardware capabilities with a thin driver
2. Feature sets: enable diverse architectures to ship just the relevant features
3. Open source conformance tests for deep community engagement
OpenCV
• Extensive and widely used open source vision library, written in optimized C/C++
- Free-use BSD license
• C++, C, Python and Java interfaces
- Windows, Linux, Mac OS, iOS and Android
• Increasingly taking advantage of heterogeneous processing using OpenCL
- OpenCV 3.x Transparent API: a single API entry for each function/algorithm
- Dynamically loads the OpenCL runtime if available; otherwise falls back to CPU code
- Runtime dispatching: no recompilation!
OpenCV Transparent API for OpenCL Kernel Offload
Diagram: an OpenCV application runs several CPU threads, each with its own ocl::Queue; queues map onto ocl::Device objects within an ocl::Context.
• One OpenCL queue per CPU thread
• CPU threads can share a device
• OpenCL kernels are executed asynchronously
OpenCV is active open source, not an API specification
- A strength and a weakness!
- Production deployment often needs a tightly defined callable API
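The Transparent API pattern (one entry point per algorithm, backend chosen at run time, one queue per CPU thread) can be modelled in a few lines. Everything below is a hypothetical Python sketch, not OpenCV’s classes: `TransparentAPI`, `box3` and the list used as a queue are invented for illustration, and the "accelerated" path simply records an enqueue and reuses the CPU math. In OpenCV itself the equivalent controls are cv::UMat together with cv::ocl::haveOpenCL() and cv::ocl::setUseOpenCL().

```python
import threading

def box3_cpu(row):
    """Reference CPU path: 3-tap box filter with edge clamping (integer average)."""
    n = len(row)
    return [(row[max(i - 1, 0)] + row[i] + row[min(i + 1, n - 1)]) // 3
            for i in range(n)]

class TransparentAPI:
    """Hypothetical model of runtime dispatch; not OpenCV code."""

    def __init__(self, have_accelerator):
        # In OpenCV this decision comes from probing for an OpenCL runtime.
        self.use_accelerator = have_accelerator
        self._tls = threading.local()  # one simulated queue per CPU thread

    def _queue(self):
        if not hasattr(self._tls, "queue"):
            self._tls.queue = []
        return self._tls.queue

    def box3(self, row):
        # Single entry point: callers never change, only the backend does.
        if self.use_accelerator:
            self._queue().append("box3")  # stand-in for enqueuing a kernel
            return box3_cpu(row)          # simulated device result, same math
        return box3_cpu(row)              # CPU fallback, no recompilation needed
```

The design choice being illustrated is that the decision lives inside the library at call time, so the same application binary runs on machines with or without an OpenCL runtime.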
Vision Pipeline Challenges and Opportunities
• Growing camera diversity: flexible sensor and camera control to GENERATE an image stream
• Diverse vision processors: use efficient acceleration to PROCESS the image stream
• Sensor proliferation: combine vision output with other sensor data on device
OpenVX – Low Power Vision Acceleration
• Precisely defined API for production deployment of vision acceleration
- Targeted at real-time mobile and embedded platforms
• Higher abstraction than OpenCL for performance portability across diverse architectures
- Multi-core CPUs, GPUs, DSPs and DSP arrays, ISPs, dedicated hardware…
• Extends portable vision acceleration to very low power domains
- Doesn’t require a high-power CPU/GPU complex or OpenCL precision
- A low-power host can set up and manage a frame-rate graph
Diagram: Application, Middleware and Vision Engine layers running on GPU, DSP and dedicated hardware
Chart: vision processing efficiency (power efficiency) versus computation flexibility for multi-core CPUs, GPU compute, vision DSPs and dedicated hardware, with efficiency marked at X1, X10 and X100
OpenVX Graphs
• OpenVX developers express a graph of image operations (‘Nodes’)
- Nodes can be on any hardware or processor, coded in any language
• Graphs can execute almost autonomously
- Possible to minimize host interaction during frame-rate graph execution
• Graphs are the key to run-time optimization opportunities…
Feature Extraction Example Graph (OpenVX nodes and data objects): Camera Input produces an RGB Frame; Color Conversion produces a YUV Frame; Channel Extract produces a Gray Frame; the Gray Frame feeds Image Pyramid (producing Pyr_t) and Harris Track (producing an Array of Features); Optical Flow uses Pyr_t and the previous features (Ftr_t-1) to produce an Array of Keypoints for Rendering Output.
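A graph API separates declaring the work from running it: nodes are added first, the whole graph is checked and ordered once, and then it can be executed every frame. The sketch below is a hypothetical Python model of that lifecycle (it is not the OpenVX C API, whose corresponding entry points are vxCreateGraph, vxVerifyGraph and vxProcessGraph). Verification here uses Kahn’s topological sort; the one-time verify step is also where a real implementation would have room to apply the optimizations described on the next slide.

```python
from collections import defaultdict, deque

class Graph:
    """Hypothetical mini graph API (not the OpenVX C API): declare, verify, process."""

    def __init__(self):
        self.nodes = []    # entries of (name, func, input names, output name)
        self.order = None  # execution order, filled in by verify()

    def add_node(self, name, func, inputs, output):
        self.nodes.append((name, func, list(inputs), output))

    def verify(self, graph_inputs):
        """Check every input is available and find an execution order (Kahn's algorithm)."""
        producer = {out: i for i, (_, _, _, out) in enumerate(self.nodes)}
        pending = [0] * len(self.nodes)  # unresolved inputs per node
        consumers = defaultdict(list)    # node index -> nodes that read its output
        for i, (name, _, inputs, _) in enumerate(self.nodes):
            for data in inputs:
                if data in producer:
                    pending[i] += 1
                    consumers[producer[data]].append(i)
                elif data not in graph_inputs:
                    raise ValueError(f"node {name!r}: input {data!r} is never produced")
        ready = deque(i for i, n in enumerate(pending) if n == 0)
        order = []
        while ready:
            i = ready.popleft()
            order.append(i)
            for j in consumers[i]:
                pending[j] -= 1
                if pending[j] == 0:
                    ready.append(j)
        if len(order) != len(self.nodes):
            raise ValueError("graph contains a cycle")
        self.order = order

    def process(self, **graph_inputs):
        """Run every node once in verified order and return all data objects."""
        if self.order is None:
            raise RuntimeError("verify() must succeed before process()")
        data = dict(graph_inputs)
        for i in self.order:
            _, func, inputs, output = self.nodes[i]
            data[output] = func(*(data[name] for name in inputs))
        return data

if __name__ == "__main__":
    g = Graph()
    # Nodes are deliberately added out of order; verify() sorts them.
    g.add_node("count", lambda mask: sum(mask), ["mask"], "n_features")
    g.add_node("threshold", lambda gray: [1 if v > 100 else 0 for v in gray], ["gray"], "mask")
    g.add_node("to_gray", lambda rgb: [sum(px) // 3 for px in rgb], ["rgb"], "gray")
    g.verify({"rgb"})
    print(g.process(rgb=[(255, 255, 255), (0, 0, 0), (150, 120, 90)])["n_features"])
```

The toy nodes (gray conversion, threshold, count) only stand in for real vision kernels; the point is that the host touches the graph once per frame rather than once per operation.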
OpenVX Efficiency through Graphs…
• Memory management: reuse pre-allocated memory for multiple intermediate data
- Less allocation overhead, more memory for other applications
• Kernel merge: replace a sub-graph with a single faster node
- Better memory locality, less kernel launch overhead
• Graph scheduling: split the graph execution across the whole system (CPU / GPU / dedicated HW)
- Faster execution or lower power consumption
• Data tiling: execute a sub-graph at tile granularity instead of image granularity
- Better use of data cache and local memory
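Two of these rewrites, kernel merge and data tiling, can be checked on a toy two-node pipeline: all three execution strategies must produce identical output, and they differ only in how much intermediate data is held at once. This is a pure-Python sketch on a 1-D "image" of pixel values; the `brighten` and `threshold` nodes are hypothetical, and in OpenVX it is the implementation, not the application, that is free to choose such rewrites.

```python
def brighten(px):
    """Node 1: add a constant with saturation at 255."""
    return min(px + 40, 255)

def threshold(px):
    """Node 2: binary threshold."""
    return 255 if px > 128 else 0

def run_separate(image):
    """Two nodes at image granularity: a full-size intermediate is stored."""
    intermediate = [brighten(p) for p in image]
    return [threshold(p) for p in intermediate], len(intermediate)

def run_merged(image):
    """Kernel merge: one node applies both operations, no intermediate image."""
    return [threshold(brighten(p)) for p in image], 0

def run_tiled(image, tile=4):
    """Data tiling: both nodes run per tile, so the intermediate is one tile."""
    out, peak = [], 0
    for start in range(0, len(image), tile):
        intermediate = [brighten(p) for p in image[start:start + tile]]
        peak = max(peak, len(intermediate))
        out.extend(threshold(p) for p in intermediate)
    return out, peak
```

Each function returns the output together with the number of intermediate elements it kept alive, which is the quantity that drives cache and local-memory behavior on real hardware.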
Example Relative Performance
NVIDIA implementation experience: relative performance of OpenVX (GPU accelerated) versus OpenCV (GPU accelerated)
Arithmetic: 1.1
Analysis: 2.9
Filter: 8.7
Geometric: 1.5
Overall: 2.5
Each value is the geometric mean over more than 2200 primitives, grouped into categories and run at different image sizes and parameter settings.
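The geometric mean is the usual way to average speedup ratios because it treats a 2x speedup and a 2x slowdown symmetrically (they average to 1.0, whereas the arithmetic mean would give 1.25). A minimal sketch, computed in log space to avoid overflow when many ratios are multiplied:

```python
import math

def geometric_mean(ratios):
    """Geometric mean of positive speedup ratios, computed in log space."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

if __name__ == "__main__":
    print(geometric_mean([2.0, 0.5]))             # symmetric speedup/slowdown
    print(geometric_mean([1.1, 2.9, 8.7, 1.5]))   # the four category values above
```

As a sanity check only, the geometric mean of the four category values comes out at about 2.54, close to the overall 2.5; the slide’s overall figure is taken over the individual primitives, so the agreement is approximate rather than a derivation.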
Layered Vision Processing Ecosystem
Layers: application software; engines/frameworks (C/C++); powerful, flexible low-level APIs/languages; hardware (application processor, programmable vision processors, dedicated vision hardware)
• Implementers may use OpenCL or compute shaders to implement OpenVX nodes on programmable processors
• Developers can then use OpenVX to easily connect those nodes into a graph
• The OpenVX graph enables implementers to optimize execution across diverse hardware architectures and drive to lower power implementations
• OpenVX enables the graph to be extended to include hardware architectures that don’t support programmable APIs
OpenVX 1.0 Shipping, OpenVX 1.1 Released!
• Multiple OpenVX 1.0 implementations shipping; specification released October 2014
- Open source sample implementation and conformance tests available
• OpenVX 1.1 specification released 2nd May 2016 at Embedded Vision Summit
- Expands node functionality AND enhances graph framework
- Laplacian pyramids and enhanced filters
- Easier user nodes and control over execution on heterogeneous platforms
- Sample source and conformance tests will be updated to OpenVX 1.1 in 1H16
OpenVX Roadmap and Safety Critical APIs
OpenGL ES 1.0 (2003): fixed function graphics
OpenGL SC 1.0 (2005): fixed function graphics subset
OpenGL ES 2.0 (2007): shader programmable pipeline
OpenGL SC 2.0 (April 2016): shader programmable pipeline subset
Experience and guidelines
New generation APIs for safety certifiable vision, graphics and compute, e.g. ISO 26262 and DO-178B/C
- Small driver size
- Advanced functionality
- Graphics and compute
OpenVX Roadmap Discussions
- Significantly broaden node functionality
- In-graph neural nets
- Programmable nodes (OpenCL or SPIR-V?)
- Market-specific feature sets
- OpenVX SC?
Get Involved!
• A diverse set of vision APIs in the industry
- Developer choice is good, but choose wisely!
• Many APIs originally created to program GPUs
- But vision processing needs are increasingly driving API roadmaps
• Industry will tend to consolidate around leading APIs
- Working toward a multi-layer API ecosystem
- Powerful foundational hardware APIs enabling rich middleware APIs and libraries
• Any company or organization is welcome to join Khronos
for a voice and a vote in any of its standards
- www.khronos.org
• Neil Trevett
- ntrevett@nvidia.com
- @neilt3d

More Related Content

PPTX
OpenVINO introduction
PDF
"Real-world Vision Systems Design: Challenges and Techniques," a Presentation...
PDF
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
PPTX
Real-world Vision Systems Design: Challenges and Techniques
PPTX
OpenCV for Embedded: Lessons Learned
PPTX
How to Get the Best Deep Learning performance with OpenVINO Toolkit
PPTX
Enabling Cross-platform Deep Learning Applications with Intel OpenVINO™
PPTX
Develop and optimize CV/DL applications with Intel OpenVINO toolkit
OpenVINO introduction
"Real-world Vision Systems Design: Challenges and Techniques," a Presentation...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
Real-world Vision Systems Design: Challenges and Techniques
OpenCV for Embedded: Lessons Learned
How to Get the Best Deep Learning performance with OpenVINO Toolkit
Enabling Cross-platform Deep Learning Applications with Intel OpenVINO™
Develop and optimize CV/DL applications with Intel OpenVINO toolkit

What's hot (20)

PDF
Openvino ncs2
PDF
Виктор Ерухимов Open VX mixar moscow sept'15
PDF
"The OpenVX Hardware Acceleration API for Embedded Vision Applications and Li...
PDF
"How to Get the Best Deep Learning Performance with the OpenVINO Toolkit," a ...
PDF
HSA-4146, Creating Smarter Applications and Systems Through Visual Intelligen...
PDF
AI & Computer Vision (OpenVINO) - CPBR12
PDF
ScilabTEC 2015 - Xilinx
PDF
Qualcomm Hexagon SDK: Optimize Your Multimedia Solutions
PPTX
Continuous Integration for BSP
PPTX
!Zpx Overview New
PPT
Using the Cypress PSoC Processor
PDF
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
PPTX
Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...
PDF
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
PDF
Jai kumar fpga_prototyping
PDF
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
PDF
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
PDF
PG-4039, RapidFire API, by Dmitry Kozlov
PDF
Tinychip PSoC Workshop
PDF
Design and Testing Challenges for Chiplet Based Design: Assembly and Test View
Openvino ncs2
Виктор Ерухимов Open VX mixar moscow sept'15
"The OpenVX Hardware Acceleration API for Embedded Vision Applications and Li...
"How to Get the Best Deep Learning Performance with the OpenVINO Toolkit," a ...
HSA-4146, Creating Smarter Applications and Systems Through Visual Intelligen...
AI & Computer Vision (OpenVINO) - CPBR12
ScilabTEC 2015 - Xilinx
Qualcomm Hexagon SDK: Optimize Your Multimedia Solutions
Continuous Integration for BSP
!Zpx Overview New
Using the Cypress PSoC Processor
CC-4001, Aparapi and HSA: Easing the developer path to APU/GPU accelerated Ja...
Tyrone-Intel oneAPI Webinar: Optimized Tools for Performance-Driven, Cross-Ar...
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...
Jai kumar fpga_prototyping
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
PG-4039, RapidFire API, by Dmitry Kozlov
Tinychip PSoC Workshop
Design and Testing Challenges for Chiplet Based Design: Assembly and Test View
Ad

Similar to "The Vision API Maze: Options and Trade-offs," a Presentation from the Khronos Group (20)

PDF
"An Update on Open Standard APIs for Vision Processing," a Presentation from ...
PPTX
OpenCL Overview Japan Virtual Open House Feb 2021
PDF
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
PDF
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...
PDF
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
PDF
"New Standards for Embedded Vision and Neural Networks," a Presentation from ...
PDF
"The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati...
PPTX
Hands on OpenCL
PDF
“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...
PDF
Open CL For Speedup Workshop
PDF
"Current and Planned Standards for Computer Vision and Machine Learning," a P...
PDF
"Recent Developments in Khronos Standards for Embedded Vision," a Presentatio...
PDF
Deep Learning on ARM Platforms - SFO17-509
PPTX
Open Standards for Cross-Platform Gaming, Virtual & Augmented Reality | Neil ...
PPTX
MattsonTutorialSC14.pptx
PDF
"Update on Khronos Standards for Vision and Machine Learning," a Presentation...
PDF
MattsonTutorialSC14.pdf
PDF
clWrap: Nonsense free control of your GPU
PDF
“Open Standards Unleash Hardware Acceleration for Embedded Vision,” a Present...
PDF
"Making OpenCV Code Run Fast," a Presentation from Intel
"An Update on Open Standard APIs for Vision Processing," a Presentation from ...
OpenCL Overview Japan Virtual Open House Feb 2021
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
"New Standards for Embedded Vision and Neural Networks," a Presentation from ...
"The Vision Acceleration API Landscape: Options and Trade-offs," a Presentati...
Hands on OpenCL
“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...
Open CL For Speedup Workshop
"Current and Planned Standards for Computer Vision and Machine Learning," a P...
"Recent Developments in Khronos Standards for Embedded Vision," a Presentatio...
Deep Learning on ARM Platforms - SFO17-509
Open Standards for Cross-Platform Gaming, Virtual & Augmented Reality | Neil ...
MattsonTutorialSC14.pptx
"Update on Khronos Standards for Vision and Machine Learning," a Presentation...
MattsonTutorialSC14.pdf
clWrap: Nonsense free control of your GPU
“Open Standards Unleash Hardware Acceleration for Embedded Vision,” a Present...
"Making OpenCV Code Run Fast," a Presentation from Intel
Ad

More from Edge AI and Vision Alliance (20)

PDF
“Optimizing Real-time SLAM Performance for Autonomous Robots with GPU Acceler...
PDF
“LLMs and VLMs for Regulatory Compliance, Quality Control and Safety Applicat...
PDF
“Simplifying Portable Computer Vision with OpenVX 2.0,” a Presentation from AMD
PDF
“Quantization Techniques for Efficient Deployment of Large Language Models: A...
PDF
“Introduction to Data Types for AI: Trade-Offs and Trends,” a Presentation fr...
PDF
“Introduction to Radar and Its Use for Machine Perception,” a Presentation fr...
PDF
“NPU IP Hardware Shaped Through Software and Use-case Analysis,” a Presentati...
PDF
“Voice Interfaces on a Budget: Building Real-time Speech Recognition on Low-c...
PDF
“Computer Vision at Sea: Automated Fish Tracking for Sustainable Fishing,” a ...
PDF
“Squinting Vision Pipelines: Detecting and Correcting Errors in Vision Models...
PDF
“ONNX and Python to C++: State-of-the-art Graph Compilation,” a Presentation ...
PDF
“Beyond the Demo: Turning Computer Vision Prototypes into Scalable, Cost-effe...
PDF
“Running Accelerated CNNs on Low-power Microcontrollers Using Arm Ethos-U55, ...
PDF
“Scaling i.MX Applications Processors’ Native Edge AI with Discrete AI Accele...
PDF
“A Re-imagination of Embedded Vision System Design,” a Presentation from Imag...
PDF
“MPU+: A Transformative Solution for Next-Gen AI at the Edge,” a Presentation...
PDF
“Evolving Inference Processor Software Stacks to Support LLMs,” a Presentatio...
PDF
“Efficiently Registering Depth and RGB Images,” a Presentation from eInfochips
PDF
“How to Right-size and Future-proof a Container-first Edge AI Infrastructure,...
PDF
“Image Tokenization for Distributed Neural Cascades,” a Presentation from Goo...
“Optimizing Real-time SLAM Performance for Autonomous Robots with GPU Acceler...
“LLMs and VLMs for Regulatory Compliance, Quality Control and Safety Applicat...
“Simplifying Portable Computer Vision with OpenVX 2.0,” a Presentation from AMD
“Quantization Techniques for Efficient Deployment of Large Language Models: A...
“Introduction to Data Types for AI: Trade-Offs and Trends,” a Presentation fr...
“Introduction to Radar and Its Use for Machine Perception,” a Presentation fr...
“NPU IP Hardware Shaped Through Software and Use-case Analysis,” a Presentati...
“Voice Interfaces on a Budget: Building Real-time Speech Recognition on Low-c...
“Computer Vision at Sea: Automated Fish Tracking for Sustainable Fishing,” a ...
“Squinting Vision Pipelines: Detecting and Correcting Errors in Vision Models...
“ONNX and Python to C++: State-of-the-art Graph Compilation,” a Presentation ...
“Beyond the Demo: Turning Computer Vision Prototypes into Scalable, Cost-effe...
“Running Accelerated CNNs on Low-power Microcontrollers Using Arm Ethos-U55, ...
“Scaling i.MX Applications Processors’ Native Edge AI with Discrete AI Accele...
“A Re-imagination of Embedded Vision System Design,” a Presentation from Imag...
“MPU+: A Transformative Solution for Next-Gen AI at the Edge,” a Presentation...
“Evolving Inference Processor Software Stacks to Support LLMs,” a Presentatio...
“Efficiently Registering Depth and RGB Images,” a Presentation from eInfochips
“How to Right-size and Future-proof a Container-first Edge AI Infrastructure,...
“Image Tokenization for Distributed Neural Cascades,” a Presentation from Goo...

Recently uploaded (20)

PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PPTX
PA Analog/Digital System: The Backbone of Modern Surveillance and Communication
PPT
Teaching material agriculture food technology
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PPTX
Cloud computing and distributed systems.
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PDF
Network Security Unit 5.pdf for BCA BBA.
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
CIFDAQ's Market Insight: SEC Turns Pro Crypto
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Approach and Philosophy of On baking technology
PDF
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Machine learning based COVID-19 study performance prediction
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PA Analog/Digital System: The Backbone of Modern Surveillance and Communication
Teaching material agriculture food technology
Reach Out and Touch Someone: Haptics and Empathic Computing
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Cloud computing and distributed systems.
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Network Security Unit 5.pdf for BCA BBA.
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
CIFDAQ's Market Insight: SEC Turns Pro Crypto
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Spectral efficient network and resource selection model in 5G networks
Diabetes mellitus diagnosis method based random forest with bat algorithm
Approach and Philosophy of On baking technology
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
Digital-Transformation-Roadmap-for-Companies.pptx
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Machine learning based COVID-19 study performance prediction
How UI/UX Design Impacts User Retention in Mobile Apps.pdf

"The Vision API Maze: Options and Trade-offs," a Presentation from the Khronos Group

  • 1. © Copyright Khronos Group 2016 - Page 1 The Vision API Maze Options and Trade-offs Embedded Vision Summit, May 2016 Neil Trevett | Khronos President NVIDIA Vice President Developer Ecosystem
  • 2. © Copyright Khronos Group 2016 - Page 2 Accelerated Vision API Jungle Vision Frameworks Language-based Acceleration Frameworks Explicit Kernels GPU FPGA DSP Dedicated Hardware Neural Net Libraries
  • 3. © Copyright Khronos Group 2016 - Page 3 http://hwstats.unity3d.com/mobile/gpu.html OpenGL ES 2.0 OpenGL ES 3.x OpenGL ES 2003 1.0 2004 1.1 2007 2.0 2012 3.0 2014 3.1 Driver Update Silicon Update Silicon Update Driver Update Compute Shaders 32-bit integers and floats NPOT, 3D/depth textures Texture arrays Multiple Render Targets Vertex and fragment shaders Fixed function Pipeline 2015 3.2 Silicon Update Tessellation and geometry shaders ASTC Texture Compression Floating point render targets Debug and robustness for security Epic’s Rivalry demo using full Unreal Engine 4 https://www.youtube.com/watch?v=jRr-G95GdaM +AEP OpenGL ES 3.1 and Android Extension Pack since Android 5.0 (Lollipop) Vertex and Fragment Shaders Compute Shaders
  • 4. © Copyright Khronos Group 2016 - Page 4 New Generation GPU APIs Only AppleOnly Windows 10 Cross Platform Vulkan 1.0 launched in February Shipping now on Windows, Linux, Android platforms from multiple vendors ‘Half Way New Gen’ Retains Traditional Binding Model Mixes OpenGL ES 3.1/OpenCL 1.2 C++11-based kernel language Objective-C or Swift
  • 5. © Copyright Khronos Group 2016 - Page 5 Vulkan Explicit GPU Control GPU High-level Driver Abstraction Context management Memory allocation Full GLSL compiler Error detection Layered GPU Control Application Single thread per context GPU Thin Driver Explicit GPU Control Application Memory allocation Thread management Synchronization Multi-threaded generation of command buffers Language Front-end Compilers Initially GLSL Loadable debug and validation layers Vulkan 1.0 provides access to OpenGL ES 3.1 / OpenGL 4.X-class GPU functionality but with increased performance and flexibility Loadable Layers No error handling overhead in production code SPIR-V Pre-compiled Shaders: No front-end compiler in driver Future shading language flexibility Simpler drivers: Improved efficiency/performance Reduced CPU bottlenecks Lower latency Increased portability Graphics, compute and DMA queues: Work dispatch flexibility Command Buffers: Command creation can be multi-threaded Multiple CPU cores increase performance Resource management in app code: Less hitches and surprises Vulkan Benefits SPIR-V pre-compiled shaders
  • 6. © Copyright Khronos Group 2016 - Page 6 NVIDIA CUDA • The industry’s original dedicated GPU Compute language - C/C++11 language extensions for ‘single source’ programming • Easy programmability and low level access to GPU - Unified Memory, Virtual Addressing, Dynamic Parallelism etc. • Mature and optimized tools and compute / imaging libraries - Thrust, NPP, cuFFT, cuBLAS, cuda-gdb, nvprof etc. • CUDA 7.5 released September 2015 - Added 16-bit floating point (FP16) data format support • NVIDIA only, GPU only CUDA 8 (Coming Soon) Support for Pascal Unified Memory Flexible Mixing of CUDA (C++ language extensions) OpenACC (parallelism through standard C++) Thrust (C++ parallel library)
  • 7. © Copyright Khronos Group 2016 - Page 7 OpenCL • Heterogeneous parallel programming of diverse compute resources - One code tree can be executed on CPUs, GPUs, DSPs and FPGA • OpenCL = Two APIs and Two Kernel languages - C Platform Layer API to query, select and initialize compute devices - C Runtime API to build and execute kernels across multiple devices - OpenCL C and OpenCL C++ kernel languages • The OpenCL C++ kernel language is a static subset of C++14 - Adaptable and elegant sharable code – great for building libraries - Templates enable meta-programming for highly adaptive software - Lambdas used to implement nested/dynamic parallelism OpenCL Kernel Code OpenCL Kernel Code OpenCL Kernel Code OpenCL Kernel Code GPU DSP CPU CPU FPGA Kernel code compiled for devices Devices CPU Host Runtime API loads and executes kernels across devices
  • 8. © Copyright Khronos Group 2016 - Page 8 OpenCL Conformant Implementations OpenCL 1.0 Specification Dec08 Jun10 OpenCL 1.1 Specification Nov11 OpenCL 1.2 Specification OpenCL 2.0 Specification Nov13 1.0 | Jul13 1.0 | Aug09 1.0 | May09 1.0 | May10 1.0 | Feb11 1.0 | May09 1.0 | Jan10 1.1 | Aug10 1.1 | Jul11 1.2 | May12 1.2 | Jun12 1.1 | Feb11 1.1 |Mar11 1.1 | Jun10 1.1 | Aug12 1.1 | Nov12 1.1 | May13 1.1 | Apr12 1.2 | Apr14 1.2 | Sep13 1.2 | Dec12 Desktop Mobile FPGA 2.0 | Jul14 OpenCL 2.1 Specification Nov15 1.2 | May15 2.0 | Dec14 1.0 | Dec14 1.2 | Dec14 1.2 | Sep14 Vendor timelines are first implementation of each spec generation 1.2 | May15 Embedded 1.2 | Aug15 1.2 | Mar16 2.0 | Nov15
  • 9. OpenCL 2.2 - Top to Bottom C++
    • OpenCL 1.0 specification (Dec08)
    • OpenCL 1.1 specification (Jun10, 18 months later): 3-component vectors, additional image formats, multiple hosts and devices, buffer region operations, enhanced event-driven execution, additional OpenCL C built-ins, improved OpenGL data/event interop
    • OpenCL 1.2 specification (Nov11, 18 months later): device partitioning, separate compilation and linking, enhanced image support, built-in kernels/custom devices, enhanced DX and OpenGL interop
    • OpenCL 2.0 specification (Nov13, 24 months later): Shared Virtual Memory, on-device dispatch, generic address space, enhanced image support, C11 atomics, pipes, Android ICD
    • OpenCL 2.1 specification (Nov15, 24 months later): SPIR-V in core, subgroups into core, subgroup query operations, clCloneKernel, low-latency device timer queries
    • OpenCL 2.2 PROVISIONAL (May16, 7 months later): OpenCL C++14 kernel language into core
      - Single-source C++ programming: full support for features in the C++14 kernel language
      - API and language specs: brings the C++14 kernel language into the core specification
      - Portable kernel intermediate language: support for the C++14 kernel language, e.g. constructors/destructors
  • 10. SPIR-V Ecosystem
    • Khronos-defined and controlled cross-API intermediate language
    • Native support for graphics and parallel constructs
    • 32-bit word stream
    • Extensible and easily parsed
    • Retains data object and control flow information for effective code generation and translation
    • Diagram: OpenCL C, OpenCL C++, GLSL and third-party kernel and shader languages compile to SPIR-V, which is consumed by IHV driver runtimes; LLVM and other intermediate forms connect through translators
      - Tools shown: SPIR-V Validator, SPIR-V Tools, SPIR-V (Dis)Assembler, LLVM to SPIR-V bi-directional translator
      - Khronos has open sourced some of these tools and translators and plans to open source the others soon
      - Open source C++ front-end released: https://github.com/KhronosGroup/SPIR/tree/spirv-1.1
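Because SPIR-V is a stream of 32-bit words, a module can be walked with very little code. The sketch below relies on the published layout: a five-word header (magic number `0x07230203`, version, generator, ID bound, schema), followed by instructions whose first word holds the word count in the high 16 bits and the opcode in the low 16 bits. It assumes the words are already in host byte order and does no validation beyond bounds checks; `summarize_spirv` is a hypothetical helper, not part of SPIR-V Tools.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

constexpr std::uint32_t kSpirvMagic = 0x07230203u;

struct SpirvSummary {
    unsigned version_major = 0;                  // bits 16-23 of the version word
    unsigned version_minor = 0;                  // bits 8-15 of the version word
    std::uint32_t id_bound = 0;                  // header word 3
    std::map<std::uint16_t, int> opcode_counts;  // opcode -> occurrences
};

// Walks a SPIR-V module given as 32-bit words in host byte order. A
// byte-swapped magic number (opposite endianness) is simply rejected here.
std::optional<SpirvSummary> summarize_spirv(const std::vector<std::uint32_t>& words) {
    if (words.size() < 5 || words[0] != kSpirvMagic) return std::nullopt;
    SpirvSummary s;
    s.version_major = (words[1] >> 16) & 0xFFu;
    s.version_minor = (words[1] >> 8) & 0xFFu;
    s.id_bound = words[3];

    std::size_t pos = 5;  // first instruction follows the 5-word header
    while (pos < words.size()) {
        std::uint32_t word_count = words[pos] >> 16;
        auto opcode = static_cast<std::uint16_t>(words[pos] & 0xFFFFu);
        // A zero count, or one running past the end, means a malformed module.
        if (word_count == 0 || pos + word_count > words.size()) return std::nullopt;
        ++s.opcode_counts[opcode];
        pos += word_count;  // skip operands without decoding them
    }
    assert(pos == words.size());  // every word belonged to some instruction
    return s;
}
```

A real tool would decode each instruction's operands according to its opcode; counting opcodes is enough to show how the word-count/opcode packing lets a parser skip instructions it does not understand.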
  • 11. SYCL for OpenCL
    • Single-source heterogeneous programming using STANDARD C++
      - Use C++ templates and lambda functions for host and device code
    • Aligns the hardware acceleration of OpenCL with the direction of the C++ standard
      - C++14, with an open source C++17 Parallel STL hosted by Khronos
    • Developer choice: the development of the two specifications is aligned so code can be easily shared between the two approaches
      - C++ kernel language: low-level control; ‘GPGPU’-style separation of device-side kernel source code and host code
      - Single-source C++: programmer familiarity; approach also taken by C++ AMP and OpenMP
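The single-source style can be sketched in standard C++ without any OpenCL runtime. Everything below is a hypothetical stand-in, not the SYCL API: `ScopedBuffer` mimics the way a SYCL buffer constructed over host data can write results back when it is destroyed, and `parallel_for` runs the lambda "kernel" serially on the CPU. The point is only that host code, kernel code and data movement live together in one ordinary C++ source file.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical buffer: holds a "device" copy of host data and copies the
// results back into the host vector when it goes out of scope.
template <typename T>
class ScopedBuffer {
public:
    explicit ScopedBuffer(std::vector<T>& host) : host_(host), device_(host) {}
    ~ScopedBuffer() { host_ = device_; }  // write-back at end of scope
    ScopedBuffer(const ScopedBuffer&) = delete;
    ScopedBuffer& operator=(const ScopedBuffer&) = delete;

    T& operator[](std::size_t i) {
        assert(i < device_.size());  // bounds check on "device" access
        return device_[i];
    }
    std::size_t size() const { return device_.size(); }

private:
    std::vector<T>& host_;   // host memory the results are returned to
    std::vector<T> device_;  // stand-in for device memory
};

// Hypothetical launch: invokes the kernel lambda once per index, serially.
template <typename Kernel>
void parallel_for(std::size_t n, Kernel kernel) {
    for (std::size_t i = 0; i < n; ++i) kernel(i);
}
```

In use, the kernel is a lambda capturing the buffer, for example `parallel_for(buf.size(), [&buf](std::size_t i) { buf[i] *= 10; });`, and the host vector only sees the new values once `buf` leaves its scope.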
  • 12. OpenCL Roadmap Discussions…
    • Embedded
      - Use cases: signal and pixel processing
      - Roadmap: arbitrary precision for power efficiency, hard real-time scheduling, asynchronous DMA
    • FPGAs
      - Use cases: network and stream processing
      - Roadmap: enhanced execution model, self-synchronized and self-scheduled graphs, fine-grained synchronization between kernels, DSL in C++
    • HPC, SciViz, Datacenter
      - Use cases: numerical simulation, virtualization
      - Roadmap: enhanced streaming processing, enhanced library support
    • Desktop
      - Use cases: video and image processing, gaming compute
      - Roadmap: Vulkan interop, arbitrary precision for increased performance, pre-emption, collective programming and improved execution model
    • Mobile
      - Use case: photo and vision processing
      - Roadmap: arbitrary precision for inference engine and pixel processing efficiency, pre-emption and QoS scheduling for power efficiency
    • Vulkan Compute can leverage OpenCL?
      - Gaming compute, pixel processing, inference
      - Fine-grain graphics and compute (no interop needed)
      - SPIR-V for shading language flexibility: C/C++
      - Low-latency, fine-grain run-time
      - Google Android adoption
      - Competes well with Metal (= C++/OpenCL 1.2)
      - Roadmap: types, precision and accuracy; pointers and address spaces; execution model
    • Possible learnings from Vulkan philosophy
      1. Explicit: provide direct access to hardware capabilities with a thin driver
      2. Feature sets: enable diverse architectures to ship just the relevant features
      3. Open source conformance tests for deep community engagement
  • 13. OpenCV
    • Extensive and widely used open source vision library
      - Written in optimized C/C++
      - Free-use BSD license
    • C++, C, Python and Java interfaces
      - Windows, Linux, Mac OS, iOS and Android
    • Increasingly taking advantage of heterogeneous processing using OpenCL
      - OpenCV 3.x Transparent API: a single API entry for each function/algorithm
      - Dynamically loads the OpenCL runtime if available; otherwise falls back to CPU code
      - Runtime dispatching: no recompilation!
    • OpenCV Transparent API for OpenCL kernel offload (diagram: an OpenCV application's CPU threads each use an ocl::Queue; queues target ocl::Device objects within an ocl::Context)
      - One OpenCL queue per CPU thread
      - CPU threads can share a device
      - OpenCL kernels are executed asynchronously
    • OpenCV is active open source, not an API specification
      - A strength and a weakness! Production deployment often needs a tightly defined callable API
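The "load OpenCL if available, otherwise fall back to CPU code" behavior amounts to choosing an implementation once, at run time, behind a single entry point. In OpenCV 3.x itself this is reached by passing `cv::UMat` instead of `cv::Mat` to the same functions, and `cv::ocl::setUseOpenCL()` can switch offload off. The sketch below illustrates only the generic dispatch pattern; `select_invert`, `invert_cpu` and the `accelerator_available` flag are hypothetical, and the flag stands in for a real runtime probe.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Op = std::function<std::vector<int>(const std::vector<int>&)>;

// CPU reference path: invert 8-bit pixel values.
std::vector<int> invert_cpu(const std::vector<int>& px) {
    std::vector<int> out(px.size());
    for (std::size_t i = 0; i < px.size(); ++i) {
        assert(px[i] >= 0 && px[i] <= 255);  // expects 8-bit pixel values
        out[i] = 255 - px[i];
    }
    return out;
}

// Chooses the implementation once. `accelerated` is a hypothetical offloaded
// version with the same contract; it is used only if the probe succeeded
// and an implementation was actually supplied.
Op select_invert(bool accelerator_available, Op accelerated) {
    if (accelerator_available && accelerated) return accelerated;
    return invert_cpu;  // fallback: same API, no recompilation
}
```

Callers hold the returned `Op` and never branch on the hardware themselves, which is what keeps the application source identical whether or not an OpenCL runtime is present.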
  • 14. Vision Pipeline Challenges and Opportunities
    • Growing camera diversity: flexible sensor and camera control to GENERATE an image stream
    • Diverse vision processors: use efficient acceleration to PROCESS the image stream
    • Sensor proliferation: combine vision output with other sensor data on device
  • 15. OpenVX – Low Power Vision Acceleration
    • Precisely defined API for production deployment of vision acceleration
      - Targeted at real-time mobile and embedded platforms
    • Higher abstraction than OpenCL for performance portability across diverse architectures
      - Multi-core CPUs, GPUs, DSPs and DSP arrays, ISPs, dedicated hardware…
    • Extends portable vision acceleration to very low power domains
      - Doesn’t require a high-power CPU/GPU complex or OpenCL precision
      - A low-power host can set up and manage a frame-rate graph
    • Diagram: the application and middleware sit on a vision engine that maps onto GPU, DSP and dedicated hardware; power efficiency rises and computation flexibility falls moving from multi-core CPUs through GPU compute and vision DSPs to dedicated hardware, on a vision processing efficiency scale of X1, X10 and X100
  • 16. OpenVX Graphs
    • OpenVX developers express a graph of image operations (‘nodes’)
      - Nodes can run on any hardware or processor and be coded in any language
    • Graphs can execute almost autonomously
      - Possible to minimize host interaction during frame-rate graph execution
    • Graphs are the key to run-time optimization opportunities…
    • Example graph (feature extraction): camera input feeds OpenVX nodes for color conversion, channel extract, image pyramid, Harris track and optical flow; data objects include RGB, YUV and gray frames, the pyramid Pyr_t, an array of keypoints, an array of features and the previous features F_t-1; results go to rendering output
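The "build once, verify, then execute every frame" idea behind OpenVX graphs can be sketched with a small dependency graph. This is not the OpenVX API (which in C uses calls such as `vxCreateGraph`, `vxVerifyGraph` and `vxProcessGraph`); `Graph`, `add_node` and `verify` are hypothetical, and verification here only computes a topological execution order with Kahn's algorithm. Because a node may only consume nodes that already exist, the graph is acyclic by construction.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mini-graph: nodes are declared up front, the graph is
// verified once to obtain a schedule, and that schedule can be reused for
// every frame without the host re-planning anything.
class Graph {
public:
    std::size_t add_node(std::string name, std::vector<std::size_t> inputs) {
        for (std::size_t in : inputs) assert(in < names_.size());  // inputs must exist
        names_.push_back(std::move(name));
        inputs_.push_back(std::move(inputs));
        return names_.size() - 1;
    }

    // Kahn's algorithm: returns node names in a dependency-respecting order.
    std::vector<std::string> verify() const {
        const std::size_t n = names_.size();
        std::vector<int> indegree(n, 0);
        std::vector<std::vector<std::size_t>> consumers(n);
        for (std::size_t v = 0; v < n; ++v) {
            for (std::size_t u : inputs_[v]) {
                ++indegree[v];
                consumers[u].push_back(v);
            }
        }
        std::queue<std::size_t> ready;
        for (std::size_t v = 0; v < n; ++v)
            if (indegree[v] == 0) ready.push(v);

        std::vector<std::string> order;
        while (!ready.empty()) {
            std::size_t u = ready.front();
            ready.pop();
            order.push_back(names_[u]);
            for (std::size_t v : consumers[u])
                if (--indegree[v] == 0) ready.push(v);  // all inputs scheduled
        }
        return order;
    }

private:
    std::vector<std::string> names_;
    std::vector<std::vector<std::size_t>> inputs_;  // producer indices per node
};
```

The node wiring in the test loosely follows the feature-extraction example above. A real implementation would also attach data objects and kernels to each node, and the verified order is where optimizations such as those on the next slide would be applied.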
  • 17. OpenVX Efficiency through Graphs…
    • Memory management: reuse pre-allocated memory for multiple intermediate data
      - Less allocation overhead, more memory for other applications
    • Kernel merge: replace a sub-graph with a single faster node
      - Better memory locality, less kernel launch overhead
    • Graph scheduling: split the graph execution across the whole system (CPU / GPU / dedicated HW)
      - Faster execution or lower power consumption
    • Data tiling: execute a sub-graph at tile granularity instead of image granularity
      - Better use of data cache and local memory
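Kernel merge and data tiling can be illustrated with two per-pixel nodes. The sketch is hypothetical (the node functions and the `Image` type are invented for illustration) and uses pointwise operations so correctness is easy to check: the merged version produces the same output without allocating a full-size intermediate image. Tiling is shown only as loop structure; for pointwise nodes it changes nothing, and it pays off for neighborhood operations such as filters, where a tile's intermediates can stay in cache or local memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Image = std::vector<std::uint8_t>;

// Two hypothetical per-pixel nodes: saturating brighten, then threshold.
inline std::uint8_t brighten(std::uint8_t p) {
    return static_cast<std::uint8_t>(std::min(255, p + 40));
}
inline std::uint8_t threshold(std::uint8_t p) {
    return p >= 128 ? std::uint8_t{255} : std::uint8_t{0};
}

// Unmerged: each node makes its own full pass, and the first one writes a
// full-size intermediate image that the second one reads back.
Image run_unmerged(const Image& in) {
    Image tmp(in.size()), out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) tmp[i] = brighten(in[i]);
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = threshold(tmp[i]);
    return out;
}

// Merged: one fused loop, so no intermediate image is allocated. The loop
// is also split into tiles to show where tile-granularity scheduling hooks in.
Image run_merged_tiled(const Image& in, std::size_t tile) {
    assert(tile > 0);
    Image out(in.size());
    for (std::size_t start = 0; start < in.size(); start += tile) {
        std::size_t end = std::min(in.size(), start + tile);
        for (std::size_t i = start; i < end; ++i) out[i] = threshold(brighten(in[i]));
    }
    return out;
}
```

Because the graph is known before execution, an implementation can make this kind of substitution itself; the application still describes two nodes.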
  • 18. Example Relative Performance
    • OpenVX (GPU accelerated) performance relative to OpenCV (GPU accelerated): Arithmetic 1.1, Analysis 2.9, Filter 8.7, Geometric 1.5, Overall 2.5
    • NVIDIA implementation experience. Geometric mean of >2200 primitives, grouped into categories, running at different image sizes and parameter settings
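The figures are geometric means, which suit speedup ratios: a 2x speedup on one primitive and a 2x slowdown on another average to 1x, whereas the arithmetic mean would report 1.25x. A minimal sketch, assuming each input is a positive ratio of baseline time to new time; the values in the test are illustrative, not the NVIDIA measurements.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of per-primitive speedups (baseline_time / new_time),
// computed in log space to avoid overflow or underflow on long lists.
double geometric_mean(const std::vector<double>& ratios) {
    assert(!ratios.empty());
    double log_sum = 0.0;
    for (double r : ratios) {
        assert(r > 0.0);  // ratios must be positive for the log
        log_sum += std::log(r);
    }
    return std::exp(log_sum / static_cast<double>(ratios.size()));
}
```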
  • 19. Layered Vision Processing Ecosystem
    • Diagram layers: application software; engines/frameworks; C/C++ and powerful, flexible low-level APIs/languages; hardware: application processor, programmable vision processors, dedicated vision hardware
    • Implementers may use OpenCL or compute shaders to implement OpenVX nodes on programmable processors
    • Developers can then use OpenVX to easily connect those nodes into a graph
    • The OpenVX graph enables implementers to optimize execution across diverse hardware architectures and drive to lower-power implementations
    • OpenVX enables the graph to be extended to include hardware architectures that don’t support programmable APIs
  • 20. OpenVX 1.0 Shipping, OpenVX 1.1 Released!
    • Multiple OpenVX 1.0 implementations shipping (specification released October 2014)
      - Open source sample implementation and conformance tests available
    • OpenVX 1.1 specification released 2nd May 2016 at the Embedded Vision Summit
      - Expands node functionality AND enhances the graph framework
      - Laplacian pyramids and enhanced filters
      - Easier user nodes and control over execution on heterogeneous platforms
      - Sample source and conformance tests will be updated to OpenVX 1.1 in 1H16
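A Laplacian pyramid stores, at each level, the detail lost when a signal is downsampled and upsampled again, plus a final coarse level; adding the details back reconstructs the input (exactly, up to floating-point rounding). The 1D sketch below illustrates the idea only: it uses pair averaging and sample repetition instead of the Gaussian filtering an OpenVX implementation would apply to images, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Signal = std::vector<double>;

// Downsample by averaging neighbouring pairs (crude low-pass + decimate).
Signal reduce(const Signal& s) {
    assert(s.size() % 2 == 0);
    Signal out(s.size() / 2);
    for (std::size_t i = 0; i < out.size(); ++i) out[i] = 0.5 * (s[2 * i] + s[2 * i + 1]);
    return out;
}

// Upsample by repeating each sample (nearest-neighbour expand).
Signal expand(const Signal& s) {
    Signal out(s.size() * 2);
    for (std::size_t i = 0; i < out.size(); ++i) out[i] = s[i / 2];
    return out;
}

// Each level holds s - expand(reduce(s)); the last entry is the coarse level.
std::vector<Signal> build_laplacian(Signal s, int levels) {
    std::vector<Signal> pyr;
    for (int l = 0; l < levels; ++l) {
        Signal coarse = reduce(s);
        Signal up = expand(coarse);
        Signal detail(s.size());
        for (std::size_t i = 0; i < s.size(); ++i) detail[i] = s[i] - up[i];
        pyr.push_back(detail);
        s = coarse;
    }
    pyr.push_back(s);
    return pyr;
}

// Collapse: expand the coarse level and add back each detail level in turn.
Signal reconstruct(const std::vector<Signal>& pyr) {
    assert(!pyr.empty());
    Signal s = pyr.back();
    for (std::size_t l = pyr.size() - 1; l-- > 0;) {
        Signal up = expand(s);
        for (std::size_t i = 0; i < up.size(); ++i) up[i] += pyr[l][i];
        s = up;
    }
    return s;
}
```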
  • 21. OpenVX Roadmap and Safety Critical APIs
    • New generation APIs for safety-certifiable vision, graphics and compute, e.g. ISO 26262 and DO-178B/C
    • OpenGL ES 1.0 (2003): fixed function graphics; OpenGL SC 1.0 (2005): fixed function graphics subset
    • OpenGL ES 2.0 (2007): shader programmable pipeline; OpenGL SC 2.0 (April 2016): shader programmable pipeline subset
    • Experience and guidelines: small driver size, advanced functionality, graphics and compute
    • OpenVX roadmap discussions
      - Significantly broaden node functionality
      - In-graph neural nets
      - Programmable nodes (OpenCL or SPIR-V?)
      - Market-specific feature sets
      - OpenVX SC?
  • 22. Get Involved!
    • A diverse set of vision APIs in the industry
      - Developer choice is good, but we need to choose wisely!
    • Many APIs were originally created to program GPUs
      - But vision processing needs are increasingly driving API roadmaps
    • The industry will tend to consolidate around leading APIs
      - Working toward a multi-layer API ecosystem
      - Powerful foundational hardware APIs enabling rich middleware APIs and libraries
    • Any company or organization is welcome to join Khronos for a voice and a vote in any of its standards
      - www.khronos.org
    • Neil Trevett
      - ntrevett@nvidia.com
      - @neilt3d