Copyright © 2018 The Khronos Group 1
The Search for Vision Performance and Portability
Leads to a Layered Acceleration Ecosystem
Explicit
Kernel APIs
Diverse Hardware
(GPUs, DSPs, FPGAs)
Single Source
C++ Languages
Libraries,
Frameworks and
run-times
Code Intermediate
Representations PTX HSAIL
GPU
FPGA
DSP
Dedicated
Hardware
Vision
Processing Libraries
Inferencing
Run-times WinML/DirectML
AR Libraries
Machine Learning
Training Frameworks
GPU-only APIs
Heterogeneous
Computing
Mobile Augmented Reality Libraries
ARKit
V1.0 is VR focused. Subsequent versions will extend to cross-platform AR
Encapsulated Vision-based
Functionality
Also leveraging motion sensors
AR libraries typically use ARKit/ARCore if available, or implement their own tracking if not
Vision and Neural Net
Inferencing Runtimes
Neural Network Workflow
Applications
Using Embedded
Vision and Inferencing
Desktop and
Cloud Hardware
Trained
Networks
cuDNN MIOpen MKL-DNN
FPGA DSP
GPU
Custom
Hardware
Diverse Inferencing
Acceleration Hardware
GPU
CPU
CPU
CPU
Framework
Specific
Formats
Training = Desktop / Cloud
Neural Net Training Frameworks
Deployment on Embedded Devices
Compilation and
Optimization
TPU
Authoring interchange
Embedded Deployment
WinML
NNVM - Open Compiler for AI Inferencing
http://www.tvmlang.org/2017/08/17/tvm-release-announcement.html
SPIR-V IR for parallel accelerators
Backend in development
LLVM IR for CPUs
1. Import Trained Network Description
2. Graph-level Optimizations
3. Decompose into primitive instructions
4. Emit programs executable by run-times
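The four steps above can be sketched as a toy pipeline. This is only an illustration under stated assumptions: every function name, the operator names, and the lowering table are hypothetical, and none of it is the NNVM or TVM API. The single graph-level optimization shown is fusing a convolution with the activation that follows it.

```python
# Toy four-stage compiler pipeline. All names are hypothetical;
# this is not the NNVM or TVM API.

def import_network(description):
    """Step 1: turn a trained-network description into a list of graph nodes."""
    return [dict(node) for node in description["layers"]]

def optimize_graph(nodes):
    """Step 2: one graph-level optimization, fusing conv followed by relu."""
    fused, i = [], 0
    while i < len(nodes):
        cur = nodes[i]
        nxt = nodes[i + 1] if i + 1 < len(nodes) else None
        if cur["op"] == "conv" and nxt is not None and nxt["op"] == "relu":
            fused.append({"op": "conv_relu", "name": cur["name"]})
            i += 2
        else:
            fused.append(cur)
            i += 1
    return fused

# Hypothetical lowering table from graph operators to primitive instructions.
PRIMITIVES = {
    "conv": ["load", "mac_loop", "store"],
    "relu": ["load", "max0", "store"],
    "conv_relu": ["load", "mac_loop", "max0", "store"],
    "pool": ["load", "reduce_max", "store"],
}

def decompose(nodes):
    """Step 3: decompose each graph node into primitive instructions."""
    return [(node["name"], PRIMITIVES[node["op"]]) for node in nodes]

def emit(lowered, target):
    """Step 4: emit a textual 'program' for a named run-time target."""
    lines = [f"; target = {target}"]
    for name, instrs in lowered:
        lines.append(f"{name}: " + " ".join(instrs))
    return "\n".join(lines)

def compile_network(description, target):
    return emit(decompose(optimize_graph(import_network(description))), target)

network = {"layers": [
    {"op": "conv", "name": "conv1"},
    {"op": "relu", "name": "relu1"},
    {"op": "pool", "name": "pool1"},
]}
program = compile_network(network, target="cpu")
print(program)
```

In this sketch the fused node avoids one store/load pair between the convolution and the activation, which is the kind of saving a graph-level pass is meant to expose before lowering.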
SPIR-V Ecosystem
LLVM
Third party kernel and
shader languages
• Khronos defined cross-API IR
• Native graphics and parallel compute
• Stable specification to complement LLVM
Open Source Project
OpenCL C++
Front-end
OpenCL C
Front-end
glslang
Khronos-hosted
Open Source Projects
IHV Driver
Runtimes
SPIR-V Validator
SPIR-V (Dis)Assembler
LLVM to SPIR-V
Bi-directional
Translators
https://github.com/KhronosGroup/SPIRV-Tools
GLSL HLSL
Khronos cooperating with
Clang/LLVM Community
SPIRV-Cross
GLSL
HLSL
MSL
SPIRV-opt | SPIRV-remap
Optimization Tools
DXC
SYCL
Front-end
Platform Neural Network Stacks
Microsoft Windows
Windows Machine Learning (WinML)
Google Android
Neural Network API (NNAPI)
Apple macOS and iOS
CoreML
Common Fundamental Steps
1. Import trained NN model file
2. Build optimized version of graph
3. Accelerate on GPU or other processor
using available low-level API
https://docs.microsoft.com/en-us/windows/uwp/machine-learning/
https://developer.android.com/ndk/guides/neuralnetworks/
https://developer.apple.com/documentation/coreml
Core ML Model
NNEF Ecosystem
Files
Syntax
Parser/
Validator
TensorFlow
and Caffe
Exporters
Comparing Neural Network
Exchange Industry Initiatives
NNEF open source projects hosted on
Khronos NNEF GitHub repository
Apache 2.0 license
https://github.com/KhronosGroup/NNEF-Tools
TensorFlow
and Caffe2
Importer /
Exporters
Google
NNAPI
Converter
NNEF 1.0 Provisional
Released for industry
feedback before finalization
OpenVX
Ingestion &
Execution
Live
Imminent
NNEF File Structure
Network Structure File
Distilled, platform independent network description
Human readable, syntactical elements from Python
Standardized Operations
Rigorously defined semantics
Linear, convolution, pooling, normalization, activation, unary/binary
Supports fully connected, convolutional, recurrent architectures
Two Levels of Expressiveness
FLAT: Basic transfer of computation graphs with standardized operations
Simple to parse and translate to vendor specific formats
COMPOSITIONAL: Define custom compound operations
Higher-level graph descriptions
More complex to parse but offers more optimization hints
Network Data File
Binary format contains parameter tensors
Supports float and quantized (integer) data
Flexible bit widths and quantization algorithms
Quantization algorithms expressed as extensible
compound operations
Quantization info provided as hints for execution
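To make "flexible bit widths" concrete, here is a minimal sketch of one common scheme, linear (affine) quantization, with the bit width as a parameter. This scheme and the function names are assumptions chosen for illustration; the sketch does not reproduce how NNEF itself expresses quantization algorithms or stores quantized tensors.

```python
# Linear (affine) quantization with a configurable bit width.
# Hypothetical helper names; not an NNEF-defined operation.

def quantize_linear(values, bits):
    """Map floats to unsigned integer codes in [0, 2**bits - 1]."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo

def dequantize_linear(codes, scale, lo):
    """Recover approximate floats from integer codes."""
    return [lo + c * scale for c in codes]

weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
for bits in (8, 4):
    codes, scale, lo = quantize_linear(weights, bits)
    restored = dequantize_linear(codes, scale, lo)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(bits, codes, worst)
```

The worst-case reconstruction error is bounded by half a quantization step, so it grows as the bit width shrinks. That trade-off is why an exchange format benefits from carrying the quantization parameters alongside the data.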
Split Structure and Data files
Easy independent access to network structure or individual parameter data
Set of files can use a container such as tar or zip with optional compression and encryption
Can associate multiple Data Files with one Network Structure File, e.g. the same data in multiple formats
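The split between structure and data files can be illustrated with a small container sketch using Python's standard `tarfile` module. The member names, the placeholder structure text, and the zero-filled payloads are all assumptions; they are not real NNEF file names, syntax, or binary layout.

```python
import io
import tarfile

def pack(structure_text, data_files):
    """Bundle one structure file and any number of data files into a gzip-compressed tar."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        members = {"graph.txt": structure_text.encode("utf-8"), **data_files}
        for name, payload in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

def read_member(archive, name):
    """Read a single member by name, independently of the others."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        return tar.extractfile(name).read()

# Placeholder content only: not real NNEF syntax or binary layout.
structure = "placeholder network description, not real NNEF syntax\n"
archive = pack(structure, {
    "fc_weights.float32.dat": bytes(16),  # same tensor, float variant
    "fc_weights.int8.dat": bytes(4),      # same tensor, quantized variant
})
print(read_member(archive, "graph.txt").decode("utf-8"))
print(len(read_member(archive, "fc_weights.int8.dat")))
```

A tool that only needs the topology can read the structure member and ignore the data files, while a deployment tool can pick whichever data variant suits its hardware.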
NNEF Execution within OpenVX
• Convolutional Neural Network topologies can be represented as OpenVX graphs
• Can also combine traditional vision and neural network operations
• OpenVX Neural Network Extension
• Defines OpenVX nodes to represent many common NN layer types
• Layer types include convolution, activation, pooling, fully-connected, soft-max
• Defines multi-dimensional tensor objects to connect layers
• Kernel Import Extension
• Enables loading of external program representations into OpenVX graphs
Figure: An OpenVX graph mixing CNN nodes with traditional vision nodes (labels: Native Camera Control, Vision Node x3, CNN Nodes, Downstream Application Processing)
Importer converts NNEF representation into OpenVX Graphs using Kernel Import Extension
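The idea of one graph that mixes traditional vision nodes with CNN nodes can be sketched with a toy dataflow graph. This is plain Python under stated assumptions: the `Graph` class, the node functions, and the fixed kernel are hypothetical and do not correspond to the OpenVX C API or its Neural Network Extension.

```python
# Toy dataflow graph mixing a "vision" node with "CNN-style" nodes.
# Hypothetical classes and functions; not the OpenVX API.

class Graph:
    """Minimal graph; nodes run in insertion order, assumed to be topological."""
    def __init__(self):
        self.nodes = []

    def add_node(self, name, func, inputs, output):
        self.nodes.append((name, func, inputs, output))

    def process(self, **sources):
        data = dict(sources)
        for name, func, inputs, output in self.nodes:
            data[output] = func(*(data[key] for key in inputs))
        return data

def downscale(pixels):
    """Traditional vision node: average neighbouring pixel pairs."""
    return [(pixels[i] + pixels[i + 1]) / 2 for i in range(0, len(pixels) - 1, 2)]

def conv1d(signal, kernel=(1.0, 0.0, -1.0)):
    """CNN-style node: valid 1-D correlation with a fixed, made-up kernel."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(signal):
    """CNN-style node: rectified linear activation."""
    return [max(0.0, v) for v in signal]

graph = Graph()
graph.add_node("downscale", downscale, ["camera"], "small")
graph.add_node("conv", conv1d, ["small"], "features")
graph.add_node("relu", relu, ["features"], "activations")
result = graph.process(camera=[0, 2, 4, 4, 8, 0, 2, 2])
print(result["activations"])
```

Because the whole pipeline is declared before it runs, a real implementation is free to schedule, tile, or fuse nodes across the vision and neural network parts of the graph.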
OpenCL Command Queue
OpenVX / OpenCL Interop Extension
Application
Fully asynchronous host-device operations during data exchange
Runtime
Runtime
Map or copy OpenVX data objects into cl_mem buffers
Copy or export cl_mem buffers into OpenVX data objects
Enables custom OpenCL acceleration to be used within OpenVX User Kernels
OpenVX user-kernels can access command queue and cl_mem objects to asynchronously schedule OpenCL kernel execution
OpenCL Ecosystem Roadmap
2011: OpenCL 1.2 (OpenCL C Kernel Language)
2015: OpenCL 2.1 (SPIR-V in Core); SYCL 1.2 (C++11 single source programming)
2017: OpenCL 2.2 (C++ Kernel Language); SYCL 1.2.1 (C++11 single source programming)
Work with industry to bring heterogeneous compute to standard ISO C++
OpenCL ‘Next’: Flexible and efficient deployment of parallel computation across diverse processor architectures
Single source C++ programming. Great for supporting C++ apps, libraries and frameworks
More deployment options: Enabling dispatch of OpenCL C kernels from Vulkan runtimes
SYCL Ecosystem
• Single-source heterogeneous programming using STANDARD C++
• Use C++ templates and lambda functions for host & device code
• Layered over OpenCL
• Fast and powerful path for bringing C++ apps and libraries to OpenCL
• C++ Kernel Fusion - better performance on complex software than hand-coding
• SYCLBLAS, SYCL Eigen, SYCL TensorFlow, SYCL DNN, SYCL GTX, VisionCpp
• Close cooperation with ISO C++
• C++17 Parallel STL hosted by Khronos
• C++20 Parallel STL with Ranges
• Implementations
• triSYCL, ComputeCpp, ComputeCpp SDK …
• More information at http://sycl.tech
Pervasive Vulkan 1.0
Major GPU Companies supporting Vulkan for Desktop and Mobile Platforms
http://vulkan.gpuinfo.org/
Platforms
Embedded
Virtual Reality
Cloud Services
Mobile
(Android 7.0+)
Desktop Media Players
Consoles
Clspv OpenCL C to Vulkan Compiler
• Experimental collaboration between Google, Codeplay, and Adobe
- Successfully tested on over 200K lines of Adobe OpenCL C production code
• Compiles OpenCL C to Vulkan’s SPIR-V execution environment
- Proof-of-concept that OpenCL kernels can be brought seamlessly to Vulkan
- Significant parts of OpenCL C 1.2 so far – shaped by submitted workloads
Diagram labels: OpenCL C Source, Clspv Compiler, OpenCL Host Code, Run-time API Translator, Runtime
Prototype open source project: https://github.com/google/clspv
Possible future project – if interest?
Increasing deployment options for OpenCL kernel developers, e.g. Vulkan is a supported API on Android
OpenCL Next – Deployment Flexibility
• Vendors can support ANY combination of features to suit their hardware/market
• If all exposed features are conformant – the implementation is conformant
• Khronos will define Feature Sets equivalent to current profiles
• Existing profiles and device types not going away! No changes to existing applications
• Opportunity to coalesce industry support around market-focused feature sets
• Khronos aiming to provide Feature Set infrastructure for the industry to leverage
OpenCL 2.2 Functionality = queryable, optional feature
Khronos-defined OpenCL 2.2 Full Profile Feature Set
Khronos-defined OpenCL 1.2 Full Profile Feature Set
Industry-defined Feature Set, e.g. ‘Inferencing’
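The conformance rule and the feature-set query described above can be modelled with plain sets. This is a sketch under stated assumptions: the feature names, the feature-set contents, and both helper functions are hypothetical and are not taken from any OpenCL specification or API.

```python
# Hypothetical feature names and feature sets, for illustration only.
FEATURE_SETS = {
    "full_profile_1_2": {"images", "fp32", "compiler"},
    "inferencing": {"fp32", "int8_dot", "subgroups"},
}

def supported_feature_sets(exposed):
    """Names of the feature sets whose every feature the device exposes."""
    return sorted(name for name, feats in FEATURE_SETS.items() if feats <= exposed)

def is_conformant(exposed, test_results):
    """Conformant if every exposed feature passes; unexposed features are ignored."""
    return all(test_results.get(feature, False) for feature in exposed)

device = {"fp32", "int8_dot", "subgroups"}
results = {"fp32": True, "int8_dot": True, "subgroups": True, "images": False}
print(supported_feature_sets(device))
print(is_conformant(device, results))
```

In this model the device does not expose `images`, so the failing result for that feature does not affect conformance, which mirrors the rule that only exposed features need to be conformant.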
Safety Critical APIs – Khronos Experience
Need for new-generation APIs for safety certifiable vision, graphics and compute, e.g. ISO 26262 and DO-178B/C
OpenGL ES 1.0 - 2003: Fixed function graphics
OpenGL ES 2.0 - 2007: Shader programmable pipeline
OpenGL SC 1.0 - 2005: Fixed function graphics subset
OpenGL SC 2.0 - April 2016: Shader programmable pipeline subset
General lack of industry consensus on how APIs should be designed to streamline safety certification
OpenVX SC 1.1 - May 2017: Restricted “deployment” implementation only executes pre-compiled binary format
Khronos Safety Critical Advisory Forum
OpenCL SC TSG
Working on OpenCL SC
Gathering requirements
SYCL SC
Guidelines to augment Industry
First Safe and Secure Parallel and
Heterogeneous C++
Safe AI for Automotive
AESIN
Automotive ADAS & AV + security
https://aesin.org.uk
MISRA C++
C++ WG23 Programming Vulnerabilities
ISO C Safe and Secure SG
ISO C++ Vulnerabilities Safety Critical SG
Generating guidelines for designing safety critical APIs to ease system certification.
Open to Khronos members AND industry experts
https://www.khronos.org/advisors/kscaf
Industry outreach
and cooperation
Khronos SC
Activities
OpenVX SC 1.1 - May 2017: Restricted “deployment” implementation only executes pre-compiled binary format
We are inviting safety critical experts to join KSCAF!
No cost or work commitment
Need for Camera Control API?
• Advanced control of sensor and camera subsystem – with cross-platform portability
• Generate sophisticated image stream for advanced imaging & vision apps
• No platform API currently fulfills all developer requirements
• Portable access to growing sensor diversity: e.g. depth sensors and sensor arrays
• Cross sensor synch: e.g. synch of camera and MEMS sensors
• Advanced, high-frequency per-frame burst control of camera/sensor: e.g. ROI
• Multiple input, output re-circulating streams with RAW, Bayer or YUV Processing
Image Signal Processor (ISP)
Defines control of Sensor, Color Filter Array, Lens, Flash, Focus, Aperture
Auto Exposure (AE)
Auto White Balance (AWB)
Auto Focus (AF)
Stream of Images for Vision Processing
OpenKCAM standard is currently on ice – do we need to restart?
Key Takeaways and What’s Next?
• Vision tools and API ecosystem becoming increasingly sophisticated
• Layering libraries, languages and run-times
• Machine learning stacks continue to grow in flexibility
• Exchange formats and compiler technologies essential to flexible deployment
• Open standards evolving alongside proprietary solutions
• For developers that value cross-platform deployment
• Safety-critical APIs becoming increasingly essential
• Many vision applications need system certification
• Still no cross-vendor camera APIs?
• Is the time yet right for this to be a target for standardization?
• Please join if your company is interested in Khronos open standards
• Neil Trevett | ntrevett@nvidia.com | @neilt3d

"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Presentation from Khronos

  • 1. Copyright © 2018 The Khronos Group 1
  • 2. Copyright © 2018 The Khronos Group 2 The Search for Vision Performance and Portability Leads to a Layered Acceleration Ecosystem Explicit Kernel APIs Diverse Hardware (GPUs, DSPs, FPGAs) Single Source C++ Languages Libraries, Frameworks and run-times Code Intermediate Representations PTX HSAIL GPU FPGA DSP Dedicated Hardware Vision Processing Libraries Inferencing Run-times WinML/DirectMLAR Libraries Machine Learning Training Frameworks GPU-only APIs Heterogenous Computing
  • 3. Copyright © 2018 The Khronos Group 3 Mobile Augmented Reality Libraries ARKit V1.0 is VR focused. Subsequent versions will extend to cross- platform AR Encapsulated Vision-based Functionality Also leveraging motion sensors AR Libraries typically use ARKit/ARCore if available or implement own tracking if not
  • 4. Copyright © 2018 The Khronos Group 4 Vision and Neural Net Inferencing Runtimes Neural Network Workflow Applications Using Embedded Vision and Inferencing Desktop and Cloud Hardware Trained Networks cuDNN MIOpen MKL-DNN FPGA DSP GPU Custom Hardware Diverse Inferencing Acceleration Hardware GPU CPU CPU CPU Framework Specific Formats Training = Desktop / Cloud Neural Net Training Frameworks Neural Net Training Frameworks Neural Net Training Frameworks Neural Net Training Frameworks Deployment on Embedded Devices Compilation and Optimization TPU Authoring interchange Embedded Deployment WinML
  • 5. Copyright © 2018 The Khronos Group 5 NNVM - Open Compiler for AI Inferencing http://www.tvmlang.org/2017/08/17/tvm-release-announcement.html SPIR-V IR for parallel accelerators Backend in development LLVM IR for CPUs 1.Import Trained Network Description 2. Graph-level Optimizations 3. Decompose into primitive instructions 4. Emit programs executable by run-times
  • 6. Copyright © 2018 The Khronos Group 6 SPIR-V Ecosystem LLVM Third party kernel and shader languages • Khronos defined cross-API IR • Native graphics and parallel compute • Stable specification to complement LLVM Open Source Project OpenCL C++ Front-end OpenCL C Front-end glslang Khronos-hosted Open Source Projects IHV Driver Runtimes SPIR-V Validator SPIR-V (Dis)Assembler LLVM to SPIR-V Bi-directional Translators https://github.com/KhronosGroup/SPIRV-Tools GLSL HLSL Khronos cooperating with Clang/LLVM Community SPIRV-Cross GLSL HLSL MSL SPIRV-opt | SPIRV-remap Optimization Tools DXC SYCL Front-end
  • 7. Copyright © 2018 The Khronos Group 7 Platform Neural Network Stacks Microsoft Windows Windows Machine Learning (WinML) Google Android Neural Network API (NNAPI) Apple MacOS and iOS CoreML Common Fundamental Steps 1. Import trained NN model file 2. Build optimized version of graph 3. Accelerate on GPU or other processor using available low-level API https://docs.microsoft.com/en-us/windows/uwp/machine-learning/ https://developer.android.com/ndk/guides/neuralnetworks/ https://developer.apple.com/documentation/coreml Core ML Model
  • 8. Copyright © 2018 The Khronos Group 8 NNEF Ecosystem Files Syntax Parser/ Validator TensorFlow and Caffe Exporters Comparing Neural Network Exchange Industry Initiatives NNEF open source projects hosted on Khronos NNEF GitHub repository Apache 2.0 license https://github.com/KhronosGroup/NNEF-Tools TensorFlow and Caffe2 Importer / Exporters Google NNAPI Convertor NNEF 1.0 Provisional Released for industry feedback before finalization OpenVX Ingestion & Execution Live Imminent
  • 9. Copyright © 2018 The Khronos Group 9 Network Data File Binary format contains parameter tensors Supports float and quantized (integer) data Flexible bit widths and quantization algorithms Quantization algorithms expressed as extensible compound operations Quantization info provided as hints for execution Network Data File Binary format contains parameter tensors Supports float and quantized (integer) data Flexible bit widths and quantization algorithms Quantization algorithms expressed as extensible compound operations Quantization info provided as hints for execution NNEF File Structure Network Structure File Distilled, platform independent network description Human readable, syntactical elements from Python Standardized Operations Rigorously defined semantics Linear, convolution, pooling, normalization, activation, unary/binary Supports fully connected, convolutional, recurrent architectures Two Levels of Expressiveness FLAT: Basic transfer of computation graphs with standardized operations Simple to parse and translate to vendor specific formats COMPOSITIONAL: Define custom compound operations Higher-level graph descriptions More complex to parse but offers more optimization hints Network Data File Binary format contains parameter tensors Supports float and quantized (integer) data Flexible bit widths and quantization algorithms Quantization algorithms expressed as extensible compound operations Quantization info provided as hints for execution Split Structure and Data files Easy independent access to network structure or individual parameter data Set of files can use a container such as tar or zip with optional compression and encryption Can associate multiple Data Files with one Network Structure File e.g. the same data in multiple formats
  • 10. Copyright © 2018 The Khronos Group 10 • Convolution Neural Network topologies can be represented as OpenVX graphs • Can also combine traditional vision and neural network operations • OpenVX Neural Network Extension • Defines OpenVX nodes to represent many common NN layer types • Layer types include convolution, activation, pooling, fully-connected, soft-max • Defines multi-dimensional tensors objects to connect layers • Kernel Import Extension • Enables loading of external program representations into OpenVX graphs NNEF Execution within OpenVX Vision Node Vision Node Vision Node Downstream Application Processing Native Camera Control CNN Nodes An OpenVX graph mixing CNN nodes with traditional vision nodes Importer converts NNEF representation into OpenVX Graphs using Kernel Import Extension
  • 11. Copyright © 2018 The Khronos Group 11 OpenCL Command Queue OpenVX / OpenCL Interop Extension Application Fully asynchronous host- device operations during data exchange RuntimeRuntime Map or copy OpenVX data objects into cl_mem buffers Copy or export cl_mem buffers into OpenVX data objects Enables custom OpenCL acceleration to be used within OpenVX User Kernels OpenVX user-kernels can access command queue and cl_mem objects to asynchronously schedule OpenCL kernel execution
  • 12. Copyright © 2018 The Khronos Group 12 OpenCL Ecosystem Roadmap 2011 OpenCL 1.2 OpenCL C Kernel Language OpenCL 2.1 SPIR-V in Core 2015 SYCL 1.2 C++11 Single source programming OpenCL 2.2 C++ Kernel Language 2017 SYCL 1.2.1 C++11 Single source programming Work with industry to bring Heterogeneous compute to standard ISO C++ OpenCL ‘Next’ Flexible and efficient deployment of parallel computation across diverse processor architectures Single source C++ programming. Great for supporting C++ apps, libraries and frameworks More deployment options: Enabling dispatch of OpenCL C kernels from Vulkan runtimes
  • 13. SYCL Ecosystem. Single-source heterogeneous programming using STANDARD C++: use C++ templates and lambda functions for host and device code; layered over OpenCL. Fast and powerful path for bringing C++ apps and libraries to OpenCL: C++ Kernel Fusion gives better performance on complex software than hand-coding; SYCLBLAS, SYCL Eigen, SYCL TensorFlow, SYCL DNN, SYCL GTX, VisionCpp. Close cooperation with ISO C++: C++17 Parallel STL hosted by Khronos; C++20 Parallel STL with Ranges. Implementations: triSYCL, ComputeCpp, ComputeCpp SDK … More information at http://sycl.tech
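The slide credits kernel fusion with better performance on complex software, but does not say how SYCL achieves it. The sketch below only illustrates the general idea of fusing element-wise kernels, using plain Python rather than SYCL or C++; the function names and the three sample kernels are assumptions made for this example.

```python
def run_separately(kernels, data):
    """Apply each kernel as its own pass, creating a temporary list every time."""
    for kernel in kernels:
        data = [kernel(x) for x in data]
    return data


def fuse(kernels):
    """Combine element-wise kernels into one function applied per element."""
    def fused(x):
        for kernel in kernels:
            x = kernel(x)
        return x
    return fused


def run_fused(kernels, data):
    """Apply the fused kernel in a single pass with no intermediate lists."""
    fused = fuse(kernels)
    return [fused(x) for x in data]


# Hypothetical element-wise kernels: scale, add a bias, then clamp at zero.
kernels = [lambda x: x * 2.0, lambda x: x + 1.0, lambda x: max(x, 0.0)]
data = [-2.0, -0.5, 0.0, 1.5]

separate_result = run_separately(kernels, data)
fused_result = run_fused(kernels, data)
```

Both paths compute the same values; the fused path makes one pass over the data instead of three, which on an accelerator would correspond to fewer kernel launches and less memory traffic.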
  • 14. Pervasive Vulkan 1.0. Major GPU companies supporting Vulkan for desktop and mobile platforms: http://vulkan.gpuinfo.org/. Platforms: Desktop, Mobile (Android 7.0+), Consoles, Media Players, Embedded, Virtual Reality, Cloud Services.
  • 15. Clspv OpenCL C to Vulkan Compiler. Experimental collaboration between Google, Codeplay, and Adobe: successfully tested on over 200K lines of Adobe OpenCL C production code. Compiles OpenCL C to Vulkan’s SPIR-V execution environment: proof of concept that OpenCL kernels can be brought seamlessly to Vulkan; significant parts of OpenCL C 1.2 supported so far, shaped by submitted workloads. Diagram labels: OpenCL C Source, Clspv Compiler, Runtime, OpenCL Host Code, Run-time API Translator. Prototype open source project: https://github.com/google/clspv. Possible future project, if there is interest? Increasing deployment options for OpenCL kernel developers, e.g. Vulkan is a supported API on Android.
  • 16. OpenCL Next: Deployment Flexibility. Vendors can support ANY combination of features to suit their hardware/market. If all exposed features are conformant, the implementation is conformant. Khronos will define Feature Sets equivalent to current profiles; existing profiles and device types are not going away, so no changes to existing applications. Opportunity to coalesce industry support around market-focused feature sets; Khronos is aiming to provide Feature Set infrastructure for the industry to leverage. Diagram: OpenCL 2.2 functionality, where each item is a queryable, optional feature; Khronos-defined OpenCL 2.2 Full Profile Feature Set; Khronos-defined OpenCL 1.2 Full Profile Feature Set; industry-defined feature sets, e.g. ‘Inferencing’.
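The conformance rule on this slide can be read as two set comparisons: an implementation is conformant when every feature it exposes passes conformance, and it matches a Feature Set when it exposes everything that set requires. The sketch below spells that reading out. The feature names, the contents of each feature set and the function names are all hypothetical; they are not taken from any OpenCL specification.

```python
# Hypothetical feature sets; real contents would be defined by Khronos or industry groups.
FEATURE_SETS = {
    "opencl_1_2_full_profile": {"images", "fp32", "int64"},
    "inferencing": {"images", "fp16", "int8_dot"},
}


def is_conformant(exposed, passed):
    """Conformant if every exposed feature passed its conformance tests."""
    return exposed <= passed


def supported_feature_sets(exposed):
    """Names of the feature sets fully covered by the exposed features."""
    return sorted(
        name for name, required in FEATURE_SETS.items() if required <= exposed
    )


# A hypothetical inference accelerator that exposes only what it needs.
exposed = {"images", "fp16", "int8_dot"}
passed = {"images", "fp16", "int8_dot"}

conformant = is_conformant(exposed, passed)
matching_sets = supported_feature_sets(exposed)
```

In this example the device is conformant and matches the ‘inferencing’ set without supporting the hypothetical 1.2 full profile, which is the flexibility the slide describes.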
  • 17. Safety Critical APIs: Khronos Experience. Need for new-generation APIs for safety-certifiable vision, graphics and compute, e.g. ISO 26262 and DO-178B/C. OpenGL ES 1.0 (2003): fixed function graphics. OpenGL SC 1.0 (2005): fixed function graphics subset. OpenGL ES 2.0 (2007): shader programmable pipeline. OpenGL SC 2.0 (April 2016): shader programmable pipeline subset. OpenVX SC 1.1 (May 2017): restricted “deployment” implementation only executes pre-compiled binary format. General lack of industry consensus on how APIs should be designed to streamline safety certification.
  • 18. Khronos SC Activities. Khronos Safety Critical Advisory Forum (KSCAF): generating guidelines for designing safety critical APIs to ease system certification; open to Khronos members AND industry experts; https://www.khronos.org/advisors/kscaf. OpenVX SC 1.1 (May 2017): restricted “deployment” implementation only executes pre-compiled binary format. OpenCL SC TSG: working on OpenCL SC; gathering requirements. SYCL SC: Guidelines to augment; Industry First Safe and Secure Parallel and Heterogeneous C++; Safe AI for Automotive. Industry outreach and cooperation: AESIN Automotive ADAS & AV + security (https://aesin.org.uk); MISRA C++; C++ WG23 Programming Vulnerabilities; ISO C Safe and Secure SG; ISO C++ Vulnerabilities Safety Critical SG. We are inviting safety critical experts to join KSCAF! No cost or work commitment.
  • 19. Need for Camera Control API? Advanced control of the sensor and camera subsystem, with cross-platform portability: generate sophisticated image streams for advanced imaging and vision apps. No platform API currently fulfills all developer requirements: portable access to growing sensor diversity, e.g. depth sensors and sensor arrays; cross-sensor sync, e.g. sync of camera and MEMS sensors; advanced, high-frequency per-frame burst control of camera/sensor, e.g. ROI; multiple input, output re-circulating streams with RAW, Bayer or YUV processing. Diagram: defines control of Sensor, Color Filter Array, Lens, Flash, Focus, Aperture; Image Signal Processor (ISP) with Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF); stream of images for vision processing. The OpenKCAM standard is currently on ice: do we need to restart?
  • 20. Key Takeaways and What’s Next? Vision tools and API ecosystem becoming increasingly sophisticated: layering libraries, languages and run-times. Machine learning stacks continue to grow in flexibility: exchange formats and compiler technologies are essential to flexible deployment. Open standards evolving alongside proprietary solutions: for developers that value cross-platform deployment. Safety-critical APIs becoming increasingly essential: many vision applications need system certification. Still no cross-vendor camera APIs? Is the time yet right for this to be a target for standardization? Please join if your company is interested in Khronos open standards. Neil Trevett | ntrevett@nvidia.com | @neilt3d