© 2020 The Khronos Group
Khronos Standard APIs for
Accelerating Vision and Inferencing
Neil Trevett
Khronos President
NVIDIA VP Developer Ecosystems
22nd September 2020
Khronos Connects Software to Silicon
3D graphics, XR, parallel
programming, vision acceleration
and machine learning
Non-profit, member-driven
standards-defining industry
consortium
Open to any
interested company
All Khronos standards
are royalty-free
Well-defined IP Framework
protects participants’
intellectual property
Founded in 2000
>150 Members ~ 40% US, 30% Europe, 30% Asia
Open interoperability standards to enable software to effectively
harness the power of 3D and multiprocessor acceleration
Khronos Active Initiatives
3D Graphics
Desktop, Mobile, Web
Embedded and Safety Critical
3D Assets
Authoring
and Delivery
Portable XR
Augmented and
Virtual Reality
Parallel Computation
Vision, Inferencing,
Machine Learning
Khronos Compute Acceleration Standards
Increasing industry
interest in parallel
compute acceleration
to combat the ‘End of
Moore’s Law’
GPU
GPU rendering +
compute
acceleration
Heterogeneous
compute
acceleration
Single source C++ programming
with compute acceleration
Graph-based vision and
inferencing acceleration
Lower-level APIs
Direct Hardware Control
Intermediate
Representation (IR)
supporting parallel
execution and
graphics
Higher-level
Languages and APIs
Streamlined development and
performance portability
GPU
FPGA DSP
Custom Hardware
GPU
CPU
CPU
CPU
AI/Tensor HW
Hardware
Sensor
Data
Training
Data
Trained
Networks
Neural Network
Training
C++ Application
Code
Embedded Vision and Inferencing Acceleration
Compilation Ingestion
FPGA
DSP
Dedicated
Hardware
GPU
Vision / Inferencing
Engine
Compiled
Code
Hardware Acceleration APIs
Diverse Embedded Hardware
(GPUs, DSPs, FPGAs)
Applications link to compiled
inferencing code or call
vision/inferencing API
Networks trained on high-end
desktop and cloud systems
NNEF Neural Network Exchange Format
Training Framework 1
Training Framework 2
Training Framework 3
Inference Engine 1
Inference Engine 2
Inference Engine 3
Every Inferencing Engine needs a custom importer
from every Framework
Before - Training and Inferencing Fragmentation
After - NN Training and Inferencing Interoperability
Training Framework 1
Training Framework 2
Training Framework 3
Inference Engine 1
Inference Engine 2
Inference Engine 3
Common optimization
and processing tools
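The importer counts behind the before/after diagram can be worked through in a few lines. This is only an illustrative sketch: the function names are made up for this example, and it assumes that with a shared exchange format each framework needs one exporter and each engine needs one importer.

```python
def importers_without_exchange_format(frameworks, engines):
    """Before: every inferencing engine needs a custom importer from every framework."""
    return frameworks * engines


def converters_with_exchange_format(frameworks, engines):
    """After (assumption): one exporter per framework plus one importer per engine."""
    return frameworks + engines


if __name__ == "__main__":
    # The slide's diagram shows three training frameworks and three inference engines.
    print(importers_without_exchange_format(3, 3))  # 9
    print(converters_with_exchange_format(3, 3))    # 6
```

The gap widens as the ecosystem grows: with ten frameworks and ten engines the counts are 100 versus 20.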
NNEF and ONNX
NNEF V1.0 released in August 2018
After positive industry feedback on Provisional Specification.
Maintenance update issued in September 2019
Extensions to V1.0 released for expanded functionality
NNEF Working Group Participants
ONNX 1.6 Released in September 2019
Introduced support for Quantization
ONNX Runtime being integrated with GPU inferencing engines
such as NVIDIA TensorRT
ONNX Supporters
NNEF | ONNX
Embedded Inferencing Import | Training Interchange
Defined Specification | Open Source Project
Multi-company Governance at Khronos | Initiated by Facebook & Microsoft
Stability for hardware deployment | Software stack flexibility
ONNX and NNEF
are Complementary
ONNX moves quickly to track authoring
framework updates
NNEF provides a stable bridge from
training into edge inferencing engines
NNEF Open Source Tools Ecosystem
Files
Caffe and
Caffe2
Import/Export
TensorFlow and
TensorFlow Lite
Import/Export
NNEF open source projects hosted on Khronos NNEF
GitHub repository under Apache 2.0
https://github.com/KhronosGroup/NNEF-Tools
ONNX
Import/Export
Syntax
Parser and
Validator
OpenVX
Ingestion and
Execution
NNEF Model Zoo
Now available on GitHub. Useful for
checking that ingested NNEF produces
acceptable results on target system
Compound operations
captured by exporting
graph Python script
NNEF adopts a rigorous approach to
design lifecycle
Especially important for safety-critical or
mission-critical applications in automotive,
industrial and infrastructure markets
SYCL Single Source C++ Parallel Programming
GPU
FPGA DSP
Custom Hardware
GPU
CPU
CPU
CPU
Standard C++
Application
Code
C++
Libraries
ML
Frameworks
C++ Template
Libraries
C++ Template
Libraries
C++ Template
Libraries
SYCL Compiler
for OpenCL
CPU
Compiler
CPU
SYCL-BLAS, SYCL-DNN,
SYCL-Eigen,
SYCL Parallel STL
C++ templates and lambda
functions separate host &
accelerated device code
Accelerated code
passed into device
OpenCL compilers
Complex ML frameworks
can be directly compiled
and accelerated
SYCL is ideal for accelerating larger
C++-based engines and applications
with performance portability
C++ Kernel Fusion can give
better performance on
complex apps and libs than
hand-coding
AI/Tensor HW
SYCL Implementations
Multiple Backends in Development
SYCL beginning to be supported on multiple
low-level APIs in addition to OpenCL
e.g. ROCm and CUDA
For more information: http://sycl.tech
SYCL enables Khronos to influence
ISO C++ to (eventually) support
heterogeneous compute
SYCL
Source Code
DPC++
Uses LLVM/Clang
Part of oneAPI
ComputeCpp
SYCL 1.2.1 on
multiple hardware
triSYCL
Open source
test bed
hipSYCL
SYCL 1.2.1 on
CUDA & HIP/ROCm
Any CPU
OpenCL +
SPIR-V
Any CPU
OpenCL +
SPIR(-V)
OpenCL+PTX
Intel CPUs
Intel GPUs
Intel FPGAs
Intel CPUs
Intel GPUs
Intel FPGAs
AMD GPUs
(depends on driver stack)
Arm Mali
IMG PowerVR
Renesas R-Car
NVIDIA GPUs
OpenMP
OpenCL +
SPIR/LLVM
XILINX FPGAs
POCL
(open source OpenCL supporting
CPUs and NVIDIA GPUs and more)
Any CPU
Experimental
OpenMP
ROCm
CUDA
AMD GPUs
NVIDIA GPUs
Any CPU
CUDA+PTX
NVIDIA GPUs
SYCL, OpenCL and SPIR-V, as open industry
standards, enable flexible integration and
deployment of multiple acceleration technologies
OpenVX Cross-Vendor Vision and Inferencing
Vision
Node
Vision
Node
Vision
Node
Downstream
Application
Processing
Native
Camera
Control CNN Nodes
NNEF Translator converts NNEF
representation into OpenVX Node Graphs
OpenVX
High-level graph-based abstraction for portable, efficient vision processing
Graph can contain vision processing and NN nodes – enables global optimizations
OpenVX drivers created, optimized and shipped by processor vendors
Implementable on almost any hardware or processor with performance portability
Run-time graph execution needs very little host CPU interaction
Performance comparable to hand-optimized, non-portable code
Real, complex applications on real, complex hardware
Much lower development effort than hand-optimized code
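The graph-based model described above can be sketched in plain Python. This is not the OpenVX API: the `Graph` class, its method names and the toy nodes are hypothetical, chosen to mirror the general build, verify, then process flow of a dataflow graph. Because the whole graph is known before execution, an implementation is free to reorder, fuse or offload nodes, which is what makes global optimization possible.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Graph:
    """Toy dataflow graph: nodes are functions wired together by name."""

    def __init__(self):
        self._nodes = {}   # node name -> (function, names of input nodes)
        self._order = None

    def add_node(self, name, func, inputs=()):
        self._nodes[name] = (func, tuple(inputs))

    def verify(self):
        """Check every input exists and fix an execution order up front."""
        for name, (_, inputs) in self._nodes.items():
            for dep in inputs:
                if dep not in self._nodes:
                    raise ValueError(f"node {name!r} reads undefined input {dep!r}")
        deps = {name: set(inputs) for name, (_, inputs) in self._nodes.items()}
        # static_order() yields each node after all of its inputs; raises on cycles.
        self._order = list(TopologicalSorter(deps).static_order())

    def process(self):
        """Run the verified graph once; no per-node decisions are made here."""
        results = {}
        for name in self._order:
            func, inputs = self._nodes[name]
            results[name] = func(*(results[dep] for dep in inputs))
        return results


def build_example():
    g = Graph()
    g.add_node("camera", lambda: [10, 200, 30, 220])                  # stand-in for capture
    g.add_node("halve", lambda px: [p // 2 for p in px], ["camera"])  # stand-in vision node
    g.add_node("threshold", lambda px: [int(p > 50) for p in px], ["halve"])
    g.add_node("count", lambda mask: sum(mask), ["threshold"])        # stand-in for a CNN node
    g.verify()
    return g
```

Calling `build_example().process()["count"]` runs the four nodes in dependency order and returns the number of pixels above the threshold.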
Hardware Implementations
OpenVX Graph
OpenVX 1.3 Released October 2019
Deployment Flexibility through Feature Sets
Conformant Implementations ship one or more complete feature sets
Enables market-focused Implementations
- Baseline Graph Infrastructure (enables other Feature Sets)
- Default Vision Functions
- Enhanced Vision Functions (introduced in OpenVX 1.2)
- Neural Network Inferencing (including tensor objects)
- NNEF Kernel import (including tensor objects)
- Binary Images
- Safety Critical (reduced features for easier safety certification)
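One way to picture the feature-set rule is as a subset check. The sketch below is an assumption-laden illustration, not the conformance process: the function lists are hypothetical and heavily abbreviated, and it treats a feature set as complete only when the Baseline Graph Infrastructure is also fully present, since the baseline enables the other feature sets.

```python
BASELINE = "Baseline Graph Infrastructure"

# Hypothetical, abbreviated contents; the real feature sets are defined in the specification.
FEATURE_SETS = {
    BASELINE: {"create_graph", "verify_graph", "process_graph"},
    "Default Vision Functions": {"gaussian3x3", "sobel3x3"},
    "Neural Network Inferencing": {"convolution_layer", "activation_layer"},
}


def complete_feature_sets(implemented):
    """Return the names of feature sets that are implemented in full."""
    implemented = set(implemented)
    if not FEATURE_SETS[BASELINE] <= implemented:
        return []  # without the baseline, no other feature set can be offered
    return [name for name, funcs in FEATURE_SETS.items() if funcs <= implemented]
```

For example, an inferencing-only device that implements the baseline and both neural network functions, but only one of the vision functions, would report the baseline and Neural Network Inferencing as complete and omit Default Vision Functions.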
https://www.khronos.org/registry/OpenVX/specs/1.3/html/OpenVX_Specification_1_3.html
Functionality Consolidation into Core
Neural Net Extension, NNEF Kernel Import,
Safety Critical etc.
Open Source Conformance Test Suite
https://github.com/KhronosGroup/OpenVX-cts/tree/openvx_1.3
OpenCL Interop
Custom accelerated Nodes
OpenCL Command Queue
Application
cl_mem buffers
Fully asynchronous host-device
operations during data exchange
OpenVX data objects
Runtime
Runtime Map or copy OpenVX data objects
into cl_mem buffers
Copy or export
cl_mem buffers into OpenVX data
objects
OpenVX user-kernels can access command queue
and cl_mem objects to asynchronously schedule
OpenCL kernel execution
OpenVX/OpenCL Interop
Open Source OpenVX & Samples
Open Source OpenVX Tutorial and Code Samples
https://github.com/rgiduthuri/openvx_tutorial
https://github.com/KhronosGroup/openvx-samples
Fully Conformant Open Source OpenVX 1.3
for Raspberry Pi
https://github.com/KhronosGroup/OpenVX-sample-impl/tree/openvx_1.3
Raspberry Pi 3 and 4 Model B with Raspbian OS
Memory access optimization via tiling/chaining
Highly optimized kernels on multimedia instruction set
Automatic parallelization for multicore CPUs and GPUs
Automatic merging of common kernel sequences
OpenCL is Widely Deployed and Used
Accelerated Implementations
Modo
Desktop Creative Apps
CLBlast
SYCL-BLAS
Linear Algebra
Libraries
Parallel
Languages
Math and Physics
Libraries
Vision, Imaging
and Video Libraries
The industry’s most pervasive, cross-vendor, open standard
for low-level heterogeneous parallel programming
Arm Compute Library
SYCL-DNN
Machine Learning
Libraries and Frameworks
TI DL Library (TIDL)
VeriSilicon
Xiaomi
clDNN
Intel
Intel
Synopsys
MetaWare EV
NNAPI
https://en.wikipedia.org/wiki/List_of_OpenCL_applications
Vegas Pro
ForceBalance
Molecular Modelling Libraries
Machine Learning
Compilers
OpenCL – Low-level Parallel Programming
Complements GPU-only APIs
Simpler programming model
Relatively lightweight run-time
More language flexibility, e.g. pointers
Rigorously defined numeric precision
OpenCL
Kernel
Code
OpenCL
Kernel
Code
OpenCL
Kernel
Code
OpenCL C
Kernel
Code
GPU
DSP
CPU
CPU
FPGA
OpenCL
Devices
Host
CPU
NN HW
Runtime OpenCL API to
compile, load and execute
kernels across devices
Programming and Runtime Framework
for Application Acceleration
Offload compute-intensive kernels onto parallel
heterogeneous processors
CPUs, GPUs, DSPs, FPGAs, Tensor Processors
OpenCL C or C++ kernel languages
Platform Layer API
Query, select and initialize compute devices
Runtime API
Build and execute kernel programs on multiple devices
Explicit Application Control
Which programs execute on what device
Where data is stored in memories in the system
When programs are run, and what operations are
dependent on earlier operations
OpenCL 3.0
OpenCL C:
- kernels,
- address spaces,
- special types,
...
Most of C++17:
- inheritance,
- templates,
- type deduction,
...
C++ for OpenCL
Increased Ecosystem Flexibility
All functionality beyond OpenCL 1.2 queryable plus
macros for optional OpenCL C language features
New extensions that become widely adopted will be
integrated into new OpenCL core specifications
OpenCL C++ for OpenCL
Open source C++ for OpenCL front end compiler
combines OpenCL C and C++17 replacing
OpenCL C++ language specification
Unified Specification
All versions of OpenCL in one specification for easier
maintenance, evolution and accessibility
Source on Khronos GitHub for community feedback,
functionality requests and bug fixes
Moving Applications to OpenCL 3.0
OpenCL 1.2 applications – no change
OpenCL 2.X applications - no code changes if all used
functionality is present
Queries recommended for future portability
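The "query, then choose a code path" pattern recommended above can be sketched as follows. This is a plain-Python illustration rather than host code: the variant names are hypothetical, and the feature strings simply follow the naming style of the OpenCL C 3.0 optional-feature macros. In a real application the set of supported features would come from device queries, not a hard-coded set.

```python
# Kernel variants in order of preference, each with the optional features it needs.
# Variant names are hypothetical; the last entry relies on OpenCL 1.2 functionality only.
KERNEL_VARIANTS = [
    ("subgroup_reduce", {"__opencl_c_subgroups"}),
    ("workgroup_reduce", {"__opencl_c_work_group_collective_functions"}),
    ("baseline_reduce", set()),
]


def select_variant(device_features):
    """Pick the first variant whose required optional features are all reported."""
    supported = set(device_features)
    for name, required in KERNEL_VARIANTS:
        if required <= supported:
            return name
    raise RuntimeError("unreachable: the baseline variant has no requirements")
```

Keeping a variant that needs nothing beyond OpenCL 1.2 means the application still runs on an implementation that reports no optional features at all.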
C++ for OpenCL
Supported by Clang and uses the LLVM
compiler infrastructure
OpenCL C code is valid and fully compatible
Supports most C++17 features
Generates SPIR-V kernels
Google Ports TensorFlow Lite to OpenCL
OpenCL providing ~2x inferencing
speedup over OpenGL ES
acceleration
TensorFlow Lite uses OpenGL ES as a
backup if OpenCL not available …
…but most mobile GPU vendors
provide OpenCL drivers - even if
not exposed directly to Android
developers
OpenCL is increasingly used as an
acceleration target for higher-level
frameworks and compilers
Primary Machine Learning Compilers
Import Formats: Caffe, Keras, MXNet, ONNX | TensorFlow Graph, MXNet, PaddlePaddle, Keras, ONNX | PyTorch, ONNX | TensorFlow Graph, PyTorch, ONNX
Front-end / IR: NNVM / Relay IR | nGraph / Stripe IR | Glow Core / Glow IR | XLA HLO
Output: OpenCL, LLVM, CUDA, Metal | OpenCL, LLVM, CUDA | OpenCL, LLVM | LLVM, TPU IR, XLA IR, TensorFlow Lite / NNAPI (inc. HW accel)
ML Compiler Steps
1. Import Trained
Network Description
2. Apply graph-level
optimizations e.g. node fusion,
node lowering and memory tiling
3. Decompose to primitive
instructions and emit programs
for accelerated run-times
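The three steps above can be sketched end to end on a toy network. Everything here is hypothetical and chosen only for illustration: the text format of the network description, the op names, the single fusion rule and the primitive instruction names do not come from any of the compilers discussed.

```python
def import_network(description):
    """Step 1: parse a trained-network description into a list of (op, *args) nodes."""
    return [tuple(line.split()) for line in description.strip().splitlines()]


def fuse_nodes(graph):
    """Step 2: one graph-level optimization, fusing a conv followed by a relu."""
    fused, i = [], 0
    while i < len(graph):
        if i + 1 < len(graph) and graph[i][0] == "conv" and graph[i + 1][0] == "relu":
            fused.append(("conv_relu",) + graph[i][1:])
            i += 2
        else:
            fused.append(graph[i])
            i += 1
    return fused


# Step 3 lookup table: hypothetical primitive instructions per op.
LOWERING = {
    "conv": ["load_tile", "mac", "store_tile"],
    "relu": ["load_tile", "max0", "store_tile"],
    "conv_relu": ["load_tile", "mac", "max0", "store_tile"],
    "softmax": ["load_tile", "exp", "normalize", "store_tile"],
}


def lower(graph):
    """Step 3: decompose each node into primitive instructions and emit a flat program."""
    program = []
    for op, *_args in graph:
        program.extend(LOWERING[op])
    return program


NETWORK = """
conv 3x3
relu
softmax
"""
```

Lowering the imported graph directly emits ten primitive instructions, while lowering the fused graph emits eight, because the fused node avoids one store and one reload of the intermediate tile.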
Consistent Steps
Fast progress but still area of intense research
If compiler optimizations are effective - hardware accelerator APIs can stay ‘simple’ and
won’t need complex metacommands (e.g. combined primitive commands like DirectML)
Google MLIR and IREE Compilers
MLIR
Multi-level Intermediate Representation
Format and library of compiler utilities that sits
between the trained model representation and
low-level compilers/executors that generate
hardware-specific code
IREE
Intermediate Representation
Execution Environment
Lowers and optimizes ML models for real-time
accelerated inferencing on mobile/edge
heterogeneous hardware
Contains scheduling logic to communicate data
dependencies to low-level parallel pipelined
hardware/APIs like Vulkan, and execution logic
to encode dense computation in the form of
hardware/API-specific binaries like SPIR-V
IREE is a research project today. Google is working with Khronos
working groups to explore how SPIR-V code can provide effective
inferencing acceleration on APIs such as Vulkan
Trained Models
Generate Hardware
Specific Binaries
Optimizes and Lowers
for Acceleration
SPIR-V Language Ecosystem
OpenCL C
C++ for OpenCL
clspv
triSYCL
Intel DPC++
Codeplay
ComputeCpp
LLVM
Clang
SYCL
SPIR-V LLVM
IR Translator
Khronos Open Source
3rd Party Open Source
Language Definitions
Closed Source
Environment Specs
OpenCL Vulkan
OpenCLon12
Inc. Mesa SPIR-V to DXIL
SPIRV-Cross
GLSL
HLSL
Metal
Shading
Language
glslang
GLSL
HLSL DXC
DXIL
SPIR-V Tools
(Dis)Assembler
Validator
Optimize/Remap
Fuzzer
Reducer
OpenCL C
Online
Compilation
SPIR-V enables a rich ecosystem of languages and compilers to
target low-level APIs such as Vulkan and OpenCL, including
deployment flexibility: e.g. running OpenCL C kernels on Vulkan
IREE
Khronos for Global Industry Collaboration
Khronos membership is open
to any company
Influence the design and direction
of key open standards that will
drive your business
Accelerate time-to-market with
early access to specification drafts
Provide industry thought
leadership and gain insights into
industry trends and directions
Benefit from Adopter discounts
www.khronos.org/members/
ntrevett@nvidia.com | @neilt3d
Resources
• Khronos Website and home page for all Khronos Standards
• https://www.khronos.org/
• OpenCL Resources and C++ for OpenCL documentation
• https://www.khronos.org/opencl/resources
• https://github.com/KhronosGroup/Khronosdotorg/blob/master/api/opencl/assets/CXX_for_OpenCL.pdf
• OpenVX Tutorial, Samples and Sample Implementation
• https://github.com/rgiduthuri/openvx_tutorial
• https://github.com/KhronosGroup/openvx-samples
• https://github.com/KhronosGroup/OpenVX-sample-impl/tree/openvx_1.3
• NNEF Tools
• https://github.com/KhronosGroup/NNEF-Tools
• SYCL Resources
• http://sycl.tech
• SPIR-V User Guide
• https://github.com/KhronosGroup/SPIRV-Guide
• MLIR Blog
• https://blog.tensorflow.org/2019/04/mlir-new-intermediate-representation.html
• IREE GitHub Repository
• https://google.github.io/iree/
23

More Related Content

PDF
“How 5G is Pushing Processing to the Edge,” a Presentation from Inseego
PDF
“Deep Learning on Mobile Devices,” a Presentation from Siddha Ganju
PDF
Edge computing: Cord build 17 telefonica use cases
PDF
Edge Computing risks and Opportunities for Telco and hyperscalers
PDF
“Computer Vision for the Built Environment,” a Presentation from Nomad Go
PDF
“Productizing Edge AI Across Applications and Verticals: Case Study and Insig...
PDF
Telefonica innovation edge computing and services
PDF
Innovations in Edge Computing and MEC
“How 5G is Pushing Processing to the Edge,” a Presentation from Inseego
“Deep Learning on Mobile Devices,” a Presentation from Siddha Ganju
Edge computing: Cord build 17 telefonica use cases
Edge Computing risks and Opportunities for Telco and hyperscalers
“Computer Vision for the Built Environment,” a Presentation from Nomad Go
“Productizing Edge AI Across Applications and Verticals: Case Study and Insig...
Telefonica innovation edge computing and services
Innovations in Edge Computing and MEC

What's hot (20)

PDF
IoT Meetup HiveMQ and MQTT
PDF
Open Source 5G/Edge Automation via ONAP
PDF
Introducing the Vitis Unified Software Platform for Programming FPGAs
PDF
Cisco Service Provider Vision and Strategy: Business Transforming Through Inn...
PPTX
Edge Computing Architecture using GPUs and Kubernetes
PDF
Telefónica Edge Computing Case Study
PDF
Edge Computing: Bringing the Internet Closer to You
PDF
“Challenges and Approaches for Cascaded DNNs: A Case Study of Face Detection ...
PDF
5G Enablers and Use Cases, an European Pespective
PDF
"2D and 3D Sensing: Markets, Applications, and Technologies," a Presentation ...
PPTX
PPTX
LFI18-Solving the challenges of commissioning a wireless lighting infrastruc...
PDF
Virtualized Transport for Edge Computing Services
PDF
Orchestrating, operationalizing, monetizing SDN/NFV enabled networks
PDF
IoT based Industrial Gateway (IoT-SDK) built around Sitara™ AM437x processors...
PDF
Hey IT, Meet OT with Hima Mukkamala
PPT
Mr. Tam Kin Lui's presentation at QITCOM 2011
PDF
Ericsson introduces a hyperscale cloud solution
PDF
5 tips for open ran success
PDF
Industrial IoT bootcamp
IoT Meetup HiveMQ and MQTT
Open Source 5G/Edge Automation via ONAP
Introducing the Vitis Unified Software Platform for Programming FPGAs
Cisco Service Provider Vision and Strategy: Business Transforming Through Inn...
Edge Computing Architecture using GPUs and Kubernetes
Telefónica Edge Computing Case Study
Edge Computing: Bringing the Internet Closer to You
“Challenges and Approaches for Cascaded DNNs: A Case Study of Face Detection ...
5G Enablers and Use Cases, an European Pespective
"2D and 3D Sensing: Markets, Applications, and Technologies," a Presentation ...
LFI18-Solving the challenges of commissioning a wireless lighting infrastruc...
Virtualized Transport for Edge Computing Services
Orchestrating, operationalizing, monetizing SDN/NFV enabled networks
IoT based Industrial Gateway (IoT-SDK) built around Sitara™ AM437x processors...
Hey IT, Meet OT with Hima Mukkamala
Mr. Tam Kin Lui's presentation at QITCOM 2011
Ericsson introduces a hyperscale cloud solution
5 tips for open ran success
Industrial IoT bootcamp
Ad

Similar to “Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentation from the Khronos Group (20)

PDF
"Current and Planned Standards for Computer Vision and Machine Learning," a P...
PDF
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
PDF
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
PPTX
OpenCL Overview Japan Virtual Open House Feb 2021
PDF
Coscup2018 itri android-in-cloud
PDF
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...
PPTX
oneAPI: Industry Initiative & Intel Product
PDF
"Update on Khronos Standards for Vision and Machine Learning," a Presentation...
PDF
SYCL 2020 Specification
PDF
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...
PDF
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...
PDF
HKG18-100K1 - George Grey: Opening Keynote
PDF
OpenShift 4 installation
PDF
Software Tools for Building Industry 4.0 Applications
PDF
Canonical Ubuntu OpenStack Overview Presentation
PDF
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
PDF
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
PDF
La visualisation 3D distante sans compromis avec NICE DCV
PPTX
Transtec nice webinar v2
PDF
Red Hat Forum Benelux 2015
"Current and Planned Standards for Computer Vision and Machine Learning," a P...
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
OpenCL Overview Japan Virtual Open House Feb 2021
Coscup2018 itri android-in-cloud
“Open Standards: Powering the Future of Embedded Vision,” a Presentation from...
oneAPI: Industry Initiative & Intel Product
"Update on Khronos Standards for Vision and Machine Learning," a Presentation...
SYCL 2020 Specification
AWS Summit Singapore 2019 | Latest Trends for Cloud-Native Application Develo...
"APIs for Accelerating Vision and Inferencing: An Industry Overview of Option...
HKG18-100K1 - George Grey: Opening Keynote
OpenShift 4 installation
Software Tools for Building Industry 4.0 Applications
Canonical Ubuntu OpenStack Overview Presentation
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
La visualisation 3D distante sans compromis avec NICE DCV
Transtec nice webinar v2
Red Hat Forum Benelux 2015
Ad

More from Edge AI and Vision Alliance (20)

PDF
“Visual Search: Fine-grained Recognition with Embedding Models for the Edge,”...
PDF
“Optimizing Real-time SLAM Performance for Autonomous Robots with GPU Acceler...
PDF
“LLMs and VLMs for Regulatory Compliance, Quality Control and Safety Applicat...
PDF
“Simplifying Portable Computer Vision with OpenVX 2.0,” a Presentation from AMD
PDF
“Quantization Techniques for Efficient Deployment of Large Language Models: A...
PDF
“Introduction to Data Types for AI: Trade-Offs and Trends,” a Presentation fr...
PDF
“Introduction to Radar and Its Use for Machine Perception,” a Presentation fr...
PDF
“NPU IP Hardware Shaped Through Software and Use-case Analysis,” a Presentati...
PDF
“Voice Interfaces on a Budget: Building Real-time Speech Recognition on Low-c...
PDF
“Computer Vision at Sea: Automated Fish Tracking for Sustainable Fishing,” a ...
PDF
“Squinting Vision Pipelines: Detecting and Correcting Errors in Vision Models...
PDF
“ONNX and Python to C++: State-of-the-art Graph Compilation,” a Presentation ...
PDF
“Beyond the Demo: Turning Computer Vision Prototypes into Scalable, Cost-effe...
PDF
“Running Accelerated CNNs on Low-power Microcontrollers Using Arm Ethos-U55, ...
PDF
“Scaling i.MX Applications Processors’ Native Edge AI with Discrete AI Accele...
PDF
“A Re-imagination of Embedded Vision System Design,” a Presentation from Imag...
PDF
“MPU+: A Transformative Solution for Next-Gen AI at the Edge,” a Presentation...
PDF
“Evolving Inference Processor Software Stacks to Support LLMs,” a Presentatio...
PDF
“Efficiently Registering Depth and RGB Images,” a Presentation from eInfochips
PDF
“How to Right-size and Future-proof a Container-first Edge AI Infrastructure,...
“Visual Search: Fine-grained Recognition with Embedding Models for the Edge,”...
“Optimizing Real-time SLAM Performance for Autonomous Robots with GPU Acceler...
“LLMs and VLMs for Regulatory Compliance, Quality Control and Safety Applicat...
“Simplifying Portable Computer Vision with OpenVX 2.0,” a Presentation from AMD
“Quantization Techniques for Efficient Deployment of Large Language Models: A...
“Introduction to Data Types for AI: Trade-Offs and Trends,” a Presentation fr...
“Introduction to Radar and Its Use for Machine Perception,” a Presentation fr...
“NPU IP Hardware Shaped Through Software and Use-case Analysis,” a Presentati...
“Voice Interfaces on a Budget: Building Real-time Speech Recognition on Low-c...
“Computer Vision at Sea: Automated Fish Tracking for Sustainable Fishing,” a ...
“Squinting Vision Pipelines: Detecting and Correcting Errors in Vision Models...
“ONNX and Python to C++: State-of-the-art Graph Compilation,” a Presentation ...
“Beyond the Demo: Turning Computer Vision Prototypes into Scalable, Cost-effe...
“Running Accelerated CNNs on Low-power Microcontrollers Using Arm Ethos-U55, ...
“Scaling i.MX Applications Processors’ Native Edge AI with Discrete AI Accele...
“A Re-imagination of Embedded Vision System Design,” a Presentation from Imag...
“MPU+: A Transformative Solution for Next-Gen AI at the Edge,” a Presentation...
“Evolving Inference Processor Software Stacks to Support LLMs,” a Presentatio...
“Efficiently Registering Depth and RGB Images,” a Presentation from eInfochips
“How to Right-size and Future-proof a Container-first Edge AI Infrastructure,...

Recently uploaded (20)

PPTX
sap open course for s4hana steps from ECC to s4
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PPTX
Programs and apps: productivity, graphics, security and other tools
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PPTX
Spectroscopy.pptx food analysis technology
PPTX
Cloud computing and distributed systems.
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PPT
Teaching material agriculture food technology
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
sap open course for s4hana steps from ECC to s4
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Programs and apps: productivity, graphics, security and other tools
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
Spectroscopy.pptx food analysis technology
Cloud computing and distributed systems.
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
MIND Revenue Release Quarter 2 2025 Press Release
Understanding_Digital_Forensics_Presentation.pptx
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
The Rise and Fall of 3GPP – Time for a Sabbatical?
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Per capita expenditure prediction using model stacking based on satellite ima...
Chapter 3 Spatial Domain Image Processing.pdf
Advanced methodologies resolving dimensionality complications for autism neur...
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Teaching material agriculture food technology
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows

“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentation from the Khronos Group

  • 1. © 2020 The Khronos Group Khronos Standard APIs for Accelerating Vision and Inferencing Neil Trevett Khronos President NVIDIA VP Developer Ecosystems 22nd September 2020
  • 2. © 2020 The Khronos Group Khronos Connects Software to Silicon 3D graphics, XR, parallel programming, vision acceleration and machine learning Non-profit, member-driven standards-defining industry consortium Open to any interested company All Khronos standards are royalty-free Well-defined IP Framework protects participant’s intellectual property Founded in 2000 >150 Members ~ 40% US, 30% Europe, 30% Asia Open interoperability standards to enable software to effectively harness the power of 3D and multiprocessor acceleration 2
  • 3. © 2020 The Khronos Group Khronos Active Initiatives 3D Graphics Desktop, Mobile, Web Embedded and Safety Critical 3D Assets Authoring and Delivery Portable XR Augmented and Virtual Reality Parallel Computation Vision, Inferencing, Machine Learning 3
  • 4. © 2020 The Khronos Group Khronos Compute Acceleration Standards Increasing industry interest in parallel compute acceleration to combat the ‘End of Moore’s Law’ GPU GPU rendering + compute acceleration Heterogeneous compute acceleration Single source C++ programming with compute acceleration Graph-based vision and inferencing acceleration Lower-level APIs Direct Hardware Control Intermediate Representation (IR) supporting parallel execution and graphics Higher-level Languages and APIs Streamlined development and performance portability GPU FPGA DSP Custom Hardware GPU CPU CPU CPU AI/Tensor HW Hardware 4
  • 5. © 2020 The Khronos Group Sensor Data Training Data Trained Networks Neural Network Training C++ Application Code Embedded Vision and Inferencing Acceleration Compilation Ingestion FPGA DSP Dedicated Hardware GPU Vision / Inferencing Engine Compiled Code Hardware Acceleration APIs Diverse Embedded Hardware (GPUs, DSPs, FPGAs) Applications link to compiled inferencing code or call vision/inferencing API Networks trained on high-end desktop and cloud systems 5
  • 6. © 2020 The Khronos Group NNEF Neural Network Exchange Format Training Framework 1 Training Framework 2 Training Framework 3 Inference Engine 1 Inference Engine 2 Inference Engine 3 Every Inferencing Engine needs a custom importer from every Framework Before - Training and Inferencing Fragmentation After - NN Training and Inferencing Interoperability Training Framework 1 Training Framework 2 Training Framework 3 Inference Engine 1 Inference Engine 2 Inference Engine 3 Common optimization and processing tools 6
  • 7. © 2020 The Khronos Group NNEF and ONNX NNEF V1.0 released in August 2018 After positive industry feedback on Provisional Specification. Maintenance update issued in September 2019 Extensions to V1.0 released for expanded functionality NNEF Working Group Participants ONNX 1.6 Released in September 2019 Introduced support for Quantization ONNX Runtime being integrated with GPU inferencing engines such as NVIDIA TensorRT ONNX Supporters Embedded Inferencing Import Training Interchange Defined Specification Open Source Project Multi-company Governance at Khronos Initiated by Facebook & Microsoft Stability for hardware deployment Software stack flexibility ONNX and NNEF are Complementary ONNX moves quickly to track authoring framework updates NNEF provides a stable bridge from training into edge inferencing engines 7
  • 8. © 2020 The Khronos Group NNEF Open Source Tools Ecosystem Files Caffe and Caffe2 Import/Export TensorFlow and TensorFlow Lite Import/Export NNEF open source projects hosted on Khronos NNEF GitHub repository under Apache 2.0 https://github.com/KhronosGroup/NNEF-Tools ONNX Import/Export Syntax Parser and Validator OpenVX Ingestion and Execution NNEF Model Zoo Now available on GitHub. Useful for checking that ingested NNEF produces acceptable results on target system Compound operations captured by exporting graph Python script NNEF adopts a rigorous approach to design lifecycle Especially important for safety-critical or mission-critical applications in automotive, industrial and infrastructure markets 8
  • 9. © 2020 The Khronos Group SYCL Single Source C++ Parallel Programming GPU FPGA DSP Custom Hardware GPU CPU CPU CPU Standard C++ Application Code C++ Libraries ML Frameworks C++ Template Libraries C++ Template Libraries C++ Template Libraries SYCL Compiler for OpenCL CPU Compiler CPU SYCL-BLAS, SYCL-DNN, SYCL-Eigen, SYCL Parallel STL C++ templates and lambda functions separate host & accelerated device code Accelerated code passed into device OpenCL compilers Complex ML frameworks can be directly compiled and accelerated SYCL is ideal for accelerating larger C++-based engines and applications with performance portability C++ Kernel Fusion can give better performance on complex apps and libs than hand-coding AI/Tensor HW 9
  • 10. SYCL Implementations: Multiple Backends in Development. SYCL is beginning to be supported on multiple low-level APIs in addition to OpenCL, e.g. ROCm and CUDA. For more information: http://sycl.tech. SYCL enables Khronos to influence ISO C++ to (eventually) support heterogeneous compute. SYCL, OpenCL and SPIR-V, as open industry standards, enable flexible integration and deployment of multiple acceleration technologies. [Diagram: SYCL Source Code feeds four implementations. DPC++: uses LLVM/Clang, part of oneAPI. ComputeCpp: SYCL 1.2.1 on multiple hardware. triSYCL: open source test bed. hipSYCL: SYCL 1.2.1 on CUDA & HIP/ROCm. Backend and target labels, in slide order: Any CPU; OpenCL + SPIR-V; Any CPU; OpenCL + SPIR(-V); OpenCL+PTX; Intel CPUs, Intel GPUs, Intel FPGAs; Intel CPUs, Intel GPUs, Intel FPGAs; AMD GPUs (depends on driver stack); Arm Mali; IMG PowerVR; Renesas R-Car; NVIDIA GPUs; OpenMP; OpenCL + SPIR/LLVM; XILINX FPGAs; POCL (open source OpenCL supporting CPUs and NVIDIA GPUs and more); Any CPU; Experimental; OpenMP; ROCm; CUDA; AMD GPUs; NVIDIA GPUs; Any CPU; CUDA+PTX; NVIDIA GPUs.]
  • 11. OpenVX: Cross-Vendor Vision and Inferencing. OpenVX is a high-level graph-based abstraction for portable, efficient vision processing. A graph can contain vision processing and NN nodes, which enables global optimizations. An NNEF Translator converts the NNEF representation into OpenVX node graphs. Optimized OpenVX drivers are created and shipped by processor vendors. Implementable on almost any hardware or processor with performance portability. Run-time graph execution needs very little host CPU interaction. Performance is comparable to hand-optimized, non-portable code: real, complex applications on real, complex hardware, with much lower development effort than hand-optimized code. [Diagram labels: Native Camera Control; OpenVX Graph of Vision Nodes and CNN Nodes; Downstream Application Processing; Hardware Implementations.]
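As a rough illustration of the graph-based model described on this slide, the sketch below declares a small pipeline, verifies it once, and then executes it in a fixed order with no per-node decisions by the caller. It is plain Python, not the OpenVX API: the `Graph` class, its `verify`/`process` methods, the node names and the toy kernels are all hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


class Graph:
    """Toy dataflow graph: each node reads named inputs and writes one output."""

    def __init__(self):
        self.nodes = {}       # output name -> (function, input names)
        self.schedule = None  # execution order, filled in by verify()

    def add_node(self, output, func, inputs):
        self.nodes[output] = (func, tuple(inputs))
        self.schedule = None  # any change invalidates the verified schedule

    def verify(self):
        """Check the graph once, up front, and fix an execution order."""
        # Only edges between graph nodes constrain the order; any other
        # input name is external data supplied at run time.
        deps = {out: [i for i in ins if i in self.nodes]
                for out, (_, ins) in self.nodes.items()}
        try:
            self.schedule = list(TopologicalSorter(deps).static_order())
        except CycleError as err:
            raise ValueError("graph contains a cycle") from err

    def process(self, **external):
        """Run every node in the verified order and return all data objects."""
        if self.schedule is None:
            raise RuntimeError("call verify() before process()")
        data = dict(external)
        for out in self.schedule:
            func, ins = self.nodes[out]
            data[out] = func(*(data[name] for name in ins))
        return data


# Nodes may be added in any order; verify() works out the schedule.
g = Graph()
g.add_node("score", max, ["edges"])
g.add_node("edges", lambda v: [abs(b - a) for a, b in zip(v, v[1:])], ["gray"])
g.add_node("gray", lambda px: [sum(p) // 3 for p in px], ["camera"])
g.verify()
result = g.process(camera=[(0, 0, 0), (30, 30, 30), (90, 90, 90)])
print(g.schedule)       # ['gray', 'edges', 'score']
print(result["score"])  # 60
```

Because the whole graph is known before execution, an implementation is free to reorder, merge or offload nodes during verification; the toy version only sorts them.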
  • 12. OpenVX 1.3, Released October 2019. Deployment flexibility through feature sets: conformant implementations ship one or more complete feature sets, which enables market-focused implementations. Feature sets: Baseline Graph Infrastructure (enables other Feature Sets); Default Vision Functions; Enhanced Vision Functions (introduced in OpenVX 1.2); Neural Network Inferencing (including tensor objects); NNEF Kernel Import (including tensor objects); Binary Images; Safety Critical (reduced features for easier safety certification). Specification: https://www.khronos.org/registry/OpenVX/specs/1.3/html/OpenVX_Specification_1_3.html Functionality consolidation into core: Neural Net Extension, NNEF Kernel Import, Safety Critical etc. Open source conformance test suite: https://github.com/KhronosGroup/OpenVX-cts/tree/openvx_1.3 OpenVX/OpenCL interop: custom accelerated nodes; fully asynchronous host-device operations during data exchange; map or copy OpenVX data objects into cl_mem buffers; copy or export cl_mem buffers into OpenVX data objects; OpenVX user-kernels can access the command queue and cl_mem objects to asynchronously schedule OpenCL kernel execution. [Diagram labels: Application; OpenCL Command Queue; cl_mem buffers; OpenVX data objects; Runtime; Runtime.]
  • 13. Open Source OpenVX & Samples. Open source OpenVX tutorial and code samples: https://github.com/rgiduthuri/openvx_tutorial and https://github.com/KhronosGroup/openvx-samples Fully conformant open source OpenVX 1.3 for Raspberry Pi (https://github.com/KhronosGroup/OpenVX-sample-impl/tree/openvx_1.3): Raspberry Pi 3 and 4 Model B with Raspbian OS; memory access optimization via tiling/chaining; highly optimized kernels on multimedia instruction set; automatic parallelization for multicore CPUs and GPUs; automatic merging of common kernel sequences.
  • 14. OpenCL is Widely Deployed and Used. The industry’s most pervasive, cross-vendor, open standard for low-level heterogeneous parallel programming. [Diagram categories: Accelerated Implementations; Desktop Creative Apps; Linear Algebra Libraries; Parallel Languages; Math and Physics Libraries; Vision, Imaging and Video Libraries; Machine Learning Libraries and Frameworks; Machine Learning Compilers; Molecular Modelling Libraries. Names legible on the slide: Modo, Vegas Pro, CLBlast, SYCL-BLAS, Arm Compute Library, SYCL-DNN, TI DL Library (TIDL), VeriSilicon, Xiaomi, Intel clDNN, Synopsys MetaWare EV, NNAPI, ForceBalance.] See https://en.wikipedia.org/wiki/List_of_OpenCL_applications
  • 15. OpenCL: Low-level Parallel Programming. Programming and runtime framework for application acceleration: offload compute-intensive kernels onto parallel heterogeneous processors (CPUs, GPUs, DSPs, FPGAs, Tensor Processors) using OpenCL C or C++ kernel languages. Platform Layer API: query, select and initialize compute devices. Runtime API: build and execute kernel programs on multiple devices. Explicit application control: which programs execute on what device; where data is stored in memories in the system; when programs are run, and what operations are dependent on earlier operations. Complements GPU-only APIs: simpler programming model; relatively lightweight run-time; more language flexibility, e.g. pointers; rigorously defined numeric precision. [Diagram: a Host CPU uses the OpenCL API and runtime to compile, load and execute OpenCL C kernel code across OpenCL devices: GPU, DSP, CPU, FPGA, NN HW.]
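The "explicit application control" points on this slide (which device, and when, relative to earlier operations) can be sketched as a small timeline model. This is plain Python and does not call the OpenCL API; the `simulate` function, the command names, the durations and the assumption of one in-order queue per device are all illustrative. The wait lists play the role that event wait lists play in OpenCL.

```python
def simulate(commands):
    """Compute (start, finish) times for commands given in enqueue order.

    Each command is (name, device, duration, wait_list). A command starts
    once its device's queue is free and every command in its wait list has
    finished. One in-order queue per device is assumed.
    """
    device_free = {}  # device -> time its queue becomes idle
    times = {}        # command name -> (start, finish)
    for name, device, duration, wait_list in commands:
        ready = max((times[dep][1] for dep in wait_list), default=0)
        start = max(ready, device_free.get(device, 0))
        times[name] = (start, start + duration)
        device_free[device] = start + duration
    return times


# The application decides placement and dependencies explicitly.
commands = [
    ("upload", "gpu", 2, []),
    ("filter", "gpu", 5, ["upload"]),
    ("stats",  "dsp", 3, ["upload"]),           # overlaps with "filter"
    ("merge",  "cpu", 1, ["filter", "stats"]),  # waits for both
]
times = simulate(commands)
for name, (start, finish) in times.items():
    print(f"{name}: {start} -> {finish}")
```

In this toy schedule `filter` and `stats` run concurrently on different devices, and `merge` cannot start until the slower of the two has finished.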
  • 16. OpenCL 3.0. Increased ecosystem flexibility: all functionality beyond OpenCL 1.2 is queryable, plus macros for optional OpenCL C language features; new extensions that become widely adopted will be integrated into new OpenCL core specifications. Unified specification: all versions of OpenCL in one specification for easier maintenance, evolution and accessibility; source on Khronos GitHub for community feedback, functionality requests and bug fixes. Moving applications to OpenCL 3.0: OpenCL 1.2 applications need no change; OpenCL 2.X applications need no code changes if all used functionality is present; queries are recommended for future portability. C++ for OpenCL: the open source C++ for OpenCL front-end compiler combines OpenCL C and C++17, replacing the OpenCL C++ language specification; it is supported by Clang and uses the LLVM compiler infrastructure; OpenCL C code is valid and fully compatible; it supports most C++17 features; it generates SPIR-V kernels. [Diagram: C++ for OpenCL combines OpenCL C (kernels, address spaces, special types, ...) with most of C++17 (inheritance, templates, type deduction, ...).]
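One way to read the "queries recommended for future portability" advice: an application checks which optional features a device reports and picks a code path that needs only those. The sketch below models that decision in plain Python and does not call the OpenCL API. The feature names follow the `__opencl_c_...` feature-macro naming used for optional OpenCL C 3.0 language features; the kernel variants and the example device feature sets are hypothetical.

```python
# Kernel variants ordered from most to least demanding. Each lists the
# optional OpenCL C 3.0 features it relies on (variant names are made up).
KERNEL_VARIANTS = [
    ("reduce_subgroups", {"__opencl_c_subgroups"}),
    ("reduce_wg_collectives", {"__opencl_c_work_group_collective_functions"}),
    ("reduce_baseline", set()),  # needs only OpenCL 1.2 functionality
]


def select_variant(device_features):
    """Return the first variant whose required features are all reported."""
    for name, required in KERNEL_VARIANTS:
        if required <= device_features:  # subset test
            return name
    raise RuntimeError("no usable kernel variant")


# Hypothetical devices: one reports many optional features, one reports none.
full_device = {
    "__opencl_c_subgroups",
    "__opencl_c_work_group_collective_functions",
    "__opencl_c_fp64",
}
collectives_only = {"__opencl_c_work_group_collective_functions"}
minimal_device = set()

print(select_variant(full_device))       # reduce_subgroups
print(select_variant(collectives_only))  # reduce_wg_collectives
print(select_variant(minimal_device))    # reduce_baseline
```

Keeping a baseline variant that needs nothing beyond OpenCL 1.2 mirrors the slide's point that 1.2 applications run unchanged on a 3.0 implementation.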
  • 17. Google Ports TensorFlow Lite to OpenCL. OpenCL provides a ~2x inferencing speedup over OpenGL ES acceleration. TensorFlow Lite uses OpenGL ES as a backup if OpenCL is not available, but most mobile GPU vendors provide OpenCL drivers, even if they are not exposed directly to Android developers. OpenCL is increasingly used as an acceleration target for higher-level frameworks and compilers.
  • 18. Primary Machine Learning Compilers. Four compilers, listed by front-end / IR. NNVM / Relay IR: imports Caffe, Keras, MXNet, ONNX; outputs OpenCL, LLVM, CUDA, Metal. nGraph / Stripe IR: imports TensorFlow Graph, MXNet, PaddlePaddle, Keras, ONNX; outputs OpenCL, LLVM, CUDA. Glow Core / Glow IR: imports PyTorch, ONNX; outputs OpenCL, LLVM. XLA HLO: imports TensorFlow Graph, PyTorch, ONNX; outputs LLVM, TPU IR, XLA IR, TensorFlow Lite / NNAPI (inc. HW accel).
  • 19. ML Compiler Steps. Consistent steps: 1. Import trained network description. 2. Apply graph-level optimizations, e.g. node fusion, node lowering and memory tiling. 3. Decompose to primitive instructions and emit programs for accelerated run-times. Fast progress, but still an area of intense research. If compiler optimizations are effective, hardware accelerator APIs can stay ‘simple’ and won’t need complex metacommands (e.g. combined primitive commands like DirectML).
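The three steps on this slide can be sketched end to end on a toy network. This is a minimal illustration in plain Python, not how any of the compilers named in this deck is implemented: the JSON model format, the single conv+relu fusion rule, the linear-chain assumption and the primitive instruction names are all hypothetical.

```python
import json

# Step 1: import a trained network description (here a tiny JSON op list).
MODEL_JSON = """[
  {"op": "conv",  "name": "c1"},
  {"op": "relu",  "name": "r1"},
  {"op": "pool",  "name": "p1"},
  {"op": "dense", "name": "fc"}
]"""


def import_network(text):
    return json.loads(text)


# Step 2: graph-level optimization. Fuse every conv that is directly
# followed by a relu into one node (assumes a linear chain of ops).
def fuse_conv_relu(ops):
    fused, i = [], 0
    while i < len(ops):
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        if ops[i]["op"] == "conv" and nxt is not None and nxt["op"] == "relu":
            fused.append({"op": "conv_relu",
                          "name": ops[i]["name"] + "+" + nxt["name"]})
            i += 2
        else:
            fused.append(ops[i])
            i += 1
    return fused


# Step 3: decompose each node into primitive instructions for a run-time.
LOWERING = {
    "conv": ["im2col", "gemm"],
    "relu": ["clamp_min0"],
    "conv_relu": ["im2col", "gemm", "clamp_min0"],
    "pool": ["window_max"],
    "dense": ["gemm", "add_bias"],
}


def lower(ops):
    return [(op["name"], prim) for op in ops for prim in LOWERING[op["op"]]]


network = import_network(MODEL_JSON)
fused = fuse_conv_relu(network)
program = lower(fused)
print(len(network), "nodes before fusion,", len(fused), "after")
for node, primitive in program:
    print(node, primitive)
```

In this toy model fusion turns four graph nodes into three, so one fewer program has to be dispatched, while the primitive work stays the same.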
  • 20. Google MLIR and IREE Compilers. MLIR (Multi-level Intermediate Representation): a format and library of compiler utilities that sits between the trained model representation and low-level compilers/executors that generate hardware-specific code. IREE (Intermediate Representation Execution Environment): lowers and optimizes ML models for real-time accelerated inferencing on mobile/edge heterogeneous hardware. It contains scheduling logic to communicate data dependencies to low-level parallel pipelined hardware/APIs like Vulkan, and execution logic to encode dense computation in the form of hardware/API-specific binaries like SPIR-V. IREE is a research project today. Google is working with Khronos working groups to explore how SPIR-V code can provide effective inferencing acceleration on APIs such as Vulkan. [Diagram labels: Trained Models; Generate Hardware Specific Binaries; Optimizes and Lowers for Acceleration.]
  • 21. SPIR-V Language Ecosystem. SPIR-V enables a rich ecosystem of languages and compilers to target low-level APIs such as Vulkan and OpenCL, including deployment flexibility, e.g. running OpenCL C kernels on Vulkan. [Diagram. Legend: Khronos Open Source; 3rd Party Open Source; Language Definitions; Closed Source; Environment Specs. Elements, in slide order: OpenCL C; C++ for OpenCL; clspv; triSYCL; Intel DPC++; Codeplay ComputeCpp; LLVM; Clang; SYCL; SPIR-V LLVM IR Translator; OpenCL; Vulkan; OpenCLon12 (inc. Mesa SPIR-V to DXIL); SPIRV-Cross; GLSL; HLSL; Metal Shading Language; glslang; GLSL; HLSL; DXC; DXIL; SPIR-V Tools ((Dis)Assembler, Validator, Optimize/Remap, Fuzzer, Reducer); OpenCL C Online Compilation; IREE.]
  • 22. Khronos for Global Industry Collaboration. Khronos membership is open to any company. Influence the design and direction of key open standards that will drive your business. Accelerate time-to-market with early access to specification drafts. Provide industry thought leadership and gain insights into industry trends and directions. Benefit from Adopter discounts. www.khronos.org/members/ | ntrevett@nvidia.com | @neilt3d
  • 23. Resources. Khronos website and home page for all Khronos standards: https://www.khronos.org/ OpenCL resources and C++ for OpenCL documentation: https://www.khronos.org/opencl/resources and https://github.com/KhronosGroup/Khronosdotorg/blob/master/api/opencl/assets/CXX_for_OpenCL.pdf OpenVX tutorial, samples and sample implementation: https://github.com/rgiduthuri/openvx_tutorial and https://github.com/KhronosGroup/openvx-samples and https://github.com/KhronosGroup/OpenVX-sample-impl/tree/openvx_1.3 NNEF tools: https://github.com/KhronosGroup/NNEF-Tools SYCL resources: http://sycl.tech SPIR-V user guide: https://github.com/KhronosGroup/SPIRV-Guide MLIR blog: https://blog.tensorflow.org/2019/04/mlir-new-intermediate-representation.html IREE GitHub repository: https://google.github.io/iree/