Community-Driven and Knowledge-Guided Optimization
of AI Applications Across the Whole SW/HW Stack
or how to adapt to a Cambrian explosion in AI / SW / HW …
ARM Research Summit
Cambridge, September 2017
Grigori Fursin
CTO and co-founder, dividiti, UK
Chief Scientist, cTuning foundation
… with cKnowledge.org and open co-design competitions
cKnowledge.org : helping industry and academia adapt to a Cambrian AI/SW/HW explosion via open co-design competitions (2 of 24)
A race to develop innovative AI products and systems (SW & HW) …
Various form factors:
IoT, mobile, data centers, supercomputers
Various constraints:
speed, energy, accuracy, size, resiliency, costs
(3 of 24)
… leads to a Cambrian AI/SW/HW explosion and technological chaos
(4 of 24)
Which AI/SW/HW solutions will survive?
AI users
We at dividiti.com perform competitive analysis and optimization of the whole AI/SW/HW stack
for various realistic scenarios (object detection, image classification, etc.)
(5 of 24)
Scenario: image classification on mobile devices
800+ distinct mobile devices
mobile CPUs and GPUs
Caffe, TensorFlow
OpenBLAS, CLBlast, ViennaCL, Eigen
AlexNet, GoogleNet, SqueezeNet
ImageNet and user images
Requirement: speed vs cost (vs energy vs accuracy vs model size vs memory usage vs reliability …)
[Figure: execution time (sec) vs price (euros)]
Just a few "AI+SW+HW species" win; the rest must be optimized further or may go "extinct"
Obtained using our CK-based Android app to crowdsource experiments
across devices provided by volunteers (later in the talk)
cKnowledge.org/repo cKnowledge.org/ai
(6 of 24)
Optimization is ad-hoc, tedious, expensive and time consuming
Mobile device, Server, Data centers
Available libraries / skeletons
Compilers
Binary or byte code
Hardware, simulators
Run-time environment
Run-time state of the system
Inputs
Existing frameworks / algorithms
Various models
User front-end (cloud, GRID, supercomputer, etc)
Algorithm / source code
Microsoft Azure, AWS, Google Cloud, XSEDE, PRACE, Watson …
100s of models for TensorFlow, Caffe, Torch, Theano, MxNet, CNTK
CUDA, MPI, OpenMP, TBB, OpenCL, StarPU, OmpSs …
C, C++, Fortran, Java, Python, byte code, assembler …
LLVM, GCC, ICC, Rose, PGI, Lift, functional programming …
cuBLAS, BLAS, MAGMA, ViennaCL, CLBlast, cuDNN, OpenBLAS,
clBLAS, libDNN, tinyDNN, ARM compute lib, libxsmm, skeletons
diverse hardware: heterogeneous, out-of-order, caches
(ARM, x86, CUDA, Mali, Adreno, Power, TPU, FPGA, MIPS, AVX, NEON)
Linux (CentOS, Ubuntu, RedHat, SUSE, Debian),
Android, Windows, BSD, iOS, MacOS …
Too many design and optimization choices at each level of the continuously changing SW/HW stack!
(7 of 24)
Time to reinvent computer engineering
and enable open, collaborative and reproducible AI/SW/HW co-design!
(8 of 24)
cKnowledge.org: plugin-based workflow framework to co-design AI/SW/HW stack
Grigori Fursin, Anton Lokhmotov, Ed Plowman, "Collective Knowledge: towards R&D sustainability", DATE'16
Available libraries / skeletons
Compilers
Binary or byte code
Hardware, simulators
Run-time environment
Run-time state of the system
Inputs
Various models
Algorithm / source code
AI framework
Common JSON API
Initial funding (2015)
Common experimental framework
for computer engineering and AI research
https://github.com/ctuning/ck
(9 of 24)
Repositories with reusable and customizable artifacts (JSON API and meta info)
Unified models (CK JSON API), each with CK meta: MobileNets, GoogleNet, AlexNet, SqueezeNet, ResNet, …
AI frameworks (CK JSON API), each with CK meta: TensorFlow, Caffe, Caffe2, CNTK, MxNet, …
(10 of 24)
Repositories with reusable and customizable workflows (JSON API)
Workflows (CK API): image classification, object detection, emotion analysis, …
Unified models (CK JSON API), each with CK meta: MobileNets, GoogleNet, AlexNet, SqueezeNet, ResNet, …
AI frameworks (CK JSON API), each with CK meta: TensorFlow, Caffe, Caffe2, CNTK, MxNet, …
(11 of 24)
Continuous competition of various AI/SW/HW combinations (species)
Crowdsource AI experiments across diverse platforms provided by volunteers
cKnowledge.org/repo
Everyone is on the same page: fair and reproducible competitions
(12 of 24)
CK concepts: convert your artifacts into reusable and customizable components
Ad-hoc scripts to perform some actions on some artifacts
setup, soft, find, extract features, dataset, compile, run, add, replay, experiment, autotune, program
TensorFlow, Caffe2, ARM compute lib, image classification, object detection, ImageNet, Car video stream, Real surveillance camera, GEMM OpenCL, convolution CPU, performance results, training / accuracy, bugs (each with some desc.)
(13 of 24)
CK concepts: convert your artifacts into reusable and customizable components
setup, soft, find, extract features, dataset, compile, run, add, replay, experiment, autotune, program
TensorFlow, Caffe2, ARM compute lib, image classification, object detection, ImageNet, Car video stream, Real surveillance camera, GEMM OpenCL, convolution CPU, performance results, training / accuracy, bugs (each described by a JSON file)
/ 1st level directory: CK modules / 2nd level dir: CK entries / CK meta info
Python module with JSON API / holder for original artifact / CK meta
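The two-level layout on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: `add_entry`, `load_meta` and `list_entries` are hypothetical helpers (not CK functions), the meta contents are made up, and the `.cm/meta.json` location is the one named later in the talk.

```python
import json
import tempfile
from pathlib import Path


def add_entry(repo: Path, module: str, entry: str, meta: dict) -> Path:
    """Create <repo>/<module>/<entry>/.cm/meta.json and return the entry directory."""
    entry_dir = repo / module / entry
    (entry_dir / ".cm").mkdir(parents=True, exist_ok=True)
    (entry_dir / ".cm" / "meta.json").write_text(json.dumps(meta, indent=2))
    return entry_dir


def load_meta(repo: Path, module: str, entry: str) -> dict:
    """Read the meta description stored next to the original artifact."""
    return json.loads((repo / module / entry / ".cm" / "meta.json").read_text())


def list_entries(repo: Path, module: str) -> list:
    """List entry names of one module (1st level dir) that carry a meta file."""
    module_dir = repo / module
    return sorted(p.name for p in module_dir.iterdir()
                  if (p / ".cm" / "meta.json").is_file())


# Throw-away repository with two entries; the tags are illustrative only.
repo = Path(tempfile.mkdtemp(prefix="ck-layout-sketch-"))
add_entry(repo, "dataset", "my-new-dataset", {"tags": ["image", "jpeg"]})
add_entry(repo, "program", "caffe-classification", {"tags": ["caffe", "classification"]})

print(list_entries(repo, "program"))
print(load_meta(repo, "dataset", "my-new-dataset"))
```

The point of the design is that the artifact itself stays untouched inside the entry directory, while everything a tool needs to know about it lives in one JSON file.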
(14 of 24)
Collective Knowledge (github.com/ctuning/ck) –
assists you in unifying, executing, sharing and reusing your artifacts:
$ sudo pip install ck
$ ck pull repo:ck-autotuning
$ ck add dataset:my-new-dataset (UID will be automatically generated)
$ ck compile program:cbench-automotive-susan
$ ck run program:cbench-automotive-susan
https://github.com/ctuning/ck/wiki/Shared-modules
(15 of 24)
We already converted multiple AI frameworks, artifacts and workflows to the CK
Ad-hoc experimental workflows:
ICC 17.0, CUDA 8.0, GCC 7.0, LLVM 4.0
Databases, local repositories
Ad-hoc init scripts
Ad-hoc scripts to process CSV, XLS, TXT, etc.
Program
CK program, CK pipeline, CK compiler, CK AI framework, CK math library, CK experiment
Caffe OpenCL, Caffe CUDA, TensorFlow CPU/CUDA
MAGMA, cuBLAS, OpenBLAS, ViennaCL, CLBlast
Stat. analysis, predictive analytics, visualization
• github.com/dividiti/ck-caffe
• github.com/ctuning/ck-caffe2
• github.com/ctuning/ck-tensorflow
$ ck pull repo --url=https://github.com/dividiti/ck-caffe
$ ck compile program:caffe-classification
$ ck run program:caffe-classification
https://github.com/ctuning/ck/wiki/Shared-repos
(16 of 24)
We've already converted multiple AI frameworks, artifacts and workflows to the CK
CK program module can automatically adapt to underlying environment via dependencies
Unified API (input, JSON) → read program meta → detect all software dependencies (ask the user if multiple versions exist) → prepare environment → compile program → run program → Unified API (output, JSON)
CK program entry (native directory): source files and auxiliary scripts
.cm/meta.json: describes soft dependencies, data sets, and how to compile and run this program
CK entries associated with a given module describe a given object using meta.json while storing all necessary files and sub-directories
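The program pipeline on this slide (read meta, detect dependencies, prepare environment, compile, run) can be pictured as a chain of stages that each take and return a dictionary. The sketch below is a toy model under assumptions, not CK code: all function names, the meta contents and the `CK_ENV_` variable names are hypothetical, nothing is really compiled or executed, and the `return`/`error` keys only imitate the dict-in/dict-out style of a unified JSON API.

```python
def read_meta(state):
    # Hypothetical meta description; a real workflow would load it from .cm/meta.json.
    state["meta"] = {"deps": ["compiler", "lib-blas"], "run_cmd": "./classify"}
    return {"return": 0}


def detect_deps(state):
    # Match every dependency named in the meta against what is installed.
    installed = state["installed"]
    missing = [d for d in state["meta"]["deps"] if d not in installed]
    if missing:
        return {"return": 1, "error": "unresolved dependencies: " + ", ".join(missing)}
    state["deps"] = {d: installed[d] for d in state["meta"]["deps"]}
    return {"return": 0}


def prepare_env(state):
    # Turn resolved dependencies into environment variables (names are made up).
    state["env"] = {"CK_ENV_" + d.upper().replace("-", "_"): v
                    for d, v in state["deps"].items()}
    return {"return": 0}


def compile_program(state):
    state["compiled"] = True  # placeholder: nothing is actually compiled
    return {"return": 0}


def run_program(state):
    state["ran"] = state["compiled"]  # placeholder: nothing is actually executed
    return {"return": 0}


STAGES = (read_meta, detect_deps, prepare_env, compile_program, run_program)


def pipeline(i):
    """Run all stages in order; stop at the first one that reports an error."""
    state = {"installed": dict(i.get("installed", {}))}
    for stage in STAGES:
        r = stage(state)
        if r["return"] > 0:
            r["failed_stage"] = stage.__name__
            return r
    return {"return": 0, "env": state["env"], "ran": state["ran"]}


print(pipeline({"installed": {"compiler": "gcc-7.0", "lib-blas": "openblas"}}))
print(pipeline({"installed": {"compiler": "gcc-7.0"}}))
```

Because every stage has the same shape, a stage can be swapped or inserted (for example a profiling step) without touching the others.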
(17 of 24)
Automatically adapting workflow to any underlying software and hardware
Soft entries in CK describe how to detect if a given software is already installed, how to set up all its environment including all paths (to binaries, libraries, include, aux tools, etc), and how to detect its version
$ ck detect soft --tags=compiler,cuda
$ ck detect soft:compiler.gcc
$ ck detect soft:compiler.llvm
$ ck list soft:compiler*
$ ck detect soft:lib.cublas
$ ck search soft --tags=blas
Env entries are created in CK local repo for all found software instances together with their meta and an auto-generated environment script env.sh (on Linux) or env.bat (on Windows)
Local CK repo:
local / env / 03ca0be16962f471 / env.sh
Tags: compiler,cuda,v8.0
local / env / 0a5ba198d48e3af3 / env.bat
Tags: lib,blas,cublas,v8.0
$ ck show env
$ ck show env --tags=cublas
$ ck rm env:* --tags=cublas
Package entries describe how to install a given software if it is not already installed (using CK Python plugin together with install.sh script on Linux host or install.bat on Windows host)
$ ck install package:caffemodel-bvlc-googlenet
$ ck install package:imagenet-2012-val
$ ck install package:lib-tensorflow-cuda
$ ck install package:lib-caffe-bvlc-master-cuda-universal
$ ck list package:*caffemodel*
$ ck list package:*tensorflow*
$ ck search package --tags=caffe
https://github.com/ctuning/ck/wiki/Portable-workflows
Multiple versions of tools can easily co-exist and be plugged into CK workflows!
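Selecting an environment by tags, as the `--tags=` options above suggest, amounts to a subset test on tag sets. The following is a minimal sketch under assumptions: the two entries reuse the UIDs and tags shown on the slide, but the dictionary layout and the `find_env` helper are hypothetical and do not reflect how CK stores or searches its entries.

```python
# Two env entries copied from the slide; the dict layout itself is an assumption.
ENV_ENTRIES = {
    "03ca0be16962f471": {"tags": {"compiler", "cuda", "v8.0"}, "script": "env.sh"},
    "0a5ba198d48e3af3": {"tags": {"lib", "blas", "cublas", "v8.0"}, "script": "env.bat"},
}


def find_env(entries, required_tags):
    """Return UIDs of all entries whose tag set contains every required tag."""
    wanted = set(required_tags)
    return sorted(uid for uid, entry in entries.items() if wanted <= entry["tags"])


print(find_env(ENV_ENTRIES, ["cublas"]))
print(find_env(ENV_ENTRIES, ["v8.0"]))
print(find_env(ENV_ENTRIES, ["compiler", "llvm"]))
```

When more than one UID matches, a workflow has to pick one or ask the user, which is how several versions of the same tool can co-exist.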
(18 of 24)
Applying methodology from natural sciences to optimize computer systems
https://github.com/ctuning/ck/wiki/Autotuning
CK Python modules (wrappers) with a unified JSON API
CK input (JSON/dict), unified: behavior, choices, features, state
Action
CK output (JSON/dict), unified: behavior, choices, features, state
b = B(c, f, s)
Formalized function B of the behavior b of any CK object, given its choices c, features f and state s
Flattened CK JSON vectors (dict converted to vector) to simplify statistical analysis, machine learning and data mining
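Converting a nested dictionary into a flat vector can be sketched as follows. The key scheme (`#` before a dictionary key, `@` before a list index) and the sample data point are assumptions chosen for this illustration; they are not claimed to match the exact format CK uses.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into a single {path: scalar} dictionary."""
    flat = {}
    if isinstance(obj, dict):
        for key in sorted(obj):
            flat.update(flatten(obj[key], f"{prefix}#{key}"))
    elif isinstance(obj, list):
        for idx, item in enumerate(obj):
            flat.update(flatten(item, f"{prefix}@{idx}"))
    else:
        flat[prefix] = obj
    return flat


# Hypothetical experiment point with choices c, state s and measured behavior b.
point = {
    "choices": {"batch_size": 2, "cpu_threads": 4},
    "state": {"cpu_freq_mhz": 1800},
    "characteristics": {"time_s": [0.41, 0.39]},
}

flat = flatten(point)
keys = sorted(flat)               # fixed column order shared by all points
vector = [flat[k] for k in keys]  # one row of a table for statistical analysis

print(keys)
print(vector)
```

Once every experiment is reduced to a row with the same columns, standard statistical and machine-learning tools can be applied without knowing anything about the original JSON structure.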
Some actions
Tools (compilers, profilers, etc)
Generated files
Chain CK modules to implement research workflows such as multi-objective autotuning and co-design:
• Choose exploration strategy
• Perform SW/HW DSE (math transforms, skeleton params, compiler flags, transformations …)
• Perform stat. analysis
• Detect (Pareto) frontier
• Model behavior, predict optimizations
• Reduce complexity
CK program module with pipeline function: set environment for a given tool version, compile program, run code
First expose coarse-grain high-level choices, features, system state and behavior characteristics
Crowdsource benchmarking and random exploration across diverse inputs and devices;
Keep best species (AI/SW/HW choices); model behavior; predict better optimizations and designs
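Keeping only the "best species" in a two-objective setting means detecting the Pareto frontier. Here is a minimal sketch for two objectives that are both minimized (for example device cost and time per image); the function name and the five sample points are made up for illustration and are not measurements from the talk.

```python
def pareto_frontier(points):
    """Return the non-dominated points from a list of (name, cost, time).

    Lower is better for both cost and time. After sorting by cost, a point
    stays on the frontier only if it is strictly faster than every cheaper
    point seen so far.
    """
    frontier = []
    for name, cost, time in sorted(points, key=lambda p: (p[1], p[2])):
        if not frontier or time < frontier[-1][2]:
            frontier.append((name, cost, time))
    return frontier


# Hypothetical (name, cost in euros, seconds per image) tuples.
samples = [
    ("A", 50, 2.0),
    ("B", 100, 0.8),
    ("C", 120, 1.5),   # dominated by B: more expensive and slower
    ("D", 300, 0.3),
    ("E", 80, 2.5),    # dominated by A: more expensive and slower
]

winners = pareto_frontier(samples)
print([name for name, _, _ in winners])
```

With more objectives (energy, accuracy, model size) the single sorted pass no longer suffices and a pairwise dominance check is needed instead.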
(19 of 24)
Prepare first proof-of-concept community experiments
Algorithms: object classification, object detection
AI frameworks:
Caffe CPU, Caffe OpenCL, TensorFlow CPU
Math libraries:
OpenBLAS, ViennaCL, clBLAS, CLBlast, cuBLAS, cuDNN,
Eigen, gemmlowp
Compilers: GCC 5+
Models:
AlexNet, GoogleNet, VGG, ResNet,
SqueezeNet, SqueezeDet, SSD
Datasets: KITTI, COCO, VOC, ImageNet
Optimization choices: batch size, number of CPU threads
Characteristics:
total execution time (including OpenCL overheads),
top1/top5 model accuracy, static model size (MB),
device cost, max power consumption (if available)
System state: CPU/GPU frequency, memory
cKnowledge.org/repo
(20 of 24)
Crowdsource benchmarking across Android devices provided by volunteers
Continuously collect statistics, bugs and misclassifications at cKnowledge.org/repo
Number of distinct participating platforms: 800+
Number of distinct CPUs: 260+
Number of distinct GPUs: 110+
Number of distinct OS: 280+
Power range: 1-10W
No need for a dedicated and expensive cloud:
volunteers help us validate research ideas,
similar to SETI@HOME
(Also collecting real images from users
for misclassifications to build an open
and continuously updated training set!)
Winning solutions on various frontiers
[Figure: time per image (seconds) vs cost (euros)]
(21 of 24)
Crowdsource benchmarking across Android devices provided by volunteers
Number of distinct participating platforms: 790+
Winning solutions on various frontiers
Firefly-RK3399
[Figure: time per image (seconds) vs cost (euros)]
(22 of 24)
Let's dig further – (crowdsource) BLAS autotuning in Caffe on Firefly-RK3399
Collaboration between Marco Cianfriglia (Roma Tre University), Cedric Nugteren (TomTom),
Flavio Vella, Anton Lokhmotov and Grigori Fursin (dividiti)
Tunable parameters of OpenCL-based BLAS (github.com/CNugteren/CLBlast)
Name   Description                                       Ranges
KWG    2D tiling at workgroup level                      {32, 64}
KWI    KWG kernel-loop can be unrolled by a factor KWI   {1}
MDIMA  Local Memory Re-shape                             {4, 8}
MDIMC  Local Memory Re-shape                             {8, 16, 32}
MWG    2D tiling at workgroup level                      {32, 64, 128}
NDIMB  Local Memory Re-shape                             {8, 16, 32}
NDIMC  Local Memory Re-shape                             {8, 16, 32}
NWG    2D tiling at workgroup level                      {16, 32}
SA     manual caching using the local memory             {0, 1}
SB     manual caching using the local memory             {0, 1}
STRM   Striding within single thread for matrix A and C  {0, 1}
STRN   Striding within single thread for matrix B        {0, 1}
VWM    Vector width for loading A and C                  {8, 16}
VWN    Vector width for loading B                        {0, 1}
For now only two data sets (small & large)
Some extra constraints to avoid illegal combinations
Use different autotuners under CK to speed up design space exploration
based on probabilistic focused search, genetic algorithms,
deep learning, SVM, KNN, MARS, decision trees …
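To get a feel for the size of this design space, the ranges from the table can be enumerated directly. The sketch below rests on loud assumptions: `is_legal` is a placeholder constraint invented for illustration (it is not CLBlast's actual rule set), `toy_cost` is a made-up stand-in for a real kernel timing, and plain random sampling stands in for the more advanced autotuners listed above.

```python
import itertools
import random

# Parameter ranges copied from the table above.
PARAMS = {
    "KWG": [32, 64], "KWI": [1], "MDIMA": [4, 8], "MDIMC": [8, 16, 32],
    "MWG": [32, 64, 128], "NDIMB": [8, 16, 32], "NDIMC": [8, 16, 32],
    "NWG": [16, 32], "SA": [0, 1], "SB": [0, 1], "STRM": [0, 1],
    "STRN": [0, 1], "VWM": [8, 16], "VWN": [0, 1],
}


def is_legal(cfg):
    # Placeholder constraint for illustration only, NOT CLBlast's real constraints.
    return cfg["NWG"] % cfg["NDIMC"] == 0


def all_configs(params):
    """Yield every combination of parameter values as a dict."""
    names = list(params)
    for values in itertools.product(*(params[n] for n in names)):
        yield dict(zip(names, values))


def random_search(params, evaluate, trials, seed=0):
    """Sample random legal configurations and keep the cheapest one."""
    rng = random.Random(seed)
    best_cfg, best_cost = None, float("inf")
    for _ in range(trials):
        cfg = {name: rng.choice(values) for name, values in params.items()}
        if not is_legal(cfg):
            continue  # skip illegal combinations instead of evaluating them
        cost = evaluate(cfg)
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost


def toy_cost(cfg):
    # Made-up cost function; a real autotuner would time the generated kernel.
    return abs(cfg["MWG"] - 64) + abs(cfg["NWG"] - 32)


total = sum(1 for _ in all_configs(PARAMS))
legal = sum(1 for cfg in all_configs(PARAMS) if is_legal(cfg))
best_cfg, best_cost = random_search(PARAMS, toy_cost, trials=200)
print(total, legal, best_cost)
```

Even this small table yields tens of thousands of combinations per data set and device, which is why exhaustive search quickly becomes impractical and smarter exploration strategies pay off.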
(23 of 24)
Let's dig further – autotuning BLAS (CLBlast) in Caffe on Firefly-RK3399
• Caffe with autotuned OpenBLAS (threads and batches) is the fastest
• Caffe with autotuned CLBlast is 6..7x faster than the default version and competitive with
the OpenBLAS-based version; it is now worth making adaptive selection at run-time.
Sharing results in a reproducible way with the community for validation and improvement:
https://nbviewer.jupyter.org/github/dividiti/ck-caffe-firefly-rk3399/blob/master/script/batch_size-libs-models/analysis.20170531.ipynb
(24 of 24)
• Bring together industry and academia to participate in open
and reproducible AI/SW/HW co-design competitions using the CK framework
• Share more artifacts, workflows and results in a reusable
and customizable CK format (common JSON API and meta description)
• Collaboratively improve models and find missing features
• Gradually expose more design and optimization knobs at all AI/SW/HW levels
• Enable distributed on-line learning for self-optimizing and self-learning systems
http://cKnowledge.org/partners http://cKnowledge.org/publications
Join the growing Collective Knowledge community!

More Related Content

PDF
Accelerating open science and AI with automated, portable, customizable and r...
PDF
“Streamlining Development of Edge AI Applications,” a Presentation from NVIDIA
PDF
Enabling open and reproducible computer systems research: the good, the bad a...
PDF
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
PDF
Harnessing AI for the Benefit of All.
PDF
EPSRC CDT Conference
PDF
Aplicações Potenciais de Deep Learning à Indústria do Petróleo
PDF
Enabling Artificial Intelligence - Alison B. Lowndes
Accelerating open science and AI with automated, portable, customizable and r...
“Streamlining Development of Edge AI Applications,” a Presentation from NVIDIA
Enabling open and reproducible computer systems research: the good, the bad a...
Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...
Harnessing AI for the Benefit of All.
EPSRC CDT Conference
Aplicações Potenciais de Deep Learning à Indústria do Petróleo
Enabling Artificial Intelligence - Alison B. Lowndes

What's hot (20)

PDF
Hardware in Space
PDF
NVIDIA Keynote #GTC21
PPTX
OpenACC Monthly Highlights - February 2018
PDF
Fuelling the AI Revolution with Gaming
PDF
Talk on commercialising space data
PDF
NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019
PPTX
PGI Compilers & Tools Update- March 2018
PDF
Talk on using AI to address some of humanities problems
PPTX
AI for All: Biology is eating the world & AI is eating Biology
PDF
Hire a Machine to Code - Michael Arthur Bucko & Aurélien Nicolas
PDF
AI + E-commerce
PDF
OpenPOWER/POWER9 AI webinar
PPTX
HPC Top 5 Stories: April 26, 2018
PPTX
OpenACC Monthly Highlights February 2019
PPT
C:\Alon Tech\New Tech\Embedded Conf Tlv\Prez\Sightsys Embedded Day
PPTX
NVIDIA Developer Program Overview
PDF
FPGA-based soft-processors: 6G nodes and post-quantum security in space
PPTX
OpenACC Monthly Highlights: May 2019
PPTX
oneAPI: Industry Initiative & Intel Product
PPTX
WML OpenPOWER presentation
Hardware in Space
NVIDIA Keynote #GTC21
OpenACC Monthly Highlights - February 2018
Fuelling the AI Revolution with Gaming
Talk on commercialising space data
NVIDIA CEO Jensen Huang Presentation at Supercomputing 2019
PGI Compilers & Tools Update- March 2018
Talk on using AI to address some of humanities problems
AI for All: Biology is eating the world & AI is eating Biology
Hire a Machine to Code - Michael Arthur Bucko & Aurélien Nicolas
AI + E-commerce
OpenPOWER/POWER9 AI webinar
HPC Top 5 Stories: April 26, 2018
OpenACC Monthly Highlights February 2019
C:\Alon Tech\New Tech\Embedded Conf Tlv\Prez\Sightsys Embedded Day
NVIDIA Developer Program Overview
FPGA-based soft-processors: 6G nodes and post-quantum security in space
OpenACC Monthly Highlights: May 2019
oneAPI: Industry Initiative & Intel Product
WML OpenPOWER presentation
Ad

Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions and Collective Knowledge

  • 1. Community-Driven and Knowledge-Guided Optimization of AI Applications Across the Whole SW/HW Stack, or how to adapt to a Cambrian explosion in AI / SW / HW … with cKnowledge.org and open co-design competitions
    ARM Research Summit, Cambridge, September 2017
    Grigori Fursin, CTO and co-founder, dividiti, UK; Chief Scientist, cTuning foundation
  • 2. cKnowledge.org: helping industry and academia adapt to a Cambrian AI/SW/HW explosion via open co-design competitions (2 of 24)
    A race to develop innovative AI products and systems (SW & HW) …
    Various form factors: IoT, mobile, data centers, supercomputers
    Various constraints: speed, energy, accuracy, size, resiliency, costs
  • 3. … leads to a Cambrian AI/SW/HW explosion and technological chaos
  • 4. Which AI/SW/HW solutions will survive?
    AI users
    We at dividiti.com perform competitive analysis and optimization of the whole AI/SW/HW stack for various realistic scenarios (object detection, image classification, etc)
  • 5. Scenario: image classification on mobile devices
    Setup: 800+ distinct mobile devices (mobile CPUs and GPUs); Caffe, TensorFlow; OpenBLAS, CLBlast, ViennaCL, Eigen; AlexNet, GoogleNet, SqueezeNet; ImageNet and user images
    Requirement: speed vs cost (vs energy vs accuracy vs model size vs memory usage vs reliability…)
    [Plot: execution time (sec) vs price (euros)]
    Just a few winning "AI+SW+HW species"; the rest must be optimized further or may go "extinct"
    Obtained using our CK-based Android app to crowdsource experiments across devices provided by volunteers (later in the talk)
    cKnowledge.org/repo
    cKnowledge.org/ai
  • 6. Optimization is ad-hoc, tedious, expensive and time consuming
    Targets: mobile device, server, data centers
    Stack layers: user front-end (cloud, GRID, supercomputer, etc); various models; existing frameworks / algorithms; algorithm / source code; compilers; available libraries / skeletons; binary or byte code; hardware, simulators; run-time environment; run-time state of the system; inputs
    Examples across the stack:
      Microsoft Azure, AWS, Google Cloud, XSEDE, PRACE, Watson …
      100s of models for TensorFlow, Caffe, Torch, Theano, MxNet, CNTK
      CUDA, MPI, OpenMP, TBB, OpenCL, StarPU, OmpSs …
      C, C++, Fortran, Java, Python, byte code, assembler …
      LLVM, GCC, ICC, Rose, PGI, Lift, functional programming …
      cuBLAS, BLAS, MAGMA, ViennaCL, CLBlast, cuDNN, openBLAS, clBLAS, libDNN, tinyDNN, ARM compute lib, libxsmm, skeletons
      diverse hardware: heterogeneous, out-of-order, caches (ARM, x86, CUDA, Mali, Adreno, Power, TPU, FPGA, MIPS, AVX, neon)
      Linux (CentOS, Ubuntu, RedHat, SUSE, Debian), Android, Windows, BSD, iOS, MacOS …
    Too many design and optimization choices at each level of a continuously changing SW/HW stack!
  • 7. Optimization is ad-hoc, tedious, expensive and time consuming
    Targets: mobile device, server, data centers
    Stack layers: user front-end (cloud, GRID, supercomputer, etc); various models; existing frameworks / algorithms; algorithm / source code; compilers; available libraries / skeletons; binary or byte code; hardware, simulators; run-time environment; run-time state of the system; inputs
    Examples across the stack:
      Microsoft Azure, AWS, Google Cloud, XSEDE, PRACE, Watson …
      Hundreds of models for TF, Caffe, Torch, Theano, MxNet, CNTK
      CUDA, MPI, OpenMP, TBB, OpenCL, StarPU, OmpSs …
      C, C++, Fortran, Java, Python, byte code, assembler …
      LLVM, GCC, ICC, Rose, PGI, Lift, functional programming …
      cuBLAS, BLAS, MAGMA, ViennaCL, CLBlast, cuDNN, openBLAS, clBLAS, libDNN, tinyDNN, ARM compute lib, libxsmm, skeletons
      diverse hardware: heterogeneous, out-of-order, caches (ARM, x86, CUDA, Mali, Adreno, Power, TPU, FPGA, MIPS, AVX, neon)
      Linux (CentOS, Ubuntu, RedHat, SUSE, Debian), Android, Windows, BSD, iOS, MacOS …
    Too many design and optimization choices at each level of a continuously changing SW/HW stack!
    Time to reinvent computer engineering and enable open, collaborative and reproducible AI/SW/HW co-design!
  • 8. cKnowledge.org: plugin-based workflow framework to co-design AI/SW/HW stack
    Grigori Fursin, Anton Lokhmotov, Ed Plowman, "Collective Knowledge: towards R&D sustainability", DATE'16
    Common JSON API over all stack layers: AI framework; algorithm / source code; various models; inputs; compilers; available libraries / skeletons; binary or byte code; hardware, simulators; run-time environment; run-time state of the system
    Initial funding (2015)
    Common experimental framework for computer engineering and AI research
    https://github.com/ctuning/ck
  • 9. Repositories with reusable and customizable artifacts (JSON API and meta info)
    Unified models (CK JSON API), each with CK meta: MobileNets, GoogleNet, AlexNet, SqueezeNet, ResNet, …
    AI frameworks (CK JSON API), each with CK meta: TensorFlow, Caffe, Caffe2, CNTK, MxNet, …
    Common JSON API over all stack layers: AI framework; algorithm / source code; various models; inputs; compilers; available libraries / skeletons; binary or byte code; hardware, simulators; run-time environment; run-time state of the system
  • 10. Repositories with reusable and customizable workflows (JSON API)
    Workflows exposed through the CK API: image classification, object detection, emotion analysis
    Unified models (CK JSON API), each with CK meta: MobileNets, GoogleNet, AlexNet, SqueezeNet, ResNet, …
    AI frameworks (CK JSON API), each with CK meta: TensorFlow, Caffe, Caffe2, CNTK, MxNet, …
    Common JSON API over all stack layers: AI framework; algorithm / source code; various models; inputs; compilers; available libraries / skeletons; binary or byte code; hardware, simulators; run-time environment; run-time state of the system
  • 11. Crowdsource AI experiments across diverse platforms provided by volunteers
    Continuous competition of various AI/SW/HW combinations (species)
    cKnowledge.org/repo
    Everyone is on the same page: fair and reproducible competitions
    Workflows exposed through the CK API: image classification, object detection, emotion analysis
    Unified models (CK JSON API), each with CK meta: MobileNets, GoogleNet, AlexNet, SqueezeNet, ResNet, …
    AI frameworks (CK JSON API), each with CK meta: TensorFlow, Caffe, Caffe2, CNTK, MxNet, …
    Common JSON API over all stack layers: AI framework; algorithm / source code; various models; inputs; compilers; available libraries / skeletons; binary or byte code; hardware, simulators; run-time environment; run-time state of the system
  • 12. CK concepts: convert your artifacts into reusable and customizable components
    Starting point: ad-hoc scripts to perform some actions on some artifacts
    Actions and modules: setup, soft, find, extract features, dataset, compile, run, add, replay, experiment, autotune, program
    Artifacts (each with some description): TensorFlow, Caffe2, ARM compute lib, image classification, object detection, ImageNet, car video stream, real surveillance camera, GEMM OpenCL, convolution CPU, performance results, training / accuracy, bugs
  • 13. CK concepts: convert your artifacts into reusable and customizable components
    Actions and modules: setup, soft, find, extract features, dataset, compile, run, add, replay, experiment, autotune, program
    Artifacts (each described by a JSON file): TensorFlow, Caffe2, ARM compute lib, image classification, object detection, ImageNet, car video stream, real surveillance camera, GEMM OpenCL, convolution CPU, performance results, training / accuracy, bugs
    Repository layout: / 1st level directory: CK modules (Python module with JSON API) / 2nd level directory: CK entries (holder for original artifact) / CK meta info (CK meta)
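The two-level layout described on this slide can be sketched without CK itself. The snippet below is a minimal illustration, assuming only what the talk states (first-level directory per module, second-level directory per entry, meta description in `.cm/meta.json`); the helper names `add_entry`, `list_entries` and `load_meta` are hypothetical and are not part of the CK API.

```python
import json
import tempfile
from pathlib import Path


def add_entry(repo: Path, module: str, entry: str, meta: dict) -> Path:
    """Create <repo>/<module>/<entry>/.cm/meta.json and return the entry directory."""
    entry_dir = repo / module / entry
    (entry_dir / ".cm").mkdir(parents=True, exist_ok=True)
    (entry_dir / ".cm" / "meta.json").write_text(json.dumps(meta, indent=2))
    return entry_dir


def list_entries(repo: Path, module: str) -> list:
    """Return the sorted names of all entries that carry a meta.json under a module."""
    module_dir = repo / module
    if not module_dir.is_dir():
        return []
    return sorted(
        p.name for p in module_dir.iterdir() if (p / ".cm" / "meta.json").is_file()
    )


def load_meta(repo: Path, module: str, entry: str) -> dict:
    """Read the meta description of a single entry."""
    return json.loads((repo / module / entry / ".cm" / "meta.json").read_text())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        # Entry names are taken from the talk; the meta content is made up.
        add_entry(repo, "dataset", "my-new-dataset", {"tags": ["dataset", "image"]})
        add_entry(repo, "program", "cbench-automotive-susan", {"tags": ["program"]})
        print(list_entries(repo, "dataset"))
        print(load_meta(repo, "program", "cbench-automotive-susan"))
```

Because an entry is just a directory plus a JSON file, the original artifact files can sit next to `.cm/` untouched, which is what makes the entry a "holder for original artifact".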
  • 14. CK concepts: convert your artifacts into reusable and customizable components
    Actions and modules: setup, soft, find, extract features, dataset, compile, run, add, replay, experiment, autotune, program
    Artifacts (each described by a JSON file): TensorFlow, Caffe2, ARM compute lib, image classification, object detection, ImageNet, car video stream, real surveillance camera, GEMM OpenCL, convolution CPU, performance results, training / accuracy, bugs
    Repository layout: / 1st level directory: CK modules (Python module with JSON API) / 2nd level directory: CK entries (holder for original artifact) / CK meta info (CK meta)
    Collective Knowledge (github.com/ctuning/ck) assists you in unifying, executing, sharing and reusing your artifacts:
      $ sudo pip install ck
      $ ck pull repo:ck-autotuning
      $ ck add dataset:my-new-dataset (UID will be automatically generated)
      $ ck compile program:cbench-automotive-susan
      $ ck run program:cbench-automotive-susan
    https://github.com/ctuning/ck/wiki/Shared-modules
  • 15. We already converted multiple AI frameworks, artifacts and workflows to the CK
    Before: ad-hoc init scripts; ad-hoc scripts to process CSV, XLS, TXT, etc.; ad-hoc experimental workflows; databases, local repositories
    With CK: CK program, CK pipeline, CK compiler, CK AI framework, CK math library, CK experiment
    Compilers: ICC 17.0, CUDA 8.0, GCC 7.0, LLVM 4.0
    AI frameworks: Caffe OpenCL, Caffe CUDA, TensorFlow CPU/CUDA
    Math libraries: MAGMA, cuBLAS, OpenBLAS, ViennaCL, CLBlast
    Stat. analysis, predictive analytics, visualization
    Repositories:
      • github.com/dividiti/ck-caffe
      • github.com/ctuning/ck-caffe2
      • github.com/ctuning/ck-tensorflow
    Usage:
      $ ck pull repo --url=https://github.com/dividiti/ck-caffe
      $ ck compile program:caffe-classification
      $ ck run program:caffe-classification
    https://github.com/ctuning/ck/wiki/Shared-repos
  • 16. We've already converted multiple AI frameworks, artifacts and workflows to the CK
    Before: ad-hoc init scripts; ad-hoc scripts to process CSV, XLS, TXT, etc.; ad-hoc experimental workflows; databases, local repositories
    With CK: CK program, CK pipeline, CK compiler, CK AI framework, CK math library, CK experiment
    CK program pipeline: unified API (input, JSON) → read program meta → detect all software dependencies; ask user if multiple versions exist → prepare environment → compile program → run program → unified API (output, JSON)
    CK program module can automatically adapt to the underlying environment via dependencies
    CK program entry (native directory): source files and auxiliary scripts, plus .cm/meta.json, which describes soft dependencies, data sets, and how to compile and run this program
    CK entries associated with a given module describe a given object using meta.json while storing all necessary files and sub-directories
    Compilers: ICC 17.0, CUDA 8.0, GCC 7.0, LLVM 4.0
    AI frameworks: Caffe OpenCL, Caffe CUDA, TensorFlow CPU/CUDA
    Math libraries: MAGMA, cuBLAS, OpenBLAS, ViennaCL, CLBlast
    Stat. analysis, predictive analytics, visualization
    Repositories:
      • github.com/dividiti/ck-caffe
      • github.com/ctuning/ck-caffe2
      • github.com/ctuning/ck-tensorflow
    Usage:
      $ ck pull repo --url=https://github.com/dividiti/ck-caffe
      $ ck compile program:caffe-classification
      $ ck run program:caffe-classification
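The program pipeline on this slide (read meta, resolve dependencies, prepare environment, compile, run, with a dictionary going in and a dictionary coming out) can be sketched as a chain of small stages. This is an illustrative stand-in, not CK code: every stage is a stub, the stage names are hypothetical, and the `return` / `error` keys are an assumed convention for signalling success or failure.

```python
def read_meta(state):
    # Stub: a real workflow would load this from the entry's meta.json.
    state["meta"] = {"deps": ["compiler"], "compile_cmd": "build", "run_cmd": "run"}
    return {"return": 0}


def resolve_deps(state):
    installed = state.get("installed", {})
    resolved = {}
    for dep in state["meta"]["deps"]:
        versions = installed.get(dep, [])
        if not versions:
            return {"return": 1, "error": f"dependency not found: {dep}"}
        # The slide says the user is asked when several versions exist;
        # this sketch simply takes the first one.
        resolved[dep] = versions[0]
    state["deps"] = resolved
    return {"return": 0}


def prepare_env(state):
    state["env"] = {f"DEP_{name.upper()}": ver for name, ver in state["deps"].items()}
    return {"return": 0}


def compile_program(state):
    state["compiled"] = True  # placeholder for invoking the compiler
    return {"return": 0}


def run_program(state):
    state["ran"] = True  # placeholder for executing the binary
    return {"return": 0}


STAGES = [read_meta, resolve_deps, prepare_env, compile_program, run_program]


def pipeline(i: dict) -> dict:
    """Run all stages in order; stop at the first stage that reports an error."""
    state = dict(i)
    state["stages_done"] = []
    for stage in STAGES:
        r = stage(state)
        if r["return"] > 0:
            return r
        state["stages_done"].append(stage.__name__)
    state["return"] = 0
    return state


if __name__ == "__main__":
    print(pipeline({"installed": {"compiler": ["gcc-7.0", "llvm-4.0"]}}))
    print(pipeline({"installed": {}}))
```

Keeping every stage on the same dict-in, dict-out signature is what lets stages be reordered or swapped without touching the driver loop.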
  • 17. Automatically adapting workflow to any underlying software and hardware
    Soft entries in CK describe how to detect if a given software is already installed, how to set up all its environment including all paths (to binaries, libraries, include, aux tools, etc), and how to detect its version
      $ ck detect soft --tags=compiler,cuda
      $ ck detect soft:compiler.gcc
      $ ck detect soft:compiler.llvm
      $ ck list soft:compiler*
      $ ck detect soft:lib.cublas
      $ ck search soft --tags=blas
    Env entries are created in the CK local repo for all found software instances together with their meta and an auto-generated environment script env.sh (on Linux) or env.bat (on Windows)
      local / env / 03ca0be16962f471 / env.sh    Tags: compiler,cuda,v8.0
      local / env / 0a5ba198d48e3af3 / env.bat   Tags: lib,blas,cublas,v8.0
      $ ck show env
      $ ck show env --tags=cublas
      $ ck rm env:* --tags=cublas
    Package entries describe how to install a given software if it is not already installed (using CK Python plugin together with install.sh script on Linux host or install.bat on Windows host)
      $ ck install package:caffemodel-bvlc-googlenet
      $ ck install package:imagenet-2012-val
      $ ck install package:lib-tensorflow-cuda
      $ ck install package:lib-caffe-bvlc-master-cuda-universal
      $ ck list package:*caffemodel*
      $ ck list package:*tensorflow*
      $ ck search package --tags=caffe
    https://github.com/ctuning/ck/wiki/Portable-workflows
    Multiple versions of tools can easily co-exist and be plugged into CK workflows!
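Tag-based lookup is what allows several versions of a tool to co-exist: a workflow asks for a set of tags and receives every registered environment that carries all of them. The sketch below illustrates that matching rule only. The first two entries reuse the UIDs and tags shown on the slide; the third entry and the function `find_env` are hypothetical additions for illustration.

```python
ENV_ENTRIES = [
    {"uid": "03ca0be16962f471", "tags": {"compiler", "cuda", "v8.0"}, "script": "env.sh"},
    {"uid": "0a5ba198d48e3af3", "tags": {"lib", "blas", "cublas", "v8.0"}, "script": "env.bat"},
    # Hypothetical third entry, added only to show two compilers side by side.
    {"uid": "hypothetical-gcc-entry", "tags": {"compiler", "gcc", "v7.0"}, "script": "env.sh"},
]


def find_env(entries, tags: str):
    """Return entries whose tag set contains every tag in the comma-separated query."""
    wanted = {t.strip() for t in tags.split(",") if t.strip()}
    return [e for e in entries if wanted <= e["tags"]]


if __name__ == "__main__":
    for query in ("compiler,cuda", "compiler", "blas,openblas"):
        print(query, "->", [e["uid"] for e in find_env(ENV_ENTRIES, query)])
```

A query that matches more than one entry (here `compiler`) is the case where a workflow would have to ask the user which version to use.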
  • 18. Applying methodology from natural sciences to optimize computer systems
    https://github.com/ctuning/ck/wiki/Autotuning
    CK Python modules (wrappers) with a unified JSON API: CK input (JSON/dict) → action → CK output (JSON/dict), wrapping tools (compilers, profilers, etc) and generated files
    Unified input and unified output both expose: behavior, choices, features, state
    Formalized function B of a behavior of any CK object: b = B( c , f , s )
    Flattened CK JSON vectors (dict converted to vector) to simplify statistical analysis, machine learning and data mining
    Chain CK modules to implement research workflows such as multi-objective autotuning and co-design exploration: choose exploration strategy; perform SW/HW DSE (math transforms, skeleton params, compiler flags, transformations …); perform stat. analysis; detect (Pareto) frontier; model behavior, predict optimizations; reduce complexity
    CK program module with pipeline function: set environment for a given tool version; compile program; run code
    First expose coarse-grain high-level choices, features, system state and behavior characteristics
    Crowdsource benchmarking and random exploration across diverse inputs and devices; keep best species (AI/SW/HW choices); model behavior; predict better optimizations and designs
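The "dict converted to vector" step can be illustrated with a short recursive flattening routine. This is a sketch under assumptions: the `##key#subkey` naming is modelled loosely on CK's flat keys and should be treated as illustrative, the function names are hypothetical, and the sample values are made up.

```python
def flatten(d: dict, prefix: str = "#") -> dict:
    """Turn a nested dict into a single-level dict with path-like keys."""
    flat = {}
    for key, value in d.items():
        path = f"{prefix}#{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def to_vector(flat: dict, keys: list) -> list:
    """Project a flat dict onto a fixed key order; missing keys become None."""
    return [flat.get(k) for k in keys]


if __name__ == "__main__":
    # One experiment point split into choices (c), features (f), state (s)
    # and observed behavior (b), following b = B(c, f, s). Values are made up.
    point = {
        "choices": {"batch_size": 2, "threads": 4},
        "features": {"model": "squeezenet"},
        "state": {"cpu_freq_mhz": 1800},
        "behavior": {"time_s": 0.25},
    }
    flat = flatten(point)
    keys = sorted(flat)
    print(keys)
    print(to_vector(flat, keys))
```

Fixing the key order once and projecting every experiment onto it yields rows of equal length, which is the shape that statistical and machine-learning tools expect.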
  • 19. Prepare first proof-of-concept community experiments
    Stack layers covered: AI framework; algorithm / source code; various models; inputs; compilers; available libraries / skeletons; binary or byte code; hardware, simulators; run-time environment; run-time state of the system
    Algorithms: object classification, object detection
    AI frameworks: Caffe CPU, Caffe OpenCL, TensorFlow CPU
    Math libraries: OpenBLAS, ViennaCL, clBLAS, CLBlast, cuBLAS, cuDNN, Eigen, gemmlowp
    Compilers: GCC 5+
    Models: AlexNet, GoogleNet, VGG, ResNet, SqueezeNet, SqueezeDet, SSD
    Datasets: KITTI, COCO, VOC, ImageNet
    Optimization choices: batch size, number of CPU threads
    Characteristics: total execution time (including OpenCL overheads), top1/top5 model accuracy, static model size (MB), device cost, max power consumption (if available)
    System state: CPU/GPU frequency, memory
    cKnowledge.org/repo
  • 20. Crowdsource benchmarking across Android devices provided by volunteers
    Continuously collect statistics, bugs and misclassifications at cKnowledge.org/repo
    Distinct participating platforms: 800+
    Distinct CPUs: 260+
    Distinct GPUs: 110+
    Distinct OS: 280+
    Power range: 1-10W
    No need for a dedicated and expensive cloud: volunteers help us validate research ideas, similar to SETI@HOME
    Also collecting real images from users for misclassifications to build an open and continuously updated training set!
    Winning solutions on various frontiers
    [Plot: time per image (seconds) vs cost (euros)]
  • 21. Crowdsource benchmarking across Android devices provided by volunteers
    Continuously collect statistics, bugs and misclassifications at cKnowledge.org/repo
    Winning solutions on various frontiers; highlighted device: Firefly-RK3399
    Distinct participating platforms: 790+
    Distinct CPUs: 260+
    Distinct GPUs: 110+
    Distinct OS: 280+
    Power range: 1-10W
    No need for a dedicated and expensive cloud: volunteers help us validate research ideas, similar to SETI@HOME
    Also collecting real images from users for misclassifications to build an open and continuously updated training set!
    [Plot: time per image (seconds) vs cost (euros)]
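"Winning solutions on various frontiers" refers to points that no other point beats on both axes at once. A minimal sketch of that filter for the time-versus-cost plot follows; the function name `pareto_frontier` is hypothetical, and the device names and numbers are invented for illustration, not measurements from the talk.

```python
def pareto_frontier(points):
    """Keep points not dominated in (time, cost); lower is better on both axes.

    points: list of (name, time_seconds, cost_euros) tuples.
    """
    frontier = []
    for name, t, c in points:
        dominated = any(
            (t2 <= t and c2 <= c) and (t2 < t or c2 < c)
            for _, t2, c2 in points
        )
        if not dominated:
            frontier.append((name, t, c))
    return sorted(frontier, key=lambda p: p[2])


if __name__ == "__main__":
    # Invented sample data: (device, time per image in seconds, cost in euros).
    samples = [
        ("device-a", 1.0, 100),
        ("device-b", 0.5, 300),
        ("device-c", 1.2, 150),  # slower and pricier than device-a
        ("device-d", 0.5, 400),  # same speed as device-b but pricier
        ("device-e", 0.2, 600),
    ]
    for name, t, c in pareto_frontier(samples):
        print(f"{name}: {t} s, {c} EUR")
```

The quadratic scan is fine for a few hundred devices; adding a third objective such as energy or accuracy only changes the dominance test.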
  • 22. Let's dig further: (crowdsource) BLAS autotuning in Caffe on Firefly-RK3399
    Collaboration between Marco Cianfriglia (Roma Tre University), Cedric Nugteren (TomTom), Flavio Vella, Anton Lokhmotov and Grigori Fursin (dividiti)
    Tunable parameters of OpenCL-based BLAS ( github.com/CNugteren/CLBlast ):
      Name   | Description                                        | Ranges
      KWG    | 2D tiling at workgroup level                       | {32, 64}
      KWI    | KWG kernel-loop can be unrolled by a factor KWI    | {1}
      MDIMA  | Local memory re-shape                              | {4, 8}
      MDIMC  | Local memory re-shape                              | {8, 16, 32}
      MWG    | 2D tiling at workgroup level                       | {32, 64, 128}
      NDIMB  | Local memory re-shape                              | {8, 16, 32}
      NDIMC  | Local memory re-shape                              | {8, 16, 32}
      NWG    | 2D tiling at workgroup level                       | {16, 32}
      SA     | Manual caching using the local memory              | {0, 1}
      SB     | Manual caching using the local memory              | {0, 1}
      STRM   | Striding within single thread for matrix A and C   | {0, 1}
      STRN   | Striding within single thread for matrix B         | {0, 1}
      VWM    | Vector width for loading A and C                   | {8, 16}
      VWN    | Vector width for loading B                         | {0, 1}
    For now only two data sets (small & large)
    Some extra constraints to avoid illegal combinations
    Use different autotuners under CK to speed up design space exploration based on probabilistic focused search, genetic algorithms, deep learning, SVM, KNN, MARS, decision trees …
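The parameter table defines a discrete design space, and the simplest exploration strategy over it is constrained random search. The sketch below uses the ranges exactly as listed on the slide. The legality rule in `is_legal` is only an illustrative example, since the slide does not spell out its constraints, and `fake_measure` is a synthetic stand-in for a real kernel timing; all function names are hypothetical.

```python
import math
import random

# Parameter ranges as listed on the slide.
SPACE = {
    "KWG": [32, 64], "KWI": [1], "MDIMA": [4, 8], "MDIMC": [8, 16, 32],
    "MWG": [32, 64, 128], "NDIMB": [8, 16, 32], "NDIMC": [8, 16, 32],
    "NWG": [16, 32], "SA": [0, 1], "SB": [0, 1], "STRM": [0, 1],
    "STRN": [0, 1], "VWM": [8, 16], "VWN": [0, 1],
}


def space_size() -> int:
    """Number of combinations before any constraint is applied."""
    return math.prod(len(values) for values in SPACE.values())


def is_legal(cfg: dict) -> bool:
    # Illustrative constraint only: the workgroup tile must be divisible by
    # (threads per dimension * vector width). The slide lists no concrete rules.
    return cfg["MWG"] % (cfg["MDIMC"] * cfg["VWM"]) == 0


def sample(rng: random.Random) -> dict:
    """Draw random configurations until one satisfies the constraint."""
    while True:
        cfg = {name: rng.choice(values) for name, values in SPACE.items()}
        if is_legal(cfg):
            return cfg


def random_search(measure, trials: int, seed: int = 0):
    """Return the best (config, cost) seen over a number of random trials."""
    rng = random.Random(seed)
    best_cfg, best_cost = None, math.inf
    for _ in range(trials):
        cfg = sample(rng)
        cost = measure(cfg)
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost


def fake_measure(cfg: dict) -> float:
    # Synthetic cost; a real autotuner would compile and time the kernel here.
    return 1.0 / (cfg["MWG"] * cfg["NWG"]) + 0.001 * cfg["SA"]


if __name__ == "__main__":
    print("unconstrained combinations:", space_size())
    print(random_search(fake_measure, trials=50))
```

Even this small table yields tens of thousands of combinations per data set, which is why the slide proposes model-guided search strategies instead of exhaustive enumeration.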
  • 23. Let's dig further: autotuning BLAS (CLBlast) in Caffe on Firefly-RK3399
    • Caffe with autotuned OpenBLAS (threads and batches) is the fastest
    • Caffe with autotuned CLBlast is 6-7x faster than the default version and competitive with the OpenBLAS-based version; it is now worth making adaptive selection at run-time
    Sharing results in a reproducible way with the community for validation and improvement: https://nbviewer.jupyter.org/github/dividiti/ck-caffe-firefly-rk3399/blob/master/script/batch_size-libs-models/analysis.20170531.ipynb
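Adaptive selection at run-time could be as simple as a lookup in previously collected timings. The sketch below is one possible shape for such a selector, not the authors' implementation: the function `select_library` is hypothetical and the timing table contains invented placeholder numbers, not results from the talk.

```python
# Invented placeholder timings (seconds per image) keyed by library and batch size.
TIMINGS = {
    "openblas": {1: 0.30, 4: 0.28, 16: 0.27},
    "clblast": {1: 0.40, 4: 0.29, 16: 0.22},
}


def select_library(batch_size: int, timings: dict = TIMINGS) -> str:
    """Pick the library with the lowest recorded time at the nearest measured batch size."""

    def time_for(lib: str) -> float:
        table = timings[lib]
        nearest = min(table, key=lambda b: abs(b - batch_size))
        return table[nearest]

    return min(timings, key=time_for)


if __name__ == "__main__":
    for batch in (1, 2, 4, 12, 16):
        print(batch, "->", select_library(batch))
```

A table like this would be refreshed from crowdsourced measurements per device, so the selector adapts as new results arrive.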
  • 24. Join the growing Collective Knowledge community!
    • Bring together industry and academia to participate in open and reproducible AI/SW/HW co-design competitions using the CK framework
    • Share more artifacts, workflows and results in a reusable and customizable CK format (common JSON API and meta description)
    • Collaboratively improve models and find missing features
    • Gradually expose more design and optimization knobs at all AI/SW/HW levels
    • Enable distributed on-line learning for self-optimizing and self-learning systems
    http://cKnowledge.org/partners
    http://cKnowledge.org/publications