Tutorial:
Deep Learning Implementations
and Frameworks
Seiya Tokui*, Kenta Oono*, Atsunori Kanemura+, Toshihiro Kamishima+
*Preferred Networks, Inc. (PFN)
{tokui,oono}@preferred.jp
+National Institute of Advanced Industrial Science and Technology (AIST)
atsu-kan@aist.go.jp, mail@kamishima.net
Overview of this tutorial
• 1st session (KO, 8:30 ‒ 10:00)
  • Introduction
  • Basics of neural networks
  • Common design of neural network implementations
• 2nd session (ST, 10:30 ‒ 12:30)
  • Differences of deep learning frameworks
  • Coding examples of frameworks
  • Conclusion
Common Design of
Deep Learning Frameworks
Kenta Oono <oono@preferred.jp>
Preferred Networks Inc.
2016/4/19 DLIF Tutorial @ PAKDD2016
Objective of this part
• How deep learning frameworks represent various neural networks.
• How deep learning frameworks realize the training procedure of neural networks.
• The technology stack that is common to most deep learning frameworks.
Steps for training neural networks
Prepare the training dataset
Initialize the Neural Network (NN) parameters
Repeat until meeting some criterion:
  Prepare for the next (mini)batch
  Define how to compute the loss of this batch
  Compute the loss (forward prop)
  Compute the gradient (backprop)
  Update the NN parameters
Save the NN parameters
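As a concrete illustration of these steps, here is a minimal sketch (plain NumPy, not taken from the tutorial) that trains a logistic-regression "network" with minibatch SGD:

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(1000, 20)                            # prepare the training dataset
t = (X[:, 0] + X[:, 1] > 0).astype(np.float64)

W, b = np.zeros(20), 0.0                           # initialize the NN parameters
lr, batchsize = 0.1, 100

for epoch in range(10):                            # repeat until meeting some criterion
    perm = rng.permutation(len(X))
    for i in range(0, len(X), batchsize):          # prepare for the next (mini)batch
        xb, tb = X[perm[i:i + batchsize]], t[perm[i:i + batchsize]]
        y = 1.0 / (1.0 + np.exp(-(xb.dot(W) + b))) # compute the loss (forward prop)
        y = np.clip(y, 1e-7, 1 - 1e-7)
        loss = -np.mean(tb * np.log(y) + (1 - tb) * np.log(1 - y))
        gy = (y - tb) / len(xb)                    # compute the gradient (backprop)
        gW, gb = xb.T.dot(gy), gy.sum()
        W -= lr * gW                               # update the NN parameters
        b -= lr * gb

np.savez('model.npz', W=W, b=b)                    # save the NN parameters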
Technology stack of DL framework
name | functions | example
Graphical interface | — | DIGITS, TensorBoard
Machine learning workflow management | dataset management, training loop | Keras, Lasagne, Blocks, TF Learn
Computational graph management | build computational graph, forward prop / backprop | Theano, TensorFlow, Torch.nn
Multi-dimensional array library | linear algebra | NumPy, CuPy, Eigen, torch (core)
Numerical computation package | matrix operation, convolution | BLAS, cuBLAS, cuDNN
Hardware | — | CPU, GPU
Technology stack of DL framework
name | functions | example
Graphical interface | — | DIGITS, TensorBoard
Machine learning workflow management | dataset management, training loop | Keras, Lasagne, Blocks, TF Learn
Computational graph management | build computational graph, forward prop / backprop | Theano, TensorFlow, Torch.nn
Multi-dimensional array library | linear algebra | NumPy, CuPy, Eigen, torch (core)
Numerical computation package | matrix operation, convolution | BLAS, cuBLAS, cuDNN
Hardware | — | CPU, GPU
Neural Network as a Computational Graph
• In its simplest form, an NN is represented as a computational graph (CG): a stack of bipartite DAGs (directed acyclic graphs) consisting of data nodes and operator nodes.
• Example:
y = x1 * x2
z = y - x3
corresponds to the graph in which the data nodes x1 and x2 feed the operator node mul to produce the data node y, and y and x3 feed the operator node sub to produce z.
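A minimal sketch (not from the slides) of how such a graph could be represented in code: data nodes hold values, operator nodes reference their input and output data nodes.

class DataNode:
    def __init__(self, name, value=None):
        self.name, self.value = name, value

class OperatorNode:
    def __init__(self, op, inputs, output):
        self.op, self.inputs, self.output = op, inputs, output
    def forward(self):
        self.output.value = self.op(*[x.value for x in self.inputs])

x1, x2, x3 = DataNode('x1', 2.0), DataNode('x2', 3.0), DataNode('x3', 1.0)
y, z = DataNode('y'), DataNode('z')
graph = [OperatorNode(lambda a, b: a * b, [x1, x2], y),   # y = x1 * x2
         OperatorNode(lambda a, b: a - b, [y, x3], z)]    # z = y - x3
for node in graph:   # execute operator nodes in topological order
    node.forward()
print(z.value)       # 5.0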
Example: Multi-layer Perceptron (MLP)
Graph: x → Affine(W1, b1) → h1 → ReLU → a1 → Affine(W2, b2) → h2 → ReLU → a2 → Softmax → y → cross-entropy loss against the target t.
It is a choice of implementation whether the CG includes the weights and biases (W1, b1, W2, b2) as data nodes.
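A NumPy sketch of this forward pass (shapes and initialization are illustrative only):

import numpy as np

def relu(a):
    return np.maximum(a, 0)

def softmax(a):
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

x = np.random.randn(32, 784)                       # minibatch of inputs
t = np.random.randint(0, 10, size=32)              # target labels
W1, b1 = 0.01 * np.random.randn(784, 100), np.zeros(100)
W2, b2 = 0.01 * np.random.randn(100, 10), np.zeros(10)

a1 = relu(x.dot(W1) + b1)                          # Affine (W1, b1) + ReLU
a2 = relu(a1.dot(W2) + b2)                         # Affine (W2, b2) + ReLU
y = softmax(a2)                                    # Softmax
loss = -np.mean(np.log(y[np.arange(len(t)), t]))   # cross-entropy loss against t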
Example: Recurrent Neural Network (RNN)
Graph: starting from the initial state h0, the RNN unit consumes x1 to produce h1, then consumes x2 to produce h2, and so on up to xT and hT.
The RNN unit can be:
• Affine + activation function
• LSTM (Long Short-Term Memory)
• GRU (Gated Recurrent Unit)
Each application of the unit computes ht from xt and ht-1 using the shared parameters W and b.
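A sketch of the simplest RNN unit (Affine + activation), applied repeatedly with shared parameters W (split here into Wx and Wh) and b; dimensions are illustrative.

import numpy as np

D, H = 50, 100                                    # input and hidden sizes
Wx, Wh = 0.01 * np.random.randn(D, H), 0.01 * np.random.randn(H, H)
b = np.zeros(H)

def rnn_unit(x_t, h_prev):
    # h_t = tanh(x_t Wx + h_{t-1} Wh + b)
    return np.tanh(x_t.dot(Wx) + h_prev.dot(Wh) + b)

xs = [np.random.randn(1, D) for _ in range(10)]   # x1, ..., xT
h = np.zeros((1, H))                              # h0
for x_t in xs:                                    # the unrolled graph reuses the same unit
    h = rnn_unit(x_t, h)                          # h1, h2, ..., hT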
Example: Stacked RNN
Graph: the first RNN layer maps x1, ..., xT (from the initial state h0) to the hidden states h1, ..., hT; a second RNN layer stacked on top maps h1, ..., hT (from the initial state z0) to z1, ..., zT; an Affine + Softmax layer on top produces the output y.
Example: RNN with control flow nodes
Graph (schematic): a loop-enter node receives the initial state h0 and the input x; a predicate node decides whether to continue; a switch node routes the state s either back through the RNN unit (pred = True), which updates it to s', or to the loop-end node (pred = False), which emits the output y. The RNN unit uses the shared parameters W and b.
• TensorFlow has control flow nodes (e.g. cond, switch, while).
• Since the CG contains a loop, some mechanism is necessary that resolves the dependencies between nodes and schedules the order of calculation.
Automatic Differentiation
• Computes the gradient of specified data nodes (e.g. the loss) with respect to each data node.
• Each operator node must have a backward operation that calculates gradients w.r.t. its inputs from gradients w.r.t. its outputs (a realization of the chain rule).
• e.g. The Function class of Chainer has a backward method.
• e.g. Each layer class of Caffe has Backward_cpu and Backward_gpu methods.
• e.g. Autograd wraps most NumPy functions with thin wrappers that add the gradient computation as a closure.
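A sketch of such backward operations, modelled loosely on Chainer's Function style (illustrative code, not any framework's actual implementation):

class Mul:
    def forward(self, x1, x2):
        self.x1, self.x2 = x1, x2          # keep inputs for the backward pass
        return x1 * x2
    def backward(self, gy):
        # chain rule: d(x1*x2)/dx1 = x2, d(x1*x2)/dx2 = x1
        return gy * self.x2, gy * self.x1

class Sub:
    def forward(self, y, x3):
        return y - x3
    def backward(self, gz):
        return gz, -gz                     # d(y-x3)/dy = 1, d(y-x3)/dx3 = -1

mul, sub = Mul(), Sub()
y = mul.forward(2.0, 3.0)                  # forward: y = x1 * x2
z = sub.forward(y, 1.0)                    #          z = y - x3
gy, gx3 = sub.backward(1.0)                # backward, starting from dz/dz = 1
gx1, gx2 = mul.backward(gy)                # gx1 = 3.0, gx2 = 2.0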
Backprop through CG
y = x1 * x2
z = y - x3
Starting from ∇z z = 1 at the output, gradients flow backward through the graph: the sub node yields ∇y z (= 1) and ∇x3 z (= -1), and the mul node yields ∇x1 z (= x2) and ∇x2 z (= x1).
Backprop as extended graphs
y = x1 * x2
z = y - x3
The backward pass can itself be expressed by extending the forward graph with new nodes: dz feeds an identity node to give dy (since ∂z/∂y = 1) and a negation node to give dx3 (∂z/∂x3 = -1); dy and x2 feed a mul node to give dx1, and dy and x1 feed another mul node to give dx2. Forward propagation and backward propagation are then both ordinary computational graphs.
Example: Theano
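The original slide shows a Theano code example that is not reproduced in this text. A minimal sketch of what such an example might look like, using Theano's public API on the running example:

import theano
import theano.tensor as T

x1, x2, x3 = T.dscalars('x1', 'x2', 'x3')   # symbolic data nodes
y = x1 * x2                                 # operator nodes added by overloaded operators
z = y - x3

# T.grad extends the graph with backward nodes (automatic differentiation)
gx1, gx2, gx3 = T.grad(z, [x1, x2, x3])

# compiling applies Theano's graph optimizations before generating code
f = theano.function([x1, x2, x3], [z, gx1, gx2, gx3])
print(f(2.0, 3.0, 1.0))   # z = 5.0, dz/dx1 = 3.0, dz/dx2 = 2.0, dz/dx3 = -1.0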
Technology stack of DL framework
name | functions | example
Graphical interface | — | DIGITS, TensorBoard
Machine learning workflow management | dataset management, training loop | Keras, Lasagne, Blocks, TF Learn
Computational graph management | build computational graph, forward prop / backprop | Theano, TensorFlow, Torch.nn
Multi-dimensional array library | linear algebra | NumPy, CuPy, Eigen, torch (core)
Numerical computation package | matrix operation, convolution | BLAS, cuBLAS, cuDNN
Hardware | — | CPU, GPU
Numerical optimizer
• Many gradient-based optimization algorithms are implemented.
• Stochastic Gradient Descent (SGD) is implemented in most DL frameworks.
• Which optimizer works best depends on the concrete task.

w: parameters of the neural network
θ: state of the optimizer
L: loss function
Γ: optimizer-specific update function

initialize w, θ
until the criterion is met:
  get data (x, y)
  calculate ∇w L(x, y; w)
  w, θ ← Γ(w, θ, ∇w L)
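A minimal sketch of the update rule Γ for plain SGD and for SGD with momentum (here the optimizer state θ is the velocity v; this is illustrative, not any framework's optimizer class):

import numpy as np

def sgd(w, grad, lr=0.01):
    return w - lr * grad                 # w ← w - lr * ∇w L

def momentum_sgd(w, v, grad, lr=0.01, mu=0.9):
    v = mu * v - lr * grad               # update the optimizer state θ (the velocity)
    return w + v, v                      # update the parameters w

w, v = np.zeros(10), np.zeros(10)
grad = np.random.randn(10)               # stands in for ∇w L(x, y; w)
w, v = momentum_sgd(w, v, grad)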
Serialization
• Save/load snapshots of the training process in a specified format (e.g. HDF5, npz, protobuf):
  • the model being trained (= architecture and parameters of the NN)
  • the state of the training procedure (e.g. epoch, learning rate, momentum)
• Serialization enhances the portability of models:
  • publishing pre-trained models (e.g. Caffe's Model Zoo; MXNet and TensorFlow also distribute pre-trained models)
  • importing pre-trained models from other DL frameworks
  • e.g. Chainer supports the BVLC-official reference models of Caffe.
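A sketch of saving and loading a snapshot in npz format (parameters plus some optimizer/loop state); real frameworks serialize richer objects, e.g. into HDF5 or protobuf:

import numpy as np

snapshot = {'W1': np.random.randn(784, 100), 'b1': np.zeros(100),
            'epoch': np.array(3), 'learning_rate': np.array(0.01)}
np.savez('snapshot.npz', **snapshot)      # save the snapshot to disk

loaded = np.load('snapshot.npz')          # load it back later or on another machine
W1, epoch = loaded['W1'], int(loaded['epoch'])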
Computational optimizer
• Converts CGs into simplified, more efficient ones (e.g. Theano's graph optimizations).
y = x1 * x2
z = y - x3
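A toy sketch of such a graph rewrite (not Theano's actual machinery): expressions are nested tuples ('op', arg1, arg2), and the rule x * 1 → x simplifies the graph.

def simplify(expr):
    if not isinstance(expr, tuple):
        return expr
    op, a, b = expr[0], simplify(expr[1]), simplify(expr[2])
    if op == 'mul' and b == 1:            # rewrite rule: x * 1 -> x
        return a
    return (op, a, b)

expr = ('sub', ('mul', 'x1', 1), 'x3')    # (x1 * 1) - x3
print(simplify(expr))                     # ('sub', 'x1', 'x3')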
Abstraction of ML workflow
• Offers typical training/validation/evaluation procedures as APIs.
• Users call a single API and do not have to write the procedure manually.
• e.g. the fit and evaluate methods of the Model class in Keras (see the sketch after the steps below).
Prepare the training dataset
Initialize the Neural Network (NN) parameters
Repeat until meeting some criterion:
  Prepare for the next (mini)batch
  Define how to compute the loss of this batch
  Compute the loss (forward prop)
  Compute the gradient (backprop)
  Update the NN parameters
Save the NN parameters
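In Keras, for example, the whole loop above collapses into a few calls. The following is a rough sketch; argument names and defaults vary between Keras versions.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation

X = np.random.randn(1000, 20)
Y = np.random.randint(0, 2, size=(1000, 1))

model = Sequential([Dense(64, input_dim=20), Activation('relu'),
                    Dense(1), Activation('sigmoid')])
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])

model.fit(X, Y, batch_size=32)        # runs the whole training loop
loss, acc = model.evaluate(X, Y)      # runs the evaluation procedure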
Graphical interface
• Computational graph management
  • editor, visualizer
• Visualization of the training procedure
  • visualization of feature maps, outputs of NNs, etc.
  • curves of error and accuracy over training
• Performance monitoring
  • e.g. throughput, latency, memory usage
Technology stack of DL framework
name | functions | example
Graphical interface | — | DIGITS, TensorBoard
Machine learning workflow management | dataset management, training loop | Keras, Lasagne, Blocks, TF Learn
Computational graph management | build computational graph, forward prop / backprop | Theano, TensorFlow, Torch.nn
Multi-dimensional array library | linear algebra | NumPy, CuPy, Eigen, torch (core)
Numerical computation package | matrix operation, convolution | BLAS, cuBLAS, cuDNN
Hardware | — | CPU, GPU
GPU support
• CUDA: computing platform for GPGPU on NVIDIA GPUs
  • language extension, compiler, libraries, etc.
• DL frameworks provide wrappers around CUDA:
  • GPU-array libraries that utilize cuBLAS, cuRAND, etc.
  • layer implementations based on cuDNN (e.g. convolution, sigmoid, LSTM)
• Designed so that CPU and GPU can be switched easily:
  • e.g. users can write CPU/GPU-agnostic code
  • e.g. CPU/GPU can be switched via environment variables
• Some frameworks support OpenCL as a GPU environment, but CUDA is more popular for now.
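A sketch of CPU/GPU-agnostic code in the NumPy/CuPy style: the same function runs on either device because CuPy mirrors the NumPy API (this assumes CuPy is installed for the GPU path and falls back to NumPy otherwise).

import numpy as np

def get_array_module(x):
    try:
        import cupy
        return cupy.get_array_module(x)   # numpy or cupy, depending on where x lives
    except ImportError:
        return np

def affine(x, W, b):
    xp = get_array_module(x)              # xp is numpy for CPU arrays, cupy for GPU arrays
    return xp.dot(x, W) + b

x = np.random.randn(32, 784).astype(np.float32)
W, b = np.random.randn(784, 100).astype(np.float32), np.zeros(100, dtype=np.float32)
h = affine(x, W, b)                       # the same code would accept CuPy arrays on a GPU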
Multi-dimensional array library (CPU / GPU)
• In charge of the concrete computation on data nodes.
• Heavily depends on BLAS (CPU) or CUDA / the CUDA Toolkit libraries (GPU).
• CPU
  • third-party libraries: Eigen::Tensor, NumPy
  • written from scratch: ND4J (DL4J), mshadow (MXNet)
• GPU
  • third-party libraries: Eigen::Tensor, PyCUDA, gpuarray
  • written from scratch: ND4J (DL4J), mshadow (MXNet), CuPy (Chainer)
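Each graph operation ultimately lowers to one or a few calls into the numerical computation package below this layer; for instance (illustrative NumPy code), an Affine forward is a single matrix multiplication, which NumPy dispatches to BLAS (GEMM) and a GPU array library would dispatch to cuBLAS:

import numpy as np

x = np.random.randn(32, 784).astype(np.float32)   # minibatch held by a data node
W = np.random.randn(784, 100).astype(np.float32)
b = np.zeros(100, dtype=np.float32)
h = x.dot(W) + b                                   # one GEMM plus a broadcast add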
Which device to use?
• The GPU is (by far) faster than the CPU in most cases.
  • Most tensor calculations consist of element-wise operations, matrix multiplications, and convolutions.
• Exceptional cases:
  • The mini-batch technique is difficult to apply.
    • e.g. variable-length training data
    • e.g. the architecture of the NN depends on the training data
  • GPU calculation cannot hide the cost of transferring data to the GPU.
    • e.g. the minibatch size is too small
Technology stack of Chainer
name | Chainer's stack
Graphical interface | —
Machine learning workflow management | —
Computational graph management | Chainer
Multi-dimensional array library | NumPy (CPU), CuPy (GPU)
Numerical computation package | BLAS (CPU); cuBLAS, cuRAND, cuDNN (GPU)
Hardware | CPU, GPU
Technology stack of TensorFlow
name | TensorFlow's stack
Graphical interface | TensorBoard
Machine learning workflow management | TF Learn
Computational graph management | TensorFlow
Multi-dimensional array library | Eigen::Tensor
Numerical computation package | BLAS (CPU); cuBLAS, cuRAND, cuDNN (GPU)
Hardware | CPU, GPU
Technology stack of Theano
name | Theano's stack
Graphical interface | —
Machine learning workflow management | —
Computational graph management | Theano
Multi-dimensional array library | NumPy (CPU), libgpuarray (GPU)
Numerical computation package | BLAS (CPU); CUDA, OpenCL, CUDA Toolkit (GPU)
Hardware | CPU, GPU
Technology stack of Keras
name | Keras's stack
Graphical interface | —
Machine learning workflow management | Keras
Computational graph management and below | Theano or TensorFlow: Keras delegates to the technology stack of whichever backend is selected
Hardware | CPU, GPU
Summary
• Most DL frameworks have many components in common and can be organized into a similar technology stack.
• At the upper layers of the stack, frameworks are designed to help users follow typical ML workflows.
• At the middle layers, manipulations of computational graphs are automated.
• At the lower layers, optimized tensor calculations are implemented.
• How these components are realized differs between frameworks, as we will see in the following part.
memorandum
Training of Neural Networks
• L is designed so that its value gets smaller as the prediction becomes more accurate.
• In the deep learning context:
  • L is represented by a neural network
  • w are the parameters of the neural network

argmin_w Σ_{(x, y)} L(x, y; w)
w: parameters
x: feature vector
y: training label
L: loss function
e.g. a classification problem
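As a concrete instance (illustrative NumPy code, not from the slides), the softmax cross-entropy loss of a linear classifier averaged over training pairs (x, y):

import numpy as np

def loss(W, b, X, y):
    logits = X.dot(W) + b
    logits = logits - logits.max(axis=1, keepdims=True)
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(log_p[np.arange(len(y)), y])   # smaller when predictions are more accurate

X = np.random.randn(100, 20)
y = np.random.randint(0, 3, size=100)
W, b = np.zeros((20, 3)), np.zeros(3)
print(loss(W, b, X, y))                            # ≈ log 3 for the untrained (uniform) predictor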
Layer = function + data nodes
• Layers (e.g. fully connected layers, convolutional layers) can be considered as functions with parameters to be optimized.
• In most modern frameworks, the parameters of layers can be considered as data nodes in the computational graph.
• Frameworks need to differentiate which data nodes are parameters to be optimized and which are data points.
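A sketch of a layer as "function + parameter data nodes" (illustrative code, not any framework's actual class): the layer owns parameter nodes that the optimizer updates, and acts as a function of its input data node.

import numpy as np

class Linear:
    def __init__(self, n_in, n_out):
        # parameters: data nodes that are targets of optimization
        self.params = {'W': 0.01 * np.random.randn(n_in, n_out), 'b': np.zeros(n_out)}
    def __call__(self, x):
        return x.dot(self.params['W']) + self.params['b']

layer = Linear(784, 100)
x = np.random.randn(32, 784)   # x is a data node holding a data point, not a parameter
h = layer(x)                   # an optimizer would update layer.params but never x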
Execution Engine
• It resolves the dependencies between data nodes and schedules the execution of the parts of the computational graph (especially in multi-node or multi-GPU settings).
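A toy sketch of such scheduling (illustrative only): count the dependencies of each node and run a node as soon as all of its inputs are ready, i.e. execute the CG in a topological order.

from collections import deque

# graph for y = x1 * x2, z = y - x3, given as node -> list of input nodes
deps = {'x1': [], 'x2': [], 'x3': [], 'y': ['x1', 'x2'], 'z': ['y', 'x3']}

def schedule(deps):
    remaining = {n: len(ins) for n, ins in deps.items()}
    users = {n: [] for n in deps}
    for n, ins in deps.items():
        for i in ins:
            users[i].append(n)
    ready = deque(n for n, c in remaining.items() if c == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)                  # a real engine would launch the computation here
        for u in users[n]:
            remaining[u] -= 1
            if remaining[u] == 0:
                ready.append(u)
    return order

print(schedule(deps))                    # e.g. ['x1', 'x2', 'x3', 'y', 'z']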