Comparison of deep learning frameworks from a viewpoint of double backpropagation

Preferred Networks, Inc.
Kenta Oono <oono@preferred.jp>
Chainer Meetup #6@Preferred Networks
Sep. 30th 2017

Agenda
• Technology stack of DL frameworks
• Design choices in DL frameworks
• Double backprop primer
• Coding examples of double backprop in Chainer, PyTorch, and TF

Technology stack of a DL framework
name | functions | example
Graphical visualization | - | DIGITS, TensorBoard
Machine learning workflow management | Dataset prep, Save/Load, Training loop | Keras, TF slim
Computational graph (CG) management | Build/Optimize CGs, Forward/Back prop | Theano, TensorFlow, Torch.nn
Multi-dimensional array processing | High-level array manipulation | NumPy, CuPy, Eigen, Torch (core)
Numerical computation | Matrix operation, Convolution | BLAS (OpenBLAS, MKL), cuBLAS, cuDNN, MKL-DNN
Computational device | - | CPU, GPU, TPU, FPGA

Technology stack of Chainer
name | components
Computational graph management | Chainer
Multi-dimensional array processing | NumPy, CuPy
Numerical computation | BLAS, cuBLAS, cuRAND, cuDNN
Computational device | CPU, GPU

Technology stack of TensorFlow
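name | components
Graphical visualization | TensorBoard
Machine learning workflow management | TF slim, Keras
Computational graph management | TensorFlow
Multi-dimensional array processing | Eigen::Tensor
Numerical computation | BLAS, cuBLAS, cuRAND, cuDNN
Computational device | CPU, GPU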

Technology stack of Theano
name | components
Machine learning workflow management | Keras, Lasagne, Blocks, etc.
Computational graph management | Theano
Multi-dimensional array processing | NumPy, libgpuarray
Numerical computation | BLAS, CUDA, OpenCL, CUDA Toolkit
Computational device | CPU, GPU

Technology stack of Keras
name | components
Machine learning workflow management | Keras
Computational graph management and below | TensorFlow or Theano (the technology stack of TF or the technology stack of Theano)

Important Design Choices through a user’s typical workflow
Coding
• Write NNs (in which language?)
• Compute backprop (how?)
• Update parameters (how to represent? how to update?)
Execution
• Run user code (when?)
Improvement
• Optimize CG (how?)
• Scale up training (how?)

http://bit.ly/aaai-dlif

Neural Network as a Computational Graph
• In most frameworks, an NN is conceptualized as a computational graph (CG).
• The simplest form of a CG is a bipartite DAG (Directed Acyclic Graph) consisting of data nodes and operator nodes.
Example:
y = x1 * x2
z = y - x3
Graph: x1, x2 -> mul -> y; y, x3 -> sub -> z
Data nodes: x1, x2, x3, y, z
Operator nodes: mul, sub
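The bipartite structure can be sketched in a few lines of plain Python. This is only an illustration: the class names `DataNode` and `OperatorNode` and the helper `evaluate` are hypothetical and do not correspond to the API of any framework mentioned in these slides. Edges only ever connect a data node to an operator node or vice versa, which is what makes the graph bipartite.

```python
# Hypothetical minimal classes for illustration; not any framework's API.
class DataNode:
    """Holds a value; `creator` is the operator node that produced it."""
    def __init__(self, name, value=None, creator=None):
        self.name = name
        self.value = value
        self.creator = creator  # None for graph inputs


class OperatorNode:
    """Applies `fn` to its input data nodes and yields one output data node."""
    def __init__(self, name, fn, inputs):
        self.name = name
        self.fn = fn
        self.inputs = inputs  # list of DataNode

    def output(self, name):
        return DataNode(name, creator=self)


def evaluate(node):
    """Recursively compute the value of a data node from the graph inputs."""
    if node.creator is not None and node.value is None:
        args = [evaluate(d) for d in node.creator.inputs]
        node.value = node.creator.fn(*args)
    return node.value


# y = x1 * x2; z = y - x3 (input values are arbitrary)
x1, x2, x3 = DataNode("x1", 2.0), DataNode("x2", 3.0), DataNode("x3", 1.0)
y = OperatorNode("mul", lambda a, b: a * b, [x1, x2]).output("y")
z = OperatorNode("sub", lambda a, b: a - b, [y, x3]).output("z")
result = evaluate(z)  # 2.0 * 3.0 - 1.0
```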

Multi Layer Perceptron (MLP)
x -> Affine (W1, b1) -> h1 -> ReLU -> a1 -> Affine (W2, b2) -> h2 -> ReLU -> a2 -> Softmax -> prob
prob, t -> Cross Entropy -> loss
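The forward pass of this graph can be written out operator by operator. The following is a minimal sketch in plain Python with lists standing in for arrays. The function names and the parameter values are illustrative assumptions, and a real framework would run the same steps on NumPy or CuPy arrays.

```python
import math


def affine(x, W, b):
    """Compute W x + b for a vector x, a matrix W (list of rows), and a bias b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(h):
    return [max(0.0, v) for v in h]


def softmax(a):
    m = max(a)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(prob, t):
    """Negative log-likelihood of the true class index t."""
    return -math.log(prob[t])


def mlp_loss(x, t, W1, b1, W2, b2):
    """Follow the graph: Affine -> ReLU -> Affine -> ReLU -> Softmax -> Cross Entropy."""
    h1 = affine(x, W1, b1)
    a1 = relu(h1)
    h2 = affine(a1, W2, b2)
    a2 = relu(h2)
    prob = softmax(a2)
    return prob, cross_entropy(prob, t)


# Illustrative values only: 2 inputs, 2 hidden units, 2 classes.
W1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
W2, b2 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
prob, loss = mlp_loss([1.0, 1.0], 0, W1, b1, W2, b2)
```

With these identity weights both classes receive the same score, so `prob` is uniform and `loss` equals log 2.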

How to compute backprop
Backprop through graphs
The framework builds graphs only for forward prop, and does backprop by backtracking through those graphs.
E.g. Torch.nn, Caffe
Graph: a, b -> mul -> y; y, c -> sub -> z
Backprop as extended graphs
The framework builds graphs for backprop as well as for forward prop.
E.g. Theano, MXNet, TensorFlow, Chainer, PyTorch
Forward graph: a, b -> mul -> y; y, c -> sub -> z
Backprop graph: gz (= ∇z z = 1) -> id -> gy (= ∇y z); gz -> neg -> gc; gy, b -> mul -> ga (= ∇a z); gy, a -> mul -> gb
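As a rough illustration of the first approach (backprop through graphs), the sketch below records only the forward graph and computes gradients by walking it backwards. The classes `Var`, `Mul`, and `Sub` and the function `backprop` are hypothetical names chosen for this example. The key point is that the gradients are plain floats, so no graph exists for the backward computation itself.

```python
# Hypothetical minimal classes for illustration; not any framework's API.
class Var:
    def __init__(self, value, creator=None):
        self.value = value
        self.creator = creator  # operator that produced this variable, or None
        self.grad = 0.0         # plain float: gradients are NOT graph nodes


class Mul:
    def __call__(self, a, b):
        self.inputs = (a, b)
        return Var(a.value * b.value, creator=self)

    def backward(self, gy):
        a, b = self.inputs
        return gy * b.value, gy * a.value


class Sub:
    def __call__(self, a, b):
        self.inputs = (a, b)
        return Var(a.value - b.value, creator=self)

    def backward(self, gy):
        return gy, -gy


def backprop(out):
    """Backtrack the forward graph from `out`, accumulating float gradients."""
    order, seen = [], set()

    def visit(v):  # post-order DFS: inputs are listed before their consumers
        if id(v) in seen:
            return
        seen.add(id(v))
        if v.creator is not None:
            for inp in v.creator.inputs:
                visit(inp)
        order.append(v)

    visit(out)
    out.grad = 1.0
    for v in reversed(order):  # consumers first, so v.grad is complete here
        if v.creator is not None:
            for inp, g in zip(v.creator.inputs, v.creator.backward(v.grad)):
                inp.grad += g


# z = a * b - c (input values are arbitrary)
a, b, c = Var(2.0), Var(3.0), Var(4.0)
y = Mul()(a, b)
z = Sub()(y, c)
backprop(z)  # a.grad = b.value, b.grad = a.value, c.grad = -1
```

Because `a.grad` is a float rather than a node, there is nothing to backtrack a second time, which is why this design cannot express backprop of backprop without extra work.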

How to compute backprop
Backprop through graphs
• Easy and simple to implement: backprop computation need not be defined as graphs.
• Low flexibility: features available for graphs may not apply to backprop computations.
Backprop as extended graphs
• Implementation gets complicated.
• High flexibility: any features available for graphs can also be applied to backprop computations (e.g. backprop of backprop).
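To make the "backprop of backprop" point concrete, here is a minimal sketch of the second approach in plain Python. The names `Var`, `mul`, `sub`, `neg`, `add`, and `grad` are hypothetical and not taken from any framework. Each operator's backward rule is written with the same graph-building operators, so the returned gradient is itself a node of an extended graph and can be differentiated again.

```python
# Hypothetical minimal engine for illustration; not any framework's API.
class Var:
    def __init__(self, value, inputs=(), grad_fn=None):
        self.value = value
        self.inputs = inputs    # parent Vars (empty for leaves)
        self.grad_fn = grad_fn  # maps an output-grad Var to input-grad Vars


def mul(a, b):
    return Var(a.value * b.value, (a, b), lambda g: (mul(g, b), mul(g, a)))


def neg(a):
    return Var(-a.value, (a,), lambda g: (neg(g),))


def sub(a, b):
    # The first input receives g unchanged (the "id" operator in the diagram).
    return Var(a.value - b.value, (a, b), lambda g: (g, neg(g)))


def add(a, b):
    return Var(a.value + b.value, (a, b), lambda g: (g, g))


def grad(out, wrt):
    """Return d(out)/d(wrt) as a Var, i.e. as part of an extended graph."""
    order, seen = [], set()

    def visit(v):  # post-order DFS: inputs are listed before their consumers
        if id(v) not in seen:
            seen.add(id(v))
            for p in v.inputs:
                visit(p)
            order.append(v)

    visit(out)
    grads = {id(out): Var(1.0)}
    for v in reversed(order):
        if v.grad_fn is None or id(v) not in grads:
            continue
        for p, g in zip(v.inputs, v.grad_fn(grads[id(v)])):
            grads[id(p)] = add(grads[id(p)], g) if id(p) in grads else g
    return grads.get(id(wrt), Var(0.0))


# z = a * b - c (input values are arbitrary)
a, b, c = Var(2.0), Var(3.0), Var(4.0)
z = sub(mul(a, b), c)
ga = grad(z, a)    # first-order gradient, equal to b
gc = grad(z, c)    # equal to -1
gab = grad(ga, b)  # backprop of backprop: d(ga)/db
```

The second call to `grad` works only because `ga` carries its own `inputs` and `grad_fn`; with float gradients there would be nothing to traverse.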

Double backprop
Graph: x, y -> F -> z -> ... -> L
    class F(FunctionNode):
        def forward(self, x, y):
            return x * x + y
        def backward(self, x, y, gz):
            return 2 * gz * x, gz
forward: computes on NumPy, CuPy arrays
backward: computes on chainer.Variable -> Creates CG
Note: The interface is simplified from actual implementation.

Double backprop
Forward: x, y -> F -> z -> ... -> L
Backprop! 1.0 (= ∂L/∂L) -> ... -> gz (= ∂L/∂z) -> Grad F -> gx (= ∂L/∂x), gy (= ∂L/∂y)
Inside Grad F (inputs x, y, gz): gx = Mul(x, gz) * 2, gy = gz

Double backprop
Forward: x, y -> F -> z
Backprop! 1.0 -> Grad F -> gx (= ∂z/∂x), gy (= ∂z/∂y)

Double backprop
Forward: x, y -> F -> z
Backprop: 1.0 -> Grad F -> gx, gy

Double backprop
Forward: x, y -> Mul -> z
Backprop: 1.0 -> Grad F -> gx, gy
Backprop! 1.0 -> Double Grad F -> ggx (= ∂²z/∂x²)
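For the function F defined earlier (z = x * x + y), the quantities in these diagrams can be worked out by hand. The derivation below is a sketch that assumes scalar x and y and uses the slides' names gx, gy, gz, and ggx as symbols.

```latex
\begin{align*}
z &= F(x, y) = x^2 + y \\
g_x &= 2\, g_z\, x, \qquad g_y = g_z
  && \text{(Grad F)} \\
g_z = 1 \;&\Rightarrow\; g_x = \frac{\partial z}{\partial x} = 2x,
  \quad g_y = \frac{\partial z}{\partial y} = 1 \\
gg_x &= \frac{\partial g_x}{\partial x} = 2\, g_z = 2
  = \frac{\partial^2 z}{\partial x^2}
  && \text{(Double Grad F)}
\end{align*}
```

The last line is possible only because Grad F is itself expressed as graph operations on x and gz, so it can be differentiated like any other part of the graph.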

Double backprop
Double backprop computes the derivative of L = G(f(x), ∇f(x)) with respect to x.
Graph: x -> f -> z

Double backprop
Graph: x -> f -> z; x -> Grad f -> gx
L = G(f(x), ∇f(x))

Double backprop
Graph: x -> f -> z; x -> Grad f -> gx; z, gx -> ... -> L
L = G(f(x), ∇f(x))

Double backprop
Forward: x -> f -> z; x -> Grad f -> gx; z, gx -> ... -> L
Backprop! 1.0 -> ... -> gz -> Grad f and ggx -> Double Grad f, which together give ∂L/∂x
L = G(f(x), ∇f(x))
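A loss of the form L = G(f(x), ∇f(x)) can be tried out on a toy scalar case. The sketch below assumes f(x) = x^3 and G(u, v) = u + v^2, both chosen purely for illustration, and uses a hypothetical `Var` class with overloaded operators rather than Chainer, PyTorch, or TF. Since L = x^3 + 9x^4 in this case, the expected derivative is 3x^2 + 36x^3.

```python
# Hypothetical minimal engine for illustration; not any framework's API.
class Var:
    """Scalar graph node whose backward rules also build graph nodes."""
    def __init__(self, value, inputs=(), grad_fn=None):
        self.value, self.inputs, self.grad_fn = value, inputs, grad_fn

    def __add__(self, other):
        return Var(self.value + other.value, (self, other), lambda g: (g, g))

    def __mul__(self, other):
        return Var(self.value * other.value, (self, other),
                   lambda g: (g * other, g * self))


def grad(out, wrt):
    """Return d(out)/d(wrt) as a Var so that it can be differentiated again."""
    order, seen = [], set()

    def visit(v):  # post-order DFS: inputs are listed before their consumers
        if id(v) not in seen:
            seen.add(id(v))
            for p in v.inputs:
                visit(p)
            order.append(v)

    visit(out)
    grads = {id(out): Var(1.0)}
    for v in reversed(order):
        if v.grad_fn is None or id(v) not in grads:
            continue
        for p, g in zip(v.inputs, v.grad_fn(grads[id(v)])):
            grads[id(p)] = grads[id(p)] + g if id(p) in grads else g
    return grads.get(id(wrt), Var(0.0))


x = Var(2.0)
z = x * x * x      # z = f(x) = x^3
gx = grad(z, x)    # gx = f'(x) = 3 x^2, still part of the graph
L = z + gx * gx    # L = G(z, gx) = z + gx^2
dL = grad(L, x)    # double backprop: flows through both z and gx
```

At x = 2 this gives `gx.value` of 12, `L.value` of 152, and `dL.value` of 300, matching 3x^2 + 36x^3. The contribution 36x^3 comes entirely from the path through `gx`, which would be lost if the first gradient were a plain number.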

Example (Chainer)
http://bit.ly/2wpEzO5

Conclusion
• Several DL frameworks share a similar structure.
• Differences in design choices determine the capabilities of a framework.
• Introduced double backprop and toy examples in several frameworks.
