Invertible Residual Networks
박수철

모두의연구소 

풀잎스쿨 Deep Generative Models
Jens Behrmann, Will Grathwohl, Ricky T. Q. Chen, David Duvenaud, Jörn-Henrik Jacobsen
ResNets
Fθ = I + gθ
Kaiming He et al. Deep Residual Learning for Image Recognition. 2015
1. Each layer must be invertible.
Keep the ResNet structure while imposing a Lipschitz constraint on the non-linear residual function.

2. The inverse must be computable.
Approximate it by iteration, based on the Banach fixed-point theorem.

3. The log-determinant of each layer must be cheap to compute.
Express the log-determinant as the trace of a matrix logarithm and approximate that trace.
Conditions required for flow models
1. The layer must be invertible.
Sufficient condition for invertible ResNets
Jens Behrmann et al. Invertible Residual Networks. 2019
[Figure] Lipschitz continuity: points x and x′ in the domain map to f(x) and f(x′) in the image; the Lipschitz constant (Lipschitz norm) bounds the ratio of ‖f(x) − f(x′)‖ to ‖x − x′‖.
Sufficient condition for invertible ResNets
Takeru Miyato et al. Spectral Normalization for Generative Adversarial Networks. 2018
Jens Behrmann et al. Invertible Residual Networks. 2019
Satisfying the Lipschitz Constraint
1. Non-linear activations such as ReLU, ELU, and tanh already satisfy the Lipschitz constraint (they are 1-Lipschitz).

2. Dense layers and convolution layers, which are expressed as matrix multiplications, can be made to satisfy the Lipschitz constraint by normalizing the weight matrix with its largest singular value (see the sketch after this list).
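A minimal sketch of this normalization using an exact SVD in NumPy (the function name and the target constant c are illustrative, not taken from the paper's code):

```python
import numpy as np

def spectral_normalize(W, c=0.9):
    """Rescale W so that its largest singular value is at most c < 1."""
    sigma = np.linalg.svd(W, compute_uv=False)[0]  # largest singular value of W
    if sigma > c:                                  # only shrink, never enlarge
        W = W * (c / sigma)
    return W
```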
https://en.wikipedia.org/wiki/Singular_value_decomposition
Satisfying the Lipschitz Constraint
Singular Value Decomposition: M = U Σ Vᵀ
M: an arbitrary m × n matrix
U: m × m unitary (orthogonal) matrix
Σ: m × n diagonal matrix with non-negative real numbers on the diagonal
V: n × n unitary (orthogonal) matrix
https://en.wikipedia.org/wiki/Singular_value_decomposition
Satisfying the Lipschitz Constraint
Singular Value Decomposition
Satisfying the Lipschitz Constraint
Weight Normalization
Jens Behrmann et al. Invertible Residual Networks. 2019
Finding the largest singular value
Jens Behrmann et al. Invertible Residual Networks. 2019
Performing a full Singular Value Decomposition costs O(D³) operations, but the largest singular value can be approximated with the following algorithm (power iteration).
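A rough NumPy sketch of that power-iteration idea (the function name and iteration count are illustrative):

```python
import numpy as np

def approx_largest_singular_value(W, n_iters=20):
    """Power iteration: each step needs only matrix-vector products,
    avoiding the O(D^3) cost of a full SVD."""
    v = np.random.randn(W.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    return float(u @ (W @ v))  # sigma_max ≈ u^T W v
```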
2. The inverse must be computable.
Inverse of i-ResNet Layer
Banach fixed-point theorem
Jens Behrmann et al. Invertible Residual Networks. 2019
Inverse of i-ResNet Layer
Jens Behrmann et al. Invertible Residual Networks. 2019
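A sketch of the fixed-point inversion: because Lip(g) < 1, the map x ↦ y − g(x) is a contraction, so by the Banach fixed-point theorem iterating it converges to the unique x with y = x + g(x) (the function name and iteration count below are illustrative):

```python
def invert_residual_layer(y, g, n_iters=100):
    """Invert y = x + g(x) by the fixed-point iteration x_{k+1} = y - g(x_k)."""
    x = y                       # initialize at y, as in the paper's inversion scheme
    for _ in range(n_iters):
        x = y - g(x)
    return x
```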
3. The log-determinant of the layer must be cheap to compute.
Log-determinant of Jacobian
Jens Behrmann et al. Invertible Residual Networks. 2019
For z = F(x) = (I + g)(x):

ln p_x(x) = ln p_z(z) + ln |det J_F(x)|      (change of variables)

ln |det J_F(x)| = ln det J_F(x)              (the Lipschitz constraint keeps det J_F(x) > 0)
               = tr(ln J_F(x))               (Withers & Nadarajah, 2010)
               = tr(ln(I + J_g(x)))          (by definition of F)
               = Σ_{k=1}^{∞} (−1)^{k+1} tr(J_g^k) / k   (power series of the matrix logarithm)
Log-determinant of Jacobian
Hall, B. C. Lie Groups, Lie Algebras, and Representations: An Elementary Introduction. Graduate Texts in Mathematics 222 (2nd ed.). Springer, 2015.
Complex Logarithm
Matrix Logarithm
Log-determinant of Jacobian
ln det J_F(x) = Σ_{k=1}^{∞} (−1)^{k+1} tr(J_g^k) / k

Problems
1. Computing tr(J_g^k) requires O(d²) operations.
2. The Jacobian matrix J_g^k itself is hard to obtain.
3. The sum of an infinite series has to be computed.
Solutions
1 & 2. Use the automatic differentiation provided by deep learning frameworks to compute the vector-Jacobian product vᵀJ_g, and use it for a stochastic approximation of the matrix trace.
3. Truncate the series at some index n. This makes the estimator biased, but the error is proven to be bounded.
Jens Behrmann et al. Invertible Residual Networks. 2019
Log-determinant of Jacobian
Jens Behrmann et al. Invertible Residual Networks. 2019
Hutchinson's trace estimator:
tr(A) = 𝔼_{p(v)}[vᵀ A v],  where A ∈ ℝ^{d×d}, 𝔼[v] = 0, Cov(v) = I
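A small NumPy sketch of the estimator (standard Gaussian samples satisfy 𝔼[v] = 0 and Cov(v) = I; the function name and sample count are illustrative):

```python
import numpy as np

def hutchinson_trace(A, n_samples=1000):
    """Monte-Carlo estimate of tr(A) via tr(A) = E_v[v^T A v]."""
    d = A.shape[0]
    total = 0.0
    for _ in range(n_samples):
        v = np.random.randn(d)   # E[v] = 0, Cov(v) = I
        total += v @ A @ v
    return total / n_samples
```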
Log-determinant of Jacobian
Implementation of backpropagation
https://github.com/eriklindernoren/ML-From-Scratch/blob/master/mlfromscratch/deep_learning/layers.py
For a dense layer y = Wx + b, the backward pass propagates ∂L/∂y · ∂y/∂x, where ∂y/∂x = W.
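In the spirit of the linked ML-From-Scratch code (a simplified sketch, not the repository's actual class), the dense layer's backward pass just multiplies the incoming gradient by W:

```python
import numpy as np

class Dense:
    """Minimal dense layer: y = x W^T + b."""
    def __init__(self, W, b):
        self.W, self.b = W, b

    def forward_pass(self, x):
        self.x = x
        return x @ self.W.T + self.b

    def backward_pass(self, accum_grad):
        # Since dy/dx = W, the gradient w.r.t. the input is (dL/dy) @ W.
        return accum_grad @ self.W
```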
Log-determinant of Jacobian
Sample a vector v from p(v) and feed it in as the backward-pass input of the desired layer; this yields the vector-Jacobian product vᵀJ_g.

[Diagram] Forward: x → Layer g → y = g(x).  Backward: v ∼ p(v) → Layer g → vᵀJ_g.
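A minimal PyTorch sketch of this trick (the small network g below is only a stand-in for the residual branch):

```python
import torch

g = torch.nn.Sequential(                 # stand-in residual branch g
    torch.nn.Linear(10, 10), torch.nn.ELU(), torch.nn.Linear(10, 10))

x = torch.randn(1, 10, requires_grad=True)
y = g(x)                                 # forward pass
v = torch.randn_like(y)                  # v ~ p(v), here a standard Gaussian

# Feeding v as the backward-pass input ("incoming gradient") yields v^T J_g.
vT_Jg = torch.autograd.grad(y, x, grad_outputs=v)[0]
```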
Log-determinant of Jacobian
The backward input is initialized to vᵀ, and after each backward pass it holds wᵀJ_g (with w the current vector). After k passes it equals vᵀJ_g^k, so the inner product vᵀJ_g^k v approximates tr(J_g^k).
Jens Behrmann et al. Invertible Residual Networks. 2019
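Putting the pieces together, a sketch of the truncated-series log-determinant estimator with a single Hutchinson sample (illustrative names; not the authors' reference implementation):

```python
import torch

def logdet_estimate(g, x, n_terms=5):
    """Estimate ln det(I + J_g(x)) = sum_{k>=1} (-1)^{k+1} tr(J_g^k) / k,
    truncated at n_terms, with each trace replaced by v^T J_g^k v."""
    x = x.requires_grad_(True)
    y = g(x)
    v = torch.randn_like(x)
    w = v                                  # after k backward passes, w = v^T J_g^k
    logdet = torch.zeros(())
    for k in range(1, n_terms + 1):
        # retain_graph reuses the same forward graph for every power k;
        # use create_graph=True if gradients of the estimate are needed for training.
        w = torch.autograd.grad(y, x, grad_outputs=w, retain_graph=True)[0]
        logdet = logdet + ((-1) ** (k + 1) / k) * (w * v).sum()
    return logdet
```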
Log-determinant of Jacobian
Jens Behrmann et al. Invertible Residual Networks. 2019
Comparison
Jens Behrmann et al. Invertible Residual Networks. 2019
Results
Jens Behrmann et al. Invertible Residual Networks. 2019
Results
Jens Behrmann et al. Invertible Residual Networks. 2019
Results
Jens Behrmann et al. Invertible Residual Networks. 2019
The End
