DEEP GENERATIVE MODELS
VAE & GANs TUTORIAL
조형주
DeepBio
DeepBio 1
DEEP GENERATIVE MODELS
DeepBio 2
WHAT IS GENERATIVE MODEL
p̂_θ(x) = g_θ(z)
https://blog.openai.com/generative‐models/
DeepBio 3
[Hand-written diagram: the latent z is mapped into the feature space by the generator (not by sampling).]
WHY GENERATIVE MODELS
A new way of simulating in applied math/engineering domains
Combining with Reinforcement Learning
Good for semi-supervised learning
Can work with multi-modal outputs
Can generate realistic data
DeepBio 4
[Hand-written note: from Ian Goodfellow's talk]
https://blog.openai.com/generative‐models/
DeepBio 5
TAXONOMIC TREE OF GENERATIVE MODEL
GAN_Tutorial from Ian Goodfellow
DeepBio 6
[Hand-written annotations on the taxonomy tree, e.g. RBM (sampling).]
TOY EXAMPLE
DeepBio 7
Generative model
p̂_θ(x) = g_θ(z)
Let z ∼ N(0, 1)
Let g be a neural network with transpose convolutional layers (So Nice !!)
x ∼ X : MNIST dataset
L2 Loss (Mean Squared Error)
DeepBio 8
FEATURE SPACE
[Hand-written derivation: the feature space is parameterized by θ; assuming p(y∣x) ∼ N(g_θ(x), σ²), the maximum-likelihood objective argmax_θ Σ_i log p(y_i∣x_i) reduces to minimizing the L2 (MSE) loss.]
Generator (TF-code)
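The generator code itself is not reproduced on this slide, so here is a minimal sketch of the toy generator described above, written in the same sugartensor style as the decoder shown later in these slides (sg_dense, sg_reshape, sg_upconv) with a plain L2 loss against the MNIST images; layer sizes are assumptions, not the original code.

import sugartensor as tf   # sugartensor monkey-patches TensorFlow tensors with sg_* ops

batch_size, rand_dim = 32, 50

# latent input and MNIST images
z = tf.random_normal((batch_size, rand_dim))
data = tf.sg_data.Mnist(batch_size=batch_size)
x = data.train.image

# generator: dense layers, then transpose convolutions up to 28x28x1
with tf.sg_context(name='generator', size=4, stride=2, act='relu'):
    xx = (z
          .sg_dense(dim=1024)
          .sg_dense(dim=7*7*128)
          .sg_reshape(shape=(-1, 7, 7, 128))
          .sg_upconv(dim=64)
          .sg_upconv(dim=1, act='sigmoid'))

# plain L2 (MSE) loss against the real images
loss = xx.sg_mse(target=x, name='l2').sg_mean(axis=[1, 2, 3])
tf.sg_train(loss=loss, log_interval=10, ep_size=data.train.num_batch)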
DeepBio 9
Results ...
Maybe we need more conditions...
DeepBio 10
[Hand-written note: something went wrong with the naive z → g_θ(z) approach.]
VARIATIONAL AUTO‐ENCODER
DeepBio 11
Notations
x : Observed data, z : Latent variable
p(x) : Evidence, p(z) : Prior
p(x∣z) : Likelihood, p(z∣x) : Posterior
Probabilistic Model Defined As Joint Distribution of x, z
p(x, z)
DeepBio 12
[Hand-written derivation: the posterior p(z∣x) is obtained from the joint p(x, z) via Bayes' rule, with the evidence p(x) given by marginalizing over z.]
Model
p(x, z) = p(x∣z)p(z)
Our Interest Is the Posterior !!
p(z∣x)
p(z∣x) = p(x∣z)p(z) / p(x) : Infer a Good Value of z Given x
p(x) = ∫ p(x, z)dz = ∫ p(x∣z)p(z)dz
p(x) is Hard to Calculate (INTRACTABLE)
Approximate the Posterior
DeepBio 13
[Hand-written notes: Bayesian inference over the latent variable given the observable x; computing the evidence p(x) (product rule + marginalization over z) and sampling from the posterior is a hard job.]
Variational Inference
Pick a family of distributions over the latent variables with its
own variational parameters,
q_ϕ(z∣x)
Find ϕ that makes q close to the posterior of interest
DeepBio 14
[Hand-written notes: q is parameterized by ϕ; with variational inference the sampling problem becomes an optimization problem. For a Gaussian q the variational parameters are (μ, σ); for a uniform q, (min, max).]
KULLBACK LEIBLER DIVERGENCE
Defined only if Q(i) = 0 implies P(i) = 0, for all i
A measure of the non-symmetric difference between two
probability distributions P and Q
KL(P∣∣Q) = ∫ p(x) log (p(x)/q(x)) dx
         = ∫ p(x) log p(x)dx − ∫ p(x) log q(x)dx
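A quick numeric sketch (not from the slides) of the discrete version of this formula, showing that the divergence is non-negative and non-symmetric; the example distributions are arbitrary.

import numpy as np

p = np.array([0.5, 0.4, 0.1])   # distribution P
q = np.array([0.3, 0.3, 0.4])   # distribution Q

def kl(p, q):
    # KL(P||Q) = sum_i p_i * log(p_i / q_i); assumes q_i > 0 wherever p_i > 0
    return np.sum(p * np.log(p / q))

print(kl(p, q))   # ~0.23
print(kl(q, p))   # ~0.31  -> KL(P||Q) != KL(Q||P)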
DeepBio 15
[Hand-written notes: KL = (cross-entropy) − (entropy), hence not symmetric; it blows up where Q(i) → 0 while P(i) > 0; entropy = uncertainty.]
Property
The Kullback Leibler divergence is always non‐negative,
KL(P∣∣Q) ≥ 0
DeepBio 16
Proof
X − 1 ≥ log X ⇒ log(1/X) ≥ 1 − X
Using this,
KL(P∣∣Q) = ∫ p(x) log (p(x)/q(x)) dx
         ≥ ∫ p(x) (1 − q(x)/p(x)) dx
         = ∫ {p(x) − q(x)}dx
         = ∫ p(x)dx − ∫ q(x)dx
         = 1 − 1 = 0
DeepBio 17
Relationship with Maximum Likelihood Estimation
For minimizing the KL Divergence,
KL(P∣∣Q; ϕ) = ∫ p(x) log (p(x)/q(x; ϕ)) dx
            = ∫ p(x) log p(x)dx − ∫ p(x) log q(x; ϕ)dx
ϕ* = argmin_ϕ (− ∫ p(x) log q(x; ϕ)dx)
DeepBio 18
[Hand-written note: if p(x) = q(x; ϕ), then KL(P∣∣Q) = 0.]
Maximizing Likelihood is equivalent to minimizing KL
Divergence
ϕ* = argmin_ϕ (− ∫ p(x) log q(x; ϕ)dx)
   = argmax_ϕ ∫ p(x) log q(x; ϕ)dx
   = argmax_ϕ E_{x∼p(x)}[log q(x; ϕ)]
   ≊ argmax_ϕ (1/N) Σ_i^N log q(x_i; ϕ)
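A small sketch (not from the slides) of the last line: fitting a Gaussian q(x; ϕ) by maximizing the average log-likelihood of samples drawn from p, which is the Monte-Carlo version of minimizing KL(P∣∣Q). The true parameters and the grid are assumptions for illustration.

import numpy as np

np.random.seed(0)
x = np.random.normal(loc=2.0, scale=1.5, size=10000)   # samples from the true p(x)

def avg_loglik(mu, sigma, x):
    # (1/N) * sum_i log q(x_i; mu, sigma) for a Gaussian q
    return np.mean(-0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2))

# grid search over phi = (mu, sigma): the maximizer recovers the true parameters
mus = np.linspace(0, 4, 81)
sigmas = np.linspace(0.5, 3, 51)
best = max(((avg_loglik(m, s, x), m, s) for m in mus for s in sigmas))
print(best[1], best[2])   # close to (2.0, 1.5)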
DeepBio 19
JENSEN'S INEQUALITY
For a Concave Function, f(E[x]) ≥ E[f(x)]
For a Convex Function, f(E[x]) ≤ E[f(x)]
DeepBio 20
Evidence Lower BOund
log p(x) = log ∫_z p(x, z)dz
         = log ∫_z p(x, z) (q(z)/q(z)) dz
         = log ∫_z q(z) (p(x, z)/q(z)) dz
         = log E_q[p(x, z)/q(z)]
         ≥ E_q[log p(x, z)] − E_q[log q(z)]
DeepBio 21
[Hand-written notes: q(z) is a well-known (tractable) probability distribution; by Jensen's inequality, log p(x) is bounded below by the ELBO.]
Variational Distribution
q*_ϕ(z∣x) = argmin_ϕ KL(q_ϕ(z∣x)∣∣p_θ(z∣x))
Choose a family of variational distributions (q)
Fit the parameters (ϕ) to minimize the distance between the two
distributions (KL-Divergence)
DeepBio 22
[Hand-written note: this is the reverse KL divergence.]
KL Divergence
KL(q_ϕ(z∣x)∣∣p_θ(z∣x)) = E_qϕ[log (q_ϕ(z∣x) / p_θ(z∣x))]
= E_qϕ[log q_ϕ(z∣x) − log p_θ(z∣x)]
= E_qϕ[log q_ϕ(z∣x) − log (p_θ(z∣x) p_θ(x) / p_θ(x))]
= E_qϕ[log q_ϕ(z∣x) − log p_θ(x, z) + log p_θ(x)]
= E_qϕ[log q_ϕ(z∣x) − log p_θ(x, z)] + log p_θ(x)
DeepBio 23
Objective
q*_ϕ(z∣x) = argmin_ϕ (E_qϕ[log q_ϕ(z∣x) − log p_θ(x, z)] + log p_θ(x))
The KL divergence is the negative ELBO plus the log marginal probability of x
log p_θ(x) does not depend on q
Minimizing the KL divergence is the same as maximizing the
ELBO
q*_ϕ(z∣x) = argmax_ϕ ELBO
DeepBio 24
Variational Lower Bound
For each data point x_i, the marginal likelihood of the individual data point is
log p_θ(x_i) ≥ L(θ, ϕ; x_i)
= E_{q_ϕ(z∣x_i)}[− log q_ϕ(z∣x_i) + log p_θ(x_i, z)]
= E_{q_ϕ(z∣x_i)}[log p_θ(x_i∣z)p_θ(z) − log q_ϕ(z∣x_i)]
= E_{q_ϕ(z∣x_i)}[log p_θ(x_i∣z) − (log q_ϕ(z∣x_i) − log p_θ(z))]
= E_{q_ϕ(z∣x_i)}[log p_θ(x_i∣z)] − E_{q_ϕ(z∣x_i)}[log (q_ϕ(z∣x_i) / p_θ(z))]
= E_{q_ϕ(z∣x_i)}[log p_θ(x_i∣z)] − KL(q_ϕ(z∣x_i)∣∣p_θ(z))
DeepBio 25
ELBO
L(θ, ϕ; x_i) = E_{q_ϕ(z∣x_i)}[log p_θ(x_i∣z)] − KL(q_ϕ(z∣x_i)∣∣p_θ(z))
q_ϕ(z∣x_i) : proposal distribution
p_θ(z) : prior (our belief)
How to Choose a Good Proposal Distribution
Easy to sample
Differentiable (∵ Backprop.)
DeepBio 26
[Hand-written note: the posterior approximation is chosen to be Gaussian.]
Maximizing ELBO ‐ I
L(ϕ; x_i) = E_{q_ϕ(z∣x_i)}[log p(x_i∣z)] − KL(q_ϕ(z∣x_i)∣∣p(z))
ϕ* = argmax_ϕ E_{q_ϕ(z∣x_i)}[log p(x_i∣z)]
E_{q_ϕ(z∣x_i)}[log p(x_i∣z)] : Log-Likelihood (NOT A LOSS)
Maximize the likelihood to maximize the ELBO (NOT MINIMIZE!!)
DeepBio 27
Log Likelihood
In the case where p(x∣z) is a Bernoulli distribution,
E_{q_ϕ(z∣x)}[log p(x∣z)] = Σ_{i=1}^{n} [x_i log p(y_i) + (1 − x_i) log(1 − p(y_i))]
To maximize it, minimize the Negative Log Likelihood !!
Loss = − (1/n) Σ_{i=1}^{n} [x_i log(x̂_i) + (1 − x_i) log(1 − x̂_i)]
Already known as Sigmoid Cross-Entropy
x̂_i is the output of the Decoder
We call it the Reconstruction Loss
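A minimal sketch (not the slides' code) of this reconstruction loss in plain TensorFlow 1.x; x is assumed to be a batch of binary-ish MNIST images and x_hat the decoder's sigmoid output.

import tensorflow as tf

def reconstruction_loss(x, x_hat, eps=1e-8):
    # negative Bernoulli log-likelihood = sigmoid cross-entropy, averaged per pixel
    bce = -(x * tf.log(x_hat + eps) + (1.0 - x) * tf.log(1.0 - x_hat + eps))
    return tf.reduce_mean(bce, axis=[1, 2, 3])   # one scalar per example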
DeepBio 28
[Hand-written notes: the 1/n factor is a normalisation; this loss is also called binomial cross-entropy; in the case of a Gaussian p(x∣z), the loss becomes the L2 (MSE) loss.]
Maximizing ELBO - II
L(ϕ; x_i) = E_{q_ϕ(z∣x_i)}[log p(x_i∣z)] − KL(q_ϕ(z∣x_i)∣∣p(z))
ϕ* = argmin_ϕ KL(q_ϕ(z∣x_i)∣∣p(z))
Assume that the prior and the posterior approximation are Gaussian
(actually it's not a critical issue...)
Then we can compute the KL Divergence according to its definition
Let the prior be N(0, 1)
How about q_ϕ(z∣x_i) ?
DeepBio 29
Posterior
The posterior approximation is Gaussian,
q_ϕ(z∣x_i) = N(μ_i, σ_i²)
where (μ_i, σ_i) is the output of the Encoder
DeepBio 30
Minimizing KL Divergence
KL(q_ϕ(z∣x)∣∣p(z)) = ∫ q_ϕ(z) log q_ϕ(z)dz − ∫ q_ϕ(z) log p(z)dz
∫ q_ϕ(z) log q_ϕ(z∣x)dz = ∫ N(μ_i, σ_i²) log N(μ_i, σ_i²)dz
    = − (N/2) log 2π − (1/2) Σ_i^N (1 + log σ_i²)
∫ q_ϕ(z) log p(z)dz = ∫ N(μ_i, σ_i²) log N(0, 1)dz
    = − (N/2) log 2π − (1/2) Σ_i^N (μ_i² + σ_i²)
Therefore,
KL(q_ϕ(z∣x)∣∣p(z)) = − (1/2) Σ_i^N [1 + log σ_i² − μ_i² − σ_i²]
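This closed form is what a VAE implementation typically computes per example. A sketch in plain TensorFlow 1.x, assuming the encoder outputs mu and log_var (the sugartensor code later in these slides instead uses the σ = 1 simplification):

import tensorflow as tf

def gaussian_kl(mu, log_var):
    # KL( N(mu, sigma^2) || N(0, 1) ) = -1/2 * sum(1 + log sigma^2 - mu^2 - sigma^2)
    return -0.5 * tf.reduce_sum(1.0 + log_var - tf.square(mu) - tf.exp(log_var), axis=1)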
DeepBio 31
AUTO-ENCODER
Encoder : MLPs to infer (μ_i, σ_i) for q_ϕ(z∣x_i)
Decoder : MLPs to infer x̂ using latent variables z ∼ N(μ, σ²)
Is it differentiable? ( = possible to backprop?)
DeepBio 32
REPARAMETERIZATION TRICK
Tutorial on Variational Autoencoders
DeepBio 33
[Hand-written notes: without the trick we are not able to backprop through the sampling; after reparameterization the sampling is independent of the model (the noise is just a constant input).]
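A minimal sketch of the trick with a learned variance (the encoder code later in these slides assumes σ = 1, so it only adds unit-variance noise); mu and log_var are assumed encoder outputs, not names from the original code.

import tensorflow as tf

def reparameterize(mu, log_var):
    # z = mu + sigma * eps, with eps ~ N(0, I); the noise is a constant input,
    # so gradients flow through mu and log_var but not through the sampling
    eps = tf.random_normal(tf.shape(mu))
    return mu + tf.exp(0.5 * log_var) * eps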
Latent Code
batch_size = 32
rand_dim = 50
z = tf.random_normal((batch_size, rand_dim))
Data load
# MNIST input tensor ( with QueueRunner )
data = tf.sg_data.Mnist(batch_size=32)
# input images
x = data.train.image
DeepBio 34
[Hand-written notes: all code is written using sugartensor (a TF wrapper); rand_dim is the number of z variables; z is drawn from a normal distribution.]
Encoder
# assume that std = 1
with tf.sg_context(name='encoder', size=4, stride=2, act='relu'):
    mu = (x
          .sg_conv(dim=64)
          .sg_conv(dim=128)
          .sg_flatten()
          .sg_dense(dim=1024)
          .sg_dense(dim=num_dim, act='linear'))
          
# re‐parameterization trick with random gaussian
z = mu + tf.random_normal(mu.get_shape())
DeepBio 35
[Hand-written notes: each conv layer down-samples by 1/2; the dense layers are MLPs; assume σ = 1.]
Decoder
with tf.sg_context(name='decoder', size=4, stride=2, act='relu'):
    xx = (z
          .sg_dense(dim=1024)
          .sg_dense(dim=7*7*128)
          .sg_reshape(shape=(-1, 7, 7, 128))
          .sg_upconv(dim=64)
          .sg_upconv(dim=1, act='sigmoid'))
DeepBio 36
[Hand-written notes: MLPs, then reshape to a 4-D tensor, then transpose conv nets.]
Losses
loss_recon = xx.sg_mse(target=x, name='recon').sg_mean(axis=[1, 2, 3])
loss_kld = tf.square(mu).sg_sum(axis=1) / (28 * 28)
tf.sg_summary_loss(loss_kld, name='kld')
loss = loss_recon + loss_kld * 0.5
DeepBio 37
Train
# do training
tf.sg_train(loss=loss, log_interval=10, ep_size=data.train.num_batch,
            save_dir='asset/train/vae')
DeepBio 38
Results
DeepBio 39
[Hand-written note: blurry images]
Features
Advantage
Fast and Easy to train
We can check the loss and evaluate
Disadvantage
Low Quality
Even if q reaches the optimal point, it can still be quite different from p
Issues
Reconstruction loss (x-entropy, L1, L2, ...)
MLPs structure
Regularizer loss (sometimes don't use log, sometimes use exp, ...)
...
DeepBio 40
GENERATIVE ADVERSARIAL NETWORKS
DeepBio 41
DeepBio 42
[Hand-written note: image from Terry Um's facebook page]
DeepBio 43
DeepBio 44
Value Function
min_G max_D V(D, G)
= E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]
For the second term, E_{z∼p_z(z)}[log(1 − D(G(z)))]
D wants to maximize it → Do not get fooled
G wants to minimize it → Fool D
DeepBio 45
Example
DeepBio 46
Global Optimality of p_g = p_data
D*_G(x) = p_data(x) / (p_data(x) + p_g(x))
note that this holds 'FOR ANY GIVEN generator G'
DeepBio 47
Proof
For G fixed,
V(G, D) = ∫_x p_r(x) log(D(x))dx + ∫_z p_z(z) log(1 − D(G(z)))dz
        = ∫_x p_r(x) log(D(x)) + p_g(x) log(1 − D(x))dx
Let X = D(x),   a = p_r(x),   b = p_g(x). So,
V = a log X + b log(1 − X)
Find the X which maximizes the value function V .
∇_X V
DeepBio 48
Proof
∇_X V = ∇_X (a log X + b log(1 − X))
      = ∇_X a log X + ∇_X b log(1 − X)
      = a (1/X) + b (−1/(1 − X))
      = (a(1 − X) − bX) / (X(1 − X))
      = (a − aX − bX) / (X(1 − X))
      = (a − (a + b)X) / (X(1 − X))
DeepBio 49
Proof
Find the solution of this,
f(X) = a − (a + b)X
DeepBio 50
Proof
Find the solution of this,
f(X) = a − (a + b)X
Solution,
a − (a + b)X = 0
(a + b)X = a
X = a / (a + b)
The numerator f(X) is monotonically decreasing, so it is positive before and negative after this point.
∴ X = a/(a+b) is the maximum point of V .
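A quick numeric check (not from the slides) that V = a log X + b log(1 − X) is indeed maximized at X = a/(a+b); the values of a and b are arbitrary.

import numpy as np

a, b = 0.7, 0.3                       # p_r(x) and p_g(x) at some point x
X = np.linspace(1e-3, 1 - 1e-3, 9999)
V = a * np.log(X) + b * np.log(1 - X)
print(X[np.argmax(V)], a / (a + b))   # both ~0.7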
DeepBio 51
Theorem
The global minimum of the virtual training criterion L(D, g_θ) is
achieved if and only if p_g = p_r.
At that point, L(D, g_θ) achieves the value − log 4.
DeepBio 52
Proof
L(D*, g_θ) = max_D V(G, D)
= E_{x∼p_r}[log D*_G(x)] + E_{z∼p_z}[log(1 − D*_G(G(z)))]
= E_{x∼p_r}[log D*_G(x)] + E_{x∼p_g}[log(1 − D*_G(x))]
= E_{x∼p_r}[log (p_r(x) / (p_r(x) + p_g(x)))] + E_{x∼p_g}[log (p_g(x) / (p_r(x) + p_g(x)))]
= E_{x∼p_r}[log (p_r(x) / (p_r(x) + p_g(x)))] + E_{x∼p_g}[log (p_g(x) / (p_r(x) + p_g(x)))] + log 4 − log 4
= E_{x∼p_r}[log (p_r(x) / (p_r(x) + p_g(x)))] + log 2 + E_{x∼p_g}[log (p_g(x) / (p_r(x) + p_g(x)))] + log 2 − log 4
= E_{x∼p_r}[log (2p_r(x) / (p_r(x) + p_g(x)))] + E_{x∼p_g}[log (2p_g(x) / (p_r(x) + p_g(x)))] − log 4
DeepBio 53
[Hand-written note: with D fixed, find the optimal G; the optimum is p_real = p_gen.]
= E_{x∼p_r}[log (p_r(x) / ((p_r(x) + p_g(x))/2))] + E_{x∼p_g}[log (p_g(x) / ((p_r(x) + p_g(x))/2))] − log 4
= KL[p_r(x) ∣∣ (p_r(x) + p_g(x))/2] + KL[p_g(x) ∣∣ (p_r(x) + p_g(x))/2] − log 4
= − log 4 + 2 JS(p_r(x)∣∣p_g(x))
where JS is the Jensen-Shannon Divergence defined as
JS(P∣∣Q) = (1/2) KL(P∣∣M) + (1/2) KL(Q∣∣M)
where M = (P + Q)/2
∵ JS is always ≥ 0, so − log 4 is the global minimum
DeepBio 54
Jensen‐Shannon Divergence
JS(P∣∣Q) = (1/2) KL(P∣∣M) + (1/2) KL(Q∣∣M), where M = (P + Q)/2
Two types of KL Divergence
KL(P∣∣Q) : Maximum likelihood. The approximating Q tends to overgeneralise P
KL(Q∣∣P) : Reverse KL Divergence. Tends to favour under-generalisation.
The optimal Q will typically describe the single largest mode of P well
The Jensen-Shannon Divergence exhibits a behaviour that is roughly halfway
between the two extremes above
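A small sketch (not from the slides) computing JS from the two KL terms for discrete distributions, illustrating that it is symmetric; the example distributions are arbitrary.

import numpy as np

def kl(p, q):
    return np.sum(p * np.log(p / q))

def js(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.9, 0.1])
q = np.array([0.1, 0.9])
print(js(p, q), js(q, p))   # equal (symmetric), and <= log 2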
DeepBio 55
DeepBio 56
DeepBio 57
Training
Cost Function For D
J^(D) = − (1/2) E_{x∼p_data}[log D(x)] − (1/2) E_z[log(1 − D(G(z)))]
Typical cross entropy with labels 1, 0 (Bernoulli)
Cost Function For G
J^(G) = − (1/2) E_z[log D(G(z))]
Maximize log D(G(z)) instead of minimizing
log(1 − D(G(z))) (which causes vanishing gradients)
Also standard cross entropy with label 1
Is this way really good??
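A small numeric sketch (not from the slides) of why the modified loss helps: early in training D(G(z)) is close to 0, where the minimax loss log(1 − D(G(z))) gives almost no gradient while −log D(G(z)) gives a large one.

import numpy as np

d = np.array([0.001, 0.01, 0.1, 0.5])   # D(G(z)) early vs later in training

# magnitude of dLoss/dD for the two generator losses
grad_minimax = 1.0 / (1.0 - d)          # |d/dD log(1 - D)|
grad_nonsat  = 1.0 / d                  # |d/dD (-log D)|

print(grad_minimax)   # ~[1.0, 1.01, 1.11, 2.0] -> almost no signal when D(G(z)) ~ 0
print(grad_nonsat)    # ~[1000, 100, 10, 2.0]   -> strong signal when D(G(z)) ~ 0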
DeepBio 58
Secret of G Loss
We already know that
E_z[∇_θ log(1 − D*(g_θ(z)))] = ∇_θ 2JS(P_r∣∣P_g)
Furthermore, with D* fixed for the current generator (density p_g0),
KL(P_g∣∣P_r) = E_{x∼P_g}[log (p_g(x) / p_r(x))]
= E_{x∼P_g}[log (p_g(x) / p_g0(x))] − E_{x∼P_g}[log (p_r(x) / p_g0(x))]
= KL(P_g∣∣P_g0) − E_{x∼P_g}[log (D*(x) / (1 − D*(x)))]
= KL(P_g∣∣P_g0) − E_z[log (D*(g_θ(z)) / (1 − D*(g_θ(z))))]
DeepBio 59
(from Martin Arjovsky's paper)
Taking derivatives in θ at θ_0 we get
∇_θ KL(P_gθ∣∣P_r) = ∇_θ KL(P_gθ∣∣P_gθ0) − ∇_θ E_z[log (D*(g_θ(z)) / (1 − D*(g_θ(z))))]
= E_z[−∇_θ log (D*(g_θ(z)) / (1 − D*(g_θ(z))))]
(the first term vanishes because KL(P_gθ∣∣P_gθ0) is minimized at θ = θ_0)
Subtracting this last equation from the result for the JSD,
E_z[−∇_θ log D*(g_θ(z))] = ∇_θ[KL(P_gθ∣∣P_r) − 2JS(P_gθ∣∣P_r)]
The JS term pushes the distributions to be different, which seems like a
fault in the update
The KL appearing here assigns an extremely high cost to
generating fake-looking samples, and an extremely low cost to
mode dropping
DeepBio 60
DeepBio 61
Latent Code
batch_size = 32
rand_dim = 50
z = tf.random_normal((batch_size, rand_dim))
Data load
data = tf.sg_data.Mnist(batch_size=batch_size)
x = data.train.image
y_real = tf.ones(batch_size)
y_fake = tf.zeros(batch_size)
DeepBio 62
[Hand-written notes: Sugar Tensor code; real label = 1, fake label = 0.]
Model D
def discriminator(tensor):
    # reuse flag
    reuse = len([t for t in tf.global_variables() if t.name.startswith('discriminator')]) > 0
    with tf.sg_context(name='discriminator', size=4, stride=2, act='leaky_relu', reuse=reuse):
        res = (tensor
               .sg_conv(dim=64, name='conv1')
               .sg_conv(dim=128, name='conv2')
               .sg_flatten()
               .sg_dense(dim=1024, name='fc1')
               .sg_dense(dim=1, act='linear', bn=False, name='fc2')
               .sg_squeeze())
        return res
DeepBio 63
Model G
def generator(tensor):
    # reuse flag
    reuse = len([t for t in tf.global_variables() if t.name.startswith('generator')]) > 0
    with tf.sg_context(name='generator', size=4, stride=2, act='leaky_relu', reuse=reuse):
        # generator network
        res = (tensor
               .sg_dense(dim=1024, name='fc1')
               .sg_dense(dim=7*7*128, name='fc2')
               .sg_reshape(shape=(-1, 7, 7, 128))
               .sg_upconv(dim=64, name='conv1')
               .sg_upconv(dim=1, act='sigmoid', bn=False, name='conv2'))
        return res
DeepBio 64
Call
# generator
gen = generator(z)
# discriminator
disc_real = discriminator(x)
disc_fake = discriminator(gen)
DeepBio 65
Losses
# discriminator loss
loss_d_r = disc_real.sg_bce(target=y_real, name='disc_real')
loss_d_f = disc_fake.sg_bce(target=y_fake, name='disc_fake')
loss_d = (loss_d_r + loss_d_f) / 2
# generator loss
loss_g = disc_fake.sg_bce(target=y_real, name='gen')
DeepBio 66
Train
# train ops
# Default optimizer : MaxProp
train_disc = tf.sg_optim(loss_d, lr=0.0001, category='discriminator')
train_gen = tf.sg_optim(loss_g, lr=0.001, category='generator')  
# def alternate training func
@tf.sg_train_func
def alt_train(sess, opt):
    l_disc = sess.run([loss_d, train_disc])[0]  # training discriminator
    l_gen = sess.run([loss_g, train_gen])[0]  # training generator
    return np.mean(l_disc) + np.mean(l_gen)
    
# do training
alt_train(ep_size=data.train.num_batch, early_stop=False, save_dir=
DeepBio 67
DeepBio 68
Results
DeepBio 69
Features
Advantage
Improved sample quality
Disadvantage
Unstable training
Mode collapse
Issues
Simple network structure
Loss selection... (alternatives)
Other conditions?
DeepBio 70
DCGAN
DeepBio 71
DeepBio 72
Network structure
DeepBio 73
Tips
DeepBio 74
Z Vector
DeepBio 75
DeepBio 76
GAN HACKS
DeepBio 77
Normalizing Input
Normalize the images between -1 and 1
Use Tanh as the last layer of the generator output
A Modified Loss Function
Maximize D(G(z)) instead of minimizing 1 − D(G(z))
Use a spherical Z
Sample z from a Gaussian distribution rather than a uniform one
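A minimal sketch of these hacks in the same sugartensor style as the earlier code (layer names and values are assumptions, not the slides' code):

import sugartensor as tf   # same wrapper as the earlier code

batch_size, rand_dim = 32, 50
data = tf.sg_data.Mnist(batch_size=batch_size)

x = data.train.image * 2.0 - 1.0              # normalize images from [0, 1] to [-1, 1]
z = tf.random_normal((batch_size, rand_dim))  # spherical z: Gaussian, not uniform

# the generator's last layer then uses tanh so samples also live in [-1, 1], e.g.:
#   .sg_upconv(dim=1, act='tanh', bn=False)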
DeepBio 78
XX Norm
One label per mini-batch (all real or all fake)
Batch norm, layer norm, instance norm, or batch renorm ...
Avoid Sparse Gradients : ReLU, MaxPool
The stability of the GAN game suffers if you have sparse
gradients
LeakyReLU = good (in both G and D)
For down sampling, use : AVG pooling, strided conv
For up sampling, use : Conv_transpose, PixelShuffle
DeepBio 79
Use Soft and Noisy Labels (see the sketch after this list)
real : 1 -> 0.7 ~ 1.2
fake : 0 -> 0.0 ~ 0.3
flip labels for the discriminator (occasionally)
ADAM is Good
SGD for D, ADAM for G
If you have labels, use them
go to the Conditional GAN
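A minimal sketch of soft and occasionally flipped labels (a hypothetical helper, not part of the slides' code):

import numpy as np

def soft_noisy_labels(batch_size, real=True, flip_prob=0.05):
    # soft targets: real in [0.7, 1.2], fake in [0.0, 0.3]
    if real:
        y = np.random.uniform(0.7, 1.2, size=batch_size)
    else:
        y = np.random.uniform(0.0, 0.3, size=batch_size)
    # occasionally flip the labels fed to the discriminator
    flip = np.random.rand(batch_size) < flip_prob
    y[flip] = 1.0 - y[flip]
    return y.astype(np.float32)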
DeepBio 80
Add noise to inputs, decay over time
Add some artificial noise to the inputs of D
Add Gaussian noise to every layer of G
Use dropout in G in both the train and test phases
Provide the noise in the form of dropout
Apply it on several layers of G at both training and test time
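A minimal sketch (hypothetical, not the slides' code) of adding Gaussian noise to the discriminator's inputs and decaying it over training steps:

import tensorflow as tf

def noisy_input(x, step, start_std=0.1, decay_steps=10000):
    # linearly decay the noise std from start_std to 0 over decay_steps
    std = start_std * tf.maximum(0.0, 1.0 - tf.cast(step, tf.float32) / decay_steps)
    return x + tf.random_normal(tf.shape(x)) * std

# usage: disc_real = discriminator(noisy_input(x, global_step))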
DeepBio 81
GAN in Medical
DeepBio 82
Tumor segmentation
DeepBio 83
Metal artifact reduction
DeepBio 84
Thank you
DeepBio 85
