DEEP GENERATIVE MODELS
VAE & GANs TUTORIAL
조형주
DeepBio
DeepBio 1
DEEP GENERATIVE MODELS
DeepBio 2
WHAT IS GENERATIVE MODEL
p̂_θ(x) = g_θ(z)
https://blog.openai.com/generative‐models/
DeepBio 3
[Hand-written diagram: the latent z is mapped into the feature space by the generator (not by sampling).]
WHY GENERATIVE MODELS
A new way of simulating in applied math/engineering domains
Combining with Reinforcement Learning
Good for semi-supervised learning
Can work with multi-modal outputs
Can generate realistic data
DeepBio 4
[Hand-written note: from Ian Goodfellow's talk]
https://blog.openai.com/generative‐models/
DeepBio 5
TAXONOMIC TREE OF GENERATIVE MODEL
GAN_Tutorial from Ian Goodfellow
DeepBio 6
[Hand-written annotations on the taxonomy tree, e.g. RBM (sampling).]
TOY EXAMPLE
DeepBio 7
Generative model
p̂_θ(x) = g_θ(z)
Let z ∼ N(0, 1)
Let g be a neural network with transpose convolutional layers (So Nice !!)
x ∼ X : MNIST dataset
L2 Loss (Mean Squared Error)
DeepBio 8
FEATURE SPACE
[Hand-written derivation: the feature space is parameterized by θ; assuming p(y∣x) ∼ N(g_θ(x), σ²), the maximum-likelihood objective argmax_θ Σ_i log p(y_i∣x_i) reduces to minimizing the L2 (MSE) loss.]
Generator (TF-code)
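The generator code itself is not reproduced on this slide, so here is a minimal sketch of the toy generator described above, written in the same sugartensor style as the decoder shown later in these slides (sg_dense, sg_reshape, sg_upconv) with a plain L2 loss against the MNIST images; layer sizes are assumptions, not the original code.

import sugartensor as tf   # sugartensor monkey-patches TensorFlow tensors with sg_* ops

batch_size, rand_dim = 32, 50

# latent input and MNIST images
z = tf.random_normal((batch_size, rand_dim))
data = tf.sg_data.Mnist(batch_size=batch_size)
x = data.train.image

# generator: dense layers, then transpose convolutions up to 28x28x1
with tf.sg_context(name='generator', size=4, stride=2, act='relu'):
    xx = (z
          .sg_dense(dim=1024)
          .sg_dense(dim=7*7*128)
          .sg_reshape(shape=(-1, 7, 7, 128))
          .sg_upconv(dim=64)
          .sg_upconv(dim=1, act='sigmoid'))

# plain L2 (MSE) loss against the real images
loss = xx.sg_mse(target=x, name='l2').sg_mean(axis=[1, 2, 3])
tf.sg_train(loss=loss, log_interval=10, ep_size=data.train.num_batch)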
DeepBio 9
Results ...
Maybe we need more conditions...
DeepBio 10
[Hand-written note: something went wrong with the naive z → g_θ(z) approach.]
VARIATIONAL AUTO‐ENCODER
DeepBio 11
Notations
x : Observed data, z : Latent variable
p(x) : Evidence, p(z) : Prior
p(x∣z) : Likelihood, p(z∣x) : Posterior
Probabilistic Model Defined As Joint Distribution of x, z
p(x, z)
DeepBio 12
[Hand-written derivation: the posterior p(z∣x) is obtained from the joint p(x, z) via Bayes' rule, with the evidence p(x) given by marginalizing over z.]
Model
p(x, z) = p(x∣z)p(z)
Our Interest Is the Posterior !!
p(z∣x)
p(z∣x) = p(x∣z)p(z) / p(x) : Infer a Good Value of z Given x
p(x) = ∫ p(x, z)dz = ∫ p(x∣z)p(z)dz
p(x) is Hard to Calculate (INTRACTABLE)
Approximate the Posterior
DeepBio 13
[Hand-written notes: Bayesian inference over the latent variable given the observable x; computing the evidence p(x) (product rule + marginalization over z) and sampling from the posterior is a hard job.]
Variational Inference
Pick a family of distributions over the latent variables with its
own variational parameters,
q_ϕ(z∣x)
Find ϕ that makes q close to the posterior of interest
DeepBio 14
[Hand-written notes: q is parameterized by ϕ; with variational inference the sampling problem becomes an optimization problem. For a Gaussian q the variational parameters are (μ, σ); for a uniform q, (min, max).]
KULLBACK LEIBLER DIVERGENCE
Defined only if Q(i) = 0 implies P(i) = 0, for all i
A measure of the non-symmetric difference between two
probability distributions P and Q
KL(P∣∣Q) = ∫ p(x) log (p(x)/q(x)) dx
         = ∫ p(x) log p(x)dx − ∫ p(x) log q(x)dx
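A quick numeric sketch (not from the slides) of the discrete version of this formula, showing that the divergence is non-negative and non-symmetric; the example distributions are arbitrary.

import numpy as np

p = np.array([0.5, 0.4, 0.1])   # distribution P
q = np.array([0.3, 0.3, 0.4])   # distribution Q

def kl(p, q):
    # KL(P||Q) = sum_i p_i * log(p_i / q_i); assumes q_i > 0 wherever p_i > 0
    return np.sum(p * np.log(p / q))

print(kl(p, q))   # ~0.23
print(kl(q, p))   # ~0.31  -> KL(P||Q) != KL(Q||P)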
DeepBio 15
[Hand-written notes: KL = (cross-entropy) − (entropy), hence not symmetric; it blows up where Q(i) → 0 while P(i) > 0; entropy = uncertainty.]
Property
The Kullback Leibler divergence is always non‐negative,
KL(P∣∣Q) ≥ 0
DeepBio 16
Proof
X − 1 ≥ log X ⇒ log(1/X) ≥ 1 − X
Using this,
KL(P∣∣Q) = ∫ p(x) log (p(x)/q(x)) dx
         ≥ ∫ p(x) (1 − q(x)/p(x)) dx
         = ∫ {p(x) − q(x)}dx
         = ∫ p(x)dx − ∫ q(x)dx
         = 1 − 1 = 0
DeepBio 17
Relationship with Maximum Likelihood Estimation
For minimizing the KL Divergence,
KL(P∣∣Q; ϕ) = ∫ p(x) log (p(x)/q(x; ϕ)) dx
            = ∫ p(x) log p(x)dx − ∫ p(x) log q(x; ϕ)dx
ϕ* = argmin_ϕ (− ∫ p(x) log q(x; ϕ)dx)
DeepBio 18
[Hand-written note: if p(x) = q(x; ϕ), then KL(P∣∣Q) = 0.]
Maximizing Likelihood is equivalent to minimizing KL
Divergence
ϕ* = argmin_ϕ (− ∫ p(x) log q(x; ϕ)dx)
   = argmax_ϕ ∫ p(x) log q(x; ϕ)dx
   = argmax_ϕ E_{x∼p(x)}[log q(x; ϕ)]
   ≊ argmax_ϕ (1/N) Σ_i^N log q(x_i; ϕ)
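A small sketch (not from the slides) of the last line: fitting a Gaussian q(x; ϕ) by maximizing the average log-likelihood of samples drawn from p, which is the Monte-Carlo version of minimizing KL(P∣∣Q). The true parameters and the grid are assumptions for illustration.

import numpy as np

np.random.seed(0)
x = np.random.normal(loc=2.0, scale=1.5, size=10000)   # samples from the true p(x)

def avg_loglik(mu, sigma, x):
    # (1/N) * sum_i log q(x_i; mu, sigma) for a Gaussian q
    return np.mean(-0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2))

# grid search over phi = (mu, sigma): the maximizer recovers the true parameters
mus = np.linspace(0, 4, 81)
sigmas = np.linspace(0.5, 3, 51)
best = max(((avg_loglik(m, s, x), m, s) for m in mus for s in sigmas))
print(best[1], best[2])   # close to (2.0, 1.5)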
DeepBio 19
JENSEN'S INEQUALITY
For a Concave Function, f(E[x]) ≥ E[f(x)]
For a Convex Function, f(E[x]) ≤ E[f(x)]
DeepBio 20
Evidence Lower BOund
log p(x) = log ∫_z p(x, z)dz
         = log ∫_z p(x, z) (q(z)/q(z)) dz
         = log ∫_z q(z) (p(x, z)/q(z)) dz
         = log E_q[p(x, z)/q(z)]
         ≥ E_q[log p(x, z)] − E_q[log q(z)]
DeepBio 21
[Hand-written notes: q(z) is a well-known (tractable) probability distribution; by Jensen's inequality, log p(x) is bounded below by the ELBO.]
Variational Distribution
q*_ϕ(z∣x) = argmin_ϕ KL(q_ϕ(z∣x)∣∣p_θ(z∣x))
Choose a family of variational distributions (q)
Fit the parameters (ϕ) to minimize the distance between the two
distributions (KL-Divergence)
DeepBio 22
[Hand-written note: this is the reverse KL divergence.]
KL Divergence
KL(q_ϕ(z∣x)∣∣p_θ(z∣x)) = E_qϕ[log (q_ϕ(z∣x) / p_θ(z∣x))]
= E_qϕ[log q_ϕ(z∣x) − log p_θ(z∣x)]
= E_qϕ[log q_ϕ(z∣x) − log (p_θ(z∣x) p_θ(x) / p_θ(x))]
= E_qϕ[log q_ϕ(z∣x) − log p_θ(x, z) + log p_θ(x)]
= E_qϕ[log q_ϕ(z∣x) − log p_θ(x, z)] + log p_θ(x)
DeepBio 23
Objective
q*_ϕ(z∣x) = argmin_ϕ (E_qϕ[log q_ϕ(z∣x) − log p_θ(x, z)] + log p_θ(x))
The KL divergence is the negative ELBO plus the log marginal probability of x
log p_θ(x) does not depend on q
Minimizing the KL divergence is the same as maximizing the
ELBO
q*_ϕ(z∣x) = argmax_ϕ ELBO
DeepBio 24
Variational Lower Bound
For each data point x_i, the marginal likelihood of the individual data point is
log p_θ(x_i) ≥ L(θ, ϕ; x_i)
= E_{q_ϕ(z∣x_i)}[− log q_ϕ(z∣x_i) + log p_θ(x_i, z)]
= E_{q_ϕ(z∣x_i)}[log p_θ(x_i∣z)p_θ(z) − log q_ϕ(z∣x_i)]
= E_{q_ϕ(z∣x_i)}[log p_θ(x_i∣z) − (log q_ϕ(z∣x_i) − log p_θ(z))]
= E_{q_ϕ(z∣x_i)}[log p_θ(x_i∣z)] − E_{q_ϕ(z∣x_i)}[log (q_ϕ(z∣x_i) / p_θ(z))]
= E_{q_ϕ(z∣x_i)}[log p_θ(x_i∣z)] − KL(q_ϕ(z∣x_i)∣∣p_θ(z))
DeepBio 25
ELBO
L(θ, ϕ; x_i) = E_{q_ϕ(z∣x_i)}[log p_θ(x_i∣z)] − KL(q_ϕ(z∣x_i)∣∣p_θ(z))
q_ϕ(z∣x_i) : proposal distribution
p_θ(z) : prior (our belief)
How to Choose a Good Proposal Distribution
Easy to sample
Differentiable (∵ Backprop.)
DeepBio 26
[Hand-written note: the posterior approximation is chosen to be Gaussian.]
Maximizing ELBO ‐ I
L(ϕ; x_i) = E_{q_ϕ(z∣x_i)}[log p(x_i∣z)] − KL(q_ϕ(z∣x_i)∣∣p(z))
ϕ* = argmax_ϕ E_{q_ϕ(z∣x_i)}[log p(x_i∣z)]
E_{q_ϕ(z∣x_i)}[log p(x_i∣z)] : Log-Likelihood (NOT A LOSS)
Maximize the likelihood to maximize the ELBO (NOT MINIMIZE!!)
DeepBio 27
Log Likelihood
In the case where p(x∣z) is a Bernoulli distribution,
E_{q_ϕ(z∣x)}[log p(x∣z)] = Σ_{i=1}^{n} [x_i log p(y_i) + (1 − x_i) log(1 − p(y_i))]
To maximize it, minimize the Negative Log Likelihood !!
Loss = − (1/n) Σ_{i=1}^{n} [x_i log(x̂_i) + (1 − x_i) log(1 − x̂_i)]
Already known as Sigmoid Cross-Entropy
x̂_i is the output of the Decoder
We call it the Reconstruction Loss
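A minimal sketch (not the slides' code) of this reconstruction loss in plain TensorFlow 1.x; x is assumed to be a batch of binary-ish MNIST images and x_hat the decoder's sigmoid output.

import tensorflow as tf

def reconstruction_loss(x, x_hat, eps=1e-8):
    # negative Bernoulli log-likelihood = sigmoid cross-entropy, averaged per pixel
    bce = -(x * tf.log(x_hat + eps) + (1.0 - x) * tf.log(1.0 - x_hat + eps))
    return tf.reduce_mean(bce, axis=[1, 2, 3])   # one scalar per example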
DeepBio 28
[Hand-written notes: the 1/n factor is a normalisation; this loss is also called binomial cross-entropy; in the case of a Gaussian p(x∣z), the loss becomes the L2 (MSE) loss.]
Maximizing ELBO - II
L(ϕ; x_i) = E_{q_ϕ(z∣x_i)}[log p(x_i∣z)] − KL(q_ϕ(z∣x_i)∣∣p(z))
ϕ* = argmin_ϕ KL(q_ϕ(z∣x_i)∣∣p(z))
Assume that the prior and the posterior approximation are Gaussian
(actually it's not a critical issue...)
Then we can compute the KL Divergence according to its definition
Let the prior be N(0, 1)
How about q_ϕ(z∣x_i) ?
DeepBio 29
Posterior
The posterior approximation is Gaussian,
q_ϕ(z∣x_i) = N(μ_i, σ_i²)
where (μ_i, σ_i) is the output of the Encoder
DeepBio 30
Minimizing KL Divergence
KL(q_ϕ(z∣x)∣∣p(z)) = ∫ q_ϕ(z) log q_ϕ(z)dz − ∫ q_ϕ(z) log p(z)dz
∫ q_ϕ(z) log q_ϕ(z∣x)dz = ∫ N(μ_i, σ_i²) log N(μ_i, σ_i²)dz
    = − (N/2) log 2π − (1/2) Σ_i^N (1 + log σ_i²)
∫ q_ϕ(z) log p(z)dz = ∫ N(μ_i, σ_i²) log N(0, 1)dz
    = − (N/2) log 2π − (1/2) Σ_i^N (μ_i² + σ_i²)
Therefore,
KL(q_ϕ(z∣x)∣∣p(z)) = − (1/2) Σ_i^N [1 + log σ_i² − μ_i² − σ_i²]
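This closed form is what a VAE implementation typically computes per example. A sketch in plain TensorFlow 1.x, assuming the encoder outputs mu and log_var (the sugartensor code later in these slides instead uses the σ = 1 simplification):

import tensorflow as tf

def gaussian_kl(mu, log_var):
    # KL( N(mu, sigma^2) || N(0, 1) ) = -1/2 * sum(1 + log sigma^2 - mu^2 - sigma^2)
    return -0.5 * tf.reduce_sum(1.0 + log_var - tf.square(mu) - tf.exp(log_var), axis=1)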
DeepBio 31
AUTO-ENCODER
Encoder : MLPs to infer (μ_i, σ_i) for q_ϕ(z∣x_i)
Decoder : MLPs to infer x̂ using latent variables z ∼ N(μ, σ²)
Is it differentiable? ( = possible to backprop?)
DeepBio 32
REPARAMETERIZATION TRICK
Tutorial on Variational Autoencoders
DeepBio 33
[Hand-written notes: without the trick we are not able to backprop through the sampling; after reparameterization the sampling is independent of the model (the noise is just a constant input).]
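A minimal sketch of the trick with a learned variance (the encoder code later in these slides assumes σ = 1, so it only adds unit-variance noise); mu and log_var are assumed encoder outputs, not names from the original code.

import tensorflow as tf

def reparameterize(mu, log_var):
    # z = mu + sigma * eps, with eps ~ N(0, I); the noise is a constant input,
    # so gradients flow through mu and log_var but not through the sampling
    eps = tf.random_normal(tf.shape(mu))
    return mu + tf.exp(0.5 * log_var) * eps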
Latent Code
batch_size = 32
rand_dim = 50
z = tf.random_normal((batch_size, rand_dim))
Data load
# MNIST input tensor ( with QueueRunner )
data = tf.sg_data.Mnist(batch_size=32)
# input images
x = data.train.image
DeepBio 34
[Hand-written notes: all code is written using sugartensor (a TF wrapper); rand_dim is the number of z variables; z is drawn from a normal distribution.]
Encoder
# assume that std = 1
with tf.sg_context(name='encoder', size=4, stride=2, act='relu'):
    mu = (x
          .sg_conv(dim=64)
          .sg_conv(dim=128)
          .sg_flatten()
          .sg_dense(dim=1024)
          .sg_dense(dim=num_dim, act='linear'))
          
# re‐parameterization trick with random gaussian
z = mu + tf.random_normal(mu.get_shape())
DeepBio 35
[Hand-written notes: each conv layer down-samples by 1/2; the dense layers are MLPs; assume σ = 1.]
Decoder
with tf.sg_context(name='decoder', size=4, stride=2, act='relu'):
    xx = (z
          .sg_dense(dim=1024)
          .sg_dense(dim=7*7*128)
          .sg_reshape(shape=(-1, 7, 7, 128))
          .sg_upconv(dim=64)
          .sg_upconv(dim=1, act='sigmoid'))
DeepBio 36
[Hand-written notes: MLPs, then reshape to a 4-D tensor, then transpose conv nets.]
Losses
loss_recon = xx.sg_mse(target=x, name='recon').sg_mean(axis=[1, 2, 3])
loss_kld = tf.square(mu).sg_sum(axis=1) / (28 * 28)
tf.sg_summary_loss(loss_kld, name='kld')
loss = loss_recon + loss_kld * 0.5
DeepBio 37
Train
# do training
tf.sg_train(loss=loss, log_interval=10, ep_size=data.train.num_batch,
            save_dir='asset/train/vae')
DeepBio 38
Results
DeepBio 39
[Hand-written note: blurry images]
Features
Advantage
Fast and Easy to train
We can check the loss and evaluate
Disadvantage
Low Quality
Even if q reaches the optimal point, it can still be quite different from p
Issues
Reconstruction loss (x-entropy, L1, L2, ...)
MLPs structure
Regularizer loss (sometimes don't use log, sometimes use exp, ...)
...
DeepBio 40
GENERATIVE ADVERSARIAL NETWORKS
DeepBio 41
DeepBio 42
[Hand-written note: image from Terry Um's facebook page]
DeepBio 43
DeepBio 44
Value Function
min_G max_D V(D, G)
= E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]
For the second term, E_{z∼p_z(z)}[log(1 − D(G(z)))]
D wants to maximize it → Do not get fooled
G wants to minimize it → Fool D
DeepBio 45
Example
DeepBio 46
Global Optimality of p_g = p_data
D*_G(x) = p_data(x) / (p_data(x) + p_g(x))
note that this holds 'FOR ANY GIVEN generator G'
DeepBio 47
Proof
For G fixed,
V(G, D) = ∫_x p_r(x) log(D(x))dx + ∫_z p_z(z) log(1 − D(G(z)))dz
        = ∫_x p_r(x) log(D(x)) + p_g(x) log(1 − D(x))dx
Let X = D(x),   a = p_r(x),   b = p_g(x). So,
V = a log X + b log(1 − X)
Find the X which maximizes the value function V .
∇_X V
DeepBio 48
Proof
∇_X V = ∇_X (a log X + b log(1 − X))
      = ∇_X a log X + ∇_X b log(1 − X)
      = a (1/X) + b (−1/(1 − X))
      = (a(1 − X) − bX) / (X(1 − X))
      = (a − aX − bX) / (X(1 − X))
      = (a − (a + b)X) / (X(1 − X))
DeepBio 49
Proof
Find the solution of this,
f(X) = a − (a + b)X
DeepBio 50
Proof
Find the solution of this,
f(X) = a − (a + b)X
Solution,
a − (a + b)X = 0
(a + b)X = a
X = a / (a + b)
The numerator f(X) is monotonically decreasing, so it is positive before and negative after this point.
∴ X = a/(a+b) is the maximum point of V .
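A quick numeric check (not from the slides) that V = a log X + b log(1 − X) is indeed maximized at X = a/(a+b); the values of a and b are arbitrary.

import numpy as np

a, b = 0.7, 0.3                       # p_r(x) and p_g(x) at some point x
X = np.linspace(1e-3, 1 - 1e-3, 9999)
V = a * np.log(X) + b * np.log(1 - X)
print(X[np.argmax(V)], a / (a + b))   # both ~0.7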
DeepBio 51
Theorem
The global minimum of the virtual training criterion L(D, g_θ) is
achieved if and only if p_g = p_r.
At that point, L(D, g_θ) achieves the value − log 4.
DeepBio 52
Proof
L(D*, g_θ) = max_D V(G, D)
= E_{x∼p_r}[log D*_G(x)] + E_{z∼p_z}[log(1 − D*_G(G(z)))]
= E_{x∼p_r}[log D*_G(x)] + E_{x∼p_g}[log(1 − D*_G(x))]
= E_{x∼p_r}[log (p_r(x) / (p_r(x) + p_g(x)))] + E_{x∼p_g}[log (p_g(x) / (p_r(x) + p_g(x)))]
= E_{x∼p_r}[log (p_r(x) / (p_r(x) + p_g(x)))] + E_{x∼p_g}[log (p_g(x) / (p_r(x) + p_g(x)))] + log 4 − log 4
= E_{x∼p_r}[log (p_r(x) / (p_r(x) + p_g(x)))] + log 2 + E_{x∼p_g}[log (p_g(x) / (p_r(x) + p_g(x)))] + log 2 − log 4
= E_{x∼p_r}[log (2p_r(x) / (p_r(x) + p_g(x)))] + E_{x∼p_g}[log (2p_g(x) / (p_r(x) + p_g(x)))] − log 4
DeepBio 53
[Hand-written note: with D fixed, find the optimal G; the optimum is p_real = p_gen.]
= E_{x∼p_r}[log (p_r(x) / ((p_r(x) + p_g(x))/2))] + E_{x∼p_g}[log (p_g(x) / ((p_r(x) + p_g(x))/2))] − log 4
= KL[p_r(x) ∣∣ (p_r(x) + p_g(x))/2] + KL[p_g(x) ∣∣ (p_r(x) + p_g(x))/2] − log 4
= − log 4 + 2 JS(p_r(x)∣∣p_g(x))
where JS is the Jensen-Shannon Divergence defined as
JS(P∣∣Q) = (1/2) KL(P∣∣M) + (1/2) KL(Q∣∣M)
where M = (P + Q)/2
∵ JS is always ≥ 0, so − log 4 is the global minimum
DeepBio 54
Jensen‐Shannon Divergence
JS(P∣∣Q) = (1/2) KL(P∣∣M) + (1/2) KL(Q∣∣M), where M = (P + Q)/2
Two types of KL Divergence
KL(P∣∣Q) : Maximum likelihood. The approximating Q tends to overgeneralise P
KL(Q∣∣P) : Reverse KL Divergence. Tends to favour under-generalisation.
The optimal Q will typically describe the single largest mode of P well
The Jensen-Shannon Divergence exhibits a behaviour that is roughly halfway
between the two extremes above
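A small sketch (not from the slides) computing JS from the two KL terms for discrete distributions, illustrating that it is symmetric; the example distributions are arbitrary.

import numpy as np

def kl(p, q):
    return np.sum(p * np.log(p / q))

def js(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.9, 0.1])
q = np.array([0.1, 0.9])
print(js(p, q), js(q, p))   # equal (symmetric), and <= log 2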
DeepBio 55
DeepBio 56
DeepBio 57
Training
Cost Function For D
J^(D) = − (1/2) E_{x∼p_data}[log D(x)] − (1/2) E_z[log(1 − D(G(z)))]
Typical cross entropy with labels 1, 0 (Bernoulli)
Cost Function For G
J^(G) = − (1/2) E_z[log D(G(z))]
Maximize log D(G(z)) instead of minimizing
log(1 − D(G(z))) (which causes vanishing gradients)
Also standard cross entropy with label 1
Is this way really good??
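A small numeric sketch (not from the slides) of why the modified loss helps: early in training D(G(z)) is close to 0, where the minimax loss log(1 − D(G(z))) gives almost no gradient while −log D(G(z)) gives a large one.

import numpy as np

d = np.array([0.001, 0.01, 0.1, 0.5])   # D(G(z)) early vs later in training

# magnitude of dLoss/dD for the two generator losses
grad_minimax = 1.0 / (1.0 - d)          # |d/dD log(1 - D)|
grad_nonsat  = 1.0 / d                  # |d/dD (-log D)|

print(grad_minimax)   # ~[1.0, 1.01, 1.11, 2.0] -> almost no signal when D(G(z)) ~ 0
print(grad_nonsat)    # ~[1000, 100, 10, 2.0]   -> strong signal when D(G(z)) ~ 0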
DeepBio 58
Secret of G Loss
We already know that
E_z[∇_θ log(1 − D*(g_θ(z)))] = ∇_θ 2JS(P_r∣∣P_g)
Furthermore, with D* fixed for the current generator (density p_g0),
KL(P_g∣∣P_r) = E_{x∼P_g}[log (p_g(x) / p_r(x))]
= E_{x∼P_g}[log (p_g(x) / p_g0(x))] − E_{x∼P_g}[log (p_r(x) / p_g0(x))]
= KL(P_g∣∣P_g0) − E_{x∼P_g}[log (D*(x) / (1 − D*(x)))]
= KL(P_g∣∣P_g0) − E_z[log (D*(g_θ(z)) / (1 − D*(g_θ(z))))]
DeepBio 59
(from Martin Arjovsky's paper)
Taking derivatives in θ at θ_0 we get
∇_θ KL(P_gθ∣∣P_r) = ∇_θ KL(P_gθ∣∣P_gθ0) − ∇_θ E_z[log (D*(g_θ(z)) / (1 − D*(g_θ(z))))]
= E_z[−∇_θ log (D*(g_θ(z)) / (1 − D*(g_θ(z))))]
(the first term vanishes because KL(P_gθ∣∣P_gθ0) is minimized at θ = θ_0)
Subtracting this last equation from the result for the JSD,
E_z[−∇_θ log D*(g_θ(z))] = ∇_θ[KL(P_gθ∣∣P_r) − 2JS(P_gθ∣∣P_r)]
The JS term pushes the distributions to be different, which seems like a
fault in the update
The KL appearing here assigns an extremely high cost to
generating fake-looking samples, and an extremely low cost to
mode dropping
DeepBio 60
DeepBio 61
Latent Code
batch_size = 32
rand_dim = 50
z = tf.random_normal((batch_size, rand_dim))
Data load
data = tf.sg_data.Mnist(batch_size=batch_size)
x = data.train.image
y_real = tf.ones(batch_size)
y_fake = tf.zeros(batch_size)
DeepBio 62
[Hand-written notes: Sugar Tensor code; real label = 1, fake label = 0.]
Model D
def discriminator(tensor):
    # reuse flag
    reuse = len([t for t in tf.global_variables() if t.name.startswith('discriminator')]) > 0
    with tf.sg_context(name='discriminator', size=4, stride=2, act='leaky_relu', reuse=reuse):
        res = (tensor
               .sg_conv(dim=64, name='conv1')
               .sg_conv(dim=128, name='conv2')
               .sg_flatten()
               .sg_dense(dim=1024, name='fc1')
               .sg_dense(dim=1, act='linear', bn=False, name='fc2')
               .sg_squeeze())
        return res
DeepBio 63
Model G
def generator(tensor):
    # reuse flag
    reuse = len([t for t in tf.global_variables() if t.name.startswith('generator')]) > 0
    with tf.sg_context(name='generator', size=4, stride=2, act='leaky_relu', reuse=reuse):
        # generator network
        res = (tensor
               .sg_dense(dim=1024, name='fc1')
               .sg_dense(dim=7*7*128, name='fc2')
               .sg_reshape(shape=(-1, 7, 7, 128))
               .sg_upconv(dim=64, name='conv1')
               .sg_upconv(dim=1, act='sigmoid', bn=False, name='conv2'))
        return res
DeepBio 64
Call
# generator
gen = generator(z)
# discriminator
disc_real = discriminator(x)
disc_fake = discriminator(gen)
DeepBio 65
Losses
# discriminator loss
loss_d_r = disc_real.sg_bce(target=y_real, name='disc_real')
loss_d_f = disc_fake.sg_bce(target=y_fake, name='disc_fake')
loss_d = (loss_d_r + loss_d_f) / 2
# generator loss
loss_g = disc_fake.sg_bce(target=y_real, name='gen')
DeepBio 66
Train
# train ops
# Default optimizer : MaxProp
train_disc = tf.sg_optim(loss_d, lr=0.0001, category='discriminator')
train_gen = tf.sg_optim(loss_g, lr=0.001, category='generator')  
# def alternate training func
@tf.sg_train_func
def alt_train(sess, opt):
    l_disc = sess.run([loss_d, train_disc])[0]  # training discriminator
    l_gen = sess.run([loss_g, train_gen])[0]  # training generator
    return np.mean(l_disc) + np.mean(l_gen)
    
# do training
alt_train(ep_size=data.train.num_batch, early_stop=False, save_dir=
DeepBio 67
DeepBio 68
Results
DeepBio 69
Features
Advantage
Improved sample quality
Disadvantage
Unstable training
Mode collapse
Issues
Simple network structure
Loss selection... (alternatives)
Other conditions?
DeepBio 70
DCGAN
DeepBio 71
DeepBio 72
Network structure
DeepBio 73
Tips
DeepBio 74
Z Vector
DeepBio 75
DeepBio 76
GAN HACKS
DeepBio 77
Normalizing Input
Normalize the images between -1 and 1
Use Tanh as the last layer of the generator output
A Modified Loss Function
Maximize D(G(z)) instead of minimizing 1 − D(G(z))
Use a spherical Z
Sample z from a Gaussian distribution rather than a uniform one
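A minimal sketch of these hacks in the same sugartensor style as the earlier code (layer names and values are assumptions, not the slides' code):

import sugartensor as tf   # same wrapper as the earlier code

batch_size, rand_dim = 32, 50
data = tf.sg_data.Mnist(batch_size=batch_size)

x = data.train.image * 2.0 - 1.0              # normalize images from [0, 1] to [-1, 1]
z = tf.random_normal((batch_size, rand_dim))  # spherical z: Gaussian, not uniform

# the generator's last layer then uses tanh so samples also live in [-1, 1], e.g.:
#   .sg_upconv(dim=1, act='tanh', bn=False)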
DeepBio 78
XX Norm
One label per mini-batch (all real or all fake)
Batch norm, layer norm, instance norm, or batch renorm ...
Avoid Sparse Gradients : ReLU, MaxPool
The stability of the GAN game suffers if you have sparse
gradients
LeakyReLU = good (in both G and D)
For down sampling, use : AVG pooling, strided conv
For up sampling, use : Conv_transpose, PixelShuffle
DeepBio 79
Use Soft and Noisy Labels (see the sketch after this list)
real : 1 -> 0.7 ~ 1.2
fake : 0 -> 0.0 ~ 0.3
flip labels for the discriminator (occasionally)
ADAM is Good
SGD for D, ADAM for G
If you have labels, use them
go to the Conditional GAN
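A minimal sketch of soft and occasionally flipped labels (a hypothetical helper, not part of the slides' code):

import numpy as np

def soft_noisy_labels(batch_size, real=True, flip_prob=0.05):
    # soft targets: real in [0.7, 1.2], fake in [0.0, 0.3]
    if real:
        y = np.random.uniform(0.7, 1.2, size=batch_size)
    else:
        y = np.random.uniform(0.0, 0.3, size=batch_size)
    # occasionally flip the labels fed to the discriminator
    flip = np.random.rand(batch_size) < flip_prob
    y[flip] = 1.0 - y[flip]
    return y.astype(np.float32)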
DeepBio 80
Add noise to inputs, decay over time
Add some artificial noise to the inputs of D
Add Gaussian noise to every layer of G
Use dropout in G in both the train and test phases
Provide the noise in the form of dropout
Apply it on several layers of G at both training and test time
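A minimal sketch (hypothetical, not the slides' code) of adding Gaussian noise to the discriminator's inputs and decaying it over training steps:

import tensorflow as tf

def noisy_input(x, step, start_std=0.1, decay_steps=10000):
    # linearly decay the noise std from start_std to 0 over decay_steps
    std = start_std * tf.maximum(0.0, 1.0 - tf.cast(step, tf.float32) / decay_steps)
    return x + tf.random_normal(tf.shape(x)) * std

# usage: disc_real = discriminator(noisy_input(x, global_step))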
DeepBio 81
GAN in Medical
DeepBio 82
Tumor segmentation
DeepBio 83
Metal artifact reduction
DeepBio 84
Thank you
DeepBio 85
