A Style-Based Generator Architecture for Generative Adversarial Networks

2022. 06. 03.
Tero Karras, Samuli Laine, Timo Aila
CVPR 2019
Hyunwook Lee

Contents
• Overview
• Preliminaries
• Disentangled Representation
• Various Normalization in the Deep Learning Domain
• StyleGAN
• Disentanglement of StyleGAN
• AdaIN in the StyleGAN
• Applications of StyleGAN
• Case: Music Generation
• Conclusion

3
Overview: What is the StyleGAN?
(a) Traditional generator and (b) StyleGAN generator (figure legend: learnable operation, AdaIN)
• One of the best-known GANs for image synthesis
• Automatic, unsupervised separation of high-level attributes
• Control of image synthesis, inspired by style transfer
• Scale-specific mixing and interpolation

4
Overview: Examples with StyleGAN
• StyleGAN enables scale-specific styling
• The style injected at each layer affects only the attributes at that layer’s scale
• Coarse – pose, general hair style, face, eyeglasses
• Middle – small facial features, hair style, eyes
• Fine – color scheme and microstructure
• How do they achieve this styling?
→ AdaIN from Style Transfer & Disentangled Representation!

5
Preliminaries: Disentangled Representation
Entangled and disentangled representation
A form of unsupervised representation learning used in generative modeling
Control image synthesis inspired by style transfer
Automatic, unsupervised separation of high-level attributes
Scale-specific mixing and interpolation

6
Training GANs in the traditional way
→ The latent z acts as a kind of feature vector (i.e., a representation)
Train & Test
• Changing the latent z along an arbitrary dimension changes two or more features → these features are entangled!
• Entanglement degrades the interpretability and controllability of the generation process
→ Each latent dimension should correspond to one “independent feature”
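
As a toy illustration of the point above (not the method of the slides or the paper), consider a hypothetical linear “generator” that maps a 2-D latent z to two features. The matrices `ENTANGLED` and `DISENTANGLED` and the helper `changed_features` are assumptions made up for this sketch.

```python
import numpy as np

# Hypothetical linear "generators": features = M @ z.
# Both matrices are invented for illustration only.
ENTANGLED = np.array([[1.0, 0.5],
                      [0.7, 1.0]])     # each latent dim touches both features
DISENTANGLED = np.array([[1.0, 0.0],
                         [0.0, 1.0]])  # each latent dim touches one feature


def changed_features(M, dim, step=1.0, tol=1e-12):
    """Return the indices of the features that move when z[dim] is perturbed."""
    z = np.zeros(M.shape[1])
    z_moved = z.copy()
    z_moved[dim] += step
    delta = M @ z_moved - M @ z
    return [i for i, d in enumerate(delta) if abs(d) > tol]


print(changed_features(ENTANGLED, dim=0))     # both features change
print(changed_features(DISENTANGLED, dim=0))  # only feature 0 changes
```

In the disentangled case, moving one latent dimension edits exactly one feature, which is the controllability the slide asks for.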

7
• Based on the manifold hypothesis
• Real-world high-dimensional data lie on low-dimensional manifolds embedded within the high-dimensional space
• A unit Gaussian is not enough to represent image manifolds
• Images can be badly reconstructed
• Goal: a latent space that consists of linear subspaces, each of which controls one factor of variation
• Reading materials:
• InfoGAN, β-VAE, Spatial CBN, LAPGAN, …

8
Various Normalization in Deep Learning
• Commonly used in most deep learning models
• Main idea: normalize each layer’s input → all layers see the same / a similar input distribution
Figure panel labels: mean/std among minibatch; mean/std among channels; mean/std in minibatch; mean/std in minibatch
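
The normalization variants differ only in which axes the mean and standard deviation are computed over. The sketch below assumes the figure compares the four common variants (batch, layer, instance, and group normalization) on an `(N, C, H, W)` tensor. The learnable scale and shift are left out, and the helper names are chosen for this example.

```python
import numpy as np


def normalize(x, axes, eps=1e-5):
    """Subtract the mean and divide by the std computed over `axes`."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def batch_norm(x):      # statistics per channel, over batch and space
    return normalize(x, (0, 2, 3))


def layer_norm(x):      # statistics per sample, over channels and space
    return normalize(x, (1, 2, 3))


def instance_norm(x):   # statistics per sample and per channel, over space
    return normalize(x, (2, 3))


def group_norm(x, groups):  # per sample, over each channel group and space
    n, c, h, w = x.shape
    grouped = x.reshape(n, groups, c // groups, h, w)
    return normalize(grouped, (2, 3, 4)).reshape(n, c, h, w)


rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(4, 6, 8, 8))  # (N, C, H, W)
print(instance_norm(x).mean(axis=(2, 3)).round(6))  # ~0 for every (sample, channel)
```

With one group, group normalization reduces to layer normalization. With as many groups as channels, it reduces to instance normalization.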

9
Instance Normalization in Style Transfer
• Convolutional feature statistics of a DNN can capture the style of an image
• Recent work shows that the channel-wise mean and variance are effective for style transfer
→ Instance Normalization can be seen as a form of style normalization!

10
Adaptive Instance Normalization
• Given a content image x and a style image y, the style of y can be transferred to x as follows:
• Normalize x (remove the style of the content) → denormalize x with the style of y
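
The two steps above correspond to the standard AdaIN formula, AdaIN(x, y) = σ(y) · (x − μ(x)) / σ(x) + μ(y), where μ and σ are computed per channel over the spatial dimensions. A minimal NumPy sketch follows. The random arrays are stand-ins for the feature maps of x and y (in style transfer they would come from an encoder), and `eps` is an assumed stabilizer.

```python
import numpy as np


def adain(content, style, eps=1e-5):
    """AdaIN on feature maps of shape (N, C, H, W).

    Statistics are computed per sample and per channel over (H, W).
    """
    c_mean = content.mean(axis=(2, 3), keepdims=True)
    c_std = np.sqrt(content.var(axis=(2, 3), keepdims=True) + eps)
    s_mean = style.mean(axis=(2, 3), keepdims=True)
    s_std = np.sqrt(style.var(axis=(2, 3), keepdims=True) + eps)
    normalized = (content - c_mean) / c_std   # remove the content's style
    return normalized * s_std + s_mean        # apply the style's statistics


rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(1, 3, 16, 16))   # stand-in content features
y = rng.normal(5.0, 2.0, size=(1, 3, 16, 16))   # stand-in style features
out = adain(x, y)
print(out.mean(axis=(2, 3)), y.mean(axis=(2, 3)))  # channel means should agree
```

After the operation, the output keeps the spatial structure of x but carries the channel-wise mean and standard deviation of y.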

11
Style-based Generator (StyleGAN)
How can they achieve Disentangled Representation?
How can they design a generator as a style transfer?
Why do they need noise input for each layer?
(a) Traditional generator and (b) StyleGAN generator (figure legend: learnable operation, AdaIN)

12
StyleGAN: Disentangled Representation
• Latent space disentanglement is a crucial part of both style transfer and generative modeling
• Hard to achieve by direct mapping (b in the lower figure)
• StyleGAN generates a disentangled intermediate latent space 𝒲
• Not a fixed distribution, but a learned mapping
• Spatially invariant, modified by the learned affine transformation A
• Generating images from a disentangled representation is much easier than from an entangled one
→ Training therefore pushes the mapping network toward a disentangled representation
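
A minimal sketch of the mapping network f: z → w, following the paper’s description of an 8-layer MLP with 512-dimensional latents and leaky ReLU activations. The weights here are random stand-ins (a trained model would supply real ones), and `init_mapping` is a hypothetical helper.

```python
import numpy as np

LATENT_DIM = 512   # dimensionality of z and w in the paper
NUM_LAYERS = 8     # depth of the mapping MLP in the paper


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def init_mapping(rng, dim=LATENT_DIM, layers=NUM_LAYERS):
    """Random stand-in parameters: a list of (weight, bias) pairs."""
    scale = np.sqrt(2.0 / dim)
    return [(rng.normal(0.0, scale, size=(dim, dim)), np.zeros(dim))
            for _ in range(layers)]


def mapping_network(z, params):
    """Map z of shape (N, dim) to an intermediate latent w of the same shape."""
    h = z
    for weight, bias in params:
        h = leaky_relu(h @ weight + bias)
    return h


rng = np.random.default_rng(0)
params = init_mapping(rng)
z = rng.standard_normal((4, LATENT_DIM))   # z is sampled from a fixed Gaussian
w = mapping_network(z, params)             # w follows a learned distribution
print(w.shape)  # (4, 512)
```

Because w is the output of a learned function rather than a sample from a fixed distribution, the network is free to "unwarp" 𝒲 so that factors of variation become more linear.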

13
StyleGAN: AdaIN as Styling Methods
• Through the learned affine transformation A, the vector w becomes the style y = (ys, yb)
• ys is the style scale (standard deviation) and yb is the style bias (mean)
• Step-by-Step
• Input x is normalized with Instance Normalization
• Effectively localizes the styles
• Denormalized by ys and yb
• ys plays the role of the standard deviation; the actual multiplier in the official implementation is ys + 1, which centers the scale around 1 (it stays positive only while ys > −1)
• Forward to the next layer
• Note: scale-specific styling is only possible when the network output can be built up gradually, scale by scale
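
The step-by-step recipe above can be sketched as a single function. This is a simplified NumPy version under stated assumptions: `affine_weight` and `affine_bias` stand in for the learned transformation A, the first half of its output is read as ys and the second half as yb, and all values are random placeholders.

```python
import numpy as np


def instance_norm(x, eps=1e-8):
    mean = x.mean(axis=(2, 3), keepdims=True)
    std = np.sqrt(x.var(axis=(2, 3), keepdims=True) + eps)
    return (x - mean) / std


def style_mod(x, w, affine_weight, affine_bias):
    """Apply AdaIN to x (N, C, H, W) with a style computed from w (N, D).

    affine_weight has shape (D, 2 * C) and affine_bias shape (2 * C,);
    together they play the role of the learned affine transformation A.
    """
    channels = x.shape[1]
    style = w @ affine_weight + affine_bias      # (N, 2C)
    ys = style[:, :channels, None, None]         # per-channel scale
    yb = style[:, channels:, None, None]         # per-channel bias
    return instance_norm(x) * (ys + 1.0) + yb    # multiplier is ys + 1


rng = np.random.default_rng(0)
x = rng.normal(2.0, 3.0, size=(2, 4, 8, 8))
w = rng.standard_normal((2, 16))
A_weight = 0.1 * rng.standard_normal((16, 8))    # hypothetical weights
A_bias = np.zeros(8)
out = style_mod(x, w, A_weight, A_bias)
print(out.shape)  # (2, 4, 8, 8)
```

Since x is re-normalized before every modulation, each style only controls the statistics of the layer it is applied to, which is what localizes its effect.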

14
StyleGAN: Style Mixing
• Styles are encouraged to localize by Style Mixing during training
• Simply,
• Run two different latent codes z1, z2 through the mapping network
• Mix the corresponding intermediate latents w1, w2 at a randomly selected point in the synthesis network 𝑔
• This prevents the network from assuming that adjacent styles are correlated
→ more localized, scale-specific modification!
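
A minimal sketch of the mixing step, assuming the 1024×1024 configuration in which the synthesis network takes 18 style inputs (two per resolution). The vectors `w1` and `w2` are random stand-ins for the mapping network’s outputs, and `mix_styles` is a name chosen for this example.

```python
import numpy as np

NUM_STYLE_LAYERS = 18  # two style inputs per resolution, 4x4 up to 1024x1024


def mix_styles(w1, w2, crossover, num_layers=NUM_STYLE_LAYERS):
    """Return per-layer styles: w1 before `crossover`, w2 from it onward."""
    per_layer = np.empty((num_layers,) + w1.shape, dtype=w1.dtype)
    per_layer[:crossover] = w1
    per_layer[crossover:] = w2
    return per_layer


rng = np.random.default_rng(0)
w1 = rng.standard_normal(512)   # stand-in for mapping(z1)
w2 = rng.standard_normal(512)   # stand-in for mapping(z2)
crossover = int(rng.integers(1, NUM_STYLE_LAYERS))  # random point in g
styles = mix_styles(w1, w2, crossover)
print(styles.shape)  # (18, 512)
```

Each row of `styles` would feed the AdaIN operation of one layer, so layers before the crossover are styled by w1 and the rest by w2.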

16
Style Mixing in Coarse Level (4² – 8²)

17
Style Mixing in Middle Level (16² – 32²)
• Brings smaller-scale facial features, hair style, eyes open / closed, …

18
Style Mixing in Fine Level (64² – 1024²)
• Mainly bring color scheme and microstructures
• Doesn’t change coarse / middle styles
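
The coarse / middle / fine split used on the last three slides can be written down as a mapping from style-layer index to resolution. The sketch assumes two style inputs per resolution and zero-based layer indices. The function names are hypothetical.

```python
RESOLUTIONS = [4 * 2 ** i for i in range(9)]   # 4, 8, ..., 1024
STYLES_PER_RESOLUTION = 2


def scale_group(resolution):
    """Coarse / middle / fine grouping as used on the slides."""
    if resolution <= 8:
        return "coarse"
    if resolution <= 32:
        return "middle"
    return "fine"


def layers_by_group():
    """Map each group name to the indices of the style layers it covers."""
    groups = {"coarse": [], "middle": [], "fine": []}
    layer = 0
    for res in RESOLUTIONS:
        for _ in range(STYLES_PER_RESOLUTION):
            groups[scale_group(res)].append(layer)
            layer += 1
    return groups


for name, layers in layers_by_group().items():
    print(name, layers[0], "-", layers[-1])
# coarse 0 - 3
# middle 4 - 7
# fine 8 - 17
```

Copying only the coarse rows from a second latent changes pose and face shape, while copying only the fine rows changes color scheme and microstructure.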

19
StyleGAN: Stochastic Variation
• Traditional GANs achieve stochastic variation by…
• generating spatially-varying pseudorandom numbers
• Consumes network capacity
• Not always successful
• In StyleGAN…
• Random noise is introduced at every layer
• Hypothesis: at any point in the generator there is pressure to introduce new content as soon as possible
• Goal: fool the discriminator
• The easiest way: feed fresh random noise to each layer → variation comes from the random noise
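
A minimal sketch of per-layer noise injection as the paper describes it: one single-channel Gaussian noise image per layer, broadcast to all feature maps through learned per-channel scaling factors. The scale values below are hypothetical, and `add_noise` is a name chosen for this example.

```python
import numpy as np


def add_noise(x, per_channel_scale, rng):
    """Add one single-channel noise image to every feature map of x.

    x has shape (N, C, H, W); per_channel_scale has shape (C,) and stands in
    for the learned scaling factors.
    """
    n, _, h, w = x.shape
    noise = rng.standard_normal((n, 1, h, w))   # one noise image per sample
    return x + noise * per_channel_scale[None, :, None, None]


rng = np.random.default_rng(0)
x = np.zeros((1, 3, 4, 4))
scale = np.array([0.0, 0.5, 1.0])   # hypothetical values
out = add_noise(x, scale, rng)
print(np.abs(out).max(axis=(0, 2, 3)))  # channel 0 stays at zero
```

A channel with a zero scale ignores the noise entirely, so the network can learn where stochastic detail is useful and where it is not.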

20
StyleGAN: Stochastic Variation
• The main areas of stochastic variation are
• the hair
• silhouettes
• parts of the background
→ The noise doesn’t affect global aspects!

21
StyleGAN: Water Droplet-like Artifacts

22
Advances of StyleGAN: StyleGAN2

23
Phase artifacts in StyleGAN
Examples of unnatural images w/ StyleGAN
• StyleGAN (left) has a texture sticking problem caused by progressive growing
• The image at each scale is generated independently by the corresponding generator stage
• StyleGAN2 adopts a ResNet-style architecture to solve the problem
• Note: not perfectly solved; this is discussed in StyleGAN3
Examples of natural images w/ StyleGAN2

24
• StyleGAN2 (left) has not fully solved the texture sticking problem
• (left) Averaging images generated from small changes of the latent should blur the central image
• (left) But in StyleGAN2 the details stick to the same pixel coordinates

26
Applications of StyleGAN: Image Domain
• InterFaceGAN
• Extract linear editing directions through attribute-level supervision
• StyleFlow
• First to present edits that remain stable when composed
• Normalizing flows and attribute-level supervision
• DyStyle
• Addresses compositional editing directly
• Accurate, elaborate, and diverse editing
• StyleCLIP
• Free textual editing w/ visual-linguistic pretrained model
• Pose with Style
• Human pose supervision to edit body poses and clothing
• StyleMapGAN
• Localized editing by augmenting StyleGAN’s architecture
w/ spatially adaptive modulation

27
Applications of StyleGAN: StyleMapGAN

28
Applications of StyleGAN: StyleMapGAN
• Localized editing by augmenting StyleGAN’s
architecture w/ spatially adaptive modulation
• Localized editing conducted with

29
Music Generation: Recent Works w/wo GANs
• Style-Conditioned Music Generation
• Style transfer-like methods in music generation w/ LSTM-based GANs
• Builds a style codebook that determines the overall style of the music
• Symbolic Music Generation with Transformer-GANs
• Compound Word Transformer: Learning to Compose Full-Song Music over
Dynamic Directed Hypergraphs
• Transformer-based music generation model

30
Music Generation: Why is StyleGAN hard to utilize?
• The main “scale-specific controllability” of StyleGAN comes from the stacked CNN blocks of various sizes
→ To utilize StyleGAN, the output should be separable by scale
• A music composition should be extendable indefinitely
→ CNNs cannot be used along the temporal dimension
• Separation of the musical components (e.g., Motive – Phrase – Period)
→ Hard to model with CNNs (an intuitive separation of the components is hard)
→ The components share an overall flow → separating them causes incoherent music
• Separation of the MIDI components (e.g., bar – beat – …)
→ Using CNNs to combine them can cause information loss (e.g., structured tokens)
• Too many additional features to consider
• StyleGAN and image processing → no other input or consideration except the image
• Music has a bunch of extra features, such as the instrument

31
Conclusion
• The “scale-specific controllability” of StyleGAN comes from the stacked CNN blocks of various sizes
→ To utilize StyleGAN, the target domain’s output should be separable by scale
(e.g., 4x4 → 8x8 → 16x16 → 32x32 → … → 1024x1024)
• It might be usable for GUI design, but it would be more like “conditioned image synthesis regardless of the structure”
• If we utilize StyleGAN in GUI design, we should be ready to defend…
• Why do we ignore the structures?
• Isn’t this design just a combination of existing designs?
