2022. 06. 03.
A Style-Based Generator Architecture for Generative
Adversarial Networks
Tero Karras, Samuli Laine, Timo Aila
CVPR 2019
Hyunwook Lee
Contents
• Overview
• Preliminaries
• Disentangled Representation
• Various Normalization in Deep Learning
Domain
• StyleGAN
• Disentanglement of StyleGAN
• AdaIN in the StyleGAN
• Applications of StyleGAN
• Case: Music Generation
• Conclusion
3
Overview: What is the StyleGAN?
(a) Traditional generator and (b) StyleGAN generator
One of the most famous GANs for image synthesis
Automatic, unsupervised separation of high-level attributes
Control image synthesis inspired by style transfer
Scale-specific mixing and interpolation
(Figure legend: Learnable, Operation, AdaIN)
4
Overview: Examples with StyleGAN
• StyleGAN enables scale-specific styling
• The style fed to each layer affects only
the corresponding scale of the image
• Coarse – pose, general hair style, face, eyeglasses
• Middle – small facial features, hair style, eyes
• Fine – color scheme and microstructure
• How is this styling achieved?
→ AdaIN from Style Transfer & Disentangled Representation!
5
Preliminaries: Disentangled Representation
Entangled and disentangled representation
A form of unsupervised representation learning for generative models
Control image synthesis inspired by style transfer
Automatic, unsupervised separation of high-level attributes
Scale-specific mixing and interpolation
6
Preliminaries: Disentangled Representation
Training GANs in the traditional way
→ The latent z becomes a kind of feature vector
(i.e., a representation)
Train & Test
• Changing a single dimension of the latent z changes two or
more features → these features are entangled!
• This degrades the interpretability and controllability of the generation process
→ each latent dimension should correspond to one “independent feature”
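The difference between an entangled and a disentangled latent can be illustrated with a toy linear "generator". This is only a sketch: the two mixing matrices and the helper `changed_features` are hypothetical and are not part of StyleGAN.

```python
import numpy as np

def changed_features(mixing, dim, step=1.0, tol=1e-9):
    """Indices of output features that move when one latent dimension is perturbed."""
    z = np.zeros(mixing.shape[1])
    z_perturbed = z.copy()
    z_perturbed[dim] += step
    delta = mixing @ z_perturbed - mixing @ z
    return np.flatnonzero(np.abs(delta) > tol)

# Hypothetical toy generators: rows are features, columns are latent dimensions.
entangled = np.array([[1.0, 0.5, 0.0],
                      [0.7, 1.0, 0.2],
                      [0.0, 0.3, 1.0]])
disentangled = np.eye(3)

print(changed_features(entangled, dim=0))     # features 0 and 1 both move
print(changed_features(disentangled, dim=0))  # only feature 0 moves
```

In the entangled case one latent dimension drives several features at once, which is exactly what makes controlled editing hard.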
7
Preliminaries: Disentangled Representation
• Based on manifold hypothesis
• Real-world high-dimensional data lie on low-dimensional
manifolds embedded within the high-dimensional space
• A unit Gaussian is not enough to represent image
manifolds
• Images can be badly reconstructed
• Goal: a latent space that consists of linear subspaces,
each of which controls one factor of variation
• Reading materials:
• InfoGAN, β-VAE, Spatial CBN, LAPGAN,…
8
Various Normalization in Deep Learning
• Used in most deep learning models
• Main idea: normalize each layer's input → all layers see the same /
a similar input distribution
Figure: Batch Norm, Layer Norm, Instance Norm, and Group Norm differ in the axes over which mean/std are computed
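The four schemes differ only in the axes over which the statistics are taken. Below is a minimal NumPy sketch for feature maps of shape (N, C, H, W); it omits the learnable scale/shift and the running statistics that real Batch Norm layers keep, and the function names are illustrative.

```python
import numpy as np

def normalize(x, axes, eps=1e-5):
    """Subtract the mean and divide by the std computed over the given axes."""
    mean = x.mean(axis=axes, keepdims=True)
    std = x.std(axis=axes, keepdims=True)
    return (x - mean) / (std + eps)

def batch_norm(x):
    return normalize(x, (0, 2, 3))      # per channel, across the minibatch

def layer_norm(x):
    return normalize(x, (1, 2, 3))      # per sample, across all channels

def instance_norm(x):
    return normalize(x, (2, 3))         # per sample and per channel

def group_norm(x, groups):
    n, c, h, w = x.shape
    grouped = x.reshape(n, groups, c // groups, h, w)
    return normalize(grouped, (2, 3, 4)).reshape(n, c, h, w)  # per sample, per channel group

x = np.random.default_rng(0).standard_normal((4, 8, 5, 5))
print(instance_norm(x).shape)  # (4, 8, 5, 5)
```

With one group, Group Norm reduces to Layer Norm; with one channel per group, it reduces to Instance Norm.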
9
Instance Normalization in Style Transfer
• Convolutional feature statistics of a DNN can capture the style of an image
• Recent work shows that the channel-wise mean/variance are effective for
style transfer
→ Instance Normalization can be seen as a form of style normalization!
10
Adaptive Instance Normalization
• Given content image x and style image y, the stylized features are obtained by:
AdaIN(x, y) = σ(y) · (x − μ(x)) / σ(x) + μ(y)
• Normalize x (remove the style of the content) → denormalize x with the style of y
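The normalize-then-denormalize step can be sketched in NumPy as follows. This is a minimal illustration assuming feature maps of shape (C, H, W) with channel-wise statistics; the function name `adain` and the `eps` term are choices made here, not taken from the slides.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Re-style content features with the channel-wise mean/std of style features."""
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    normalized = (content - c_mean) / (c_std + eps)   # remove the content's own style
    return s_std * normalized + s_mean                # apply the style statistics

rng = np.random.default_rng(0)
content = rng.standard_normal((3, 8, 8))
style = rng.standard_normal((3, 6, 6)) * 0.5 - 4.0
print(adain(content, style).shape)  # (3, 8, 8)
```

The style features may have a different spatial size than the content features, since only their per-channel statistics are used.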
11
Style-based Generator (StyleGAN)
How do they achieve a disentangled representation?
How do they design the generator as a form of style transfer?
Why do they need a noise input for each layer?
(a) Traditional generator and (b) StyleGAN generator
(Figure legend: Learnable, Operation, AdaIN)
12
StyleGAN: Disentangled Representation
• Latent space disentanglement is a crucial part of both style
transfer and generative models
• Hard to achieve by direct mapping (b in lower figure)
• StyleGAN produces a disentangled intermediate latent space
𝒲
• Not a fixed distribution, but a learned mapping
• Spatially invariant, specialized into styles by the affine transformation A
• Generating images from a disentangled representation is much
easier than from an entangled one
→ training is expected to push the mapping network toward a disentangled
representation
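The learned mapping from z to the intermediate latent w can be sketched as a small MLP. The 8 layers, 512 dimensions, input normalization, and leaky ReLU follow the StyleGAN paper; the random initialization and the helper names below are illustrative assumptions, and the weights here are untrained.

```python
import numpy as np

def pixel_norm(z, eps=1e-8):
    """Normalize each latent vector to unit average squared magnitude."""
    return z / np.sqrt(np.mean(z ** 2, axis=1, keepdims=True) + eps)

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def init_mapping(rng, dim=512, num_layers=8):
    """Random (untrained) weights, for illustration only."""
    scale = np.sqrt(2.0 / dim)
    return [(rng.standard_normal((dim, dim)) * scale, np.zeros(dim))
            for _ in range(num_layers)]

def mapping_network(z, params):
    """Map latent codes z of shape (batch, dim) to intermediate latents w."""
    w = pixel_norm(z)
    for weight, bias in params:
        w = leaky_relu(w @ weight + bias)
    return w

rng = np.random.default_rng(0)
params = init_mapping(rng, dim=64)
print(mapping_network(rng.standard_normal((4, 64)), params).shape)  # (4, 64)
```

Because w is the output of a learned function rather than a sample from a fixed distribution, its distribution can adapt to the training data.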
13
StyleGAN: AdaIN as Styling Methods
• Through the learned affine transformation A, the vector w becomes the
style y = (ys, yb)
• ys is the style scale (standard deviation) and yb is the style bias (mean)
• Step-by-Step
• Input x is normalized with Instance Normalization
• This effectively localizes the styles
• Denormalized with ys and yb
• In the implementation the actual multiplier is ys + 1, so the scale
starts near 1 (this does not by itself force a positive value)
• Forwarded to the next layer
• Note: scale-specific styling is only possible when
the network output can be separated into gradually increasing scales
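The step-by-step procedure above can be sketched as a single function. This is a minimal NumPy illustration: `affine_weight` and `affine_bias` stand in for the learned transformation A, and their shapes and names are assumptions made for this example.

```python
import numpy as np

def style_adain(x, w, affine_weight, affine_bias, eps=1e-5):
    """x: (C, H, W) features; w: (D,) intermediate latent.

    affine_weight: (D, 2C) and affine_bias: (2C,) play the role of A.
    """
    channels = x.shape[0]
    y = w @ affine_weight + affine_bias             # style y = (ys, yb)
    y_s, y_b = y[:channels], y[channels:]
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    normalized = (x - mean) / (std + eps)           # instance normalization
    # The multiplier is ys + 1, so a zero style leaves the normalized input unchanged.
    return (y_s[:, None, None] + 1.0) * normalized + y_b[:, None, None]

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w = rng.standard_normal(16)
out = style_adain(x, w, rng.standard_normal((16, 8)) * 0.1, np.zeros(8))
print(out.shape)  # (4, 8, 8)
```

Since the normalization discards the incoming statistics, each style only controls the layer it is fed to until the next AdaIN overrides it.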
14
StyleGAN: Style Mixing
• Styles are encouraged to localize through
Style Mixing during training
• Simply,
• run two different latent codes z1, z2 through the mapping network
• Mix the corresponding intermediate latents w1, w2
at a randomly selected point in the synthesis network 𝑔
• this prevents the network from assuming
that adjacent styles are correlated
→ more localized, scale-specific modification!
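The mixing step can be sketched as choosing a random crossover layer and switching from w1 to w2 there. This is a minimal NumPy illustration; the function name `mix_styles` and the uniform choice of the crossover point are assumptions for this example.

```python
import numpy as np

def mix_styles(w1, w2, num_layers, rng):
    """Return per-layer styles: w1 before a random crossover layer, w2 from it onward."""
    crossover = int(rng.integers(1, num_layers))      # value in [1, num_layers - 1]
    layer_ids = np.arange(num_layers)[:, None]
    styles = np.where(layer_ids < crossover, w1[None, :], w2[None, :])
    return styles, crossover

rng = np.random.default_rng(0)
w1, w2 = rng.standard_normal(512), rng.standard_normal(512)
styles, crossover = mix_styles(w1, w2, num_layers=18, rng=rng)
print(styles.shape)  # (18, 512)
```

Each row of `styles` would then be fed to the AdaIN of the corresponding synthesis layer.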
15
StyleGAN: Style Mixing
16
Style Mixing in Coarse Level (4² – 8²)
17
Style Mixing in Middle Level (16² – 32²)
• Brings smaller-scale facial features, hair style, eyes
open / closed, …
18
Style Mixing in Fine Level (64² – 1024²)
• Mainly brings the color scheme and microstructure
• Does not change coarse / middle styles
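The three levels map onto the style inputs of the synthesis network as follows. This sketch assumes the 1024² configuration with two style inputs per resolution (18 in total); the helper names are illustrative.

```python
def style_layer_resolutions(max_resolution=1024):
    """Resolution handled by each style input, two inputs per resolution."""
    resolutions = []
    res = 4
    while res <= max_resolution:
        resolutions += [res, res]
        res *= 2
    return resolutions

def scale_group(resolution):
    """Group a resolution into the coarse / middle / fine levels used on the slides."""
    if resolution <= 8:
        return "coarse"
    if resolution <= 32:
        return "middle"
    return "fine"

layers = style_layer_resolutions()
print(len(layers))  # 18
for index, res in enumerate(layers):
    print(index, f"{res}x{res}", scale_group(res))
```

Under these assumptions, layers 0 to 3 are coarse, 4 to 7 are middle, and 8 to 17 are fine.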
19
StyleGAN: Stochastic Variation
• Traditional GANs achieve stochastic
variation by…
• generating spatially-varying pseudorandom
numbers inside the network
• Consumes network capacity
• Not always successful
• In StyleGAN…
• Random noise is introduced at every layer
• Hypothesis: at any point in the generator, there is pressure to introduce new
content as soon as possible
• Goal: fool the discriminator
• The easiest way: feed fresh random noise to
each layer → variation comes from the random noise
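Per-layer noise injection can be sketched as adding one single-channel Gaussian noise image, broadcast to all feature maps with a learned per-channel scale. This is a minimal NumPy illustration; `noise_scale` stands in for the learned scaling factors, and the function name is an assumption.

```python
import numpy as np

def add_noise(x, noise_scale, rng):
    """x: (C, H, W) features; noise_scale: (C,) learned per-channel factors."""
    _, height, width = x.shape
    noise = rng.standard_normal((1, height, width))   # one noise image shared by all channels
    return x + noise_scale[:, None, None] * noise

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 4))
print(add_noise(x, np.array([0.1, 0.5]), rng).shape)  # (2, 4, 4)
```

A fresh noise image is drawn at every layer, so the network does not have to spend capacity on generating its own pseudorandom patterns.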
20
StyleGAN: Stochastic Variation
• The main areas of stochastic
variation are
• the hair
• silhouettes
• parts of the background
→ The noise does not affect
global aspects!
21
StyleGAN: Water Droplet-like Artifacts
22
Advances of StyleGAN: StyleGAN2
23
Advances of StyleGAN: StyleGAN2
Phase artifacts in StyleGAN
Examples of unnatural images w/ StyleGAN
• StyleGAN (left) has a texture sticking problem due to
progressive growing
• The image at each scale is generated by its
corresponding generator stage, independently
• StyleGAN2 adopts a ResNet-style architecture to solve the problem
• Note: not perfectly solved; this is revisited in StyleGAN3
Examples of natural images w/ StyleGAN2
24
Advances of StyleGAN: StyleGAN3
• StyleGAN2 (left) has not fully solved the texture sticking problem
• (left) Averaging images generated from small latent changes should give a blurred version of the central image
• (left) But in StyleGAN2 the details stick to the same pixel coordinates
25
Advances of StyleGAN: StyleGAN3
26
Applications of StyleGAN: Image Domain
• InterFaceGAN
• Extracts linear editing directions through attribute-level supervision
• StyleFlow
• First to present edits that remain stable when composed
• Uses normalizing flows and attribute-level supervision
• DyStyle
• Addresses compositional editing directly
• Accurate, elaborate, and diverse editing
• StyleCLIP
• Free-form text-driven editing w/ a pretrained vision-language model
• Pose with Style
• Uses human pose supervision to edit body poses and clothing
• StyleMapGAN
• Localized editing by augmenting StyleGAN’s architecture
w/ spatially adaptive modulation
27
Applications of StyleGAN: StyleMapGAN
28
Applications of StyleGAN: StyleMapGAN
• Localized editing by augmenting StyleGAN’s
architecture w/ spatially adaptive modulation
• Localized editing conducted with …
29
Music Generation: Recent Works w/ and w/o GANs
• Style-Conditioned Music Generation
• Style transfer-like methods for music generation w/ LSTM-based GANs
• Builds a style codebook that determines the overall style of the music
• Symbolic Music Generation with Transformer-GANs
• Compound Word Transformer: Learning to Compose Full-Song Music over
Dynamic Directed Hypergraphs
• Transformer-based music generation model
30
Music Generation: Why is StyleGAN hard to apply?
• The main “scale-specific controllability” of StyleGAN comes from the
stacked CNN layers of various sizes
→ To use StyleGAN, the output should be separable into scales
• A music composition should be extendable indefinitely
→ CNNs cannot be used along the temporal dimension
• Separation into musical components (e.g., Motive – Phrase – Period)
→ Hard to model like CNN scales (an intuitive separation of the components is hard)
→ All of them share the overall flow → separating them produces incoherent music
• Separation into MIDI components (e.g., bar – beat – …)
→ Using CNNs to combine them can cause information loss
(e.g., structured tokens)
• Too many additional features to consider
• StyleGAN and image processing → no input or consideration other than the image
• Music has many extra features, such as the instrument
31
Conclusion
• “Scale-specific controllability” of StyleGAN comes from the stacked
CNN layers of various sizes
→ To use StyleGAN, the target domain output should be separable into scales
(e.g., 4x4 → 8x8 → 16x16 → 32x32 → … → 1024x1024)
• It may be usable for GUI design, but it would be more like “Conditioned image
synthesis regardless of the structure”
• If we use StyleGAN for GUI design, we should be able to defend…
• why do we ignore the structures?
• Isn’t this design a combination of existing designs?
Thank you



Editor's Notes

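  • #9: Batch Norm → the most basic normalization technique; it assumes that the mean / variance of a batch represents the whole dataset, and it behaves differently at training and at inference time. Layer Norm → robust to the input scale and to scaling / shifting of the weights. Instance Norm → mean / std normalization per channel and per sample; it can be used unchanged at inference and can normalize properties such as contrast. Group Norm → presented in 2018 by Yuxin Wu and Kaiming He; a compromise between Layer Norm and Instance Norm.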
  • #17: Brings high-level aspects (i.e., pose, general hair style, face shape, and eyeglasses)
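  • #22: A non-smooth mapping caused by normalization. The artifacts appear from the 64 by 64 resolution onward and occur in all feature maps → because AdaIN ultimately destroys the correlation between channels. In addition, because the normalization depends on the input, a very large spike in the input makes it easier to adjust details elsewhere.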
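  • #23: If bias and noise are applied before normalization, their relative influence becomes inversely proportional to the style magnitude → noise and bias become correlated with the style → therefore noise and bias are applied after normalization. The mean subtraction is removed, and the data-dependent normalization is removed. AdaIN is replaced with normalization / denormalization of the weights (this is straightforward given the properties of a convolution without bias).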
  • #26: An EMA is added to handle the average value. Filtering is performed through upsampling; the upsampling uses the filter shown in (a) → assumes a continuous, infinite spatial domain.
  • #31: Note: scale-specific styling is only possible when we can separate each network output gradually
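The weight normalization / denormalization mentioned in note #23 can be sketched as follows. This is a minimal NumPy illustration of StyleGAN2-style weight modulation and demodulation: the style scales the convolution weights per input channel, and each output filter is then rescaled to unit L2 norm. The function name and tensor layout are assumptions for this example.

```python
import numpy as np

def modulate_demodulate(weight, style, eps=1e-8):
    """weight: (out_ch, in_ch, kh, kw); style: (in_ch,) per-input-channel scales."""
    modulated = weight * style[None, :, None, None]              # modulation by the style
    norm = np.sqrt(np.sum(modulated ** 2, axis=(1, 2, 3), keepdims=True) + eps)
    return modulated / norm                                      # demodulation per output filter

rng = np.random.default_rng(0)
weight = rng.standard_normal((5, 3, 3, 3))
style = np.array([0.5, 2.0, 10.0])
print(modulate_demodulate(weight, style).shape)  # (5, 3, 3, 3)
```

Because the statistics are derived from the weights instead of the actual feature maps, the normalization no longer depends on the input data.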