3D Volumetric Data Generation with Generative Adversarial Networks
Hiroyuki Vincent Yamazaki
Keio University hvy@keio.jp
Preferred Networks Summer Internship, 2016
1. Introduction
Background
Generative Adversarial Networks (GANs) [1] have achieved state-of-the-art performance in unsupervised learning, generating synthetic images when trained on datasets such as MNIST or, for multi-channel images, ImageNet.
However, these networks have not yet been extended to higher dimensions such as volumetric 3D data. Generated 3D models have various applications in entertainment and could serve as an alternative to existing procedural methods for creating graphics.
This study demonstrates the capability of GAN-based architectures to generate practical 3D models by applying 3-dimensional convolutions and deconvolutions* to voxel data (see the shape sketch below).
Goal
• Extend GANs to 3D volumetric data, training on a single class
• Control the shapes of the generated models, e.g. by interpolation
*Transposed convolutions
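The 3D convolution and transposed convolution ("deconvolution") are the only building blocks that differ from image GANs. Below is a minimal shape-level sketch, written in PyTorch purely for illustration (the poster does not show the original implementation); the channel count and padding are assumptions.

```python
import torch
import torch.nn as nn

# One binary 32x32x32 voxel grid, shaped (batch, channels, depth, height, width).
voxels = torch.zeros(1, 1, 32, 32, 32)

# 3D convolution with kernel 4, stride 2, padding 1 halves each spatial dimension.
conv = nn.Conv3d(in_channels=1, out_channels=64, kernel_size=4, stride=2, padding=1)
features = conv(voxels)             # -> (1, 64, 16, 16, 16)

# 3D transposed convolution with the same kernel/stride doubles it back.
deconv = nn.ConvTranspose3d(in_channels=64, out_channels=1, kernel_size=4, stride=2, padding=1)
upsampled = deconv(features)        # -> (1, 1, 32, 32, 32)

print(features.shape, upsampled.shape)
```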
2. Training Data
3D CAD models from ShapeNet [2]
• Class: Chair
• Instances: 4846
Preprocessing
• Voxelization
• 3D CAD models are converted into binary {0, 1} voxel grids with dimensions (32, 32, 32) [3]
• Normalization
• No normalization is applied. Data is in range [0, 1]
• Other
• Remove bad samples and centre the models in the voxel space (a centring sketch follows below)
[Figures] Mean 3D Model; Training Data Volume Distribution
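The centring step could be implemented directly on the voxel grids. The following is a minimal NumPy sketch, assuming each model is already a binary (32, 32, 32) array produced by binvox; the bounding-box rule is an illustrative choice, not necessarily the one used in the study.

```python
import numpy as np

def center_voxels(v):
    """Shift a binary voxel grid so that the bounding box of its occupied
    voxels is centred in the grid (a simple illustrative centring rule)."""
    occupied = np.argwhere(v > 0)
    if occupied.size == 0:
        return v
    lo, hi = occupied.min(axis=0), occupied.max(axis=0)
    grid_center = (np.array(v.shape) - 1) / 2.0
    bbox_center = (lo + hi) / 2.0
    shift = np.round(grid_center - bbox_center).astype(int)
    return np.roll(v, tuple(int(s) for s in shift), axis=(0, 1, 2))

# Example: an off-centre block of occupied voxels is moved to the middle.
grid = np.zeros((32, 32, 32), dtype=np.float32)
grid[1:5, 1:5, 1:5] = 1.0
centered = center_voxels(grid)
```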
3. Generative Adversarial Network
A GAN consists of a generator G and a discriminator D. In this work, both are feed-forward neural networks that are trained simultaneously.
• Random noise vectors z are sampled from a uniform or Gaussian distribution
Loss
• Softmax cross-entropies based on the predictions of D
• Separate losses for G and D, defined by the minimax game (a sketch of both losses follows the equations below)
Optimal Discriminator Strategy
Optimization
• Adam for both G and D
• The learning rate of G is larger than that of D
[Architecture diagram] Random noise → Generator (Linear, Deconvolution, Batch Normalization, ReLU, Sigmoid) → generated 3D model. A random index selects a real 3D model from the training data. Both models are passed to the Discriminator (Convolution, Linear, Leaky ReLU), which outputs a generated/real prediction.
See Appendix for the network architecture and Adam parameters
Minimax objective:
\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
Optimal discriminator strategy:
D^*(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_G(x)}
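Since D ends in a two-way softmax (see Appendix), the softmax cross-entropy losses above can be written as ordinary classification losses. A hedged PyTorch sketch follows, using the common non-saturating generator loss rather than the literal minimax term:

```python
import torch
import torch.nn.functional as F

REAL, FAKE = 1, 0  # class indices for D's two-way softmax output

def discriminator_loss(d_real_logits, d_fake_logits):
    # D is trained to classify real samples as REAL and generated ones as FAKE.
    batch = d_real_logits.size(0)
    loss_real = F.cross_entropy(d_real_logits, torch.full((batch,), REAL, dtype=torch.long))
    loss_fake = F.cross_entropy(d_fake_logits, torch.full((batch,), FAKE, dtype=torch.long))
    return loss_real + loss_fake

def generator_loss(d_fake_logits):
    # Non-saturating form: G is trained so that D labels its samples as REAL.
    batch = d_fake_logits.size(0)
    return F.cross_entropy(d_fake_logits, torch.full((batch,), REAL, dtype=torch.long))
```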
Issues with the GAN
• Collapsing generator
• G outputs similar 3D models for different inputs
• Non-semantic input z
• Interpolating z reveals sharp transitions in the latent space, so there is no way to smoothly control the shape of the output
Improving the GAN
• Prevent the generator from collapsing
• Minibatch Discrimination [4] layer in D
• Embed semantic meaning into the input [5]
• Concatenate additional latent codes with z before feeding it to G
• Additional loss based on mutual information reconstruction by D
[Architecture diagram] Random noise + latent codes → Generator (Linear, Deconvolution, Batch Normalization, ReLU, Sigmoid) → generated 3D model. A random index selects a real 3D model from the training data. Both models are passed to the Discriminator (Convolution, Linear, Leaky ReLU, Minibatch Discrimination), which outputs a generated/real prediction and a mutual information reconstruction.
Minibatch Discrimination
Motivation
Prevent the generator from collapsing to a single point
Idea
Reproduce the diversity of the training data. Add a Minibatch Discrimination layer to D, just before the generated/real prediction. For each minibatch fed to this layer, compute the L1 distances between all pairs of input vectors and append this information to the minibatch (see the layer sketch below).
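A sketch of such a layer following Salimans et al. [4], with the kernel sizes from the Appendix (64 kernels of dimension 16); the projection-matrix initialisation and the exact distance statistic are assumptions.

```python
import torch
import torch.nn as nn

class MinibatchDiscrimination(nn.Module):
    """Minibatch discrimination: append, to each sample, statistics of its
    L1 distances to the other samples in the minibatch."""
    def __init__(self, in_features, num_kernels=64, kernel_dim=16):
        super().__init__()
        self.T = nn.Parameter(torch.randn(in_features, num_kernels * kernel_dim) * 0.1)
        self.num_kernels = num_kernels
        self.kernel_dim = kernel_dim

    def forward(self, x):                                # x: (N, in_features)
        n = x.size(0)
        m = x.matmul(self.T).view(n, self.num_kernels, self.kernel_dim)
        # Pairwise L1 distances per kernel: (N, N, num_kernels)
        diff = (m.unsqueeze(0) - m.unsqueeze(1)).abs().sum(dim=3)
        # Similarity to all other batch members, excluding self (exp(0) = 1).
        o = torch.exp(-diff).sum(dim=1) - 1.0            # (N, num_kernels)
        return torch.cat([x, o], dim=1)                  # (N, in_features + num_kernels)
```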
Mutual Information Reconstruction
Motivation
Embed semantic meaning in z
Idea
Maximize the mutual information preserved for the latent codes C that are passed through the networks (sampled as in the sketch below)
Latent codes, input to G
• C = [C1, C2, C3] (concatenated with z)
• Categorical one-hot vector C1 ~ Cat(K=2, p=0.5)
• Continuous C2 ~ Unif(-1, 1)
• Continuous C3 ~ Unif(-1, 1)
Reconstruction, output from D
• Categorical: softmax cross-entropy
• Continuous: assume a fixed variance and compute the Gaussian negative log-likelihood based on the predicted mean
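A sketch of the latent-code sampling and of the reconstruction loss on D's extra head, assuming a 128-dimensional z (Appendix: input ∈ R^(128+2+2)), uniform noise, and unit variance for the continuous codes; written in PyTorch for illustration.

```python
import torch
import torch.nn.functional as F

def sample_latent(batch, z_dim=128):
    """Sample z plus the latent codes C = [c1, c2, c3] fed to G."""
    z  = torch.rand(batch, z_dim) * 2.0 - 1.0                  # uniform noise in [-1, 1]
    c1 = F.one_hot(torch.randint(0, 2, (batch,)), 2).float()   # categorical, K=2, p=0.5
    c2 = torch.rand(batch, 1) * 2.0 - 1.0                      # continuous ~ Unif(-1, 1)
    c3 = torch.rand(batch, 1) * 2.0 - 1.0
    return torch.cat([z, c1, c2, c3], dim=1), c1, c2, c3

def mutual_info_loss(q_logits, q_mu2, q_mu3, c1, c2, c3):
    """Reconstruction loss on D's extra head: softmax cross-entropy for the
    categorical code; for the continuous codes, a fixed-variance Gaussian
    negative log-likelihood, which reduces to a squared error up to constants."""
    cat_loss = F.cross_entropy(q_logits, c1.argmax(dim=1))
    cont_loss = 0.5 * ((q_mu2 - c2) ** 2).mean() + 0.5 * ((q_mu3 - c3) ** 2).mean()
    return cat_loss + cont_loss
```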
[Diagram] Generator input: z concatenated with latent codes c1 (one-hot, e.g. [0, 1]), c2 and c3. Discriminator reconstruction head: a softmax for c1 and means μ2, μ3 for c2 and c3. Schematic of the Minibatch Discrimination layer and its kernels.
4. Results
• Minibatch size: 128
• Epochs: 100
Generated 3D Models
[Figures] Generated 3D models; the blue models are their nearest neighbours in the training dataset. 3D volume distributions and chair-likeness: learned distribution vs. true distribution. Training losses.
5. Conclusions
• GANs can be extended to 3D volumetric data using 3-dimensional convolutions and deconvolutions
• Smaller (sparser) datasets lead to worse-looking, noisier models
• Partially mitigated by mutual information reconstruction and minibatch discrimination
• In many cases, D improves faster than G
• Gradients back-propagated through G saturate and training stops
• Training does not converge
Future Work
• Larger dataset, potentially with multiple classes
• Balance training between G and D
• Heuristic
• Stop updating D while it is too strong (a sketch follows this list)
• Larger G, i.e. more parameters
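One possible form of that balancing heuristic, shown only as a hypothetical sketch; the threshold, rule, and variable names are illustrative and not from the poster.

```python
# Skip the discriminator update whenever it is already much stronger than the
# generator, i.e. when its loss is far below the generator's (illustrative rule).
def maybe_update_discriminator(d_loss, g_loss, d_optimizer, ratio=0.1):
    if d_loss.item() > ratio * g_loss.item():
        d_optimizer.zero_grad()
        d_loss.backward()
        d_optimizer.step()
```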
References
[1] Goodfellow, I. et al. (2014). Generative Adversarial Networks. CoRR, abs/1406.2661.
[2] Chang, A. X. et al. (2015). ShapeNet: An Information-Rich 3D Model Repository. CoRR, abs/1512.03012.
[3] Patrick Min. Binvox, 3D Mesh Voxelizer. http://www.patrickmin.com/binvox/
[4] Salimans, T. et al. (2016). Improved Techniques for Training GANs. CoRR, abs/1606.03498.
[5] Chen, X. et al. (2016). InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. CoRR, abs/1606.03657.
Appendix
GAN Architecture
Generator
• Input ∈ R^(128+2+2)
• FC 1024, BN, ReLU
• FC 16384, BN, ReLU
• DC 256 → 128, Kernel 4, Stride 2, BN, ReLU
• DC 128 → 64, Kernel 4, Stride 2, BN, ReLU
• Output: DC 64 → 1, Kernel 4, Stride 2, BN, ReLU
Discriminator
• Input: 32×32×32 3D voxel data
• Conv 1 → 64, Kernel 4, Stride 2, lReLU (leaky ReLU)
• Conv 64 → 128, Kernel 4, Stride 2, BN, lReLU
• Conv 128 → 256, Kernel 4, Stride 2, BN, lReLU
• FC 1024, BN, lReLU
• Minibatch Discrimination, Kernels 64, Kernel Dimension 16
• Output: FC 2 (Generated/Real prediction)
• FC 256, BN, lReLU
• Output: FC 2+2 (Mutual Information Reconstruction)
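A PyTorch sketch of the generator column of this table. The reshape to (256, 4, 4, 4) before the transposed convolutions, the padding of 1, and the final sigmoid (taken from Section 3) are assumptions, since the poster does not state them.

```python
import torch
import torch.nn as nn

class VoxelGenerator(nn.Module):
    """Sketch of the appendix generator: z (with latent codes) -> 32^3 occupancy grid."""
    def __init__(self, in_dim=128 + 2 + 2):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.BatchNorm1d(1024), nn.ReLU(),
            nn.Linear(1024, 16384), nn.BatchNorm1d(16384), nn.ReLU(),  # 16384 = 256 * 4^3
        )
        self.deconv = nn.Sequential(
            nn.ConvTranspose3d(256, 128, 4, stride=2, padding=1), nn.BatchNorm3d(128), nn.ReLU(),
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1), nn.BatchNorm3d(64), nn.ReLU(),
            nn.ConvTranspose3d(64, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 256, 4, 4, 4)
        return self.deconv(h)  # (batch, 1, 32, 32, 32)

# Quick shape check with a batch of two inputs (eval mode so BatchNorm accepts it).
g = VoxelGenerator()
g.eval()
print(g(torch.randn(2, 132)).shape)  # torch.Size([2, 1, 32, 32, 32])
```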
Adam Optimizer Parameters
• α: Generator 0.001, Discriminator 0.00005
• β1: Generator 0.5, Discriminator 0.5
• β2: Generator 0.999, Discriminator 0.999
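For completeness, the corresponding optimizer setup as a small sketch; the placeholder modules stand in for the actual generator and discriminator networks.

```python
import torch
import torch.nn as nn

# Placeholder modules standing in for the generator and discriminator above.
generator, discriminator = nn.Linear(132, 1), nn.Linear(32768, 2)

# Adam with the parameters from the table: the generator's learning rate is
# larger than the discriminator's, with beta1 = 0.5 for both.
g_optimizer = torch.optim.Adam(generator.parameters(), lr=0.001, betas=(0.5, 0.999))
d_optimizer = torch.optim.Adam(discriminator.parameters(), lr=0.00005, betas=(0.5, 0.999))
```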