ARCHITECTURAL CONDITIONING
FOR DISENTANGLEMENT OF OBJECT
IDENTITY AND POSTURE INFORMATION
Authors: Kazutoshi Sagi, Takahiro Toizumi & Yuzo Senda
Data Science Research Laboratories
NEC Corporation
https://openreview.net/forum?id=HkaYjG6Lf
Summarized by: 김홍배
3D Object Identification Problem
Conventionally, images are collected for many different poses and the network is trained to be invariant to pose → a high-cost approach.
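Equivariance
Mapping function $\Phi(\cdot)$: Image (X) → Latent (Z)
Transformation in image space: $X_2 = T_g^1 X_1$
Transformation in latent space: $Z_2 = T_g^2 Z_1$
Equivariance: $Z_2 = T_g^2 Z_1 = T_g^2 \Phi(X_1) = \Phi(T_g^1 X_1)$, so $Z_1 \neq Z_2$ but the relationship is kept.
Invariance is a special case of equivariance where $T_g^2$ is the identity: $\Phi(X_1) = \Phi(T_g^1 X_1)$.
What if, for a given pose transformation of the image, a well-defined transformation could be found in latent space?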
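The distinction between equivariance and invariance can be checked on a toy example. The sketch below is only an illustration, not the network from the slides: the "image" is a ring of numbers, the transformation is a circular shift, and `phi_equivariant` and `phi_invariant` are hypothetical stand-ins for $\Phi$.

```python
def shift(v, s):
    """Circularly shift a list left by s positions (stands in for T_g)."""
    s %= len(v)
    return v[s:] + v[:s]


def phi_equivariant(x):
    """Circular 2-tap average; it commutes with circular shifts."""
    n = len(x)
    return [(x[k] + x[(k + 1) % n]) / 2 for k in range(n)]


def phi_invariant(x):
    """Global sum; it is unchanged by any circular shift."""
    return sum(x)


x1 = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0]
x2 = shift(x1, 2)                      # X2 = T_g^1 X1

z1 = phi_equivariant(x1)
z2 = phi_equivariant(x2)

# Equivariance: Z1 != Z2, but Z2 is a known transformation of Z1.
equivariant_ok = (z2 == shift(z1, 2)) and (z1 != z2)

# Invariance: the special case where the latent transformation is the identity.
invariant_ok = phi_invariant(x1) == phi_invariant(x2)
```

In this toy case the latent transformation happens to be the same shift as in image space; in general $T_g^1$ and $T_g^2$ act on different spaces and need not look alike.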
ROLLABLE LATENT SPACE
The idea proposed in this work:
[Figure: an angular rotation of the image, $X_{\theta_i} \to X_{\theta_j}$, corresponds to a shift by circular permutation of the latent vector, $Z_{\theta_i} \to Z_{\theta_j}$. Example: latent elements 0 1 2 3 4 5 6 7 8 9 10 11 become 3 4 5 6 7 8 9 10 11 0 1 2.]
What if a pose change in image space (an angular rotation) could be expressed as a shift of the latent vector by circular permutation?
→ The mapping between the two spaces would be known explicitly, and the latent vector for poses not seen in training could be inferred.
→ Here, the auto-encoder is slightly modified so that training enforces this structure.
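Roll(Z, s) cyclically permutes $Z_{\theta_i}$ by the shift parameter s (the angle difference); the result, $Z_{\theta_j}$, is fed to the decoder as its input latent vector.
The encoder receives $X_{\theta_i}$ as input, and the rotated image $X_{\theta_j}$ is given as the target for the decoder output.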
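A minimal sketch of what a Roll(Z, s) operation could look like is given below. The block layout (one contiguous block of `dims_per_direction` values per viewing direction) and the shift direction are assumptions made for illustration; the direction is chosen so that rolling 0..11 by 3 gives 3 4 ... 11 0 1 2, as in the slide's figure.

```python
def roll(z, s, dims_per_direction=1):
    """Cyclically permute latent vector z by s viewing directions.

    Assumption: z is stored as contiguous blocks, one block of
    dims_per_direction values per viewing direction.
    """
    if len(z) % dims_per_direction != 0:
        raise ValueError("latent size must be a multiple of dims_per_direction")
    n_directions = len(z) // dims_per_direction
    if n_directions == 0:
        return list(z)
    offset = (s % n_directions) * dims_per_direction
    return z[offset:] + z[:offset]


z = list(range(12))
rolled = roll(z, 3)                     # one value per direction

# 24-dimensional case: 2 values for each of 12 directions.
z24 = list(range(24))
rolled24 = roll(z24, 1, dims_per_direction=2)
```

Because the permutation is cyclic, rolling by the total number of directions returns the original vector, which mirrors a full 360-degree rotation.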
The network is then trained so that the decoder output approximates $X_{\theta_j}$.
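The data flow of this training objective can be sketched as follows. The slides only say the decoder output should approximate the rotated image, so the mean squared error used here is an assumption, and `rolled_reconstruction_loss`, `encode`, and `decode` are hypothetical names. The toy usage takes the "image" to be a ring of values with identity encoder and decoder, so that a correct shift reproduces the target exactly.

```python
def roll(z, s):
    """Cyclically permute list z left by s positions."""
    s %= len(z)
    return z[s:] + z[:s]


def rolled_reconstruction_loss(encode, decode, x_i, x_j, s):
    """Mean squared error between decode(roll(encode(x_i), s)) and x_j.

    Assumption: squared error is one plausible choice of reconstruction loss.
    """
    z_i = encode(x_i)            # latent vector of the input pose
    x_hat = decode(roll(z_i, s)) # prediction for the target pose
    return sum((a - b) ** 2 for a, b in zip(x_hat, x_j)) / len(x_j)


def identity(v):
    """Toy encoder/decoder: returns a copy of its input."""
    return list(v)


x_i = [float(k) for k in range(12)]   # toy "image" at pose i
x_j = roll(x_i, 3)                    # same object at pose j

loss_matched = rolled_reconstruction_loss(identity, identity, x_i, x_j, 3)
loss_mismatched = rolled_reconstruction_loss(identity, identity, x_i, x_j, 0)
```

In an actual model, `encode` and `decode` would be trainable networks and this loss would be minimized over pairs of poses of the same object.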
Feature Augmentation by RLS
When training a classifier, image-level augmentation is unnecessary: randomly shifting the latent vector $Z_i$ of a given image $X_i$ provides augmentation at the feature level.
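Feature-level augmentation of this kind might look like the sketch below. It assumes one latent value per viewing direction and a uniformly random shift; `augment_latent` is a hypothetical helper, and the identity label of each augmented vector would simply be that of the original image.

```python
import random


def roll(z, s):
    """Cyclically permute list z left by s positions."""
    s %= len(z)
    return z[s:] + z[:s]


def augment_latent(z, n_samples, rng):
    """Return n_samples (shift, rolled copy of z) pairs with random shifts."""
    samples = []
    for _ in range(n_samples):
        s = rng.randrange(len(z))   # random pose shift
        samples.append((s, roll(z, s)))
    return samples


rng = random.Random(0)                 # seeded for repeatability
z_i = [0.1 * k for k in range(12)]     # toy latent vector of one image
augmented = augment_latent(z_i, 4, rng)
```

Each augmented vector contains the same values as `z_i` in a cyclically shifted order, so no new images need to be rendered or encoded.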
EXPERIMENTAL RESULTS
Exp. 1 : DISENTANGLING 2D IMAGE ROTATION
- The encoder and the decoder each consist of a single hidden fully connected layer with ReLU activation.
- The latent space has 24 dimensions, corresponding to 2 dimensions for each of 12 viewing directions.
[Figure: reconstructions of the test dataset. Each row shows, from the left column, an input and its reconstructions at the given rotation angles.]
Exp. 2 : DISENTANGLING 3D OBJECT ROTATION
• 809 chair models are selected.
• The first 500 models are used as the training set and the remaining 309 models as the test set.
• Each chair model is rendered from 31 azimuth angles and 2 elevation angles (20° and 30°).
• A deep convolutional encoder-decoder architecture is used.
• The latent space has 992 dimensions, corresponding to 32 dimensions for each of 31 viewing directions.
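The latent sizes quoted for the two experiments follow from multiplying the number of viewing directions by the dimensions per direction. The sketch below reproduces that arithmetic and adds a hypothetical `shift_for_angle` helper that converts an angle difference into a shift parameter, assuming the viewing directions evenly cover 360 degrees (an assumption, not something the slides state).

```python
def latent_size(n_directions, dims_per_direction):
    """Total latent dimensions for a rollable latent space."""
    return n_directions * dims_per_direction


def shift_for_angle(delta_degrees, n_directions):
    """Nearest shift parameter for an angle difference.

    Assumption: viewing directions are evenly spaced over 360 degrees.
    """
    step = 360.0 / n_directions
    return round(delta_degrees / step) % n_directions


size_2d = latent_size(12, 2)     # Exp. 1: 12 directions x 2 dims
size_3d = latent_size(31, 32)    # Exp. 2: 31 directions x 32 dims
s_quarter_turn = shift_for_angle(90, 12)
```

Under the even-spacing assumption, 12 directions give a 30-degree step, so a quarter turn corresponds to a shift of 3.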
(a) The network architecture used in the 3D object rotation experiment.
(b) Reconstructions of the test dataset. Each row shows, from the left column, an input and its reconstructions at the given rotation angles.
