Image-to-Image Translation with Conditional Adversarial Networks (Pix2Pix)
&
Perceptual Adversarial Networks for Image-to-Image Transformation (PAN)
2017/10/2 DLHacks
Otsubo
Topic: image-to-image “translation”
Info
Pix2Pix [CVPR 2017]
•  Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros
-  iGAN [ECCV 2016]
-  interactive-deep-colorization [SIGGRAPH 2017]
-  Context Encoders [CVPR 2016]
-  Image Quilting [SIGGRAPH 2001]
-  Texture Synthesis by Non-parametric Sampling [ICCV 1999]
•  University of California, Berkeley
•  178 citations
PAN [arXiv 2017]
•  Chaoyue Wang, Chang Xu, Chaohui Wang, Dacheng Tao
•  University of Technology Sydney, The University of Sydney, Université Paris-Est
Background
•  Many tasks can be regarded as “translation” from an input image to an output image
-  Each of them currently has its own task-specific method
Is there a single framework that can handle all of them?
Overview
Pix2Pix
•  A general-purpose solution to image-to-image translation using a single framework
-  The single framework: a conditional GAN (cGAN)
PAN
•  Pix2Pix − (per-pixel loss) + (perceptual adversarial loss)
Naive Implementation: U-Net (①)
①: per-pixel loss (L1/L2)
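A minimal sketch of the two per-pixel losses (①). For illustration it assumes images are given as flat lists of floats of equal length; a real implementation would operate on framework tensors (e.g. PyTorch) instead.

```python
def l1_loss(pred, target):
    """Mean absolute error (per-pixel L1 loss) between two flattened images."""
    if len(pred) != len(target) or not pred:
        raise ValueError("images must be non-empty and the same size")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def l2_loss(pred, target):
    """Mean squared error (per-pixel L2 loss) between two flattened images."""
    if len(pred) != len(target) or not pred:
        raise ValueError("images must be non-empty and the same size")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


if __name__ == "__main__":
    pred = [0.0, 0.5, 1.0, 1.0]
    target = [0.0, 1.0, 1.0, 0.0]
    print(l1_loss(pred, target))  # 0.375
    print(l2_loss(pred, target))  # 0.3125
```

Each pixel is scored independently, so a network trained only with these losses tends to average over plausible outputs, which appears as blur. The pix2pix paper uses L1 rather than L2 because it encourages less blurring.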
Pix2Pix (①+②)
②: adversarial loss
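Pix2Pix’s loss (①+②)
②: $\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))]$
①: $\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}[\lVert y - G(x, z) \rVert_1]$
$G^* = \arg\min_G \max_D \mathcal{L}_{cGAN}(G, D) + \lambda \mathcal{L}_{L1}(G)$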
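A minimal sketch of how the pix2pix objective is typically turned into two trainable losses. It assumes D outputs probabilities, passed as lists so that one score per patch can be supplied. For G it uses the non-saturating form −log D(x, G(x)) in place of log(1 − D(x, G(x))), as the pix2pix paper does. D's loss is halved and λ defaults to 100, both following the paper. The function names are hypothetical, and the flat-list image representation is for illustration only.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _bce(prob, label, eps=1e-12):
    """Binary cross-entropy for a single predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))


def l1_loss(pred, target):
    """Per-pixel L1 loss (①) between two flattened images."""
    return _mean([abs(p - t) for p, t in zip(pred, target)])


def pix2pix_d_loss(d_real, d_fake):
    """Discriminator loss (②): d_real holds D(x, y), d_fake holds D(x, G(x))."""
    real_term = _mean([_bce(p, 1.0) for p in d_real])  # -log D(x, y)
    fake_term = _mean([_bce(p, 0.0) for p in d_fake])  # -log(1 - D(x, G(x)))
    return 0.5 * (real_term + fake_term)  # halved while optimizing D


def pix2pix_g_loss(d_fake, fake, target, lam=100.0):
    """Generator loss: non-saturating adversarial term (②) + lam * L1 (①)."""
    adversarial = _mean([_bce(p, 1.0) for p in d_fake])  # -log D(x, G(x))
    return adversarial + lam * l1_loss(fake, target)


if __name__ == "__main__":
    # An undecided D (0.5 everywhere) and a perfect reconstruction.
    print(pix2pix_d_loss([0.5], [0.5]))
    print(pix2pix_g_loss([0.5, 0.5], [0.2, 0.4], [0.2, 0.4]))
```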
PAN (②+③)
③: perceptual adversarial loss
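PAN’s loss (②+③)
T: transformation network (generator); $H_j$: j-th hidden layer of D
③: $P_j(T(x), y) = \lVert H_j(y) - H_j(T(x)) \rVert_1$ (L1 norm)
$\mathcal{L}_T = \log(1 - D(T(x))) + \sum_j \lambda_j P_j(T(x), y)$ (② + ③)
$\mathcal{L}_D = -\log D(y) - \log(1 - D(T(x))) + [m - \sum_j \lambda_j P_j(T(x), y)]^+$ (② + ③)
m: constant (positive margin); $[\cdot]^+ = \max(0, \cdot)$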
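A minimal sketch of the two PAN losses written term by term from the formulas for $\mathcal{L}_T$ and $\mathcal{L}_D$. Several choices here are assumptions for illustration. D's hidden-layer features $H_j$ are passed in as flat lists (one per layer). Each $P_j$ is taken as the mean absolute difference, i.e. an L1 norm normalized by the feature size. D outputs a single probability per image. The function names are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def perceptual_adversarial_loss(feats_real, feats_fake, lambdas):
    """sum_j lambda_j * P_j, with P_j the mean |H_j(y) - H_j(T(x))| (assumed normalization)."""
    total = 0.0
    for lam, h_real, h_fake in zip(lambdas, feats_real, feats_fake):
        total += lam * _mean([abs(a - b) for a, b in zip(h_real, h_fake)])
    return total


def pan_t_loss(d_fake, feats_real, feats_fake, lambdas, eps=1e-12):
    """L_T = log(1 - D(T(x))) + sum_j lambda_j P_j: T minimizes the feature discrepancy."""
    adversarial = math.log(max(1.0 - d_fake, eps))
    return adversarial + perceptual_adversarial_loss(feats_real, feats_fake, lambdas)


def pan_d_loss(d_real, d_fake, feats_real, feats_fake, lambdas, m, eps=1e-12):
    """L_D = -log D(y) - log(1 - D(T(x))) + [m - sum_j lambda_j P_j]^+."""
    adversarial = -math.log(max(d_real, eps)) - math.log(max(1.0 - d_fake, eps))
    perceptual = perceptual_adversarial_loss(feats_real, feats_fake, lambdas)
    # Hinge: D is pushed to keep the discrepancy at least m, and stops once it does.
    return adversarial + max(0.0, m - perceptual)


if __name__ == "__main__":
    same = [[0.0, 0.0]]
    print(pan_d_loss(0.5, 0.5, same, same, [1.0], m=1.0))  # hinge fully active
    print(pan_d_loss(0.5, 0.5, [[1.0]], [[0.0]], [2.0], m=1.0))  # hinge inactive
```

The margin m is what makes the perceptual term adversarial. T tries to shrink the discrepancy in D's feature space, while D is rewarded for keeping it above m, so D keeps searching for features in which real and generated images still differ.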
Example1: Image De-Raining
•  Removing Rain from Single Images via a Deep Detail Network [Fu, CVPR 2017]
•  ID-GAN (cGAN) [Zhang, arXiv 2017]
-  per-pixel loss
-  adversarial loss
-  perceptual loss from a pre-trained VGG
(Figure: Input, Output (Ground Truth))
(cf. PAN instead computes its perceptual loss from the discriminator’s own features)
Example2: Image Inpainting
•  Globally and Locally Consistent Image Completion [Iizuka, SIGGRAPH 2017]
•  Context Encoders (cGAN) [Pathak, CVPR 2016]
-  per-pixel loss
-  adversarial loss
(Figure: Input, Output (Ground Truth))
Example3: Semantic Segmentation
Cityscapes / PASCAL VOC
•  DeepLabv3 [Chen, arXiv 2017]
•  PSPNet [Zhao, CVPR 2017]
http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?cls=mean&challengeid=11&compid=6
Cell Tracking / CREMI
•  Learned Watershed [Wolf, ICCV 2017]
•  U-Net [Ronneberger, MICCAI 2015]
http://www.codesolorzano.com/Challenges/CTC/Welcome.html
(Figure: Input, Output (Ground Truth))
Result1: Image De-Raining
(Figure: de-raining results; two entries are annotated “≒pix2pix”, i.e. roughly equivalent to pix2pix)
Result2: Image Inpainting
Result3: Semantic Segmentation
Discussion
Why is the perceptual adversarial loss so effective?
vs. no perceptual loss (Pix2Pix)
-  The perceptual loss lets D detect more discrepancies between real and fake images
vs. perceptual loss from a pre-trained VGG (ID-GAN)
-  VGG features tend to focus on content
-  PAN’s features tend to focus on discrepancies
-  PAN’s loss may help avoid adversarial examples [Goodfellow, ICLR 2015] (?)
Minor Difference
•  Pix2Pix uses a PatchGAN
-  A small (70×70) patch discriminator
-  The patch discriminator is applied convolutionally across the image, and the final output of D is the average of its responses
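A small sketch of where the “70×70” comes from and how the patch responses become one output. It assumes the layer layout commonly used for the 70×70 PatchGAN: five 4×4 convolutions with strides 2, 2, 2, 1, 1 and padding 1. The helper names are hypothetical.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit of a conv stack.

    layers: (kernel_size, stride) pairs ordered from input to output.
    """
    rf = 1
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf


def output_size(n, layers, padding=1):
    """Side length of the score map for an n x n input (same padding on every layer)."""
    for kernel, stride in layers:
        n = (n + 2 * padding - kernel) // stride + 1
    return n


def average_response(score_map):
    """Final output of D: the mean over all patch responses."""
    flat = [s for row in score_map for s in row]
    return sum(flat) / len(flat)


# Assumed 70x70 PatchGAN layout: five 4x4 convolutions, strides 2, 2, 2, 1, 1.
PATCH_D = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

if __name__ == "__main__":
    print(receptive_field(PATCH_D))   # 70
    print(output_size(256, PATCH_D))  # 30, i.e. a 30x30 map of patch scores
```

Each score only sees a 70×70 window, so the discriminator judges local structure. The pix2pix paper relies on the L1 term for low-frequency, global correctness.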
To Do
•  Implement
1.  Pix2Pix (patch discriminator)
2.  PAN (patch discriminator)
3.  PAN (standard full-image discriminator)
Wang et al. may have compared 1 with 3.
Implementation
2017/10/17 DLHacks
Otsubo
My Implementation
•  https://github.com/DLHacks/pix2pix_PAN
•  pix2pix
-  https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix
•  PAN
-  per-pixel loss → perceptual adversarial loss
-  the architecture is not the same as in the original paper
-  the number of parameters is the same as in pix2pix
My Experiments
•  Facade (label → photo)
•  Map (photo → Google Maps)
•  Cityscapes (photo → label)
Result (Facade pix2pix)
Result (Facade PAN)
Result (Map pix2pix)
Result (Map PAN)
Result (Cityscapes pix2pix)
Result (Cityscapes PAN)
Result (PSNR [dB])
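The PSNR values themselves were shown as a slide image and are not recoverable here. For reference, this is a minimal sketch of the standard PSNR definition, $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, where higher is better. It assumes flattened images as lists of floats, and the function name is hypothetical.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two flattened images.

    max_val is the largest possible pixel value (1.0 for [0, 1] images, 255 for 8-bit).
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


if __name__ == "__main__":
    # A uniform error of 0.1 on [0, 1] images gives MSE = 0.01, i.e. about 20 dB.
    print(psnr([0.0] * 4, [0.1] * 4))
```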
Discussion: Why does pix2pix outperform PAN here?
•  Is the per-pixel loss necessary?
•  Is the patch discriminator unsuited to PAN?
•  The choice of the positive margin m
•  (Was the pix2pix baseline in the PAN paper poorly implemented…?)
