Advances in Visual Quality Restoration with Generative Adversarial Networks

Leonardo Galteri - PhD
University of Florence, MICC

Why does this happen? (in 2019…)
• First episode of Season 2 had 15M viewers
• Streaming it at reasonable quality comes at a cost of 0.020 $/GB*
-1,125,000 $
*Amazon Cloud outbound traffic
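The slide gives the number of viewers, the price per GB and the total, but not the data volume per viewer. A minimal sketch of the arithmetic, assuming the total is simply viewers x GB per viewer x price per GB (the function name is illustrative):

```python
# Figures taken from the slide.
viewers = 15_000_000       # viewers of the episode
price_per_gb = 0.020       # $/GB, cloud outbound traffic
total_cost = 1_125_000     # $, magnitude of the figure on the slide


def implied_gb_per_viewer(total_cost, viewers, price_per_gb):
    """Data volume per viewer that reproduces the total cost.

    Assumption: total = viewers * GB per viewer * price per GB.
    """
    return total_cost / (viewers * price_per_gb)


gb = implied_gb_per_viewer(total_cost, viewers, price_per_gb)
print(f"{gb:.2f} GB per viewer")  # 3.75 GB per viewer
```

Under this assumption every GB saved per viewer is worth 300,000 $ for a single episode, which is why lowering the bitrate and restoring quality at the receiver is attractive.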

Working on the Encoder
ICPR’18
• Adaptive video coding approach
[Figure: encoder diagram, labeled X.264]

Deep CNN
Is there Another Way?

Improving Compressed Images
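Given an uncompressed frame $x_{HQ}$, the codec $\mathcal{C}$ produces
$x_{LQ} = \mathcal{C}(x_{HQ}; \theta)$
We want to learn a function
$G(x_{LQ}) \approx \mathcal{C}^{-1}(x_{LQ}; \theta)$
so that $G(x_{LQ}) \approx x_{HQ}$, where $\theta$ are the codec parameters.
[Figure: compressed input $x_{LQ}$ and restored output $G(x_{LQ})$]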
ICCV’17

A Deep Residual Network for Reconstruction
• We use strided convolutions to reduce the feature map size.
• We avoid checkerboard artifacts by using nearest-neighbor (NN) upsampling followed by two more convolutional layers.
• Trained on 128x128 pixel patches extracted from MS-COCO.
ICCV’17
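NN upsampling copies every value into a block of identical pixels, so each output pixel is covered uniformly; the convolutions that follow then refine the result. A minimal pure-Python sketch of the upsampling step alone (the function name and the factor of 2 are assumptions, and the learned convolutions are not modeled):

```python
def nn_upsample(feature_map, factor=2):
    """Nearest-neighbor upsampling of a 2D grid given as a list of rows."""
    out = []
    for row in feature_map:
        # Repeat each value `factor` times along the width...
        wide = [value for value in row for _ in range(factor)]
        # ...and repeat the widened row `factor` times along the height.
        out.extend(list(wide) for _ in range(factor))
    return out


small = [[1, 2],
         [3, 4]]
big = nn_upsample(small)
for row in big:
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```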

Limitations of MSE and SSIM Losses
[Figure panels: JPEG, SSIM Loss, Original]
• SSIM and MSE losses effectively reduce compression artifacts.
• However, reconstructions appear blurry and many details are missing with respect to the uncompressed version of the image.
ICCV’17

Generative Adversarial Network
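[Figure: GAN training scheme. The Generator (G) maps Low Quality Images to Restored Images (FAKE); the Discriminator (D) receives High Quality Images (REAL) or Restored Images (FAKE) and answers "REAL or FAKE?"]
D is trained to tell real images apart from reconstructed ones
G is trained to fool D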
ICCV’17

The Sub-Patch Discriminator
• 128x128 patches are split into smaller 16x16 sub-patches, concatenated with the corresponding input sub-patches, and processed by the discriminator.
• The discriminator is trained with a binary cross-entropy loss over all the sub-patches.
ICCV’17
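The split and the loss described on this slide can be sketched as follows. This assumes non-overlapping sub-patches (128/16 gives an 8x8 grid, i.e. 64 sub-patches) and one discriminator probability per sub-patch; the function names are illustrative, and the concatenation with the input sub-patches is not modeled.

```python
import math


def sub_patch_boxes(patch_size=128, sub_size=16):
    """(top, left, bottom, right) boxes of non-overlapping sub-patches."""
    return [(top, left, top + sub_size, left + sub_size)
            for top in range(0, patch_size, sub_size)
            for left in range(0, patch_size, sub_size)]


def bce_over_sub_patches(probs, is_real, eps=1e-12):
    """Mean binary cross-entropy over per-sub-patch discriminator outputs.

    probs: probability of "real" predicted for each sub-patch.
    is_real: True for high-quality patches, False for restored ones.
    """
    target = 1.0 if is_real else 0.0
    total = 0.0
    for p in probs:
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= target * math.log(p) + (1.0 - target) * math.log(1.0 - p)
    return total / len(probs)


boxes = sub_patch_boxes()
print(len(boxes))  # 64
```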

Generator Loss
ICCV’17
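• The new objective of the Generator is:
$\mathcal{L}_G = \mathcal{L}_P + \lambda \mathcal{L}_{ADV}$
where:
$\mathcal{L}_P = \left( \phi(I^{HQ})_{x,y} - \phi(I^{RQ})_{x,y} \right)^2$
is called the perceptual loss, an MSE loss in VGG19 feature space, and:
$\mathcal{L}_{ADV} = -\log D(I^{RQ} \mid I^{LQ})$
is the adversarial loss, which measures how well the generator is fooling the discriminator.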
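A minimal numeric sketch of this objective, assuming the features are plain lists of numbers standing in for VGG19 activations and that the discriminator outputs a single probability. The value of lambda and all function names are hypothetical, not taken from the talk.

```python
import math


def perceptual_loss(feat_hq, feat_rq):
    """MSE between two feature vectors (stand-ins for VGG19 features)."""
    return sum((a - b) ** 2 for a, b in zip(feat_hq, feat_rq)) / len(feat_hq)


def adversarial_loss(d_prob, eps=1e-12):
    """-log D(I_RQ | I_LQ): small when the discriminator is fooled."""
    return -math.log(max(d_prob, eps))


def generator_loss(feat_hq, feat_rq, d_prob, lam=0.01):
    """L_G = L_P + lambda * L_ADV (lam=0.01 is a placeholder value)."""
    return perceptual_loss(feat_hq, feat_rq) + lam * adversarial_loss(d_prob)


# Same features, discriminator undecided: only the adversarial term remains.
loss = generator_loss([1.0, 2.0], [1.0, 2.0], d_prob=0.5)
```

The adversarial term decreases as the discriminator's probability for the restored image approaches 1, so it rewards reconstructions that look real even when the feature-space error is already small.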

Effect of Sub-Patch Discriminator
• This technique reduces the mosquito noise present in reconstructions.
[Figure panels: W/o Sub-Patch, With Sub-Patch]
ICCV’17

Predicting QF
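• We train a CNN regressor, named the QF (quality factor) predictor, to drive a finite ensemble of Generators
• We use the most appropriate Generator to restore the image
[Figure: the QF Predictor receives the input $x$ and drives a Model Switcher, which selects one generator $G(\theta = \theta_n; x)$ from the ensemble $G(\theta = \theta_0; x), \ldots, G(\theta = \theta_N; x)$]
TMM’19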
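The switching step can be sketched as below. The nearest-QF selection rule, the QF values of the ensemble and the placeholder generators are all assumptions made for illustration; the slide only states that the predictor drives the choice among a finite set of generators.

```python
def select_generator(predicted_qf, generators):
    """Pick the generator whose training QF is closest to the prediction.

    generators: dict mapping a training QF to a callable generator.
    Returns the chosen QF and the corresponding generator.
    """
    nearest_qf = min(generators, key=lambda qf: abs(qf - predicted_qf))
    return nearest_qf, generators[nearest_qf]


# Hypothetical ensemble: placeholder callables keyed by training QF.
ensemble = {
    qf: (lambda image, qf=qf: f"restored({image}, qf={qf})")
    for qf in (10, 20, 30, 40)
}

# In the real pipeline the QF would come from the CNN regressor.
chosen_qf, generator = select_generator(23.4, ensemble)
print(chosen_qf)            # 20
print(generator("frame"))   # restored(frame, qf=20)
```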

Quality Prediction Results
TMM’19

Qualitative Results
[Figure panels: JPEG, AR-CNN, GAN, ORIGINAL]
TMM’19

Subjective Evaluation
• DSIS setup: each test image is compared to the original and their similarity is scored on a 0-100 scale.
• We compare SSIM loss vs. adversarial training using the same Generator architecture.
• Subjects have a strong preference for GAN-restored images over SSIM ones.
Method   MOS     Std. Dev.
SSIM     49.51   22.72
GAN      68.32   20.75
TMM’19

Object Detection Results
• We use an object detector, Faster R-CNN, to assess the visual quality of restored images.
• We compute mAP on PASCAL VOC using several JPEG quality factors and the corresponding reconstructions.
• Large increase in detector performance.
• The largest gainers are deformable ’furry’ objects such as animals.
Class   GAN AP gain @ QF 20
Dog     +18.6
Cat     +16.6
Sheep   +14.3
Cow     +12.5
TMM’19

Enter MobileNetV2
• MobileNetV2 was originally proposed to reduce the computational burden of CNNs
• Depthwise separable convolutions are a drop-in replacement for standard convolutional layers
• Inverted residual blocks propagate gradients better across layers and are more memory efficient
Sandler CVPR’18
[Figure: Residual Block vs. Inverted Residual Block]
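To see why depthwise separable convolutions reduce the computational burden, one can compare weight counts. A minimal sketch, ignoring biases; the channel sizes in the example are arbitrary and not taken from the talk.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights of a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k=3):
    """Weights of a depthwise k x k conv followed by a 1x1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes channels
    return depthwise + pointwise


std = standard_conv_params(64, 64)
sep = depthwise_separable_params(64, 64)
print(std, sep)  # 36864 4672
```

For this 64-to-64 channel layer the separable version needs roughly 8 times fewer weights.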

A Deep Residual Network for Reconstruction
• Keep the Generator identical except for the Inverted Residual Blocks!
• Train on the small DIV2K dataset
• Augmentation: resizing to 256, 384 and 512; random crops of 224x224; mirror flipping.
[Figure: Generator built from Inverted Residual Blocks]
CAIP’19
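The augmentation listed on this slide can be sketched as a sampler of augmentation parameters. Assumptions: the image is resized to a square of the sampled side, the flip is applied with probability 0.5, and the names are illustrative; the slide specifies neither detail.

```python
import random


def sample_augmentation(rng, sizes=(256, 384, 512), crop=224):
    """Sample one set of augmentation parameters.

    Returns the resize side, the crop box (top, left, bottom, right)
    inside the resized image, and whether to mirror the crop.
    """
    size = rng.choice(sizes)
    top = rng.randint(0, size - crop)    # randint is inclusive on both ends
    left = rng.randint(0, size - crop)
    flip = rng.random() < 0.5            # assumed flip probability
    return {"size": size,
            "box": (top, left, top + crop, left + crop),
            "flip": flip}


rng = random.Random(0)  # seeded for reproducibility
params = sample_augmentation(rng)
```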

Qualitative Results
ICCV’17: Bit/Pixel 0.146, FPS 4 on Titan Xp GPU
RAW: Bit/Pixel 12, FPS -
All videos 720p
CAIP’19

Qualitative Results
Very Fast: Bit/Pixel 0.146
RAW: Bit/Pixel 12, FPS -
All videos 720p
CAIP’19

Qualitative Results
Fast: Bit/Pixel 0.146
RAW: Bit/Pixel 12, FPS -
All videos 720p
CAIP’19

Qualitative Results
Wave.ONE: Bit/Pixel 0.0570, FPS 3 on V100 GPU
RAW: Bit/Pixel 12, FPS -
All videos 720p
CAIP’19

Qualitative Results
Wave.ONE: Bit/Pixel 0.0570, FPS 1.6 on Titan Xp GPU*
Fast: Bit/Pixel 0.146 (x3), FPS 20 on Titan Xp GPU (x12)
All videos 720p
CAIP’19
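The multipliers on this slide follow from the reported figures. A minimal sketch of the arithmetic, assuming 720p means 1280x720 pixels per frame (the variable names are illustrative):

```python
WIDTH, HEIGHT = 1280, 720  # 720p frame


def bits_per_frame(bits_per_pixel):
    """Bits needed for one 720p frame at the given bits per pixel."""
    return bits_per_pixel * WIDTH * HEIGHT


# Figures reported on the slides.
bpp_fast, bpp_wave_one, bpp_raw = 0.146, 0.0570, 12
fps_fast, fps_wave_one = 20, 1.6

bpp_ratio = bpp_fast / bpp_wave_one   # about 2.56, shown as x3
fps_ratio = fps_fast / fps_wave_one   # about 12.5, shown as x12
raw_ratio = bpp_raw / bpp_fast        # about 82x fewer bits than RAW

print(bits_per_frame(bpp_raw))  # 11059200
```

The Fast model therefore spends about 2.6 times the bits of Wave.ONE and runs about 12.5 times faster on the same GPU.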

No-Reference Evaluation
• NIQE and BRISQUE rate GAN images as ’more natural’ than the original ones!
• VIIDEO is likely penalizing the reconstructions for their lack of temporal coherence
CAIP’19

Specialized Artifact Removal
ACM MM’19 Best Demo
• GANs are known to work well when the data distribution is simpler
• Faces are possibly the most interesting object we are willing to transmit
• Here is what we can do with a severe degradation and a specialized GAN

Semantic Coding + GAN
ACM MM’20
[Figure: transmitter side, showing the original frame and its semantic mask]

Semantic Coding + GAN
ACM MM’20
[Figure: receiver side, showing the received frame]

Semantic Segmentation
ACM MM’20
• Use BiSeNet to label each pixel detected as face or neck as foreground, and the remainder as background.
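Turning a per-pixel label map into the binary foreground mask can be sketched as follows. The string labels are hypothetical placeholders: a real segmentation network outputs integer class ids, and the exact classes used by the authors are not given on the slide.

```python
# Hypothetical label names; real models emit integer class ids.
FOREGROUND_LABELS = {"face", "neck"}


def foreground_mask(label_map):
    """Binary mask: 1 where the pixel is face or neck, 0 elsewhere."""
    return [[1 if label in FOREGROUND_LABELS else 0 for label in row]
            for row in label_map]


labels = [["hair", "face", "face"],
          ["wall", "neck", "wall"]]
mask = foreground_mask(labels)
print(mask)  # [[0, 1, 1], [0, 1, 0]]
```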

Generator Loss
ACM MM’20
[Figure: generator loss diagram; labels: $M$, face, background]

ACM MM’20
[Figure: two plots, LPIPS (y-axis 0 to 0.09) and BRISQUE (y-axis 0 to 40), both over an x-axis from 0 to 45, comparing Our Method with the Baseline; captions in order: ~30% improvement, ~55% improvement]

RESULTS – QUALITATIVE
[Video panels: COMPRESSED; COMPRESSED + SALIENCY; COMPRESSED + SALIENCY + GAN]

Evaluation using Language
• Quality assessment task: “evaluate the weighted combination of all of the visually significant attributes of an image”
ACM MM ASIA’21
Best Paper Award

Evaluation using Language
ACM MM ASIA’21
Best Paper Award
• Use an image captioning algorithm to evaluate the fine semantics of the image
[Figure: captions generated for the JPEG, GAN and HQ versions of the same image. Captions shown include "A statue of a woman wearing a christmas tie", "A brown and white dog wearing a tie", "A statue of a woman wearing a tie" and variants ending in "wearing a red tie"]

Assessment Methodology
ACM MM ASIA’21
Best Paper Award
[Figure: the Input Image and the Reference Image are each fed to a Language Model; the Predicted Caption is compared with the Pseudo Ground Truth Caption by a Language Metric (example score: 0.752)]
• The “GT” caption can be generated from the reference image in case captions are not available (the usual case)
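The assessment pipeline can be sketched as below. The unigram F1 metric is only a simple stand-in for a real captioning metric, the lookup-table "captioner" replaces an actual captioning model, and the assignment of the example captions to the reference and JPEG images is hypothetical.

```python
from collections import Counter


def unigram_f1(candidate, reference):
    """Toy caption similarity: F1 over shared words (stand-in metric)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def language_quality_score(captioner, metric, input_image, reference_image):
    """Score an image by comparing its caption with the reference caption."""
    pseudo_gt = captioner(reference_image)   # pseudo ground truth caption
    predicted = captioner(input_image)       # caption of the image under test
    return metric(predicted, pseudo_gt)


# Hypothetical captioner: a lookup table instead of a real model.
captions = {
    "reference": "a statue of a woman wearing a red tie",
    "jpeg": "a brown and white dog wearing a tie",
}


def captioner(image_id):
    return captions[image_id]


jpeg_score = language_quality_score(captioner, unigram_f1, "jpeg", "reference")
same_score = language_quality_score(captioner, unigram_f1, "reference", "reference")
```

An image whose caption drifts away from the reference caption receives a lower score than the reference itself, which is the behavior the metric relies on.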

Evaluating Enhanced Images
ACM MM ASIA’21
Best Paper Award
• Our approach assigns higher scores to GAN reconstructions (REC) of JPEG-compressed images across a wide range of QFs
• Results are consistent for all captioning metrics across all qualities/enhancements

Changing the Captioning Model
ACM MM ASIA’21
Best Paper Award
• Using a better captioning model increases the correlation with MOS
• Visual features are shared between [1] and [2]
[1] M. Cornia, M. Stefanini, L. Baraldi, and R. Cucchiara. Meshed-memory transformer for image captioning. In Proc. of CVPR 2020
[2] P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang. Bottom-up and top-down attention for image captioning and visual question answering. In Proc. of CVPR 2018

Qualitative Analysis
ACM MM ASIA’21
Best Paper Award
[Figure panels: JPEG, GAN. Captions shown: "A man riding a wave on a surfboard in the ocea", "A couple of people sitting next to a christmas tree."]

What’s Next for Restoration?
• Innovative technologies for restoration replacing GANs
• Smaller architectures and new ways to train them
• Blind restoration

References
L. Galteri, M. Bertini, L. Seidenari, A. Del Bimbo, ‘Video Compression for Object Detection Algorithms’, ICPR 2018
L. Galteri, L. Seidenari, M. Bertini, A. Del Bimbo, ‘Deep Generative Adversarial Compression Artifact Removal’, IEEE ICCV 2017
L. Galteri, L. Seidenari, M. Bertini, A. Del Bimbo, ‘Deep Universal Generative Adversarial Compression Artifact Removal’, IEEE TMM 2019
L. Galteri, L. Seidenari, M. Bertini, A. Del Bimbo, ‘Towards Real-Time Image Enhancement GANs’, CAIP 2019
L. Galteri, L. Seidenari, M. Bertini, T. Uricchio, A. Del Bimbo, ‘Fast Video Quality Enhancement Using GANs’, ACM MM 2019 (Best Demo)
L. Galteri, M. Bertini, L. Seidenari, T. Uricchio, A. Del Bimbo, ‘Increasing Video Perceptual Quality with GANs and Semantic Coding’, ACM MM 2020
L. Galteri, L. Seidenari, P. Bongini, M. Bertini, A. Del Bimbo, ‘Language Based Image Quality Enhancement’, ACM MM Asia 2021 (Best Paper Award)
