Advances in Visual Quality Restoration
with Generative Adversarial Networks
Leonardo Galteri - PhD
University of Florence, MICC
Why does this happen? (in 2019…)
• First episode of Season 2 had 15M viewers
• Streaming it at reasonable quality costs 0.020 $/GB*
-1,125,000 $
*Amazon Cloud outbound traffic
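The three figures on this slide imply a per-viewer data volume, which a short calculation makes explicit. This is only a sketch: the variable names are illustrative, and it assumes the total is simply viewers × GB per viewer × price per GB.

```python
# Numbers taken from the slide; names are illustrative.
viewers = 15_000_000       # viewers of the episode
price_per_gb = 0.020       # $/GB of outbound traffic
total_cost = 1_125_000     # $ spent streaming the episode

# Assumption: total_cost = viewers * gb_per_viewer * price_per_gb
gb_per_viewer = total_cost / (viewers * price_per_gb)
cost_per_viewer = total_cost / viewers

print(f"{gb_per_viewer:.2f} GB per viewer, {cost_per_viewer:.3f} $ per viewer")
```

Under that assumption each viewer receives about 3.75 GB, so any reduction in bitrate translates directly into a lower bill.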
Working on the Encoder
ICPR’18
• Adaptive video coding approach
X.264
Deep CNN
Is there Another Way?
Improving Compressed Images
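Given an uncompressed frame $x_{HQ}$, the codec produces
$x_{LQ} = \mathcal{C}(x_{HQ}; \theta)$
where $\theta$ are the codec parameters.
We want to learn a function
$G(x_{LQ}) \approx \mathcal{C}^{-1}(x_{LQ}; \theta)$
Figure panels: $x_{LQ}$ (compressed input), $G(x_{LQ})$ (restored output)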
ICCV’17
A Deep Residual Network for Reconstruction
• We use strided convolutions to reduce the feature map size.
• We avoid checkerboard artifacts by using nearest-neighbour (NN) upsampling followed by 2 more convolutional layers.
• Trained on 128×128-pixel patches extracted from MS-COCO.
ICCV’17
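The two resizing steps named on this slide can be sketched in plain Python. The kernel size, stride, and padding below are assumptions chosen for illustration (the slide does not give them), and the function names are hypothetical.

```python
def strided_conv_output_size(size, kernel=3, stride=2, padding=1):
    """Spatial size after a strided convolution (standard formula).
    kernel/stride/padding are assumed values, not taken from the slides."""
    return (size + 2 * padding - kernel) // stride + 1


def nn_upsample(feature_map, factor=2):
    """Nearest-neighbour upsampling: repeat every value `factor` times
    along both axes of a 2D feature map given as a list of rows."""
    upsampled = []
    for row in feature_map:
        wide_row = [value for value in row for _ in range(factor)]
        upsampled.extend(list(wide_row) for _ in range(factor))
    return upsampled


print(strided_conv_output_size(128))   # size of a 128-pixel side after one strided conv
print(nn_upsample([[1, 2], [3, 4]]))   # each value becomes a 2x2 block
```

With these assumed settings a 128-pixel side is halved to 64. The upsampling step copies values instead of learning a transposed kernel, and the convolutions that follow it refine the result.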
Limitations of MSE and SSIM Losses
Figure panels: JPEG, SSIM Loss, Original
• SSIM and MSE losses effectively reduce compression artefacts.
• However, reconstructions appear blurry and miss many details present in the uncompressed version of the image.
ICCV’17
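A toy illustration (not from the slides) of why a pixel-wise loss such as MSE tends toward blur: when two sharp textures are equally plausible for the same compressed input, the prediction with the lowest expected MSE is their average. The signals and names below are made up for the example.

```python
def mse(a, b):
    """Mean squared error between two equally long signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Two equally plausible sharp originals for the same compressed input.
sharp_a = [0.0, 1.0, 0.0, 1.0]
sharp_b = [1.0, 0.0, 1.0, 0.0]
# Their average is a flat, detail-free ("blurry") signal.
blurry = [(x + y) / 2 for x, y in zip(sharp_a, sharp_b)]


def expected_mse(prediction):
    """Expected MSE when either original is the truth with probability 0.5."""
    return 0.5 * mse(prediction, sharp_a) + 0.5 * mse(prediction, sharp_b)


print(expected_mse(blurry), expected_mse(sharp_a))
```

The flat prediction scores 0.25 while committing to either sharp texture scores 0.5, so minimising MSE alone rewards the blurry answer.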
Generative Adversarial Network
• Generator G: takes Low Quality Images and outputs Restored Images (FAKE).
• Discriminator D: receives an Image, either a High Quality Image (REAL) or a Restored Image (FAKE), and decides: REAL or FAKE?
• D is trained to tell apart real from reconstructed images.
• G is trained to fool D.
ICCV’17
The Sub-Patch Discriminator
• 128×128 patches are split into smaller 16×16 sub-patches, concatenated with the corresponding input sub-patches, and processed by the discriminator.
• The discriminator is trained with a binary cross-entropy loss over all the sub-patches.
ICCV’17
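The sub-patch scheme can be sketched as follows. This is a minimal plain-Python illustration, not the authors' implementation: the patches are lists of rows, the discriminator outputs are placeholder values, and all function names are hypothetical.

```python
import math


def split_into_subpatches(patch, size=16):
    """Split a square 2D patch (list of rows) into size x size sub-patches."""
    side = len(patch)
    return [
        [row[col:col + size] for row in patch[top:top + size]]
        for top in range(0, side, size)
        for col in range(0, side, size)
    ]


def binary_cross_entropy(probabilities, label):
    """Mean BCE of discriminator outputs against one label (1 = real, 0 = fake)."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for p in probabilities:
        total += -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))
    return total / len(probabilities)


restored = [[0.5] * 128 for _ in range(128)]     # placeholder restored patch
compressed = [[0.4] * 128 for _ in range(128)]   # placeholder input patch

# Pair each restored sub-patch with the input sub-patch at the same location.
pairs = list(zip(split_into_subpatches(restored), split_into_subpatches(compressed)))

# Hypothetical discriminator outputs, one probability per sub-patch pair.
d_outputs = [0.5] * len(pairs)
loss_fake = binary_cross_entropy(d_outputs, label=0)
print(len(pairs), loss_fake)
```

A 128×128 patch yields 64 pairs, so the discriminator receives 64 local decisions per training patch instead of a single global one.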
Generator Loss
ICCV’17
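• The new objective of the Generator is:
$\mathcal{L}_G = \mathcal{L}_P + \lambda \mathcal{L}_{ADV}$
where:
$\mathcal{L}_P = \left( \phi(I_{HQ})_{x,y} - \phi(I_{RQ})_{x,y} \right)^2$
is called the perceptual loss, an MSE loss in VGG19 feature space (the squared difference is averaged over the feature positions $(x, y)$), and:
$\mathcal{L}_{ADV} = -\log D(I_{RQ} \mid I_{LQ})$
is the adversarial loss, which measures how good the generator is at fooling the discriminator.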
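A minimal numeric sketch of the combined objective. The feature vectors stand in for VGG19 activations, the discriminator probability is a placeholder, and the weight `lam=0.01` is a hypothetical value (the slide does not state λ).

```python
import math


def perceptual_loss(features_hq, features_rq):
    """MSE between feature vectors (stand-ins for VGG19 activations)."""
    squared = [(a - b) ** 2 for a, b in zip(features_hq, features_rq)]
    return sum(squared) / len(squared)


def adversarial_loss(d_probability):
    """-log D(I_RQ | I_LQ): small when D believes the restored image is real."""
    return -math.log(d_probability)


def generator_loss(features_hq, features_rq, d_probability, lam=0.01):
    """L_G = L_P + lambda * L_ADV; lam is an assumed value."""
    return perceptual_loss(features_hq, features_rq) + lam * adversarial_loss(d_probability)


features_hq = [0.2, 0.8, 0.5]   # made-up features of the high-quality image
features_rq = [0.1, 0.9, 0.5]   # made-up features of the restored image
print(generator_loss(features_hq, features_rq, d_probability=0.9))
print(generator_loss(features_hq, features_rq, d_probability=0.1))
```

With the features held fixed, the loss grows as the discriminator becomes more confident that the restored image is fake, which is the pressure that pushes the generator toward realistic detail.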
Effect of Sub-Patch Discriminator
• This technique reduces the mosquito noise present in reconstructions.
Figure panels: W/o Sub-Patch, With Sub-Patch
ICCV’17
Predicting QF
• We train a CNN regressor, named QF predictor, to drive a finite ensemble of Generators.
• We use the most appropriate Generator to restore the image.
Diagram: input $x$ → QF Predictor → Model Switcher → selects $G(\theta = \theta_n; x)$ among $G(\theta = \theta_0; x), \dots, G(\theta = \theta_N; x)$
TMM’19
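The switching step can be sketched as a lookup over the ensemble. Everything here is illustrative: the QF grid, the nearest-QF selection rule, and the string-returning stand-in generators are assumptions, not details given on the slide.

```python
def make_generator(trained_qf):
    """Hypothetical stand-in for a generator trained at one JPEG quality factor."""
    def restore(image):
        return f"{image} restored by G(QF={trained_qf})"
    return restore


# Assumed ensemble: one generator per quality factor on a coarse grid.
generators = {qf: make_generator(qf) for qf in (10, 20, 30, 40)}


def switch_model(predicted_qf, ensemble):
    """Pick the generator whose training QF is closest to the predicted one."""
    nearest_qf = min(ensemble, key=lambda qf: abs(qf - predicted_qf))
    return nearest_qf, ensemble[nearest_qf]


# 23.4 plays the role of the QF predictor's regression output.
chosen_qf, restore = switch_model(23.4, generators)
print(chosen_qf, restore("frame.jpg"))
```

Because the predictor is a regressor, its output need not match any training QF exactly, so some rule for mapping it onto the finite ensemble is required. Nearest-neighbour selection is one simple choice.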
Quality Prediction Results
TMM’19
Qualitative Results
JPEG AR-CNN GAN ORIGINAL
TMM’19
Subjective Evaluation
• DSIS setup: the test image is compared to the original and similarity is scored on a 0-100 scale.
• We compare SSIM Loss vs Adversarial Training using the same Generator architecture.
• Subjects have a strong preference for GAN-restored images over SSIM ones.
Method   MOS     Std. Dev.
SSIM     49.51   22.72
GAN      68.32   20.75
TMM’19
Object Detection Results
• We use an object detector, Faster R-CNN, to assess the visual quality of restored images.
• We compute mAP on PASCAL VOC using several JPEG quality factors and the corresponding reconstructions.
• Large increase in detector performance.
• The largest gainers are deformable ‘furry’ objects such as animals.
Class    GAN AP gain @ QF 20
Dog      +18.6
Cat      +16.6
Sheep    +14.3
Cow      +12.5
TMM’19
Enter MobileNetV2
• MobileNetV2 was originally proposed to reduce the computational burden of CNNs.
• Depthwise separable convolutions are a drop-in replacement for convolutional layers.
• Inverted residual blocks propagate gradients better across layers while being more memory efficient.
Sandler CVPR’18
Residual Block Inverted Residual Block
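The saving from depthwise separable convolutions can be checked by counting weights. The sketch below ignores biases, and the layer sizes (3×3 kernel, 64 channels in and out) are example values, not figures from the slides.

```python
def standard_conv_params(kernel, channels_in, channels_out):
    """Weights of a standard convolution: one k x k filter per (in, out) channel pair."""
    return kernel * kernel * channels_in * channels_out


def depthwise_separable_params(kernel, channels_in, channels_out):
    """Depthwise k x k filter per input channel, then a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * channels_in
    pointwise = channels_in * channels_out
    return depthwise + pointwise


standard = standard_conv_params(3, 64, 64)
separable = depthwise_separable_params(3, 64, 64)
print(standard, separable, round(standard / separable, 2))
```

For this example layer the standard convolution has 36,864 weights against 4,672 for the separable version, roughly an 8× reduction.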
A Deep Residual Network for Reconstruction
• Keep the Generator identical except for the Inverted Residual Blocks!
• Train on the small DIV2K dataset.
• Augmentation: resizing to 256, 384 and 512; random crops of 224×224; mirror flipping.
Diagram: Generator built from Inverted Residual Blocks
CAIP’19
Qualitative Results (all videos 720p)
CAIP’19
Method      Bit/Pixel   FPS
RAW         12          -
ICCV’17     0.146       4 on Titan Xp GPU
Very Fast   0.146       42 on Titan Xp GPU
Fast        0.146       20 on Titan Xp GPU
Wave.ONE    0.0570      3 on V100 GPU

Fast vs Wave.ONE:
Method      Bit/Pixel      FPS
Wave.ONE    0.0570         1.6 on Titan Xp GPU*
Fast        0.146 (x3)     20 on Titan Xp GPU (x12)
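The "x3" and "x12" factors in the Fast vs Wave.ONE comparison can be recomputed from the reported numbers. This is a small check using only the values on the slides; the dictionary layout is illustrative.

```python
# Values as reported on the slides (720p video, Titan Xp GPU).
fast = {"bits_per_pixel": 0.146, "fps": 20.0}
wave_one = {"bits_per_pixel": 0.0570, "fps": 1.6}

# How much more bitrate Fast uses, and how much faster it runs.
bitrate_ratio = fast["bits_per_pixel"] / wave_one["bits_per_pixel"]
speed_ratio = fast["fps"] / wave_one["fps"]

print(f"bitrate ratio: {bitrate_ratio:.2f}, speed ratio: {speed_ratio:.1f}")
```

The ratios come out at about 2.56 and 12.5, which the slide rounds to x3 and x12: the Fast model spends under three times the bits to run more than twelve times faster.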
No-Reference Evaluation
• NIQE and BRISQUE rate GAN images as ‘more natural’ than the original ones!
• VIIDEO is likely penalizing reconstructions for lack of temporal coherence.
CAIP’19
Specialized Artifact Removal
ACM MM’19 Best Demo
• GANs are well known to work well when the distribution is simpler.
• Faces are possibly the most interesting objects we want to transmit.
• Here is what we can do with severe degradation and a specialized GAN.
Semantic Coding + GAN
ACM MM’20
• Transmitter: original frame, semantic mask
• Receiver: received frame
Semantic Segmentation
ACM MM’20
• Use BiSeNet to label each pixel detected as face or neck as foreground, and the remainder as background.
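The foreground/background rule can be sketched as a simple mapping from per-pixel labels to a binary mask. The string labels and the tiny label map are hypothetical; a segmentation network would typically output class indices, and the function name is illustrative.

```python
# Classes treated as foreground, following the rule on the slide.
FOREGROUND_LABELS = {"face", "neck"}


def foreground_mask(label_map):
    """Return 1 for face/neck pixels and 0 for everything else."""
    return [
        [1 if label in FOREGROUND_LABELS else 0 for label in row]
        for row in label_map
    ]


# Made-up 2x3 segmentation output for illustration.
label_map = [
    ["hair", "face", "face"],
    ["background", "neck", "cloth"],
]
print(foreground_mask(label_map))
```

The resulting mask marks the region that should receive most of the bitrate and restoration effort, with every other class collapsed into background.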
Semantic Segmentation
ACM MM’20
Diagram labels: Generator Loss, mask $M$, face, background
Semantic Segmentation
ACM MM’20
Plots: LPIPS (y-axis 0 to 0.09) and BRISQUE (y-axis 0 to 40), Our Method vs Baseline, x-axis 0 to 45; annotated ~30% improvement and ~55% improvement
RESULTS – QUALITATIVE (1 min)
• COMPRESSED
• COMPRESSED + SALIENCY
• COMPRESSED + SALIENCY + GAN
Evaluation using Language
• Task: “evaluate the weighted combination of all of the visually significant attributes of an image”, i.e. its quality
ACM MM ASIA’21
Best Paper Award
Evaluation using Language
ACM MM ASIA’21
Best Paper Award
Use an image captioning algorithm to evaluate the fine semantics of the image:
• JPEG: “A statue of a woman wearing a christmas tie”
• GAN: “A brown and white dog wearing a tie”
• HQ: “A brown and white dog wearing a red tie”
Assessment Methodology
ACM MM ASIA’21
Best Paper Award
Diagram: Input Image → Language Model → Predicted Caption; Reference Image → Language Model → Pseudo Ground Truth Caption; both captions → Language Metric (example score: 0.752)
• The “GT” caption can be generated from the reference image in case captions are not available (the usual case).
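The comparison of a predicted caption against a pseudo ground truth caption can be sketched with a very simple language metric. The unigram F1 below is a stand-in chosen for brevity, not one of the captioning metrics used in the work, and the function name is hypothetical. The captions are the JPEG, GAN, and HQ examples from the talk.

```python
from collections import Counter


def unigram_f1(predicted, reference):
    """F1 over word counts shared by two captions (a stand-in language metric)."""
    predicted_words = Counter(predicted.lower().split())
    reference_words = Counter(reference.lower().split())
    overlap = sum((predicted_words & reference_words).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(predicted_words.values())
    recall = overlap / sum(reference_words.values())
    return 2 * precision * recall / (precision + recall)


# Pseudo ground truth: caption generated from the reference (HQ) image.
pseudo_gt = "A brown and white dog wearing a red tie"
jpeg_caption = "A statue of a woman wearing a christmas tie"
gan_caption = "A brown and white dog wearing a tie"

jpeg_score = unigram_f1(jpeg_caption, pseudo_gt)
gan_score = unigram_f1(gan_caption, pseudo_gt)
print(round(jpeg_score, 3), round(gan_score, 3))
```

Even this crude metric ranks the GAN-restored image (about 0.94) well above the JPEG one (about 0.44), because compression changed what the captioner sees and not only how sharp the image looks.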
Evaluating Enhanced Images
ACM MM ASIA’21
Best Paper Award
• Our approach assigns higher scores to GAN reconstructions (REC) of JPEG-compressed images across a wide range of QFs.
• Results are consistent for all captioning metrics across all qualities/enhancements.
Changing the Captioning Model
ACM MM ASIA’21
Best Paper Award
• Using a better captioning model increases the correlation with MOS.
• Visual features are shared between [1] and [2].
[1] M. Cornia, M. Stefanini, L. Baraldi, and R. Cucchiara. Meshed-memory transformer for image captioning. In Proc. of CVPR 2020.
[2] P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang. Bottom-up and top-down attention for image captioning and visual question answering. In Proc. of CVPR 2018.
Qualitative Analysis
ACM MM ASIA’21
Best Paper Award
Figure panels: JPEG, GAN
• A man riding a wave on a surfboard in the ocean.
• A couple of people sitting next to a christmas tree.
What’s Next for Restoration?
• Innovative technologies for restoration, replacing GANs
• Smaller architectures and new ways to train them
• Blind restoration
References
L. Galteri, M. Bertini, L. Seidenari, A. Del Bimbo, ‘Video Compression for Object Detection Algorithms’, ICPR 2018
L. Galteri, L. Seidenari, M. Bertini, A. Del Bimbo, ‘Deep Generative Adversarial Compression Artifact Removal’, IEEE ICCV 2017
L. Galteri, L. Seidenari, M. Bertini, A. Del Bimbo, ‘Deep Universal Generative Adversarial Compression Artifact Removal’, IEEE TMM 2019
L. Galteri, L. Seidenari, M. Bertini, A. Del Bimbo, ‘Towards Real-Time Image Enhancement GANs’, CAIP 2019
L. Galteri, L. Seidenari, M. Bertini, T. Uricchio, A. Del Bimbo, ‘Fast Video Quality Enhancement Using GANs’, ACM MM Best Demo 2019
L. Galteri, M. Bertini, L. Seidenari, T. Uricchio, A. Del Bimbo, ‘Increasing Video Perceptual Quality with GANs and Semantic Coding’, ACM MM 2020
L. Galteri, L. Seidenari, P. Bongini, M. Bertini, A. Del Bimbo, ‘Language Based Image Quality Enhancement’, ACM MM Asia Best Paper Award 2021

