Enhanced Deep Residual Networks
for Single Image Super-Resolution
Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee
Computer Vision Lab.
Dept. of ECE, ASRI, Seoul National University
http://cv.snu.ac.kr
SISR (Single Image Super Resolution)
Goal: Restoring an HR image from a single LR image
[Figure: super-resolution maps a low-resolution image to a high-resolution image]
Lessons from Recent Studies
 Skip connections
 Global and local skip connections enable deep architecture & stable training
 Upscaling methods
 Post-upscaling using sub-pixel convolution is more efficient than pre-upscaling
 However, they are limited in that only single-scale SR is possible
VDSR (CVPR 2016) | SRResNet (CVPR 2017)
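Post-upscaling keeps every convolution at LR resolution and only rearranges features into pixels at the very end. As an illustrative sketch (not the authors' code), the sub-pixel rearrangement behind sub-pixel convolution (PixelShuffle / depth-to-space) can be written in NumPy:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r^2, H, W) feature map into a (C, H*r, W*r) image,
    following the PixelShuffle / depth-to-space layout."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)        # split channels into (c, i, j)
    x = x.transpose(0, 3, 1, 4, 2)      # interleave: (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)
```

Each output pixel `out[c, h*r + i, w*r + j]` comes from input channel `c*r^2 + i*r + j` at location `(h, w)`, so a conv producing `C*r^2` channels at LR resolution directly yields a `C`-channel image upscaled by `r`.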
EDSR MDSR
4 Techniques for Better SR
Need Batch-Normalization?
Increasing model size
Better loss function
Geometric self-ensemble
EDSR
Need Batch-Normalization?
Empirical tests show that removing Batch-Normalization improves performance!
Need Batch-Normalization?
 Unlike classification problems, the input and output have similar
distributions
 In SR, normalizing intermediate
features may not be desirable
 Also saves ~40% of memory
→ Can enlarge the model size
Increasing Model Size
 Empirical tests show that
increasing #features is
better than increasing
depth
 Instability occurs when
#features is increased to
256
Given a limited memory, which design is better?
Increasing Model Size
 Residual Scaling Layer
 Increasing #features (up to 256) results in instability during training
 Constant scaling layers after each residual path prevent such
instability
Proposed in (Szegedy 2016), “Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning”
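To make this concrete, here is a minimal NumPy sketch of an EDSR-style residual block (conv–ReLU–conv, no BatchNorm) with a constant residual scaling of 0.1 applied before the skip addition. The naive 3×3 convolution is illustrative only, not an efficient implementation:

```python
import numpy as np

def conv3x3(x, w):
    """Naive 'same'-padded 3x3 convolution. x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    c_out = w.shape[0]
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c_out, h, wd))
    for o in range(c_out):
        for i in range(x.shape[0]):
            for di in range(3):
                for dj in range(3):
                    out[o] += w[o, i, di, dj] * xp[i, di:di + h, dj:dj + wd]
    return out

def edsr_resblock(x, w1, w2, scale=0.1):
    """conv -> ReLU -> conv, no BatchNorm; residual scaling before the skip add."""
    y = np.maximum(conv3x3(x, w1), 0.0)   # ReLU
    y = conv3x3(y, w2)
    return x + scale * y                   # constant scaling stabilizes wide models
```

The `scale` constant simply damps the residual branch so that, with 256 feature channels, the magnitude added to the skip path stays small early in training.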
Loss Function: L1 vs L2
 Is MSE (L2 loss) the best choice?
 Comparison between different loss functions
 EDSR baseline (16 res-blocks), scale=2, tested on DIV2K images (791~800)
→ MSE is not a good choice!
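The two losses compared above are simply mean absolute error and mean squared error; L1 penalizes all errors linearly, while L2 penalizes large errors quadratically (and tends to over-smooth SR outputs):

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error; gradient magnitude is constant w.r.t. error size."""
    return np.mean(np.abs(pred - target))

def l2_loss(pred, target):
    """Mean squared error (MSE); gradient grows linearly with the error."""
    return np.mean((pred - target) ** 2)
```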
Geometric Self-Ensemble
 Motivation
 Model ensemble is nice, but expensive!
 How can we achieve an ensemble effect
while avoiding training new models?
Proposed in (Timofte 2016), “Seven ways to improve example-based single-image super-resolution”
 Method
 Transform test image 8 times with flips and rotations (x8)
 Build 8 outputs and inverse-transform correspondingly
 Average 8 results
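The three steps above can be sketched as follows; `model` is any function mapping an image to an image (for SR the inverse transforms are applied to the upscaled outputs, which works the same way since flips and rotations commute with upscaling):

```python
import numpy as np

def self_ensemble(model, img):
    """Geometric self-ensemble: run the model on all 8 flip/rotation
    variants of the input, undo each transform on the output, average."""
    outs = []
    for flip in (False, True):
        x = img[:, ::-1] if flip else img          # optional horizontal flip
        for k in range(4):                          # 4 rotations by 90 degrees
            y = model(np.rot90(x, k))
            y = np.rot90(y, -k)                     # undo the rotation
            outs.append(y[:, ::-1] if flip else y)  # undo the flip
    return np.mean(outs, axis=0)
```

This gives an 8-model-style ensemble effect from a single trained network, at the cost of 8 forward passes per test image.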
Geometric Self-Ensemble
EDSR Summary
 Deeper & Wider: 32 ResBlocks and 256 channels
 Global-local skip connections
 Post-upscaling
 No Batch-Normalization
 Residual scaling
 L1 loss function
 Geometric self-ensemble (EDSR+)
EDSR MDSR
Motivation
 VDSR: Multi-scale SR in a single model
 Multi-scale knowledge transfer
Efficient Multi-Scale Model
 Designing MDSR
 Single vs. Multi-scale learning
 Train & Test method
 EDSR vs. MDSR
MDSR
Motivation
SRCNN, VDSR: A single architecture regardless of upscaling factor
⇨ Multi-scale SR in a single model (VDSR)
FSRCNN, ESPCN, SRResNet: Fast & efficient (late upsampling),
but cannot handle multiple scales in a single model.
Motivation
FSRCNN, ESPCN, SRResNet
⇨ Different models for different scales?
 Heavy training burden
 Waste of parameters for similar tasks
 Redundancy
Motivation
 Pre-trained scale x2 networks
greatly help training scale x3
and x4 networks.
 Super-resolution tasks at multiple
scales are inter-related!
Multi-scale knowledge transfer
Designing MDSR
How can we make EDSR (post-upscaling) handle multi-scale SR like VDSR?
Requirements
1. Reduce the variance between the different
scales
2. Most parameters are shared across scales
3. For efficiency: Post-upscaling
⇨ Scale-specific pre-processing modules
⇨ main branch
⇨ Scale-specific up-samplers
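The three components map onto a tiny forward-pass sketch (the function and dictionary names here are hypothetical stand-ins for the scale-specific modules and the shared main branch):

```python
def mdsr_forward(x, preprocess, shared_body, upsamplers, scale):
    """MDSR-style forward pass: a scale-specific pre-processing module,
    a main branch shared by all scales, and a scale-specific upsampler.
    `preprocess` and `upsamplers` are dicts keyed by scale (e.g. 2, 3, 4)."""
    h = preprocess[scale](x)     # reduce inter-scale variance
    h = shared_body(h)           # bulk of the parameters, shared
    return upsamplers[scale](h)  # post-upscaling for the chosen scale
```

Only the dictionaries hold per-scale parameters; `shared_body` carries most of the weights, which is where the parameter savings come from.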
Train and Test Method
1. Train
 Only one of 3 scale-specific
branches is activated at
each iteration
 A mini-batch consists of
single-scale patches
2. Test
 Select one of the paths
(①~③) according to the
desired SR scale
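The single-scale mini-batch rule can be sketched as below (`datasets` is a hypothetical dict mapping each scale to its pool of training patches):

```python
import random

def make_minibatch(datasets, batch_size):
    """Pick one scale at random for this iteration and sample a mini-batch
    of patches only from that scale, so only that scale-specific branch
    (pre-processing module + upsampler) is active during the update."""
    scale = random.choice(list(datasets))
    batch = random.sample(datasets[scale], batch_size)
    return scale, batch
```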
EDSR vs. MDSR
 Performance:
MDSR ≲ EDSR
 # Parameters:
MDSR << EDSR
(Almost ⅕! MDSR can also handle multiple scales in a single model)
 Stability:
MDSR << EDSR
(For MDSR, we failed to increase #features further even with residual scaling)
MDSR Summary
 Very deep architecture: 80 ResBlocks
 Most parameters are shared in main branch
 Scale-specific pre-processing modules and up-samplers
 Post-upscaling
 No Batch-Normalization
 L1 loss function
 Geometric self-ensemble (MDSR+)
Results
Training Details
Quantitative Results
Qualitative Results
Unknown Track (Challenge)
Extreme SR (up to x64)
1/64 Scale!
How about extreme cases?
Extreme SR (up to x64)
Bicubic vs. EDSR
Conclusion
1. A state-of-the-art single image super-resolution system using an
improved ResNet structure
2. Techniques to build & train extremely large models
3. A single network that handles the multi-scale SR problem
Thank you!
http://cv.snu.ac.kr
