Advanced Computational Intelligence: An International Journal (ACII), Vol.2, No.2, April 2015, pp. 47–55
JOINT IMAGE REGISTRATION AND EXAMPLE-BASED
SUPER-RESOLUTION ALGORITHM
Hyo-Song Kim, Jeyong Shin, and Rae-Hong Park
Department of Electronic Engineering, School of Engineering, Sogang University
35 Baekbeom-ro, Mapo-gu, Seoul, 121-742, Korea
ABSTRACT
Super-resolution (SR) methods are classified into two classes: image registration (IR)-based methods and example-based methods. The proposed joint SR method estimates high-resolution (HR) video sequences from low-resolution (LR) ones by combining the two approaches. The IR-based SR method collects information from adjacent frames to reconstruct HR images in the video sequence, whereas example-based SR methods produce good textures and strong edges in the resulting HR video. In this paper, IR-based and example-based SR methods are fused based on gradient features. The proposed joint SR method yields a lower peak signal-to-noise ratio (PSNR) than the example-based method; however, it gives better reconstruction results on high-level features such as characters in images. Experimental results of the proposed joint SR method show less noise and higher contrast than those of the example-based method.
KEYWORDS
Super-Resolution, Image Registration, Motion Estimation, Motion Compensation, Example-Based Learning,
Sparse Coding, Neighborhood Regression.
1. INTRODUCTION
Super-resolution (SR) methods are traditionally classified into two classes: image registration (IR)-based and example-based methods [1]. IR is the task of finding the motion between two or more frames of the same scene. Note that the motion estimated by IR does not necessarily describe the true motion of either the camera or the objects in the scene. The most common way to make IR reliable is to estimate the motion using only two-dimensional translations, under the assumption of small motion within the scene. In [1], more elaborate methods are presented, including methods that use rotation and scaling, which are mainly intended for scanned documents.
IR-based SR methods can be classified into two main approaches: feature-based methods and block-based methods. The first approach fits a motion model to a sparse set of point correspondences, while the second uses all pixels within the search area. Feature-based SR methods can produce false matches due to their regression nature, and they require a large number of points to achieve high accuracy. In contrast, block-based methods can obtain motion information accurate enough to produce a high-resolution (HR) image. While feature-based methods rely on a relatively small number of points, block-based methods use all the overlapping blocks of the adjacent images. Many block-based methods estimate image motion by minimizing a cost function between two motion-corrected images [2]; however, they suffer from a very high computational cost.
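The cost-minimization step of block-based IR can be illustrated with a minimal full-search block-matching sketch. It is not the method of [2]. It assumes integer-pel motion, non-overlapping blocks, and a sum-of-absolute-differences (SAD) cost. The function name `block_match` and its parameters are hypothetical. The nested search loops make the computational cost noted above explicit: every block evaluates (2·search + 1)² candidate displacements.

```python
import numpy as np


def block_match(ref, cur, block=8, search=4):
    """Full-search block matching (illustrative sketch).

    For each non-overlapping block of `cur`, find the integer displacement
    (dy, dx) within +/-`search` pixels whose block in `ref` minimizes the
    sum of absolute differences (SAD). Returns (rows, cols, 2) vectors.
    """
    h, w = cur.shape
    rows, cols = h // block, w // block
    mvs = np.zeros((rows, cols, 2), dtype=int)
    for by in range(rows):
        for bx in range(cols):
            y0, x0 = by * block, bx * block
            target = cur[y0:y0 + block, x0:x0 + block].astype(np.float64)
            best_cost, best_mv = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = y0 + dy, x0 + dx
                    # Skip candidates whose block would leave the frame.
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue
                    cand = ref[y:y + block, x:x + block].astype(np.float64)
                    cost = np.abs(target - cand).sum()  # SAD cost
                    if cost < best_cost:
                        best_cost, best_mv = cost, (dy, dx)
            mvs[by, bx] = best_mv
    return mvs
```

For example, suppose a random frame is shifted by (2, -3) with `np.roll` and the interior blocks are matched against the original. The sketch then recovers (-2, 3), which is the offset of each block's source position in the reference frame.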
Although SR methods that use motion information between adjacent frames have been successful on some video sequences, they are not suitable for sequences with stationary objects and background, which provide no additional information to fill in the sub-pixel positions. Moreover, estimating true motion vectors in spatially high-frequency regions and handling occluded regions are not simple tasks. Thus, single-image SR techniques remain important even when the goal is to perform SR on video sequences. Most single-image SR methods take an example-based approach, which learns a dictionary from example images and applies it to each image patch. Xiong et al. [3] used soft information and decision on the one-to-many correspondence of dictionaries to address the dimensionality gap between low-resolution (LR) and HR spaces. Timofte et al. improved the conventional sparse coding method by introducing the anchored neighborhood regression (ANR) method [4]–[6]. Zhu et al. [7] used optical flow on image patches to form deformable patches, making the learned dictionary more expressive.
2. RELATED WORK
2.1. IR-Based SR Methods
IR-based SR methods [1], [2] generally operate in the spatial domain and typically consist of two processes: IR and HR reconstruction. IR calculates the motion parameters based on a specific motion model. HR reconstruction then incorporates the estimated motion parameters into an inverse estimation. Aliasing among the LR images may reduce the accuracy of registration. Accurate registration is challenging precisely because the motion parameters are calculated from a number of aliased LR images. To deal with this problem, various SR algorithms have been proposed to reduce the registration error in the final estimated HR image.
2.2. Dictionary-Based SR Methods
A large number of dictionary-based SR methods [3]–[8] have been developed over the last decade. They use dictionaries of image patches or patch-based atoms that are trained to represent natural image patches efficiently. This benefits both the time complexity and the accuracy of the SR results.
2.2.1. Neighbor Embedding Methods
Neighbor embedding methods are generally patch-based. They assume that an input LR patch can be approximated by a weighted sum of its nearest LR training patches. They further assume that applying the same weights to the paired HR training patches approximates the desired HR patch. LR and HR patches are learned simultaneously in the training phase. A typical method, locally linear embedding [3], can be written as

$$\mathbf{x} = \sum_{i=1}^{K} w_i\,\mathbf{x}'_i, \tag{1}$$

where $\mathbf{x}$ denotes the resulting HR patch and $\mathbf{x}'_i$ represents the $i$th candidate HR patch, i.e., the HR patch paired with the $i$th nearest-neighbor LR patch of the input LR patch. $K$ is the number of neighbors considered and $w_i$ denotes the weight of the $i$th candidate HR patch.
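Equation (1) can be made concrete with a minimal neighbor-embedding sketch. It assumes Euclidean nearest neighbors and locally-linear-embedding-style weights. These weights minimize the LR reconstruction error subject to summing to one. The following are illustrative assumptions, not details from [3]: the function name `neighbor_embedding`, the regularization constant `reg`, and the one-patch-per-row layout.

```python
import numpy as np


def neighbor_embedding(y, lr_train, hr_train, k=5, reg=1e-3):
    """Estimate an HR patch from the LR patch `y` as in (1) (sketch).

    lr_train: (n, d_l) LR training patches, one per row.
    hr_train: (n, d_h) HR training patches paired row-by-row with lr_train.
    Returns the HR estimate, the weights w_i, and the neighbor indices.
    """
    # Find the K nearest LR training patches (Euclidean distance).
    dists = np.linalg.norm(lr_train - y, axis=1)
    idx = np.argsort(dists)[:k]
    # LLE weights: minimize ||y - sum_i w_i l_i||^2 subject to sum_i w_i = 1,
    # solved through the local Gram matrix of the neighbors centered on y.
    diff = lr_train[idx] - y
    gram = diff @ diff.T
    # Tikhonov term keeps the system solvable if neighbors coincide with y.
    gram += reg * max(np.trace(gram), 1.0) * np.eye(len(idx))
    w = np.linalg.solve(gram, np.ones(len(idx)))
    w /= w.sum()
    # Apply the same weights to the paired HR patches, as in (1).
    return w @ hr_train[idx], w, idx
```

With `k=1`, the sketch reduces to copying the HR patch paired with the single nearest LR neighbor.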
2.2.2. Sparse Coding Methods
Unlike neighbor embedding methods, sparse coding methods try to find efficient representations of image patches in terms of dictionary atoms. Zeyde et al. [8] developed an efficient, improved method to train a sparse dictionary. They built dictionaries for both LR patches and their corresponding HR patches by jointly optimizing over the two patch sets. After the LR and HR dictionaries are trained, the sparse representation $\boldsymbol{\alpha}^{*}$ of an input LR image patch can be calculated as
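$$\boldsymbol{\alpha}^{*} = \arg\min_{\boldsymbol{\alpha}} \|\mathbf{y} - \mathbf{D}_l\boldsymbol{\alpha}\|_2^2 + \lambda\|\boldsymbol{\alpha}\|_1, \tag{2}$$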
where $\mathbf{D}_l$ denotes the LR dictionary, $\boldsymbol{\alpha}$ is the sparse representation of the LR patch, and $\mathbf{y}$ represents the input LR patch. $\lambda$ denotes the regularization parameter, which controls the weight of the sparsity constraint ($\ell_1$-norm in the second term) relative to the modeling error ($\ell_2$-norm in the first term).
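Problem (2) has no closed-form solution. The sketch below solves it approximately with the iterative shrinkage-thresholding algorithm (ISTA). ISTA is a standard ℓ1 solver swapped in here for illustration. It is not necessarily the solver used in [8]. The function name, the fixed iteration count, and the step-size rule are assumptions.

```python
import numpy as np


def ista_sparse_code(D, y, lam=0.1, n_iter=500):
    """Approximately solve (2): argmin_a ||y - D a||_2^2 + lam * ||a||_1."""
    # Step size 1/L, where L = 2 * sigma_max(D)^2 is the Lipschitz constant
    # of the gradient of the quadratic data term.
    L = 2.0 * np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ a - y)   # gradient of ||y - D a||_2^2
        z = a - grad / L                 # gradient step
        # Soft-thresholding: proximal operator of (lam / L) * ||.||_1.
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return a
```

An orthonormal dictionary such as the identity makes (2) decouple per coefficient. The solution is then soft-thresholding of y at λ/2, which the sketch reaches after a single iteration.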
2.2.3. ANR Method
The ANR method [4] reformulates the dictionary of Zeyde et al. [8] to pre-compute regression matrices that are used to calculate the resulting HR patches. Instead of using the LR and HR dictionaries directly, it considers the nearest neighbors of a specific $j$th dictionary atom $\mathbf{d}_l^j$, which is the $j$th column vector of the LR dictionary $\mathbf{D}_l$. Using only the nearest neighbors of $\mathbf{d}_l^j$, a local neighborhood LR dictionary $\mathbf{N}_l^j$ is formed, along with its corresponding local neighborhood HR dictionary $\mathbf{N}_h^j$. The $\ell_2$-norm replaces the $\ell_1$-norm in the regularization term. The coefficient vector $\boldsymbol{\beta}^{*}$ is then calculated as
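$$\boldsymbol{\beta}^{*} = \arg\min_{\boldsymbol{\beta}} \|\mathbf{y} - \mathbf{N}_l\boldsymbol{\beta}\|_2^2 + \lambda\|\boldsymbol{\beta}\|_2^2, \tag{3}$$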
where the superscript $j$ is omitted for simplicity. Problem (3) has a closed-form solution given by ridge regression [9]:
$$\boldsymbol{\beta}^{*} = (\mathbf{N}_l^T\mathbf{N}_l + \lambda\mathbf{I})^{-1}\mathbf{N}_l^T\mathbf{y}, \tag{4}$$
and the HR patch is then obtained as
$$\mathbf{x} = \mathbf{N}_h\boldsymbol{\beta}^{*} = \mathbf{P}_j\mathbf{y}, \tag{5}$$
where $\mathbf{x}$ denotes the SR result and $\mathbf{P}_j = \mathbf{N}_h(\mathbf{N}_l^T\mathbf{N}_l + \lambda\mathbf{I})^{-1}\mathbf{N}_l^T$ is the projection matrix that maps an input LR patch directly onto $\mathbf{x}$. Because $\mathbf{P}_j$ depends only on the dictionary, it can be pre-computed for every atom, which greatly reduces the time complexity. In summary, ANR learns off-line regressors for fast SR while improving quality through the neighborhood concept.
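Equations (4) and (5) imply that each anchor's regressor can be computed once, off-line. The following minimal sketch shows that split between off-line and on-line work. It assumes the neighborhood dictionaries $\mathbf{N}_l$ and $\mathbf{N}_h$ are given as matrices whose columns are atoms. The function names are hypothetical.

```python
import numpy as np


def anr_projection(N_l, N_h, lam=0.1):
    """Off-line: P_j = N_h (N_l^T N_l + lam I)^{-1} N_l^T, from (4) and (5).

    N_l: (d_l, k) LR neighborhood atoms of anchor j, one per column.
    N_h: (d_h, k) corresponding HR atoms.
    """
    k = N_l.shape[1]
    # Solve the ridge system instead of forming an explicit inverse.
    ridge = np.linalg.solve(N_l.T @ N_l + lam * np.eye(k), N_l.T)  # (k, d_l)
    return N_h @ ridge                                               # (d_h, d_l)


def anr_upscale_patch(P_j, y):
    """On-line: one matrix-vector product maps y to the HR patch, (5)."""
    return P_j @ y
```

The expensive linear solve happens once per anchor during training. At test time, only the product $\mathbf{P}_j\mathbf{y}$ remains.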
2.2.4. Adjusted ANR Method
The adjusted ANR (A+) method [5] is an improved ANR algorithm based on two observations. First, the dictionary atoms sample the patch space sparsely, whereas the pool of training patch samples available in off-line training is practically unlimited. Second, the local manifold around an atom is spanned better by dense training samples than by the dictionary atoms.
Based on these observations, the A+ method reformulates the optimization problem (3) as
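$$\boldsymbol{\delta}^{*} = \arg\min_{\boldsymbol{\delta}} \|\mathbf{y} - \mathbf{S}_l\boldsymbol{\delta}\|_2^2 + \lambda\|\boldsymbol{\delta}\|_2^2, \tag{6}$$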
where $\mathbf{S}_l$ denotes an LR dictionary that contains the $K$ training samples neighboring the anchor atom (possibly pre-computed per atom in off-line training). In (3), $\mathbf{N}_l$ contains neighboring atoms; $\mathbf{S}_l$ takes its place.
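Because A+ shares the ANR framework, (6) can also be solved by ridge regression, giving $\boldsymbol{\delta}^{*} = (\mathbf{S}_l^T\mathbf{S}_l + \lambda\mathbf{I})^{-1}\mathbf{S}_l^T\mathbf{y}$. Thus, the HR patch is obtained in closed form as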
$$\mathbf{x} = \mathbf{S}_h\boldsymbol{\delta}^{*} = \hat{\mathbf{P}}_j\mathbf{y}, \tag{7}$$
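where $\mathbf{S}_h$ denotes the HR dictionary corresponding to $\mathbf{S}_l$. The projection matrix obtained from $\mathbf{S}_l$ is $\hat{\mathbf{P}}_j = \mathbf{S}_h(\mathbf{S}_l^T\mathbf{S}_l + \lambda\mathbf{I})^{-1}\mathbf{S}_l^T$. Note that $\hat{\mathbf{P}}_j$ projects an input LR patch $\mathbf{y}$ directly onto the HR patch $\mathbf{x}$. With this strategy, the A+ method builds better neighborhoods for its local dictionaries, which substantially improves the SR result.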
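ANR and A+ differ only in how the neighborhood is built. The minimal sketch below makes three assumptions: the LR dictionary has unit-norm columns that act as anchors, training samples are stacked as columns, and absolute correlation is the neighbor criterion. The function names are illustrative, and the parameter `k` plays the role of $K$.

```python
import numpy as np


def train_aplus(D_l, X_l, X_h, k=32, lam=0.1):
    """Off-line A+ sketch: one projection matrix per anchor atom.

    D_l: (d_l, n_atoms) LR dictionary with unit-norm columns (the anchors).
    X_l: (d_l, n) LR training samples; X_h: (d_h, n) paired HR samples.
    """
    # Unit-normalized copies so that dot products equal correlations.
    X_unit = X_l / np.linalg.norm(X_l, axis=0, keepdims=True)
    projections = []
    for j in range(D_l.shape[1]):
        # S_l: the k training samples most correlated with anchor j.
        nn = np.argsort(-np.abs(D_l[:, j] @ X_unit))[:k]
        S_l, S_h = X_l[:, nn], X_h[:, nn]
        # Ridge solution of (6) folded into P_hat_j, as in (7).
        ridge = np.linalg.solve(S_l.T @ S_l + lam * np.eye(len(nn)), S_l.T)
        projections.append(S_h @ ridge)
    return projections


def apply_aplus(D_l, projections, y):
    """On-line: choose the anchor most correlated with y, then x = P_hat_j y."""
    j = int(np.argmax(np.abs(D_l.T @ y)))
    return projections[j] @ y
```

$\mathbf{S}_l$ is drawn from the training pool rather than from the dictionary, so `k` may exceed the patch dimension. The ridge term keeps each system well posed in that case.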
3. PROPOSED JOINT SR METHOD
The proposed IR-based SR method generates an image grid by referencing neighboring frames. The overall process of the IR-based SR method is illustrated in Figure 1. The resolution of the image grid is N times higher than that of the original image in both width and height. N is determined by the sub-pixel accuracy of the motion search: if N is set to 4, quarter-pel motion estimation is performed on the neighboring reference frames. All pixels in the image grid are reconstructed using information from the neighboring reference frames. The reconstructed image grid contains full HR information, and the HR image is obtained from it by down-sampling. The down-sampling uses super-sampling anti-aliasing (SSAA), which can employ patterned sampling schemes such as grid, random, Poisson, jitter, and rotated-grid sampling. Rotated-grid sampling is known to be effective at reducing aliasing along edges [10]; therefore, we use it for image sampling.
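The paper does not list the exact sample positions. The following sketch therefore assumes a common four-sample rotated-grid pattern with nearest-neighbor lookup on the fine grid. Here `r` is the ratio between the fine-grid resolution and the target HR resolution. For example, r = 4 when N = 8 and the output is twice the LR size, as in Section 4. `rgss_downsample` and `RGSS_OFFSETS` are hypothetical names.

```python
import numpy as np

# Assumed 4-sample rotated-grid pattern: (dx, dy) offsets in output-pixel
# units, relative to the output pixel center.
RGSS_OFFSETS = np.array([(-0.125, -0.375), (0.375, -0.125),
                         (0.125, 0.375), (-0.375, 0.125)])


def rgss_downsample(fine, r):
    """Down-sample `fine` (H*r, W*r) to (H, W) by averaging four
    rotated-grid samples per output pixel (nearest fine-grid pixel)."""
    Hf, Wf = fine.shape
    H, W = Hf // r, Wf // r
    rows = (np.arange(H) + 0.5) * r   # output-pixel centers on the fine grid
    cols = (np.arange(W) + 0.5) * r
    out = np.zeros((H, W))
    for dx, dy in RGSS_OFFSETS:
        ri = np.clip(np.floor(rows + dy * r).astype(int), 0, Hf - 1)
        ci = np.clip(np.floor(cols + dx * r).astype(int), 0, Wf - 1)
        out += fine[np.ix_(ri, ci)]
    return out / len(RGSS_OFFSETS)
```

With r = 4, the four samples of each output pixel land in four different rows and four different columns of its 4×4 block. This is why a rotated grid resolves near-horizontal and near-vertical edges better than an axis-aligned 2×2 grid.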
Figure 1. An example of the proposed IR-based SR method with quarter-pel sub-pixel accuracy (N = 4). Circles
denote integer-pels and diamonds represent sub-pels.
Figure 2 shows experimental results of the proposed IR-based SR method for different parameter settings. We adjust three parameters: the block size, the sub-pixel accuracy, and the number of reference frames. The default parameter setting is block size = 32×32, sub-pixel accuracy = 1/2, and number of reference frames = 3. As shown in Figures 2(a)-2(c), a smaller block size reduces blocking artifacts at object boundaries but increases the computational burden. Figures 2(d)-2(f) show that finer sub-pixel accuracy increases the quality of the reconstructed HR image. However, many holes are produced (see Figure 2(f)) because the default number of reference frames is too small to fill the finer grid. To solve this problem, more reference frames can be used to obtain more temporal information (see Figures 2(g)-2(i)), but scene changes can then introduce irrelevant information. Figure 2(i) shows the quality loss caused by irrelevant reference frames in the case of fast motion. Considering both image quality and computational complexity, we use the following parameter setting: block size = 8×8, sub-pixel accuracy = 1/8, and number of reference frames = 11.
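The hole effect described above can be reproduced with a simplified sketch. It scatters each reference frame onto the N-times grid according to one global motion vector per frame, given in 1/N-pel units. The paper estimates motion per block and does not spell out its filling rule. The function `accumulate_grid`, the global-motion simplification, and the averaging of colliding samples are therefore all assumptions. In this sketch, each LR frame can populate at most 1/N² of the grid positions. This illustrates why finer sub-pixel accuracy leaves holes unless more reference frames are used.

```python
import numpy as np


def accumulate_grid(frames, motions, N):
    """Scatter LR reference frames onto an N-times finer grid (sketch).

    frames:  list of (h, w) LR frames.
    motions: list of (vy, vx) integer displacements in 1/N-pel units that
             place each frame on the current frame's fine grid.
    Returns the averaged grid and a boolean mask of unfilled positions.
    """
    h, w = frames[0].shape
    acc = np.zeros((h * N, w * N))
    cnt = np.zeros((h * N, w * N))
    ys, xs = np.mgrid[0:h, 0:w]
    for frame, (vy, vx) in zip(frames, motions):
        gy, gx = ys * N + vy, xs * N + vx
        ok = (gy >= 0) & (gy < h * N) & (gx >= 0) & (gx < w * N)
        # np.add.at accumulates correctly even when indices repeat.
        np.add.at(acc, (gy[ok], gx[ok]), frame[ok])
        np.add.at(cnt, (gy[ok], gx[ok]), 1)
    holes = cnt == 0
    grid = np.where(holes, 0.0, acc / np.maximum(cnt, 1))
    return grid, holes
```

With N = 2, a single frame fills only a quarter of the grid. Four frames offset by the four half-pel phases fill every position.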
Figure 2. Experimental results of the proposed IR-based SR method for different parameter settings
on the Stefan image sequence.
The A+ method [5] is used for example-based SR. An HR image is first reconstructed by each of the IR-based and example-based methods. The proposed method then combines the two HR images [11] using a gradient-based weight function. The final reconstructed image is computed as

$$F_J(i,j) = \omega_{IR}(i,j)\,F_{IR}(i,j) + \omega_{EX}(i,j)\,F_{EX}(i,j), \tag{8}$$
where $F_J$ represents the final reconstructed image, $F_{IR}$ denotes the HR image reconstructed by the IR-based method, and $F_{EX}$ is the HR image reconstructed by the example-based method. The weight functions are defined as
$$\omega_{IR}(i,j) = \frac{(g * F_{IR})(i,j)}{(g * F_{IR})(i,j) + (g * F_{EX})(i,j)}, \tag{9}$$
$$\omega_{EX}(i,j) = \frac{(g * F_{EX})(i,j)}{(g * F_{IR})(i,j) + (g * F_{EX})(i,j)}, \tag{10}$$
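where $g$ represents the gradient operator and $*$ denotes convolution. Since $\omega_{IR}(i,j) + \omega_{EX}(i,j) = 1$, each output pixel is a weighted average of the two reconstructions. The average favors the reconstruction with the stronger local gradient response. The joint SR method thus enhances the quality of the reconstructed HR image by weighting two images reconstructed by different approaches.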
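Equations (8)–(10) can be sketched as follows. The paper does not specify the gradient operator g. This sketch therefore assumes the gradient magnitude computed from central differences (`np.gradient`), which keeps both weights in [0, 1]. Where both gradients vanish, (9) and (10) would be 0/0. The sketch falls back to equal weights there, which is also an assumption. The function names are hypothetical.

```python
import numpy as np


def gradient_magnitude(img):
    """Assumed gradient response |g * F|, via central differences."""
    gy, gx = np.gradient(img.astype(np.float64))
    return np.hypot(gx, gy)


def joint_fusion(F_ir, F_ex, eps=1e-12):
    """Combine the IR-based and example-based HR images as in (8)-(10)."""
    G_ir = gradient_magnitude(F_ir)
    G_ex = gradient_magnitude(F_ex)
    total = G_ir + G_ex
    flat = total < eps                               # both gradients vanish
    safe_total = np.where(flat, 1.0, total)
    w_ir = np.where(flat, 0.5, G_ir / safe_total)    # (9)
    w_ex = 1.0 - w_ir                                # (10): weights sum to one
    return w_ir * F_ir + w_ex * F_ex                 # (8)
```

When one input is flat and the other has edges, the weight shifts entirely to the input with edges. Identical inputs are returned unchanged.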
4. EXPERIMENTAL RESULTS AND DISCUSSIONS
The simulated image sequence is a common intermediate format (CIF) video with a spatial resolution of 352×288 at 30 frames per second. The test sequence, named Stefan, contains dynamic scenes of a tennis match. It features a cluttered background formed by the crowd at the top of the scene, fast motions of the tennis players, large camera motion tracking the players, and characters of various sizes on the walls.
In this paper, the CIF videos are down-sampled to quarter CIF (QCIF) size, i.e., 176×144. SR is then performed on the resulting QCIF videos to recover the CIF videos. For example-based SR, the number of atoms in the dictionary, K, is set to 16. For registration-based SR, the sub-pixel accuracy N is set to 8, the search range to 32, and the number of reference frames to 11.
Figure 3. Experimental results and performance comparison in terms of the PSNR for the LR image
sequence (Stefan).
Figure 3 shows the SR results on the Stefan sequence, with cropped and zoomed regions at the corners. For reference, the SR result of bicubic interpolation is shown in Figure 3(b). The example-based SR result is shown in Figure 3(c). Example-based SR gives fine details in complex textures such as the crowd. It also gives the best reconstruction of strong edges (see the shoulder of the tennis player, the horizontal lines across the image, and so on). However, the English characters at the bottom left are not clearly readable, since a single LR image does not contain sufficient information to recover them. The IR-based SR method, in contrast, collects information from a number of adjacent frames to build an HR image, even when a single frame provides little evidence. Figure 3(d) shows the result of the proposed joint SR method, which takes advantage of both the IR- and example-based approaches. Its peak signal-to-noise ratio (PSNR) is somewhat lower, but it gives better SR quality on characters (shown in Figures 4(a) and 4(b)).
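PSNR is the measure reported in Figure 3 and is defined as 10·log10(peak² / MSE). The paper does not state the peak value or whether PSNR is computed on luminance only. This sketch therefore assumes 8-bit data (peak = 255) and averages over all samples passed in. The function name is illustrative.

```python
import numpy as np


def psnr(reference, estimate, peak=255.0):
    """PSNR in dB: 10 * log10(peak^2 / MSE), assuming 8-bit data by default."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```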
Figure 4. Quality comparison of the proposed joint SR method with A+ on three different image sequences.
Figures 4(a)-4(b) compare the proposed joint SR method with the A+ method on the Stefan sequence, and Figures 4(c)-4(d) make the same comparison on the Foreman sequence. The results of the proposed joint SR method show less noise and higher contrast than those of the A+ method. Figures 4(e) and 4(f) show experimental results on the full HD (1920×1080) Kimono image sequence. The proposed joint SR method up-scales this sequence from full HD to UHD 4K (3840×2160) resolution. In contrast with the Stefan sequence (an LR sequence), the results on the Kimono sequence (an HR sequence) show little qualitative difference.
5. CONCLUSION
The proposed SR method focuses on reconstructing HR video sequences from LR video sequences. The proposed IR-based SR method successfully collects information from adjacent frames to reconstruct English characters in the video sequences. The example-based SR method, however, gives better textures and stronger edges in the resulting HR video. In this paper, IR- and example-based SR methods are fused based on gradient features. The proposed joint SR method gives lower PSNR than the example-based method; however, it shows better reconstruction results on high-level features. Future work will focus on optimizing the joint SR method using a convolutional neural network to reduce the time complexity of the algorithm.
6. ACKNOWLEDGMENTS
This work was supported in part by the Brain Korea 21 Plus.
REFERENCES
[1] Y. Tian and K.-H. Yap, “Joint image registration and super-resolution from low-resolution images
with zooming motion,” IEEE Trans. Circuits and Systems for Video Technology, vol. 23, no. 7, pp.
1224–1234, Jul. 2013.
[2] C. Liu and D. Sun, “On Bayesian adaptive video super resolution,” IEEE Trans. Pattern Analysis and
Machine Intelligence, vol. 36, no. 2, pp. 346–360, Feb. 2014.
[3] Z. Xiong, D. Xu, X. Sun, and F. Wu, “Example-based super-resolution with soft information and
decision,” IEEE Trans. Multimedia, vol. 15, no. 6, pp. 1458–1465, May 2013.
[4] R. Timofte, V. De Smet, and L. Van Gool, “Anchored neighborhood regression for fast example-based
super-resolution,” in Proc. IEEE Int. Conf. Computer Vision, pp. 1920–1927, Sydney, Australia, Dec.
2013.
[5] R. Timofte, V. De Smet, and L. Van Gool, “A+: Adjusted anchored neighborhood regression for fast
super-resolution,” in Proc. Asian Conf. Computer Vision, pp. 1–15, Singapore, Nov. 2014.
[6] R. Timofte and L. Van Gool, “Adaptive and weighted collaborative representations for image
classification,” Pattern Recognition Letters, vol. 43, pp. 127–135, Jul. 2014.
[7] Y. Zhu, Y. Zhang, and A. L. Yuille, “Single image super-resolution using deformable patches,” in
Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 2917–2924, Columbus, OH, USA,
Jun. 2014.
[8] R. Zeyde, M. Elad, and M. Protter, “On single image scale-up using sparse-representations,” in
Lecture Notes in Computer Science: Curves and Surfaces, J.-D. Boissonnat and P. Chenin, Eds.,
Springer, pp. 711–730, 2012.
[9] Y. Zhu, Y. Zhang, and A. L. Yuille, “Single image super-resolution using deformable patches,” in
Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 2917–2924, Columbus, OH, USA, Jun.
2014.
[10] R. Barringer and T. Akenine-Möller, “A4: Asynchronous adaptive anti-aliasing using shared memory,”
ACM Trans. Graphics, vol. 32, no. 4, pp. 100:1–100:10, Jul. 2013.
[11] K. Lee and C. Lee, “High quality spatially registered vertical temporal filtering for deinterlacing,”
IEEE Trans. Consumer Electronics, vol. 59, no. 1, pp. 182–189, Feb. 2013.

More Related Content

PDF
5 single image super resolution using
PDF
SINGLE IMAGE SUPER RESOLUTION: A COMPARATIVE STUDY
PDF
Analysis of Various Single Frame Super Resolution Techniques for better PSNR
PDF
Survey on Single image Super Resolution Techniques
PDF
Survey on Single image Super Resolution Techniques
PDF
Super resolution image reconstruction via dual dictionary learning in sparse...
PDF
27 robust super resolution for 276-282
PDF
Ay33292297
5 single image super resolution using
SINGLE IMAGE SUPER RESOLUTION: A COMPARATIVE STUDY
Analysis of Various Single Frame Super Resolution Techniques for better PSNR
Survey on Single image Super Resolution Techniques
Survey on Single image Super Resolution Techniques
Super resolution image reconstruction via dual dictionary learning in sparse...
27 robust super resolution for 276-282
Ay33292297

Similar to Joint Image Registration And Example-Based Super-Resolution Algorithm (20)

PDF
Ay33292297
PDF
557 480-486
PDF
Image resolution enhancement via multi surface fitting
DOCX
Ieee transactions on image processing
PDF
Low-Rank Neighbor Embedding for Single Image Super-Resolution
PDF
Single Image Super Resolution using Interpolation and Discrete Wavelet Transform
PDF
IRJET- Exploring Image Super Resolution Techniques
PDF
A comparison of SIFT, PCA-SIFT and SURF
PDF
Prediction of Interpolants in Subsampled Radargram Slices
PDF
Image Super-Resolution Reconstruction Based On Multi-Dictionary Learning
PDF
OBTAINING SUPER-RESOLUTION IMAGES BY COMBINING LOW-RESOLUTION IMAGES WITH HIG...
PDF
Learning Based Single Frame Image Super-resolution Using Fast Discrete Curvel...
PDF
gilbert_iccv11_paper
PDF
3 video segmentation
PDF
A Novel Super Resolution Algorithm Using Interpolation and LWT Based Denoisin...
PDF
Image Restoration UsingNonlocally Centralized Sparse Representation and histo...
PDF
Super resolution in deep learning era - Jaejun Yoo
PDF
Seminarpaper
PDF
Ijecet 06 10_002
Ay33292297
557 480-486
Image resolution enhancement via multi surface fitting
Ieee transactions on image processing
Low-Rank Neighbor Embedding for Single Image Super-Resolution
Single Image Super Resolution using Interpolation and Discrete Wavelet Transform
IRJET- Exploring Image Super Resolution Techniques
A comparison of SIFT, PCA-SIFT and SURF
Prediction of Interpolants in Subsampled Radargram Slices
Image Super-Resolution Reconstruction Based On Multi-Dictionary Learning
OBTAINING SUPER-RESOLUTION IMAGES BY COMBINING LOW-RESOLUTION IMAGES WITH HIG...
Learning Based Single Frame Image Super-resolution Using Fast Discrete Curvel...
gilbert_iccv11_paper
3 video segmentation
A Novel Super Resolution Algorithm Using Interpolation and LWT Based Denoisin...
Image Restoration UsingNonlocally Centralized Sparse Representation and histo...
Super resolution in deep learning era - Jaejun Yoo
Seminarpaper
Ijecet 06 10_002

Recently uploaded (20)

PDF
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
PPTX
Internet of Things (IOT) - A guide to understanding
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PPTX
OOP with Java - Java Introduction (Basics)
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PPT
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
PPT
Project quality management in manufacturing
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PPTX
Geodesy 1.pptx...............................................
PPTX
bas. eng. economics group 4 presentation 1.pptx
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PPTX
Construction Project Organization Group 2.pptx
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PPTX
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PDF
PPT on Performance Review to get promotions
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
Internet of Things (IOT) - A guide to understanding
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
OOP with Java - Java Introduction (Basics)
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
Project quality management in manufacturing
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
Operating System & Kernel Study Guide-1 - converted.pdf
UNIT-1 - COAL BASED THERMAL POWER PLANTS
Geodesy 1.pptx...............................................
bas. eng. economics group 4 presentation 1.pptx
Foundation to blockchain - A guide to Blockchain Tech
Construction Project Organization Group 2.pptx
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PPT on Performance Review to get promotions

Joint Image Registration And Example-Based Super-Resolution Algorithm

  • 1. Advanced Computational Intelligence: An International Journal (ACII), Vol.2, No.2, April 2015 47 JOINT IMAGE REGISTRATION AND EXAMPLE-BASED SUPER-RESOLUTION ALGORITHM Hyo-Song Kim, Jeyong Shin, and Rae-Hong Park Department of Electronic Engineering, School of Engineering, Sogang University 35 Baekbeom-ro, Mapo-gu, Seoul, 121-742, Korea ABSTRACT Supper-resolution (SR) methods are classified into two different methods: image registration (IR)-based methods and example-based methods. The proposed joint SR method is focused on estimating high- resolution (HR) video sequences from low-resolution (LR) ones by combining the two different methods. The IR-based SR method collects information from adjacent frames to reconstruct HR images in the video sequence. Example-based SR methods give good textures and strong edges in the result HR video. In this paper, IR-based and example-based SR methods are fused based on the gradient features. The proposed joint SR method gives smaller peak signal to noise ratio than the example-based method, however it shows better reconstruction results on high-level features such as characters in images. Experimental result of the proposed joint SR method shows less noise and higher contrast than the example-based method. KEYWORDS Super-Resolution, Image Registration, Motion Estimation, Motion Compensation, Example-Based Learning, Sparse Coding, Neigborhood Regression. 1. INTRODUCTION Supper-resolution (SR) methods are traditionally classified into two different classes: image registration (IR)-based and example-based methods [1]. IR is the task of finding the motion between two or more frames of the same scene. It is noted that the motion estimated by IR does not necessarily describe the true motion of either the camera or the object in a scene. The most common way to allow a reliable implementation of the IR is to estimate the motion using only two-dimensional translations, under the assumption of small motion within the scene. 
In [1] more elaborated methods are presented including methods using rotation and scaling, basically intended for scanned documents. IR-based SR methods can be classified into two main approaches: featured-based methods and block-based methods. The first approach fits a motion model by using a sparse set of point correspondences, while the second one uses the information of the entire pixels within search area. Feature-based SR methods can cause false matches due to their regression nature and require a large number of points in order to achieve a high level of accuracy. On the contrary, the block- based methods can obtain the motion information with a good accuracy for producing a high- resolution (HR) image. While the feature-based methods use a small number of points, the block- based methods utilize all the overlapping blocks of the adjacent images. A lot of block-based methods estimate image motion by minimizing a cost function between two motion-corrected
  • 2. Advanced Computational Intelligence: An International Journal (ACII), Vol.2, No.2, April 2015 48 images [2], however suffer from a very high computational cost. Although SR methods which use motion information between adjacent frames have been successful on some of the video sequences, they are not suitable for the video sequences with stationary objects and background, which do not contain information to fill in the sub-pixels. Moreover, estimating a true motion vector in spatially high-frequency regions and handling occluded regions is not a simple task. Thus, SR techniques on a single image are still important even though the goal is to perform SR on video sequences. Most single image SR methods use example-based approach, which learns from example images to form a dictionary and apply it to each image patch. Xiong et al. [3] used the soft information and decision on the one-to-many correspondence of dictionaries to solve the dimensionality gap between low-resolution (LR) and HR spaces. Timofte et al. improved the conventional sparse coding method by introducing anchored neighborhood regression (ANR) method [4]–[6]. Zhu et al. [7] used optical flow on image patches to form deformable patches, in order to make the learned dictionary more expressive. 2. RELATED WORK 2.1. IR-Based SR Methods IR-based SR methods [1]–[2] generally concentrate on spatial domain and typically consist of two processes: IR and HR reconstruction. IR is the process that calculates the motion parameters based on a specific motion model. HR reconstruction incorporates the estimated motion parameters into inverse estimation. The aliasing effect among LR images may reduce the accuracy of registration. Accurate registration is a challenging task because motion parameters are calculated from a number of aliased LR images. To deal with the problem, various SR algorithms have been proposed to reduce the registration error on the final estimated HR image. 2.2. 
Dictionary-Based SR Methods A large number of dictionary-based SR methods [3]–[8] have been developed for the last decade. They use dictionary of image patches or patch-based atoms that are trained to represent natural image patches efficiently, which brings advantages in both time complexity and accuracy of the SR results. 2.2.1. Neighbor Embedding Methods Neighbor embedding methods are generally used in patch-based methods. They assume that the LR input patches can be approximated by a weighted sum of HR trained patches. LR and HR patches are learned simultaneously in the training phase. A typical method, locally linear embedding [3], can be written as    K i i i w 1 , ' x x (1) where x denotes the result HR patch and i ' x represents the i th candidate HR patch, which is the pair of the i th nearest neighbor LR patch of the input LR patch. K is the number of neighbors in consideration and i w denotes the weight of the i th candidate HR patch.
  • 3. Advanced Computational Intelligence: An International Journal (ACII), Vol.2, No.2, April 2015 49 2.2.2. Sparse Coding Methods Unlike neighbor embedding methods, sparse coding methods try to find efficient representations, dictionary atoms, of the image patches. Zeyde et al. [8] built an efficient and improved method to train a sparse dictionary. They built dictionaries for both LR patches and their corresponding HR patches by using joint optimization on these two patches. After LR and HR dictionaries are trained, a sparse representation  α of an input LR image patch can be calculated as , min arg 1 2 2 α y α α α      l D (2) where l D denotes the LR dictionary, α is a sparse representation of LR patch, y represents the input LR patch, and  denotes the regularization parameter which controls the significance of the sparsity constraint (l1-norm in the second term) over the modelling error (l2-norm in the first term). 2.2.3 ANR Method ANR method [4] reformulates the dictionary of Zeyde et al. [8] to pre-compute regression matrices used to calculate the result HR patches. Instead of using LR and HR dictionaries directly, it considers the nearest neighbors of a specific j th dictionary atom , j l d which is the j th column vector of the LR dictionary . l D Using only the nearest neighbors of , j l d a local neighborhood LR dictionary j l N is calculated along with its corresponding local neighborhood HR dictionary . j h N Instead of 1 l -norm, 2 l -norm is used for the sparsity constraint to calculate a sparse representation ,  β which can be written as , min arg 2 2 2 β y β β β      l N (3) where the superscript j is omitted for simplicity. (3) can be solved in a closed-form solution by ridge regression [9], which is written as , ) ( 1 y β T l l T l N I N N      (4) and then followed by , y β x j h P N    (5) where x denotes the SR result and j P is the projection matrix which projects an input LR patch directly onto . 
Note that $\mathbf{P}_j$ can be pre-computed off-line, which greatly reduces the time complexity of the algorithm. In summary, ANR learns off-line regressors for fast SR while improving quality through the neighborhood concept.
2.2.4. Adjusted ANR Method
The adjusted ANR (A+) method [5] is an improved ANR algorithm based on two observations. First, the dictionary atoms sample the patch space sparsely, whereas the pool of training patches obtained in off-line training is practically unlimited. Second, the local manifold around an atom is spanned better by dense training samples than by the dictionary atoms. Based on these observations, the A+ method reformulates the optimization problem (3) as
$$\boldsymbol{\delta}^{*} = \arg\min_{\boldsymbol{\delta}} \|\mathbf{y} - \mathbf{S}_l\boldsymbol{\delta}\|_2^2 + \lambda\|\boldsymbol{\delta}\|_2^2 \tag{6}$$
where $\mathbf{S}_l$ denotes a local LR dictionary that contains the $K$ training samples neighboring the anchor atom (which can be pre-computed per atom in off-line training). $\mathbf{S}_l$ replaces $\mathbf{N}_l$, which contains neighboring atoms, in (3). Since A+ uses the same baseline as ANR, (6) can also be solved by ridge regression, and the closed-form solution is
$$\mathbf{x} = \mathbf{S}_h\boldsymbol{\delta}^{*} = \hat{\mathbf{P}}_j\mathbf{y} \tag{7}$$
where $\mathbf{S}_h$ denotes the HR dictionary corresponding to $\mathbf{S}_l$ and $\hat{\mathbf{P}}_j = \mathbf{S}_h(\mathbf{S}_l^{T}\mathbf{S}_l + \lambda\mathbf{I})^{-1}\mathbf{S}_l^{T}$ is the projection matrix obtained from $\mathbf{S}_l$. Note that $\hat{\mathbf{P}}_j$ maps an input LR patch $\mathbf{y}$ directly onto the HR patch $\mathbf{x}$. With this strategy, the A+ method chooses better neighborhoods for the local dictionaries, which drastically improves the SR result.
3. PROPOSED JOINT SR METHOD
The proposed IR-based SR method generates an image grid by referencing neighboring frames. The overall process of the IR-based SR method is illustrated in Figure 1. The resolution of the image grid is N times higher than that of the original image in both width and height, where N is related to the sub-pixel accuracy of the motion search: if N is set to 4, quarter-pel motion estimation is performed on the neighboring reference frames. All pixels in the image grid are reconstructed using information from the neighboring reference frames, and the fully reconstructed grid is then down-sampled to obtain the HR image. The down-sampling uses super-sampling anti-aliasing (SSAA), which samples with a pattern such as a grid, random, Poisson, jitter, or rotated grid. Since the rotated grid is known to be effective at suppressing jagged edges [10], we use it for image sampling.
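A minimal sketch of the rotated-grid down-sampling step, assuming the common 4-sample rotated grid (offsets of ±1/8 and ±3/8 of an output pixel, giving one sample per row and per column) and a plain average of the samples. With N = 8 and a 2× target (QCIF to CIF, as in Section 4), each HR pixel covers a 4×4 block of the reconstructed grid, which is the case handled here. The sample pattern `RGSS_4X4` and the function name `rotated_grid_downsample` are our assumptions; neither the text above nor [10] fixes them.

```python
import numpy as np

# Assumed 4-sample rotated-grid (RGSS) pattern inside a 4x4 block of the fine
# grid, as (row, col). Every row and every column holds exactly one sample,
# which is what makes the grid "rotated" rather than axis-aligned.
RGSS_4X4 = ((0, 1), (1, 3), (3, 2), (2, 0))

def rotated_grid_downsample(fine, factor=4, offsets=RGSS_4X4):
    """Down-sample `fine` by `factor`, averaging the rotated-grid samples of each block."""
    h, w = fine.shape
    if h % factor or w % factor:
        raise ValueError("grid size must be a multiple of factor")
    out = np.zeros((h // factor, w // factor))
    for dy, dx in offsets:
        # Strided slicing takes the same sub-position from every block at once.
        out += fine[dy::factor, dx::factor]
    return out / len(offsets)
```

For the N = 8, 2× case the call would be `rotated_grid_downsample(grid, factor=4)`; other factors would need their own sample pattern.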
Figure 1. An example of the proposed IR-based SR method with the sub-pixel accuracy set to four (N = 4). Circles denote integer-pel positions and diamonds represent sub-pel positions.
Figure 2 shows experimental results of the proposed IR-based SR method for different parameter settings. We adjust three parameters: block size, sub-pixel accuracy, and the number of reference frames. The default parameter setting is block size = 32×32, sub-pixel accuracy = 1/2, and number of reference frames = 3. As shown in Figures 2(a)-2(c), a smaller block size reduces blocking artifacts at object boundaries but increases the computational burden. Figures 2(d)-2(f) show that finer sub-pixel accuracy improves the quality of the reconstructed HR image; however, many holes are produced (see Figure 2(f)) because the default number of reference frames is too small. To solve this problem, more reference frames can be used to gather more temporal information (Figures 2(g)-2(i)); however, some reference frames may then contribute irrelevant information, for example after a scene change. Figure 2(i) shows the quality loss caused by irrelevant reference frames in the case of fast motion. Considering both image quality and computational complexity, we use the following parameter setting: block size = 8×8, sub-pixel accuracy = 1/8, and number of reference frames = 11.
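To make the hole effect concrete, the following hypothetical helper `fill_grid` scatters the pixels of LR reference frames onto the N-times finer grid, using block motion vectors that are assumed to be already estimated in 1/N-pel units. Each reference frame contributes at most one sample per LR pixel, so sub-pel positions not hit by any displacement remain empty (NaN). The data layout and the averaging of coinciding samples are our assumptions; the motion search itself and any hole filling are omitted.

```python
import numpy as np

def fill_grid(ref_frames, motions, N, block):
    """Scatter LR reference pixels onto an N-times finer grid (hypothetical helper).

    ref_frames: list of (H, W) LR frames.
    motions: one dict per frame mapping a block index (by, bx) to its
        displacement (my, mx) from the reference frame to the current frame,
        in 1/N-pel units (i.e. fine-grid samples).
    Returns the grid with NaN wherever no sample landed (a hole).
    """
    H, W = ref_frames[0].shape
    acc = np.zeros((H * N, W * N))
    cnt = np.zeros((H * N, W * N))
    for frame, mv in zip(ref_frames, motions):
        for (by, bx), (my, mx) in mv.items():
            for r in range(by * block, min((by + 1) * block, H)):
                for c in range(bx * block, min((bx + 1) * block, W)):
                    gy, gx = r * N + my, c * N + mx   # landing position on the grid
                    if 0 <= gy < H * N and 0 <= gx < W * N:
                        acc[gy, gx] += frame[r, c]
                        cnt[gy, gx] += 1
    grid = np.full((H * N, W * N), np.nan)
    hit = cnt > 0
    grid[hit] = acc[hit] / cnt[hit]   # average samples landing on the same position
    return grid
```

With a single frame and zero motion, only one of every N×N grid positions is filled; each additional frame with a distinct sub-pel displacement fills another set of positions.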
Figure 2. Experimental results of the proposed IR-based SR method for different parameter settings on the Stefan image sequence.
The adjusted ANR method [5] is used for the example-based SR. After reconstructing an HR image with each of the IR- and example-based methods, the proposed method combines the two HR images [11] using a gradient-based weight function. The final reconstructed image is calculated by
$$F_J(i,j) = \omega_{IR}(i,j)\,F_{IR}(i,j) + \omega_{EX}(i,j)\,F_{EX}(i,j) \tag{8}$$
where $F_J$ represents the final reconstructed image, $F_{IR}$ denotes the HR image reconstructed by the IR-based method, and $F_{EX}$ is the HR image reconstructed by the example-based method. The weight functions are defined as
$$\omega_{IR}(i,j) = \frac{g \otimes F_{IR}(i,j)}{g \otimes F_{IR}(i,j) + g \otimes F_{EX}(i,j)} \tag{9}$$
$$\omega_{EX}(i,j) = \frac{g \otimes F_{EX}(i,j)}{g \otimes F_{IR}(i,j) + g \otimes F_{EX}(i,j)} \tag{10}$$
where $g$ represents the gradient operator and $\otimes$ denotes convolution. By weighting the two images reconstructed by the different approaches, the joint SR method enhances the quality of the reconstructed HR image.
4. EXPERIMENTAL RESULTS AND DISCUSSIONS
The simulated image sequence is a common intermediate format (CIF) video with a spatial resolution of 352×288 pixels at 30 frames per second. The test sequence named Stefan contains dynamic scenes of a tennis match, with a cluttered background at the top of the scene caused by the crowd, fast motion of the tennis players, large camera motion tracking the players, and characters of various sizes on the walls. In this paper, the CIF videos are down-sampled to quarter CIF (QCIF) size, 176×144 pixels, and SR is performed on the resulting QCIF videos to recover CIF videos. For example-based SR, the number of atoms in the dictionary, K, is set to 16. For registration-based SR, the sub-pixel accuracy N is set to 8, the search range to 32, and the number of reference frames to 11.
Figure 3. Experimental results and performance comparison in terms of the PSNR for the LR image sequence (Stefan).
Figure 3 shows the SR results on the Stefan sequence, with cropped and zoomed regions at the corners. For reference, the SR result using bicubic interpolation is shown in Figure 3(b), and the example-based SR result is shown in Figure 3(c). Example-based SR gives fine details on complex textures such as the crowd. It also gives the best result on strong edges (see the shoulder of the tennis player, the horizontal lines across the image, and so on). However, the English characters at the bottom left are not clearly readable, since a single LR image does not contain sufficient information to recover them. The IR-based SR method, by contrast, collects information from a number of adjacent frames to build an HR image, even where a single frame provides little evidence. Figure 3(d) shows the result of the proposed joint SR method, which takes advantage of both the IR- and example-based approaches. Although its peak signal-to-noise ratio (PSNR) is somewhat lower, it shows better SR quality on characters (Figures 4(a) and 4(b)).
Figure 4. Quality comparison of the proposed joint SR method with A+ for three different image sequences.
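The gradient-weighted fusion of (8) to (10), which produces the joint results compared in Figures 3 and 4, and the PSNR used to compare them can be sketched as follows. Because the gradient operator $g$ is not specified, this sketch assumes the central-difference gradient magnitude from `np.gradient`; the small `eps` that keeps the weights defined in flat regions and the 8-bit peak value of 255 in `psnr` are also our assumptions. The names `gradient_magnitude`, `joint_fuse`, and `psnr` are hypothetical.

```python
import numpy as np

def gradient_magnitude(img):
    """Stand-in for g ⊗ F: central-difference gradient magnitude (assumption)."""
    gy, gx = np.gradient(np.asarray(img, dtype=np.float64))
    return np.hypot(gx, gy)

def joint_fuse(F_ir, F_ex, eps=1e-8):
    """Eqs. (8)-(10): per-pixel weights proportional to gradient strength.

    eps is our addition: it keeps the weights summing to one and makes
    them 0.5/0.5 where both images are flat (zero gradient).
    """
    g_ir = gradient_magnitude(F_ir)
    g_ex = gradient_magnitude(F_ex)
    denom = g_ir + g_ex + 2 * eps
    w_ir = (g_ir + eps) / denom        # eq. (9)
    w_ex = (g_ex + eps) / denom        # eq. (10)
    return w_ir * F_ir + w_ex * F_ex   # eq. (8)

def psnr(ref, test, peak=255.0):
    """PSNR in dB: 10 log10(peak^2 / MSE); infinite for identical images."""
    mse = np.mean((np.asarray(ref, np.float64) - np.asarray(test, np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))
```

Since the two weights always sum to one, a pixel on a strong edge in only one reconstruction takes its value almost entirely from that reconstruction, while a pixel that is flat in both receives the plain average.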
Figures 4(a)-4(b) compare the proposed joint SR method with the A+ method on the Stefan sequence, and Figures 4(c)-4(d) do so on the Foreman sequence. The result of the proposed joint SR method shows less noise and higher contrast than that of the A+ method. Figures 4(e) and 4(f) show the experimental results on the full HD (1920×1080) Kimono image sequence, which the proposed joint SR method up-scales to UHD 4K (3840×2160) resolution. In contrast with the Stefan sequence (an LR sequence), the experimental results for the Kimono sequence (an HR sequence) show little difference in qualitative comparison.
5. CONCLUSION
The proposed SR method focuses on reconstructing HR video sequences from LR video sequences. The proposed IR-based SR method successfully collects information from adjacent frames to reconstruct English characters in the video sequences, whereas the example-based SR method gives better textures and stronger edges in the resulting HR video. In this paper, the IR- and example-based SR methods are fused based on gradient features. The proposed joint SR method gives lower PSNR than the example-based method, but it shows better reconstruction of high-level features such as characters. Future work will focus on optimizing the joint SR method with a convolutional neural network to reduce the time complexity of the algorithm.
6. ACKNOWLEDGMENTS
This work was supported in part by the Brain Korea 21 Plus.
REFERENCES
[1] Y. Tian and K.-H. Yap, “Joint image registration and super-resolution from low-resolution images with zooming motion,” IEEE Trans. Circuits and Systems for Video Technology, vol. 23, no. 7, pp. 1224–1234, Jul. 2013.
[2] C. Liu and D. Sun, “On Bayesian adaptive video super resolution,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 36, no. 2, pp. 346–360, Feb. 2014.
[3] Z. Xiong, D. Xu, X. Sun, and F. Wu, “Example-based super-resolution with soft information and decision,” IEEE Trans. Multimedia, vol. 15, no. 6, pp. 1458–1465, May 2013.
[4] R. Timofte, V. De Smet, and L. Van Gool, “Anchored neighborhood regression for fast example-based super-resolution,” in Proc. IEEE Int. Conf. Computer Vision, pp. 1920–1927, Sydney, Australia, Dec. 2013.
[5] R. Timofte, V. De Smet, and L. Van Gool, “A+: Adjusted anchored neighborhood regression for fast super-resolution,” in Proc. Asian Conf. Computer Vision, pp. 1–15, Singapore, Nov. 2014.
[6] R. Timofte and L. Van Gool, “Adaptive and weighted collaborative representations for image classification,” Pattern Recognition Letters, vol. 43, pp. 127–135, Jul. 2014.
[7] Y. Zhu, Y. Zhang, and A. L. Yuille, “Single image super-resolution using deformable patches,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 2917–2924, Columbus, OH, USA, Jun. 2014.
[8] R. Zeyde, M. Elad, and M. Protter, “On single image scale-up using sparse-representations,” in Lecture Notes in Computer Science: Curves and Surfaces, J.-D. Boissonnat and P. Chenin, Eds., Springer, pp. 711–730, 2012.
[9] Y. Zhu, Y. Zhang, and A. L. Yuille, “Single image super-resolution using deformable patches,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 2917–2924, Columbus, OH, USA, Jun. 2014.
[10] R. Barringer and T. Akenine-Möller, “A4: Asynchronous adaptive anti-aliasing using shared memory,” ACM Trans. Graphics, vol. 32, no. 4, pp. 100:1–100:10, Jul. 2013.
[11] K. Lee and C. Lee, “High quality spatially registered vertical temporal filtering for deinterlacing,” IEEE Trans. Consumer Electronics, vol. 59, no. 1, pp. 182–189, Feb. 2013.