Single Image Super-Resolution Using
Dictionary-Based Local Regression
Sundaresh Ram and Jeffrey J. Rodriguez.
Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ, USA.
Email: {ram, jjrodrig}@email.arizona.edu
Abstract-This paper presents a new method of producing a high-resolution image from a single low-resolution image without any external training image sets. We use a dictionary-based regression model for practical image super-resolution using local self-similar example patches within the image. Our method is inspired by the observation that image patches can be well represented as a sparse linear combination of elements from a chosen over-complete dictionary, and that a patch in the high-resolution image has good matches around its corresponding location in the low-resolution image. A first-order approximation of a nonlinear mapping function, learned using the local self-similar example patches, is applied to the low-resolution image patches to obtain the corresponding high-resolution image patches. We show that the proposed algorithm provides improved accuracy compared to existing single-image super-resolution methods by running them on various input images that contain diverse textures and that are contaminated by noise or other artifacts.
Index Terms-Image restoration, dictionary learning, sparse
recovery, image super-resolution, regression.
I. INTRODUCTION
Super-resolution image reconstruction is an important task in many computer vision and image processing applications. The goal of image super-resolution (SR) is to generate a high-resolution (HR) image from one or more low-resolution (LR) images. Image SR is a widely researched topic, and numerous SR algorithms have been proposed in the literature [1]-[9], [11]-[18]. SR algorithms can be broadly classified into three main categories: interpolation-based algorithms, learning-based algorithms, and reconstruction-based algorithms. Interpolation-based SR algorithms [2], [8], [9], [11] are fast, but the results may lack some of the fine details. In learning-based SR algorithms [4]-[6], [14], detailed textures are recovered by searching through a training set of LR/HR images. These methods need a careful selection of the training images; otherwise, erroneous details may be introduced. Alternatively, reconstruction-based SR algorithms [1], [3], [12], [15]-[18] apply various smoothness priors and impose the constraint that, when properly downsampled, the HR image should reproduce the original LR image.
The image SR problem is severely ill-posed, since many HR images can produce the same LR image, and it therefore has to rely on strong image priors for robust estimation. The most common image prior is the simple analytical "smoothness" prior, e.g., bicubic interpolation. Because an image contains sharp discontinuities, such as edges and corners, using the simple "smoothness" prior for SR reconstruction results in ringing, jagging, blurring, and ghosting artifacts. Thus,
more sophisticated statistical image priors learned from natural images have been explored [1], [2], [12]. Even though natural images are sparse signals, capturing their rich characteristics with only a few parameters is not feasible. Furthermore, example-based nonparametric methods [14]-[16], [18] have been used to predict the missing high-frequency component of the HR image using a universal set of training example LR/HR image patches. But these methods require a large set of training patches, making them computationally inefficient.
Recently, many SR algorithms have been developed using the fact that images possess a large number of self-similarities, i.e., local image structures tend to reappear within and across different image scales [3], [5], [18], and thus the image SR problem can be regularized based on these examples rather than on an external database. In particular, Glasner et al. [5] proposed a framework that uses the self-similar example patches from within and across different image scales to regularize the SR problem. Yang et al. [18] developed an SR method in which the SR images are constructed using a learned dictionary formed from image patch pairs extracted by building an image pyramid of the LR image. Freedman et al. [3] extended the example-based SR framework by following a local self-similarity assumption on the example image patches and iteratively upscaling the LR image.
In this paper, we describe a new single-image super-resolution method using a dictionary-based local regression approach. Our approach differs from prior work on single-image SR in two respects: 1) we use the in-place self-similarity [17] to construct and train a dictionary from the LR image, and 2) we use the trained dictionary to learn a robust first-order approximation of the nonlinear mapping from LR to HR image patches. The HR image patch is reconstructed from the given LR image patch using this learned nonlinear function. We describe our algorithm in detail and present both quantitative and qualitative results comparing it to several recent algorithms.
II. METHODS
We assume that some areas of the input LR image X_0 contain high-frequency content that we can borrow for image SR; i.e., X_0 is an image containing some sharp areas but overall having unsatisfactory pixel resolution. Let X_0 and X denote the LR (input) and HR (output) images, where the output pixel resolution is r times greater. Let Y_0 and Y denote the corresponding low-frequency bands; that is, Y_0 has the same spatial dimensions as X_0 but is missing the high-frequency content, and likewise for Y and X. Let x_0 and x denote a × a HR image patches sampled from X_0 and X, respectively, and let y_0 and y denote a × a LR image patches sampled from Y_0 and Y, respectively. Let (i, j) and (p, q) denote coordinates in the 2-D image plane.

Fig. 1. For each patch y of the upsampled low-frequency image Y, we find its in-place match y_0 in the low-frequency image Y_0, and then perform a first-order regression on X_0 to estimate the desired patch x for the target image X.
A. Proposed Super-Resolution Algorithm
The LR image is denoted as X_0 ∈ ℝ^(K1×K2), from which we obtain its low-frequency image Y_0 ∈ ℝ^(K1×K2) by Gaussian filtering. We upsample X_0 using bicubic interpolation by a factor of r to get Y ∈ ℝ^(rK1×rK2). Y is used to approximate the low-frequency component of the unknown HR image X ∈ ℝ^(rK1×rK2). We aim to estimate X from the knowledge of X_0, Y_0, and Y.
Fig. 1 is a block-diagram description of the overall SR scheme. For each image patch y from the image Y at location (i, j), we find its in-place self-similar example patch y_0 around its corresponding coordinates (i_s, j_s) in the image Y_0, where i_s = ⌊i/r + 0.5⌋ and j_s = ⌊j/r + 0.5⌋. Similarly, we can obtain the image patch x_0 from image X_0, which is an HR version of y_0. The image patch pair {y_0, x_0} constitutes an LR/HR prior example pair from which we learn a first-order regression model to estimate the HR image patch x for the LR patch y. We repeat the procedure using overlapping patches of image Y, and the final HR image X is generated by aggregating all the HR image patches x obtained. For large upscaling factors, the algorithm is run iteratively, each time with a constant scaling factor r.
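A minimal sketch of this patch-wise scheme is given below, assuming gray-scale images stored as NumPy arrays. The use of scipy's cubic-spline zoom as a stand-in for bicubic interpolation, the border clamping, and the uniform averaging of overlapping patches are implementation assumptions; the Gaussian standard deviation follows Sec. III-A, and the regression step of Sec. II-B is left as a pluggable callable that defaults to the zeroth-order term x_0.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def in_place_coords(i, j, r):
    # Map a patch location (i, j) in the upsampled image Y back to its
    # in-place location (i_s, j_s) in the low-frequency LR image Y0.
    return int(np.floor(i / r + 0.5)), int(np.floor(j / r + 0.5))

def super_resolve_once(X0, r=2, a=5, regress=None):
    """One upscaling step of the patch-wise SR scheme of Fig. 1 (sketch).

    regress(y, y0, x0) stands in for the first-order regression of
    Sec. II-B; by default only the zeroth-order term x0 is used.
    """
    if regress is None:
        regress = lambda y, y0, x0: x0
    Y0 = gaussian_filter(X0.astype(float), sigma=0.4)   # low-freq. band of X0
    Y = zoom(X0.astype(float), r, order=3)              # bicubic-like upsampling
    X = np.zeros_like(Y)
    counts = np.zeros_like(Y)
    H, W = Y.shape
    for i in range(H - a + 1):
        for j in range(W - a + 1):
            i_s, j_s = in_place_coords(i, j, r)
            i_s = min(max(i_s, 0), X0.shape[0] - a)      # clamp to image border
            j_s = min(max(j_s, 0), X0.shape[1] - a)
            y = Y[i:i + a, j:j + a]
            y0 = Y0[i_s:i_s + a, j_s:j_s + a]
            x0 = X0[i_s:i_s + a, j_s:j_s + a].astype(float)
            X[i:i + a, j:j + a] += regress(y, y0, x0)
            counts[i:i + a, j:j + a] += 1.0
    return X / np.maximum(counts, 1.0)                   # average overlapping patches
```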
B. Local Regression
The patch-based single image SR problem can be viewed
as a regression problem, i.e., finding a nonlinear mapping
function f from the LR patch space to the target HR patch
space. However, due to the ill-posed nature of the inverse
problem at hand, learning this nonlinear mapping function
requires good image priors and proper regularization. From
Section II-A, the in-place self-similar example patch pair
{y_0, x_0} serves as a good prior example pair for inferring the HR version of y. Assuming that the mapping function f is continuously differentiable, we have the following Taylor series expansion:

  x = f(y) = f(y_0 + y − y_0)
           = f(y_0) + ∇f(y_0)(y − y_0) + O(‖y − y_0‖²)
           ≈ x_0 + ∇f(y_0)(y − y_0).                              (1)
Equation (1) is a first-order approximation of the nonlinear mapping function f. Instead of learning the mapping function f itself, we can learn its gradient ∇f, which is a simpler task. We learn the mapping gradient ∇f by building a dictionary using the prior example pair {y_0, x_0}, as detailed in the next section. Once the gradient is learned, given any LR input patch y, we first search for its in-place self-similar example patch pair {y_0, x_0}, then find ∇f(y_0) using the trained dictionary, and finally use the first-order approximation to compute the HR image patch x.
Due to the discrete resampling process in downsampling and upsampling, we expect to find multiple approximate in-place examples for y in the 3 × 3 neighborhood of (i_s, j_s), which contains 9 patches. To reduce the regression variance, we perform regression on each of them and combine the results by a weighted average. Given the in-place self-similar example patch pairs {y_{0i}, x_{0i}}, i = 1, ..., 9, for y, we have

  x = Σ_{i=1}^{9} ( x_{0i} + ∇f(y_{0i})(y − y_{0i}) ) w_i,          (2)

where w_i = (1/z) exp(−‖y − y_{0i}‖²₂ / (2σ²)), with z the normalization factor.
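The weighted combination in Eq. (2) is straightforward once the mapping gradient is available. The sketch below assumes the patches are flattened into vectors, that grad_f(y0) is a hypothetical callable returning the learned gradient matrix at y0 (obtained as in Sec. II-C), and that sigma is the weight bandwidth of Sec. III-A (the default value here is only a placeholder).

```python
import numpy as np

def first_order_estimate(y, examples, grad_f, sigma=10.0):
    """Estimate the HR patch x from LR patch y via Eq. (2) (sketch).

    y        : flattened LR patch (length a*a)
    examples : list of up to 9 in-place example pairs (y0_i, x0_i),
               each given as flattened patches
    grad_f   : callable, grad_f(y0) -> (a*a, a*a) gradient matrix
    sigma    : bandwidth of the Gaussian weights w_i
    """
    weights, estimates = [], []
    for y0, x0 in examples:
        w = np.exp(-np.sum((y - y0) ** 2) / (2.0 * sigma ** 2))
        est = x0 + grad_f(y0) @ (y - y0)      # first-order Taylor term, Eq. (1)
        weights.append(w)
        estimates.append(est)
    z = sum(weights)                          # normalization factor
    return sum(w * e for w, e in zip(weights, estimates)) / z
```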
C. Dictionary Learning
The proposed dictionary-based method for learning the mapping gradient ∇f is a modification of the work by Yang et al. [15], [16] designed to guarantee detail enhancement. Yang et al. [15], [16] developed a method for single-image SR based on sparse modeling. This method utilizes an overcomplete dictionary D_h ∈ ℝ^(n×K) built using the HR image; it is an n × K matrix whose K columns represent K "atoms" of size n, where an "atom" is an elementary patch (in vector form) serving as a basis element of the sparse representation. We assume that any patch x ∈ ℝ^n in the HR image X can be represented as a sparse linear combination of the atoms of D_h as follows:

  x ≈ D_h α,   with ‖α‖₀ ≪ K,  α ∈ ℝ^K.                           (3)

A patch y in the observed LR image can be represented using a corresponding LR dictionary D_l with the same sparse coefficient vector α. This is ensured by co-training the dictionary D_h with the HR patches and the dictionary D_l with the corresponding LR patches.
For a given input LR image patch y, we determine the sparse solution vector

  α* = argmin_α ‖G D_l α − G y‖²₂ + λ ‖α‖₁,                       (4)

where G is a feature extraction operator that emphasizes high-frequency detail. We use the following set of 1-D filters:

  g_1 = [−1, 0, 1],  g_2 = g_1ᵀ,  g_3 = [−1, −2, 1],  g_4 = g_3ᵀ.   (5)
G is obtained as a concatenation of the responses from applying the above 1-D filters to the image. The sparsity of the solution vector α* is controlled by λ. In order to enhance the texture details while suppressing noise and other artifacts, we need to adapt the number of non-zero coefficients in the solution vector α*, since increasing the number of non-zero coefficients enhances the texture details but also amplifies the noise and artifacts. We use the standard deviation (σ) of a patch to indicate the local texture content, and empirically adapt λ as follows:

  λ = 0.5   if σ < 15
  λ = 0.1   if 15 ≤ σ ≤ 25
  λ = 0.01  otherwise

These σ thresholds are designed for our 8-bit gray-scale images and can easily be adapted for other image types.
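As an illustration, the feature extraction and the adaptive sparsity weight can be sketched as follows. The filter taps are taken directly from Eq. (5); treating the operator G as per-patch filtering with these four kernels and the boundary handling are implementation assumptions.

```python
import numpy as np
from scipy.ndimage import correlate1d

# 1-D filters of Eq. (5): first- and second-order kernels applied
# horizontally (g1, g3) and vertically (g2 = g1^T, g4 = g3^T).
G1 = np.array([-1.0, 0.0, 1.0])
G3 = np.array([-1.0, -2.0, 1.0])

def extract_features(patch):
    """Concatenate the four filter responses of a patch into one
    high-frequency feature vector (a stand-in for applying G)."""
    responses = [
        correlate1d(patch, G1, axis=1, mode='nearest'),
        correlate1d(patch, G1, axis=0, mode='nearest'),
        correlate1d(patch, G3, axis=1, mode='nearest'),
        correlate1d(patch, G3, axis=0, mode='nearest'),
    ]
    return np.concatenate([r.ravel() for r in responses])

def adaptive_lambda(patch):
    """Sparsity weight chosen from the local texture level (patch standard
    deviation), using the empirical thresholds for 8-bit gray-scale images."""
    s = patch.std()
    if s < 15:
        return 0.5
    if s <= 25:
        return 0.1
    return 0.01
```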
The mapping gradient ∇f for a given y_0 is obtained as ∇f(y_0) = D_h α*.
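Putting Eq. (4) and this relation together, the gradient lookup can be sketched with an off-the-shelf ℓ1 solver. Using scikit-learn's Lasso as the sparse solver, mapping its regularization constant onto λ, and assuming the LR dictionary is supplied with the feature operator G already applied to its atoms are choices of this sketch, not part of the paper's implementation.

```python
from sklearn.linear_model import Lasso

def mapping_gradient_term(g_y, Dl_feat, Dh, lam):
    """Solve the sparse recovery problem of Eq. (4) and return D_h alpha*,
    the learned mapping-gradient term of Sec. II-C (sketch).

    g_y     : feature vector G y0 of the in-place example patch
              (e.g. from extract_features above)
    Dl_feat : (F, K) array, LR dictionary atoms (columns) with the feature
              operator G already applied
    Dh      : (M, K) HR (residual) dictionary
    lam     : adaptive sparsity weight chosen from the patch statistics
    """
    # scikit-learn's Lasso minimizes (1/(2m))||Xw - t||^2 + a||w||_1; the
    # rescaling below maps Eq. (4)'s lambda onto that convention.
    solver = Lasso(alpha=lam / (2.0 * g_y.size), fit_intercept=False,
                   max_iter=5000)
    solver.fit(Dl_feat, g_y)
    alpha_star = solver.coef_            # sparse coefficient vector alpha*
    return Dh @ alpha_star               # grad f(y0) = D_h alpha*
```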
We make use of a bilateral filter as the degradation operator, instead of a Gaussian blurring operator, to obtain the image Y_0 from the given LR input image X_0 for dictionary training, as we are interested in enhancing the textures present while suppressing noise and other artifacts. Dictionary training starts by sampling in-place self-similar example image patch pairs {y_{0i}, x_{0i}}, i = 1, ..., m, from the corresponding LR and HR images. We generate the HR patch vector X_h = {x_{01}, x_{02}, ..., x_{0m}}, the LR patch feature vector Y_l = {y_{01}, y_{02}, ..., y_{0m}}, and the residue patch vector E = {x_{01} − y_{01}, x_{02} − y_{02}, ..., x_{0m} − y_{0m}}. We use the residue patch vector E instead of the HR patch vector X_h for training.
The residue patch vector is concatenated with the LR patch features, and a concatenated dictionary D_c is trained on the combined data

  X_c = [ (1/√N) Y_l
          (1/√M) E ].                                             (6)

Here, N and M are the dimensions of the LR and HR image patches in vector form. Optimized dictionaries are computed by

  min_{D_c, Z}  ‖X_c − D_c Z‖²₂ + λ ‖Z‖₁
  s.t.  ‖D_{ci}‖²₂ ≤ 1,  i = 1, ..., K.                           (7)
The training process is performed in an iterative manner, alternating between optimizing Z and D_c using the technique in [15].
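For reference, this concatenated training step can be sketched with a generic sparse dictionary learner standing in for the specific alternating scheme of [15]. Using scikit-learn's DictionaryLearning (which likewise alternates between sparse coding and dictionary updates, with norm-constrained atoms) and splitting the learned atoms into their LR-feature and residual parts afterwards are assumptions of this sketch.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

def train_coupled_dictionary(Yl, E, K=512, lam=0.1):
    """Train the concatenated dictionary D_c of Eqs. (6)-(7) (sketch).

    Yl : (m, N) LR patch feature vectors (rows are samples)
    E  : (m, M) residue patches x0_i - y0_i
    Returns the LR and HR (residual) sub-dictionaries D_l and D_h,
    each with K atoms as columns.
    """
    m, N = Yl.shape
    M = E.shape[1]
    # Balance the two parts by their dimensions and stack per sample, Eq. (6).
    Xc = np.hstack([Yl / np.sqrt(N), E / np.sqrt(M)])        # (m, N + M)
    learner = DictionaryLearning(n_components=K, alpha=lam,
                                 transform_algorithm='lasso_lars',
                                 max_iter=100)
    learner.fit(Xc)                  # alternates sparse coding / dictionary update
    Dc = learner.components_         # (K, N + M); rows are norm-constrained atoms
    Dl = Dc[:, :N].T                 # (N, K) LR dictionary
    Dh = Dc[:, N:].T                 # (M, K) HR (residual) dictionary
    return Dl, Dh
```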
III. EXPERIMENTS AND RESULTS
We evaluate the proposed SR algorithm both quantitatively and qualitatively on a variety of example images used in the SR literature [17]. We compare our SR algorithm with recent algorithms proposed by Glasner et al. [5], Yang et al. [18], and Freedman et al. [3]. We used open-source implementations of these three SR algorithms available online for comparison, carefully choosing the various parameters within each method to ensure a fair comparison.
TABLE I
PREDICTION RMSE FOR ONE UPSCALING STEP (2×)

Images       Bicubic   Glasner [5]   Yang [18]   Freedman [3]   Ours
Chip           6.03        5.81         5.70          5.85       4.63
Child          7.47        6.74         7.06          6.51       5.92
Peppers        9.11        8.97         9.10          8.72       7.74
House         10.37       10.41        10.16          9.62       8.14
Cameraman     11.61       10.93        11.81         10.64       8.97
Lena          13.31       12.92        12.65         11.97      11.41
Barbara       14.93       14.24        13.92         13.23      12.22
Monarch       16.25       15.71        15.96         15.50      15.42
A. Algorithm Parameter Settings
We chose the image patch size as a = 5 and the iterative scaling factor as r = 2 in all of our experiments. Bicubic interpolation on the input LR image X_0 generates the low-frequency component Y of the target HR image X. A standard deviation of 0.4 is used in the low-pass Gaussian filtering to obtain the low-frequency component Y_0 of the input LR image X_0. For clean images, we use the nearest-neighbor in-place example for regression, whereas for noisy images we average all 9 in-place example regressions for robust estimation; in that case, σ is the only tuning parameter needed to compute the weight w_i in (2), depending on the noise level. K = 512 atoms are used to train and build the dictionaries D_h and D_l used in the experiments.
B. Quantitative Results
In order to obtain an objective measure of performance for the SR algorithms under comparison, we validated the results on several example images taken from [10] (whose names appear in Table I) using the root mean square error (RMSE). The results of all the algorithms are shown in Table I for one upscaling step (2×). From Table I we observe that SR using simple bicubic interpolation performs the worst due to its overly smooth image prior. Yang's SR algorithm performs better than bicubic interpolation in terms of RMSE values for the different images. Glasner's and Freedman's SR methods have very similar RMSE values, since both methods are closely related, using local self-similar patches to learn the HR image patches from a single LR image. The proposed SR algorithm has the best RMSE values, as it combines the advantages of in-place example patches and their corresponding local self-similarity learned using the dictionary-based approach.
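For completeness, a minimal sketch of the RMSE computation behind Table I is given below; the assumption that the super-resolved output is compared pixel-wise against the 8-bit ground-truth HR image of the same size is ours.

```python
import numpy as np

def rmse(sr, gt):
    """Root mean square error between a super-resolved image `sr` and the
    ground-truth HR image `gt` (same size, gray-scale), as assumed for the
    prediction RMSE reported in Table I."""
    diff = sr.astype(np.float64) - gt.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))
```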
C. Qualitative Results
Real applications requiring SR depend on three main aspects: image sharpness, image naturalness (i.e., freedom from visual artifacts), and the speed of the super-resolution algorithm. We discuss the compared SR algorithms with respect to these aspects. Fig. 2 shows the SR results of the different approaches on "child" by 4×, "cameraman" by 3×, and "castle" by 2×. As shown, Glasner's and Freedman's SR algorithms give rise to overly sharp images, resulting in visual artifacts, e.g., ghosting and ringing artifacts around the eyes in "child" and jagged artifacts along the towers in "castle". Also, the details of the camera are smudged in "cameraman" for both algorithms. The results of Yang's SR algorithm are generally a little blurry and, on closer inspection, contain small visible noise-like artifacts across the images. In comparison, our algorithm recovers the local texture details as well as sharp edges without sacrificing the naturalness of the images.

Fig. 2. Super-resolution results on "child" (4×), "cameraman" (3×), and "castle" (2×); panels: Original, Bicubic, Glasner, Yang, Freedman, Ours. Results are best viewed zoomed in.
IV. CONCLUSION
In this paper, we propose a robust first-order regression model for single-image SR based on local self-similarity within the image. Our approach combines the advantages of learning from in-place examples and learning from local self-similar patches within the same image using a trained dictionary. The in-place examples allow us to learn a local regression function for the otherwise ill-posed mapping from LR to HR image patches. At the same time, by learning from local self-similar patches elsewhere within the image, the regression model can overcome the problem of an insufficient number of in-place examples. Through various experiments and comparisons with existing algorithms, we show that our new approach is more accurate and produces more natural-looking results with sharp details while suppressing the noise artifacts present within the images.
REFERENCES
[1] S. Dai, M. Han, W. Xu, Y. Wu, Y. Gong, and A. K. Katsaggelos, "SoftCuts: a soft edge smoothness prior for color image super-resolution," IEEE Trans. Image Process., vol. 18, no. 5, pp. 969-981, May 2009.
[2] R. Fattal, "Image upsampling via imposed edge statistics," ACM Transactions on Graphics, vol. 26, no. 3, pp. 95:1-95:8, Jul. 2007.
[3] G. Freedman and R. Fattal, "Image and video upscaling from local self-examples," ACM Transactions on Graphics, vol. 30, no. 2, pp. 12:1-12:11, Apr. 2011.
[4] W. T. Freeman, T. R. Jones, and E. C. Pasztor, "Example-based super-resolution," IEEE Comput. Graph. Appl., vol. 22, no. 2, pp. 56-65, Mar. 2002.
[5] D. Glasner, S. Bagon, and M. Irani, "Super-resolution from a single image," in Proc. IEEE Int. Conf. Computer Vision, pp. 349-356, 2009.
[6] H. He and W.-C. Siu, "Single image super-resolution using Gaussian process regression," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 449-456, 2011.
[7] K. I. Kim and Y. Kwon, "Single-image super-resolution using sparse regression and natural image prior," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 6, pp. 1127-1133, Jun. 2010.
[8] X. Li and M. T. Orchard, "New edge-directed interpolation," IEEE Trans. Image Process., vol. 10, no. 10, pp. 1521-1527, Oct. 2001.
[9] S. Mallat and G. Yu, "Super-resolution with sparse mixing estimators," IEEE Trans. Image Process., vol. 19, no. 11, pp. 2889-2900, Nov. 2010.
[10] D. Martin, C. Fowlkes, D. Tal, and J. Malik, "A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics," in Proc. Int. Conf. Computer Vision, pp. 416-423, 2001.
[11] Q. Shan, Z. Li, J. Jia, and C.-K. Tang, "Fast image/video upsampling," ACM Transactions on Graphics, vol. 27, no. 5, pp. 153:1-153:8, Dec. 2008.
[12] J. Sun, J. Sun, Z. Xu, and H.-Y. Shum, "Gradient profile prior and its applications in image super-resolution and enhancement," IEEE Trans. Image Process., vol. 20, no. 6, Jun. 2011.
[13] R. Timofte, V. De Smet, and L. Van Gool, "Anchored neighborhood regression for fast example-based super-resolution," in Proc. IEEE Int. Conf. Computer Vision, 2013.
[14] Q. Wang, X. Tang, and H. Shum, "Patch based blind image super resolution," in Proc. IEEE Int. Conf. Computer Vision, pp. 709-716, 2005.
[15] J. Yang, J. Wright, T. S. Huang, and Y. Ma, "Image super-resolution via sparse representation," IEEE Trans. Image Process., vol. 19, no. 11, pp. 2861-2873, Nov. 2010.
[16] J. Yang, Z. Wang, Z. Lin, S. Cohen, and T. S. Huang, "Coupled dictionary training for image super-resolution," IEEE Trans. Image Process., vol. 21, no. 8, pp. 3467-3478, Aug. 2012.
[17] J. Yang, Z. Lin, and S. Cohen, "Fast image super-resolution based on in-place example regression," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 1059-1066, 2013.
[18] C.-Y. Yang, J.-B. Huang, and M.-H. Yang, "Exploiting self-similarities for single frame super-resolution," in Proc. Asian Conf. Computer Vision, pp. 497-510, 2010.
