Quality enhancement

Visual Quality Enhancement in DCT-Domain Spatial Downscaling Transcoding
Using Generalized DCT Decimation

Presented by:
Marwa Ahmed
Mona Ragheb
Sara Serag
Yara Ali

Agenda
• Introduction
• Definitions. What is meant by:
    Transcoding
    Spatial Domain & DCT Domain
    Downscaling
    Aliasing
    Quantization
    Video Adaptation
    Frequency Synthesis
• Generalized DCT Decimation for Spatial Downscaling

Agenda (cont.)
• Computation Reduction Using Sparse Matrix Representation
• Analysis of the Proposed DCT Decimation Filter
• Experimental Results
• Optimal Least-Squares Up-Scaling Filters (Steps)
• Peak Signal-to-Noise Ratio
• Conclusion

Visual Quality Enhancement in DCT-Domain Spatial
Downscaling Transcoding Using Generalized DCT
Decimation.

• The goal of image enhancement here is to improve the image quality, so that the
processed image has a high PSNR, while keeping the computational complexity low.

Abstract
1. We propose a generalized discrete cosine transform (DCT) decimation scheme
for DCT-domain spatial downscaling, which performs two-fold decimation on
subframes of a flexible size larger than the traditional 8×8 block size to
improve the visual quality.
2. Efficient sparse-matrix representations are then derived to reduce the
computation of the proposed DCT decimation method.

Abstract (cont.)
3. We compare the filtering performance and computational complexity of the
proposed scheme with those of the pixel-domain downscaling schemes.
Our analysis shows that:
• The proposed scheme can reduce the aliasing artifacts compared to the
pixel-domain downscaling schemes, whereas the computational complexity may be
increased.
We also:
• Integrate the proposed decimation scheme into the cascaded DCT-domain
transcoder for spatial downscaling of a pre-encoded video to a quarter of its size.
• Show experimentally that the proposed approach can achieve better visual
quality than the existing schemes.

What is meant by:

Transcoding:
Video transcoding is the operation of converting a video bit-stream from one
format into another (e.g., changing the bit rate, frame rate, spatial
resolution, or coding syntax). It is an efficient means of achieving fine and
dynamic video adaptation.

Video adaptation:
Converting the video bit rate according to the channel conditions. The
pre-encoded video is encoded at high quality and a high bit rate, so for
low-bandwidth connections the video needs to be converted to a lower bit rate.

Spatial Domain & DCT Domain
Spatial Domain (Image Enhancement):

Definition:
Manipulating or changing an image that represents an object in space, in order
to enhance the image for a given application.
• Techniques are based on direct manipulation of the pixels in an image
• Used for basic filtering, smoothing filters, sharpening filters, unsharp
masking, and the Laplacian

Discrete Cosine Transform (DCT) domain
• Working in the DCT domain allows us to discard the equations involving the
higher-frequency components, reducing the size of the equation set considerably.
• In the DCT domain, each equation's significance depends on the corresponding
DCT frequency
• It does not affect the compressibility of the original image, because the
image is enhanced at the decompression stage

Aliasing:
• When a signal is under-sampled, aliasing can result
• Aliasing occurs when high-frequency components masquerade as low-frequency
ones; it can result from improper image sampling
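
As a small illustration of the masquerading effect, the sketch below samples two cosines at the same rate. The frequencies (7 Hz and 1 Hz) and the sampling rate (8 Hz) are arbitrary values chosen for this example and do not come from the slides. Because 7 Hz is above half the sampling rate, its samples coincide with those of the 1 Hz cosine.

```python
import math


def sample_cosine(freq_hz, rate_hz, count):
    """Return `count` samples of cos(2*pi*f*t) taken at `rate_hz`."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]


# Illustrative values only: 7 Hz is above the Nyquist limit (4 Hz) of an
# 8 Hz sampler, so it aliases onto 8 - 7 = 1 Hz.
high = sample_cosine(7.0, 8.0, 16)
low = sample_cosine(1.0, 8.0, 16)

# Largest sample-by-sample difference between the two sequences.
max_diff = max(abs(a - b) for a, b in zip(high, low))
print(max_diff)
```

After sampling, the two signals cannot be told apart, which is why an anti-aliasing filter has to remove the high-frequency content before downsampling.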

Downscaling: The operation of retaining the low-frequency coefficients of a
DCT sub-frame and taking the half-size IDCT. From each N×M sub-frame, only the
(N/2)×(M/2) low-frequency coefficients are extracted.

Quantization: the process of converting a continuous-valued signal (for
example, an analog audio signal) to a digital signal with discrete numerical
values. In video coding it is applied to the DCT coefficients.
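
A minimal sketch of uniform quantization applied to a single coefficient is given below. The step size and the coefficient value are hypothetical numbers picked for illustration; real codecs use quantization tables and rounding rules that are not described in these slides.

```python
def quantize(coefficient, step):
    """Map a real-valued coefficient to an integer level (uniform quantizer)."""
    return int(round(coefficient / step))


def dequantize(level, step):
    """Reconstruct an approximate coefficient from its integer level."""
    return level * step


step = 8.0          # hypothetical quantizer step size
coefficient = 37.0  # hypothetical DCT coefficient

level = quantize(coefficient, step)
reconstructed = dequantize(level, step)
error = coefficient - reconstructed
print(level, reconstructed, error)
```

A larger step gives fewer distinct levels and therefore a higher compression ratio, at the price of a larger reconstruction error.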

Frequency synthesis: This downscaling method first synthesizes an incoming
macroblock consisting of four 8×8 DCT blocks into one 16×16 DCT block, and then
obtains the downscaled 8×8 DCT block by extracting the 8×8 low-frequency DCT
coefficients of the 16×16 DCT block.
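
The sketch below works through this idea numerically under two assumptions that are not taken from the slides: an orthonormal DCT-II is used, and the extracted coefficients are halved so that the mean brightness is preserved. For clarity it goes through the pixel domain (IDCT, tiling, 16×16 DCT), which gives the same result as a direct DCT-domain synthesis but is not how an efficient transcoder would compute it. All function names are illustrative.

```python
import math


def dct_matrix(n):
    """Orthonormal n-point DCT-II matrix (each row is one basis vector)."""
    matrix = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        matrix.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                       for j in range(n)])
    return matrix


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dct2(block):
    """2-D DCT of a square block: T * block * T^T."""
    t = dct_matrix(len(block))
    return matmul(matmul(t, block), transpose(t))


def idct2(coeffs):
    """2-D inverse DCT of a square block: T^T * coeffs * T."""
    t = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(t), coeffs), t)


def synthesize_and_downscale(b00, b01, b10, b11):
    """Four 8x8 DCT blocks (one macroblock) -> one downscaled 8x8 DCT block."""
    p00, p01, p10, p11 = (idct2(b) for b in (b00, b01, b10, b11))
    top = [left + right for left, right in zip(p00, p01)]
    bottom = [left + right for left, right in zip(p10, p11)]
    big = dct2(top + bottom)  # 16x16 DCT of the whole macroblock
    # Keep the 8x8 low-frequency corner; the factor 1/2 is an assumed
    # normalization for the orthonormal DCT (1/sqrt(2) per dimension).
    return [[big[r][c] / 2.0 for c in range(8)] for r in range(8)]


flat_dct = dct2([[100.0] * 8 for _ in range(8)])  # uniform gray 8x8 block
out = synthesize_and_downscale(flat_dct, flat_dct, flat_dct, flat_dct)
print(round(out[0][0], 6))
```

For a uniform macroblock the output block has the same DC coefficient as each input block and no AC energy, which is the expected behaviour for a downscaled flat region.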

• In realizing a transcoder, the computational cost and the picture quality are
usually the two most important concerns.
A cascaded DCT-domain transcoder (CDDT), as depicted in Fig. 1, was first
proposed for spatial downscaling, where a DCT-domain bilinear filter was used
as the anti-aliasing filter.
• The straightforward realization cascades a decoder followed by an encoder.
This cascaded pixel-domain architecture is flexible and can be used for
bit-rate adaptation and spatio-temporal resolution conversion without drift.

MC (motion compensation): reduces the temporal redundancy. A video picture is
predicted from its reference pictures and only the prediction errors are coded.
DCT: reduces the spatial redundancy and achieves energy compaction.
Quantization: performed to achieve a higher compression ratio.
VLC (variable-length coding): applied after quantization to reduce the
remaining redundancy.
• The decoder decodes the compressed input video.
• The encoder re-encodes the decoded video into the target format.
• The encoder reuses the motion vectors, along with other information extracted
from the input video bit stream.

II. Generalized DCT decimation
for spatial downscaling

Formulation of generalized DCT
decimation

STEP 1:
A group of consecutive 8-sample DCT vectors is first transformed into an
N-pixel vector by 8-point IDCT, where N is a multiple of 8.

The N-pixel vector is then transformed into its corresponding DCT vector by
N-point DCT.
The N-point DCT representation of fN is computed from the following quantities:

fN: the N-pixel vector composed of the 8-pixel vectors bi, i = 1, ..., N/8

TN: the N-point DCT transform matrix, divided into N/8 column submatrices
TN,i, each of size N×8
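
Written out from these definitions, the block-wise product takes the form below. This is a reconstruction from the definitions of fN and TN,i rather than a copy of the slide's equation; the symbols F_N (the N-sample DCT vector), B_i (the i-th incoming 8-sample DCT vector) and T_8 (the 8-point DCT matrix) are names assumed here, with b_i = T_8^T B_i following from the 8-point IDCT of Step 1.

```latex
F_N = T_N f_N
    = \begin{bmatrix} T_{N,1} & T_{N,2} & \cdots & T_{N,N/8} \end{bmatrix}
      \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_{N/8} \end{bmatrix}
    = \sum_{i=1}^{N/8} T_{N,i}\, b_i
    = \sum_{i=1}^{N/8} T_{N,i}\, T_8^{\mathsf{T}} B_i
```

The last form shows that the N-point DCT vector can be obtained directly from the incoming 8-sample DCT vectors, using the fixed N×8 matrices T_{N,i} T_8^T.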

STEP 2:
DCT decimation is subsequently performed on the N-sample DCT vector by
extracting the N/2 low-frequency DCT coefficients, followed by an N/2-point
IDCT to obtain a downscaled N/2-pixel vector.

STEP 3:
The N/2-pixel vector is transformed into a group of consecutive 8-sample DCT
vectors by 8-point DCT to form the output DCT array.
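
The three steps can be sketched for the 1-D case as follows. This is a direct, unoptimized illustration and not the sparse-matrix implementation discussed later. It assumes an orthonormal DCT-II, a 1/sqrt(2) scaling of the retained coefficients so that the mean level is preserved, and, for simplicity, that N is a multiple of 16 so that the N/2 output pixels split evenly into 8-sample blocks. The function names are illustrative.

```python
import math


def dct_matrix(n):
    """Orthonormal n-point DCT-II matrix (each row is one basis vector)."""
    matrix = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        matrix.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                       for j in range(n)])
    return matrix


def forward(t, pixels):
    """DCT: multiply the transform matrix by a pixel vector."""
    return [sum(a * b for a, b in zip(row, pixels)) for row in t]


def inverse(t, coeffs):
    """IDCT: multiply the transposed transform matrix by a coefficient vector."""
    return [sum(t[k][j] * coeffs[k] for k in range(len(coeffs)))
            for j in range(len(t[0]))]


def generalized_dct_decimation(dct_blocks):
    """Downscale a group of 8-sample DCT vectors by two (1-D case)."""
    n = 8 * len(dct_blocks)
    if n % 16 != 0:
        raise ValueError("this sketch assumes N is a multiple of 16")
    t8 = dct_matrix(8)

    # Step 1: 8-point IDCT of every block, then N-point DCT of the N pixels.
    pixels = []
    for block in dct_blocks:
        pixels.extend(inverse(t8, block))
    big = forward(dct_matrix(n), pixels)

    # Step 2: keep the N/2 low-frequency coefficients, then N/2-point IDCT.
    half = n // 2
    low = [c / math.sqrt(2.0) for c in big[:half]]  # assumed normalization
    small = inverse(dct_matrix(half), low)

    # Step 3: 8-point DCT of every 8-pixel group of the downscaled vector.
    return [forward(t8, small[i:i + 8]) for i in range(0, half, 8)]


flat_block = forward(dct_matrix(8), [50.0] * 8)  # DCT of a flat 8-pixel run
result = generalized_dct_decimation([flat_block, flat_block])  # N = 16
print(len(result), round(result[0][0], 6))
```

With N = 16 the two input blocks produce one output block; larger groups (N = 32, 48, ...) are handled by passing more blocks in a single call.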

• Computation reduction using sparse matrix representations

• Analysis of DCT decimation downscaling filters



To reduce the computation of the matrix operations in (4) and (7), the
matrices involved can be represented in sparse matrix form.

The following characteristics have been noted in these matrices, each of
dimension (N/2) × 8:
1. General case:
The entries of the r-th row are all zeros except the r-th entry, where
r = 0, N/8, 2N/8, 3N/8. About N/8 of the entries are zeros.
2. Special case 1: K = N/8 is even,
with i = 1, ..., N/16, r = 0, ..., N/2 and c = 0, ..., 7.
3. Special case 2: K = N/8 is odd,
with i = 1, ..., N/16, r = 0, ..., N/2 and c = 0, ..., 7.
For the matrix with i = N/16 + 1, the entries with odd values of r + c are zero
for r ≠ 0, N/8, 2N/8, 3N/8; at most half of the entries are zeros.

Based on the previous facts, auxiliary matrices are defined to reduce the
computation, for i = 1, ..., K/2 where K is even, and are then substituted
into the decimation equations.








The operation of retaining the low-frequency coefficients of a DCT sub-frame
and taking the half-size IDCT is, in effect, anti-aliasing filtering followed
by downsampling of the sub-frame in the pixel domain.
The following is an analysis of the performance and complexity of various
downscaling filters for the 1-D case.
For an N-sample 1-D signal x downscaled by a factor of two, the downscaled
N/2-sample signal y is obtained by applying a downscaling filter to x. This
filter is a linear filter.
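
One way to see the linearity is to build the filter explicitly as an (N/2)×N matrix H, so that y = H x. The sketch below assembles H from the description above (N-point DCT, keep the N/2 low-frequency coefficients, N/2-point IDCT). The orthonormal DCT-II and the 1/sqrt(2) scaling are assumptions made here, and `decimation_filter` is an illustrative name rather than the slides' notation.

```python
import math


def dct_matrix(n):
    """Orthonormal n-point DCT-II matrix (each row is one basis vector)."""
    matrix = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        matrix.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                       for j in range(n)])
    return matrix


def decimation_filter(n):
    """Return the (N/2) x N matrix H of the DCT decimation filter, y = H x."""
    half = n // 2
    t_full = dct_matrix(n)
    t_half = dct_matrix(half)
    scale = 1.0 / math.sqrt(2.0)  # assumed normalization, keeps the mean level
    # H = scale * T_half^T * [I 0] * T_full: only the first N/2 rows of
    # T_full (the low-frequency basis vectors) contribute.
    return [[scale * sum(t_half[k][m] * t_full[k][j] for k in range(half))
             for j in range(n)]
            for m in range(half)]


h = decimation_filter(16)
row_sums = [sum(row) for row in h]
print(len(h), len(h[0]), [round(s, 6) for s in row_sums])
```

Every row of H sums to one, so a constant input passes through unchanged; each row can be read as the impulse response used to produce one output sample.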





The linear transform can be represented as an N-band filter bank structure,
from which the z-transform of the output y can be obtained.

As N increases, the gain of the DCT decimation filters, |F0(z)|, becomes much
flatter in the low-frequency part (0~π/2), whereas the gain decreases rapidly
in the high-frequency part (π/2~π).
For the bilinear filter, the gain in the high-frequency part is always larger
than that of the DCT decimation filters and the 7-tap filter.
The smaller gain in the high-frequency part implies less visible aliasing
artifacts in the downscaled image.

Figure: the magnitude responses of the two pixel-domain filters (the bilinear
filter and the 7-tap filter) and of the generalized DCT decimation filters with
N = 8, 16, 72, and 288.

• Increasing the sub-frame size N for the DCT decimation filters leads to
better quality of the downscaled image, but it also increases the
computational complexity significantly.
• The following tables list the computational complexities for different
values of N and different filters:

Average computational complexity for the pixel-domain downscaling schemes

Filter            Avg. multiplications   Avg. additions
Bilinear          8                      8
7-tap Gaussian    56                     48

Average computational complexity using the generalized DCT decimation scheme

        Avg. multiplications                  Avg. additions
N       Gen. DCT      Sparse matrix           Gen. DCT      Sparse matrix
        decimation    decomposition           decimation    decomposition
352     4224          2788                    4208          2792
32      384           228                     368           232
16      192           100                     176           104
8       64            20                      64            20









We use:
• One CIF (352×288) sequence
• Two ITU-R (704×576) sequences
• 150 frames in each video
• Each video is encoded by a front-end MPEG-2 encoder
• Each coded video is transcoded using the CDDT, resulting in a spatially
downscaled video of quarter size
• We implement the bilinear filter and the 7-tap Gaussian filter
• Each downscaled image is decoded and up-scaled to its original size



The optimal least-squares upscaling filter matrix minimizes the error between
the original-size image and its reconstruction (downscaled and then upscaled).








Steps:
1. Divide each downscaled pixel vector into N/2-sample pixel vectors, then
transform each one into an N/2-sample DCT vector.
2. Expand each N/2-sample DCT vector to N samples by padding zero coefficients
in the high-frequency bands.
3. Apply the N-point IDCT.
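
A minimal 1-D sketch of these steps is shown below for a single N/2-sample pixel vector. The orthonormal DCT-II and the sqrt(2) gain applied before padding (so that the mean level is preserved) are assumptions made for this example, and `dct_zero_pad_upscale` is an illustrative name.

```python
import math


def dct_matrix(n):
    """Orthonormal n-point DCT-II matrix (each row is one basis vector)."""
    matrix = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        matrix.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                       for j in range(n)])
    return matrix


def dct_zero_pad_upscale(pixels):
    """Upscale an N/2-sample pixel vector to N samples via DCT zero padding."""
    half = len(pixels)
    n = 2 * half
    t_half = dct_matrix(half)
    t_full = dct_matrix(n)

    # Step 1: N/2-point DCT of the downscaled pixel vector.
    coeffs = [sum(a * b for a, b in zip(row, pixels)) for row in t_half]

    # Step 2: pad zeros in the high-frequency bands (sqrt(2) gain is assumed).
    padded = [math.sqrt(2.0) * c for c in coeffs] + [0.0] * half

    # Step 3: N-point IDCT (multiply by the transposed transform matrix).
    return [sum(t_full[k][j] * padded[k] for k in range(n)) for j in range(n)]


upscaled = dct_zero_pad_upscale([10.0] * 8)
print(len(upscaled), [round(v, 6) for v in upscaled[:4]])
```

A flat input stays flat after upscaling, and any low-frequency content kept by the DCT decimation is restored without adding new high-frequency components.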







We compare the PSNR values of the output image of each downscaling filter
followed by the upscaling filter, and of the final output of the transcoders.
PSNR (peak signal-to-noise ratio) is an engineering term for the ratio between
the maximum possible power of a signal and the power of the corrupting noise.
So if the PSNR is high, the noise is low and the quality is high.
PSNR = 20 log10(255 / RMSE)
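
The formula can be evaluated with a few lines of code. The sketch below treats images as flat lists of 8-bit pixel values; the `psnr` function name and the sample pixel values are made up for illustration.

```python
import math


def psnr(original, processed, peak=255.0):
    """PSNR in dB between two equally sized pixel sequences."""
    if len(original) != len(processed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, processed)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    rmse = math.sqrt(mse)
    return 20.0 * math.log10(peak / rmse)


# Hypothetical pixel values: every pixel is off by one gray level (RMSE = 1).
value = psnr([10, 20, 30], [11, 21, 31])
print(round(value, 4))
```

With an RMSE of one gray level the result is 20 log10(255), about 48.13 dB; larger errors give lower PSNR values.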







We report experimental results for N = 8, 16, and 32.

With N = 16 and 32 we obtain better visual quality than with N = 8, but at the
same time the computational complexity increases.
So we can use N = 8 in low-activity regions and N = 16 or 32 in high-activity
regions to achieve a good trade-off between computational complexity and
visual quality.
