IEEE 2005 Workshop on Signal Processing Systems (SIPS'05), November 2, Athens, Greece
Flexible Hardware Architecture for 2-D
Separable Convolution-based Scaling
No single scaling technique suits all the kinds of images (photo, CAD, text...) that a user may want to print
or display. Formally, any convolution-based scaling operation can be decomposed into three steps: an anti-aliasing
filter, image reconstruction by continuous convolution, and re-sampling to the final grid. Based on this, we propose
a flexible, hardware-friendly discrete convolution engine operating on a memory that stores a programmable 2-D
separable interpolation kernel. We also state a technique for optimizing the memory size given the kernel and the
scale factor. Finally, we describe a novel flexible filter that overcomes aliasing artifacts regardless of image
frequency content.
Jordi Arnabat and Francisco Cardells
Hewlett-Packard, Large-Format Technology Lab, Barcelona, Spain
{jordi.arnabat, francisco.cardells}@hp.com
Image Scaling
. No single interpolation technique achieves good image quality (IQ) for all types of images: adaptable hardware is the key to survival.
. Formally, scaling can be thought of as continuous reconstruction of the discrete input followed by re-sampling at the output grid.
. We propose flexible hardware built from a classical convolution-based scaler, where IQ is chosen by means of a programmable kernel.
[Figure: Filtering stage. Block diagram in which two low-pass filters feed the base points A and B to a digital interpolator that produces the outputs Y1 and Y2.]
Scaler Data Flow
. Downscaling implies a pre-filtering step to remove frequencies that are not representable in the output grid (aliasing). Three pre-filters are compared:
(a) Moving Average
(b) Frequency-Sharpened CIC
(c) Multistage CIC
. We propose an architecture that enables the (a) and (b) pre-filters.
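As a rough illustration of pre-filters (a) and (b), the sketch below computes a length-R moving average as one integrator followed by one comb, and compares its zero-phase amplitude with a sharpened version. The sharpening polynomial 3H^2 - 2H^3 is an assumption (a common choice in the filter-sharpening literature), not something the poster states; the function names and the value of R are hypothetical.

```python
import math

def moving_average_cic(x, R):
    """Length-R moving average built as integrator + comb (one CIC stage)."""
    integ, total = [], 0
    for sample in x:
        total += sample              # integrator: 1 / (1 - z^-1)
        integ.append(total)
    # comb with delay R: (1 - z^-R), then normalise by R
    return [(integ[n] - (integ[n - R] if n >= R else 0)) / R
            for n in range(len(x))]

def ma_amplitude(w, R):
    """Zero-phase amplitude of the length-R moving average at frequency w (rad/sample)."""
    if abs(math.sin(w / 2)) < 1e-12:
        return 1.0
    return math.sin(w * R / 2) / (R * math.sin(w / 2))

def sharpened_amplitude(w, R):
    """Assumed sharpening polynomial applied to the amplitude: 3*A^2 - 2*A^3."""
    a = ma_amplitude(w, R)
    return 3 * a * a - 2 * a ** 3

if __name__ == "__main__":
    print(moving_average_cic([3, 3, 3, 3, 3, 3], 3))
    for w in (0.2, 2.0):             # one passband and one stopband frequency
        print(w, ma_amplitude(w, 4), sharpened_amplitude(w, 4))
```

Under this assumption the sharpened response stays closer to 1 at low frequencies and has a smaller magnitude in the stopband than the plain moving average, at the cost of more integrator and comb stages.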
. Wide range of interpolation techniques: NN, bilinear, bicubic, Gaussian, …, yours!
. The complexity and latency of the hardware are determined by the support of the interpolation function.
. Resampling is performed by a shift-variant FIR filter whose length equals the kernel support.
. The kernel shape can be programmed in a memory as a LUT, sampled at a given frequency.
. As a design rule, any kernel shape needs twice as many samples per interval as the maximum scale factor.
. For example, a scaler performing up to 32x scaling with a 4-tap support kernel and 8-bit word precision requires a 2 Kb LUT. The datapath for this interpolator requires 2.2 kgates.
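The 2 Kb figure can be checked with a few lines of arithmetic. This is a minimal sketch that assumes "Kb" means kilobits and that each tap of the support counts as one interval; the function name is hypothetical.

```python
def lut_size_bits(max_scale, taps, word_bits):
    """LUT size from the design rule: 2 * max_scale samples per interval."""
    samples_per_interval = 2 * max_scale      # design rule quoted above
    entries = samples_per_interval * taps     # one interval per tap (assumption)
    return entries * word_bits

if __name__ == "__main__":
    bits = lut_size_bits(max_scale=32, taps=4, word_bits=8)
    print(bits, "bits =", bits // 1024, "Kb")  # 2048 bits = 2 Kb
```

With these numbers the table holds 64 samples per interval, 256 entries in total, and 2048 bits.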
[Figure: bilinear vs. nearest neighbor, 4x.]
Interpolation
Pre-filtering
Conventional scaling uses one hardwired set of rules for upscaling and another for downscaling.
Instead, we build any scaling operation as flexible pre-filtering + interpolation. This flexibility is required because there is no single best scaling algorithm for all kinds of images.
Programmable low-pass FIR filter (pre-filtering). Cut-off frequency given by the downscale factor.
Programmable continuous convolution (interpolation).
[Figure: up-scaling and down-scaling data paths; kernel examples: nearest neighbor, bicubic.]
[Figure: Interpolation, kernel sampling. The weights w-1, w0, w1, w2 applied to the neighboring pixels are read from a weight table W (rows W11..W14, W21..W24, ...) addressed by neighbor index.]
Programmable Interpolation Kernel
[Figure: pre-filter block diagrams built from integrators 1/(1 - z^-1), rate change R, and comb sections (1 - z^-1).]
Down-scaling by a factor of 1.5 after (a) moving average and (b) frequency-sharpened CIC filter, shown next to the original. Artifacts are circled and images resized to aid direct comparison.
Frequency response of three different pre-filtering
schemes. (a) & (b) are combined into one flexible
architecture.
(a) Nearest neighbor (b) Bilinear interpolation (c) B-spline order 3 (d) Keys' bicubic, a = -1/2
Interpolation by continuous convolution. Principles of operation.
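The four kernels named in panels (a) to (d) have standard closed forms, sketched below as plain functions of the distance x (in input pixels) from the interpolated position. The function names and the `taps4` helper are hypothetical; the helper uses neighbor offsets -1, 0, 1, 2 and shows that each kernel's weights sum to 1 for any sub-pixel offset t.

```python
def nearest(x):
    """(a) Nearest neighbor: box of width 1."""
    return 1.0 if -0.5 <= x < 0.5 else 0.0

def bilinear(x):
    """(b) Bilinear: triangle with support 2."""
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0

def bspline3(x):
    """(c) Cubic B-spline (order 3), support 4."""
    x = abs(x)
    if x < 1.0:
        return 2.0 / 3.0 - x * x + 0.5 * x ** 3
    if x < 2.0:
        return (2.0 - x) ** 3 / 6.0
    return 0.0

def keys_bicubic(x, a=-0.5):
    """(d) Keys' cubic convolution kernel, support 4."""
    x = abs(x)
    if x < 1.0:
        return (a + 2.0) * x ** 3 - (a + 3.0) * x ** 2 + 1.0
    if x < 2.0:
        return a * x ** 3 - 5.0 * a * x ** 2 + 8.0 * a * x - 4.0 * a
    return 0.0

def taps4(kernel, t):
    """Weights for neighbors -1, 0, 1, 2 at sub-pixel offset t in [0, 1)."""
    return [kernel(t - i) for i in (-1, 0, 1, 2)]

if __name__ == "__main__":
    for k in (nearest, bilinear, bspline3, keys_bicubic):
        print(k.__name__, taps4(k, 0.25), sum(taps4(k, 0.25)))
```

Kernels with a shorter support simply produce zero weights for the outer taps, which is one way a single 4-tap datapath could serve all four shapes.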

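[Figure: principle of operation. Input samples 1, 2, 3, ..., n and output samples 1, 2, ..., k; the output o[k] is formed from neighboring input samples weighted by w1 and w2.]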
The shape of the interpolation kernel is sampled at a given frequency. The data (weights) are stored LUT-wise in a memory.
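A minimal software sketch of this idea follows: the kernel is sampled into a table with one row of weights per sub-pixel phase, and resampling becomes a shift-variant FIR filter that picks a row for each output sample. The bilinear kernel, the table layout, the border clamping and all names here are assumptions chosen for illustration, not the poster's hardware design.

```python
import math

def bilinear(x):
    """Triangle kernel with support 2, used here only as an example shape."""
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0

def build_lut(kernel, taps, samples_per_interval):
    """Sample the kernel: one row of `taps` weights per sub-pixel phase."""
    first = -(taps // 2 - 1)                 # 0 for 2 taps, -1 for 4 taps
    lut = []
    for p in range(samples_per_interval):
        t = p / samples_per_interval         # sub-pixel offset of this phase
        lut.append([kernel(t - i) for i in range(first, first + taps)])
    return lut

def resample(x, out_len, scale, lut, taps):
    """Shift-variant FIR: the weight row changes with each output position."""
    first = -(taps // 2 - 1)
    out = []
    for k in range(out_len):
        pos = k / scale                      # output sample position on the input grid
        base = math.floor(pos)
        phase = int((pos - base) * len(lut)) # quantise the offset to a LUT row
        acc = 0.0
        for j, w in enumerate(lut[phase]):
            idx = min(max(base + first + j, 0), len(x) - 1)  # clamp at borders
            acc += w * x[idx]
        out.append(acc)
    return out

if __name__ == "__main__":
    lut = build_lut(bilinear, taps=2, samples_per_interval=64)
    print(resample([0.0, 1.0, 2.0, 3.0], out_len=7, scale=2.0, lut=lut, taps=2))
```

Swapping in a 4-tap kernel only changes the `kernel` and `taps` arguments; the resampling loop itself is unchanged, which mirrors the role of the programmable memory.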
. In down-scaling, the low-pass filter does not have to be applied to all the incoming pixels.
. Instead, only the base points for the interpolation are pre-computed to remove the aliasing frequencies.
. The number of equivalent serial low-pass filters must equal the kernel support.
