ACEEE International Journal on Signal and Image Processing Vol 1, No. 2, July 2010




FPGA Based Design of High Performance Decimator using DALUT Algorithm

Rajesh Mehra (1), Swapna Devi (2)
(1) National Institute of Technical Teachers’ Training & Research, Chandigarh, India. Email: rajeshmehra@yahoo.com
(2) National Institute of Technical Teachers’ Training & Research, Chandigarh, India. Email: swapna_devi_p@yahoo.co.in
© 2010 ACEEE
DOI: 01.ijsip.01.02.02

Abstract: This paper presents a multiplier less approach to implement a high speed and area efficient decimator for the down converter of Software Defined Radios. This technique substitutes multiply-and-accumulate (MAC) operations with look up table (LUT) accesses. The proposed decimator has been implemented using the partitioned distributed arithmetic look up table (DALUT) algorithm, taking optimal advantage of the embedded LUTs of the target FPGA device. This method enhances system performance in terms of speed and area. The proposed decimator uses a half band polyphase decomposition FIR structure. The decimator has been designed with Matlab 7.6, simulated with Modelsim 6.3XE simulator, synthesized with Xilinx Synthesis Tool (XST) 10.1 and implemented on a Spartan-3E based 3s500efg320-4 FPGA device. The proposed DALUT approach has shown an improvement of 24% in speed while saving almost 50% of the resources of the target device as compared to the MAC based approach.

Index Terms: ASIC, DALUT, FPGA, MAC, SDR

I. INTRODUCTION

The widespread use of digital representation of signals for transmission and storage has created challenges in the area of digital signal processing [1]. The applications of digital FIR filters and up/down sampling techniques are found everywhere in modern electronic products. For every electronic product, lower circuit complexity is always an important design target since it reduces the cost [2]. There are many applications where the sampling rate must be changed. Interpolators and decimators are utilized to increase or decrease the sampling rate. Up samplers and down samplers are used to change the sampling rate of a digital signal in multirate DSP systems. This rate conversion produces undesired signals associated with aliasing and imaging errors, so some kind of filter should be placed to attenuate these errors [3]-[5]. Today’s consumer electronics such as cellular phones and other multimedia and wireless devices often require digital signal processing (DSP) algorithms for several crucial operations [6] in order to increase speed and reduce area and power consumption. Due to a growing demand for such complex DSP applications, high performance, low-cost SoC implementations of DSP algorithms are receiving increased attention among researchers and design engineers. Although ASICs and DSP chips have been the traditional solution for high performance applications, the technology and the market demands are now looking for changes. On one hand, high development costs and time-to-market factors associated with ASICs can be prohibitive for certain applications while, on the other hand, programmable DSP processors can be unable to meet the desired performance due to their sequential-execution architecture [7]. In this context, embedded FPGAs offer a very attractive solution that balances high flexibility, time-to-market, cost and performance. Therefore, in this paper, a decimator is designed and implemented on an FPGA device. The output of an FIR filter may be expressed as:

$$ y = \sum_{k=1}^{K} C_k x_k \tag{1} $$

where C_1, C_2, ..., C_K are fixed coefficients and x_1, x_2, ..., x_K are the input data words. A typical digital implementation will require K multiply-and-accumulate (MAC) operations, which are expensive to compute in hardware due to logic complexity, area usage, and throughput. Alternatively, the MAC operations may be replaced by a series of look-up-table (LUT) accesses and summations. Such an implementation of the filter is known as distributed arithmetic (DA).

Digital signal processing with variable sampling rates can improve the flexibility of a software defined radio. It reduces the need for expensive anti-aliasing analog filters and enables processing of different types of signals with different sampling rates. It allows partitioning of the high-speed processing into multiple parallel lower speed processing tasks, which can lead to a significant saving in computational power and cost. Wideband receivers take advantage of multirate signal processing for efficient channelization, and it offers flexibility for symbol synchronization.

II. DECIMATORS

Typically lowpass filters are used to reduce the bandwidth of a signal prior to reducing the sampling rate. This is done to minimize aliasing due to the reduction in the sampling rate. The down sampler is the basic sampling rate alteration device used to decrease the sampling rate by an integer factor [8]. A down-sampler with a down-sampling factor M, where M is a positive integer, develops an output sequence y[n] with
a sampling rate that is (1/M)-th of that of the input sequence x[n]. The down sampler is shown in Figure1.

Figure1. Down Sampler

The down-sampling operation is implemented by keeping every Mth sample of x[n] and removing the M-1 in-between samples to generate y[n]. The input and output relation of the down sampler can be expressed as:

$$ y[n] = x[nM] \tag{2} $$

Applying the z-transform to the input-output relation of a factor-of-M down-sampler, we get

$$ Y(z) = \sum_{n=-\infty}^{\infty} x[Mn] z^{-n} \tag{3} $$

The expression on the right-hand side of Eq. (3) cannot be directly expressed in terms of X(z). To get around this problem, a new sequence x_int[n] is defined as:

$$ x_{int}[n] = \begin{cases} x[n], & n = 0, \pm M, \pm 2M, \ldots \\ 0, & \text{otherwise} \end{cases} \tag{4} $$

Then

$$ Y(z) = \sum_{n=-\infty}^{\infty} x[Mn] z^{-n} = \sum_{n=-\infty}^{\infty} x_{int}[Mn] z^{-n} = \sum_{k=-\infty}^{\infty} x_{int}[k] z^{-k/M} = X_{int}(z^{1/M}) \tag{5} $$

Now, x_int[n] can be formally related to x[n] as follows:

$$ x_{int}[n] = c[n] \cdot x[n] \tag{6} $$

where

$$ c[n] = \begin{cases} 1, & n = 0, \pm M, \pm 2M, \ldots \\ 0, & \text{otherwise} \end{cases} \tag{7} $$

A convenient representation of c[n] is given by

$$ c[n] = \frac{1}{M} \sum_{k=0}^{M-1} W_M^{kn} \tag{8} $$

where

$$ W_M = e^{-j2\pi/M} \tag{9} $$

Taking the z-transform of Eq. (6) and making use of Eq. (8), we get

$$ X_{int}(z) = \frac{1}{M} \sum_{n=-\infty}^{\infty} \left( \sum_{k=0}^{M-1} W_M^{kn} \right) x[n] z^{-n} \tag{10} $$

$$ = \frac{1}{M} \sum_{k=0}^{M-1} \left( \sum_{n=-\infty}^{\infty} x[n] W_M^{kn} z^{-n} \right) = \frac{1}{M} \sum_{k=0}^{M-1} X(z W_M^{-k}) \tag{11} $$

The spectrum of a factor-of-2 down-sampler with an input x[n] is shown in Fig2. The DTFTs of the output and the input sequences of this down-sampler are then related as

$$ Y(e^{j\omega}) = \frac{1}{2} \left\{ X(e^{j\omega/2}) + X(-e^{j\omega/2}) \right\} \tag{12} $$

The two terms overlap, due to which the original “shape” of X(e^{jω/2}) is lost when x[n] is down-sampled. This overlap causes the aliasing that takes place due to under-sampling. There is no overlap, i.e., no aliasing, only if

$$ X(e^{j\omega}) = 0 \quad \text{for } |\omega| \ge \pi/2 \tag{13} $$

In general, aliasing is absent if and only if

$$ X(e^{j\omega}) = 0 \quad \text{for } |\omega| \ge \pi/M \tag{14} $$

To overcome the effect of aliasing, decimation filters are used. The specification for the lowpass decimation filter is given by

$$ |H(e^{j\omega})| = \begin{cases} 1, & |\omega| \le \omega_c/M \\ 0, & \pi/M \le |\omega| \le \pi \end{cases} \tag{15} $$

III. DALUT ALGORITHM

The DALUT algorithm is an efficient method for computing inner products when one of the input vectors is fixed. It uses look-up tables and accumulators instead of multipliers for computing inner products and has been widely used in many DSP applications such as DFT, DCT, convolution, and digital filters. The example of direct DA inner-product generation is shown in Eq. (1), where x_k is an N-bit 2's-complement binary number scaled such that |x_k| < 1. We may express each x_k as

$$ x_k = -b_{k0} + \sum_{n=1}^{N-1} b_{kn} 2^{-n} \tag{16} $$

where the b_kn are the bits, 0 or 1, and b_k0 is the sign bit. Now combining Eq. (1) and (16) in order to express y in terms of the bits of x_k, we see

$$ y = \sum_{k=1}^{K} C_k \left[ -b_{k0} + \sum_{n=1}^{N-1} b_{kn} 2^{-n} \right] \tag{17} $$

Eq. (17) is the conventional form of expressing the inner product. Interchanging the order of the summations gives us:

$$ y = \sum_{n=1}^{N-1} \left[ \sum_{k=1}^{K} C_k b_{kn} \right] 2^{-n} + \sum_{k=1}^{K} C_k (-b_{k0}) \tag{18} $$

Eq. (18) shows a DA computation where the bracketed term is given by

$$ \sum_{k=1}^{K} C_k b_{kn} \tag{19} $$

Each b_kn can have values of 0 and 1, so Eq. (19) can have 2^K possible values. Rather than computing these values on line, we may pre-compute the values and store them in a ROM. The input data can be used to directly address the memory, and the addressed word is added to an accumulator. After N such cycles, the accumulator contains the result, y. As an example, let us consider K = 4, C1 = 0.45, C2 = -0.65, C3 = 0.15, and C4 = 0.55. The memory must contain all possible combinations (2^4 = 16 values) and their negatives in order to accommodate the term which occurs at the sign-bit time.
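The K = 4 example above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes the standard two's-complement form of DA (one LUT word accumulated per bit position, subtracted at the sign-bit time), and the function names, the 8-bit word length and the test inputs are hypothetical choices made for the sketch.

```python
# Sketch of bit-serial distributed arithmetic (DA) for y = sum_k C_k * x_k.
# Function names, word length and test inputs are illustrative assumptions.

C = [0.45, -0.65, 0.15, 0.55]  # coefficients of the K = 4 example in the text


def build_lut(coeffs):
    """Precompute sum_k C_k * b_k for every K-bit address (2^K words)."""
    k = len(coeffs)
    return [
        sum(c for i, c in enumerate(coeffs) if (address >> i) & 1)
        for address in range(2 ** k)
    ]


def to_bits(x, n_bits):
    """Two's-complement bits [b_0 (sign), b_1, ..., b_{N-1}] of x, -1 <= x < 1."""
    scaled = int(round(x * 2 ** (n_bits - 1)))
    scaled &= (1 << n_bits) - 1  # wrap negative values into two's complement
    return [(scaled >> (n_bits - 1 - n)) & 1 for n in range(n_bits)]


def da_inner_product(coeffs, xs, n_bits):
    """Accumulate one LUT word per bit position instead of multiplying."""
    lut = build_lut(coeffs)
    bits = [to_bits(x, n_bits) for x in xs]
    acc = 0.0
    for n in range(n_bits):
        # Bit n of every input word forms the K-bit LUT address.
        address = sum(bits[k][n] << k for k in range(len(coeffs)))
        if n == 0:
            acc -= lut[address]            # sign-bit time: subtract the word
        else:
            acc += lut[address] * 2.0 ** -n
    return acc


x = [0.5, -0.25, 0.125, -0.75]            # exactly representable in 8 bits
y_da = da_inner_product(C, x, n_bits=8)
y_mac = sum(c * v for c, v in zip(C, x))  # reference multiply-accumulate
```

Here `y_da` and `y_mac` agree to floating-point precision. The sketch subtracts the LUT word at the sign-bit time instead of storing the negated words, which is one common way to handle the sign term; a hardware ROM that stores the negatives, as the text describes, trades memory for a simpler adder.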



(20)

The structure that can be used to compute these equations is shown in Fig6. The term x_k may be written as

$$ x_k = \frac{1}{2} \left[ x_k - (-x_k) \right] \tag{21} $$

and in 2's-complement notation the negative of x_k may be written as

$$ -x_k = -\bar{b}_{k0} + \sum_{n=1}^{N-1} \bar{b}_{kn} 2^{-n} + 2^{-(N-1)} \tag{22} $$

where the overscore symbol indicates the complement of a bit. By substituting Eq. (16) and (22) into Eq. (21), we get

$$ x_k = \frac{1}{2} \left[ -(b_{k0} - \bar{b}_{k0}) + \sum_{n=1}^{N-1} (b_{kn} - \bar{b}_{kn}) 2^{-n} - 2^{-(N-1)} \right] \tag{23} $$

In order to simplify the notation later, it is convenient to define the new variables as

$$ a_{kn} = b_{kn} - \bar{b}_{kn} \quad \text{for } n \ne 0 \tag{24} $$

and

$$ a_{k0} = \bar{b}_{k0} - b_{k0} \tag{25} $$

where the possible values of the a_kn, including n = 0, are ±1. Then Eq. (23) may be written as

$$ x_k = \frac{1}{2} \left[ \sum_{n=0}^{N-1} a_{kn} 2^{-n} - 2^{-(N-1)} \right] \tag{26} $$

By substituting the value of x_k from Eq. (26) into Eq. (1), we obtain

$$ y = \frac{1}{2} \sum_{k=1}^{K} C_k \left[ \sum_{n=0}^{N-1} a_{kn} 2^{-n} - 2^{-(N-1)} \right] \tag{27} $$

Defining Q(b_n) = \sum_{k=1}^{K} (C_k/2) a_{kn} and Q(0) = -\sum_{k=1}^{K} C_k/2, this becomes

$$ y = \sum_{n=0}^{N-1} Q(b_n) 2^{-n} + 2^{-(N-1)} Q(0) \tag{28} $$

It may be seen that Q(b_n) has only 2^(K-1) possible amplitude values with a sign that is given by the instantaneous combination of bits. The computation of y is obtained by using a 2^(K-1)-word memory, a one-word initial condition register for Q(0), and a single parallel adder-subtractor with the necessary control-logic gates.

IV. PROPOSED DECIMATOR DESIGN

An equiripple based half band polyphase decimator is designed and implemented using Matlab [9]. The order of the proposed decimator filter is 66 (67 coefficients), with a transition width of 0.1 and 60 dB stop band attenuation; its magnitude response is shown in Figure2.

Figure2. Decimator Output (magnitude response in dB versus normalized frequency, ×π rad/sample)

Nyquist decimators provide the same stop band attenuation and transition width with a much lower order. An Lth-band Nyquist filter with L = 2 is called a half-band filter. The transfer function of a half-band filter is thus given by

$$ H(z) = \alpha + z^{-1} E_1(z^2) \tag{29} $$

with its impulse response satisfying

$$ h[2n] = \begin{cases} \alpha, & n = 0 \\ 0, & \text{otherwise} \end{cases} \tag{30} $$

Figure3. MAC based Multiplier Implementation

In half band filters about 50% of the coefficients of h[n] are zero. This reduces the hardware requirement of the proposed decimator significantly. The first decimator design is implemented by using the multiplier technique, where the 67 coefficients are processed by a MAC unit as shown in Figure3. The second decimator design replaces the MAC unit with a LUT unit, which is the proposed multiplier less technique, as shown in Figure4.

Figure4. LUT based Multiplier Less Implementation

All 67 coefficients are divided in two parts by using polyphase decomposition. The 2 branch polyphase decomposition of an FIR decimator is shown in Figure5 and can be expressed as:

$$ H(z) = E_0(z^2) + z^{-1} E_1(z^2) \tag{31} $$

Figure5. Polyphase Decomposition
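The two-branch decomposition H(z) = E0(z^2) + z^-1 E1(z^2) can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's design: the 7-tap half-band-like filter `h` and the input `x` are made-up values, and the helper names are hypothetical. It compares filtering at the input rate followed by down-sampling by 2 against the polyphase form, where each branch runs at the output rate.

```python
# Sketch: factor-of-2 decimation, direct form versus two-branch polyphase form.
# The filter taps and input samples are illustrative, not the paper's 67 taps.


def fir(h, x):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n - k], zero initial state."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def decimate_direct(h, x):
    """Filter at the input rate, then keep every second output sample."""
    return fir(h, x)[::2]


def decimate_polyphase(h, x):
    """E0 holds the even-indexed taps, E1 the odd-indexed taps of h."""
    e0, e1 = h[0::2], h[1::2]
    x_even = x[0::2]                          # x[2m]
    x_odd = ([0.0] + x[1::2])[:len(x_even)]   # x[2m - 1], the z^-1 branch
    y0 = fir(e0, x_even)
    y1 = fir(e1, x_odd)
    return [a + b for a, b in zip(y0, y1)]


# Half-band-like taps: every second tap away from the centre is zero.
h = [-0.05, 0.0, 0.3, 0.5, 0.3, 0.0, -0.05]
x = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0, 0.25, 1.5, -0.75, 2.5, 0.0]

y_direct = decimate_direct(h, x)
y_poly = decimate_polyphase(h, x)
```

Both lists hold the same values up to rounding. In this toy filter the odd-indexed branch `h[1::2]` is `[0.0, 0.5, 0.0]`, a single scaled delay, which illustrates why a half-band filter needs roughly half the arithmetic of a general FIR filter of the same length.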





Figure6. Computationally Efficient Structure

The proposed computationally efficient equivalent structure is shown in Figure6. In a DA realization of an FIR filter structure, a sequence of input data words of width W is fed through a parallel to serial shift register, producing a serialized stream of bits. The serialized data is then fed to a bit-wide shift register. This shift register serves as a delay line, storing the bit serial data samples. The delay line is tapped (based on the input word size W) to form a W-bit address that indexes into a lookup table (LUT). The LUT stores all possible sums of partial products over the filter coefficient space. The LUT is followed by a shift and adder (scaling accumulator) that adds the values obtained from the LUT sequentially. A table lookup is performed sequentially for each bit (in order of significance starting from the LSB). On each clock cycle, the LUT result is added to the accumulated and shifted result from the previous cycle. For the last bit (MSB), the lookup table result is subtracted, accounting for the sign of the operand. This basic form of DA is fully serial, operating on one bit at a time. If the input data sequence is W bits wide, then an FIR structure takes W clock cycles to compute the output. Symmetric and antisymmetric FIR structures are an exception, requiring W+1 cycles, because one additional clock cycle is needed to process the carry bit of the pre-adders.

The inherently bit serial nature of DA can limit throughput. To improve throughput, the basic DA algorithm can be modified to compute more than one bit sum at a time. The number of simultaneously computed bit sums is expressed as a power of two called the DA radix. For example, a DA radix of 2 (2^1) indicates that one bit sum is computed at a time; a DA radix of 4 (2^2) indicates that two bit sums are computed at a time, and so on. To compute more than one bit sum at a time, the LUT is replicated. For example, to perform DA on 2 bits at a time (radix 4), the odd bits are fed to one LUT and the even bits are simultaneously fed to an identical LUT. The LUT results corresponding to odd bits are left-shifted before they are added to the LUT results corresponding to even bits. This result is then fed into a scaling accumulator that shifts its feedback value by 2 places. Processing more than one bit at a time introduces a degree of parallelism into the operation, improving performance at the expense of area.

The size of the LUT grows exponentially with the order of the filter. For a filter with N coefficients, the LUT must have 2^N values. For higher order filters, LUT size must be reduced to reasonable levels. To reduce the size in this proposed work, we can subdivide the LUT into a number of LUTs, called LUT partitions. Each LUT partition operates on a different set of taps. The results obtained from the partitions are summed. For example, for a 160 tap filter, the LUT size is (2^160)*W bits, where W is the word size of the LUT data. Dividing this into 16 LUT partitions, each taking 10 inputs (taps), the total LUT size is reduced to 16*(2^10)*W bits, a significant reduction. So in this proposed design the 67 coefficients are divided into two sections with 34 and 33 coefficients respectively to perform polyphase decomposition. Then the 34 coefficients of one part have been processed by using (6 6 6 6 6 4) DALUT partitioning to limit the size of the LUTs. This multiplier less DALUT technique consists of input registers, a 4-input LUT unit and a shifter/accumulator unit.

V. IMPLEMENTATION RESULTS & DISCUSSION

The multiplier based and multiplier less decimators are implemented and synthesized on the Spartan-3E based 3s500efg320-4 target device. The Modelsim based simulated output of the proposed decimator with 16 bit precision is shown in Figure7.

Figure7. Simulated Decimator Output

Table1 shows the area and speed comparison of both techniques. The proposed DA based design shows a 24% enhancement in speed while saving almost 50% of the resources as compared to the MAC based design.

Table1. Resource Utilization

Logic Utilization    Multiplier Approach        Multiplier Less Approach
# of Slices          1055 out of 4656 (22%)     472 out of 4656 (10%)
# of Flip Flops      1210 out of 9312 (12%)     515 out of 9312 (5%)
# of LUTs            857 out of 9312 (9%)       590 out of 9312 (6%)
# of Multipliers     1 out of 20 (5%)           0 out of 20 (0%)
Speed (MHz)          49.574                     61.215

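The headline figures quoted for Table 1 can be checked with simple arithmetic. The sketch below recomputes the speed gain and the per-resource savings from the table values; the variable names are illustrative, and averaging the three savings is only one assumed reading of the "almost 50%" figure, not something the paper states.

```python
# Values copied from Table 1: (multiplier approach, multiplier less approach).
table1 = {
    "slices": (1055, 472),
    "flip_flops": (1210, 515),
    "luts": (857, 590),
}
speed_mhz = (49.574, 61.215)

# Relative speed gain of the multiplier less design over the MAC design.
speedup_pct = 100 * (speed_mhz[1] - speed_mhz[0]) / speed_mhz[0]

# Percentage of each resource saved by the multiplier less design.
savings_pct = {
    name: 100 * (mac - da) / mac for name, (mac, da) in table1.items()
}

# Assumption: "almost 50%" is read here as the mean saving over the three
# resource types listed above.
mean_saving_pct = sum(savings_pct.values()) / len(savings_pct)

print(f"speed gain: {speedup_pct:.1f}%")
for name, pct in savings_pct.items():
    print(f"{name}: {pct:.1f}% saved")
print(f"mean saving: {mean_saving_pct:.1f}%")
```

Under this reading, the speed gain comes out slightly under the rounded 24%, and the saving varies noticeably between slices, flip-flops and LUTs.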

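The radix-2 and radix-4 distributed arithmetic schemes described in this section can be sketched in software as follows. This is a minimal model under stated assumptions, not the synthesized design: inputs are signed integers of `width` bits rather than fractions with |x| < 1, the coefficients are the paper's K = 4 example scaled by 100, and the function names (`build_lut`, `bit_slice`, `da_radix2`, `da_radix4`) are hypothetical. The hardware scaling accumulator shifts its feedback value, whereas the sketch applies the equivalent weight to each LUT result directly.

```python
def build_lut(coeffs):
    """Precompute all 2**K sums of coefficient subsets (the DA look-up table)."""
    k = len(coeffs)
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << k)
    ]


def bit_slice(xs, n, width):
    """Build the LUT address from bit n of every input word (two's complement)."""
    mask = (1 << width) - 1
    addr = 0
    for i, x in enumerate(xs):
        addr |= (((x & mask) >> n) & 1) << i
    return addr


def da_radix2(coeffs, xs, width):
    """Bit-serial DA: one LUT access per input bit, LSB first."""
    lut = build_lut(coeffs)
    acc = 0
    for n in range(width):
        term = lut[bit_slice(xs, n, width)]
        if n == width - 1:
            acc -= term << n  # the sign bit carries negative weight
        else:
            acc += term << n
    return acc


def da_radix4(coeffs, xs, width):
    """Two bits per step: even and odd bits address identical LUTs."""
    assert width % 2 == 0, "sketch assumes an even word width"
    lut = build_lut(coeffs)  # one table stands in for the two identical LUTs
    acc = 0
    for n in range(0, width, 2):
        even = lut[bit_slice(xs, n, width)]
        odd = lut[bit_slice(xs, n + 1, width)]
        if n + 1 == width - 1:
            odd = -odd  # the odd bit of the last pair is the sign bit
        # The odd result is left-shifted by one place and added to the even
        # result; the pair weight then advances by 2 places per step.
        acc += (even + (odd << 1)) << n
    return acc


coeffs = [45, -65, 15, 55]   # 0.45, -0.65, 0.15, 0.55 scaled by 100
xs = [3, -7, 100, -128]      # arbitrary 8-bit signed test inputs
direct = sum(c * x for c, x in zip(coeffs, xs))
print(direct, da_radix2(coeffs, xs, 8), da_radix4(coeffs, xs, 8))
```

The radix-4 version needs half as many steps for the same word width, at the cost of a second LUT and an extra adder, which is the speed for area trade-off noted in the text.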


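The exponential growth of the LUT, and the effect of splitting it into partitions, can be illustrated with a short calculation. The sketch counts LUT words only (each word is W bits wide), using the 160-tap example and the (6 6 6 6 6 4) partitioning of 34 coefficients quoted in the paper; the helper name `lut_words` is hypothetical, and the adders needed to sum the partition outputs are not counted.

```python
def lut_words(partition_sizes):
    """Total number of LUT words when taps are split into the given partitions.

    A partition that covers t taps needs 2**t words.
    """
    return sum(1 << taps for taps in partition_sizes)


# 160-tap example from the paper: one monolithic LUT versus 16 partitions of 10.
full_160 = lut_words([160])
part_160 = lut_words([10] * 16)

# One 34-coefficient polyphase branch with the (6 6 6 6 6 4) partitioning.
full_34 = lut_words([34])
part_34 = lut_words([6, 6, 6, 6, 6, 4])

print(f"160 taps: 2^160 words unpartitioned, {part_160} words partitioned")
print(f"34 taps:  {full_34} words unpartitioned, {part_34} words partitioned")
```

Partitioning trades the single large table for several small ones plus an adder stage that sums the partition results, as the paper describes.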
   [Figure 8. Resource Comparison: bar chart of resource usage (vertical
   axis 0 to 1400) for the Multiplier and Multiplier Less approaches]

   The resource comparison of the multiplier and multiplier less techniques
is shown in Figure 8. The multiplier approach consumes 9-22% of the
device's slices, flip-flops and LUTs, compared to 5-10% for the multiplier
less approach, owing to efficient LUT partitioning by the proposed DALUT
algorithm.

                     CONCLUSION

   In this paper, an optimized half band polyphase decomposition technique
has been presented to implement the decimator for wireless applications.
The DA algorithm has been used to further enhance the speed and area
utilization of the proposed design by taking optimal advantage of the look
up table structure of the target FPGA. The proposed multiplier less
approach has shown an improvement of 24% in speed while saving almost 50%
of the resources of the target device as compared to the multiplier based
approach. The proposed design is therefore a cost effective solution for
the down converter section of Software Defined Radios.

                     REFERENCES

[1] Vijay Sundararajan, Keshab K. Parhi, “Synthesis of Minimum-Area Folded
Architectures for Rectangular Multidimensional”, IEEE Transactions on
Signal Processing, Vol. 51, No. 7, pp. 1954-1965, July 2003.
[2] ShyhJye Jou, Kai-Yuan Jheng, Hsiao-Yun Chen and An-Yeu Wu,
“Multiplierless Multirate Decimator/Interpolator Module Generator”, IEEE
Asia-Pacific Conference on Advanced System Integrated Circuits, pp. 58-61,
Aug. 2004.
[3] Amir Beygi, Ali Mohammadi, Adib Abrishamifar, “An FPGA-Based Irrational
Decimator for Digital Receivers”, in 9th IEEE International Symposium on
Signal Processing and its Applications (ISSPA), pp. 1-4, 2007.
[4] Zhao Yiqiang, Xing Dongyang, Zhao Hongliang, “Optimized Design of
Digital Filter in Sigma-Delta A/D Converter”, International Conference on
Neural Networks and Signal Processing, pp. 502-505, 2008.
[5] S.B. Nerurkar, K.H. Abed, “Low-Power Decimator Design Using
Approximated Linear-Phase N-Band IIR Filter”, IEEE Trans. on Signal
Processing, Vol. 54, pp. 1550-1553, 2006.
[6] D.J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. Anderson, “A Novel
High Performance Distributed Arithmetic Adaptive Filter Implementation on
an FPGA”, in Proc. IEEE Int. Conference on Acoustics, Speech, and Signal
Processing (ICASSP’04), Vol. 5, pp. 161-164, 2004.
[7] Patrick Longa and Ali Miri, “Area-Efficient FIR Filter Design on FPGAs
using Distributed Arithmetic”, IEEE International Symposium on Signal
Processing and Information Technology, pp. 248-252, 2006.
[8] S.K. Mitra, Digital Signal Processing, Tata McGraw-Hill, Third Edition,
2006.
[9] Mathworks, “Users Guide Filter Design Toolbox”, March 2007.




FPGA Based Design of High Performance Decimator using DALUT Algorithm

  • 1. ACEEE International Journal on Signal and Image Processing Vol 1, No. 2, July 2010 FPGA Based Design of High Performance Decimator using DALUT Algorithm Rajesh Mehra1, Swapna Devi2 1 National Institute of Technical Teachers’ Training & Research, Chandigarh, India Email: rajeshmehra@yahoo.com 2 National Institute of Technical Teachers’ Training & Research, Chandigarh, India Email: swapna_devi_p@yahoo.co.in Abstract—this paper presents a multiplier less approach ASICs and DSP chips have been the traditional solution to implement high speed and area efficient decimator for for high performance applications, now the technology down converter of Software Defined Radios. This and the market demands are looking for changes.On technique substitutes multiply-and-accumulate (MAC) one hand, high development costs and time-to-market operations with look up table (LUT) accesses. Proposed factors associated with ASICs can be prohibitive for decimator has been implemented using Partitioned distributed arithmetic look up table (DALUT) algorithm certain applications while, on the other hand, by taking optimal advantage of embedded LUTs of target programmable DSP processors can be unable to meet FPGA device. This method is useful to enhance the system desired performance due to their sequential-execution performance in terms of speed and area. The proposed architecture [7]. In this context, embedded FPGAs offer decimator has used half band polyphase decomposition a very attractive solution that balance high flexibility, FIR structure. The decimator has been designed with time-to-market, cost and performance. Therefore, in Matlab 7.6, simulated with Modelsim 6.3XE simulator, this paper, a decimator is designed and implemented on synthesized with Xilinx Synthesis Tool (XST) 10.1 and FPGA device. An impulse response of an FIR filter implemented on Spartan-3E based 3s500efg320-4 FPGA K device. 
The proposed DALUT approach has shown an may be expressed as: Y =¥ k Ck x (1) k=1 improvement of 24% in speed by saving almost 50% where C1,C2…….CK are fixed coefficients and the x 1, resources of target device as compared to MAC based x2……… xK are the input data words. A typical digital approach. implementation will require K multiply-and-accumulate Index Terms— ASIC, DALUT, FPGA, MAC, SDR (MAC) operations, which are expensive to compute in hardware due to logic complexity, area usage, and I. INTRODUCTION throughput. Alternatively, the MAC operations may be replaced by a series of look-up-table (LUT) accesses The widespread use of digital representation of and summations. Such an implementation of the filter signals for transmission and storage has created is known as distributed arithmetic (DA). challenges in the area of digital signal processing [1]. The digital signal processing application by using The applications of digital FIR filter and up/down variable sampling rates can improve the flexibility of a sampling techniques are found everywhere in modem software defined radio. It reduces the need for electronic products. For every electronic product, lower expensive anti-aliasing analog filters and enables circuit complexity is always an important design target processing of different types of signals with different since it reduces the cost [2]. There are many sampling rates. It allows partitioning of the high-speed applications where the sampling rate must be changed. processing into parallel multiple lower speed Interpolators and decimators are utilized to increase or processing tasks which can lead to a significant saving decrease the sampling rate. Up sampler and down in computational power and cost. Wideband receivers sampler are used to change the sampling rate of digital take advantage of multirate signal processing for signal in multi rate DSP systems. 
This rate conversion efficient channelization and offers flexibility for requirement leads to production of undesired signals symbol synchronization. associated with aliasing and imaging errors. So some kind of filter should be placed to attenuate these errors II. DECIMATORS [3]-[5].Today’s consumer electronics such as cellular phones and other multi-media and wireless devices Typically lowpass filters are used to reduce the often require digital signal processing (DSP) algorithms bandwidth of a signal prior to reducing the sampling for several crucial operations[6] in order to increase rate. This is done to minimize aliasing due to the speed, reduce area and power consumption. Due to a reduction in the sampling rate. Down sampler is basic growing demand for such complex DSP applications, sampling rate alteration device used to decrease the high performance, low-cost Soc implementations of sampling rate by an integer factor [8]. An down- DSP algorithms are receiving increased attention sampler with a down-sampling factor M, where M is a among researchers and design engineers. Although positive integer, develops an output sequence y[n] with 9 © 2010 ACEEE DOI: 01.ijsip.01.02.02
  • 2. ACEEE International Journal on Signal and Image Processing Vol 1, No. 2, July 2010 a sampling rate that is (1/M)-th of that of the input Ye jω = 1{X e jω /2 +X −e jω/2  } sequence x[n]. The down sampler is shown in Figure1. 2 (12) The two terms have an overlap due to which original “shape” of X(ejω/2) is lost when x[n] is down-sampled. This overlap causes the aliasing that takes place due to under-sampling. There is no overlap, i.e., no aliasing, Figure1. Down Sampler only if Down-sampling operation is implemented by jω X  e =0 for ∣ω∣≥π /2 (13) keeping every Mth sample of x[n] and removing M-1 in-between samples to generate y[n]. The input and In general, Aliasing is absent if and only if output relation of down sampler can be expressed as: X e jω =0 for ∣ω∣≥π / M y[n] = x[nM] (2) (14) Applying the z-transform to the input-output relation To overcome the effect of aliasing decimation filters of a factor-of-M down-sampler, we get are used. The specifications for the lowpass decimation ∞ filter is given by (3) { } −n 1, ∣ω∣≤ω / M Y  z= ∑ x [ Mn] z ∣H  e jω ∣= c n=−∞ 0, π / M ≤∣ω∣≤π (15) The expression on the right-hand side of Eq (3) cannot be directly expressed in terms of X(z). To get around this problem, a new sequence x int [n] can be III. DALUT ALGORITHM expressed as: DALUT algorithm is an efficient method for x int 0, { [ n]= x [n ], n= 0, ± M, ±2M ,  otherwise } (4) computing inner products when one of the input vectors is fixed. It uses look-up tables and accumulators instead Then ∞ ∞ of multipliers for computing inner products and has −n −n been widely used in many DSP applications such as Y  z= ∑ x [ Mn] z = ∑ x int [ Mn] z n=−∞ n=−∞ DFT, DCT, convolution, and digital filters. The ∞ example of direct DA inner-product generation is −k / M 1/ M shown in Eq. (1) where xk is a 2's-complement binary = ∑ x int [k] z =X int z  (5) k=−∞ number scaled such that |xk| < 1. 
We may express each xk as Now, xint [n] can be formally related to x[n] as follows: (16) x int [ n ]=c [n ]⋅x [ n ] (6) where the bkn are the bits, 0 or 1, bk0 is the sign bit. Where Now combining Eq. (1) and (16) in order to express y in terms of the bits of xk ; we see c [ n]= 1, 0, { n= 0, ± M, ±2M ,  otherwise } (7) (17) A convenient representation of c[n] is given by The above Eq.(17) is the conventional form of M −1 1 kn expressing the inner product. Interchanging the order of c [ n]= ∑ W (8) M k= 0 M the summations, gives us: Where W M =e− j2π /M (18) (9) Eq.(18) shows a DA computation where the bracketed Taking the z-transform of Eq.(6) and by making use of term is given by Eq.(8), we get   ∞ M −1 1 W kn x [n ] z−n (19) X int  z = M ∑ ∑ M n=−∞ k= 0 Each bkn can have values of 0 and 1 so Eq.(19) can (10) have 2K possible values. Rather than computing these   M −1 ∞ values on line, we may pre-compute the values and ¿1 ∑ ∑ x [ n ] W size6kn z −n M store them in a ROM. The input data can be used to M k= 0 n=−∞ M −1 directly address the memory and the result. After N 1 M ∑ X z W −k k= 0 M   such cycles, the memory contains the result, y. As an example, let us consider K = 4, C1 = 0.45, C2 = -0.65, (11) C3 = 0.15, and C4 = 0.55. The memory must contain all The spectrum of a factor-of-2 down-sampler with an possible combinations (24 = 16 values) and their input x[n] is shown in Fig2. The DTFTs of the output negatives in order to accommodate the term which and the input sequences of this down-sampler are then occurs at the sign-bit time. related as 10 © 2010 ACEEE DOI: 01.ijsip.01.02.02
  • 3. ACEEE International Journal on Signal and Image Processing Vol 1, No. 2, July 2010 (20) Nyquist decimators provide same stop band attenuation and transition width with a much lower The structure that can be used to compute these order. An Lth-band Nyquist filter with L = 2 is called a equations is shown in Fig6. The term xk may be written half-band filter. The transfer function of a half-band as filter is thus given by 1 −1 2 (29) xk = [ xk − ( −xk )] (21) H  z =α+z E z  2 1 with its impulse response satisfying and in 2's-complement notation the negative of xk may n= 0 be written as {} h[ 2n ]= α, 0, otherwise (30) (22) where the over score symbol indicates the complement of a bit. By substituting Eq.(16) & (21) into Eq.(22), we (23) In order to simplify the notation later, it is convenient to define the new variables as − akn = bkn − bkn for n=0 (24) and − ak 0 = b k 0 − b k 0 (25) Figure3. MAC based Multiplier Implementation where the possible values of the akn , including n=0, are 1. Then Eq.(23) may be written as In Half band filters about 50% of the coefficients of (26) h[n] are zero. This reduces the hardware requirement of the proposed decimator significantly. The first By substituting the value of xk from Eq.(26) into Eq. decimator design is implemented by using multiplier (1), we obtain technique where 67 coefficients are processed MAC unit as shown in Figure3. The second decimator design (27) replaces MAC unit with LUT unit which is proposed multiplier less technique as shown in Figure4. (28) It may be seen that Q(bn) has only 2(K-1) possible amplitude values with a sign that is given by the instantaneous combination of bits. The computation of y is obtained by using a 2(K-1) word memory, a one-word initial condition register for Q(O) , and a single parallel adder sub tractor with the necessary control-logic gates. Figure4. LUT based Multiplier Less Implementation IV. 
PROPOSED DECIMATOR DESIGN All 67 coefficients are divided in two parts by using Equiripple based half band polyphase decimator is polyphase decomposition. The 2 branch polyphase designed and implemented using Matlab [9]. The decomposition of an FIR decimator is shown in Figure5 length of the proposed decimator filter is 66 with 0.1 and can be expressed as: transition widths 60 dB stop band attenuation whose (31) H  z =E  z 2 +z−1 E  z 2  output is shown Figure2. 0 1 Ma gn itude Res ponse (dB ) 0 -10 -20 -30 Magnitude (dB) -40 -50 -60 -70 0 0.1 0 .2 0.3 0.4 0.5 0 .6 0.7 0.8 0.9 Figure5. Polyphase Decomposition Norma liz ed Freque nc y ( × π rad/sa mp le ) Figure2. Decimator Output 11 © 2010 ACEEE DOI: 01.ijsip.01.02.02
  • 4. ACEEE International Journal on Signal and Image Processing Vol 1, No. 2, July 2010 reduce the size in this proposed work, we can subdivide the LUT into a number of LUTs, called LUT partitions. Each LUT partition operates on a different set of taps. The results obtained from the partitions are summed. For example, for a 160 tap filter, the LUT size is Figure6. Computationally Efficient Structure (2^160)*W bits, where W is the word size of the LUT data. Dividing this into 16 LUT partitions, each taking The proposed computationally efficient equivalent 10 inputs (taps), the total LUT size is reduced to structure is shown in Figure6. In a DA realization of a 16*(2^10)*W bits, a significant reduction. So in this FIR filter structure, a sequence of input data words of proposed design 67 coefficients are divided into two width W is fed through a parallel to serial shift register, sections with 34 and 33 coefficients respectively to producing a serialized stream of bits. The serialized perform polyphase decomposition. Then 34 coefficients data is then fed to a bit-wide shift register. This shift of one part have been processed by using (6 6 6 6 6 4) register serves as a delay line, storing the bit serial data DALUT partitioning to limit the size of LUTs. This samples. The delay line is tapped (based on the input multiplier less DALUT technique consists of input word size W), to form a W-bit address that indexes into registers, 4-input LUT unit and shifter/accumulator a lookup table (LUT). The LUT stores all possible unit. sums of partial products over the filter coefficients space. The LUT is followed by a shift and adder V. IMPLEMENTATION RESULTS & DISCUSSION (scaling accumulator) that adds the values obtained from the LUT sequentially. A lookup table is The multiplier based and multiplier less decimators performed sequentially for each bit (in order of are implemented and synthesized on Spartan-3E based significance starting from the LSB). 
On each clock 3s500efg320-4 target device. The modelsim based cycle, the LUT result is added to the accumulated and simulated output of the proposed decimator with 16 bit shifted result from the previous cycle. For the last bit precision is shown in Figure7. (MSB), the lookup table result is subtracted, accounting for the sign of the operand. This basic form of DA is fully serial, operating on one bit at a time. If the input data sequence is W bits wide, then a FIR structure takes W clock cycles to compute the output. Symmetric and asymmetric FIR structures are an exception, requiring W+ 1 cycle, because one additional clock cycle is needed to process the carry bit of the pre-adders. The inherently bit serial nature of DA can limit throughput. To improve throughput, the basic DA algorithm can be modified to compute more than one Figure7. Simulated Decimator Output bit sum at a time. The number of simultaneously computed bit sums is expressed as a power of two Table1 show the area, and speed comparison of both called the DA radix. For example, a DA radix of 2 techniques. The proposed DA based design shows 24% (2^1) indicates that one bit sum is computed at a time; a enhancement in speed by saving almost 50% of the DA radix of 4 (2^2) indicates that two bit sums are resources as compared to MAC based design. computed at a time, and so on. To compute more than one bit sum at a time, the LUT is replicated. For Table1. Resource Utilization example, to perform DA on 2 bits at a time (radix 4), Logic Multiplier Approach Multiplier Less the odd bits are fed to one LUT and the even bits are Utilization Approach simultaneously fed to an identical LUT. The LUT # of Slices 1055 out of 4656 472 out of 4656 results corresponding to odd bits are left-shifted before (22%) (10%) # of Flip Flops 1210 out of 9312 515 out of 9312 they are added to the LUT results corresponding to (12%) (5%) even bits. 
This result is then fed into a scaling # of LUTs 857 out of 9312 (9%) 590 out of 9312 accumulator that shifts its feedback value by 2 places. (6%) Processing more than one bit at a time introduces a # of Multipliers 1 out of 20 0 out of 20 (5%) (0%) degree of parallelism into the operation, improving Speed (MHz) 49.574 61.215 performance at the expense of area. The size of the LUT grows exponentially with the order of the filter. For a filter with N coefficients, the LUT must have 2^N values. For higher order filters, LUT size must be reduced to reasonable levels. To 12 © 2010 ACEEE DOI: 01.ijsip.01.02.02
  • 5. ACEEE International Journal on Signal and Image Processing Vol 1, No. 2, July 2010 1400 REFERENCES 1200 1000 [1] Vijay Sundararajan, Keshab K. Parhi, “Synthesis of 800 600 Minimum-Area Folded Architectures for Rectangular 400 Multiplier Multidimensional”, IEEE TRANSACTIONS ON SIGNAL 200 Multiplier Less PROCESSING, pp. 1954-1965, VOL. 51, NO. 7, JULY 0 2003. [2] ShyhJye Jou, Kai-Yuan Jheng*, Hsiao-Yun Chen and An- Yeu Wu, “Multiplierless Multirate Decimator I Interpolator Module Generator”, IEEE Asia-Pacific Conference on Advanced System Integrated Circuits, pp. 58-61, Aug-2004. Figure8. Resource Comparison [3] Amir Beygi, Ali Mohammadi, Adib Abrishamifar. “AN FPGA-BASED IRRATIONAL DECIMATOR FOR The resource comparison of both multiplier and DIGITAL RECEIVERS” in 9th IEEE International multiplier less techniques have been shown in Figure8. Symposium on Signal Processing and its Applications, pp. 1- The multiplier approach has consumed 9-22 % 4, ISSPA-2007. resources as compared to 5-10% in case of multiplier [4] Zhao Yiqiang; Xing Dongyang; Zhao Hongliang; less approach in due to efficient LUT partitioning by “Optimized Design of Digital Filter in Sigma-Delta AID using proposed DALUT algorithm. Converter”, International Conference on Neural Networks and Signal Processing, pp. 502 – 505, 2008. [5] Nerurkar, S.B.; Abed, K.H.; “Low-Power Decimator CONCLUSION Design Using Approximated Linear-Phase N-Band IIR In this paper, an optimized half band polyphase Filter”, IEEE Trans. on signal processing, vol. 54 , pp. 1550 – decomposition technique has been presented to 1553,2006. implement the decimator for wireless applications. DA [6] D.J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. Anderson, “A Novel High Performance Distributed algorithm has been used to further enhance the speed Arithmetic Adaptive Filter Implementation on an FPGA”, in and area utilization of proposed design by taking an Proc. IEEE Int. 
Conference on Acoustics, Speech, and Signal optimal advantage of look up table structure of target Processing (ICASSP’04), Vol. 5, pp. 161-164, 2004 FPGA. The proposed multiplier approach has shown an [7] Patrick Longa and Ali Miri “Area-Efficient FIR Filter improvement of 24% in speed by saving almost 50% Design on FPGAs using Distributed Arithmetic”, pp248-252 resources of target device as compared to multiplier IEEE International Symposium on Signal Processing and based approach. So proposed design is optimal one to Information Technology,2006. provide cost effective solution for down converter [8] S K Mitra, Digital Signal Processing, Tata Mc Graw Hill, Third Edition, 2006. section of Software Defined Radios [9] Mathworks, “Users Guide Filter Design Toolbox”, March-2007. 13 © 2010 ACEEE DOI: 01.ijsip.01.02.02