Implementation and validation of multiplier less fpga based digital filter

International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN
0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 4, Issue 2, March – April (2013), © IAEME
IMPLEMENTATION AND VALIDATION OF MULTIPLIER LESS
FPGA BASED DIGITAL FILTER
Jaya Koshta (1), Vineeta Saxena (Nigam) (2), Rakesh K. Arya (3)
(1) Electronics & Communication Department, BIRTS, Bhopal, India
(2) Electronics & Communication Department, UIT-RGPV, Bhopal, India
(3) Senior resource scientist, MPCST, Bhopal, India
ABSTRACT
Finite impulse response (FIR) filters are commonly used in digital signal
processing applications and have traditionally been implemented using ASICs or DSP
processors. Field Programmable Gate Array (FPGA) technology is now widely used in
digital signal processing because an FPGA-based solution can achieve high speed through
its parallel structure and configurable logic, which also provides flexibility and
reliability during design and later maintenance. However, the limited resources on an
FPGA (logic blocks and flip-flops) and its high routing delays require compact circuit
implementations. Hence, the FIR filter is implemented using the distributed arithmetic
(DA) technique, which uses a look-up table with offset binary coding. This paper describes
an FPGA-based approach to implementing an FIR filter using distributed arithmetic. The
experimental results show that a low-pass FIR filter implemented using DA with offset
binary coding uses fewer FPGA resources than the same filter implemented using the
conventional multiply-and-accumulate (MAC) technique. The advantages of the FPGA
approach to FIR filter implementation include higher sampling rates than are available
from traditional DSP chips, lower cost than an ASIC for moderate-volume applications, and
more flexibility than the alternative approaches.
Keywords – Binary offset coding, distributed arithmetic, FIR filter, FPGA, look-up table
Volume 4, Issue 2, March – April, 2013, pp. 348-356
© IAEME: www.iaeme.com/ijecet.asp

1. INTRODUCTION
Digital filters are generally divided into two categories: Finite Impulse
Response (FIR) and Infinite Impulse Response (IIR). FIR filters are widely applied
in digital signal processing because they provide linear phase and guaranteed
stability. Compared to IIR filters, FIR filters have simple and regular structures
that are easy to implement in hardware. However, FIR filters require more
taps than IIR filters to meet the same frequency specification. FIR filter
implementation on an FPGA requires special attention because area, power, and speed
constraints have to be satisfied. A number of filter architectures for FPGA
implementation exist. Of these, the Distributed Arithmetic (DA) architecture yields a
better trade-off among area, power, and speed.
A discrete-time linear finite impulse response (FIR) filter generates the output y[n]
as a sum of delayed and scaled input samples x[n] via the equation
$$ y = \sum_{k=0}^{N-1} w_k x_k \qquad (1) $$
A typical digital implementation requires N multiply-and-accumulate (MAC)
operations, which are expensive to compute in hardware in terms of logic complexity,
area usage, and throughput. Alternatively, the MAC operations may be replaced by
a series of look-up-table (LUT) accesses and summations. Such an implementation
of the filter, known as distributed arithmetic (DA), achieves higher throughput and
lower logic complexity at the cost of increased memory usage. Recent advances in
memory design technology have resulted in shrinking memory sizes, making this trade-off
an attractive option. DA has emerged as a very efficient solution that is
especially suited to LUT-based FPGA architectures. This technique, first proposed by
Croisier et al. [1], is a multiplier-less architecture based on an efficient partition
of the function into partial terms using the 2's complement binary representation of the
data. The partial terms can be precomputed and stored in LUTs. The flexibility of this
algorithm on FPGAs permits everything from bit-serial implementations to pipelined or
fully parallel versions of the scheme, which can greatly improve design performance. The
main problem with DA is that the required memory/LUT capacity increases
exponentially with the order of the filter, since a DA implementation needs $2^K$ words
(K being the number of taps of the filter). This is a first obstacle for FIR filters of
high order.
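As a point of reference for the MAC form of (1) and for the exponential growth of the DA table, a minimal Python sketch is given below. The function names and the sample values are illustrative assumptions and do not come from the implementation reported in this paper.

```python
def fir_mac(weights, samples):
    """Direct form of Eq. (1): one multiply-accumulate per tap."""
    assert len(weights) == len(samples)
    acc = 0
    for w, x in zip(weights, samples):
        acc += w * x  # one MAC operation per tap
    return acc


def da_lut_words(taps):
    """Words needed by an unpartitioned DA look-up table with the given tap count."""
    return 2 ** taps
```

For example, `da_lut_words(8)` gives 256 words while `da_lut_words(16)` already gives 65536, which is why the table size has to be controlled for longer filters.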
In this paper, the FIR filter is implemented using distributed arithmetic with offset
binary coding, so that the memory size is reduced by a factor of 2, to $2^{K-1}$ words.
The FPGA resource utilization of the FIR filter implemented using DA with offset binary
coding is also compared with that of the FIR filter implemented using the conventional
MAC technique.
2. DISTRIBUTED ARITHMETIC
Distributed arithmetic (DA) is a bit-serial operation that computes the inner
product of two vectors (one of which is a constant) in parallel. DA eliminates the need for
multiply operations by using look-up tables (LUTs). The right balance among the bit-serial,
pipelined, and fully parallel versions is tied to the specifications of a given application,
and basically depends on the requirements in terms of hardware cost and throughput. In each
case, the designer has to trade bandwidth for area.
Conventional Distributed arithmetic
Consider a discrete N-tap FIR filter with constant coefficients $w_k$ and input samples
coded as B-bit two's complement numbers with only the sign bit to the left of the binary
point, where $x_{kj}$ denotes bit j of $x(n-k)$:
$$ x(n-k) = -x_{k0} + \sum_{j=1}^{B-1} x_{kj}\, 2^{-j} \qquad (2) $$
Using (1) to compute the FIR output gives
$$ y(n) = -\sum_{k=0}^{N-1} w_k x_{k0} + \sum_{j=1}^{B-1} \left[ \sum_{k=0}^{N-1} w_k x_{kj} \right] 2^{-j} \qquad (3) $$
With $C_j = \sum_{k=0}^{N-1} w_k x_{kj}$ for $j = 1$ to $B-1$ and
$C_0 = -\sum_{k=0}^{N-1} w_k x_{k0}$, equation (3) can be rewritten as
$$ y(n) = \sum_{j=0}^{B-1} C_j\, 2^{-j} \qquad (4) $$
Since the term $C_j$ depends only on the $x_{k,j}$ values and has only $2^N$
possible values, it is possible to precompute them and store them in a look-up table or
in read-only memory [2],[3]. An input set of N bits $(x_{0,j}, x_{1,j}, \ldots, x_{N-1,j})$
is used as an address to retrieve the corresponding $C_j$ value. These intermediate
results are accumulated over B clock cycles to produce one y value. This leads to a
multiplier-free realization of the FIR filter.
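The look-up and shift-accumulate procedure described above can be sketched in Python as follows. This is a behavioural model under stated assumptions, not the hardware design of this paper: samples are taken as B-bit two's complement integers (the fractional values of (2) scaled by $2^{B-1}$), bits are processed from the least significant bit, and bit k of the address selects $w_k$. The names `build_da_lut` and `da_fir_output` are illustrative.

```python
def build_da_lut(weights):
    """Precompute all 2**N partial sums; bit k of the address selects w_k."""
    n = len(weights)
    return [sum(w for k, w in enumerate(weights) if (addr >> k) & 1)
            for addr in range(2 ** n)]


def da_fir_output(weights, samples, bits):
    """One filter output using B table look-ups and shift-adds, no multiplier.

    samples are integers in [-2**(bits-1), 2**(bits-1)).
    """
    lut = build_da_lut(weights)
    mask = (1 << bits) - 1
    words = [x & mask for x in samples]      # two's complement bit patterns
    acc = 0
    for b in range(bits):                    # b = 0 is the least significant bit
        addr = 0
        for k, word in enumerate(words):     # gather bit b of every sample
            addr |= ((word >> b) & 1) << k
        term = lut[addr] << b                # weight the partial sum by 2**b
        if b == bits - 1:
            acc -= term                      # the sign bit has negative weight
        else:
            acc += term
    return acc
```

With integer coefficients, `da_fir_output(weights, samples, bits)` returns the same value as the direct sum of products `sum(w * x for w, x in zip(weights, samples))`.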
Table I shows the contents of the look-up table for N = 4. Fig. 1 shows a typical
architecture for an FIR filter using conventional distributed arithmetic. The
shift-accumulator is a bit-parallel carry-propagate adder that adds the LUT content to the
previous accumulated result. The inverter and the MUX are used for inverting the output
of the LUT in order to compute $C_{B-1}$; the control signal S is 1 when j = B-1 and 0
otherwise. The computation runs from j = 0 to j = B-1 and the result is available in
bit-parallel format after B clock cycles. Note that in this hardware description j counts
clock cycles starting from the least significant bit, so the sign-bit term (written $C_0$
in (4)) is the one produced in the last cycle. This approach corresponds to bit-serial
arithmetic. However, the main problem with DA is that the required memory/LUT capacity
increases exponentially with the order of the filter, since a DA implementation needs
$2^K$ words (K being the number of taps of the filter). This is a first obstacle for
FIR filters of high order. Therefore offset binary coding is introduced, which reduces the
LUT size by a factor of 2, to $2^{N-1}$ words.

Table I
Content of LUT (N = 4)
x0,j  x1,j  x2,j  x3,j   Content of LUT
0     0     0     0      0
0     0     0     1      w3
0     0     1     0      w2
0     0     1     1      w2 + w3
0     1     0     0      w1
0     1     0     1      w1 + w3
0     1     1     0      w1 + w2
0     1     1     1      w1 + w2 + w3
1     0     0     0      w0
1     0     0     1      w0 + w3
1     0     1     0      w0 + w2
1     0     1     1      w0 + w2 + w3
1     1     0     0      w0 + w1
1     1     0     1      w0 + w1 + w3
1     1     1     0      w0 + w1 + w2
1     1     1     1      w0 + w1 + w2 + w3
Fig. 1 Implementation of conventional distributed arithmetic FIR filter
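The rule behind the conventional LUT, that each entry is the sum of the coefficients whose address bit is 1, can be checked with a short Python sketch. The helper name `lut_rows` is an illustrative assumption; the column order follows the table, with x0,j as the leftmost address bit.

```python
from itertools import product


def lut_rows(n_taps):
    """List (address bits, symbolic content) for a conventional DA LUT."""
    rows = []
    for bits in product((0, 1), repeat=n_taps):   # x0 is the leftmost column
        terms = [f"w{k}" for k, bit in enumerate(bits) if bit]
        rows.append((bits, " + ".join(terms) if terms else "0"))
    return rows
```

Calling `lut_rows(4)` yields sixteen rows; for instance, the address (1, 0, 0, 1) selects w0 and w3.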
3. SUGGESTED METHODOLOGY FOR FIR FILTER IMPLEMENTATION
In the suggested methodology for FIR filter implementation, offset binary coding (OBC)
is used with distributed arithmetic to reduce the look-up table size by a factor of 2,
to $2^{N-1}$ words. Look-up table partitioning can additionally be used to increase the
speed of the FIR filter.
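Equation (2) can be written as:
$$ x(n-k) = \frac{1}{2}\left\{ x(n-k) - [-x(n-k)] \right\} \qquad (5) $$
In two's complement, with $\bar{x}_{kj}$ denoting the complement of bit $x_{kj}$, the negated sample is
$$ -x(n-k) = -\bar{x}_{k0} + \sum_{j=1}^{B-1} \bar{x}_{kj}\, 2^{-j} + 2^{-(B-1)} \qquad (6) $$
Substituting (2) and (6) into (5),
$$ x(n-k) = \frac{1}{2}\left[ -(x_{k0} - \bar{x}_{k0}) + \sum_{j=1}^{B-1} (x_{kj} - \bar{x}_{kj})\, 2^{-j} - 2^{-(B-1)} \right] \qquad (7) $$
By defining $D_{kj}$ as $x_{kj} - \bar{x}_{kj}$, so that $D_{kj} \in \{-1, +1\}$, the output of the FIR filter can be written as
$$ y(n) = \sum_{k=0}^{N-1} \frac{w_k}{2} \left[ -D_{k0} + \sum_{j=1}^{B-1} D_{kj}\, 2^{-j} - 2^{-(B-1)} \right] $$
$$ \phantom{y(n)} = -\sum_{k=0}^{N-1} \frac{w_k D_{k0}}{2} + \sum_{j=1}^{B-1} \left[ \sum_{k=0}^{N-1} \frac{w_k D_{kj}}{2} \right] 2^{-j} - 2^{-(B-1)} \sum_{k=0}^{N-1} \frac{w_k}{2} \qquad (8) $$
Defining $E_j$ as $\sum_{k=0}^{N-1} \frac{w_k D_{kj}}{2}$ and $E_{extra}$ as $\sum_{k=0}^{N-1} \frac{w_k}{2}$, equation (8) can be rewritten as
$$ y(n) = -E_0 + \sum_{j=1}^{B-1} E_j\, 2^{-j} - E_{extra}\, 2^{-(B-1)} \qquad (9) $$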
Equations (5)-(9) characterize the OBC scheme. Table II shows the content of the look-up table.
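The OBC decomposition can be checked numerically with the following Python sketch. It is a behavioural model under stated assumptions rather than the design of this paper: samples are B-bit two's complement integers (the fractional form scaled by $2^{B-1}$), bits are processed from the least significant bit so the sign bit comes last, the full $2^N$-entry table is used, and exact fractions hold the halved sums. The names `build_obc_lut` and `obc_fir_output` are illustrative.

```python
from fractions import Fraction


def build_obc_lut(weights):
    """Full OBC table: E = (1/2) * sum(w_k * D_k), with D_k = +1 for a 1 bit, -1 for a 0 bit."""
    n = len(weights)
    lut = []
    for addr in range(2 ** n):
        total = sum(w if (addr >> k) & 1 else -w for k, w in enumerate(weights))
        lut.append(Fraction(total, 2))
    return lut


def obc_fir_output(weights, samples, bits):
    """Integer-scaled form of the OBC sum: sign-bit term negated, E_extra subtracted once."""
    lut = build_obc_lut(weights)
    e_extra = Fraction(sum(weights), 2)
    mask = (1 << bits) - 1
    words = [x & mask for x in samples]      # two's complement bit patterns
    acc = -e_extra                           # constant offset term
    for b in range(bits):                    # b = 0 is the least significant bit
        addr = 0
        for k, word in enumerate(words):
            addr |= ((word >> b) & 1) << k
        term = lut[addr] * (1 << b)
        acc += -term if b == bits - 1 else term
    return acc
```

The result equals the direct sum of products, and `build_obc_lut` shows the sign symmetry of the table: the entry at an address is the negative of the entry at the bitwise-complemented address.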
Table II
Content of LUT with OBC coding (N = 4)
x0,j  x1,j  x2,j  x3,j   Content of LUT
0     0     0     0      -(w0 + w1 + w2 + w3) / 2
0     0     0     1      -(w0 + w1 + w2 - w3) / 2
0     0     1     0      -(w0 + w1 - w2 + w3) / 2
0     0     1     1      -(w0 + w1 - w2 - w3) / 2
0     1     0     0      -(w0 - w1 + w2 + w3) / 2
0     1     0     1      -(w0 - w1 + w2 - w3) / 2
0     1     1     0      -(w0 - w1 - w2 + w3) / 2
0     1     1     1      -(w0 - w1 - w2 - w3) / 2
1     0     0     0      (w0 - w1 - w2 - w3) / 2
1     0     0     1      (w0 - w1 - w2 + w3) / 2
1     0     1     0      (w0 - w1 + w2 - w3) / 2
1     0     1     1      (w0 - w1 + w2 + w3) / 2
1     1     0     0      (w0 + w1 - w2 - w3) / 2
1     1     0     1      (w0 + w1 - w2 + w3) / 2
1     1     1     0      (w0 + w1 + w2 - w3) / 2
1     1     1     1      (w0 + w1 + w2 + w3) / 2

Table II shows that the $E_j$ values are mirrored about the line between the 8th and the
9th rows of the LUT: complementing every address bit only changes the sign of the entry.
In other words, the term $E_j$ takes only $2^{N-1}$ distinct magnitudes, depending on the
$x_{k,j}$ values. It is therefore possible to reduce the LUT size by a factor of 2 [4-5].
Table III illustrates the content of the reduced LUT. Fig. 2 shows the implementation of
the FIR filter using OBC coding. The computation starts from the LSB of $x_i$, i.e., j = 0.
The XOR gates are used for address decoding, the MUX with the constant $E_{extra}$ provides
the initial value to the shift-accumulator, and the MUX after the LUT is used to invert the
output of the LUT when j = B-1. Two control signals S1 and S2 are required, where S1 is 1
when j = B-1 and 0 otherwise, and S2 is 1 when j = 0 and 0 otherwise.
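The halved table and the XOR addressing can be modelled in Python as sketched below. This is an assumption-laden behavioural model, not a description of the exact wiring of Fig. 2: the table stores the half with x0 = 0, the remaining address bits are XORed with x0, and the looked-up value is negated when x0 = 1. Samples are B-bit two's complement integers processed from the least significant bit, and the names `build_reduced_obc_lut`, `reduced_lookup`, and `reduced_obc_fir_output` are illustrative.

```python
from fractions import Fraction


def build_reduced_obc_lut(weights):
    """2**(N-1) entries: the x0 = 0 half of the OBC table, addressed by x1..x_{N-1}."""
    rest = weights[1:]
    lut = []
    for addr in range(2 ** len(rest)):
        total = -weights[0] + sum(w if (addr >> k) & 1 else -w
                                  for k, w in enumerate(rest))
        lut.append(Fraction(total, 2))
    return lut


def reduced_lookup(lut, bit_slice):
    """bit_slice = [x0, x1, ..., x_{N-1}] for one bit position."""
    x0 = bit_slice[0]
    addr = 0
    for k, bit in enumerate(bit_slice[1:]):
        addr |= (bit ^ x0) << k              # XOR gates form the reduced address
    value = lut[addr]
    return -value if x0 else value           # x0 selects the sign of the entry


def reduced_obc_fir_output(weights, samples, bits):
    lut = build_reduced_obc_lut(weights)
    mask = (1 << bits) - 1
    words = [x & mask for x in samples]
    acc = -Fraction(sum(weights), 2)         # initial value: the E_extra term
    for b in range(bits):                    # b = 0 is the least significant bit
        bit_slice = [(word >> b) & 1 for word in words]
        term = reduced_lookup(lut, bit_slice) * (1 << b)
        acc += -term if b == bits - 1 else term
    return acc
```

For a 4-tap filter the table built by `build_reduced_obc_lut` has 8 entries instead of 16, and the output still matches the direct sum of products.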
Table III
Content of reduced LUT (N = 4)
x1,j  x2,j  x3,j   Content of LUT
0     0     0      -(w0 + w1 + w2 + w3) / 2
0     0     1      -(w0 + w1 + w2 - w3) / 2
0     1     0      -(w0 + w1 - w2 + w3) / 2
0     1     1      -(w0 + w1 - w2 - w3) / 2
1     0     0      -(w0 - w1 + w2 + w3) / 2
1     0     1      -(w0 - w1 + w2 - w3) / 2
1     1     0      -(w0 - w1 - w2 + w3) / 2
1     1     1      -(w0 - w1 - w2 - w3) / 2
Fig. 2 Implementation of distributed arithmetic FIR filter using OBC coding
The size of the look-up table (LUT) in distributed arithmetic increases exponentially
with N. LUT access time can become a bottleneck for the speed of the filter, especially
when the LUT is large, so reducing the LUT size is very important [6]. One possible
solution is to divide the N address bits of the LUT into N/K groups of K bits. The LUT of
size $2^{N-1}$ can then be decomposed into N/K LUTs of size $2^K$, whose outputs are added
using a multi-input accumulator.
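The partitioning idea can be illustrated with the Python sketch below. For brevity it partitions the conventional (non-OBC) table, which is a simplification and not the exact arrangement used in this paper; samples are B-bit two's complement integers processed from the least significant bit, and the names `build_partitioned_luts` and `partitioned_da_output` are illustrative.

```python
def build_partitioned_luts(weights, group):
    """Split N taps into N/group tables of 2**group words each."""
    assert len(weights) % group == 0
    luts = []
    for start in range(0, len(weights), group):
        chunk = weights[start:start + group]
        luts.append([sum(w for k, w in enumerate(chunk) if (addr >> k) & 1)
                     for addr in range(2 ** group)])
    return luts


def partitioned_da_output(weights, samples, bits, group):
    """DA output using several small tables whose outputs are added each cycle."""
    luts = build_partitioned_luts(weights, group)
    mask = (1 << bits) - 1
    words = [x & mask for x in samples]      # two's complement bit patterns
    acc = 0
    for b in range(bits):                    # b = 0 is the least significant bit
        term = 0
        for g, lut in enumerate(luts):
            addr = 0
            for k in range(group):
                addr |= ((words[g * group + k] >> b) & 1) << k
            term += lut[addr]                # multi-input addition of table outputs
        term <<= b
        acc += -term if b == bits - 1 else term
    return acc
```

For 8 taps and groups of 4, two tables of 16 words replace a single table of 256 words, at the cost of one extra addition per cycle.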

4. RESULTS
The proposed methodology is implemented for an 8-tap low-pass FIR filter. The FIR
filter is simulated and synthesized using Xilinx ISE on a Spartan board. The filter
coefficients are truncated to four decimal places, scaled to signed integers, and
represented in 2's complement form. The precision used for the inputs and the coefficients
is 8 and 12 bits, respectively. The results for the offset binary coding distributed
arithmetic FIR filter are compared with those of the conventional multiply-and-accumulate
implementation. Fig. 3 shows the simulation results for the FIR filter implemented using
the MAC technique. Fig. 4 shows the simulation results for the FIR filter implemented
using DA with offset binary coding.
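The coefficient preparation described above (truncation to four decimal places, scaling to a signed integer, 12-bit 2's complement encoding) can be sketched in Python as follows. The scale factor is not stated in this paper, so it is left as a parameter; the function name and the example coefficient values used with it are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN


def quantize_coefficient(value, scale, width=12):
    """Truncate to four decimals, scale to an integer, encode in two's complement.

    value is a decimal string; scale is an assumed integer scale factor.
    Returns (signed integer, bit string of length width).
    """
    truncated = Decimal(value).quantize(Decimal("0.0001"), rounding=ROUND_DOWN)
    integer = int((truncated * scale).to_integral_value(rounding=ROUND_DOWN))
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lo <= integer <= hi:
        raise ValueError("scaled coefficient does not fit in the given width")
    pattern = format(integer & ((1 << width) - 1), f"0{width}b")
    return integer, pattern
```

With an assumed scale of 10000, `quantize_coefficient("0.1234567", 10000)` keeps 0.1234 and returns the integer 1234 together with its 12-bit pattern; a negative coefficient is returned with its sign bit set.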
Fig. 3 Simulation result for FIR filter implemented using the MAC technique.
Fig. 4 Simulation result for FIR filter implemented using DA with offset binary coding.

Table IV compares the resource utilization of FIR filter on FPGA implemented using both
MAC technique and DA with offset binary coding technique.
Table IV
Device utilization summary for FIR filter designed using MAC and DA with offset binary coding
Selected Device: Spartan 2 - xc2s200-5pq208
Resource                                   MAC                DA
Number of Slice Flip Flops                 101 out of 4704    68 out of 4704
Number of 4 input LUTs                     319 out of 2352    250 out of 2352
Number of occupied Slices                  291 out of 4704    205 out of 4704
Number of GCLKs                            1 out of 4         1 out of 4
Total equivalent gate count for design     5423               2423
Table IV shows that the DA-based implementation of the FIR filter requires fewer FPGA
resources than the MAC-based implementation: 205 occupied slices instead of 291, 250
four-input LUTs instead of 319, and 68 slice flip-flops instead of 101. The DA-based filter
also exhibits a lower equivalent gate count than its MAC counterpart (2423 versus 5423)
because it does not require multipliers.
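The relative savings implied by Table IV can be computed with a few lines of Python. The figures are copied from the table; the dictionary and function names are illustrative.

```python
# (MAC, DA) resource counts taken from Table IV
TABLE_IV = {
    "slice flip-flops": (101, 68),
    "4-input LUTs": (319, 250),
    "occupied slices": (291, 205),
    "equivalent gate count": (5423, 2423),
}


def percent_saving(mac, da):
    """Reduction of the DA design relative to the MAC design, in percent."""
    return 100.0 * (mac - da) / mac


def savings_report(table):
    """Map each resource name to its percentage saving."""
    return {name: percent_saving(mac, da) for name, (mac, da) in table.items()}
```

On these figures the equivalent gate count shows the largest relative reduction, a little over half, while the LUT count shows the smallest.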
5. CONCLUSION
Distributed arithmetic has proved to be an area-efficient technique for FIR filter
implementation. When using it, special care is required against the exponential growth of
the LUT size. Slicing the LUT into smaller tables of the desired length gives an effective
solution, particularly for high-order filter designs. FIR filters implemented in FPGAs give
the designer considerable flexibility in terms of the number of filter taps and changes to
existing coefficients.

REFERENCES
[1] Croisier, D. J. Esteban, M. E. Levilion, and V. Rizo, Digital Filter for PCM Encoded
Signals, U.S. Patent No. 3,777,130, issued April, 1973.
[2] C. S. Burrus, Digital filter structures described by distributed arithmetic, IEEE Trans.
on Circuits and Systems, Dec. 1977.
[3] A. Peled and B. Liu, A new hardware realization of digital filters, IEEE Trans.
Acoustics, Speech and Signal Processing, vol. ASSP- 22, no. 6, pp. 456-462, Dec
1974.
[4] J. Choi, S. Shin and J. Chung, Efficient ROM size reduction for distributed arithmetic,
in Proceedings of the IEEE ISCAS, Geneva, Switzerland, May 2000, vol. 2, pp. 61-64.
[5] H. Yoo and D. V. Anderson, Hardware-Efficient Distributed Arithmetic Architecture
for High-Order Digital Filters, IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP), pp. 125-128, 2005.
[6] Shanthala S, and S. Y. Kulkarni, High Speed and Low power FPGA Implementation of
FIR Filter for DSP Applications, European Journal of Scientific Research, 2009
[7] Martinez-Peiro, J. Valls, T. Sansaloni, A.P. Pascual, and E.I. Boemo, A Comparison
between Lattice, Cascade and Direct Form FIR Filter Structures by using a FPGA Bit-
Serial DA Implementation, in Proc. IEEE International Conference on Electronics,
Circuits and Systems, 1999, Vol. 1,pp. 241 – 244.
[8] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation.
New York: Wiley, 1999.
[9] S. A. White, Applications of distributed arithmetic to digital signal processing: A
tutorial review, IEEE Acoust. Speech Signal Processing Mag., vol 6,pp.4-19 , July
1989.
[10] M. A. Majed and Prof. C.S. Khandelwal, “Efficient Dynamic System Implementation
of FPGA Based Pid Control Algorithm for Temperature Control System”, International
Journal of Electrical Engineering & Technology (IJEET), Volume 3, Issue 2, 2012,
pp. 306 - 312, ISSN Print : 0976-6545, ISSN Online: 0976-6553.
[11] G.Prasad and N.Vasantha, “Design and Implementation of Multi Channel Frame
Synchronization in FPGA”, International journal of Electronics and Communication
Engineering &Technology (IJECET), Volume 4, Issue 1, 2013, pp. 189 - 199, ISSN
Print: 0976- 6464, ISSN Online: 0976 –6472.
[12] Sriadibhatla Sridevi, Dr. Ravindra Dhuli and Prof. P. L. H. Varaprasad, “FPGA
Implementation of Low Complexity Linear Periodically Time Varying Filter”,
International journal of Electronics and Communication Engineering & Technology
(IJECET), Volume 3, Issue 1, 2012, pp. 130 - 138, ISSN Print: 0976- 6464,
ISSN Online: 0976 –6472.
