IJRET: International Journal of Research in Engineering and Technology ISSN: 2319-1163
__________________________________________________________________________________________
Volume: 01 Issue: 03 | Nov-2012, Available @ http://www.ijret.org 238
IMPLEMENTATION OF CYCLIC CONVOLUTION BASED ON FNT
A. Laxman(1), A. Vamshidhar Reddy(2), L. Prakash(3), T. Satyanarayana(4)
(1) Asst. Prof., ECE Department, MAHAVEER INST. OF SCIENCE & TECH., AP, India, amgothlaxman@gmail.com
(2) Assoc. Prof., ECE Department, RRS COLLEGE OF ENG. & TECH., AP, India, avamshireddy@gmail.com
(3) Asst. Prof., ECE Department, DVR COLLEGE OF ENG. & TECH., AP, India, laudiya.prakash@gmail.com
(4) Asst. Prof., ECE Department, DVR COLLEGE OF ENG. & TECH., AP, India, satyant234@gmail.com
Abstract
Cyclic convolution, also known as circular convolution, is simpler to compute and produces fewer output samples than linear
convolution. There are many architectures for calculating the cyclic convolution of two signals; implementation using the Fermat
Number Transform (FNT) is one of them. A Fermat number is a positive integer of the form F_n = 2^(2^n) + 1, where n is a
nonnegative integer. A basic property of Fermat numbers is that they are recursive.
This paper presents a cyclic convolution architecture based on the Fermat Number Transform (FNT) in the diminished-1 number system.
A Code Conversion method Without Addition (CCWA) and a Butterfly Operation method Without Addition (BOWA) are proposed to perform
the FNT and its inverse (IFNT), except for their final stages, in the convolution. The pointwise multiplication in the convolution is
accomplished by modulo 2^n + 1 Partial Product Multipliers (MPPM), whose output partial products are the inputs to the IFNT. Thus
modulo 2^n + 1 carry-propagation additions are avoided in the FNT and the IFNT, except in their final stages, and in the modulo 2^n + 1
multiplier. The execution delay of the parallel architecture is reduced noticeably because there are fewer modulo 2^n + 1
carry-propagation additions. Compared with the existing cyclic convolution architecture, the proposed one has better throughput
performance and involves less hardware complexity. Synthesis results using 130 nm CMOS technology demonstrate the superiority of the
proposed architecture over the reported solution.
Index Terms: FERMAT NUMBER THEORETIC TRANSFORM, BUTTERFLY ARCHITECTURE, PARALLEL
ARCHITECTURE FOR CYCLIC CONVOLUTION, and COMPARISON AND RESULTS
-----------------------------------------------------------------------***-----------------------------------------------------------------------
1. FERMAT NUMBER THEORETIC
TRANSFORM
The cyclic convolution via the FNT is composed of the FNTs,
the pointwise multiplications and the IFNT. The FNTs of two
sequences {ai} and {bi} produce two sequences {Ai} and
{Bi}. Modulo 2^n + 1 multipliers are employed to accomplish the
pointwise multiplication between {Ai} and {Bi} and produce
the sequence {Pi}. The final resulting sequence {pi} is
obtained by taking the inverse FNT of the product sequence
{Pi}. Each element of {pi} is in the diminished-1
representation.
The FNT of a sequence {xi} of length N (i = 0, 1, …, N-1) is
defined as
$$X_k = \sum_{i=0}^{N-1} x_i \, \alpha_N^{\langle ik \rangle} \bmod F_t, \qquad k = 0, 1, \ldots, N-1$$
where F_t = 2^(2^t) + 1 is the t-th Fermat number, N is a power of 2 and α_N is an
Nth root of unity (i.e. α_N^N mod F_t = 1 and α_N^m mod F_t ≠ 1 for 1 ≤ m
< N). The notation <ik> means ik modulo N.
The inverse FNT is given by
$$x_i = \frac{1}{N} \sum_{k=0}^{N-1} X_k \, \alpha_N^{-\langle ik \rangle} \bmod F_t, \qquad i = 0, 1, \ldots, N-1$$
where 1/N is an element of the finite field or ring of integers
and satisfies the following condition:
$$(N \cdot 1/N) \bmod F_t = 1$$
Parameters α, F_t and N must be chosen carefully, and some
conditions must be satisfied so that the FNT possesses the
cyclic convolution property. In this project, α = 2, F_t = 2^(2^t) + 1 and
N = 2·2^t, where t is an integer.
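The definitions above can be checked numerically with a short reference model. The following is only a software sketch, not the hardware architecture of this paper: it evaluates the FNT and IFNT sums directly with ordinary integer arithmetic, assuming α = 2, F_t = 2^(2^t) + 1 and N = 2·2^t. All function names are hypothetical, and `pow(x, -1, m)` assumes Python 3.8 or later.

```python
def fermat_params(t):
    """Return (F_t, N) for alpha = 2: F_t = 2^(2^t) + 1 and N = 2 * 2^t."""
    modulus = 2 ** (2 ** t) + 1
    length = 2 * 2 ** t
    return modulus, length


def fnt(x, t):
    """Direct evaluation of X_k = sum_i x_i * 2^<ik> mod F_t."""
    F, N = fermat_params(t)
    assert len(x) == N, "sequence length must equal N = 2 * 2^t"
    return [sum(x[i] * pow(2, (i * k) % N, F) for i in range(N)) % F
            for k in range(N)]


def ifnt(X, t):
    """Direct evaluation of x_i = (1/N) * sum_k X_k * 2^(-<ik>) mod F_t."""
    F, N = fermat_params(t)
    assert len(X) == N, "sequence length must equal N = 2 * 2^t"
    n_inv = pow(N, -1, F)       # the element 1/N with (N * 1/N) mod F_t = 1
    alpha_inv = pow(2, -1, F)   # inverse of the root of unity alpha = 2
    return [(n_inv * sum(X[k] * pow(alpha_inv, (i * k) % N, F)
                         for k in range(N))) % F
            for i in range(N)]


def cyclic_convolution(a, b, t):
    """FNT of both inputs, pointwise product, then IFNT."""
    F, _ = fermat_params(t)
    A, B = fnt(a, t), fnt(b, t)
    P = [(u * v) % F for u, v in zip(A, B)]
    return ifnt(P, t)


def cyclic_convolution_direct(a, b, modulus):
    """Textbook cyclic convolution, used only as a cross-check."""
    N = len(a)
    return [sum(a[j] * b[(i - j) % N] for j in range(N)) % modulus
            for i in range(N)]


a = [1, 2, 3, 4, 0, 0, 0, 0]
b = [1, 1, 0, 0, 0, 0, 0, 0]
print(cyclic_convolution(a, b, t=2))  # [1, 3, 5, 7, 4, 0, 0, 0]
```

With t = 2 the modulus is 17 and the transform length is 8, so every result is an exact residue modulo 17. Inputs must be kept small enough that the true convolution values do not exceed the modulus if the integer result is required.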
1.1 Important operations
Important operations of the cyclic convolution based on FNT
with the unity root 2 include the CCWA, the BOWA and the
MPPM. The CCWA and the BOWA both consist mainly of novel
modulo 2^n + 1 4-2 compressors, which are built from
the 4-2 compressor introduced by Nagamatsu. The 4-2
compressor, the novel modulo 2^n + 1 4-2 compressor and the
BOWA are shown in Figs. 1 to 3. In the figures, “X*” denotes the
diminished-1 representation of X, i.e. X* = X - 1.
1.2 Code conversion without addition
The code conversion (CC) converts normal binary numbers (NBCs) into their
diminished-1 representation. It is the first stage in the FNT.
The delay and area of the CC for a 2n-bit NBC are no less than
those of two n-bit carry-propagation adders.
Fig-1 Elementary operations of FNT architecture with unity
root 2, (a) 4-2 compressor
Fig-2 modulo 2^n + 1 4-2 compressor
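To reduce the cost, we propose the CCWA, which is performed
by the modulo 2^n + 1 4-2 compressor. Let A and B represent
two operands whose widths are no more than 2n bits. We
split each operand into two n-bit halves and define three new
variables, where the overline denotes the bitwise complement:
$$A = 2^n A_H + A_L$$
$$B = 2^n B_H + B_L$$
$$M_0 = (2^n - 1) - A_H = \overline{A_H}$$
$$M_1 = (2^n - 1) - B_H = \overline{B_H}$$
$$M_2 = (2^n - 1) - B_L = \overline{B_L}$$
If the subsequent operation of the CC is modulo 2^n + 1 addition,
assign A_L, M_0, B_L and M_1 to I_0, I_1, I_2, I_3 of the modulo
2^n + 1 4-2 compressor respectively. I_0, I_1, I_2, I_3 are n-bit
vectors defined as follows:
$$I_0 = I_{0(n-1)} I_{0(n-2)} \cdots I_{01} I_{00}$$
$$I_1 = I_{1(n-1)} I_{1(n-2)} \cdots I_{11} I_{10}$$
$$I_2 = I_{2(n-1)} I_{2(n-2)} \cdots I_{21} I_{20}$$
$$I_3 = I_{3(n-1)} I_{3(n-2)} \cdots I_{31} I_{30}$$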
We obtain the sum vector H_0^* and the carry vector H_1^* in the
diminished-1 number system. The most significant bit of H_1^*
is complemented and connected back to its least significant
bit. That is to say
$$H_0^* = H_{0(n-1)} H_{0(n-2)} \cdots H_{01} H_{00}$$
$$H_1^* = H_{1(n-2)} \cdots H_{11} H_{10} \overline{H_{1(n-1)}}$$
The result of the modulo 2^n + 1 addition of A* and B* is equal to
the result of the modulo 2^n + 1 addition of H_0^* and H_1^*. In this
way, A and B are converted into their equivalent diminished-1
representations H_0^* and H_1^*.
Let |A* + B*|_(2^n+1), |Ā*|_(2^n+1), |A* - B*|_(2^n+1) and
|A* × 2^i|_(2^n+1) denote modulo 2^n + 1 addition, negation, subtraction
and multiplication by a power of 2 respectively, as
originally proposed by Leibowitz [8].
If the subsequent operation is modulo 2^n + 1 subtraction, we
assign A_L, M_0, M_2 and B_H to I_0, I_1, I_2, I_3 respectively. Then
H_0^* and H_1^* at the output of the modulo 2^n + 1 4-2 compressor
constitute the result of the CCWA.
After the CCWA, we obtain a result consisting of two
diminished-1 numbers. The result also includes the
information of the modulo 2^n + 1 addition or subtraction in the first
stage of the previous BO.
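The code conversion described in this section can be modelled in a few lines to see why no carry propagation is needed. The sketch below is a behavioural model under stated assumptions, not the gate-level circuit of the paper: the modulo 2^n + 1 4-2 compressor is modelled as two cascaded 3:2 carry-save stages (an assumption about its internal structure), each of which complements the most significant carry bit and feeds it back to the least significant position. The bit patterns of the two outputs may therefore differ from those of the actual compressor, but the pair represents the same residue. All function names are hypothetical.

```python
def csa_mod(a, b, c, n):
    """3:2 carry-save step on n-bit diminished-1 operands modulo 2^n + 1.

    The carry vector is shifted left by one position; its most significant
    bit is complemented and wrapped around to the least significant bit.
    """
    mask = (1 << n) - 1
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    msb = (carry >> (n - 1)) & 1
    rotated = ((carry << 1) & mask) | (msb ^ 1)
    return s, rotated


def compressor_4_2_mod(i0, i1, i2, i3, n):
    """Reduce four diminished-1 operands to two (assumed two-stage model)."""
    s1, c1 = csa_mod(i0, i1, i2, n)
    return csa_mod(s1, c1, i3, n)


def ccwa(a, b, n, subtract=False):
    """Code conversion without addition for two operands of up to 2n bits.

    Returns the pair (H0*, H1*) representing A + B, or A - B when
    subtract is True, modulo 2^n + 1.
    """
    mask = (1 << n) - 1
    a_h, a_l = (a >> n) & mask, a & mask
    b_h, b_l = (b >> n) & mask, b & mask
    m0 = a_h ^ mask  # M0 = (2^n - 1) - A_H
    m1 = b_h ^ mask  # M1 = (2^n - 1) - B_H
    m2 = b_l ^ mask  # M2 = (2^n - 1) - B_L
    if subtract:
        return compressor_4_2_mod(a_l, m0, m2, b_h, n)
    return compressor_4_2_mod(a_l, m0, b_l, m1, n)


def value_of_pair(h0, h1, n):
    """Residue represented by two diminished-1 numbers (X* stands for X* + 1)."""
    return (h0 + 1 + h1 + 1) % (2 ** n + 1)


h0, h1 = ccwa(200, 77, n=4)
print(value_of_pair(h0, h1, 4) == (200 + 77) % 17)  # True
```

The only operations used are bitwise logic, a one-bit shift and an inverter, which is the software counterpart of avoiding a carry-propagation adder in the conversion stage.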
1.3 Diminished-One modulo 2^n + 1 Adder Design
Modulo 2^n + 1 adders are also utilized as the last-stage adder of
modulo 2^n + 1 multipliers. Modulo 2^n + 1 multipliers find
applicability in pseudorandom number generation,
cryptography, and in the Fermat number transform, which is
an effective way to compute convolutions. Leibowitz has
proposed the diminished-one number system. In the
diminished-one number system, each number X is represented
by X* = X - 1. The representation of 0 is treated in a special way.
Since the adoption of this system leads to modulo 2^n + 1 adders
and multipliers with n-bit-wide operands, it has been used for
many residue number system implementations. Efficient VLSI
implementations of modulo 2^n + 1 adders for the diminished-one
number system have recently been presented. These adders,
although fast, are, according to the comparison presented, still
slower than the fastest modulo 2^n adders or the fastest modulo
2^n - 1 adders.
In this project, we derive two new design methodologies for
modulo 2^n + 1 adders in the diminished-one number system.
The first one leads to traditional Carry Look-Ahead (CLA)
adders, while the second leads to parallel-prefix adder architectures. Using
implementations in a static CMOS technology, we show that
the proposed CLA adder design methodology leads to more
area- and time-efficient implementations than those presented,
for small operand widths. For wider operands, the proposed
parallel-prefix design methodology leads to considerably
faster adder implementations than those presented and as fast
as the integer or the modulo 2^n + 1 architecture presented.
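The arithmetic that such an adder has to realise can be stated compactly. The following behavioural sketch is not the CLA or parallel-prefix design discussed above; it only models the input-output relation of a diminished-one modulo 2^n + 1 adder as an n-bit addition with an inverted end-around carry, together with negation by bitwise complement. The function names are hypothetical, operands are assumed to be nonzero (X in 1..2^n), and a zero result is assumed to be detected separately, as the text notes for the representation of 0.

```python
def dim1_add(a_star, b_star, n):
    """Diminished-one addition modulo 2^n + 1 for nonzero operands.

    The n-bit sum is incremented when the carry out is 0 (inverted
    end-around carry). The case A + B = 2^n + 1, whose true result is 0,
    is not representable here and must be flagged by separate logic.
    """
    mask = (1 << n) - 1
    total = a_star + b_star
    carry_out = total >> n
    return (total + (1 - carry_out)) & mask


def dim1_neg(a_star, n):
    """Diminished-one negation modulo 2^n + 1: complement all n bits."""
    return a_star ^ ((1 << n) - 1)


# 5 + 9 = 14 modulo 17, using the representations 4 and 8.
print(dim1_add(4, 8, 4) + 1)  # 14
```

Subtraction follows by adding the negated operand, which is why the butterfly stages need inverters but no dedicated subtractor.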
2. BUTTERFLY ARCHITECTURE
2.1 Butterfly operation without addition
After the CCWA, we obtain the results of modulo 2^n + 1
addition and subtraction in the diminished-1 representation.
Each result consists of two diminished-1 values. The
subsequent butterfly operation therefore involves four operands. The
proposed BOWA involves two modulo 2^n + 1 4-2 compressors,
a multiplier by a power of 2 and some inverters.
Fig-3 Butterfly operation without addition
The multiplication by an integer power of 2 in the diminished-1
number system in the BOWA is trivial: it can be performed
by left-shifting the low-order n-i bits of the number by i bit
positions, then inverting the high-order i bits and circulating them
into the i least significant bit positions. Thus the BOWA can
be performed without a carry-propagation chain, which
reduces the delay and the area. K*, L*, M*, N*
correspond to the two inputs and two outputs of the previous BO
in the diminished-1 number system respectively and are given by
M* = [expression not recoverable from the source]
N* = [expression not recoverable from the source]
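The shift-and-invert rule for multiplying a diminished-1 number by 2^i, stated in the paragraph above, can be verified against ordinary modular arithmetic. The sketch below is a behavioural check only; the function name is hypothetical and the operand is assumed to be nonzero (X in 1..2^n), with 0 ≤ i ≤ n.

```python
def dim1_mul_pow2(a_star, i, n):
    """Multiply a diminished-1 number by 2^i modulo 2^n + 1.

    The low-order n - i bits are shifted left by i positions; the
    high-order i bits are inverted and placed in the i least
    significant bit positions. No carry propagation is involved.
    """
    mask = (1 << n) - 1
    low = a_star & ((1 << (n - i)) - 1)    # low-order n - i bits
    high = a_star >> (n - i)               # high-order i bits
    inverted_high = high ^ ((1 << i) - 1)  # complement of those i bits
    return ((low << i) | inverted_high) & mask


# 11 * 2^3 = 88 = 3 modulo 17; 11 is stored as 10 and 3 as 2.
print(dim1_mul_pow2(10, 3, 4))  # 2
```

The rule works because 2^n is congruent to -1 modulo 2^n + 1, so bits shifted past position n-1 re-enter with negative weight, and in the diminished-1 system that negation reduces to a bitwise complement.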
2.2 The FNT Architecture
In the previous sections, we have presented the
reconfiguration at a rather low level. The butterfly constitutes
a highly parameterized function level. Having this
parameterized function allows us to design a reconfigurable
operator in which the butterfly is the highest-level operator.
Figure 4 depicts the global reconfigurable operator. Over C it
is called the FFT and over GF(F_t) it is called the FNT. This
architecture has been validated in software: a simple test
computing the FFT and IFFT showed the validity of this
structure.
Fig-4 The architecture of FNT operator
2.3 Modulo 2^n + 1 Partial Product Multiplier
For the modulo 2^n + 1 multiplier proposed by Efstathiou, there
are n+3 partial products, which are derived by simple AND and
NAND gates. These are followed by an FA-based Dadda tree that
reduces the n+3 partial products to two summands. Then a
modulo 2^n + 1 adder for diminished-1 operands
accepts these two summands and produces the required product.
In the proposed parallel architecture for cyclic convolution
based on FNT, the BOWA can accept four operands in the
diminished-1 number system. Every pointwise multiplication
therefore only needs to produce two partial products rather than one
product. This can be accomplished by removing
the final modulo 2^n + 1 adder of the two partial products from the
multiplier. Thus the final modulo 2^n + 1 adder is omitted and
the modulo 2^n + 1 partial product multiplier is employed to
save delay and area.
3. PARALLEL ARCHITECTURE FOR CYCLIC
CONVOLUTION
Based on the CCWA, the BOWA and the MPPM, we design
the whole parallel architecture for the cyclic convolution
based on FNT as shown in Fig. 5. It mainly includes the FNTs, the
pointwise multiplication and the IFNT. The FNTs of two
input sequences {ai} and {bi} produce two sequences {Ai}
and {Bi} (i = 0, 1, …, N-1). Sequences {Ai} and {Bi} are sent to
N MPPMs to accomplish the pointwise multiplication and
produce N pairs of partial products. Then the IFNT of the
partial products is performed to produce the resulting
sequence {pi} of the cyclic convolution.
Fig-5 Parallel architecture for the cyclic convolution based on
FNT
In the architecture, the radix-2 decimation-in-time (DIT)
algorithm, which is by far the most widely used algorithm, is
employed to perform the FNT and the IFNT.
Fig-6 Structures for FNT and IFNT (Ft = 2^8 + 1); input: 16 ordinary binary operands, output: 16 pairs of diminished-1 operands
The efficient FNT structure involves log2 N + 1 stages of
operations. The original operands are converted into the
diminished-1 representation in the CCWA stage, whose output already
contains the information of the modulo 2^n + 1 addition or subtraction in the first
butterfly operation stage of the previous FNT structure. Then
the results are sent to the next stage of BOWA. After log2 N - 1
stages of BOWAs, results composed of two diminished-1
operands are obtained. The final stage of the FNT consists of
modulo 2^n + 1 carry-propagation adders, which are used to
evaluate the final results in the diminished-1 representation.
The CCWA stage, each BOWA stage and the modulo 2^n + 1
addition stage in the FNT involve N/2 pairs of code
conversions including the information of modulo 2^n + 1 addition
and subtraction, N/2 butterfly operations and N/2 pairs of
modulo 2^n + 1 additions respectively.
From the definition of the FNT and the IFNT in Section 1, the only
differences between the FNT and the IFNT are the normalization
factor 1/N and the sign of the exponent of the phase factor α_N. Ignoring the
normalization factor 1/N, the IFNT formula is the same as that
given for the FNT except that all transform coefficients α_N^(ik)
used for the FNT need to be replaced by α_N^(-ik) for the IFNT
computation. The proposed FNT structure can be used to
complete the IFNT as well with little modification as shown in
Fig. 6(b). After the IFNT of N-point bit-reversed input data,
the interim results are multiplied by 1/N in the finite field or
ring. Then x[j] and x[j+N/2] (j=1,2,…,N/2-1) exchange their
positions to produce the final results of the IFNT in natural
order. Our architecture for the cyclic convolution gives good
speed performance without requiring complicated control.
Furthermore, it is very suitable for implementation of the
overlap-save and overlap-add techniques, which are used to
reduce a long linear convolution to a series of short cyclic
convolutions.
3.1 Novel parallel architecture for FNT
Since the FNT has a mathematical algorithm similar to the
FFT, an FFT-type structure can be applied to perform a fast
FNT. The radix-2 decimation-in-time (DIT) fast algorithm,
which is by far the most widely used FFT algorithm, is
employed. With the input data sequence stored in bit-reversed
order and the CSBO performed in place, the resulting
FNT sequence is obtained in natural order. The novel
architecture is shown in Fig. 7 for the case in which the transform length
is 8 and the modulus is 2^4 + 1. In Fig. 7, MA and MS denote
“modulo 2^n + 1 addition” and “modulo 2^n + 1 subtraction”
respectively. The novel parallel N-point FNT architecture with
the unity root 2 is composed of one stage of CSCC, log2 N - 1
stages of CSBO and one stage of modulo 2^n + 1
carry-propagation addition. The final stage is used to evaluate the
final results, each of which is a diminished-1 value.
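The data flow of the radix-2 DIT algorithm described above (bit-reversed input, in-place butterflies, natural-order output) can be illustrated in software. The sketch below is an arithmetic model only: it uses ordinary modular additions and subtractions in every butterfly rather than the carry-save stages of the proposed hardware, and it assumes α = 2 and a transform length N = 2·2^t. The function names are hypothetical; the direct-sum transform is included only as a cross-check.

```python
def bit_reverse_permute(x):
    """Return a copy of x with its elements stored in bit-reversed order."""
    n = len(x)
    bits = n.bit_length() - 1
    out = [0] * n
    for i in range(n):
        r = int(format(i, "0{}b".format(bits))[::-1], 2)
        out[r] = x[i]
    return out


def fnt_radix2(x, t):
    """In-place radix-2 DIT FNT with root of unity 2, modulus 2^(2^t) + 1."""
    F = 2 ** (2 ** t) + 1
    N = len(x)
    assert N == 2 * 2 ** t, "transform length must be N = 2 * 2^t"
    data = bit_reverse_permute([v % F for v in x])
    size = 2
    while size <= N:                      # log2(N) butterfly stages
        half = size // 2
        step = N // size                  # twiddle exponent step for this stage
        for start in range(0, N, size):
            for j in range(half):
                w = pow(2, step * j, F)   # twiddle factor: a power of 2
                u = data[start + j]
                v = (data[start + j + half] * w) % F
                data[start + j] = (u + v) % F          # modulo addition (MA)
                data[start + j + half] = (u - v) % F   # modulo subtraction (MS)
        size *= 2
    return data


def fnt_direct(x, t):
    """Direct evaluation of the FNT definition, used only for comparison."""
    F = 2 ** (2 ** t) + 1
    N = len(x)
    return [sum(x[i] * pow(2, (i * k) % N, F) for i in range(N)) % F
            for k in range(N)]


x = [3, 1, 4, 1, 5, 9, 2, 6]
print(fnt_radix2(x, 2) == fnt_direct(x, 2))  # True
```

Because every twiddle factor is a power of 2, the multiplication inside each butterfly corresponds in hardware to a shift with inverted wrap-around bits rather than a general multiplier.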
Fig-7 Novel architecture for 8-point FNT with modulus 2^4 + 1
The existing parallel N-point FNT architecture with the unity
root 2 consists of one stage of CC and log2 N stages of BO.
Both architectures involve the same numbers of CC stages and
BO stages except for their final stages. The
proposed CSCC and CSBO overcome the disadvantage of
carry-propagation addition and do not require a zero indicator,
so they are more area- and delay-efficient than the CC and the BO
respectively. The costs of the final stages of both architectures
are approximately equal, since every BO is composed chiefly of two
parallel modulo 2^n + 1 adders. Thus the proposed
parallel FNT architecture is more efficient. The novel architecture is very
suitable for implementation of the overlap-save and
overlap-add techniques, which are used to reduce a long linear
convolution to a series of short cyclic convolutions.
4. COMPARISON AND RESULTS
The delay and the area estimations of the modulo 2^n + 1 adder and
the modulo 2^n + 1 multiplier in the cyclic convolution are given in
Table 1 as a function of the operand size n. “D(n+3)” in Table 1
is defined as shown in Table 4.2.
Table-1 Area and delay estimations for arithmetic modulo 2^n + 1 operators
Operator | Area (this project) | Area [3]                         | Delay (this project) | Delay [3]
MA       | 14n                 | (9/2)n·log2 n + n/2 + 6          | 8                    | 2·log2 n + 3
MM       | 8n^2 + n - 1        | (9/2)n·log2 n + 8n^2 + n/2 + 4   | 4D(n+3) + 1          | 4D(n+3) + 2·log2 n + 3
“MA” and “MM” represent the modulo 2^n + 1 adder and multiplier
respectively.
Tables 1 and 2 indicate that for values of Ft ≥ 2^8 + 1 the
proposed architecture comprising the CCWA and the BOWA
requires less delay and area than the previous one. The former
achieves a 12.6% reduction in area and a 26% reduction in
delay compared with the latter when Ft is
2^32 + 1 and the transform length is 64. Moreover, our algorithm
becomes more advantageous as the modulus width
grows.
Table-2 Area and delay results of cyclic convolution based on FNT
Ft       | Area (µm^2), this project | Area (µm^2), [3] | Delay (ns), this project | Delay (ns), [3]
2^8 + 1  | 3.5 × 10^5                | 3.5 × 10^5       | 8.9                      | 9.9
2^16 + 1 | 1.86 × 10^6               | 2.05 × 10^6      | 11.6                     | 14.4
2^32 + 1 | 1.08 × 10^7               | 1.24 × 10^7      | 15.1                     | 20.4
CONCLUSIONS
Modern programmable structures make it possible
to implement DSP algorithms in dedicated embedded blocks.
This makes the design of such algorithms an easy task.
However, the flexibility of programmable structures enables
more advanced implementation methods to be used. In
particular, exploiting the parallelism in the algorithm to be
implemented may yield very good results.
A novel parallel architecture for the cyclic convolution based
on FNT is proposed for the case in which the principal root of unity is
equal to 2 or an integer power of 2. The FNT and the IFNT are
accomplished mainly by the CCWA and the BOWA. The
pointwise multiplication is performed by the modulo 2^n + 1
partial product multiplier. Thus there are very few modulo
2^n + 1 carry-propagation additions compared to the existing
cyclic convolution architecture. A theoretical model was
applied to assess the efficiency independently of the target
technology. VLSI implementations using a 0.13 µm standard
cell library show that the proposed parallel architecture attains
lower area and delay than the existing solution when
the modulus is no less than 2^8 + 1.
REFERENCES:
[1]. C. Cheng, K. K. Parhi, “Hardware efficient fast DCT based on novel cyclic convolution structures”, IEEE Trans. Signal Processing, 2006, 54(11), pp. 4419-4434
[2]. H. C. Chen, J. I. Guo, T. S. Chang, et al., “A memory efficient realization of cyclic convolution and its application to discrete cosine transform”, IEEE Trans. Circuits and Systems for Video Technology, 2005, 15(3), pp. 445-453
[3]. R. Conway, “Modified Overlap Technique Using Fermat and Mersenne Transforms”, IEEE Trans. Circuits and Systems II: Express Briefs, 2006, 53(8), pp. 632-636
[4]. A. B. O'Donnell, C. J. Bleakley, “Area efficient fault tolerant convolution using RRNS with NTTs and WSCA”, Electronics Letters, 2008, 44(10), pp. 648-649
[5]. H. H. Alaeddine, E. H. Baghious, G. Madre, et al., “Realization of multi-delay filter using Fermat number transforms”, IEICE Trans. Fundamentals, 2008, E91A(9), pp. 2571-2577
[6]. N. S. Rubanov, E. I. Bovbel, P. D. Kukharchik, V. J. Bodrov, “Modified number theoretic transform over the direct sum of finite fields to compute the linear convolution”, IEEE Trans. Signal Processing, 1998, 46(3), pp. 813-817
[7]. T. Toivonen, J. Heikkila, “Video filtering with Fermat number theoretic transforms using residue number system”, IEEE Trans. Circuits and Systems for Video Technology, 2006, 16(1), pp. 92-101
[8]. L. M. Leibowitz, “A simplified binary arithmetic for the Fermat number transform”, IEEE Trans. Acoustics, Speech and Signal Processing, 1976, 24(5), pp. 356-359
BIOGRAPHIES:
A. LAXMAN received the M.Tech degree from Srinidhi Institute of
Technology, Ghatkesar, Hyderabad. Currently he is working as an
Asst. Prof. in MAHAVEER Inst. of Science and Technology, Hyderabad.
A. VAMSHIDHAR REDDY received the M.Tech degree from Bandari
Srinivas Institute of Tech., Chevella, Hyderabad. Currently he is
working as an Assoc. Prof. in RRS College of Eng. and Tech., Hyderabad.
L. PRAKASH received the M.Tech degree from TRR College of Eng. and
Tech., Hyderabad. Currently he is working as an Assoc. Prof.
in DVR College of Eng. and Tech., Hyderabad.
T. SATYANARAYANA received the M.Tech degree from JBIT, Moinabad,
Hyderabad. Currently he is working as an Assoc. Prof. in DVR College
of Eng. and Tech., Hyderabad.

More Related Content

PDF
Design and Implementation of High Speed Area Efficient Double Precision Float...
PDF
Multiplier and Accumulator Using Csla
PDF
Implementation of Stronger S-Box for Advanced Encryption Standard
PDF
Optimized study of one bit comparator using reversible
PDF
Analysis of GF (2m) Multiplication Algorithm: Classic Method v/s Karatsuba-Of...
PDF
Efficient Design of 64:1 Hybridized MUX for Low Area and Power VLSI
PDF
Implementation of an Effective Self-Timed Multiplier for Single Precision Flo...
PDF
Chapter 7: Matrix Multiplication
Design and Implementation of High Speed Area Efficient Double Precision Float...
Multiplier and Accumulator Using Csla
Implementation of Stronger S-Box for Advanced Encryption Standard
Optimized study of one bit comparator using reversible
Analysis of GF (2m) Multiplication Algorithm: Classic Method v/s Karatsuba-Of...
Efficient Design of 64:1 Hybridized MUX for Low Area and Power VLSI
Implementation of an Effective Self-Timed Multiplier for Single Precision Flo...
Chapter 7: Matrix Multiplication

What's hot (20)

PDF
Justification of Montgomery Modular Reduction
PDF
Design and implementation of high speed baugh wooley and modified booth multi...
DOCX
Modified booth
PDF
Chapter 5: Mapping and Scheduling
PDF
Design of Power Efficient 4x4 Multiplier Based On Various Power Optimizing Te...
PDF
High Performance Baugh Wooley Multiplier Using Carry Skip Adder Structure
PDF
International Journal of Engineering and Science Invention (IJESI)
PDF
Layout Design Analysis of CMOS Comparator using 180nm Technology
PDF
Design of 8-Bit Comparator Using 45nm CMOS Technology
PDF
Low Power Implementation of Booth’s Multiplier using Reversible Gates
PDF
Faster Interleaved Modular Multiplier Based on Sign Detection
PDF
Algebraic data types: Semilattices
PDF
Burrows wheeler based data compression and secure transmission
PDF
Efficient Layout Design of CMOS Full Subtractor
PDF
A Survey on Various Lightweight Cryptographic Algorithms on FPGA
PDF
Fuzzy Sequencing Problem Using Generalized Triangular Fuzzy Numbers
PDF
Energy Efficient Full Adders for Arithmetic Applications Based on GDI Logic
PDF
Implementation performance analysis of cordic
PDF
Optimized Layout Design of Priority Encoder using 65nm Technology
PDF
Low Power and Area Efficient Multiplier Layout using Transmission Gate
Justification of Montgomery Modular Reduction
Design and implementation of high speed baugh wooley and modified booth multi...
Modified booth
Chapter 5: Mapping and Scheduling
Design of Power Efficient 4x4 Multiplier Based On Various Power Optimizing Te...
High Performance Baugh Wooley Multiplier Using Carry Skip Adder Structure
International Journal of Engineering and Science Invention (IJESI)
Layout Design Analysis of CMOS Comparator using 180nm Technology
Design of 8-Bit Comparator Using 45nm CMOS Technology
Low Power Implementation of Booth’s Multiplier using Reversible Gates
Faster Interleaved Modular Multiplier Based on Sign Detection
Algebraic data types: Semilattices
Burrows wheeler based data compression and secure transmission
Efficient Layout Design of CMOS Full Subtractor
A Survey on Various Lightweight Cryptographic Algorithms on FPGA
Fuzzy Sequencing Problem Using Generalized Triangular Fuzzy Numbers
Energy Efficient Full Adders for Arithmetic Applications Based on GDI Logic
Implementation performance analysis of cordic
Optimized Layout Design of Priority Encoder using 65nm Technology
Low Power and Area Efficient Multiplier Layout using Transmission Gate
Ad

Viewers also liked (19)

PDF
Reusing of glass powder and industrial waste materials in concrete
PDF
Implementation of low power divider techniques using radix
PDF
Water quality index with missing parameters
PDF
Seismic performance assessment of the torsional effect in asymmetric structur...
PDF
Biodiesel as a blended fuel in compression ignition engines
PDF
Finite element simulation of hybrid welding process for welding 304 austeniti...
PDF
Performance analysis of al fec raptor code over 3 gpp embms network
PDF
Study and comparative analysis of resonat frequency for microsrtip fractal an...
PDF
Oscillatory motion control of hinged body using controller
PDF
A survey on hiding user privacy in location based services through clustering
PDF
System identification of a steam distillation pilot scale using arx and narx ...
PDF
A mathematical formulation of urban zoning selection criteria in a distribute...
PDF
An understanding of graphical perception
PDF
Effect of differential settlement on frame forces a parametric study
PDF
Critical analysis of radar data signal de noising by implementation of haar w...
PDF
Survey on cloud computing security techniques
PDF
Harmonized scheme for data mining technique to progress decision support syst...
PDF
Performance evaluation and emission analysis of 4 s, i.c. engine using ethan...
PDF
A study on modelling and simulation of photovoltaic cells
Reusing of glass powder and industrial waste materials in concrete
Implementation of low power divider techniques using radix
Water quality index with missing parameters
Seismic performance assessment of the torsional effect in asymmetric structur...
Biodiesel as a blended fuel in compression ignition engines
Finite element simulation of hybrid welding process for welding 304 austeniti...
Performance analysis of al fec raptor code over 3 gpp embms network
Study and comparative analysis of resonat frequency for microsrtip fractal an...
Oscillatory motion control of hinged body using controller
A survey on hiding user privacy in location based services through clustering
System identification of a steam distillation pilot scale using arx and narx ...
A mathematical formulation of urban zoning selection criteria in a distribute...
An understanding of graphical perception
Effect of differential settlement on frame forces a parametric study
Critical analysis of radar data signal de noising by implementation of haar w...
Survey on cloud computing security techniques
Harmonized scheme for data mining technique to progress decision support syst...
Performance evaluation and emission analysis of 4 s, i.c. engine using ethan...
A study on modelling and simulation of photovoltaic cells
Ad

Similar to Implementation of cyclic convolution based on fnt (20)

PDF
Gv3512031207
PDF
Id2514581462
PDF
Id2514581462
PDF
Fast Fourier Transform utilizing Modified 4:2 & 7:2 Compressor
PDF
A Novel VLSI Architecture for FFT Utilizing Proposed 4:2 & 7:2 Compressor
PDF
IRJET- Implementation of Reversible Radix-2 FFT VLSI Architecture using P...
PDF
IRJET- VLSI Architecture for Reversible Radix-2 FFT Algorithm using Programma...
PDF
IRJET- Implementation of FIR Filter using Self Tested 2n-2k-1 Modulo Adder
PDF
IRJET- Low Complexity Pipelined FFT Design for High Throughput and Low Densit...
PDF
Iaetsd pipelined parallel fft architecture through folding transformation
PPT
binary representation dd_vahid_sampleslides_Feb06.ppt
PDF
A BINARY TO RESIDUE CONVERSION USING NEW PROPOSED NON-COPRIME MODULI SET
PDF
A BINARY TO RESIDUE CONVERSION USING NEW PROPOSED NON-COPRIME MODULI SET
PDF
High Speed Memory Efficient Multiplier-less 1-D 9/7 Wavelet Filters Based NED...
PPT
dd_sampleslides.ppt
PDF
IRJET - Design and Implementation of FFT using Compressor with XOR Gate Topology
PDF
Fast Fourier Transform
PDF
A Pipelined Fused Processing Unit for DSP Applications
PPT
Basic principle of a systolic system-Convolution
PPT
he organization uses a weak random number generator or an algorithm that gene...
Gv3512031207
Id2514581462
Id2514581462
Fast Fourier Transform utilizing Modified 4:2 & 7:2 Compressor
A Novel VLSI Architecture for FFT Utilizing Proposed 4:2 & 7:2 Compressor
IRJET- Implementation of Reversible Radix-2 FFT VLSI Architecture using P...
IRJET- VLSI Architecture for Reversible Radix-2 FFT Algorithm using Programma...
IRJET- Implementation of FIR Filter using Self Tested 2n-2k-1 Modulo Adder
IRJET- Low Complexity Pipelined FFT Design for High Throughput and Low Densit...
Iaetsd pipelined parallel fft architecture through folding transformation
binary representation dd_vahid_sampleslides_Feb06.ppt
A BINARY TO RESIDUE CONVERSION USING NEW PROPOSED NON-COPRIME MODULI SET
A BINARY TO RESIDUE CONVERSION USING NEW PROPOSED NON-COPRIME MODULI SET
High Speed Memory Efficient Multiplier-less 1-D 9/7 Wavelet Filters Based NED...
dd_sampleslides.ppt
IRJET - Design and Implementation of FFT using Compressor with XOR Gate Topology
Fast Fourier Transform
A Pipelined Fused Processing Unit for DSP Applications
Basic principle of a systolic system-Convolution
he organization uses a weak random number generator or an algorithm that gene...


Implementation of Cyclic Convolution Based on FNT

IJRET: International Journal of Research in Engineering and Technology, ISSN: 2319-1163
Volume: 01 Issue: 03 | Nov-2012, Available @ http://www.ijret.org, page 238

IMPLEMENTATION OF CYCLIC CONVOLUTION BASED ON FNT

A. Laxman1, A. Vamshidhar Reddy2, L. Prakash3, T. Satyanarayana4
1 Asst. Prof., ECE Department, MAHAVEER INST. OF SCIENCE & TECH., AP, India, amgothlaxman@gmail.com
2 Assoc. Prof., ECE Department, RRS COLLEGE OF ENG. & TECH., AP, India, avamshireddy@gmail.com
3 Asst. Prof., ECE Department, DVR COLLEGE OF ENG. & TECH., AP, India, laudiya.prakash@gmail.com
4 Asst. Prof., ECE Department, DVR COLLEGE OF ENG. & TECH., AP, India, satyant234@gmail.com

Abstract

Cyclic convolution is also known as circular convolution. It is simpler to compute and produces fewer output samples than linear convolution. There are many architectures for calculating the cyclic convolution of two signals; implementation using the Fermat Number Transform (FNT) is one of them. A Fermat number is a positive integer of the form $F_n = 2^{2^n} + 1$, where $n$ is a nonnegative integer. A basic property of Fermat numbers is that they can be defined recursively.
This paper presents a cyclic convolution based on the Fermat Number Transform (FNT) in the diminished-1 number system. A Code Conversion method Without Addition (CCWA) and a Butterfly Operation method Without Addition (BOWA) are proposed to perform the FNT and its inverse (IFNT), except for their final stages, in the convolution. The pointwise multiplication in the convolution is accomplished by Modulo $2^n+1$ Partial Product Multipliers (MPPM), whose output partial products are the inputs to the IFNT. Thus modulo $2^n+1$ carry-propagation additions are avoided in the FNT and the IFNT, except in their final stages, and in the modulo $2^n+1$ multiplier. The execution delay of the parallel architecture is reduced because fewer modulo $2^n+1$ carry-propagation additions are needed. Compared with the existing cyclic convolution architecture, the proposed one has better throughput and lower hardware complexity. Synthesis results using 130 nm CMOS technology demonstrate the superiority of the proposed architecture over the reported solution.

Index Terms: FERMAT NUMBER THEORETIC TRANSFORM, BUTTERFLY ARCHITECTURE, PARALLEL ARCHITECTURE FOR CYCLIC CONVOLUTION, and COMPARISON AND RESULTS

1. FERMAT NUMBER THEORETIC TRANSFORM

The cyclic convolution via the FNT is composed of the FNTs, the pointwise multiplications and the IFNT. The FNTs of two sequences $\{a_i\}$ and $\{b_i\}$ produce two sequences $\{A_i\}$ and $\{B_i\}$. Modulo $2^n+1$ multipliers perform the pointwise multiplication between $\{A_i\}$ and $\{B_i\}$ and produce the sequence $\{P_i\}$. The final sequence $\{p_i\}$ is obtained by taking the inverse FNT of the product sequence $\{P_i\}$. Each element of $\{p_i\}$ is in the diminished-1 representation.
The FNT of a length-$N$ sequence $\{x_i\}$ ($i = 0, 1, \ldots, N-1$) is defined as

$$X_k = \sum_{i=0}^{N-1} x_i \, \alpha_N^{\langle ik \rangle} \bmod F_t, \qquad k = 0, 1, \ldots, N-1,$$

where $F_t = 2^{2^t} + 1$ is the $t$th Fermat number, $N$ is a power of 2, and $\alpha_N$ is an $N$th root of unity (i.e. $\alpha_N^N \bmod F_t = 1$ and $\alpha_N^m \bmod F_t \neq 1$ for $1 \le m < N$). The notation $\langle ik \rangle$ means $ik$ modulo $N$.

The inverse FNT is given by

$$x_i = \frac{1}{N} \sum_{k=0}^{N-1} X_k \, \alpha_N^{-\langle ik \rangle} \bmod F_t, \qquad i = 0, 1, \ldots, N-1,$$

where $1/N$ is an element of the finite field or ring of integers that satisfies $(N \cdot 1/N) \bmod F_t = 1$.

The parameters $\alpha$, $F_t$ and $N$ must be chosen carefully, and some conditions must be satisfied so that the FNT possesses the cyclic convolution property. In this project, $\alpha = 2$, $F_t = 2^{2^t} + 1$ and $N = 2 \cdot 2^t$, where $t$ is an integer.

1.1 Important operations

The important operations of the cyclic convolution based on the FNT with the unity root 2 are the CCWA, the BOWA and the MPPM. The CCWA and the BOWA both consist mainly of novel modulo $2^n+1$ 4-2 compressors, which are built from the 4-2 compressor introduced by Nagamatsu. The 4-2 compressor, the novel modulo $2^n+1$ 4-2 compressor and the BOWA are shown in Fig.4.1. In the figure, "X*" denotes the diminished-1 representation of X, i.e. $X^* = X - 1$.
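The transform pair defined in Section 1 can be checked numerically with a direct, quadratic-time evaluation. The following is a minimal sketch, assuming the parameter choice stated in the text ($\alpha = 2$, $N = 2 \cdot 2^t$); the function names `fermat_modulus` and `fnt` are hypothetical, and the values are ordinary residues rather than diminished-1 codes.

```python
def fermat_modulus(t):
    """Return F_t = 2^(2^t) + 1."""
    return (1 << (1 << t)) + 1


def fnt(x, t, inverse=False):
    """Direct O(N^2) evaluation of the FNT (or IFNT) with alpha = 2.

    Assumes len(x) == N == 2 * 2^t, as in the text. Requires Python 3.8+
    for pow() with a negative exponent (modular inverse).
    """
    ft = fermat_modulus(t)
    n = 2 << t  # N = 2 * 2^t
    if len(x) != n:
        raise ValueError("sequence length must be 2 * 2^t")
    # The inverse transform uses alpha^-1 as the root of unity.
    root = pow(2, -1, ft) if inverse else 2
    out = [
        sum(x[i] * pow(root, (i * k) % n, ft) for i in range(n)) % ft
        for k in range(n)
    ]
    if inverse:
        n_inv = pow(n, -1, ft)  # the element 1/N with (N * 1/N) mod F_t = 1
        out = [(v * n_inv) % ft for v in out]
    return out


if __name__ == "__main__":
    x = [3, 1, 4, 1, 5, 9, 2, 6]      # N = 8, F_t = 17 (t = 2)
    X = fnt(x, 2)
    print(X, fnt(X, 2, inverse=True))  # second list reproduces x
```

Because the input values are smaller than $F_t$, applying `fnt` and then `fnt(..., inverse=True)` returns the original sequence.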
1.2 Code conversion without addition

The CC converts normal binary numbers (NBCs) into their diminished-1 representation. It is the first stage in the FNT. The delay and area of the CC for a 2n-bit NBC are no less than those of two n-bit carry-propagation adders.

Fig-1 Elementary operations of FNT architecture with unity root 2, (a) 4-2 compressor
Fig-2 modulo $2^n+1$ 4-2 compressor

To reduce the cost, we propose the CCWA, which is performed by the modulo $2^n+1$ 4-2 compressor. Let A and B represent two operands whose widths are no more than 2n bits. We split the operands and define the following variables:

$$A = 2^n A_H + A_L$$
$$B = 2^n B_H + B_L$$
$$M_0 = (2^n - 1) - A_H = \bar{A}_H$$
$$M_1 = (2^n - 1) - B_H = \bar{B}_H$$
$$M_2 = (2^n - 1) - B_L = \bar{B}_L$$

If the operation following the CC is modulo $2^n+1$ addition, assign $A_L$, $M_0$, $B_L$ and $M_1$ to $I_0$, $I_1$, $I_2$, $I_3$ of the modulo $2^n+1$ 4-2 compressor respectively. $I_0$, $I_1$, $I_2$, $I_3$ are the n-bit vectors

$$I_0 = I_{0(n-1)} I_{0(n-2)} \ldots I_{01} I_{00}$$
$$I_1 = I_{1(n-1)} I_{1(n-2)} \ldots I_{11} I_{10}$$
$$I_2 = I_{2(n-1)} I_{2(n-2)} \ldots I_{21} I_{20}$$
$$I_3 = I_{3(n-1)} I_{3(n-2)} \ldots I_{31} I_{30}$$

We obtain the sum vector $H_0^*$ and the carry vector $H_1^*$ in the diminished-1 number system. The most significant bit of $H_1^*$ is complemented and connected back to its least significant bit. That is to say,

$$H_0^* = H_{0(n-1)} H_{0(n-2)} \ldots H_{01} H_{00}$$
$$H_1^* = H_{1(n-2)} \ldots H_{11} H_{10} \bar{H}_{1(n-1)}$$

The result of the modulo $2^n+1$ addition of $A^*$ and $B^*$ is equal to the result of the modulo $2^n+1$ addition of $H_0^*$ and $H_1^*$. In this way, A and B are converted into their equivalent diminished-1 representations $H_0^*$ and $H_1^*$. Let $|A^* + B^*|_{2^n+1}$, $|\bar{A}^*|_{2^n+1}$, $|A^* - B^*|_{2^n+1}$ and $|A^* \times 2^i|_{2^n+1}$ denote modulo $2^n+1$ addition, negation, subtraction and multiplication by a power of 2 respectively, as originally proposed by Leibowitz.
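The choice of compressor inputs in Section 1.2 rests on the congruence $2^n \equiv -1 \pmod{2^n+1}$, which gives $-A_H \equiv M_0 + 2$ and hence $A \equiv A_L + M_0 + 2$. The sketch below only verifies that arithmetic identity for the four operands; it does not model the 4-2 compressor itself, and how the hardware absorbs the leftover constant 4 is not reproduced here. The function names are hypothetical.

```python
def split_operands(a, b, n):
    """Split two 2n-bit operands into n-bit halves and form M0, M1, M2."""
    mask = (1 << n) - 1
    a_h, a_l = a >> n, a & mask
    b_h, b_l = b >> n, b & mask
    m0 = mask - a_h  # (2^n - 1) - A_H, the bitwise complement of A_H
    m1 = mask - b_h  # (2^n - 1) - B_H
    m2 = mask - b_l  # (2^n - 1) - B_L
    return a_l, a_h, b_l, b_h, m0, m1, m2


def ccwa_inputs(a, b, n, subtract=False):
    """Return the four n-bit operands (I0, I1, I2, I3) named in the text.

    Addition:    A_L, M0, B_L, M1
    Subtraction: A_L, M0, M2,  B_H
    """
    a_l, a_h, b_l, b_h, m0, m1, m2 = split_operands(a, b, n)
    if subtract:
        return a_l, m0, m2, b_h
    return a_l, m0, b_l, m1


if __name__ == "__main__":
    n = 4
    modulus = (1 << n) + 1
    a, b = 0xB7, 0x3C
    # The four operands sum to A + B (or A - B) up to the constant 4.
    print((sum(ccwa_inputs(a, b, n)) + 4) % modulus, (a + b) % modulus)
    print((sum(ccwa_inputs(a, b, n, subtract=True)) + 4) % modulus,
          (a - b) % modulus)
```

In both cases the two printed numbers agree, which is why the same four-input compressor can serve the addition and the subtraction branch by swapping only two of its inputs.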
If the subsequent operation is modulo $2^n+1$ subtraction, we assign $A_L$, $M_0$, $M_2$ and $B_H$ to $I_0$, $I_1$, $I_2$, $I_3$ respectively. Then $H_0^*$ and $H_1^*$ in the modulo $2^n+1$ 4-2 compressor constitute the result of the CCWA. After the CCWA, we obtain a result consisting of two diminished-1 numbers. The result also includes the information of the modulo $2^n+1$ addition or subtraction in the first stage of the previous BO.

1.3 Diminished-One modulo $2^n+1$ Adder Design

Modulo $2^n+1$ adders are also utilized as the last-stage adder of modulo $2^n+1$ multipliers. Modulo $2^n+1$ multipliers find applicability in pseudorandom number generation, cryptography, and in the Fermat number transform, which is an effective way to compute convolutions.

Leibowitz has proposed the diminished-one number system. In the diminished-one number system, each number X is represented by $X^* = X - 1$. The representation of 0 is treated in a special way. Since the adoption of this system leads to modulo $2^n+1$ adders and multipliers with n-bit-wide operands, it has been used for many residue number system implementations. Efficient VLSI implementations of modulo $2^n+1$ adders for the diminished-one number system have recently been presented. These adders, although fast, are, according to the comparison presented, still slower than the fastest modulo $2^n$ adders or the fastest modulo $2^n-1$ adders.

In this project, we derive two new design methodologies for modulo $2^n+1$ adders in the diminished-one number system. The first one leads to traditional Carry Look-Ahead (CLA) architectures, while the second leads to parallel-prefix adder architectures. Using implementations in a static CMOS technology, we show that the proposed CLA adder design methodology leads to more area- and time-efficient implementations than those previously presented, for small operand widths. For wider operands, the proposed parallel-prefix design methodology leads to considerably faster adder implementations than those previously presented, and as fast as the integer or the modulo $2^n+1$ architecture presented.

2. BUTTERFLY ARCHITECTURE

2.1 Butterfly operation without addition

After the CCWA, we obtain the results of modulo $2^n+1$ addition and subtraction in the diminished-1 representation. Each result consists of two diminished-1 values. The subsequent butterfly operation involves four operands. The proposed BOWA involves two modulo $2^n+1$ 4-2 compressors, a multiplier and some inverters.

Fig-3 Butterfly operation without addition

The multiplication by an integer power of 2 in the diminished-1 number system in the BOWA is trivial: it is performed by left-shifting the low-order $n-i$ bits of the number by $i$ bit positions, then inverting the high-order $i$ bits and circulating them into the $i$ least significant bit positions.
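The shift-and-invert rule for multiplying a diminished-1 number by $2^i$ can be written in a few lines. This is a minimal sketch, assuming a nonzero operand (the text notes that zero is handled separately) and $0 \le i \le n$; the function name `dim1_mul_pow2` is hypothetical.

```python
def dim1_mul_pow2(x_star, i, n):
    """Multiply a diminished-1 number by 2^i modulo 2^n + 1.

    x_star is the n-bit diminished-1 code X* = X - 1 of a nonzero X.
    The low n-i bits are shifted left by i positions; the high i bits
    are inverted and circulated into the i least significant positions.
    """
    mask = (1 << n) - 1
    high = x_star >> (n - i)               # the i high-order bits
    low = (x_star << i) & mask             # low n-i bits, shifted left by i
    inv_high = ~high & ((1 << i) - 1)      # inverted high-order bits
    return low | inv_high


if __name__ == "__main__":
    n = 4                                  # modulus 2^4 + 1 = 17
    x = 7
    y_star = dim1_mul_pow2(x - 1, 3, n)
    print(y_star + 1, (x * 8) % 17)        # both values are 5
```

The rule follows from $Y^* = 2X - 1 = 2X^* + 1$ for a single doubling: when the top bit of $X^*$ is 0 the shifted word gains a 1 in the least significant position, and when it is 1 the reduction by $2^n+1$ leaves a 0 there, so no carry-propagation adder is needed.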
Thus the BOWA can be performed without a carry-propagation chain, which reduces both the delay and the area. $K^*$ and $L^*$ correspond to the two inputs, and $M^*$ and $N^*$ to the two outputs, of the previous BO in the diminished-1 number system. The outputs are given by

$$M^* = |K^* + L^* \times 2^i|_{2^n+1}$$
$$N^* = |K^* - L^* \times 2^i|_{2^n+1}$$

[The remaining steps of these two equations and the definitions following "where" are not recoverable from the source.]

2.2 The FNT Architecture

In the previous sections, we have presented the reconfiguration at a rather low level. The butterfly constitutes a highly parameterized function level. Having this parameterized function allows us to design a reconfigurable operator in which the butterfly forms the highest-level operator. Figure 4 depicts the global reconfigurable operator. Over C it is called the FFT, and over GF($F_t$) it is called the FNT. This architecture has been validated in software: a simple test computing the FFT and IFFT showed the validity of this structure.

Fig-4 The architecture of FNT operator

2.3 Modulo $2^n+1$ Partial Product Multiplier

For the modulo $2^n+1$ multiplier proposed by Efstathiou, there are $n+3$ partial products, which are derived by simple AND and NAND gates. An FA-based Dadda tree then reduces the $n+3$ partial products to two summands. Finally, a modulo $2^n+1$ adder for diminished-1 operands accepts these two summands and produces the required product.

In the proposed parallel architecture for cyclic convolution based on FNT, the BOWA can accept four operands in the diminished-1 number system. Every pointwise multiplication therefore only needs to produce two partial products rather than one product. This is accomplished by removing the final modulo $2^n+1$ adder of the two partial products from the multiplier. Thus the final modulo $2^n+1$ adder is omitted, and the modulo $2^n+1$ partial product multiplier is employed to save delay and area.

3. PARALLEL ARCHITECTURE FOR CYCLIC CONVOLUTION

Based on the CCWA, the BOWA and the MPPM, we design the whole parallel architecture for the cyclic convolution based on FNT as shown in Fig.6.1. It mainly includes the FNTs, the pointwise multiplication and the IFNT. The FNTs of the two input sequences $\{a_i\}$ and $\{b_i\}$ produce two sequences $\{A_i\}$ and $\{B_i\}$ ($i = 0, 1, \ldots, N-1$). The sequences $\{A_i\}$ and $\{B_i\}$ are sent to N MPPMs, which perform the pointwise multiplication and produce N pairs of partial products. Then the IFNT of the partial products is performed to produce the resulting sequence $\{p_i\}$ of the cyclic convolution.
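The data flow of Section 3 (two forward FNTs, N pointwise products, one IFNT) can be mirrored in software and compared against the defining sum of a cyclic convolution. The following is a minimal sketch, assuming $\alpha = 2$ and $N = 2 \cdot 2^t$ as stated in Section 1; it works on ordinary residues modulo $F_t$, not on diminished-1 partial products, and all function names are hypothetical.

```python
def _transform(x, ft, root):
    """Direct evaluation of sum_i x[i] * root^(i*k) mod ft for each k."""
    n = len(x)
    return [
        sum(x[i] * pow(root, i * k, ft) for i in range(n)) % ft
        for k in range(n)
    ]


def cyclic_convolution_fnt(a, b, t):
    """Cyclic convolution modulo F_t via FNT -> pointwise product -> IFNT."""
    ft = (1 << (1 << t)) + 1
    n = 2 << t  # N = 2 * 2^t
    if len(a) != n or len(b) != n:
        raise ValueError("both sequences must have length 2 * 2^t")
    A = _transform(a, ft, 2)
    B = _transform(b, ft, 2)
    P = [(u * v) % ft for u, v in zip(A, B)]        # pointwise multiplication
    inv_root = pow(2, -1, ft)                        # Python 3.8+
    n_inv = pow(n, -1, ft)
    return [(v * n_inv) % ft for v in _transform(P, ft, inv_root)]


def cyclic_convolution_direct(a, b, modulus):
    """Reference result: p[i] = sum_j a[j] * b[(i - j) mod N] mod modulus."""
    n = len(a)
    return [
        sum(a[j] * b[(i - j) % n] for j in range(n)) % modulus
        for i in range(n)
    ]


if __name__ == "__main__":
    a = [1, 2, 3, 4, 0, 0, 0, 0]
    b = [1, 1, 0, 0, 0, 0, 0, 0]
    print(cyclic_convolution_fnt(a, b, 2))           # [1, 3, 5, 7, 4, 0, 0, 0]
    print(cyclic_convolution_direct(a, b, 17))       # same list
```

The result is exact only modulo $F_t$, so the modulus has to be chosen large enough for the largest expected output sample.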
Fig-5 Parallel architecture for the cyclic convolution based on FNT

In the architecture, the radix-2 decimation-in-time (DIT) algorithm, which is by far the most widely used algorithm, is employed to perform the FNT and the IFNT.

Fig-6 Structures for FNT and IFNT ($F_t = 2^8+1$) [figure labels: 16 ordinary binary operands; 16 pairs of diminished-1 operands]

The efficient FNT structure involves $\log_2 N + 1$ stages of operations. The original operands are converted into the diminished-1 representation in the CCWA stage, which also contains the information of the modulo $2^n+1$ addition or subtraction in the first butterfly stage of the previous FNT structure. The results are then sent to the next stage of BOWA. After $\log_2 N - 1$ stages of BOWAs, results composed of two diminished-1 operands are obtained. The final stage of the FNT consists of modulo $2^n+1$ carry-propagation adders, which evaluate the final results in the diminished-1 representation. The CCWA stage, the BOWA stage and the modulo $2^n+1$ addition stage in the FNT involve N/2 couples of code conversions (including the information of modulo $2^n+1$ addition and subtraction), N/2 butterfly operations and N/2 couples of modulo $2^n+1$ additions respectively.

From the definitions of the FNT and the IFNT in Section 1, the only differences between the FNT and the IFNT are the normalization factor $1/N$ and the sign of the exponent of the phase factor $\alpha_N$. Ignoring the normalization factor $1/N$, the IFNT formula is the same as that of the FNT, except that all transform coefficients $\alpha_N^{ik}$ used for the FNT are replaced by $\alpha_N^{-ik}$ for the IFNT computation. The proposed FNT structure can therefore also be used to compute the IFNT with little modification, as shown in Fig. 6.2(b). After the IFNT of the N-point bit-reversed input data, the interim results are multiplied by $1/N$ in the finite field or ring. Then x[j] and x[j+N/2] (j = 1, 2, …, N/2-1) exchange their positions to produce the final results of the IFNT in natural order.

Our architecture for the cyclic convolution gives good speed performance without requiring complicated control. Furthermore, it is very suitable for implementing the overlap-save and overlap-add techniques, which are used to reduce a long linear convolution to a series of short cyclic convolutions.

3.1 Novel parallel architecture for FNT

Since the FNT has a mathematical structure similar to that of the FFT, an FFT-type structure can be applied to perform a fast FNT. The radix-2 decimation-in-time (DIT) fast algorithm, which is by far the most widely used FFT algorithm, is employed. With the input data sequence stored in bit-reversed order and the CSBO performed in place, the resulting FNT sequence is obtained in natural order. The novel architecture is shown in Fig-7 for the case where the transform length is 8 and the modulus is $2^4+1$. In Fig-7, MA and MS denote "modulo $2^n+1$ addition" and "modulo $2^n+1$ subtraction" respectively. The novel parallel N-point FNT architecture with the unity root 2 is composed of one stage of CSCC, $\log_2 N - 1$ stages of CSBO and one stage of modulo $2^n+1$ carry-propagation addition. The final stage is used to evaluate the final results, each of which is a diminished-1 value.

Fig-7 Novel architecture for 8-point FNT with Modulus $2^4+1$

The existing parallel N-point FNT architecture with the unity root 2 consists of one stage of CC and $\log_2 N$ stages of BO.
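The radix-2 DIT structure of Section 3.1 (bit-reversed input, in-place butterflies, natural-order output) can be sketched in software as follows. This is an illustrative model that assumes $\alpha = 2$ and $N = 2 \cdot 2^t$ and uses ordinary modular additions in every butterfly; it does not model the carry-save CSCC/CSBO stages or the diminished-1 coding, and the function names are hypothetical.

```python
def bit_reverse_permute(x):
    """Return a copy of x (length a power of 2) in bit-reversed order."""
    n = len(x)
    bits = n.bit_length() - 1
    out = [0] * n
    for i in range(n):
        r = 0
        for b in range(bits):
            if (i >> b) & 1:
                r |= 1 << (bits - 1 - b)
        out[r] = x[i]
    return out


def fnt_dit(x, t):
    """Radix-2 decimation-in-time FNT modulo F_t = 2^(2^t) + 1, alpha = 2."""
    ft = (1 << (1 << t)) + 1
    n = 2 << t  # N = 2 * 2^t
    if len(x) != n:
        raise ValueError("sequence length must be 2 * 2^t")
    a = bit_reverse_permute(x)
    size = 2
    while size <= n:                       # log2(N) butterfly stages
        half = size // 2
        w_step = pow(2, n // size, ft)     # root of unity of order `size`
        for start in range(0, n, size):
            w = 1                          # twiddle factor, always a power of 2
            for j in range(half):
                u = a[start + j]
                v = (a[start + j + half] * w) % ft
                a[start + j] = (u + v) % ft          # "MA" branch
                a[start + j + half] = (u - v) % ft   # "MS" branch
                w = (w * w_step) % ft
        size *= 2
    return a


if __name__ == "__main__":
    print(fnt_dit([3, 1, 4, 1, 5, 9, 2, 6], 2))  # 8-point FNT modulo 17
```

Every twiddle factor here is a power of 2, which is what lets the hardware replace general multiplications in the butterflies by the shift-based operation of Section 2.1.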
Both architectures involve the same numbers of code conversion stages and butterfly stages, except for their final stages. The proposed CSCC and CSBO avoid carry-propagation addition and do not require a zero indicator, so they are more area- and delay-efficient than the CC and the BO respectively. The costs of the final stages of both architectures are approximately equal, since every BO is composed chiefly of two parallel modulo $2^n+1$ adders. Thus the proposed parallel FNT architecture is more efficient than the existing one. The novel architecture is also very suitable for implementing the overlap-save and overlap-add techniques, which are used to reduce a long linear convolution to a series of short cyclic convolutions.

4. COMPARISON AND RESULTS

The delay and area estimations of the modulo $2^n+1$ adder and the modulo $2^n+1$ multiplier in the cyclic convolution are given in Table 1 as a function of the operand size n. "D(n+3)" in Table 1 is defined as shown in Table 4.2.

Table-1 Area and delay estimations for arithmetic modulo $2^n+1$ operators

| Operator | Area (this project) | Area [3] | Delay (this project) | Delay [3] |
|---|---|---|---|---|
| MA | $14n$ | $\frac{9}{2} n \log n + \frac{n}{2} + 6$ | $8$ | $2 \log n + 3$ |
| MM | $8n^2 + n - 1$ | $\frac{9}{2} n \log n + 8n^2 + \frac{n}{2} + 4$ | $4D(n+3) + 1$ | $4D(n+3) + 2 \log_2 n + 3$ |

"MA" and "MM" represent the modulo $2^n+1$ adder and multiplier respectively. Tables 1 and 2 indicate that for $F_t \ge 2^8+1$ the proposed architecture comprising the CCWA and the BOWA requires less delay and area than the previous one. The former achieves a 12.6% reduction in area and a 26% reduction in delay compared with the latter when $F_t$ is $2^{32}+1$ and the transform length is 64. Moreover, the advantage of our algorithm grows with the modulus width.

Table-2 Area and delay results of cyclic convolution based on FNT

| $F_t$ | Area, this project (µm²) | Area [3] (µm²) | Delay, this project (ns) | Delay [3] (ns) |
|---|---|---|---|---|
| $2^8+1$ | $3.5 \times 10^5$ | $3.5 \times 10^5$ | 8.9 | 9.9 |
| $2^{16}+1$ | $1.86 \times 10^6$ | $2.05 \times 10^6$ | 11.6 | 14.4 |
| $2^{32}+1$ | $1.08 \times 10^7$ | $1.24 \times 10^7$ | 15.1 | 20.4 |
CONCLUSIONS

Modern programmable structures make it possible to implement DSP algorithms in dedicated embedded blocks, which makes designing such algorithms an easy task. However, the flexibility of programmable structures also enables more advanced implementation methods to be used. In particular, exploiting the parallelism in the algorithm to be implemented may yield very good results. A novel parallel architecture for the cyclic convolution based on FNT is proposed for the case where the principal root of unity is equal to 2 or an integer power of 2. The FNT and the IFNT are accomplished mainly by the CCWA and the BOWA. The pointwise multiplication is performed by the modulo $2^n+1$ partial product multiplier. Thus there are very few modulo $2^n+1$ carry-propagation additions compared with the existing cyclic convolution architecture. A theoretical model was applied to assess the efficiency independently of the target technology. VLSI implementations using a 0.13 µm standard cell library show that the proposed parallel architecture attains lower area and delay than the existing solution when the modulus is no less than $2^8+1$.

REFERENCES:

[1] C. Cheng, K. K. Parhi, "Hardware efficient fast DCT based on novel cyclic convolution structures", IEEE Trans. Signal Processing, 2006, 54(11), pp. 4419-4434
[2] H. C. Chen, J. I. Guo, T. S. Chang, et al., "A memory efficient realization of cyclic convolution and its application to discrete cosine transform", IEEE Trans. Circuits and Systems for Video Technology, 2005, 15(3), pp. 445-453
[3] R. Conway, "Modified Overlap Technique Using Fermat and Mersenne Transforms", IEEE Trans. Circuits and Systems II: Express Briefs, 2006, 53(8), pp. 632-636
[4] A. B. O'Donnell, C. J. Bleakley, "Area efficient fault tolerant convolution using RRNS with NTTs and WSCA", Electronics Letters, 2008, 44(10), pp. 648-649
[5] H. H. Alaeddine, E. H. Baghious, G. Madre, et al., "Realization of multi-delay filter using Fermat number transforms", IEICE Trans. Fundamentals, 2008, E91A(9), pp. 2571-2577
[6] N. S. Rubanov, E. I. Bovbel, P. D. Kukharchik, V. J. Bodrov, "Modified number theoretic transform over the direct sum of finite fields to compute the linear convolution", IEEE Trans. Signal Processing, 1998, 46(3), pp. 813-817
[7] T. Toivonen, J. Heikkila, "Video filtering with Fermat number theoretic transforms using residue number system", IEEE Trans. Circuits and Systems for Video Technology, 2006, 16(1), pp. 92-101
[8] L. M. Leibowitz, "A simplified binary arithmetic for the Fermat number transform", IEEE Trans. Acoustics, Speech and Signal Processing, 1976, 24(5), pp. 356-359

BIOGRAPHIES:

A. LAXMAN received the M.Tech degree from Srinidhi Institute of Technology, Ghatkesar, Hyderabad. Currently he is working as an Asst. Prof. in MAHAVEER Inst. of Science and Technology, Hyderabad.

A. VAMSHIDHAR REDDY received the M.Tech degree from Bandari Srinivas Institute of Tech., Chevella, Hyderabad. Currently he is working as an Assoc. Prof. in RRS College of Eng. and Tech., Hyderabad.

L. PRAKASH received the M.Tech degree from TRR College of Eng. and Tech., Hyderabad. Currently he is working as an Assoc. Prof. in DVR College of Eng. and Tech., Hyderabad.

T. SATYANARAYANA received the M.Tech degree from JBIT, Moinabad, Hyderabad. Currently he is working as an Assoc. Prof. in DVR College of Eng. and Tech., Hyderabad.