Residue Number systems
P.V. Ananda Mohan
FNAE, Fellow IEEE
pvam@vsnl.net
IEEE CAS Chapter
8th March 2008
Bangalore
Why RNS
• Using several processors in parallel, some operations can be faster.
[Figure: block diagram. The input binary number feeds the binary-to-RNS converters, one per modulus; each residue ri goes to its own Mod mi processor (i = 1, 2, ..., j), which executes the instruction and produces output Oi; an RNS-to-binary converter combines O1, O2, ..., Oj into the result.]
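The data flow in the block diagram can be sketched in a few lines of Python. This is only an illustration: the moduli {3,5,7}, the function names and the brute-force reverse converter are assumptions made for the demo, not part of the talk.

```python
from math import prod

MODULI = (3, 5, 7)  # assumed pairwise-coprime moduli; dynamic range M = 105


def to_rns(x, moduli=MODULI):
    """Binary-to-RNS conversion: one residue per channel."""
    return tuple(x % m for m in moduli)


def rns_mul_add(a, b, c, moduli=MODULI):
    """Compute a*b + c channel by channel; no carries cross between channels."""
    return tuple((ai * bi + ci) % m for ai, bi, ci, m in zip(a, b, c, moduli))


def from_rns(residues, moduli=MODULI):
    """RNS-to-binary conversion by exhaustive search (only sensible for a tiny range)."""
    M = prod(moduli)
    return next(x for x in range(M) if to_rns(x, moduli) == tuple(residues))


result = from_rns(rns_mul_add(to_rns(6), to_rns(7), to_rns(10)))
print(result)  # 6*7 + 10 = 52, which fits in the dynamic range 105
```

Each channel works on small numbers independently, which is where the potential speed-up comes from; the conversions at both ends are the overhead discussed in the following slides.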
Points to be considered
• Choice of moduli set
• Computation time and area requirements for the
following blocks:
• Binary to RNS conversion
• RNS to Binary conversion
• Multiplication
• Scaling
• Base extension
• Sign detection
• Comparison
Binary to RNS conversion
• (a) Conventional method: divide to get the residue and throw away the quotient.
• Very time consuming.
• Example: (1000 0001 1010) mod 13?
• 2074 mod 13 = 7.
• (b) Iterative reduction mod mi
• (Capocelli and Giancarlo)
• Start with the LSBs. Store the residues of powers of two in memory and keep accumulating them mod 13 until the last bit is reached:
• 1,2,4,8,3,6,12,11,9,5,10,7
• Example: (1000 0001 1010) mod 13?
• The last three bits (010 = 2) need no reduction, so they can be skipped.
• (2 + 2^3) mod 13 = 2 + 8 = 10
• (10 + 2^4) mod 13 = (10 + 3) mod 13 = 0, and so on
• Hardware needed: a modulo adder and a memory containing the residues of powers of 2 mod 13.
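A minimal Python sketch of this table-and-accumulate scheme follows. The function name and the bit-string input format are assumptions for illustration; in hardware the table would be a small memory and the conditional subtraction a modulo adder.

```python
def residue_by_table(bits, m):
    """Accumulate the stored residue of 2**i for every bit i that is 1.

    bits: binary string, MSB first (spaces are not allowed).
    """
    # Table of 2**i mod m for i = 0 .. len(bits)-1 (the "memory").
    table = []
    r = 1 % m
    for _ in bits:
        table.append(r)
        r = (2 * r) % m

    acc = 0
    for weight, bit in zip(table, reversed(bits)):  # start with the LSB
        if bit == "1":
            acc += weight
            if acc >= m:          # modulo adder: one conditional subtraction
                acc -= m
    return acc


print(residue_by_table("100000011010", 13))  # 2074 mod 13
```

For the 12-bit example the table is exactly the list 1,2,4,8,3,6,12,11,9,5,10,7 given above.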
• (c) Use periodic properties of moduli
• For example, consider modulus 13.
• Residues of powers of two are (1,2,4,8,3,6), (12,11,9,5,10,7), (1,2,4,8,...) etc.
• Note the periodic property: the second half of each period is the negative of the first half
• (1,2,4,8,3,6), (-1,-2,-4,-8,-3,-6), (1,2,4,8,3,6), (-1,-2,-4,-8,-3,-6)
Consider mod 89
• Residues of successive powers of two are
1,2,4,8,16,32,64,39,78,67,45,
1,2,4,8,16,32,64,39,78,67,45,
• Thus period (or order) is 11
• i.e. 2^11 mod 89 = 1
• Implementation: group the input bits into fields based on the period or the half period.
• If based on the period, add all the period-length fields modulo 2^11 - 1 (i.e. with end-around carry, since 2^11 mod 89 = 1) and then use one binary-to-RNS converter of Capocelli and Giancarlo.
• If based on the half period, add all odd fields and add all even fields, compute the difference of the two sums, and then use the Capocelli and Giancarlo method.
• Example
• 2074 mod 13= (100000 011010) mod 13
• = (26-32) mod 13 = -6 mod 13 = 7.
• 2074 mod 7 = (100 000 011 010) mod 7
• = (4+0+3+2) mod 7 = 2
• For the full-period case, use adders with end-around carry (EAC); for the half-period case, use two adders with EAC.
• Delay is (2+3+2)D
Carry-save addition of the four 3-bit fields of 2074 (100, 000, 011, 010) mod 7:
   100
   000
   011
  -----
   111   Sum
  0000   Carry
   010
  -----
   101   Sum
  0100   Carry
  -----
  1001
     1   (end-around carry)
  -----
   010
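The full-period and half-period folding can be sketched in Python as below. The function names are hypothetical; the final small reduction (done here with `%`) stands in for the Capocelli and Giancarlo converter mentioned above.

```python
def fold_full_period(x, period):
    """Add period-wide fields with end-around carry, i.e. modulo 2**period - 1."""
    mask = (1 << period) - 1
    while x > mask:
        x = (x & mask) + (x >> period)   # a carry out re-enters at the LSB
    return x


def fold_half_period(x, half):
    """Alternately add and subtract half-period fields (2**half is -1 mod m)."""
    mask = (1 << half) - 1
    total, sign = 0, 1
    while x:
        total += sign * (x & mask)
        sign = -sign
        x >>= half
    return total


# 2074 mod 7: period 3, fields 100 000 011 010
print(fold_full_period(2074, 3) % 7)
# 2074 mod 13: half period 6, fields 100000 011010 -> 26 - 32 = -6
print(fold_half_period(2074, 6) % 13)
```

The folded value is only a few bits wide, so the last reduction step is cheap compared with a division of the full word.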
Modulo adders and subtractors
• (X+Y) mod mi = (X+Y) or (X+Y-mi)
• (X-Y) mod mi = (X-Y) or (X-Y+mi)
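[Figure: cascade of adders. An n-bit adder forms (X+Y); an (n+1)-bit adder adds the two's complement of mi, i.e. (2^n - mi); the sign bit of the second sum drives the select input of a 2:1 MUX, which outputs (X+Y) mod mi.]
Delay = n·DFA + (n+1)·DFA + DMUX
Area = n·AFA + (n+1)·AFA + n·A2:1MUX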
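In software the same select-by-sign idea looks as follows. This is a sketch assuming both operands are already reduced (0 <= X, Y < mi); the function names are illustrative.

```python
def mod_add(x, y, m):
    """(x + y) mod m: form both candidates and select by the sign of the second."""
    s = x + y
    t = s - m          # in hardware: s plus the two's complement of m
    return s if t < 0 else t


def mod_sub(x, y, m):
    """(x - y) mod m: add m back only when the difference is negative."""
    d = x - y
    return d + m if d < 0 else d


print(mod_add(9, 7, 13), mod_sub(3, 7, 13))
```

Because the operands are below mi, one conditional correction is always enough and no division is needed.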
Faster Adder Implementations
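• The subtractor is the same, but the two's complement of the input is added.
[Figure: faster modulo adder built from an n-bit adder producing (X+Y), an (n+1)-bit adder that also takes the two's complement of mi, i.e. (2^n - mi), and a 2:1 MUX whose select input is the sign bit; the output is (X+Y) mod mi.]
Delay = (n+2)·DFA + DMUX
Area = n·AFA + 2(n+1)·AFA + n·A2:1MUX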
Modulo Multipliers
• Area: multiplier + divider
• Delay: multiplier + divider
• The divider can be restoring or non-restoring.
• Word length of the processor: 2n bits
[Figure: X and Y enter a multiplier that forms XY; a divider divides XY by mi; the quotient is thrown away and the remainder is the result.]
Brickell’s Algorithm based Modulo
Multipliers
• Maximum word length (n+1) bits for taking
one bit at a time.
• Higher radix feasible.
• Area intensive
• Other methods exist such as using
Redundant Arithmetic, non-overlapping
multibit recoding
• Example: (13 × 15) mod 23
• We do not want to compute this in the straightforward manner (multiply, then divide).
• Write b = 13 in binary form:
• b3b2b1b0 = 1101
• Do repeatedly, starting from the MSB:
• Old = (2·Old + bi·A) mod 23
EXAMPLE
• b3b2b1b0 = 1101; A = 15, mi = 23
• P = (2×0 + 1×15) mod 23 = 15
• P = (2×15 + 1×15) mod 23 = 22
• P = (2×22 + 0×15) mod 23 = 21
• P = (2×21 + 1×15) mod 23 = 11
• Before reduction, the maximum value of P is less than 3(23), i.e. 3mi.
• Modulo reduction is by two comparisons:
• Is P ≥ mi? Is P ≥ 2mi?
• The answer is one of P, P-mi, P-2mi; choose based on the signs of P-mi and P-2mi.
• Example: 45 mod 23; the candidates are 45, 45-23 = 22, 45-46 = -1; since P-2mi is negative and P-mi is positive, P-mi is the correct result.
• Multiple-precision arithmetic is to be used in PC-based implementations.
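The recurrence and the two-comparison reduction can be written as a short Python sketch. It is a radix-2 illustration in the spirit of the slide, not a full statement of Brickell's algorithm; the function name and the requirement 0 <= A < mi are assumptions.

```python
def interleaved_modmul(a, b, m, nbits):
    """(a*b) mod m, scanning b from the MSB one bit per step; assumes 0 <= a < m."""
    p = 0
    for i in reversed(range(nbits)):
        p = 2 * p + ((b >> i) & 1) * a   # p < 3m at this point
        if p - 2 * m >= 0:               # sign of P - 2mi
            p -= 2 * m
        elif p - m >= 0:                 # sign of P - mi
            p -= m
    return p


print(interleaved_modmul(15, 13, 23, 4))  # intermediate values: 15, 22, 21, 11
```

The intermediate value never exceeds 3mi, so the word length grows by only a couple of bits over n, in contrast to the 2n-bit product of the multiply-then-divide approach.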
Architecture for Modmul
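[Figure: datapath. Old is shifted to give 2·Old (its LSB is zero) and an (n+2)-bit adder adds A gated by bi; two further adders add the two's complement (TC) of mi and the TC of 2mi; a 3:1 MUX selects among the three candidates and latches hold the result for the next iteration.]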
ModMUL
• Computation time= n[(n+2)DFA+DMux]
• Area = 3(n+2)AFA+A3:1MUX+nAAND
Modmul for IDEA
• IDEA (International Data Encryption Algorithm) uses (xy) mod (2^16 + 1) as a programmable S-Box (Substitution Box), where x and y are 16-bit words.
• Ideal for DSPs
• Get P = xy, a 32-bit word.
• Subtract the MSB 16-bit word from the LSB 16-bit word. If the result is negative, add (2^16 + 1).
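A minimal sketch of this low-minus-high reduction in Python follows. It works because 2^16 is congruent to -1 modulo 2^16 + 1. The function name is hypothetical, and the IDEA convention of letting the all-zero word stand for 2^16 is deliberately not handled here.

```python
def mul_mod_65537(x, y):
    """(x*y) mod (2**16 + 1) for 16-bit x and y, without a division."""
    p = x * y                      # 32-bit product
    lo, hi = p & 0xFFFF, p >> 16   # LSB and MSB 16-bit words
    r = lo - hi                    # 2**16 = -1 (mod 2**16 + 1)
    if r < 0:
        r += 0x10001               # add 2**16 + 1
    return r


print(mul_mod_65537(65535, 65535))
```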
RNS to Binary Conversion
• CRT based
• MRC based
• CRT: RNS {m1,m2,m3}, residues {x1,x2,x3}
• Define M = m1m2m3 and Mi = M/mi
• Decoded binary number X
• = [M1{(1/M1) mod m1}x1 + M2{(1/M2) mod m2}x2 + M3{(1/M3) mod m3}x3] mod M
• e.g. {3,5,7}: M = 105, M1 = 35, M2 = 21, M3 = 15
• (1/35) mod 3 = 2, (1/21) mod 5 = 1, (1/15) mod 7 = 1.
• X = [70x1 + 21x2 + 15x3] mod 105
• Consider (1,2,3): X = (70+42+45) mod 105 = 157 mod 105 = 52
• Generally the Mi are large; the constants Mi{(1/Mi) mod mi} are stored, and the conversion involves multiplying these large numbers by xi in parallel and adding.
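The CRT formula translates directly into Python. This sketch assumes pairwise-coprime moduli and Python 3.8 or later (for `pow(a, -1, m)` and `math.prod`); the function name is illustrative.

```python
from math import prod


def crt(residues, moduli):
    """Decode residues with X = [sum of Mi * ((1/Mi) mod mi) * xi] mod M."""
    M = prod(moduli)
    total = 0
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        inv = pow(M_i, -1, m_i)      # multiplicative inverse of Mi modulo mi
        total += M_i * inv * x_i     # constants Mi*inv are 70, 21, 15 for {3,5,7}
    return total % M                 # the large final reduction mod M


print(crt((1, 2, 3), (3, 5, 7)))
```

The products can be formed in parallel; the cost sits in the final reduction modulo the large number M.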
CRT Implementation
• Modulo M adder may involve n subtractions for a n
moduli system
• Delay = DMultiplier + DMod M adder
[Figure: three multipliers in parallel form x1·[M1(1/M1) mod m1], x2·[M2(1/M2) mod m2] and x3·[M3(1/M3) mod m3]; a mod M adder sums the three products to give X.]
MRC
• Note XA = (1/m3) mod m1,
• XB = (1/m3) mod m2 and XC = (1/m2) mod m1
• UC, UB and r3 are known as the MRC digits.

          m1                       m2               m3
          r1                       r2               r3
         -r3                      -r3
  (r1-r3) mod m1 = p       (r2-r3) mod m2 = q
         ×XA                      ×XB
          UA                       UB
         -UB
  (UA-UB) mod m1 = r
         ×XC
          UC

Example RNS {7,8,9}

     7      8      9
     1      2      3
    -3     -3
     5      7
    ×4     ×1
     6      7
    -7
     6
    ×1
     6

X = 6×72 + 7×9 + 3 = 498
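The subtract-then-multiply-by-inverse pattern generalises to any number of moduli. The sketch below is one possible Python rendering under the assumption of pairwise-coprime moduli (Python 3.8+ for modular inverses via `pow`); it peels off the last modulus first, as in the diagram.

```python
def mrc(residues, moduli):
    """Return (mixed-radix digits, decoded value); the first digit belongs to the last modulus."""
    r, m = list(residues), list(moduli)
    digits, weights, weight = [], [], 1
    while r:
        d, md = r.pop(), m.pop()     # current digit and its modulus
        digits.append(d)
        weights.append(weight)
        weight *= md
        # subtract the digit, then multiply by (1/md) mod mi in every remaining channel
        r = [((ri - d) * pow(md, -1, mi)) % mi for ri, mi in zip(r, m)]
    value = sum(d * w for d, w in zip(digits, weights))
    return digits, value


print(mrc((1, 2, 3), (7, 8, 9)))  # digits r3, UB, UC and X = UC*72 + UB*9 + r3
```

All intermediate arithmetic stays within the small moduli; only the final weighted sum involves large numbers.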
MRC versus CRT
• MRC is sequential but avoids reduction modulo a large
number needed in CRT .
• MRC needs storage of multiplicative inverses, modulo subtraction and modulo multiplication, and a final addition of n numbers for an n-moduli RNS.
• Multiplicative inverses can be powers of two small
numbers such as 6 or 9 for powers of two related moduli
sets.
• Moduli sets in which all multiplicative inverses (MIs) equal unity have also been suggested, e.g. {3,7,22}; then only modulo subtractions are needed to evaluate the MRC digits, but the multipliers are cumbersome.
• Generally need ROMs.
Architecture for XY mod 17
      x3      x2      x1      x0
      y3      y2      y1      y0
    y0x3    y0x2    y0x1    y0x0
    y1x3    y1x2    y1x1    y1x0    (y1x3)′                       added 1
    y2x3    y2x2    y2x1    y2x0    (y2x3)′ (y2x2)′               added 3
    y3x3    y3x2    y3x1    y3x0    (y3x3)′ (y3x2)′ (y3x1)′       added 7
Write MSBs bi as (1 - bi′)
Modulo 17 adder
1011
1101
1011
00001
101101
1011010
Adding 4 words in a CSA
1011
0001
1101
0111
10010 Added 1
1010
1111
00101 Added 1
0100 add 4 (correction
0111 term in a modulo
Scaling
• Division by a number
• E.g. RNS {3,5,7} given. Divide 99 = (0,4,1) by 11 = (2,1,4).
• If the division is exact, multiply 99 by the multiplicative inverse of 11.
• (1/11) = (2,1,2) = 86 (note (1/11) mod 3 = 2, etc.)
• (99/11) = (0,4,1)×(2,1,2) = (0,4,2) = 9
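Exact division is a channel-wise multiplication by inverses, as this short Python sketch shows. It assumes the divisor is coprime to every modulus and that it divides the dividend exactly (Python 3.8+ for `pow(b, -1, m)`); the names are illustrative.

```python
MODULI = (3, 5, 7)


def rns_exact_div(a, b, moduli=MODULI):
    """Channel-wise a * (1/b): valid only when b divides a exactly."""
    return tuple((ai * pow(bi, -1, m)) % m for ai, bi, m in zip(a, b, moduli))


print(rns_exact_div((0, 4, 1), (2, 1, 4)))  # 99 / 11 in RNS {3,5,7}
```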
Scaling by arbitrary number when
division is not exact
• Example 1: 100/13 in RNS {3,5,7}
• 100 = (1,0,2)
• The direct method of multiplying by (1/13) will not work:
• 100 = 1,0,2
• (1/13) = 1,2,6
• 100/13 = 1,0,5 = 40, which is wrong.
• First you need to find the residue 100 mod 13 = 9.
• Subtract it from 100 to get (100-9) = 91
• 100 = 1,0,2
• 9 = 0,4,2
• 91 = 1,1,0
• (1/13) = 1,2,6
• 91/13 = 1,2,0 = 7.
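The two cases above (direct multiplication versus subtracting the remainder first) can be compared in Python. In this sketch the remainder modulo the divisor is assumed to be already known; obtaining it inside the RNS is a separate problem. Function and parameter names are hypothetical.

```python
MODULI = (3, 5, 7)


def rns_scale(x_rns, divisor, remainder, moduli=MODULI):
    """Subtract the known remainder, then multiply by (1/divisor) in each channel."""
    return tuple(((xi - remainder) * pow(divisor, -1, m)) % m
                 for xi, m in zip(x_rns, moduli))


x = (1, 0, 2)                       # 100 in RNS {3,5,7}
print(rns_scale(x, 13, 0))          # remainder ignored: the wrong answer (40)
print(rns_scale(x, 13, 100 % 13))   # remainder 9 subtracted: the quotient 7
```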
Scaling by one modulus
• Divide 100/7
• 100 = 1,0,2
• First subtract the residue 100 mod 7 = 2
• 100 = 1, 0, 2
• 2 = 2, 2, 2
• 98 = 2, 3, 0
• ×(1/7) = ×1 ×3
• = 2 4
• Now you need to do base extension to get the full RNS number again: (2,4,0)
• Scaling by another modulus is also feasible in the same way.
• Note that MRC does this.
Scaled Residue /Montgomery’s Modular Multiplication
• Example: to evaluate (5×6) mod 13 = 4.
• Prescaling by 16: (5×16) mod 13 = 2, (6×16) mod 13 = 5
• Montgomery step = [(5×16)(6×16)/16] mod 13 = (2×5/16) mod 13 = (10/3) mod 13 = (10×9) mod 13 = 12.
• The result is obtained by post-scaling: (12/16) mod 13 = (12/3) mod 13 = 4.
• Prescaling is binary-to-RNS conversion: successive multiplication by 2 and modulo reduction: (5×2) mod 13 = 10, (10×2) mod 13 = 7, (7×2) mod 13 = 1, (1×2) mod 13 = 2.
• Post-scaling is another Montgomery step.
• The Montgomery step avoids modulo reduction; it needs only a conditional addition: if the LSB is 1, add the modulus, then drop the LSB (divide by two).
• Example: (2×5/16) mod 13.
• Four steps are needed.
• In each step a partial product is added and the result is scaled by two.
• 2 = 0010 (binary), scanned from the LSB.
• Computation of (0010)×5/16:
• Formula: (old value + bi×5)/2
• Old value = 0.
• (0 + 0×5)/2 = 0
• (0 + 1×5)/2 = (5+13)/2 = 9, since the LSB of the current result in brackets is 1.
• (9 + 0×5)/2 = (9+13)/2 = 11
• (11 + 0×5)/2 = (11+13)/2 = 12.
• Hardware: addition of two numbers using an (n+1)-bit CPA, n AND gates and n flip-flops.
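The bit-serial Montgomery step can be sketched in Python as follows. The function name is hypothetical; the sketch assumes an odd modulus, a multiplicand below the modulus, and a multiplier that fits in `nbits` bits.

```python
def montgomery_step(a, b, m, nbits):
    """Return (a*b / 2**nbits) mod m for odd m, scanning a from the LSB."""
    acc = 0
    for i in range(nbits):
        acc += ((a >> i) & 1) * b    # add the partial product bi * b
        if acc & 1:                  # LSB is 1: add the modulus to make it even
            acc += m
        acc >>= 1                    # ignore the LSB (exact division by two)
    return acc if acc < m else acc - m


step = montgomery_step(2, 5, 13, 4)      # prescaled operands 2 and 5
post = montgomery_step(step, 1, 13, 4)   # post-scaling is another Montgomery step
print(step, post)                        # intermediate values 0, 9, 11, 12
```

The only "reduction" inside the loop is a conditional addition of the modulus, so no comparison against the modulus is needed until the very end.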
Higher Radix Montgomery’s
Technique
• Higher radix is possible.
• 16, 8 or 4 bits at a time can be considered.
• Example considering 4 bits at a time (k = 4):
• Consider [(10001100)/16] mod 23
• Find (-1/mi) mod 2^k: (-1/23) mod 16 = (-1/7) mod 16 = 9
• Find X mod 2^k: 10001100 mod 16 = four LSBs = 12
• Find α = [(-X/mi) mod 2^k]: (12×9) mod 16 = 12
• Find X + α·mi: 10001100 + 12(23) = 1 1010 0000
• Ignore the last 4 bits to get (X + α·mi)/2^k = 26.
• A multiplier mod 16 is needed to get the multiple to be added.
• Then the shifted versions of the modulus (in this case of radix 16, four shifted versions) are added using a CSA tree followed by a CPA.
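One radix-2^k reduction step, following the five "Find" lines above, might look like this in Python. The function name is an assumption; the modulus must be odd, and the result is congruent to X/2^k but not necessarily fully reduced (26 here, which is 3 mod 23).

```python
def montgomery_reduce_radix(x, m, k):
    """Return (x + alpha*m) / 2**k, congruent to x / 2**k modulo the odd modulus m."""
    r = 1 << k
    neg_inv = (-pow(m, -1, r)) % r        # (-1/m) mod 2**k, a precomputed constant
    alpha = ((x % r) * neg_inv) % r       # multiple of m that clears the k LSBs
    return (x + alpha * m) >> k           # the k LSBs are zero, so the shift is exact


print(montgomery_reduce_radix(0b10001100, 23, 4))
```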
Popular Powers-of-two related
moduli set
• (2^n - 1, 2^n, 2^n + 1)
• Dynamic range < 3n bits.
• Example: a 16-bit DSP needs n = 6; RNS {63,64,65}
• RNS-to-binary conversion using CRT can be done very fast.
• The beauty is that these moduli are powers-of-two related, facilitating easy implementation.
With m1 = 2^n - 1, m2 = 2^n, m3 = 2^n + 1 and residues (x1, x2, x3), the CRT gives
$$B = \left[\, 2^n(2^n+1)\left(\frac{1}{2^n(2^n+1)} \bmod m_1\right) x_1 + (2^{2n}-1)\left(\frac{1}{2^{2n}-1} \bmod m_2\right) x_2 + 2^n(2^n-1)\left(\frac{1}{2^n(2^n-1)} \bmod m_3\right) x_3 \right] \bmod 2^n(2^{2n}-1)$$
The various multiplicative inverses used above are as follows:
$$\frac{1}{2^n(2^n+1)} \bmod (2^n-1) = 2^{n-1}$$
$$\frac{1}{2^{2n}-1} \bmod 2^n = 2^n-1$$
$$\frac{1}{2^n(2^n-1)} \bmod (2^n+1) = 2^{n-1}+1$$
Substituting the multiplicative inverses:
$$B = \left[\, 2^n(2^n+1)\,2^{n-1}\,x_1 + (2^{2n}-1)(2^n-1)\,x_2 + 2^n(2^n-1)(2^{n-1}+1)\,x_3 \right] \bmod 2^n(2^{2n}-1)$$
Subtract x2 from both sides:
$$B - x_2 = \left[\, 2^n(2^n+1)\,2^{n-1}\,x_1 - 2^{2n}\,x_2 + 2^n(2^n-1)(2^{n-1}+1)\,x_3 \right] \bmod 2^n(2^{2n}-1)$$
Divide by 2^n to get the 2n MSBs of the result as
$$\frac{B - x_2}{2^n} = \left[\, (2^n+1)\,2^{n-1}\,x_1 - 2^n\,x_2 + (2^n-1)(2^{n-1}+1)\,x_3 \right] \bmod (2^{2n}-1)$$
• Example {7,8,9}
• [(32+4)x1 - 8x2 + (36-1)x3] mod 63 yields the 6 MSBs
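As a check on the derivation, the reverse converter for the moduli set {2^n - 1, 2^n, 2^n + 1} can be sketched in Python: the 2n MSBs come from [(2^n+1)2^(n-1) x1 - 2^n x2 + (2^n-1)(2^(n-1)+1) x3] mod (2^(2n) - 1) and the n LSBs are x2 itself. The function name is hypothetical, and the `%` operator stands in for the end-around-carry adders of a hardware realization.

```python
def reverse_convert(x1, x2, x3, n):
    """Residues modulo (2**n - 1, 2**n, 2**n + 1) -> binary number B."""
    mod = (1 << (2 * n)) - 1                      # 2**(2n) - 1
    c1 = (1 << (2 * n - 1)) + (1 << (n - 1))      # (2**n + 1) * 2**(n-1)
    c3 = c1 - 1                                   # (2**n - 1) * (2**(n-1) + 1)
    msbs = (c1 * x1 - (1 << n) * x2 + c3 * x3) % mod
    return (msbs << n) + x2                       # x2 supplies the n LSBs directly


# {7,8,9}: [(32+4)x1 - 8x2 + (36-1)x3] mod 63 with residues (1,2,3)
print(reverse_convert(1, 2, 3, 3))
```

All three coefficients are sums or differences of at most two powers of two, which is why the multiplications reduce to bit rotations and complements in hardware.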
Realization
• Andraros and Ahmad: four 2n-bit words to be added using two levels of adders of rotated bits.
• Piestrak suggested using a two-level CSA with a CPA using end-around carry for adding the four 2n-bit words.
• Delay = (4n+2) DFA, Area = (6n) AFA
• Also suggested a low-delay version with delay (2n+2) DFA + DMUX; it needs 2n additional 2:1 MUXes.
• Dhurkadas (NPOL, Cochin) suggested a simplification to three 2n-bit inputs to be added.
• Delay = (4n+2) DFA, Area = (4n) AFA
• Bhardwaj, Premkumar and Srikanthan [1998] suggested using n-bit adders, e.g. n-bit carry-select adders.
• Wang et al [2002]: three converters using 2n-bit as well as n-bit adders.
{7,8,9} example (x1,x2,x3)
$$\frac{B - x_2}{2^n} = \left[\, (2^n+1)\,2^{n-1}\,x_1 - 2^n\,x_2 + (2^n-1)(2^{n-1}+1)\,x_3 \right] \bmod (2^{2n}-1)$$
x1, x2: 3 bits; x3: 4 bits
x1 = x12 x11 x10, x2 = x22 x21 x20, x3 = x33 x32 x31 x30
$$\frac{B - x_2}{2^n} = \left[\, (2^{2n-1}+2^{n-1})\,x_1 - 2^n\,x_2 + (2^{2n-1}+2^{n-1}-1)\,x_3 \right] \bmod (2^{2n}-1)$$
• [(32+4)x1 - 8x2 + (36-1)x3] mod 63:
    x10    x12    x11    x10    x12    x11
    x22′   x21′   x20′   1      1      1
    x3x    x32    x31    x3x    x32    x31
    1      1      x33′   x32′   x31′   x30′
x3x = x30 + x33, since only one of x30 and x33 can be 1
Dhurkadas simplified this as
    x10    x12    x11    x10    x12    x11
    x22′   x21′   x20′   y      x31′   x30′
    x3x    x32    x31    x30    x32    x31
y = (x33 + x32)′
Other three, Four and Five moduli
sets
• {2^n, 2^n-1, 2^(n-1)-1} Hiasat and Abdel-Aty-Zohdy; Wang, Wang, Swamy and Ahmad: not better than the popular moduli set; multipliers etc. are simpler
• {2^n, 2^n-1, 2^(n+1)-1} Ananda Mohan: better in area or time; multipliers are simpler
• {2^n, 2^(2n)-1, 2^(2n)+1} Ananda Mohan: better than the Cao et al four-moduli set; one large modulus
• {2^n, 2^n-1, 2^n+1, 2^(n+1)-1} Vinod and Premkumar
• {2^n, 2^n-1, 2^n+1, 2^(n+1)-1} Bhardwaj, Srikanthan, Ananda Mohan and Premkumar: area and time intensive
• {2^n, 2^n-1, 2^n+1, 2^(2n)+1} Cao et al: better than other four-moduli sets, but one modulus is bigger in size
• {2^n-3, 2^n-1, 2^n+1, 2^n+3} Sheu et al: uses ROM; not attractive
• {2^(n-1)-1, 2^n-1, 2^n, 2^n+1, 2^(n+1)-1} Cao et al 2007: increases the cardinality to 5 and the DR to 5n bits, but RNS-to-binary conversion is slower and more area consuming
• M1 {2^k-1, 2^k, 2^k+1}, M2 {2^k, 2^k-1, 2^(k-1)-1},
• M3 {2^k-1, 2^k, 2^k+1, 2^(k+1)-1}, M4 {2^k, 2^k-1, 2^(k+1)-1}
Comparison of various converters for three moduli sets

Converter            Moduli set   FA             HA      AND/OR    XOR/XNOR   Other                            Delay
[8]                  M2           6n-1           3n-7    -         -          (n-1) MUX                        4n DFA
[5]                  M1           6n+1           -       n+3       n+1        2n MUX                           (n+2) DFA + DMUX
[3,4]                M1           4n             -       2         -          -                                (4n+1) DFA
[6] CI               M1           4n             1       -         1          2 MUX                            (4n+1) DFA
[6] CII              M1           6n             1       1         1          (2n+2) MUX                       (n+1) DFA
[6] CIII             M1           4n             1       (2n+2)    (2n-1)     (2n+2) MUX                       (n+1) DFA
Converter I          M4           4n+3           -       n         n          -                                (6n+5) DFA
Converter II         M4           14n+21         2n+3    -         -          (2n+1) 3:1 MUX                   (2n+7) DFA
Converter III        M4           12n+19         2n+2    -         -          10(2n+1) AROM, (2n+1) 2:1 MUX    (2n+7) DFA
[9]                  M3           37n+14         -       -         -          -                                (14n+8) DFA
[12,13] 4-stage CE   M3           n^2/2+11n+4    1       -         -          2 MUX                            (11n+l+8) DFA
Base Extension
• Needed in scaling or division.
• Uses MRC first to divide, followed by base extension.
• CRT can be used but is cumbersome.
Example: {3,5,7}, 52 = (1,2,3), scale by 7

     3      5      7
     1      2      3
    -3     -3
     1      4
    ×1     ×3
     1      2      2               First base extension step
    -2
     2
    ×2
     1             +(1×5) mod 7    Base extension step
                   0
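The three stages of the example (divide in the remaining channels, find mixed-radix digits, rebuild the missing residue) can be sketched in Python. This is an illustrative rendering with hypothetical names, assuming pairwise-coprime moduli and Python 3.8+ for modular inverses.

```python
def scale_by_last_modulus(residues, moduli):
    """Divide by the last modulus (after removing its residue), then base-extend."""
    *rest_r, r_last = residues
    *rest_m, m_last = moduli

    # Step 1: subtract the last residue and multiply by (1/m_last) in the other channels.
    q = [((r - r_last) * pow(m_last, -1, m)) % m for r, m in zip(rest_r, rest_m)]

    # Step 2: mixed-radix digits of the quotient over the remaining moduli.
    digits, weights, weight = [], [], 1
    r, m = q[:], rest_m[:]
    while r:
        d, md = r.pop(), m.pop()
        digits.append(d)
        weights.append(weight)
        weight *= md
        r = [((ri - d) * pow(md, -1, mi)) % mi for ri, mi in zip(r, m)]

    # Step 3: base extension, i.e. evaluate the mixed-radix form modulo m_last.
    ext = sum(d * w for d, w in zip(digits, weights)) % m_last
    return tuple(q) + (ext,)


print(scale_by_last_modulus((1, 2, 3), (3, 5, 7)))  # (52 - 3) / 7 = 7
```

For the example the quotient channels are (1,2), the mixed-radix digits are 2 and 1, and the rebuilt residue is (2 + 1×5) mod 7 = 0.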
RSA using RNS/ECC
• Needs computation of P^Q mod N
• e.g. 10^23 mod 37 = (10^16)(10^4)(10^2)(10^1) mod 37
• Successive squaring mod 37 and multiplications mod 37 of selected results.
• Needs (XY) mod N as the basic step, where X, Y and N are 1024-bit numbers.
• RNS can be used.
• The Montgomery technique has been used to find (X′Y′/M) mod N, where M is the product of the moduli in the RNS.
• Needs two RNS dynamic ranges M and M′ which are mutually prime, and a redundant modulus.
• Determine q such that (X′Y′+qN) is a multiple of M.
• Extend q to the RNS with dynamic range M′.
• Find r = (X′Y′+qN)/M in the second RNS.
• Do base extension to the first RNS.
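The successive-squaring step for P^Q mod N can be illustrated with plain integers in Python. This sketch shows only the exponentiation loop; the RNS Montgomery multiplication outlined in the last bullets would replace the two `%` reductions and is not attempted here. The function name is hypothetical.

```python
def mod_exp(base, exponent, modulus):
    """Square-and-multiply: multiply together the squarings selected by the exponent bits."""
    result = 1
    square = base % modulus                        # base**(2**0)
    while exponent:
        if exponent & 1:                           # this power of two is in the exponent
            result = (result * square) % modulus
        square = (square * square) % modulus       # next successive squaring
        exponent >>= 1
    return result


# 23 = 16 + 4 + 2 + 1, so 10**23 = (10**16)(10**4)(10**2)(10**1)
print(mod_exp(10, 23, 37))
```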
Sign Detection and Comparison
• Sign detection is difficult: it requires going to the binary number.
• Comparison is also difficult: it requires going to binary numbers, or sequential techniques such as comparing mixed-radix digits.
Applications
• FIR Filters (ensure that RNS dynamic
range is larger than that of the filter)
• Digital Frequency Synthesis
• Video Filters
• 2-D filters
• NTTs (Number Theoretic Transforms)
• Cryptography
Applications of RNS
• [5] Freking, W.L., and Parhi, K.K., "Low-power FIR digital filters using residue
arithmetic, " in Conf. Record 31st Asil. Conf. Signals, Syst. and Comput. (ACSSC
1997), vol. 1, Pacific Grove, CA USA [1997], 739-43.
• [6] D'Amora, A. et al., "Reducing power dissipation in complex digital filters by using
the quadratic residue number system, " in Conf. Record 34th Asil. Conf. Signals, Syst.
Comput. (ACSSC 2000), vol. 2, Pacific Grove, CA USA [2000], 879-83.
• [7] Cardarilli, G.C. et al., "Low-power implementation of polyphase filters in Quadratic
Residue Number system," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS 2004), vol. 2,
Vancouver, BC, Canada [2004], 725-728.
• [8] Shanbag, N.R., and Siferd, R.E., A single-chip pipelined 2-D FIR filter using residue
Arithmetic, IEEE JSSC -26[1991], 796-805.
• [9] Tuukka Toivonen., and Janne Heikkilä., Video Filtering With Fermat Number
Theoretic Transforms Using Residue Number System, IEEE CSVT-16[2006], 128-135.
• [10] Schwemmlein, J., and Posch, K.C., Reinhard Posch. RNS-modulo reduction upon
a restricted base value set and its applicability to RSA cryptography, Computer &
Security [1998], 17, 637-650.
• [11]Hanae Nozaki., Masahiko Motoyama., Atsushi Shimbo., and Shinichi Kawamura.,
Implementation of RSA algorithm based on RNS Montgomery multiplication, In C. Paar
(ed). Cryptographic Hardware and Embedded Systems – CHES, Springer-Verlag,
Berlin, Germany [2001], 364-376.
• [12] Jean-Claude Bajard., Laurent Stephane Didier., Peter Kornerup.,
An RNS Montgomery modular multiplication Algorithm, IEEE C-47
[1998], 766-776.
• [13] Jean-Claude Bajard., and Laurent Imbert., A Full RNS
Implementation of RSA, IEEE C-53[2004],769-774.
• [14] Schinianakis, D.M., Kakarountas. A.P., and Stouraitis. T., A New
Approach to Elliptic Curve Cryptography: an RNS Architecture, IEEE
MELECON, May 16-19, Benalmádena (Málaga), Spain [2006], 1241-
1245.
• [15] Lie-Liang Yang., and Lajos Hanzo., A Residue Number System
Based Parallel Communication Scheme Using Orthogonal Signaling:
Part I—System Outline, IEEE VT-51[2002],1534-1546.
• [16] Chaves, R., and Sousa, L., “RDSP: A RISC DSP based on
residue number system,” in Proc. Euro. Symp. Digital System
Design: Architectures, Methods, and Tools, Antalya, Turkey [2003],
128-135.
• [17] Wei, W. et al., "RNS application for digital image processing," in
4th IEEE Int. Workshop Syst.-on-Chip for Real Time Applications,
Banff, Alta., Canada [2004],77-80.
Conclusion
• Very mature today
• Can be used in place of Custom DSP
blocks
• Research on newer moduli sets with high
cardinality and Faster Reverse
Conversion is of interest

More Related Content

PDF
ISCAS'18: A Deep Neural Network on the Nested RNS (NRNS) on an FPGA: Applied ...
PDF
Number Theory and Its Applications in Cryptography
PDF
A study on number theory and its applications
PPT
microprocessors
PPTX
Modular arithmetic
PDF
Cryptography
PDF
Subquad multi ff
PDF
routing (1).pdf shjsjajajaaknsjsjskakakskaksksksk
ISCAS'18: A Deep Neural Network on the Nested RNS (NRNS) on an FPGA: Applied ...
Number Theory and Its Applications in Cryptography
A study on number theory and its applications
microprocessors
Modular arithmetic
Cryptography
Subquad multi ff
routing (1).pdf shjsjajajaaknsjsjskakakskaksksksk

Similar to Residue-Number-Systems the organization uses a weak .ppt (20)

PPT
lections tha detail aboot andom nummerss
PPTX
Rtl design optimizations and tradeoffs
PPTX
Teknik Simulasi
PPTX
Sandia Fast Matmul
PPTX
Chinese_Remainder_Theorem.pptx
PDF
Pseudo Random Number Generators
PPTX
distance_matrix_ch
PDF
Naist2015 dec ver1
PDF
"Mesh of Periodic Minimal Surfaces in CGAL."
PPTX
SIMD.pptx
PDF
IRJET- Radix 8 Booth Encoded Interleaved Modular Multiplication
PDF
2016 03-03 marchand
PPTX
PPTX
Computer Graphics Unit 1
PPT
Intro week3 excel vba_114e
PDF
Computer graphics 2
PPTX
A framework for practical fast matrix multiplication
PPTX
Digital Logic Design Lectures on Flip-flops and latches and counters
PPTX
Design of High Performance 8,16,32-bit Vedic Multipliers using SCL PDK 180nm ...
lections tha detail aboot andom nummerss
Rtl design optimizations and tradeoffs
Teknik Simulasi
Sandia Fast Matmul
Chinese_Remainder_Theorem.pptx
Pseudo Random Number Generators
distance_matrix_ch
Naist2015 dec ver1
"Mesh of Periodic Minimal Surfaces in CGAL."
SIMD.pptx
IRJET- Radix 8 Booth Encoded Interleaved Modular Multiplication
2016 03-03 marchand
Computer Graphics Unit 1
Intro week3 excel vba_114e
Computer graphics 2
A framework for practical fast matrix multiplication
Digital Logic Design Lectures on Flip-flops and latches and counters
Design of High Performance 8,16,32-bit Vedic Multipliers using SCL PDK 180nm ...
Ad

Recently uploaded (20)

PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PPTX
Sustainable Sites - Green Building Construction
PDF
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
PPTX
Geodesy 1.pptx...............................................
PPTX
Construction Project Organization Group 2.pptx
PPTX
Welding lecture in detail for understanding
PDF
composite construction of structures.pdf
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PPTX
UNIT 4 Total Quality Management .pptx
PPT
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PDF
R24 SURVEYING LAB MANUAL for civil enggi
Embodied AI: Ushering in the Next Era of Intelligent Systems
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
Operating System & Kernel Study Guide-1 - converted.pdf
Foundation to blockchain - A guide to Blockchain Tech
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
Sustainable Sites - Green Building Construction
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
Geodesy 1.pptx...............................................
Construction Project Organization Group 2.pptx
Welding lecture in detail for understanding
composite construction of structures.pdf
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
UNIT 4 Total Quality Management .pptx
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
R24 SURVEYING LAB MANUAL for civil enggi
Ad

Residue-Number-Systems the organization uses a weak .ppt

  • 1. Residue Number systems P.V. Ananda Mohan FNAE, Fellow IEEE pvam@vsnl.net IEEE CAS Chapter 8th March 2008 Bangalore
  • 2. Why RNS • Using several processors in parallel, some operations can be faster. Mod m1 Processor r1 O1 Mod m2 Processor r2 O2 Mod mj Processor rj Oj RNS to Binary Converter Result Binary to RNS converter Binary to RNS converter Binary to RNS converter Input Binary Number Instruction
  • 3. Points to be considered • Choice of moduli set • Computation time and area requirements for the following blocks: • Binary to RNS conversion • RNS to Binary conversion • Multiplication • Scaling • Base extension • Sign detection • Comparison
  • 4. Binary to RNS conversion • (a) Conventional method: division to get residue throwing away quotient • --Very time consuming. • Example (1000 0001 1010) mod 13? • 2074 mod 13 = 7.
  • 5. • (b) Iterative reduction mod mi • (Capocelli and Giancarlo) • Start with LSBs. Store residues of powers of two in memory go on accumulating till end mod 13: • 1,2,4,8,3,6,12,11,9,5,10,7 • Example (1000 0001 1010) mod 13? • Last three bits you can skip. • 2+23 mod 13 = 2+8 = 10 • 10+24 mod 13 = 10+3=0 and so on • Hardware needed : a modulo adder, Memory containing residues of Powers of 2 mod 13.
  • 6. • (c) Use periodic properties of moduli • For example consider modulus 18. • Residues of powers of two are (1,2,4,8,3,6), (12,11,9,5,10,7),(1,2,4,8..) etc • Note the periodic property • (1,2,4,8,3,6), (-1,-2,-4,-8,-3,-6), (1,2,4,8,3,6), (-1,-2,-4,-8,-3,-6)
  • 7. Consider mod 89 • Residues of successive powers of two are 1,2,4,8,16,32,64,39,78,67,45, 1,2,4,8,16,32,64,39,78,67,45, • Thus period (or order) is 11 • i.e. 211 mod 89=1
  • 8. • Implementation: Group input bits based on period or half period. • If based on period, add all words with same period mod 211 and have one Binary to RNS converter of Capocelli and Giancarlo. • If based on half-period add all odd fields and add all even fields, Compute odd-even and use Capocelli and Giancarlo method
  • 9. • Example • 2074 mod 13= (100000 011010) mod 13 • = (26-32) mod 13 = -6 mod 13 = 7. • 2074 mod 7 = (100 000 011 010) mod7 • = (4+0+3+2) mod 7=2 • Use for full period case, Adders with end around carry (EAC) and for half period case, two adders with EAC
  • 10. • Delay is (2+3+2)D 1 0 0 0 0 0 0 1 1 0 1 0 0 0 100 000 011 ----- 111Sum 0000 Carry 010 ------- 101 Sum 0100 Carry ------ 1001 1 ------ 010
  • 11. Modulo adders and subtractors • (X+Y) mod mi = (X+Y) or (X+Y-mi) • (X-Y) mod mi = (X-Y) or (X-Y+mi) (X+Y) Two’s complement of mi or (2n -mi) X Y 2:1 MUX select Sign bit (X+Y) mod mi n bit Adder (n +1) bit Adder Delay = nDFA+(n+1)DFA+DMUX Area = nAFA+(n+1)AFA+n D2:1MUX Cascade of Adders
  • 12. Faster Adder Implementations • Subtractor is same bur two’s compliment of input to be added. X Y select n bit Adder (X+Y) Two’s complement of mi or (2n -mi) 2:1 MUX Sign bit (X+Y) mod mi (n +1) bit Adder Delay = (n+2)DFA+DMUX Area = nAFA+2(n+1)AFA+n D2:1MUX
  • 13. Modulo Multipliers • Area Multiplier+divider • Delay Multiplier+divider • Divider can be restoring or non-restoring. • Word length of the processor 2n bits X Y Multiplier XY mi Divider Quotient Throw it. Reminder
  • 14. Brickell’s Algorithm based Modulo Multipliers • Maximum word length (n+1) bits for taking one bit at a time. • Higher radix feasible. • Area intensive • Other methods exist such as using Redundant Arithmetic, non-overlapping multibit recoding
  • 15. • 13.15 mod 23 • We do not want to do in a straight forward manner . • Write b = 13 in binary form: • b3b2b1b0 =1101 • Do repeatedly starting from MSB: • Old= (2.Old + bi.A) mod 23
  • 16. EXAMPLE • b3b2b1b0 =1101; A =15, mi = 23 • P= (2.0 + 1.15) mod 23 = 15 • P=(2.15 + 1.15) mod 23 = 22 • P=(2.22 + 0.15) mod 23 = 21 • P=(2.21+ 1.15) mod 23 = 11 • Maximum value of P <3(23) i.e. 3mi • Modulo subtraction is by two comparisons: • Is P>N? or Is P>2n? • Answer is either P, P-mi, P-2mi; choose based on sign of P-mi, P-2mi. • Example 45 mod 23, anwers are 45,45-23=22,45-46=-1; since P- 2mi is negative and P-mi is positive, P-mi is the correct result. • Multiple precision arithmetic to be used in PC based implementations
  • 17. Architecture for Modmul LSB of Zero Old 2Old A bi (n+2) bit adder Adder TC of mi Adder TC of 2mi 3:1 Mux Latch Latch
  • 18. ModMUL • Computation time= n[(n+2)DFA+DMux] • Area = 3(n+2)AFA+A3:1MUX+nAAND
  • 19. Modmul for IDEA • IDEA (International Data Encryption Algorithm) uses (xy) mod (216 +1) as a programmable S-Box (Substitution Box), where x and y are 16 bit words. • Ideal for DSPs • Get P=xy a 32 bit word. • Subtract MSB 16 bit word from LSB 16 bit word. If negative, add (216 +1)
  • 20. RNS to Binary Conversion • CRT based • MRC based • CRT: RNS {m1,m2,m3} Residues {x1,x2,x3} • Define Mi=M/mi and M=m1m2m3 • Decoded Binary number X • = [M1{(1/M1) mod m1}x1+ {M2 (1/M2) mod m2}x2+ M3{(1/M3) mod m3}x3]mod M • e.g. {3,5,7} M=105, M1=35,M2=21,M3=15 • (1/35) mod 3 = 2, (1/21) mod 5=1, (1/15) mod 7=1. • X= [70x1+21x2+15x3] mod 105 • Consider (1,2,3), X = (70+42+45) mod 105 = 157 mod 105 = 52 • Generally, Mi are large, Mi{(1/Mi) mod mi} are stored,involves multiplication of these large numbers by xi in parallel and adding.
  • 21. CRT Implementation • Modulo M adder may involve n subtractions for a n moduli system • Delay = D + D X1 [M1(1/M1) mod m1] Multiplier Multiplier X2 Multiplier [M3(1/M3) mod m3] [M2(1/M2) mod m2] X3 Mod M adder X
  • 22. MRC • Note XA= (1/m3) mod m1 and • XB= (1/m3) mod m2, XC= (1/m2) mod m1 • UC, UB and r3 are known as MRC digits. m1 m2 m3 r1 r2 r3 - r3 - r3 (r1-r3) mod m1 = p (r2-r3) mod m2 =q XA XB UA UB -UB (UA-UB) mod m1 =r XC UC Example RNS {7,8,9} 7 8 9 1 2 3 -3 -3 5 7 x4 x1 6 7 -7 6 x1 6 X = 6.72+7.9+3 = 498
  • 23. MRC versus CRT • MRC is sequential but avoids reduction modulo a large number needed in CRT . • MRC needs storage of multiplicative inverses, Modulo subtraction and modulo multiplication, final addition of n numbers for a n moduli RNS, • Multiplicative inverses can be powers of two small numbers such as 6 or 9 for powers of two related moduli sets. • Moduli set with all MIs of value unity also suggested e.g {3,7,22}, Only modulo subtractions will do for evaluating MRC digits; But multipliers are cumbersome. • Generally need ROMs.
  • 24. Architecture for XY mod 17 x3 x2 x1 x0 y3 y2 y1 y0 y0x3 yox2 y0x1 yox0 y1x3 y1x2 y1x1 y1x0 (y1x3)′ added 1 y2x3 y2x2 y2x1 y2x0 (y2x3)′ (y2x2)′ added 3 y3x3 y3x2 y3x1 y3x0 (y3x3)′ (y3x2)′ (y3x1)′ added 7 Write MSBs bi as (1- bi′) ′ Modulo 17 adder 1011 1101 1011 00001 101101 1011010 Adding 4 words in a CSA 1011 0001 1101 0111 10010 Added 1 1010 1111 00101 Added 1 0100 add 4 (correction 0111 term in a modulo
  • 25. Scaling • Division by a number • E.g. RNS given {3,5,7}. Divide 99 (0,4,1) by 11 (2,1,4). • If division is exact, multiply 99 by multiplicative inverse of 11. • (1/11) = (2,1,2) =86 (Note (1/11) mod 3 = 2 etc. • (99/11) = (0,4,1)x(2,1,2)= (0,1,4) =9
  • 26. Scaling by arbitrary number when division is not exact • Example 1 : 100/13 in RNS {3,5,7} • 100 = (1,0,2} • Direct method by multiplying with (1/13) will not work. • 100 = 1,0,2 • (1/13) = 1,2,6 • 100/13 = 1,0,5 = 40 wrong. • First you need to find residue of 100mod 13 = 9. • Subtract from 100 to get (100-9)=91 • 100 = 1,0,2 • 9 = 0,4,2 • 91 = 1,1,0 • (1/13) = 1,2,6 • 91/13 = 1,2,0 = 7.
  • 27. Scaling by one modulus • Divide 100/7 • 100 = 1,0,2 • Subtract residue 100mod 7 first =2 • 100 = 1, 0, 2 • 2 = 2, 2, 2 • 98 = 2, 3, 0 • x(1/7) = x1 x3 • = 2 4 • Now you need to do base extension to get RNS number again (2,4,0) • Scaling by another modulus aso feasible in the same way. • Note that MRC does this.
  • 28. Scaled Residue /Montgomery’s Modular Multiplication • Example: To evaluate (5.6) mod 13 = 4. • Prescaling by 16: 5 = (5.16) mod 13 = 2, (6.16) mod 13 = 5 • Montgomery step = [(5.16)(6.16)/16] mod 13 = (2.5/16) mod 13 = (10/3) mod 13 = (10.9) mod 13 = 12. • Result is obtained by post scaling: (12/16) mod 13 = (12/3) mod 13 = 4. • Prescaling is Binary to RNS conversion: Successive multiplication by 2 and modulo reduction , (5.2) mod 13= 10, (10.2) mod 13 = (7.2) mod 13= 1, (1.2) mod 13 = 2. • Post scaling is another Montgomery step.
  • 29. • Montgomery step avoids modulo reduction. Only conditional addition. If LSB is 1 add modulus, ignore LSB. • Example (2.5/16) mod 13. • Four steps are needed. • Each step a partial product is added and result scaled by two. • 2 = 0010 (binary) • Computation of (0010)x5/16: • Formula: (old value+ bix5)/2 • Old value =0. • (0+0.5)/2= 0 • (0+1x5)/2 = (5+13)/2 = 9 since LSB of current result in brackets is 1. • (9+0.5)/2 = (9+13)/2 = 11 • (11+0.5)/2 = (11+13)/2 = 12. • Addition of two numbers using a (n+1)-bit CPA, n AND gates, n Flip-flops •
  • 30. Higher Radix Montgomery’s Technique • Higher Radix possible. • 16 or 8 or 4 bits at a time can be considered. • Example considering 4 bits at a time: • Consider [(10001100)/16] mod 23 • Find (-1/23) mod 16=(-1/7) mod16 = 9 ((-1/mi) mod 2k ) • Find 10001100 mod 16 = four LSBs= 12 (X mod 2k ) • Find (12x9) mod 16 = 12 α= [(-X/mi) mod 2k] • Find 10001100+12(23) = 11010 0000 (X+ αmi) • Ignore last 4 bits to get 26. (X+ αmi)/2k • Need a multiplier mod 16 to get the multiple to be added. • Then addition of shifted versions of modulus (in this case of radix 16, four shifted versions) using a CASA tree followed by CPA.
  • 31. Popular Powers-of-two related moduli set • (2n -1, 2n , 2n +1) • Dynamic range <3n bits. • Example 16 bit DSP needs n = 6; RNS {63,64,65} • RNS to binary conversion using CRT can be done very fast. • .
  • 32. • The beauty is these are powers of two related facilitating easy implementation.                    1 2 1 2 2 mod 1 2 2 1 2 2 1 1 2 1 2 1 2 1 2 1 1 2 2 1 2 2 1 3 1 2 1 1 1                                          n n n n n m n n m n n m x n n x n n x n n B     2 1 2 mod 1 2 2 1 1             n n n n      1 2 mod 1 2 1 2 1             n n n     1 2 1 2 mod 1 2 2 1 1            n n n n The various multiplicative inverses used above are as follows:
  • 33. • Substituting the inverses into the CRT expression:
    $$B = \left[\, 2^n(2^n+1)\,2^{n-1}\,x_1 + (2^{2n}-1)(2^n-1)\,x_2 + 2^n(2^n-1)(2^{n-1}+1)\,x_3 \,\right] \bmod 2^n(2^{2n}-1)$$
    • Since (2^{2n} - 1)(2^n - 1) x2 is congruent to x2 - 2^{2n} x2 modulo 2^n(2^{2n} - 1):
    $$B = \left[\, x_2 + 2^n\left( 2^{n-1}(2^n+1)\,x_1 - 2^n x_2 + (2^n-1)(2^{n-1}+1)\,x_3 \right) \right] \bmod 2^n(2^{2n}-1)$$
    • Subtract x2 from both sides and divide by 2^n to get the 2n MSBs of the result as
    $$\frac{B - x_2}{2^n} = \left[\, 2^{n-1}(2^n+1)\,x_1 - 2^n x_2 + (2^n-1)(2^{n-1}+1)\,x_3 \,\right] \bmod (2^{2n}-1)$$
    • Example {7,8,9}: [(32+4)x1 - 8x2 + (36-1)x3] mod 63 yields the 6 MSBs.
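A minimal Python sketch of the resulting reverse converter for the moduli set {2^n - 1, 2^n, 2^n + 1}: the 2n MSBs come from the mod (2^(2n) - 1) expression and the n LSBs are x2 itself. The function name `rns_to_binary` is hypothetical, and ordinary integer arithmetic stands in for the adder structures discussed on the following slides.

```python
def rns_to_binary(x1, x2, x3, n):
    """Residues for moduli (2^n - 1, 2^n, 2^n + 1) -> integer B."""
    msb = (
        2**(n - 1) * (2**n + 1) * x1
        - 2**n * x2
        + (2**n - 1) * (2**(n - 1) + 1) * x3
    ) % (2**(2 * n) - 1)
    return (msb << n) | x2        # B = 2^n * msb + x2

# {7, 8, 9} example: 52 has residues (3, 4, 7)
b = rns_to_binary(52 % 7, 52 % 8, 52 % 9, 3)
```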
  • 34. Realization • Andraros and Ahmad: four 2n-bit words are added using two levels of adders operating on rotated bits. • Piestrak suggested two levels of CSA followed by a CPA with end-around carry for adding the four 2n-bit words. Delay: (4n+2) D_FA (full-adder delays), area: 6n A_FA (full-adder areas). • A low-delay version was also suggested: (2n+2) D_FA + D_MUX, at an extra area cost of 2n 2:1 MUXes. • Dhurkadas (NPOL, Cochin) suggested a simplification to three 2n-bit inputs to be added. Delay: (4n+2) D_FA, area: 4n A_FA. • Bhardwaj, Premkumar and Srikanthan [1998] suggested using n-bit adders, e.g. n-bit carry-select adders. • Wang et al. [2002] proposed three converters using 2n-bit as well as n-bit adders.
  • 35. {7,8,9} example (x1, x2, x3)
    $$\frac{B - x_2}{2^n} = \left[\, 2^{n-1}(2^n+1)\,x_1 - 2^n x_2 + (2^n-1)(2^{n-1}+1)\,x_3 \,\right] \bmod (2^{2n}-1)$$
    $$\frac{B - x_2}{2^n} = \left[\, (2^{2n-1}+2^{n-1})\,x_1 - 2^n x_2 + (2^{2n-1}+2^{n-1}-1)\,x_3 \,\right] \bmod (2^{2n}-1)$$
    • x1 and x2 are 3 bits, x3 is 4 bits: x1 = x12 x11 x10, x2 = x22 x21 x20, x3 = x33 x32 x31 x30.
    • [(32+4)x1 - 8x2 + (36-1)x3] mod 63 is the sum of four 6-bit words (′ denotes complement):
        (32+4)x1 :  x10   x12   x11   x10   x12   x11
        -8x2     :  x22′  x21′  x20′  1     1     1
        36x3     :  x3x   x32   x31   x3x   x32   x31
        -x3      :  1     1     x33′  x32′  x31′  x30′
    • x3x = x30 + x33 (logical OR), since x30 and x33 are never both 1 (x3 is at most 8).
    • Dhurkadas simplified this to three words:
        x10   x12   x11   x10   x12   x11
        x22′  x21′  x20′  y     x31′  x30′
        x3x   x32   x31   x30   x32   x31
      with y = (x33 + x32)′.
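The four-word form for {7, 8, 9} can be sketched at bit level in Python. The names `add_mod63` and `convert_789` are hypothetical, and the end-around-carry addition is modelled by folding the carries back into the 6-bit sum rather than by a CSA/CPA structure.

```python
def add_mod63(words):
    """Add 6-bit words modulo 63 by folding carries back in (end-around carry)."""
    total = sum(words)
    while total >> 6:
        total = (total & 63) + (total >> 6)
    return 0 if total == 63 else total    # 63 is the second representation of 0

def convert_789(x1, x2, x3):
    """Reverse conversion for moduli (7, 8, 9) using four 6-bit words."""
    x12, x11, x10 = (x1 >> 2) & 1, (x1 >> 1) & 1, x1 & 1
    x33, x32, x31, x30 = (x3 >> 3) & 1, (x3 >> 2) & 1, (x3 >> 1) & 1, x3 & 1
    x3x = x30 | x33
    w1 = (x10 << 5) | (x12 << 4) | (x11 << 3) | (x10 << 2) | (x12 << 1) | x11  # 36*x1
    w2 = ((~x2 & 7) << 3) | 7                                                  # -8*x2
    w3 = (x3x << 5) | (x32 << 4) | (x31 << 3) | (x3x << 2) | (x32 << 1) | x31  # 36*x3
    w4 = (3 << 4) | (~x3 & 15)                                                 # -x3
    msb = add_mod63([w1, w2, w3, w4])
    return (msb << 3) | x2                # append x2 as the three LSBs

b = convert_789(52 % 7, 52 % 8, 52 % 9)
```

For 52 = (3, 4, 7) the four words are 45, 31, 63 and 56, their sum mod 63 is 6, and the result is 6 × 8 + 4 = 52.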
  • 36. Other three, Four and Five moduli sets • {2^n, 2^n - 1, 2^(n-1) - 1}: Hiasat and Abdel-Aty-Zohdy; Wang, Wang, Swamy and Ahmad. Not better than the popular moduli set, but multipliers etc. are simpler. • {2^n, 2^n - 1, 2^(n+1) - 1}: Ananda Mohan. Better in area or time; multipliers are simpler. • {2^n, 2^(2n) - 1, 2^(2n) + 1}: Ananda Mohan. Better than the Cao et al. four moduli set, but has one large modulus. • {2^n, 2^n - 1, 2^n + 1, 2^(n+1) - 1}: Vinod and Premkumar. • {2^n, 2^n - 1, 2^n + 1, 2^(n+1) - 1}: Bhardwaj, Srikanthan, Ananda Mohan and Premkumar. Area and time intensive. • {2^n, 2^n - 1, 2^n + 1, 2^(2n) + 1}: Cao et al. Better than other four moduli sets, but one modulus is bigger in size. • {2^n - 3, 2^n - 1, 2^n + 1, 2^n + 3}: Sheu et al. Uses ROM; not attractive. • {2^(n-1) - 1, 2^n - 1, 2^n, 2^n + 1, 2^(n+1) - 1}: Cao et al. 2007. Increases cardinality to 5 with a DR of 5n bits, but RNS to Binary conversion is slower and more area consuming.
  • 38. Comparison of various converters for three moduli sets
    • Moduli sets: M1 = {2^k - 1, 2^k, 2^k + 1}, M2 = {2^k, 2^k - 1, 2^(k-1) - 1}, M3 = {2^k - 1, 2^k, 2^k + 1, 2^(k+1) - 1}, M4 = {2^k, 2^k - 1, 2^(k+1) - 1}

    Converter           Moduli set  FA           HA    AND/OR  XOR/XNOR  Other                           Delay
    [8]                 M2          6n-1         3n-7  --      --        (n-1) MUX                       4n D_FA
    [5]                 M1          6n+1         --    n+3     n+1       2n MUX                          (n+2) D_FA + D_MUX
    [3,4]               M1          4n           --    2       --        --                              (4n+1) D_FA
    [6] CI              M1          4n           1     --      1         2 MUX                           (4n+1) D_FA
    [6] CII             M1          6n           1     1       1         (2n+2) MUX                      (n+1) D_FA
    [6] CIII            M1          4n           1     2n+2    2n-1      (2n+2) MUX                      (n+1) D_FA
    Converter I         M4          4n+3         --    n       n         --                              (6n+5) D_FA
    Converter II        M4          14n+21       2n+3  --      --        (2n+1) 3:1 MUX                  (2n+7) D_FA
    Converter III       M4          12n+19       2n+2  --      --        10(2n+1) A_ROM, (2n+1) 2:1 MUX  (2n+7) D_FA
    [9]                 M3          37n+14       --    --      --        --                              (14n+8) D_FA
    [12,13] 4-stage CE  M3          n^2/2+11n+4  1     --      --        2 MUX                           (11n+l+8) D_FA
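  • 39. Base Extension • Needed in scaling or division. • Uses MRC first to divide, followed by base extension. • CRT can be used but is cumbersome. • Example: moduli {3, 5, 7}, 52 = (1, 2, 3), scale by 7:
        Step                                      mod 3   mod 5   mod 7
        Residues of 52                            1       2       3
        Subtract 3 (the residue mod 7)            1       4
        Multiply by 1/7 (x1 mod 3, x3 mod 5)      1       2
        First base extension step                 1       2       2
        Subtract 2 (the residue mod 5)            2
        Multiply by 1/5 (x2 mod 3)                1
        Base extension step: (2 + 1x5) mod 7                      0
    • The scaled result is (52 - 3)/7 = 7 = (1, 2, 0).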
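The scaling example can be reproduced with a short Python sketch of mixed radix conversion (MRC) followed by base extension. The function names are hypothetical, and the remaining moduli are processed in reverse order so that the first mixed radix digit is taken from the mod 5 channel, as in the table.

```python
def mixed_radix_digits(residues, moduli):
    """Digits a_i such that X = a_0 + a_1*m_0 + a_2*m_0*m_1 + ..."""
    res = list(residues)
    digits = []
    for i, m_i in enumerate(moduli):
        digits.append(res[i])
        for j in range(i + 1, len(moduli)):
            # subtract the digit and divide by m_i in the remaining channels
            res[j] = ((res[j] - res[i]) * pow(m_i, -1, moduli[j])) % moduli[j]
    return digits

def base_extend(residues, moduli, new_modulus):
    """Residue of X modulo new_modulus, accumulated from the mixed radix digits."""
    value, weight = 0, 1
    for a, m_i in zip(mixed_radix_digits(residues, moduli), moduli):
        value = (value + a * weight) % new_modulus
        weight = (weight * m_i) % new_modulus
    return value

def scale_by_last_modulus(residues, moduli):
    """Residues of (X - x_last) / m_last in all moduli."""
    *rest_r, r_last = residues
    *rest_m, m_last = moduli
    scaled = [((r - r_last) * pow(m_last, -1, m)) % m for r, m in zip(rest_r, rest_m)]
    # reverse order: start the MRC with the mod 5 channel, as on the slide
    scaled.append(base_extend(scaled[::-1], rest_m[::-1], m_last))
    return scaled

scaled = scale_by_last_modulus([1, 2, 3], [3, 5, 7])
```

For 52 = (1, 2, 3) this returns (1, 2, 0), the residues of 7.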
  • 40. RSA using RNS/ECC • Needs computation of P^Q mod N. • E.g. 10^23 mod 37 = (10^16)(10^4)(10^2)(10^1) mod 37. • Successive squaring mod 37 and multiplications mod 37 of selected results. • Needs (XY) mod N as the basic step, where X, Y and N are 1024 bit numbers. • RNS can be used. • The Montgomery technique has been used to find (X′Y′/M) mod N, where M is the product of the moduli in the RNS. • Needs two RNS with dynamic ranges M and M′ which are mutually prime, and a redundant modulus. • Determine q such that (X′Y′+qN) is a multiple of M. • Extend q to the RNS with dynamic range M′. • Find r = (X′Y′+qN)/M in the second RNS. • Do base extension to the first RNS.
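Both ideas on this slide can be sketched in Python with small numbers. `mod_exp` is plain square-and-multiply. `rns_montgomery` follows the listed steps, but as a simplifying assumption it performs the base extension of q by full CRT reconstruction (the helper `crt`) instead of using a redundant modulus; all function names and the two bases {7, 8, 9} and {5, 11, 13} are chosen here for illustration only.

```python
from math import prod

def mod_exp(base, exponent, modulus):
    """Square-and-multiply: successive squarings, multiplying in selected ones."""
    result, square = 1, base % modulus
    while exponent:
        if exponent & 1:
            result = (result * square) % modulus
        square = (square * square) % modulus
        exponent >>= 1
    return result

def crt(residues, moduli):
    """CRT reconstruction; stands in for hardware base extension in this sketch."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)
    return total % big_m

def rns_montgomery(x, y, n, base1, base2):
    """Return r congruent to x*y/M mod n, with M = prod(base1) and r < 2n.

    Assumes x, y < n < M, gcd(n, M) = 1, gcd(M, M') = 1 and 2n < M'.
    """
    big_m = prod(base1)
    # first RNS: q_i = (-x*y/n) mod m_i, so that x*y + q*n is a multiple of M
    q1 = [(-(x % m) * (y % m) * pow(n, -1, m)) % m for m in base1]
    q = crt(q1, base1)                       # extend q (simplified)
    # second RNS: r_j = (x*y + q*n)/M mod m'_j
    r2 = [(((x % m) * (y % m) + (q % m) * (n % m)) * pow(big_m, -1, m)) % m
          for m in base2]
    return crt(r2, base2)

power = mod_exp(10, 23, 37)
r = rns_montgomery(10, 26, 37, [7, 8, 9], [5, 11, 13])
```

In a full implementation the result r would then be base-extended back to the first RNS so that the next multiplication can start.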
  • 41. Sign Detection and Comparison • Sign detection is difficult: the number has to be converted to binary to detect its sign. • Comparison is also difficult: it needs conversion to binary numbers, or sequential techniques such as comparing Mixed Radix Digits.
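Comparison through Mixed Radix Digits can be sketched as follows. The function names are hypothetical, and the sign convention in `is_negative` (values from (M+1)/2 rounded down up to M - 1 represent negative numbers) is an assumption made here, not something stated on the slide.

```python
from math import prod

def mixed_radix_digits(residues, moduli):
    """Digits a_i such that X = a_0 + a_1*m_0 + a_2*m_0*m_1 + ..."""
    res = list(residues)
    digits = []
    for i, m_i in enumerate(moduli):
        digits.append(res[i])
        for j in range(i + 1, len(moduli)):
            res[j] = ((res[j] - res[i]) * pow(m_i, -1, moduli[j])) % moduli[j]
    return digits

def rns_compare(a, b, moduli):
    """Return 1, 0 or -1 by comparing digits from the most significant one."""
    da = mixed_radix_digits(a, moduli)[::-1]
    db = mixed_radix_digits(b, moduli)[::-1]
    return (da > db) - (da < db)      # Python compares lists lexicographically

def is_negative(residues, moduli):
    """Assumed convention: X >= (M + 1) // 2 represents a negative number."""
    half = (prod(moduli) + 1) // 2
    return rns_compare(residues, [half % m for m in moduli], moduli) >= 0

def to_rns(x, moduli):
    return [x % m for m in moduli]

moduli = [3, 5, 7]
cmp_52_7 = rns_compare(to_rns(52, moduli), to_rns(7, moduli), moduli)
```

The digits are generated one after another, which is why the slide calls the technique sequential.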
  • 42. Applications • FIR Filters (ensure that RNS dynamic range is larger than that of the filter) • Digital Frequency Synthesis • Video Filters • 2-D filters • NTTs (Number Theoretic Transforms) • Cryptography
  • 43. Applications of RNS • [5] Freking, W.L., and Parhi, K.K., "Low-power FIR digital filters using residue arithmetic," in Conf. Record 31st Asilomar Conf. Signals, Syst. and Comput. (ACSSC 1997), vol. 1, Pacific Grove, CA, USA [1997], 739-743. • [6] D'Amora, A. et al., "Reducing power dissipation in complex digital filters by using the quadratic residue number system," in Conf. Record 34th Asilomar Conf. Signals, Syst. and Comput. (ACSSC 2000), vol. 2, Pacific Grove, CA, USA [2000], 879-883. • [7] Cardarilli, G.C. et al., "Low-power implementation of polyphase filters in Quadratic Residue Number system," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS 2004), vol. 2, Vancouver, BC, Canada [2004], 725-728. • [8] Shanbhag, N.R., and Siferd, R.E., "A single-chip pipelined 2-D FIR filter using residue arithmetic," IEEE JSSC-26 [1991], 796-805. • [9] Toivonen, T., and Heikkilä, J., "Video Filtering With Fermat Number Theoretic Transforms Using Residue Number System," IEEE CSVT-16 [2006], 128-135. • [10] Schwemmlein, J., Posch, K.C., and Posch, R., "RNS-modulo reduction upon a restricted base value set and its applicability to RSA cryptography," Computers & Security 17 [1998], 637-650. • [11] Nozaki, H., Motoyama, M., Shimbo, A., and Kawamura, S., "Implementation of RSA algorithm based on RNS Montgomery multiplication," in C. Paar (ed.), Cryptographic Hardware and Embedded Systems – CHES, Springer-Verlag, Berlin, Germany [2001], 364-376.
  • 44. • [12] Bajard, J.-C., Didier, L.-S., and Kornerup, P., "An RNS Montgomery modular multiplication Algorithm," IEEE C-47 [1998], 766-776. • [13] Bajard, J.-C., and Imbert, L., "A Full RNS Implementation of RSA," IEEE C-53 [2004], 769-774. • [14] Schinianakis, D.M., Kakarountas, A.P., and Stouraitis, T., "A New Approach to Elliptic Curve Cryptography: an RNS Architecture," IEEE MELECON, May 16-19, Benalmádena (Málaga), Spain [2006], 1241-1245. • [15] Yang, L.-L., and Hanzo, L., "A Residue Number System Based Parallel Communication Scheme Using Orthogonal Signaling: Part I, System Outline," IEEE VT-51 [2002], 1534-1546. • [16] Chaves, R., and Sousa, L., "RDSP: A RISC DSP based on residue number system," in Proc. Euro. Symp. Digital System Design: Architectures, Methods, and Tools, Antalya, Turkey [2003], 128-135. • [17] Wei, W. et al., "RNS application for digital image processing," in 4th IEEE Int. Workshop Syst.-on-Chip for Real Time Applications, Banff, Alta., Canada [2004], 77-80.
  • 45. Conclusion • RNS is a very mature field today. • It can be used in place of custom DSP blocks. • Research on newer moduli sets with high cardinality and faster reverse conversion is of interest.