ISSN: 2277 – 9043
                                                         International Journal of Advanced Research in Computer Science and Electronics Engineering
                                                                                                                       Volume 1, Issue 2, April 2012




Tripartite Modular Multiplication using Toom-Cook Multiplication

Amar Mandal, Rupali Syal

Amar Mandal, Department of Computer Science and Engineering, PEC University of Technology, Chandigarh, India
Rupali Syal, Department of Information Technology, PEC University of Technology, Chandigarh, India

Abstract— Modular multiplication is the fundamental operation in most public-key cryptosystems. Therefore, the efficiency of modular multiplication directly affects the efficiency of the whole cryptosystem. This paper presents an efficient modular multiplication algorithm for large integers. The proposed algorithm integrates three existing algorithms: the Barrett and Montgomery algorithms for modular multiplication, and the Toom-Cook algorithm for multiplication. The algorithm executes in parallel to enhance performance. The algorithms are analyzed with respect to their performance and compared with other modular multiplication algorithms.

Index Terms— Barrett algorithm, Bipartite modular multiplication, Karatsuba multiplication algorithm, Montgomery algorithm, Toom-Cook multiplication, Tripartite modular multiplication.

I. INTRODUCTION

Public Key Cryptography (PKC) was introduced by Diffie and Hellman [1] in the mid-1970s. Many cryptographic protocols, such as the RSA scheme [2], ElGamal [3], Diffie-Hellman key exchange, and DSA [4], are based on modular arithmetic operations.

The efficiency of a particular cryptosystem depends on a number of factors, such as parameter size, time-memory tradeoffs, available processing power, parallel computing, software and/or hardware optimization, and mathematical algorithms. A basic operation in public key cryptosystems is the modular multiplication of large numbers. An efficient implementation of this operation is the key to high performance.

This paper deals with different modular multiplication algorithms, namely the Barrett algorithm [11], the Montgomery algorithm [9], the Bipartite algorithm [5], the Tripartite algorithm [19] and the proposed algorithm. The Barrett and Montgomery algorithms are widely used today. The Barrett algorithm outputs $XY \bmod M$ and requires a precomputed value. The Montgomery modular multiplication algorithm outputs $XYR^{-1} \bmod M$ and also requires a precomputed value. Bipartite modular multiplication integrates the Barrett and Montgomery methods and executes them in parallel. Tripartite modular multiplication uses Karatsuba multiplication to multiply two large numbers and the Barrett and Montgomery algorithms to compute the modular multiplication; these methods also execute in parallel.

The proposed modular multiplication algorithm efficiently integrates three existing algorithms: Barrett modular multiplication, Montgomery modular multiplication and Toom-Cook multiplication. The proposed algorithm is divided into two steps, a multiplication step and a modular multiplication step. In the multiplication step the Toom-Cook algorithm is used. In the modular multiplication step the Barrett and Montgomery algorithms are used in parallel. The proposed algorithm minimizes the number of single-precision multiplications and enables more than three-way parallel computation.

The remainder of this paper is structured as follows. Section 2 describes the Barrett algorithm, the Montgomery algorithm, the Bipartite algorithm and the Tripartite algorithm. In Section 3, our proposed algorithm is introduced. Software implementation results are presented in Section 4. Section 5 concludes the paper.

II. RELATED WORK

The algorithms for modular multiplication are described for use with large nonnegative integers expressed in radix $b$ notation, where $b$ can be any integer $\geq 2$. Given a modulus $M$ and two elements $X, Y \in Z_M$, where $Z_M$ is the ring of integers modulo $M$, we define the ordinary modular multiplication as

   $XY \bmod M$

The inputs $X$, $Y$ and $M$ of the modular multiplication algorithms are represented as follows:

   $M = \sum_{i=0}^{k-1} m_i b^i$,  $0 < m_{k-1} < b$ and $0 \leq m_i < b$, for $i = 0, 1, \ldots, k-1$
   $X = \sum_{i=0}^{k-1} x_i b^i$,  $0 < x_{k-1} < b$ and $0 \leq x_i < b$, for $i = 0, 1, \ldots, k-1$
   $Y = \sum_{i=0}^{k-1} y_i b^i$,  $0 < y_{k-1} < b$ and $0 \leq y_i < b$, for $i = 0, 1, \ldots, k-1$

We describe these algorithms for performing the modular multiplication and analyze their time and space requirements. The time analysis is performed by counting the total number of multiplications, additions, subtractions, and memory read and write operations in terms of the input size parameter $k$. The memory operations are counted to calculate the proportion of the memory access time in the total running time of the modular multiplication algorithm. In our analysis, loop establishment and index computations are not taken into account. The space analysis is performed by counting the total number of words used as temporary space. However, the space required to keep the input and output values is not counted.


                                                  All Rights Reserved © 2012 IJARCSEE

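The radix-$b$ representation used for $X$, $Y$ and $M$ can be illustrated with a small sketch. The helper names `to_digits`, `from_digits` and `mulmod_reference` are hypothetical and do not come from the paper; digits are stored least significant first, which is an assumption of this sketch.

```python
def to_digits(n, b, k):
    """Return the k radix-b digits of n, least significant digit first."""
    digits = []
    for _ in range(k):
        n, d = divmod(n, b)
        digits.append(d)
    if n != 0:
        raise ValueError("n does not fit in k radix-b digits")
    return digits


def from_digits(digits, b):
    """Recombine digits d_0, d_1, ... into sum(d_i * b**i)."""
    return sum(d * b ** i for i, d in enumerate(digits))


def mulmod_reference(X, Y, M):
    """Ordinary modular multiplication XY mod M, used as a reference value."""
    return (X * Y) % M
```

The algorithms described next all compute the same value as `mulmod_reference` (possibly scaled by a power of $R^{-1}$), but avoid the multiprecision division hidden in `%`.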
A. BARRETT MODULAR MULTIPLICATION

P. Barrett [11] introduced the idea of estimating the quotient $\lfloor S/M \rfloor$, $S = XY$, with operations that either are less expensive in time than a multiprecision division by $M$ (viz., 2 divisions by a power of $b$ and a partial multiprecision multiplication), or can be done as a precalculation for a given $M$ (viz., $U = \lfloor b^{2k}/M \rfloor$, i.e., $U$ is a scaled estimate of the modulus' reciprocal). The estimate $\hat{q}$ of $\lfloor S/M \rfloor$ is obtained by replacing the floating point divisions in

   $q = \left\lfloor \frac{(S / b^{2k-t}) \, (b^{2k} / M)}{b^{t}} \right\rfloor$

by integer divisions:

   $\hat{q} = \left\lfloor \frac{\lfloor S / b^{2k-t} \rfloor \, \lfloor b^{2k} / M \rfloor}{b^{t}} \right\rfloor$.

This estimate will never be too large and, if $k < t \leq 2k$, the error is at most two: $\lfloor S/M \rfloor - 2 \leq \hat{q} \leq \lfloor S/M \rfloor$. The best choice for $t$, resulting in the least single precision multiplications and the smallest maximal error, is $k+1$, which also was Barrett's original choice. An estimate $\hat{r}$ for $S \bmod M$ is then given by $\hat{r} = S - \hat{q}M$, or, as $\hat{r} < b^{k+1}$ (if $b > 2$), by $\hat{r} = ((S \bmod b^{k+1}) - (\hat{q}M \bmod b^{k+1})) \bmod b^{k+1}$, which means that once again only a partial multiprecision multiplication is needed. At most two further subtractions of $M$ are required to obtain the correct remainder.

BARRETT MODULAR MULTIPLICATION ALGORITHM
Input: X = (x[k-1], x[k-2], ..., x[1], x[0])_b,
       Y = (y[k-1], y[k-2], ..., y[1], y[0])_b,
       M = (m[k-1], m[k-2], ..., m[1], m[0])_b,
       U = (u[k], u[k-1], ..., u[1], u[0])_b, b ≥ 2
Output: XY mod M
    1. S = X·Y;
    2. q = ((S div b^(k−1))·U) div b^(k+1);
    3. S = S mod b^(k+1) − (q·M) mod b^(k+1);
    4. if (S < 0) then
    5.    S = S + b^(k+1);
    6. while (S ≥ M) do
    7.    S = S − M;

This algorithm requires $3k^2$ multiplications, $6k^2+k+1$ additions, $9k^2+2k+2$ reads and $3k^2+4k$ writes, and uses $2k+1$ words of memory space.

B. MONTGOMERY MODULAR MULTIPLICATION

Let $R > M$ be an integer relatively prime to $M$ such that computations modulo $R$ are easy to process: $R = b^k$. Notice that the condition $\gcd(M, b) = 1$ means that this method cannot be used for all moduli. In case $b$ is a power of 2, it simply means that $M$ should be odd. The $M$-residue with respect to $R$ of an integer $X < M$ is defined as $XR \bmod M$. The set $\{XR \bmod M \mid 0 \leq X < M\}$ clearly forms a complete residue system. The Montgomery reduction of $X$ is defined as $XR^{-1} \bmod M$, where $R^{-1}$ is the inverse of $R$ modulo $M$, and it is the inverse operation of the $M$-residue transformation. It can be shown that the multiplication of two $M$-residues followed by Montgomery reduction is isomorphic to the ordinary modular multiplication. The rationale behind the $M$-residue transformation is the ability to perform a Montgomery reduction $(XR^{-1}) \bmod M$ for $0 \leq X < RM$ in almost the same time as a multiplication. The algorithm requires one precomputed value, $M' = -M^{-1} \bmod R$.

MONTGOMERY MODULAR MULTIPLICATION ALGORITHM
Input: X = (x[k-1], x[k-2], ..., x[1], x[0])_b,
       Y = (y[k-1], y[k-2], ..., y[1], y[0])_b,
       M = (m[k-1], m[k-2], ..., m[1], m[0])_b,
       M' = (m'[k-1], m'[k-2], ..., m'[1], m'[0])_b, b ≥ 2
Output: XYR^(−1) mod M
    1. S = X·Y;
    2. for (i = 0; i < k; i++) do {
    3.    t_i = (S_i·M'_0) mod b;
    4.    S = S + t_i·M·b^i;
    5. }
    6. S = S div b^k;
    7. if (S ≥ M) then
    8.    S = S − M;

Here $S_i$ denotes digit $i$ of $S$ and $M'_0$ the least significant digit of $M'$. This algorithm requires $2k^2+k$ multiplications, $4k^2+4k+2$ additions, $6k^2+7k+2$ reads, and $2k^2+5k+1$ writes, including the final multi-precision subtraction, and uses $k+3$ words of memory space.

The Montgomery representation of an integer $X$, denoted by $X_{Mont}$, can be computed by performing a Montgomery multiplication on $X$ and $R^2$, denoted by $Mont_M(X, R^2)$, resulting in $X_{Mont} = Mont_M(X, R^2) = (X \cdot R^2 \cdot R^{-1}) \bmod M = (X \cdot R) \bmod M$. After computing the Montgomery multiplication of two operands in Montgomery representation, the result is also in Montgomery representation and can be converted back by multiplication with $R^{-1}$, which comes down to Montgomery multiplication with 1. Computation of the result:

   $T = Mont_M(X_{Mont}, Y) = (X \cdot R \cdot Y \cdot R^{-1}) \bmod M = (X \cdot Y) \bmod M$.

This means that two Montgomery multiplications are needed for one modular multiplication. That is why the use of Montgomery multiplication is only interesting when many consecutive modular multiplications need to be performed.

C. BIPARTITE MODULAR MULTIPLICATION

In bipartite modular multiplication both the Barrett and Montgomery algorithms are used. The multiplier $Y$ is divided into two parts: the upper part is processed with the Barrett algorithm and the lower part with the Montgomery algorithm. The bipartite algorithm was introduced for the purpose of a two-way parallel computation [6]. It uses two custom modular multipliers, a Barrett modular multiplier and a Montgomery multiplier, in order to improve the speed. By combining a Barrett modular multiplication with a Montgomery modular multiplication, it splits the operand multiplier into two parts and processes them in parallel, increasing the calculation speed. In our implementation, parallel execution of this method is achieved with the fork() system call of the Linux operating system. The calculation is performed using Montgomery residues defined by a modulus M and a Montgomery radix R,


R < M. Next, we outline the main idea of the bipartite algorithm. Let $R = b^l$ for some $0 < l < k$. Consider the multiplier $Y$ to be split into two parts $Y_1$ and $Y_0$ so that $Y = Y_1R + Y_0$. Then, the Montgomery multiplication modulo $M$ of the integers $X$ and $Y$ can be computed as follows:

   $XYR^{-1} \bmod M$
      $= X(Y_1R + Y_0)R^{-1} \bmod M$
      $= ((XY_1 \bmod M) + (XY_0R^{-1} \bmod M)) \bmod M$
      $= (Barrett_M(X, Y_1) + Mont_M(X, Y_0)) \bmod M$

Let $l = k/2$; then this algorithm requires $\frac{5}{2}k^2$ multiplications, $5k^2+\frac{5}{2}k+1$ additions, $\frac{15}{2}k^2+\frac{19}{2}k+5$ reads, $\frac{5}{2}k^2+\frac{17}{2}k+3$ writes and $5k+3$ subtractions, and uses $2k+1$ words of memory space. Because the Montgomery and Barrett methods execute in parallel, the speed is enhanced.

D. TRIPARTITE MODULAR MULTIPLICATION

The tripartite modular multiplication algorithm is divided into two steps, multiplication and modular multiplication. The multiplication step uses the Karatsuba algorithm and splits the product into three parts; the modular multiplication step computes these three parts using Barrett and Montgomery modular multiplication in parallel.

The Karatsuba algorithm is a fast multiplication algorithm. It reduces the multiplication of two $k$-digit numbers to at most $3k^{\log_2 3} \approx 3k^{1.585}$ single-digit multiplications in general (and exactly $k^{\log_2 3}$ when $k$ is a power of 2). It is therefore faster than the classical algorithm, which requires $k^2$ single-digit products.

The basic step of Karatsuba's algorithm is a formula that allows us to compute the product of two large numbers $X$ and $Y$ using three multiplications of smaller numbers, each with about half as many digits as $X$ or $Y$, plus some additions and digit shifts. Let $X$ and $Y$ be represented as $k$-digit strings in some base $B$. For any positive integer $l$ less than $k$, one can split the two given numbers as follows:

   $R = B^l$
   $X = X_1R + X_0$
   $Y = Y_1R + Y_0$

where $X_0$ and $Y_0$ are less than $R$. The product is then

   $XY = (X_1R + X_0)(Y_1R + Y_0) = Z_2R^2 + Z_1R + Z_0$

where

   $Z_2 = X_1Y_1$
   $Z_0 = X_0Y_0$
   $Z_1 = X_1Y_0 + X_0Y_1 = (X_1 + X_0)(Y_1 + Y_0) - Z_2 - Z_0$

Karatsuba observed that, by using the second form of $Z_1$, $XY$ can be computed in only three multiplications and a few extra additions.

The first step multiplies the numbers and splits the product into the three parts $Z_0$, $Z_1$, $Z_2$. The modular multiplication step computes these three parts as follows:

   $(XYR^{-1}) \bmod M$
      $= (Z_2R^2 + Z_1R + Z_0)R^{-1} \bmod M$
      $= ((Z_2R) \bmod M + Z_1 \bmod M + Z_0R^{-1} \bmod M) \bmod M$
      $= ((Z_2R) \bmod M + Z_1 \bmod M + X_0Y_0R^{-1} \bmod M) \bmod M$
      $= (Barrett_M(Z_2, R) + Barrett_M(Z_1, 1) + Mont_M(X_0, Y_0)) \bmod M$

To obtain a high-speed implementation, one can compute these three different terms in parallel. We take $l = k/2$ for the calculation. The modular multiplication step requires two Barrett computations and one Montgomery computation. This step requires $\frac{11}{4}k^2+k$ multiplications, $\frac{11}{2}k^2+\frac{21}{2}k+8$ additions, $\frac{31}{4}k^2+16k+10$ reads, $\frac{3}{2}k^2+\frac{25}{2}k+6$ writes and $7k+5$ subtractions, and the first (multiplication) step requires $\frac{3}{4}k^2$ multiplications, $k^2+2k$ additions, $\frac{9}{4}k^2+4k$ reads, $\frac{3}{2}k^2+\frac{7}{2}k-4$ writes and $k$ subtractions. Compared with the algorithms above, this algorithm performs slightly more read, write, multiplication, subtraction and addition operations, but it computes the first and second terms with the Barrett algorithm and the third term with the Montgomery algorithm in parallel, so its running time is lower than that of the other algorithms.

III. THE PROPOSED MODULAR MULTIPLICATION

The proposed modular multiplication algorithm is divided into two steps, multiplication and modular multiplication. The multiplication step uses the Toom-Cook algorithm and splits the product into five parts; the modular multiplication step computes these five parts using Barrett and Montgomery modular multiplication in parallel.

The multiplication step is computed by the Toom-Cook multiplication algorithm. Given two large integers $X$ and $Y$, Toom–Cook splits $X$ and $Y$ into $t$ smaller parts, each of length $l$, and performs operations on the parts. Toom-3 is only a single instance of the Toom–Cook algorithm, where $t = 3$. Toom-3 reduces 9 multiplications to 5, and runs in $\Theta(n^{\log 5/\log 3})$, about $\Theta(n^{1.465})$. In general, Toom-$t$ runs in $\Theta(c(t)\,n^e)$, where $e = \log(2t-1)/\log(t)$, $n^e$ is the time spent on sub-multiplications, and $c$ is the time spent on additions and multiplication by small constants. The Karatsuba algorithm is a special case of Toom–Cook, where the number is split into two smaller ones. It reduces 4 multiplications to 3 and so operates at $\Theta(n^{\log 3/\log 2})$, which is about $\Theta(n^{1.585})$. Ordinary long multiplication is equivalent to Toom-1, with complexity $\Theta(n^2)$.

In a typical large integer implementation, each integer is represented as a sequence of digits in positional notation, with the base or radix set to some (typically large) value $b$ (in a computer implementation, $b$ would typically be a power of 2). Say the two integers being multiplied are $m$ and $n$. Select a base $B = b^i$ such that the number of digits of both $m$ and $n$ in base $B$ is at most $t$ (e.g., 3 in Toom-3). Then separate $m$ and $n$ into their base-$B$ digits $m_i$, $n_i$, and use these digits as coefficients in degree $t-1$ polynomials $p$ and $q$, with the property that $p(B) = m$ and $q(B) = n$:

   $p(x) = m_2x^2 + m_1x + m_0$
   $q(x) = n_2x^2 + n_1x + n_0$

The purpose of defining these polynomials is that if we can compute their product $r(x) = p(x)q(x)$, our answer will be $r(B) = m \times n$. In the case where the numbers being multiplied are of different sizes, it is useful to use different values of $t$ for $m$ and $n$, which we call $t_m$ and $t_n$. The polynomials are evaluated at the five points $0$, $1$, $-1$, $-2$ and $\infty$. The number of elementary operations (additions/subtractions) needed for this evaluation can be reduced by using the following sequence, shown here for the first operand (polynomial $p$) of the running example:

   $p_0 = m_0 + m_2$


   $p(0) = m_0$
   $p(1) = p_0 + m_1$
   $p(-1) = p_0 - m_1$
   $p(-2) = (p(-1) + m_2) \times 2 - m_0$
   $p(\infty) = m_2$

This sequence requires five addition/subtraction operations, one less than the straightforward evaluation. In practical implementations, as the operands become smaller, the algorithm will switch to schoolbook long multiplication. Letting $r$ be the product polynomial, its values at the five points are:

   $r(0) = p(0)q(0)$
   $r(1) = p(1)q(1)$
   $r(-1) = p(-1)q(-1)$
   $r(-2) = p(-2)q(-2)$
   $r(\infty) = p(\infty)q(\infty)$

A difficult design challenge in Toom–Cook is to find an efficient sequence of operations to compute the coefficients of the product polynomial from these values; one sequence given by Bodrato [14] for Toom-3 is the following:

   $r_0 = r(0)$
   $r_4 = r(\infty)$
   $r_3 = (r(-2) - r(1))/3$
   $r_1 = (r(1) - r(-1))/2$
   $r_2 = r(-1) - r(0)$
   $r_3 = (r_2 - r_3)/2 + 2r(\infty)$
   $r_2 = r_2 + r_1 - r(\infty)$
   $r_1 = r_1 - r_3$

The product polynomial $r$ is now

   $r(x) = r_0 + r_1x + r_2x^2 + r_3x^3 + r_4x^4$

Finally, evaluate $r(B)$ to obtain the final answer. This is straightforward since $B$ is a power of $b$, so the multiplications by powers of $B$ are all shifts by a whole number of digits in base $b$:

   $r(B) = r_0 + r_1B + r_2B^2 + r_3B^3 + r_4B^4$

The first step multiplies the numbers and splits the product into parts; the modular multiplication step computes these parts as follows. Let $B = R$:

   $r(R) = r_0 + r_1R + r_2R^2 + r_3R^3 + r_4R^4$
   $r(R)R^{-2} = (r_0 + r_1R + r_2R^2 + r_3R^3 + r_4R^4)R^{-2}$
      $= r_0R^{-2} + r_1R^{-1} + r_2 + r_3R + r_4R^2$

Reducing both sides modulo $M$:

   $(r(R)R^{-2}) \bmod M = (r_0R^{-2} + r_1R^{-1} + r_2 + r_3R + r_4R^2) \bmod M$
      $= (Mont_M(Mont_M(r_0, 1), 1) + Mont_M(r_1, 1) + Barrett_M(r_2, 1) + Barrett_M(r_3, R) + Barrett_M(r_4, R^2)) \bmod M$

To obtain a high-speed implementation, one can compute these five different terms in parallel. Here $B = b^i$, and for the calculation of this algorithm we take $i = k/3$. The modular multiplication step, which consists of three Barrett computations and three Montgomery computations, requires $\frac{29}{18}k^2+2k$ multiplications, $\frac{29}{3}k^2+11k+12$ additions, $\frac{17}{6}k^2+\frac{34}{3}k+16$ reads, $\frac{17}{18}k^2+\frac{25}{3}k+10$ writes and $\frac{11}{3}k+8$ subtractions. The algorithms above differ only slightly in their numbers of read, write, multiplication, subtraction and addition operations, but this algorithm executes in parallel, so its running time is lower than that of the other algorithms.

IV. RESULT

Table: Execution times of the five modular multiplication algorithms for the modular multiplication of a 2k-digit number modulo a k-digit modulus M (b = 2^3, on a 1.73 GHz Intel Celeron based PC with gcc 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4)).

                                      RESULT
   Length of M   Time in milliseconds
   in bits       Barrett   Montgomery   Bipartite   Tripartite    Proposed
                                                    (Karatsuba)   Algorithm
   128           8         9            22          30            47
   256           42        50           11          50            52
   512           61        59           20          58            58
   1024          94        105          89          60            74
   2048          173       143          153         120           91

These observations are confirmed by a software implementation of these algorithms; see the table. The implementation is written in ANSI C [4] and hence should be portable to any computer for which an implementation of the ANSI C standard exists. All figures in this article were obtained on a 1.73 GHz Intel Celeron based PC using the 32-bit compiler gcc 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4). Parallel execution of the Bipartite, Tripartite and proposed algorithms is achieved with the fork() system call of the Linux operating system.

V. CONCLUSIONS

This paper discussed various algorithms for modular multiplication of large numbers and evaluated them with respect to their accuracy, computation performance and efficiency. Each algorithm has its own features suitable for a specific field of application. No single algorithm provides a perfect solution to meet all demands; depending on the environment in which computations are to be performed, one algorithm may be preferable over another. The Barrett and Montgomery algorithms are efficient for smaller moduli, but for large moduli the tripartite and proposed algorithms are more efficient, as shown in the results.

Future work would be to use the Schönhage–Strassen algorithm for the multiplication step instead of Toom-Cook's method in the proposed algorithm.
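As an illustration of the Barrett listing in Section II-A, the following minimal sketch follows steps 1 to 7 using Python's built-in integers in place of digit arrays. The function names `barrett_setup` and `barrett_mulmod` are hypothetical, and the sketch assumes $b > 2$ and a modulus $M$ with exactly $k$ radix-$b$ digits, as the text requires.

```python
def barrett_setup(M, b, k):
    """Precompute U = floor(b**(2k) / M) for a k-digit modulus M."""
    return b ** (2 * k) // M


def barrett_mulmod(X, Y, M, U, b, k):
    """Return X*Y mod M for 0 <= X, Y < M (steps 1-7 of the listing)."""
    bk1 = b ** (k + 1)
    S = X * Y                                  # step 1
    q = ((S // b ** (k - 1)) * U) // bk1       # step 2: quotient estimate
    S = S % bk1 - (q * M) % bk1                # step 3: partial products only
    if S < 0:                                  # steps 4-5
        S += bk1
    while S >= M:                              # steps 6-7: at most two passes
        S -= M
    return S


if __name__ == "__main__":
    b, k, M = 10, 4, 9973
    U = barrett_setup(M, b, k)
    print(barrett_mulmod(1234, 5678, M, U, b, k) == (1234 * 5678) % M)
```

The only division by $M$ happens once, in `barrett_setup`; every call to `barrett_mulmod` then uses divisions by powers of $b$ only, which are digit shifts in a real multiprecision implementation.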

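The Montgomery listing in Section II-B can be sketched in the same way. The names `montgomery_setup` and `montgomery_mul` are hypothetical; the sketch assumes $\gcd(M, b) = 1$, $M < R = b^k$, and Python 3.8 or later for the modular inverse via `pow(x, -1, m)`. Only the least significant digit $M'_0$ of $M'$ is needed by the loop.

```python
def montgomery_setup(M, b):
    """Return M'_0 = -M^(-1) mod b; requires gcd(M, b) = 1."""
    return (-pow(M, -1, b)) % b


def montgomery_mul(X, Y, M, m0_prime, b, k):
    """Return X*Y*R^(-1) mod M with R = b**k, for 0 <= X, Y < M < R."""
    S = X * Y                                  # step 1
    for i in range(k):                         # steps 2-5
        s_i = (S // b ** i) % b                # digit i of S
        t_i = (s_i * m0_prime) % b
        S += t_i * M * b ** i                  # clears digit i of S
    S //= b ** k                               # step 6: exact division
    if S >= M:                                 # steps 7-8
        S -= M
    return S


if __name__ == "__main__":
    b, k, M = 10, 4, 9973
    R = b ** k
    m0p = montgomery_setup(M, b)
    X, Y = 1234, 5678
    X_mont = montgomery_mul(X, (R * R) % M, M, m0p, b, k)   # X*R mod M
    T = montgomery_mul(X_mont, Y, M, m0p, b, k)             # X*Y mod M
    print(T == (X * Y) % M)
```

The usage at the bottom mirrors the text: one Montgomery multiplication by $R^2 \bmod M$ converts $X$ to Montgomery representation, and a second one with $Y$ yields the ordinary product $XY \bmod M$.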

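The bipartite and tripartite decompositions of Sections II-C and II-D can be checked numerically. In the sketch below, `barrett` and `mont` are hypothetical stand-ins that compute $ac \bmod M$ and $acR^{-1} \bmod M$ directly with Python's integer arithmetic; they are not the digit-level algorithms, and the terms are evaluated sequentially rather than in parallel processes. It assumes $\gcd(R, M) = 1$ and Python 3.8 or later.

```python
def barrett(a, c, M):
    """Stand-in for Barrett_M(a, c) = a*c mod M."""
    return (a * c) % M


def mont(a, c, M, R):
    """Stand-in for Mont_M(a, c) = a*c*R^(-1) mod M."""
    return (a * c * pow(R, -1, M)) % M


def bipartite(X, Y, M, R):
    """X*Y*R^(-1) mod M from the split Y = Y1*R + Y0."""
    Y1, Y0 = divmod(Y, R)
    return (barrett(X, Y1, M) + mont(X, Y0, M, R)) % M


def tripartite(X, Y, M, R):
    """X*Y*R^(-1) mod M from the Karatsuba parts Z2, Z1, Z0."""
    X1, X0 = divmod(X, R)
    Y1, Y0 = divmod(Y, R)
    Z2 = X1 * Y1
    Z0 = X0 * Y0
    Z1 = (X1 + X0) * (Y1 + Y0) - Z2 - Z0       # third and last product
    # The three terms are independent, so they could run in parallel.
    return (barrett(Z2, R, M) + barrett(Z1, 1, M) + mont(X0, Y0, M, R)) % M
```

Both functions return the Montgomery-scaled product $XYR^{-1} \bmod M$, so their results can be compared against `(X * Y * pow(R, -1, M)) % M`.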

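The Toom-3 multiplication step of Section III can be sketched as follows, using the evaluation points $0, 1, -1, -2, \infty$ and the interpolation sequence attributed to Bodrato [14]. The names `toom3_coefficients` and `toom3_mul` are hypothetical, the five sub-products use Python's built-in multiplication instead of recursing, and the inputs are assumed to be non-negative.

```python
def _evaluate(a0, a1, a2):
    """Evaluate a2*x^2 + a1*x + a0 at 0, 1, -1, -2 and infinity."""
    t = a0 + a2
    at_1 = t + a1
    at_m1 = t - a1
    at_m2 = (at_m1 + a2) * 2 - a0
    return a0, at_1, at_m1, at_m2, a2


def toom3_coefficients(m, n, B):
    """Return (r0, ..., r4) with m*n = sum(r_i * B**i)."""
    m0, m1, m2 = m % B, (m // B) % B, m // B ** 2
    n0, n1, n2 = n % B, (n // B) % B, n // B ** 2
    p = _evaluate(m0, m1, m2)
    q = _evaluate(n0, n1, n2)
    # The five pointwise products replace nine digit products.
    v0, v1, vm1, vm2, vinf = (a * c for a, c in zip(p, q))
    r0 = v0
    r4 = vinf
    r3 = (vm2 - v1) // 3          # all divisions below are exact
    r1 = (v1 - vm1) // 2
    r2 = vm1 - v0
    r3 = (r2 - r3) // 2 + 2 * vinf
    r2 = r2 + r1 - r4
    r1 = r1 - r3
    return r0, r1, r2, r3, r4


def toom3_mul(m, n, B):
    """Recombine the coefficients: r(B) = m * n."""
    return sum(r * B ** i for i, r in enumerate(toom3_coefficients(m, n, B)))
```

When $B$ is a power of the radix $b$, the multiplications by `B ** i` in `toom3_mul` are digit shifts, which is why the recombination is cheap.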

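The five-term decomposition of the proposed algorithm in Section III can also be checked numerically. This is a sketch under stated assumptions: `product_coefficients` is a hypothetical schoolbook stand-in for the Toom-3 step, `barrett` and `mont` are stand-ins that use Python's integer arithmetic rather than the digit-level algorithms, and the five terms are summed sequentially instead of being computed in parallel. It assumes $\gcd(R, M) = 1$ and Python 3.8 or later.

```python
def product_coefficients(X, Y, R):
    """Coefficients r0..r4 of p(x)q(x) for the base-R digits of X and Y."""
    xs = [X % R, (X // R) % R, X // R ** 2]
    ys = [Y % R, (Y // R) % R, Y // R ** 2]
    r = [0] * 5
    for i, a in enumerate(xs):
        for j, c in enumerate(ys):
            r[i + j] += a * c
    return r


def barrett(a, c, M):
    """Stand-in for Barrett_M(a, c) = a*c mod M."""
    return (a * c) % M


def mont(a, c, M, R):
    """Stand-in for Mont_M(a, c) = a*c*R^(-1) mod M."""
    return (a * c * pow(R, -1, M)) % M


def proposed_mulmod(X, Y, M, R):
    """Return X*Y*R^(-2) mod M as the sum of five independent terms."""
    r0, r1, r2, r3, r4 = product_coefficients(X, Y, R)
    terms = [
        mont(mont(r0, 1, M, R), 1, M, R),   # r0 * R^(-2)
        mont(r1, 1, M, R),                  # r1 * R^(-1)
        barrett(r2, 1, M),                  # r2
        barrett(r3, R, M),                  # r3 * R
        barrett(r4, R * R, M),              # r4 * R^2
    ]
    return sum(terms) % M
```

Note that the result carries a factor $R^{-2}$ rather than the $R^{-1}$ of a single Montgomery multiplication, so a caller has to account for this scaling when converting back to an ordinary residue.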
                          REFERENCES
 [1] W. Diffie and M.E. Hellman, “New Directions in Cryptography,”
          IEEE Trans. Information Theory, vol. IT-22, no. 6, pp. 644-654,
          Nov. 1976
 [2] R.L. Rivest, A. Shamir, and L. Adleman, “A Method for Obtaining
          Digital Signatures and Public-Key Cryptosystems,” Comm.
          ACM, vol. 21, no. 2, pp. 120-126, Feb. 1978.
 [3] T. ElGamal, “A Public Key Cryptosystem and a Signature Scheme
          Based on Discrete Logarithms,” IEEE Trans. Information
          Theory, vol. 31, no. 4, pp. 469-472, July 1985.
 [4] ANSI X9.30, Public Key Cryptography for the Financial Services
          Industry: Part 1: The Digital Signature Algorithm (DSA), Am.
          Nat’l Standards Inst., Am. Bankers Assoc., 1997.
 [5] Marcelo E. Kaihara and Naofumi Takagi, “Bipartite Modular
          Multiplication Method” IEEE Transactions on Computers, vol.
          57, no. 2, pp. 157-164, Feb. 2008
 [6] M. E. Kaihara and N. Takagi. Bipartite Modular Multiplication. In J.
          R. Rao and B. Sunar, editors, Proceedings of 7th International
          Workshop on Cryptographic Hardware and Embedded Systems
          (CHES), number 3659 in Lecture Notes in Computer Science.
          Springer-Verlag, 2005
 [7] G.R. Blakley, “A Computer Algorithm for Calculating the Product AB
          Modulo M,” IEEE Trans. Computers, vol. 32, no. 5, pp. 497-500,
          May 1983.
 [8] E.F. Brickell, “A Fast Modular Multiplication Algorithm with
          Application to Two Key Cryptography,” Advances in Cryptology
          Proc. CRYPTO ’82, pp. 51-60, 1983.
 [9] P.L. Montgomery, “Modular Multiplication without Trial Division,”
          Math. Computation, vol. 44, no. 170, pp. 519-521, Apr. 1985.
[10] K.R. Sloan, “Comments on a Computer Algorithm for Calculating the
          Product AB Modulo M,” IEEE Trans. Computers, vol. 34, no. 3,
          pp. 290-292, Mar. 1985.
[11] P.D. Barrett, “Implementing the Rivest Shamir and Adleman public
          key encryption algorithm on a standard digital signal processor,”
          Advances in Cryptology, Proc. Crypto’86, LNCS 263, A.M.
          Odlyzko, Ed., Springer-Verlag, pp. 311–323, 1987.
[12] A. Bosselaers, R. Govaerts, and J. Vandewalle, “Comparison of Three
          Modular Reduction Functions,” Proc. CRYPTO ’93, pp. 175-186.
[13] A.J. Menezes, P.C. van Oorschot, and S.A. Vanstone, “Handbook of
          Applied Cryptography,” chapter 14.3.3, pp. 603-604.
[14] M. Bodrato, “Towards Optimal Toom-Cook Multiplication for
          Univariate and Multivariate Polynomials in Characteristic 2 and
          0,” Proc. WAIFI ’07, LNCS 4547, pp. 116-133, June 2007.
[15] A. Toom, “The Complexity of a Scheme of Functional Elements
          Realizing the Multiplication of Integers,” Translations of Dokl.
          Akad. Nauk SSSR, vol. 3, 1963.
[16] A. Karatsuba and Y. Ofman, “Multiplication of Many-Digital Numbers
          by Automatic Computers,” Dokl. Akad. Nauk SSSR, vol. 145, 1962.
          Translation in Physics-Doklady, vol. 7, pp. 595-596, 1963.
[17] N. Koblitz, “Elliptic Curve Cryptosystems,” Math. Computation,
          vol. 48, pp. 203-209, 1987.
[18] Ç.K. Koç, T. Acar, and B.S. Kaliski, “Analyzing and Comparing
          Montgomery Multiplication Algorithms,” IEEE Micro, vol. 16,
          no. 3, pp. 26-33, June 1996.
[19] K. Sakiyama, M. Knezevic, J. Fan, B. Preneel, and I. Verbauwhede,
          “Tripartite Modular Multiplication,” Integration, vol. 44,
          no. 4, pp. 259-269, 2011.
[20] C. Gentry, S. Halevi, and V. Vaikuntanathan, “i-Hop Homomorphic
          Encryption and Rerandomizable Yao Circuits,” Proc. CRYPTO 2010,
          T. Rabin, ed., LNCS 6223, pp. 155-172, Springer, 2010.



