International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 03 | Mar 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3483
Efficient Multiplier Design Using Adaptive Hold Logic with
Montgomery Algorithm
Ramya N1, Rose Mistica S2, Subikma Binusha V3, Prof. Savitha G4
1,2,3 Students, Dept. of Electronics and Communication Engineering, Jeppiaar SRR Engineering College, Chennai, Tamil Nadu
4 Assistant Professor, Dept. of Electronics and Communication Engineering, Jeppiaar SRR Engineering College, Chennai
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - The multiplier is a key component in most digital signal
processors, so system performance depends on the throughput of the
multiplier. Reliability is now an important design concern at advanced
technology nodes. Transistor aging significantly degrades system
performance, and in the long term the system may fail because of delay
problems. The impact of aging grows as transistors are scaled down. One of
the main causes of transistor aging is Bias Temperature Instability (BTI),
which raises the threshold voltage of the transistor over time and thereby
reduces multiplier speed. Over-design approaches can reduce the aging
effect, but they are inefficient in power and area. Fixed-latency designs
have a high chance of timing violations. A variable-latency multiplier is
therefore used for reliable operation under BTI effects. An Adaptive Hold
Logic (AHL) circuit selects the appropriate cycle period, and an Error
Detection and Correction Pulsed Latch (ECPL) detects timing errors. For
modular arithmetic, the Montgomery multiplication algorithm, introduced by
Peter L. Montgomery in 1985, is used to perform faster modular
multiplication.
Key Words: Bias Temperature Instability, Razor Flipflop,
Error Detection and Correction Pulsed Latch, Adaptive Hold
Logic, Montgomery Multiplication Algorithm.
1. INTRODUCTION
Digital multipliers are among the most critical arithmetic functional
units in many applications, such as the Fourier transform, discrete cosine
transforms, and digital filtering. The throughput of these applications
depends on multipliers, and if the multipliers are too slow, the
performance of the entire circuit is reduced. Moreover, negative bias
temperature instability (NBTI) occurs when a pMOS transistor is under
negative bias (Vgs = -Vdd). In this situation, the interaction between
inversion-layer holes and hydrogen-passivated Si atoms breaks the Si–H
bonds generated during the oxidation process, generating H or H2
molecules.
When these molecules diffuse away, interface traps are left. The interface
traps accumulated at the silicon/gate-oxide interface result in an
increased threshold voltage (Vth), which reduces the circuit switching
speed. When the bias voltage is removed, the reverse reaction occurs,
reducing the NBTI effect.
However, the reverse reaction does not eliminate all the interface traps
generated during the stress phase, and Vth increases in the long term.
Hence, it is important to design a reliable high-performance multiplier.
The corresponding effect on an nMOS transistor is positive bias
temperature instability (PBTI), which occurs when an nMOS transistor is
under positive bias. Compared with the NBTI effect, the PBTI effect is
much smaller on oxide/polygate transistors and is therefore usually
ignored. However, for high-k/metal-gate nMOS transistors with significant
charge trapping, the PBTI effect can no longer be ignored. In fact, it has
been shown that the PBTI effect is more significant than the NBTI effect
on 32-nm high-k/metal-gate processes. A traditional method to mitigate the
aging effect is overdesign, including guard-banding and gate oversizing;
however, this approach can be very pessimistic and inefficient in area and
power. To avoid this problem, several NBTI-aware methodologies have been
proposed. An NBTI-aware technology mapping technique was proposed to
guarantee the performance of the circuit during its lifetime. An
NBTI-aware sleep transistor was designed to reduce the aging effects on
pMOS sleep transistors, which improved the lifetime stability of the
power-gated circuits under consideration. Wu and Marculescu proposed a
joint logic restructuring and pin reordering method that is based on
detecting functional symmetries and transistor stacking effects. They also
proposed an NBTI optimization method that considers path sensitization.
Dynamic voltage scaling and body-biasing techniques were proposed to
reduce power or extend circuit life. These techniques, however, require
circuit modification or do not provide optimization of specific circuits.
Traditional circuits use the critical path delay as the overall circuit
clock cycle in order to perform correctly. However, the probability that
the critical paths are activated is low.
In most cases, the path delay is shorter than the critical path delay. For
these noncritical paths, using the critical path delay as the overall
cycle period results in significant timing waste. Hence, the
variable-latency design was proposed to reduce the timing waste of
traditional circuits. The variable-latency design divides the circuit into
two parts: 1) shorter paths and 2) longer paths. Shorter paths can execute
correctly in one cycle, whereas longer paths need two cycles to execute.
When shorter paths are activated frequently, the average latency of
variable-latency designs is better than that of traditional designs. For
example, several variable-latency adders have been proposed that use the
speculation technique with error detection and recovery. For modular
arithmetic, Montgomery modular multiplication is used to perform modular
multiplication faster.
2. PRELIMINARIES
2.1 Column-Bypass Multiplier
A column-bypassing multiplier is an improvement on the conventional array
multiplier (AM). Fig. 1 shows a 4×4 column-bypassing multiplier. Supposing
the inputs are 1010₂ × 1111₂, it can be seen that for the full adders
(FAs) in the first and third diagonals, two of the three input bits are 0:
the carry bit from the upper right FA and the partial product aibi.
Therefore, the carry output of the adders in both diagonals is 0, and the
output sum bit is simply equal to the third bit, which is the sum output
of the upper FA. Hence, the FA is modified by adding two tristate gates
and one multiplexer.
The multiplicand bit ai can be used as the selector of the multiplexer to
decide the output of the FA, and ai can also be used as the selector of
the tristate gates to turn off the input path of the FA. If ai is 0, the
inputs of the FA are disabled, and the sum bit of the current FA is equal
to the sum bit from its upper FA, thus reducing the power consumption of
the multiplier. If ai is 1, the normal sum result is selected.
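The bypass rule described above can be illustrated with a small behavioural model of a single cell. This is only a sketch: the function names are hypothetical, and forcing the carry-out of a bypassed cell to 0 is an assumption that matches the case discussed in the text, where the incoming carry and the partial product are both 0.

```python
def full_adder(x, y, cin):
    """Ordinary 1-bit full adder: returns (sum, carry_out)."""
    s = x ^ y ^ cin
    cout = (x & y) | (x & cin) | (y & cin)
    return s, cout


def column_bypass_cell(a_i, b_j, sum_in, carry_in):
    """Behavioural model of one column-bypassing cell (hypothetical helper).

    a_i      : multiplicand bit, used as the bypass selector
    b_j      : multiplicator bit for this row
    sum_in   : sum bit coming from the upper FA
    carry_in : carry bit coming from the upper right FA
    """
    if a_i == 0:
        # Tristate gates cut off the FA inputs; the multiplexer forwards
        # the upper sum bit. Carry-out is assumed to be 0 in this model.
        return sum_in, 0
    # a_i == 1: the normal FA result is selected.
    return full_adder(a_i & b_j, sum_in, carry_in)
```

When `a_i` is 0 and the incoming carry is 0, the bypassed result is identical to what the full adder would have produced, which is why the FA can be switched off without changing the product.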
Fig-1: Column Bypass Multiplier
2.2 Row-Bypassing Multiplier:
A low-power row-bypassing multiplier has also been proposed to reduce the
activity power of the AM. The operation of the low-power row-bypassing
multiplier is similar to that of the low-power column-bypassing
multiplier, but the selectors of the multiplexers and the tristate gates
use the multiplicator. Fig. 2 shows a 4 × 4 row-bypassing multiplier. Each
input is connected to an FA through a tristate gate.
When the inputs are 1111₂ × 1001₂, the two inputs in the first and second
rows are 0 for the FAs. Because b1 is 0, the multiplexers in the first row
select aib0 as the sum bit and select 0 as the carry bit.
The inputs are bypassed to the FAs in the second row, and the tristate
gates turn off the input paths to the FAs. Therefore, no switching
activity occurs in the first-row FAs, and power consumption is reduced in
return. Similarly, because b2 is 0, no switching activity occurs in the
second-row FAs. However, the FAs in the third row must be active because
b3 is not zero.
Fig-2: Row Bypass Multiplier
2.3 Variable Latency Design
Variable Latency Unit
Average Case Computation
Average-case computation, as the name suggests, refers to
those computations that occur more frequently than others,
and also get completed within average delays, considering
the delay required by all the computations the circuit
performs. Within the synchronous paradigm, two classes of
techniques have been proposed for exploiting the average-
case computations: variable-latency units, and error
detection-correction units. Our work in this chapter focuses
on the design of BTI-resilient circuits using variable latency
units (VLUs).
Unlike conventional combinational circuits that complete
operations within one clock cycle, VLUs allow the
computation of the combinational circuit to be completed in
a variable, integer, number of clock cycles. By allowing high-
probability operations to complete in a single cycle, but
allowing rarer events to use multiple (typically two) cycles,
the average cycle time may be shorter than that of the
conventional implementation, implying that the circuit
throughput for a VLU may be significantly larger. For example, Fig. 3
shows an 8-bit variable-latency ripple carry adder (RCA).
A8–A1 and B8–B1 are the 8-bit inputs, and S8–S1 are the outputs. Suppose
the delay of each FA is one, so that the maximum delay of the adder is 8.
Through simulation, it can be determined that the probability of the carry
propagation delay being longer than 5 is low.
Hence, the cycle period is set to 5, and hold logic is added to notify the
system whether the adder can complete the operation within one cycle
period. Fig. 3 also shows the hold logic used in this circuit.
The function of the hold logic is (A4 XOR B4)(A5 XOR B5). If the output of
the hold logic is 0, i.e., A4 = B4 or A5 = B5, either the fourth or the
fifth adder will not propagate an incoming carry. Hence, the maximum delay
will not exceed one cycle period.
When the hold logic output is 1, the input can activate paths longer than
5, so the hold logic notifies the system that the current operation
requires two cycles to complete.
Two cycles are sufficient for the longest path to complete (5 × 2 is
larger than 8). The performance improvement of the variable-latency design
can be calculated as follows: if the probability of each input bit being 1
is 0.5, the probability of (A4 XOR B4)(A5 XOR B5) being 1 is 0.25. The
average latency of the variable-latency design is
0.75 × 5 + 0.25 × 10 = 6.25. Compared with the simple fixed-latency RCA,
which has a latency of 8, the variable-latency design achieves a 28%
performance improvement (8 / 6.25 = 1.28). Fig. 4 shows the path delay
distribution of a 16 × 16 AM and of traditional column-bypassing and
row-bypassing multipliers for 65 536 randomly chosen input patterns.
All multipliers execute operations with a fixed cycle period. The maximum
path delay is 1.32 ns for the AM, 1.88 ns for the column-bypassing
multiplier, and 1.82 ns for the row-bypassing multiplier. It can be seen
that for the AM, more than 98% of the paths have a delay of <0.7 ns.
Moreover, more than 93% and 98% of the paths in the fixed-latency
column-bypassing (FLCB) and row-bypassing multipliers present a delay of
<0.9 ns, respectively. Hence, using the maximum path delay for all paths
causes significant timing waste for shorter paths, and redesigning the
multipliers with variable latency can improve their performance. Another
key observation is that the path delay of an operation is strongly tied to
the number of zeros in the multiplicand in the column-bypassing
multiplier.
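Returning to the 8-bit RCA example, the average-latency arithmetic can be checked by brute force. The sketch below enumerates all 65 536 input pairs, evaluates the hold logic (A4 XOR B4)(A5 XOR B5), and recomputes the figures quoted in the text. The function names are illustrative only, and bits are numbered from 1 at the least significant end, as in A8–A1.

```python
from itertools import product

CYCLE = 5          # cycle period in FA delays, taken from the example
FIXED_LATENCY = 8  # latency of the fixed-latency 8-bit RCA


def hold(a, b):
    """Hold logic (A4 XOR B4)(A5 XOR B5); returns 1 for a two-cycle pattern."""
    p4 = ((a >> 3) ^ (b >> 3)) & 1  # bit 4 propagates a carry
    p5 = ((a >> 4) ^ (b >> 4)) & 1  # bit 5 propagates a carry
    return p4 & p5


def average_latency():
    """Enumerate every 8-bit input pair and average the resulting latency."""
    two_cycle = sum(hold(a, b) for a, b in product(range(256), repeat=2))
    p_hold = two_cycle / (256 * 256)
    avg = (1 - p_hold) * CYCLE + p_hold * 2 * CYCLE
    return p_hold, avg, FIXED_LATENCY / avg


p_hold, avg, speedup = average_latency()
print(p_hold, avg, speedup)
```

Under the uniform-input assumption stated in the text, the enumeration reproduces a hold probability of 0.25, an average latency of 6.25, and a throughput ratio of 1.28.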
Fig-3: 8-bit RCA with Hold logic circuit
Fig-4: Path Delay Distribution of AM, Column and Row
bypassing multipliers for 65536 input patterns
3. AGING AWARE RELIABLE MULTIPLIER
Fig-5: Aging Aware Reliable Multiplier
Fig. 5 shows the aging-aware multiplier architecture, which includes two
m-bit inputs (m is a positive number), one 2m-bit output, one column- or
row-bypassing multiplier, 2m 1-bit Razor flip-flops, and an AHL circuit.
The inputs of the row-bypassing multiplier are the symbols in parentheses.
In the proposed architecture, the column- and row-bypassing multipliers
can be examined by the number of zeros in either the multiplicand or the
multiplicator to predict whether the operation requires one cycle or two
cycles to complete. When input patterns are random, the numbers of zeros
and ones in the multiplicator and multiplicand follow a normal
distribution. Therefore, using the number of zeros or the number of ones
as the judging criterion leads to similar outcomes.
Hence, the two aging-aware multipliers can be implemented using a similar
architecture, and the difference between the two bypassing multipliers
lies in the input signals of the AHL. According to the bypassing selection
in the column- or row-bypassing multiplier, the input signal of the AHL in
the architecture with the column-bypassing multiplier is the multiplicand,
whereas that of the row-bypassing multiplier is the multiplicator.
4. RAZOR FLIPFLOP
Fig -6:Razor Flipflop
Fig. 6 shows the Razor flip-flop, which can be used to detect whether
timing violations occur before the next input pattern arrives. A 1-bit
Razor flip-flop contains a main flip-flop, a shadow latch, an XOR gate,
and a mux. The main flip-flop catches the execution result of the
combinational circuit using the normal clock signal, and the shadow latch
catches the execution result using a delayed clock signal, which is slower
than the normal clock signal. If the latched bit of the shadow latch
differs from that of the main flip-flop, the path delay of the current
operation exceeds the cycle period, and the main flip-flop has caught an
incorrect result. If errors occur, the Razor flip-flop sets the error
signal to 1 to notify the system to re-execute the operation and to notify
the AHL circuit that an error has occurred. We use Razor flip-flops to
detect whether an operation that is considered to be a one-cycle pattern
can really finish in one cycle.
If not, the operation is re-executed with two cycles. Although the
re-execution may seem costly, the overall cost is low because the
re-execution frequency is low. The AHL circuit is the key component of the
aging-aware variable-latency multiplier. The AHL circuit contains an aging
indicator, two judging blocks, one mux, and one D flip-flop.
The aging indicator indicates whether the circuit has suffered significant
performance degradation due to the aging effect. The aging indicator is
implemented as a simple counter that counts the number of errors over a
certain number of operations and is reset to zero at the end of those
operations.
If the cycle period is too short, the column- or row-bypassing multiplier
cannot complete these operations successfully, causing timing violations.
These timing violations are caught by the Razor flip-flops, which generate
error signals.
If errors happen frequently and exceed a predefined threshold, the circuit
has suffered significant timing degradation due to the aging effect, and
the aging indicator outputs 1; otherwise, it outputs 0 to indicate that
the aging effect is still not significant and that no action is needed.
The first judging block in the AHL circuit outputs 1 if the number of
zeros in the multiplicand (multiplicator for the row-bypassing multiplier)
is larger than n (n is a positive number), and the second judging block in
the AHL circuit outputs 1 if the number of zeros in the multiplicand
(multiplicator) is larger than n + 1.
Both are used to decide whether an input pattern requires one or two
cycles, but only one of them is chosen at a time. In the beginning, the
aging effect is not significant and the aging indicator produces 0, so the
first judging block is used. After a period of time, when the aging effect
becomes significant, the second judging block is chosen. Compared with the
first judging block, the second judging block allows a smaller number of
patterns to become one-cycle patterns because it requires more zeros in
the multiplicand (multiplicator).
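The judging blocks and the aging indicator described above can be sketched behaviourally as follows. This is a minimal model, not the circuit itself: the class and function names are hypothetical, the window length and error threshold are placeholder values not given in the text, and holding the indicator output until the end of the next window is an assumption.

```python
class AgingIndicator:
    """Error counter over a window of operations (window/threshold are assumed)."""

    def __init__(self, window=1000, threshold=50):
        self.window = window        # number of operations per counting window
        self.threshold = threshold  # predefined error threshold
        self.ops = 0
        self.errors = 0
        self.aged = 0               # current indicator output (0 or 1)

    def record(self, error):
        """Register one operation and its Razor error flag; return the output."""
        self.ops += 1
        self.errors += 1 if error else 0
        if self.ops == self.window:
            # End of window: update the output, then reset the counter.
            self.aged = 1 if self.errors > self.threshold else 0
            self.ops = 0
            self.errors = 0
        return self.aged


def count_zeros(value, width):
    """Number of 0 bits in the low `width` bits of `value`."""
    return width - bin(value & ((1 << width) - 1)).count("1")


def ahl_one_cycle(operand, width, n, aged):
    """Return 1 if the pattern is judged a one-cycle pattern, else 0.

    The first judging block requires more than n zeros, the second more
    than n + 1; the aging indicator output selects between them.
    """
    zeros = count_zeros(operand, width)
    first = 1 if zeros > n else 0
    second = 1 if zeros > n + 1 else 0
    return second if aged else first
```

For example, an 8-bit operand with four zeros is a one-cycle pattern for n = 3 before aging, but becomes a two-cycle pattern once the indicator switches to the second judging block.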
5. ADAPTIVE HOLD LOGIC
Fig-7:Adaptive Hold Logic
Fig. 7 shows the adaptive hold logic. When an input pattern arrives, both
judging blocks decide whether the pattern requires one cycle or two cycles
to complete and pass both results to the multiplexer. The multiplexer
selects one of the two results based on the output of the aging indicator.
Then an OR operation is performed between the result of the multiplexer
and the inverted output (Q̄) of the D flip-flop, and the outcome
determines the input of the D flip-flop.
When the pattern requires one cycle, the output of the multiplexer is 1.
The !(gating) signal becomes 1, and the input flip-flops latch new data in
the next cycle. On the other hand, when the output of the multiplexer is
0, which means the input pattern requires two cycles to complete, the OR
gate outputs 0 to the D flip-flop. Therefore, the !(gating) signal is 0,
which disables the clock signal of the input flip-flops in the next cycle.
Note that the input flip-flops are disabled for only one cycle, because
the D flip-flop latches 1 in the next cycle.
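The one-cycle stall behaviour of the gating logic can be traced with a small cycle-level model. The sketch assumes that the D flip-flop input is the OR of the multiplexer output and the inverted flip-flop output, that the flip-flop starts at 1, and that the gating decision takes effect at the same clock edge at which the flip-flop latches; these timing details are simplifications, and the function name is hypothetical.

```python
def run_ahl_gating(one_cycle_flags):
    """Return, cycle by cycle, the index of the pattern held in the input flip-flops.

    one_cycle_flags[i] is the multiplexer output for pattern i:
    1 = one-cycle pattern, 0 = two-cycle pattern.
    """
    q = 1      # D flip-flop output, used as the !(gating) signal
    idx = 0    # pattern currently held in the input flip-flops
    held = []
    while idx < len(one_cycle_flags):
        held.append(idx)
        mux = one_cycle_flags[idx]
        q = mux | (1 - q)  # D = mux OR Q-bar, latched at the clock edge
        if q == 1:
            idx += 1       # clock enabled: input flip-flops latch new data
        # q == 0: clock gated, the same pattern stays for one more cycle
    return held


print(run_ahl_gating([1, 0, 1, 0, 0]))
```

Because Q̄ is 1 in the cycle after a stall, the OR gate forces the flip-flop back to 1, so a two-cycle pattern occupies exactly two cycles and never more.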
The overall flow of our proposed architecture is as follows: when input
patterns arrive, the column- or row-bypassing multiplier and the AHL
circuit execute simultaneously. According to the number of zeros in the
multiplicand (multiplicator), the AHL circuit decides whether the input
patterns require one or two cycles. If the input pattern requires two
cycles to complete, the AHL outputs 0 to disable the clock signal of the
flip-flops. Otherwise, the AHL outputs 1 for normal operation. When the
column- or row-bypassing multiplier finishes the operation, the result is
passed to the Razor flip-flops. The Razor flip-flops check whether there
is a path delay timing violation.
If timing violations occur, the cycle period is not long enough for the
current operation to complete, and the execution result of the multiplier
is incorrect. Thus, the Razor flip-flops output an error to inform the
system that the current operation must be re-executed using two cycles to
ensure that the operation is correct. In this situation, the extra
re-execution cycles caused by the timing violation incur a penalty on the
overall average latency.
However, our proposed AHL circuit can accurately predict whether the input
patterns require one or two cycles in most cases. Only a few input
patterns may cause a timing violation when the AHL circuit judges
incorrectly. In this case, the extra re-execution cycles do not produce
significant timing degradation.
In summary, our proposed multiplier design has three key features.
First, it is a variable-latency design that minimizes the timing waste of
the noncritical paths.
Second, it can provide reliable operation even after the aging effect
occurs. The Razor flip-flops detect the timing violations and re-execute
the operations using two cycles.
Finally, our architecture can adjust the percentage of one-cycle patterns
to minimize the performance degradation due to the aging effect. When the
circuit is aged and many errors occur, the AHL circuit uses the second
judging block to decide whether an input is a one-cycle or a two-cycle
pattern.
6. MONTGOMERY ALGORITHM
Montgomery multiplication is a method for computing ab mod m for positive
integers a, b, and m. It reduces execution time on a computer when there
are a large number of multiplications to be done with the same modulus m,
and with a small number of multipliers.
In particular, it is useful for computing $a^n \bmod m$ for a large value
of n. The number of multiplications modulo m in such a computation can be
reduced to a number substantially less than n by successively squaring and
multiplying according to the pattern of the bits in the binary expression
for n ("binary decomposition"). But it can still be a large enough number
to be worth speeding up if possible.
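The binary decomposition mentioned above can be sketched as ordinary square-and-multiply exponentiation. The function name and the multiplication counter are illustrative additions; each `% m` in the loop is one of the reductions that Montgomery multiplication aims to make cheaper.

```python
def pow_mod_binary(a, n, m):
    """Compute a**n mod m by squaring and multiplying along the bits of n.

    Returns (result, mults), where mults counts the modular multiplications.
    """
    result = 1 % m
    base = a % m
    mults = 0
    while n > 0:
        if n & 1:
            result = (result * base) % m  # multiply step for a 1 bit
            mults += 1
        base = (base * base) % m          # squaring step for every bit
        mults += 1
        n >>= 1
    return result, mults


value, mults = pow_mod_binary(7, 1000, 1009)
print(value, mults)
```

The number of modular multiplications is the bit length of n plus the number of 1 bits in n, which is far smaller than n itself for large exponents.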
The difficulty is in the reductions modulo m, which are essentially
division operations and are costly in execution time. If one defers the
modulus operation to the end, the products grow to very large numbers,
which slows down the multiplications and also the final modulus operation.
To use Montgomery multiplication, the multipliers a and b must be less
than the modulus m.
We introduce another integer r, which must be greater than m, and we must
have gcd(r, m) = 1.
The method essentially changes the reduction modulo m into a reduction
modulo r. Usually r is chosen to be an integral power of 2, so that the
reduction modulo r is simply a masking operation; that is, retaining the
lg(r) low-order bits of an intermediate result and discarding the
higher-order bits. If r is a power of 2, m must be odd to satisfy the gcd
requirement. (Any odd value from 3 to r − 1 is suitable.)
The method:
1. Find two integers $r^{-1}$ and $m'$ such that $r r^{-1} - m m' = 1$.
This can be done with the extended gcd algorithm. There is a binary
extended gcd algorithm that does no divisions and that simplifies
substantially when one argument (r) is a power of 2 and the other (m) is
odd. This simplified version of the algorithm is given below (C function
xbinGCD).
2. Transform the multipliers to "Montgomery space" by multiplying them by
r (a left shift if r is a power of 2) and reducing the product modulo m.
That is, $\bar{a} = ar \bmod m$ and $\bar{b} = br \bmod m$. These are
expensive operations, but they are done only once per multiplier, and they
are not done on the intermediate products of a sequence of
multiplications.
3. Perform the Montgomery multiplication step. This operates on the
transformed quantities $\bar{a}$ and $\bar{b}$, giving the product of a
and b in Montgomery space. That is, the result is $abr \bmod m$, computed
as $t = \bar{a}\bar{b}$ and $u = (t + (t m' \bmod r)\,m)/r$, followed by:
if ($u \ge m$) return $u - m$; else return $u$. The multiplication $t m'$
is not too expensive because the mod r means that only the low-order lg(r)
bits of the product need be developed. If the calculations are performed
to some fixed length of w bits, with $r = 2^w$, then the other two
multiplications are of the form $w \times w \Rightarrow 2w$ bits and the
addition is of the form $2w + 2w \Rightarrow 2w + 1$ bits (it can
overflow). After division by r (a shift), u is of length w + 1 bits.
4. Do the inverse transformation to convert the result to an ordinary
integer: $ab = \overline{ab}\, r^{-1} \bmod m$.
Let us now derive step 3 above. We wish to compute $u = abr \bmod m$.
A 64-bit Implementation. Here we take a detailed look at an implementation
of Montgomery multiplication for arguments up to the computer's word size.
For concreteness we take it to be 64 bits. The modulus m can be as large
as $2^{64} - 1$, and a and b can be as large as m − 1. We take
$r = 2^{64}$. This is a 65-bit number, but it can be handled without great
difficulty.
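For the derivation of step 3 referred to above, a short sketch follows. It assumes only the definitions from steps 1 to 3, namely $r r^{-1} - m m' = 1$, $\bar{a} = ar \bmod m$, $\bar{b} = br \bmod m$, and $\bar{a}, \bar{b} < m < r$; it is one way to justify the formula, not necessarily the argument the authors had in mind.

```latex
\begin{align*}
t &= \bar{a}\,\bar{b} \equiv (ar)(br) \pmod{m}
  \quad\Longrightarrow\quad t\,r^{-1} \equiv abr \pmod{m}, \\
m\,m' &\equiv -1 \pmod{r}
  \quad\text{(from } r r^{-1} - m m' = 1\text{)}, \\
t + (t m' \bmod r)\,m &\equiv t + t\,m'\,m \equiv t - t \equiv 0 \pmod{r}
  \quad\text{(so the division by } r \text{ is exact)}, \\
u\,r &= t + (t m' \bmod r)\,m \equiv t \pmod{m}
  \quad\Longrightarrow\quad u \equiv t\,r^{-1} \equiv abr \pmod{m}, \\
u &< \frac{m^{2} + r\,m}{r} = m\left(\frac{m}{r} + 1\right) < 2m .
\end{align*}
```

Since $0 \le u < 2m$ and $u \equiv abr \pmod{m}$, a single conditional subtraction of m is enough to bring the result into the range $[0, m)$.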
Step 1: The GCD Operation. Below is a C function for the binary extended
gcd operation, simplified for the case in which its first argument a is a
power of 2 and the second argument b is odd. It is a simplification of the
algorithm available on the web.
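As a stand-in for the C listing, the simplified binary extended gcd can be sketched in Python. This follows the commonly published form of the algorithm, in which half of r is passed as the first argument so that it fits in a 64-bit word; the function and parameter names here are illustrative, and Python's unbounded integers remove the overflow handling that a C version needs.

```python
def xbin_gcd(half_r, m):
    """Binary extended gcd for r = 2 * half_r (a power of 2) and odd m.

    Returns (r_inv, m_prime) such that r * r_inv - m * m_prime == 1.
    No division is used, only shifts and additions.
    """
    u, v = 1, 0
    alpha, beta = half_r, m
    a = half_r
    # Each iteration halves the value of u * r - v * m, which starts at r
    # and is 1 after lg(r) iterations.
    while a > 0:
        a >>= 1
        if (u & 1) == 0:
            # u (and therefore v) is even: divide both by 2.
            u >>= 1
            v >>= 1
        else:
            # u is odd: add m to u and r to v first, then halve.
            u = (u + beta) >> 1
            v = (v >> 1) + alpha
    return u, v


r = 1 << 64
m = (1 << 64) - 59
r_inv, m_prime = xbin_gcd(r >> 1, m)
print(r * r_inv - m * m_prime)
```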
Step 2: Transform the Multipliers. We must compute
$\bar{a} = ar \bmod m$, and similarly for b. Because $r = 2^{64}$, there
is no multiplication to do. We must form a 128-bit integer that consists
of a followed by 64 0-bits, and compute the remainder of dividing that
quantity by m. Some machines have an instruction for that. For other
machines, the C function shown below may be used. This is the "hardware
division" algorithm of Hacker's Delight. Invoke it as follows, where the
first two arguments represent ar. All variables are 64-bit unsigned
integers.
abar = modul64(a, 0, m)
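A shift-and-subtract remainder routine in the spirit of the `modul64` call above can be sketched in Python as follows. The body is an assumption about how such a routine is typically written (bit-serial restoring division); 64-bit registers are emulated with an explicit mask because Python integers do not wrap.

```python
MASK64 = (1 << 64) - 1


def modul64(x, y, z):
    """Remainder of the 128-bit value (x || y) divided by z.

    x, y, z are treated as 64-bit unsigned words and x < z is required,
    so that the quotient would also fit in 64 bits.
    """
    if x >= z:
        raise ValueError("modul64 requires x < z")
    for _ in range(64):
        top = x >> 63                        # bit about to be shifted out of x
        x = ((x << 1) | (y >> 63)) & MASK64  # shift (x || y) left by one bit
        y = (y << 1) & MASK64
        if top or x >= z:
            x = (x - z) & MASK64             # subtract the divisor
            y |= 1                           # record a quotient bit
    return x                                 # the quotient is left in y


m = (1 << 64) - 59
a = 0x123456789ABCDEF0
abar = modul64(a, 0, m)   # a followed by 64 zero bits, reduced modulo m
print(hex(abar))
```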
Step 3: Montgomery Multiplication. This step deals with 128-bit integers,
but no more than that. The computation $t = \bar{a}\bar{b}$ multiplies two
64-bit unsigned integers, giving a 128-bit product. Some machines have an
instruction for that. For other machines, the C function below may be
used.
Next, the following expression must be evaluated:
$u = (t + (t m' \bmod r)\,m)/r$.
Variable t is a 128-bit unsigned integer, and $m'$ is a 64-bit unsigned
integer.
Because of the "mod r," only the low-order 64 bits of the product $t m'$
are needed.
This means that the high-order half of t can be ignored, and
$64 \times 64 \Rightarrow 64$-bit multiplication can be used.
The subsequent multiplication by m must be
$64 \times 64 \Rightarrow 128$-bit multiplication.
The addition of t must be $128 + 128 \Rightarrow 129$-bit addition.
This can be done with $128 + 128 \Rightarrow 128$-bit addition and
separately computing the carry, as shown in the code below (variable ov).
This sum always ends in 64 0-bits, so the low-order part of the sum is
computed only to produce a carry bit. Incidentally, if the low-order
halves of the summands were known to be both nonzero, then the carry would
be 1, leading to a simplification.
However, the summands can be 0 if either a or b is 0. Lastly (for step 3),
we must perform the computation: if ($u \ge m$) then return $u - m$; else
return $u$.
Variable u is, in effect, a 65-bit integer because of the overflow
mentioned above, but the final result of the calculation is a 64-bit
integer. If the addition of t overflowed, then certainly $u \ge m$.
Otherwise, u and m may be compared as 64-bit integers. The subtraction can
be a 64-bit operation, because it is known that after the subtraction, the
65th bit of the difference will be 0. A C function for these computations
follows.
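The word-level arithmetic of step 3 can be sketched in Python by splitting every 128-bit quantity into 64-bit halves. The function name `montmul` and the local variable names are illustrative; `ov` plays the role of the separately computed carry mentioned in the text, and `mprime` is assumed to satisfy $m m' \equiv -1 \pmod{2^{64}}$.

```python
MASK64 = (1 << 64) - 1


def montmul(abar, bbar, m, mprime):
    """Montgomery product of abar and bbar for r = 2**64.

    Returns abar * bbar * r^-1 mod m, assuming abar, bbar < m and m odd.
    """
    t = abar * bbar                    # 64 x 64 => 128-bit product
    t_hi, t_lo = t >> 64, t & MASK64
    tm = (t_lo * mprime) & MASK64      # (t * m') mod r: 64 x 64 => 64 bits
    tmm = tm * m                       # 64 x 64 => 128-bit product
    tmm_hi, tmm_lo = tmm >> 64, tmm & MASK64
    lo = t_lo + tmm_lo                 # low halves: only the carry matters
    carry = lo >> 64
    hi = t_hi + tmm_hi + carry         # high halves plus carry from below
    ov = hi >> 64                      # 129th bit of the sum, 65th bit of u
    u = hi & MASK64                    # division by r is dropping the low word
    if ov or u >= m:
        u = (u - m) & MASK64           # 64-bit subtraction; 65th bit becomes 0
    return u
```

As a usage example under the same assumptions, with `abar = (a << 64) % m` and `bbar = (b << 64) % m`, the call `montmul(abar, bbar, m, mprime)` yields `(a * b << 64) % m`, the product in Montgomery space.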
Step 4: The Inverse Transformation. We must compute
$ab = \overline{ab}\, r^{-1} \bmod m$, which is the product of a and b
modulo m as ordinary integers. All variables are 64-bit unsigned integers.
The multiplication must be done using $64 \times 64 \Rightarrow 128$-bit
multiplication, and the modulo operation must be done using
$128 / 64 \Rightarrow 64$-bit division (actually remaindering).
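Putting the four steps together, the sketch below computes $a^n \bmod m$ with all intermediate products kept in Montgomery space, so that the costly transformations happen only once on the way in and once on the way out. It is an illustration under stated assumptions: `mont_pow` is a hypothetical name, Python's built-in `pow(R, -1, m)` (available from Python 3.8) stands in for the extended gcd of step 1, and m must be odd with a < m.

```python
R_BITS = 64
R = 1 << R_BITS


def mont_pow(a, n, m):
    """Compute a**n mod m using Montgomery multiplication with r = 2**64."""
    # Step 1: r^-1 and m' with r * r^-1 - m * m' = 1.
    r_inv = pow(R, -1, m)
    m_prime = (R * r_inv - 1) // m

    def montmul(x, y):
        # Step 3: u = (t + (t * m' mod r) * m) / r, then one subtraction.
        t = x * y
        u = (t + ((t * m_prime) & (R - 1)) * m) >> R_BITS
        return u - m if u >= m else u

    # Step 2: transform into Montgomery space (done once per operand).
    base = (a << R_BITS) % m
    acc = R % m                      # the value 1 in Montgomery space
    while n > 0:
        if n & 1:
            acc = montmul(acc, base)
        base = montmul(base, base)
        n >>= 1
    # Step 4: inverse transformation back to an ordinary integer.
    return (acc * r_inv) % m


print(mont_pow(3, 5, 7))
```

Only the final line of `mont_pow` and the initial transformation use a true reduction modulo m; every multiplication inside the loop reduces with a mask and a shift.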
7. CONCLUSION
This paper proposed an efficient multiplier design with AHL using the
Montgomery multiplication algorithm. The multiplier is able to adjust the
AHL to mitigate the performance degradation caused by aging.
Variable-latency multipliers suffer less degradation because they have
less timing waste, whereas traditional multipliers must account for the
degradation caused by both the BTI effect and electromigration and use the
worst-case delay as the cycle period. For the proposed architecture, we
have shown that AHL combined with the Montgomery multiplication algorithm
decreases the delay and improves performance compared with the previous
design.
IRJET- Efficient Multiplier Design using Adaptive Hold Logic with Montgomery Algorithm

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 03 | Mar 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3483 Efficient Multiplier Design Using Adaptive Hold Logic with Montgomery Algorithm Ramya N1, Rose Mistica S2, Subikma Binusha V3, Prof Savitha G4 123Students, Dept of Electronics and Communication Engineering, Jeppiaar SRR Engineering College, Chennai, Tamil Nadu 4Assistant Professor, Dept. of Electronics and Communication Engineering, Jeppiaar SRR Engineering College, Chennai ---------------------------------------------------------------------***---------------------------------------------------------------------- Abstract - In most of the digitalsignalprocessors, multiplier is used as a key component. So, the performance of the system depends on the throughput of themultiplier. Nowadays, relia- bility is an important design concern in advanced technology nodes. Performance of the system is significantly affected by the aging of transistor and the system may fail due to delay problems in long term. The impact of aging getting higher with the scaling of transistor. One of the main cause for aging in transistor is Bias Temperature Instability (BTI). Due tothis effect threshold voltage of the transistor increases over time and it reduces the multiplier speed. Over-design approaches can be used to reduce the aging effect, but these may cause power and area inefficiency. Fixed latency designs have high chance of timing violations. So, a multiplier with variable latency is used for reliable operation under BTI effects. An Adaptive Hold Logic (AHL) is used for the proper se- lection of cycle period and an Error Detection Correction Pulsed Latch (ECPL) is used for the detection of timing errors. 
In modular arithmetic computation, Montgomery multiplication algorithm is used to perform faster modular multiplication which was introduced by Peter L Montgomery In 1985. Key Words: Bias Temperature Instability, Razor Flipflop, Error Detection and Correction Pulsed Latch, Adaptive Hold Logic, Montgomery Multiplication Algorithm. 1. INTRODUCTION Digital multipliers square measure among the foremost vital arithmetic practical units in several applications, like the Fourier remodel, distinct trigonometric function transforms, and digital filtering. The turnout of those applications depends on multipliers, and if the multipliers square measure too slow, the performance of entire circuits are reduced moreover, negative bias temperature instability (NBTI) happens once a pMOS semiconductor is beneath negative bias (Vgs = -Vdd), during this state of affairs, the interaction between inversion layer holes and hydrogen-passivated Si atoms breaks the Si–H bond generated throughout the chemical reaction method, generating H or H2 molecules. Once these molecules diffuse away, interface traps square measure left. The accumulated interface traps between semiconducting material and therefore the gate chemical compound interface lead to multiplied threshold voltage (Vth), reducing the circuit shift speed. Once the biased voltage is removed, the reverse reaction happens, reducing the NBTI impact. However, the reverse reaction doesn't eliminate all the interface traps generated throughout the strain section, and Vth is multiplied within the future. Hence, it's vital to style a reliable superior number. The corresponding impact on associate nMOS semi-conductor is Positive Bias Temperature Instability(PBTI) that happens once associate nMOS semi -conductor is beneath positive bias. Compared with the NBTI impact, the PBTI impact is way smaller on oxide/polygate transistors, and thus is sometimes un-heeded unheeded. 
However, for high- k/metal-gate nMOS transistors with important charge housing, the PBTI impact will not be unheeded. In fact, it's been shown that the PBTI impact is additional important than the NBTI impact on32-nm high-k/metal- gate processes. A traditional method to mitigate the aging effect is overdesign including such things as guard- banding and gate oversizing; however, this approach can be very pessimistic and area and power inefficient. To avoid this drawback, several NBTI-aware methodologies are planned. An NBTI-aware technology mapping technique was proposed in to guarantee the performance of the circuit during its life time. In, an NBTI-aware sleep transistor was designed to reduce the aging effects on pMOS sleep-transistors, and the lifetime stability of the power-gated circuits under consideration was improved. Wu and Marculescu planned a joint logic restructuring andpin rearrangement technique,that relies on detection useful symmetries and semiconductor device stacking effects.They additionally planned AN NBTI improvement technique that thought of path sensitization. In and, dynamic voltage scaling and body-basing techniques were proposed to reduce power or extend circuit life. These techniques, however, need circuit modification or don't offer improvement of specific circuits. Traditional circuits use crucial path delay because the overall circuit clock cycle so as to perform properly. However, the chance that the crucial ways are activated is low. In most cases, the trail delay is shorter than the crucial path. For these noncritical paths, using the critical path
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 03 | Mar 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3484 delay as the overall cycle period will result in significant timing waste. Hence, the variable latency style was planned to cut back the temporal order waste of ancient circuits the variable-latency style divides the circuit into 2 parts: 1) shorter ways and 2) longer ways. Shorter ways will execute properly in one cycle, whereas longer paths need two cycles to execute. When shorter ways are activated oft, the average latency of variable-latency designs is better than that of traditional designs. For example, many variable-latency adders were planned mistreatment the speculation technique with error detection and recovery. For modular arithmetic computation, Montgomery modular multiplication is performed for faster modular multiplication. 2. PRELIMINARIES 2.1 Column-Bypass Multiplier A column-bypassing multiplier factor is Associate in Nursing improvement on the conventional array multiplier factor (AM).Fig 1shows a 4×4 column- bypassing multiplier. Supposing the inputs are10102 * 11112, it can be seen that for the FAs in the first and third diagonals, two of the three input bits are 0: the carry bit from its higher right solfa syllable and therefore the partial product aibi. Therefore, the output of the adders in each diagonals is zero, and the output sum bit is simply equal to the third bit, which is the sum output of its higher solfa syllable. Hence, the solfa syllable is changed to feature 2 tri state gates and one electronic device. The multiplicand bit ai can be used as the selector of the multiplexer to decide the output of the FA, and ai can also be used as the selector of the tri state gate to turn off the input path of the FA. 
If ai is 0, the inputs of FA are disabled, and the sum bit of the current FA is equal to the submit from its upper FA, thus reducing the power consumption of the multiplier. If ai is 1, the normal sum result is selected. Fig-1: Column Bypass Multiplier 2.2 Row-Bypassing Multiplier: A low-power row-bypassing number is additionally projected to scale back the activity power of the AM. The operation of the low-power row-bypassing number is comparable to it of the low-power column-bypassing number, however the selector of the multiplexers and also the tri state gates use the multiplicator. Fig.2 is a 4 × 4 row-bypassing multiplier. Each input is connected to AN solfa syllable through a tri state gate. When the inputs are 11112*10012, the two inputs in the first and second rows are 0 for FAs. Because b1 is 0, the multiplexers in the first row select aib0 as the sum bit and select 0 as the carry bit. The inputs square measure bypassed to FAs within the second rows and the tristate gates shut down the input ways to the FAs. Therefore, no switch activities occur within the first-row FAs; in return power consump -tion is reduced. Similarly, because b2 is 0, no switching activities will occur in the second-row FAs. However, the FAs must be active in the third row because the b3 is not zero Fig-2: Row Bypass Multiplier 2.3 Variable Latency Design Variable Latency Unit Average Case Computation Average-case computation, as the name suggests, refers to those computations that occur more frequently than others, and also get completed within average delays, considering the delay required by all the computations the circuit performs. Within the synchronous paradigm, two classes of techniques have been proposed for exploiting the average- case computations: variable-latency units, and error detection-correction units. Our work in this chapter focuses
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 03 | Mar 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3485 on the design of BTI-resilient circuits using variable latency units (VLUs). Unlike conventional combinational circuits that complete operations within one clock cycle, VLUs allow the computation of the combinational circuit to be completed in a variable, integer, number of clock cycles. By allowing high- probability operations to complete in a single cycle, but allowing rarer events to use multiple (typically two) cycles, the average cycle time may be shorter than that of the conventional implementation, implying that the circuit throughput for a VLU may be significantly larger. For example, Fig.4 is associate 8-bit variable-latency ripple carry adder (RCA). A8–A1, B8–B1 are 8-bit inputs, and S8–S1 are Fig. 4. 8-bit RCA with a hold logic circuit. Fig.5 Path delay distribution of AM, column, and row-bypassing multipliers for 65 536 input patterns. The outputs. Supposing the delay for each FA is one, and the maximum delay for the adder is 8. Through simulation, it can be determined that the possibility of the carry propagation delay being longer than 5 is low. Hence, the cycle amount is about to five, and hold logic is other to inform the system whether or not the adder will complete the operation at intervals a cycle amount. Fig.3 additionally shows the hold logic that's utilized in this circuit. The operate of the hold logic is (A4 XOR B4)(A5 XOR B5).If the output of the hold logic is zero, i.e., A4 = B4 or A5 = B5, either the fourth or the fifth adder will not produce a carryout. Hence, the utmost delay are going to be but one cycle amount. 
When the hold logic output is one, this suggests that the input can activate methods longer than five, that the hold logic notifies the system that this operation needs 2 cycles to finish. Two cycles are sufficient for the longest path to complete (5 * 2 is larger than 8).The performance improvement of the variable-latency design can be calculated as follows: if the possibility of every input being one is zero. 5, the possibility of (A4 XOR B4)(A5 XOR B5) being 1 is 0.25. The average latency for the variable latency style is zero. 75∗5+0.25∗10 = 6.25. Compared with the easy fixed- latency RCA, which has an average latency of 8, the variable-latency design can achieve a 28% performance improvement. Fig.4 shows the path delay distribution of a 16 × 16 AM and for both a traditional column-bypassing and traditional row-bypassing multiplier with 65536 randomly chosen input patterns. All multipliers execute operations on a set cycle amount. The maximum path delay is 1.32 ns for the AM,1.88 ns for the column-bypassing multiplier, and 1.82 ns for the row- bypassing multiplier. It can be seen that for the AM, quite ninety eight of the ways have a delay of <0.7ns. Moreover, more than 93% and 98% of the paths in the FLCB and row-bypassing multipliers present a delay of <0.9 ns, respectively. Hence, using the maximum path delay for all paths will cause significant timing waste for shorter paths, and redesigning the multiplier with variable latency can improve their performance. Another key observation is that the path delay for an operation is strongly tied to the number of zeros in the multiplicands in the column-bypassing multiplier. Fig-3: 8-bit RCA with Hold logic circuit Fig-4: Path Delay Distribution of AM, Column and Row bypassing multipliers for 65536 input patterns 3. AGING AWARE RELIABLE MULTIPLIER Fig-5: Aging Aware Reliable Multiplier
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 03 | Mar 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3486 Fig 5 is an aging-aware multiplier factor design, which incorporates 2 m-bit inputs (m may be a positive number), one 2m-bit output, one column- or row- bypassing multiplier factor, 2m 1-bit Razor flip- flops Associate in Nursing d an AHL circuit. The inputs of row-bypassing multiplier factor square measure the symbols within the parentheses. In the planned design, the column- and row-bypassing multipliers will be examined by the amount of zeros in either the number or multiplicator to predict whether or not the operation needs one cycle or two cycles to complete. When input patterns square measure random, the amount of zeros and ones within the multiplicator and number follows a traditional distribution. Therefore, mistreatment the amount of zeros or ones because the judgement criteria leads to similar outcomes. Hence, the 2 aging-aware multipliers will be enforced mistreatment similar design and therefore the distinction between the 2 bypassing multipliers lies within the input signals of the AHL. According to the bypassing choice within the column or row-bypassing multiplier, the input signal of the AHL in the architecture with the column- bypassing multiplier is the multiplicand, whereas that of the row-bypassing multiplier is the multiplicator. 4. RAZOR FLIPFLOP Fig -6:Razor Flipflop Fig 6. is Razor flip-flops which are often accustomed sight whether or not temporal order violations occur before consecutive input pattern arrives. A 1-bit Razor flip-flop contains a main flip-flop, shadow latch, XOR gate, and mux. 
The main flip-flop catches the execution result for the mix circuit employing a traditional clock signal, and also the shadow latch catches the execution result employing a delayed clock signal, which is slower than the normal clock signal. If the barred little bit of the shadow latch is completely different from that of the most flip-flop, this suggests the trail delay of this operation exceeds the cycle amount, and the main flip-flop catches an incorrect result. If errors occur, the Razor flip-flop will set the error signal to 1 to notify the system tore execute the operation and notify the AHL circuit that an error hasoccurred.WeuseRazorflip- flops to sight whether or not Associate in Nursing operation that's thought of to be a one-cycle patternwill extremely end in a very cycle. If not, the operation is re-executed withtwocycles.Although the re execution may seem costly, the overall cost is low because the re execution frequency is low. The AHLcircuitis the key component in the aging-ware variable-latency multiplier. The AHL circuit contains an aging indicator, two judging blocks, one mux, and one D flip-flop. The aging indicator indicates whether or not the circuit has suffered vital performance degradation because oftheaging result. The aging indicator is enforced in a very straight forward counter that counts the {amount the quantity} of errors over a precise amount of operations and is reset to zero at the tip of those operations. If the cycle amount is just too short, the column- or row- bypassing multiplier factor isn't ready to complete these operations with success, inflictingtemporal orderviolations. These temporal order violations are going to be caught by the Razor flip-flops, that generate error signals. 
If errors happen often and exceed a predefined threshold, it means the circuit has suffered significanttimingdegradation due to the aging effect, and the aging indicator will output signal 1; otherwise, it'll output zero to point the aging result remains not vital, and no actions square measure required. The first decision making block within the AHL circuit can output 1if the quantity of zeros within the number (multiplicator for the row-bypassing multiplier) is larger than n (n is a positive number, which will be discussed in Section IV), and these Cond judging block in the AHL circuit will output 1 if the number of zeros in the multiplicand (multiplicator) is larger than n + 1. They are each utilized to make your mind up whether or not an input pattern needs one or 2 cycles, however only 1 of them are chosen at a time. In the starting, the aging result isn't important, and the aging indicator produces 0, so the first judging block is used. After a amount of your time once the aging result becomes important, the second decision making block is chosen. Compared with theprimarydecision making block, the second decision making block permits a smaller range of patterns to become one-cycle patterns as a result of it needs a lot of zeros within the number (multiplicator).
  • 5. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 03 | Mar 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3487 5. ADAPTIVE HOLD LOGIC Fig-7:Adaptive Hold Logic Fig 7 is an Adaptive hold logic when an input patternarrives, both judging blocks will decide whetherthepatternrequires one cycle or two cycles to complete and pass both results to the multiplexer. The multiplexer selects one of either result supported the output of the aging indicator. Then associate degree OR operation is performed between the results of the electronic device, and the Q signal is used to determine the input of the D flip-flop. When the pattern needs one cycle, the output of the multiplexer is 1. The !(gating) signal will become 1, and the and the input flip flops will latch new data in the next cycle. On the other hand, when the output of the multiplexer is 0, which means the input pattern requires two cycles to complete, the OR gate will output 0 to the D flip-flop. Therefore, the !(gating) signal are zero to disable the clock signal of the input flip-flops within the next cycle. Note that solely a cycle of the input flip-flop are disabledasa result of the D flip-flop can latch one within the next cycle. The overall flow of our planned design is as follows: once input patterns arrive, the column- or row-bypassing multiplier, and the AHL circuit execute simultaneously. According to the number of zeros in the multiplicand (multiplicator), the AHL circuit decides if the input patterns require one or two cycles. If the input pattern needs 2 cycles to complete, the AHL will output 0 to disable the clock signal of the flip-flops. Otherwise, the AHL can output one for traditional operations. When the column- or row-bypassing number finishes the operation, the result are passed to the Razor flip-flops. 
The Razor flip-flops check whether or not there's the trail delay temporal arrangement violation. If temporal arrangement violationsoccur,itsuggeststhat the cycle amount isn't long enough for the present operation to complete which the execution results of themultiplierfactor is wrong. Thus, the Razor flip-flops can output a slip to tell the system that the present operation must be re dead exploitation 2 cycles to make sure the operation is correct. In this situation, the extra re execution cycles caused by timing violation incurs a penalty to overall average latency. However, our planned AHL circuit will accurately predict whether or not the input patterns need one or 2 cycles in most cases. Only many input patterns might cause a temporal arrangement variation once the AHLcircuitjudges incorrectly. In this case, the extra re execution cycles did not produce significant timing degradation. In summary, our planned multiplier factor style has 3 key options. First, it is a variable-latencydesignthatminimizesthetiming waste of the noncritical paths. Second, it will give reliable operations even when the aging result happens. The Razor flip-flops discover the temporal arrangement violations and re execute the operations exploitation 2 cycles. Finally, our design will regulate the share of one-cycle patterns to attenuateperformancedegradationthanksto the aging result. When the circuit is aged, and many errors occur, the AHL circuit uses the second judging block to decide if an input is one cycle or two cycles. 6. MONTGOMERY ALGORITHM Montgomery multiplication could be a methodology for computing ab mod m for positive integers a, b, and m. 1.It reduces execution time on a pc once there are an outsized range of multiplications to be through with constant modulus m, and with a tiny low range of multipliers. In specific, it's helpful for computing Associate in Nursing mod m for an outsized worth of n. 
The number of multiplications modulo m in such a computation can be reduced to a number substantially less than n by successively squaring and multiplying according to the pattern of the bits in the binary expression for n ("binary decomposition"). It can still be a large enough number to be worth speeding up if possible. The difficulty is in the reductions modulo m, which are essentially division operations and are costly in execution time. If one defers the modulus operation to the end, the products grow to very large numbers, which slows down the multiplications and also the final modulus operation.

To use Montgomery multiplication, the multipliers a and b must be less than the modulus m. We introduce another integer r, which must be greater than m and must satisfy gcd(r, m) = 1. The method essentially changes the reduction modulo m into a reduction modulo r. Usually r is chosen to be an integral power of two, so the reduction modulo r is simply a masking operation, that is, retaining the lg(r) low-order bits of an intermediate result and discarding the higher-order bits. If r is to be a power of two, m must be odd to satisfy the gcd requirement. (Any odd value from 3 to $r - 1$ is suitable.)

The method:
1. Find two integers $r^{-1}$ and $m'$ such that $r r^{-1} - m m' = 1$. This can be done by the extended gcd algorithm. There is a binary extended gcd algorithm that does no divisions and that simplifies considerably when one argument (r) is a power of two and the other (m) is odd. This simplified version of the algorithm is implemented as the C function xbinGCD.
2. Transform the multipliers to "Montgomery space" by multiplying them by r (a shift-left operation if r is a power of 2) and reducing the product modulo m. That is, $\bar{a} = ar \bmod m$ and $\bar{b} = br \bmod m$. These are costly operations, but they are done only once per multiplier, and they are not done on the intermediate products of a sequence of multiplications.
3. Perform the Montgomery multiplication step. This operates on the transformed quantities $\bar{a}$ and $\bar{b}$, giving the product of a and b in Montgomery space; that is, the result is $abr \bmod m$. The step computes $t = \bar{a}\bar{b}$ and $u = (t + (tm' \bmod r)m)/r$, and returns $u - m$ if $u \ge m$ and u otherwise. The multiplication $tm'$ is not too costly because the mod r means that only the low-order lg(r) bits of the product need be computed. If the calculations are performed to some fixed length of w bits, with $r = 2^w$, then the other two multiplications are of the form w × w ⇒ 2w bits and the addition is of the form 2w + 2w ⇒ 2w + 1 bits (it can overflow). After division by r (a shift), u is of length w + 1 bits.
4. Do the inverse transformation to convert the result to an ordinary integer: $ab \bmod m = u r^{-1} \bmod m$.

Let us now derive step 3 above. We wish to compute $u = abr \bmod m$. Because $r r^{-1} - m m' = 1$, we have $m m' \equiv -1 \pmod{r}$, so $t + (tm' \bmod r)m \equiv t - t \equiv 0 \pmod{r}$ and the division by r is exact. The quotient is congruent to $t r^{-1} \equiv \bar{a}\bar{b}r^{-1} \equiv abr \pmod{m}$. Since $t < m^2$ and $(tm' \bmod r)m < rm$, the quotient is less than 2m, so at most one subtraction of m brings it into the range 0 to $m - 1$.

A 64-bit Implementation

Here we take a detailed look at an implementation of Montgomery multiplication for arguments up to the computer's word size. For concreteness we take it to be 64 bits. The modulus m can be as large as $2^{64} - 1$, and a and b can be as large as $m - 1$. We take $r = 2^{64}$. This is a 65-bit number, but it can be handled without great difficulty.

Step 1: The GCD Operation
The binary extended gcd operation is implemented as a C function, simplified for the case in which its first argument a is a power of two and its second argument b is odd. It is a simplification of the algorithm available on the net.

Step 2: Transform the Multipliers
We must compute $\bar{a} = ar \bmod m$, and similarly for b. Because $r = 2^{64}$, there is no multiplication to do. We must form a 128-bit integer that consists of a followed by 64 0-bits, and compute the remainder of division of that quantity by m. Some machines have an instruction for that. For other machines, a C function implementing the "hardware division" algorithm of Hacker's Delight may be used. Invoke it as follows, where the first two arguments represent ar. All variables are 64-bit unsigned integers.
    abar = modul64(a, 0, m)

Step 3: Montgomery Multiplication
This step deals with 128-bit integers, but no more than that. The computation $t = \bar{a}\bar{b}$ multiplies two 64-bit unsigned integers, giving a 128-bit product. Some machines have an instruction for that. For other machines, a C function may be used. Next, the following expression must be evaluated: $u = (t + (tm' \bmod r)m)/r$.
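The four steps of the method can be sketched end to end in C for $r = 2^{64}$. This is a minimal sketch under stated assumptions, not the listing the text refers to: the function names xbin_gcd_pow2, to_mont, mont_mul, and from_mont are hypothetical, and the unsigned __int128 type (a GCC and Clang extension) stands in for the 64 × 64 ⇒ 128-bit multiply and the 128 / 64 remainder that the text assigns to machine instructions or to helpers such as modul64. Because r itself does not fit in 64 bits, the gcd routine is passed r/2.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128; /* assumption: GCC/Clang extension is available */

/* Step 1: find r_inv and m_prime with r*r_inv - m*m_prime = 1, where
   r = 2*half_r is a power of two and m is odd. Uses no divisions. */
static void xbin_gcd_pow2(uint64_t half_r, uint64_t m,
                          uint64_t *r_inv, uint64_t *m_prime) {
    assert((m & 1u) == 1);
    uint64_t u = 1, v = 0;
    for (uint64_t a = half_r; a > 0; a >>= 1) {
        if ((u & 1u) == 0) {
            u >>= 1;
            v >>= 1;
        } else {
            u = ((u ^ m) >> 1) + (u & m); /* (u + m) / 2 without overflow */
            v = (v >> 1) + half_r;
        }
    }
    *r_inv = u;
    *m_prime = v;
}

/* Step 2: abar = a * 2^64 mod m. */
static uint64_t to_mont(uint64_t a, uint64_t m) {
    assert(a < m);
    return (uint64_t)(((u128)a << 64) % m);
}

/* Step 3: u = (t + (t*m_prime mod r)*m) / r, then one conditional subtract. */
static uint64_t mont_mul(uint64_t abar, uint64_t bbar,
                         uint64_t m, uint64_t m_prime) {
    u128 t = (u128)abar * bbar;
    uint64_t k = (uint64_t)t * m_prime; /* only the low 64 bits are needed */
    u128 km = (u128)k * m;
    u128 sum = t + km;                  /* may wrap past 128 bits */
    int ov = sum < t;                   /* carry out of the 128-bit addition */
    uint64_t u = (uint64_t)(sum >> 64); /* division by r is a shift */
    if (ov || u >= m) {
        u -= m;                         /* the 65th bit is zero afterwards */
    }
    return u;
}

/* Step 4: convert back to an ordinary integer, u * r_inv mod m. */
static uint64_t from_mont(uint64_t u, uint64_t r_inv, uint64_t m) {
    return (uint64_t)(((u128)u * r_inv) % m);
}
```

A caller would run xbin_gcd_pow2(1ULL << 63, m, &r_inv, &m_prime) once per modulus, convert each multiplier with to_mont, chain as many mont_mul calls as needed, and apply from_mont only to the final result.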
Variable t is a 128-bit unsigned integer, and $m'$ is a 64-bit unsigned integer. Because of the "mod r," only the low-order 64 bits of the product $tm'$ are needed. This means that the high-order half of t can be ignored, and a 64 × 64 ⇒ 64-bit multiplication can be used. The subsequent multiplication by m must be a 64 × 64 ⇒ 128-bit multiplication. The addition of t must be a 128 + 128 ⇒ 129-bit addition. This can be done with a 128 + 128 ⇒ 128-bit addition and a separately computed carry (variable ov in the code). This sum always ends in 64 0-bits, so the low-order half of the sum is computed only to produce a carry bit. Incidentally, if the low-order halves of the summands were known to be both nonzero, then the carry would be 1, leading to a simplification. However, the summands can be zero if either a or b is zero.

Finally (for step 3), we must perform the computation: if ($u \ge m$) then return $u - m$; else return u. Variable u is a 65-bit quantity, in effect, because of the overflow mentioned above, but the final result of the calculation is a 64-bit quantity. If the addition of t overflowed, then certainly $u \ge m$. Otherwise, u and m can be compared as 64-bit integers. The subtraction can be a 64-bit operation, because it is known that after the subtraction, the 65th bit of the difference is zero.

Step 4: The Inverse Transformation
We must compute $u r^{-1} \bmod m$, which is the product of a and b modulo m as ordinary integers. All variables are 64-bit unsigned integers. The multiplication must be done using 64 × 64 ⇒ 128-bit multiplication, and the modulo operation must be done using 128 / 64 ⇒ 64-bit division (actually remaindering).

7. CONCLUSION

This paper proposed an efficient multiplier design with AHL using the Montgomery multiplication algorithm. The multiplier is able to adjust the AHL to mitigate the performance degradation caused by aging. Variable-latency multipliers have less timing waste, whereas traditional multipliers need to consider the degradation caused by both the BTI effect and electromigration and use the worst-case delay as the cycle period. In the proposed architecture, we have shown that AHL with the Montgomery multiplication algorithm decreases the delay and improves the performance compared with the previous design.

REFERENCES
1. SaiLakshmy et al., "Performance Analysis of Aging-Aware Multiplier Using Various Adders," International Conference on Communication and Signal Processing, April 6-8, 2016, India.
2. P. Kamila Parveen et al., "Multiplier Design using MTCMOS with Adaptive Hold Logic," 2016 International Conference on Advanced Communication Control and Computing Technologies (ICACCCT).
3. Y. Cao. (2016). Predictive Technology Model (PTM) and NBTI Model [Online]. Available: http://www.eas.asu.edu/∼ptm
4. S. Zafar et al., "A comparative study of NBTI and PBTI (charge trapping) in SiO2/HfO2 stacks with FUSI, TiN, Re gates," in Proc. IEEE Symp. VLSI Technol. Dig. Tech. Papers, 2016, pp. 23–25.