SlideShare a Scribd company logo
implemented on FPGA of SHA-224/256 core.
Ms.Shreshtha Mishrs (Garg)
M Tech (Embedded System and VLSI Design)
Prof. Rishi Jha
Deptt of Electronics and Communication
Abstract- Cryptography plays an important role in the
security of data. Even though the data is encrypted it can be
altered while transmitting on the network so data should be
verified using a digital signature. Hashing algorithms are used
to create these digital signatures for verification of the data
received. Hashing algorithm like Secure Hash Algorithm-2
(SHA-2(224/256)) is designed which has a fixed output length
of 512-bits. Then to improve on power a low-power technique
such as latch based clock gating technique is used.
Introduction- In this research paper several hardware
optimization techniques for the SHA-2 hashing functions were
explored. A new architecture i.e., Round Pipelined Technique
was proposed for the SHA-2 core, which eliminates the data
dependency between iteration using data forwarding to
improve the throughput per area. The fully iterative and
Round Pipelined Techniques were investigated and developed
using HDL. A comparison with other published results depicts
57% improvement in throughput per slice for SHA-256 and
17% improvement in throughput per slice for SHA-512.
Implementation results indicate that the Round Pipelined
technique can help to achieve good tradeoff between
throughput and area. As future work, implementations of the
SHA-2 core may be attempted by adopting various other
design techniques like a design optimized for area using better
resource sharing or loop unrolling techniques. Another
optimization effort could be to increase the depth of the
pipeline stages to increase the throughput.
The aim of the project is to implement VHDL of the Hash
Algorithm version 2 (SHA-2) hashing cryogenic algorithm,
which will only provide partial and useful information to
understand my work.
From an arbitrary length message (<264), the algorithm
generates a fixed length hash, equal to the number present in
the name of the four variants of the SHA-224, SHA-256,
SHA-384, SHA-512 algorithm.
A preprocessing (or padding) phase is provided in which the
message is bit 1 and so many bits 0 to its length to a value
divided by 512 for the rest 448 (i.e., 512 to 64) and then
accodated The length of the original message (64 bits).
The message is divided into 512-bit blocks and the algorithm
is applied sequentially to cascade on the blocks, using the
result obtained from the previous block as the input data for
the next calculation. Each block passes through three phases:
data expansion, compression cycle, and hash processing (or
intermediate digestion).
The most critical step for implementation is compression that
consists of a 64-step loop (80 for SHA-384/512) that is
dependent on each other and therefore not parallelizable. Even
expansion in the official drafting of the algorithm is a loop,
however, it can be carried out without loosing clock cycles
while loading the block through a buffer and a pointer over it.
We performed other modifications and optimizations
(rescheduling, pipelines between the stages of the different
blocks, redundancies) to reduce the total delay, some of the
many documents available on the net, the main ones in the
bibliography (2) (3), others introduced by Tailored to my
design. I have examined and discarded many other
possibilities (unrolling, quasi-pipeline, special additions)
because they are not suitable for our architecture.
Cryptographic Hash Function Designs (SHA-2 )
Three new revised versions of SHA were added into SHA
family by NIST in August 2002 as FIPS 180-2, known to be
SHA-256, SHA-384, and SHA-512 with the respective hash
value lengths of 256, 384, and 512 bits. Later in 2008, FIP
PUB 180-3 was issued as a revised document, which added
SHA-224 [RFC3874] into the family. Collectively, these
algorithms are recognized as SHA2.
Fig. One Iteration of Compression Function of SHA-2
Family
SHA256 Algorithm
The SHA256 hashing calculation task can be advantageously
separated into three unmistakable activities. They are as per
the following:
• Pre-handling: Operation that performs cushioning
rationale and parses the info message
• Message scheduler: Function that creates sixty-four
words from a 16 word input message square
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
Low Power and Simple Implementation of Secure Hashing Algorithm (SHA-2) Using VHDL
---------------------------------------------------------------------***----------------------------------------------------------------------
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1713
• Compression work: Function that completes the
genuine hashing activity of the message-subordinate
word that leaves the message scheduler in each round
SHA256 Pre-preparing
The SHA256 pre-preparing is the underlying advance that
should be performed before the message planning and the
pressure capacity can be connected. The pre-preparing stage
plays out the accompanying three undertakings all together:
• Pad the message to make it a various of 512 bits,
• Parse this message into 512 piece squares, and
• Set the underlying hash esteem
Padding the Message
The message to be hashed should be cushioned first.
Cushioning is done as such as to guarantee that the message to
be hashed is a numerous of the square size for SHA256 i.e.
512 bits. Presently, in the event that we consider that the
length of the message is l bits, the cushioning rationale is with
the end goal that it attaches a bit "1" toward the finish of the
real message which is then trailed by k number of zero bits.
Here, k is the littlest, non negative answer for the
accompanying condition [26]:
l + 1 + k = 448 mod 512
Condition 5: SHA256 Padding Logic
The condition above is such in light of the fact that SHA256
permits an info message to have a length of up to 264 bits.
After the trail of 0 bits, a 64 bit square is attached toward the
end that is equivalent to l spoke to in a double portrayal.
Parsing the Padded Message
After the message has been cushioned utilizing the rationale
clarified above, it is parsed into N 512-piece squares with the
goal that the message planning and hash calculation can be
initiated.
Setting the Initial Hash Value (H0)
Prior to the hash calculation begins, the underlying hash
esteem is set which comprises of the accompanying 32 bit
words:
H00 H01 H02 H03 H04 H05 H06 H07
0x6a09e66
7
0xbb67ae8
5
0x3c6ef37
2
0xa54ff53
a
0x510e52
7f
0x9b0568
8c
0x1f83d9a
b
0x5be0cd1
9
Table 1: SHA256 Initialisation Vector. Source: [26]
It is interesting to know the origin of this 8 word value. They
were obtained by taking the first 32 bits of the fractional parts
of the square roots of the first 8 prime numbers. This initial
hash value acts as the IV for the SHA256 algorithm as
explained in the earlier section.
SHA256 Message Scheduler
After the pre-processing stage is completed, the message
schedule block takes the first 512 bit message block and
outputs the message dependant words Wt. The 32 bit message-
dependant words that that are output by the message scheduler
for every round are labelled as W0, W1,…, W63 (for t=0 to
63) and they are calculated as follows:
For 0 ≤ t ≤ 15,
Wt = Mt
For 16 ≤ t ≤ 63,
Wt = 1(Wt-2) + Wt-7 + 0(Wt-5) + Wt-16
Equation 6: SHA256 Message Scheduler. Source: [26]
Here, σ 0 and σ 1 are two logical functions specific to the
SHA256 message scheduler that operate on a 32 bit word. The
details of these functions are provided below:
0(x) = ROTR7(x) ⊕ ROTR18(x) ⊕ SHR3(x)
1(x) = ROTR17(x) ⊕ ROTR19(x) ⊕ SHR10(x)
Equation 7: Logical Functions σ0 and σ1. Source: [26]
The two consistent capacities 0 and 1 work on an expression
of the information message and apply the above bitwise tasks
to it. ROTRx remains for bitwise turn appropriate for x bits,
SHRx remains for bitwise move right and ⊕ remains for the
bitwise selective or. This message plan square is normally
executed in equipment by utilizing 16 phases of 32 bit move
registers and three 32 bit adders [33] for the 512 piece
information square handling.
Each round, the 32 bit estimation of Wt is moved to one side
utilizing the move enrolls as past estimations of Wt are
required to ascertain future estimations of Wt. It is engaging
realize that two sensible capacities σ0, σ1 and the message
plan rationale clarified don't become an integral factor until
the point that the seventeenth round. The 512 piece input
message is nourished as it is to the message pressure work for
the initial 16 rounds.
SHA256 Message Compression Function
The message pressure work plays out the genuine hashing
activity and is the principle task that authorizes the restricted
property of SHA256. Other than the eight expressions of
working factors A, B, C, D, E, F, G and H that are utilized and
refreshed in each cycle, two transitory words T1 and T2 are
additionally utilized by the message pressure work for
calculation of the factors An and E in each round. The initial
step that the message pressure work performs is that it
initialises these 8 working factors with the IV (in the event
that it is the principal square) or with the halfway hash of the
past square (on the off chance that it isn't the primary square
being hashed).
Figure 3: SHA256 Message Compression Function (Above)
and Message Scheduler (Below)
The figure above demonstrates the common execution of the
message pressure work and the message scheduler that work
couple. It can be obviously observed from the figure over that
at each cycle, 6 out of 8 estimations of A, B, C, and E, F, G
are moved by one position to B, C, D and F, G, H separately.
Wt is the 32 bit information figured by the message scheduler
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1714
and bolstered to the pressure capacity and Kt is a round
particular 32 bit steady whose round particular qualities are
determined in addendum A. SHA256 utilizes 64 constants (1
for each round) that are 32 bit words and they have been
gotten by taking the initial 32 bits of the partial parts of the
block foundations of the initial 64 prime numbers. Factors An
and E are reliant on all information esteems and are figured in
each round utilizing conditions clarified straightaway. After
the 8 working factors are initialised as clarified before, 64
rounds of the pressure work are connected to them and middle
of the road round estimations of these factors are ascertained
as takes after:
T1 = H + Σ1(E) + Ch(E, F, G) + Kt + Wt
T2 = Σ0(A) + Maj(A, B, C)
H = G; G = F; F = E
E = D + T1 = D + H + Σ1(E) + Ch(E, F, G) + Kt + Wt
D = C; C = B; B = A
A = T1 + T2 = H + Σ1(E) + Ch(E, F, G) + Σ0(A) + Maj(A, B,
C) + Kt + Wt
Equation 8: Message Compression Function. Source: [26]
The four logical functions mentioned above perform the core
operation of introducing the confusion and diffusion in Wt that
enters in 32 bit words at each round. After applying the above
equations to the working variables for 64 rounds, an
appropriate level of the avalanche effect is observed. These 4
logical functions are now explained next.
Ch(X, Y, Z) = (X ∧ Y)⊕(¬X ∧ Z)
Maj(X, Y, Z)= (X ∧ Y)⊕(X ∧ Z)⊕(Y ∧ Z)
Σ0(X) = ROTR2(X) ⊕ ROTR13(X) ⊕ ROTR22(X)
Σ1(X) = ROTR6(X) ⊕ ROTR11(X) ⊕ ROTR25(X)
Equation 9: Logical Functions Ch, Maj, Σ0 and Σ1.
Here, logical functions Ch and Maj take 3 words as input and
produce a single word output. ∧ stands for a 32 bit Bitwise
AND operation while ¬ is the compliment operation. The Ch
function always takes the working variables E, F and G as
inputs while the Maj function always takes A, B and C as
inputs. Variables A and E are the ones that need to be
computed at each round. Functions Σ0 and Σ1 always take
variables A and E as their input. We canthus see some sort of
symmetry in the message compression function that divides it
into two parts. This symmetry is evident from figure 3.
Figure 4: SHA256 Compression Function Along with the
Final Additions.
The figure above represents a different look at the
compression function but it conveys the same message. The
point to take away from this figure is that after the
compression function has been applied 64 times i.e. after the
64 rounds have been completed, the values contained in the
working variables A to H are finally added to the 8 word data
block that was fed to the compression function at the
beginning. This value could either be the constant IV for
SHA256 or an intermediate message digest. This is because of
the fact that the SHA256 algorithm follows the Davies-Meyer
construction where the input is added to the output at the end.
Now, the intermediate/final hash is given by the following
equation:
H0
i+1
= A + H0
1
H1
i+1
= B + H1
1
H2
i+1
= C + H2
1
H3
i+1
= D + H3
1
H4
i+1
= E + H4
1
H5
i+1
= F + H5
1
H0
i+1
= G + H6
1
H0 i+1 = H + H71
Equation 10: Calculation of Intermediate/Final Hash Value
After all the message blocks including the final Nth message
block has been processed in this manner, the final hash i.e. the
256 bit message digest of the message is represented in the
following manner:
SHA256(M) = H0
N
|| H0
N
|| H0
N
|| H0
N
|| H0
N
|| H0
N
|| H0
N
H0
N
Equation 11: Resulting SHA256 Message Digest. Source: [26]
This 8 word information square (H0 - H7i) is the default
steady SHA256 initialisation vector (IV) if the message was
not exactly or equivalent to 512 bits (counting the cushioning).
In the event that the length of the message (counting
cushioning) is more noteworthy than 512 bits, at that point this
8 word information square is the middle hash figured of the
past 512 piece square. This game plan where the halfway hash
estimation of the past square is nourished as IV to the hash
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1715
calculation of the following square is known as the Merkle-
Damgård development. SHA256 depends on this development
called the Merkle-Damgård Paradigm and is worked to be
impact safe as the hidden SHA256 pressure work is crash safe.
Module Architecture of Proposed Design
Sg_sha256
This is the sg_sha256 motor best level.
• The sg_sha256 is a stream hash motor, i.e., the
information words are hashed as a surge of words read
from an info transport, with control contributions for
BEGIN/END of the message information stream. The
information transport is a 32bit word transport, with a
byte path selector to signalize what number of bytes are
legitimate in the last word.
• The center is a basic coordination of the rationale
obstructs for the SHA256 motor, with the inner
information way and control-way wires.-
• Written in synthesizable VHDL, the hash motor is a
low asset, zone productive execution of the SHA256
hash calculation.
• The information input port is sorted out as a 32bit word
compose enroll, with stream control and start/end
signals.
• The 256bit outcome enlist is sorted out as 8 x 32bit
registers that can be perused at the same time.
• This execution is a traditionalist usage of the affirmed
FIPS-180-4 calculation, with a reasonable bargain of
assets, containing just 32 registers of 32bit words for
the hash motor, with a solitary cycle combinational
rationale for every calculation step.
• The SG_SHA256 is a fundamental cryptographic
square, utilized by all encryption and advanced mark
plans.
Sg_sha_control
This is the control route method of reasoning for the
sg_sha256 brisk engine.
• It is a totally synchronous diagram, with all signs
synchronous to the rising edge of the system clock.
• The sequencer state machine controls the hash data
way modules, making addresses for the coefficients
ROM, stack/enable signs for the
• message design, hash focus and yield registers circuit
squares, and control signals for the data padding
reason.
Sg_sha_hash_core
This is the 256bit single-cycle hash center preparing rationale
for each of the 64 square advances.
• The combinational profundity of this square is 8 layers
of rationale and adders.
Sg_sha_Ki_rom
Starting qualities for the hash result registers.
• This module is demonstrated as a settled esteem work.
• It can be actualized as a nearby steady settled esteem.
Sg_sha_kt_rom
This is the 64 words coefficients rom for the square hash
center.
• It is displayed as an offbeat addressable ROM memory.
• Depending on the manufacture procedure and
innovation, this memory can be executed as an OTP, a
MUX, a settled LUT or a combinational capacity.
Sg_sha_msg_sch
This is the message scheduler information way for the sha256
processor.
Sg_sha_padding
This is the byte cushioning datapath rationale for the sha256
processor.
• The cushioning of the last square is controlled by the
byte path selectors and the last words selectors.
• These control signals are produced at the Control Logic
square of the SHA256 processor.
• A consistency check blunder flag is created, to hail
illicit control conditions.
• This square is a completely combinational circuit, with
2 layers of rationale.
Sg_sha_regs
The regs square has the yield result registers for the SHA256
processor.
• It is a solitary cycle 256bit Accumulator for the square
hash comes about, and can be actualized as a 32bit
MUX and a 32bit convey chain for each enroll.
The Hardware Implementations and Optimisations of
SHA256
The only practical way of a high speed SHA256 engine is to
implement it in hardware be it either FPFAs or the recent
technology of ASICs. Software implementations of Bitcoin
mining used in CPU or GPU mining have become obsolete as
they simply cannot compete with the hashing power of
hardware implementations. These hardware implementations
truly serve the meaning of a fast implementation and various
hardware optimisations have been proposed over the years in
order to increase their throughput and to reduce their power
consumption. These optimisations, however, are aimed at the
hardware implementation of the SHA256 hashing algorithm in
general rather than SHA256 employed for Bitcoin mining.
Most of these optimisations are aimed towards the longest data
path or the critical path in the SHA256 core which is the
calculation of working variable A in the message compression
function that involves mod 232 additions of 7 operands (see
equation 8 in section 3.2.3). We shall now have a look at the
various SHA256 hardware speedup proposals made.
SHA256 Hardware Optimisations
Many hardware implementations have been seen in the
literature that are either FPGA [15] [16] [25] [30] [39] or
ASIC designs [21] [22] [33] [37] [47]. These implementations
designs contain one or a combination of the following
optimisations so as to speed up the calculations i.e. the
throughput of the SHA256 core. The main design difference
for the hardware implementation of SHA256 lies in the trade-
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1716
off between throughput and the area complexity which is
measured in Gate Equivalents (GE). But, in our cases of
Bitcoin mining, we typically have no area/space constrains
and thus we shall concentrate on the throughput optimisations
only. More the area, more is the throughput and lesser is the
number of required clock cycles to perform the SHA256
computation.
Use of Carry-Save Adders (CSAs)
As mentioned before, the calculation of the working variable
A for each round of the compression function forms the
longest data path or the critical path in the SHA256 core. This
involves mod 232 additions on 7 operands (Kt, Wt, H, Σ1(E),
Ch(E, F, G), Σ0(A), Maj(A, B, C)). Architectures [21] [22]
[35] [36] [39] employing Carry Save Adders (CSAs) minimise
the delay caused by the carry propagation time by separating
the sum and the carry paths. CSAs accept 3 operands as inputs
and so, the working variable A can be computed using just 5
CSAs [22]. Having said that, CSAs require another 2-input
adder for the recombination of the sum and carry paths. This 2
operand addition can either be performed by using CLAs i.e.
Carry Look Ahead adders or by using CPAs i.e. Carry
Propagation Adders. The net result of using CSAs for the
critical path is that they reduce the carry propagation delay
caused as compared to traditional CPAs used on the critical
path.
Unrolling
Unrolled architectures [20] [35] [36] [39] reduce the number
of clock cycles required to perform the SHA256 hash
computation by implementing multiple rounds of the SHA256
compression function using combinational logic. These
architectures help improve the throughput by optimising the
data dependencies involved in the message compression
function. Say if the SHA256 core was unrolled once, then this
would effectively mean that the hash should be calculated in
half the number of clock cycles. As a trade-off, unrolling the
SHA256 core architecture comes at the cost of a decrease in
the clock frequency and an increase in the area complexity.
(Quasi-) Pipelining
The goal of quasi-pipelining is to optimise the critical path and
therefore increase the clock frequency. Quasi-pipelined
SHA256 architectures [21] [22] [36] [39] use registers to
break the long path or the critical path of the computation of
the working variable A in the message compression function.
Thus, such quasi-pipelines architectures allow higher data
throughputs and higher frequencies of hash calculations by
achieving very short critical paths. Pipelining is not as easy to
achieve as it sounds due to the feedback associated due to the
way in which the SHA256 compression function is designed.
As a result, an external control circuitry is required such that
the registers are enabled correctly.
Delay Balancing
Dadda et al. [22] have been the pioneers in hardware
optimisations of SHA256 and they have also spent their
research efforts on delay balancing along with the use of
CSAs. Just as described earlier, a CLA is used to combine the
sum and the carry paths output by the CSA but these sum and
carry paths are first registered so that the CLA adder is
removed from the critical path. This increases the throughput
but this architecture requires additional control circuitry for
the additional register introduced in the architecture.
Addition of Kt and Wt
Looking at figure 3 and figure 4, we can see that the addition
of Kt and Wt can be performed independent of the message
compression function. The architecture proposed in [52] uses
this as an improvement by moving Kt + Wt to the message
scheduler stage. This can be done because both Wt and Kt are
available before and are independent of the other operands
(see equation 8 in section 3.2.3). However, it is seen that
quasi-pipelining architecture proposed by Dadda et. al. [21]
[22] [36] performs a similar separation of the operands and the
resulting critical path is even shorter than in [52].
Operation Rescheduling
Architectures that employ operational rescheduling allow an
efficient use of a pipelined structure without increasing the
area complexity. This in turn allows higher throughputs. [15]
[16] have claimed that they were able to reduce the critical
path in a similar manner as unrolling techniques and gain a
higher throughput without adding more area complexity.
Module Architecture of Proposed Design
SHA-2 hardware is architected by using iterative and round
pipeline basis where iterative SHA224/256 takes 64 clock
cycles and pipelined operation takes 68 clock cycles.
SHA384/512 takes 80 clock cycles for iterative operation and
84 clock cycles for pipelined operation to calculate the final
hash value.
The aim is to implement the designed hash function core on
VHDL. The whole package and separate modules were
synthesized and analyzed using Xilinx ISE 12.1 tool for the
targeted Virtex-VI FPGA.
The VHDL implementation was divided into five modules:
• Initial module: - It collects the serial input bits and
sends 512 bit blocks to the next module.
• Round module: - It performs the hashing calculations
and operations on the input message block and
previous hash output to generate a new hash value.
• Last Block module: - At the end of the message bit
stream the final message block of 512 bits has to be
prepared by adding 64 bits of message length at the end
of 448 bits of input message block, padded accordingly
to suffice the word size requirement. This final
message block does this function of preparing the last
message block.
• Final module: - This module computes the hash value
by adding the previous hash value to the new hash
value achieved from the Round module. Then it sends
the 256 bit hash value, bit by bit (serially).
• Top module: - This module is the control unit for
controlling the functioning of the rest of the modules
and to ensure that the SHA-2 algorithm flow is
followed and maintained
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1717
Complete Top Level Logic Design of SHA-256
Sg_sha256
This is the sg_sha256 engine top level.
• The sg_sha256 is a stream hash engine, i.e., the data
words are hashed as a stream of words read from an
input bus, with control inputs for BEGIN/END of the
message data stream. The input bus is a 32bit word bus,
with a byte lane selector to signalize how many bytes
are valid in the last word.
• The core is a structural integration of the logic blocks
for the SHA256 engine, with the internal data-path and
control-path wires.-
• Written in synthesizable VHDL, the hash engine is a
low resource, area-efficient implementation of the
SHA256 hash algorithm.
• The data input port is organized as a 32bit word write
register, with flow control and begin/end signals.
• The 256bit result register is organized as 8 x 32bit
registers that can be read simultaneously.
• This implementation is a conservative implementation
of the approved FIPS-180-4 algorithm, with a fair
compromise of resources, comprising of only 32
registers of 32bit words for the hash engine, with a
single-cycle combinational logic for each algorithm
step.
• The SG_SHA256 is a basic cryptographic block, used
by almost all encryption and digital signature schemes.
Sg_sha_control
This is the control path logic for the sg_sha256 fast engine.
• It is a fully synchronous design, with all signals
synchronous to the rising edge of the system clock.
• The sequencer state machine controls the hash data
path modules, generating addresses for the coefficients
ROM, load/enable signals for the
• message schedule, hash core and output registers
circuit blocks, and control signals for the input padding
logic.
control path logic
Sg_sha_hash_core
This is the 256bit single-cycle hash core processing logic for
each of the 64 block steps.
• The combinational depth of this block is 8 layers of
logic and adders.
256bit single-cycle hash core processing logic
Sg_sha_Ki_rom
Initial values for the hash result registers.
• This module is modelled as a fixed value function.
• It can be implemented as a local constant fixed value.
Sg_sha_kt_rom
This is the 64 words coefficients rom for the block hash core.
• It is modelled as an asynchronous addressable ROM
memory.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1718
• Depending on the fabrication process and technology,
this memory can be implemented as a OTP, a MUX, a
fixed LUT or a combinational function.
Sg_sha_msg_sch
This is the message scheduler data path for the sha256
processor.
Sg_sha_padding
This is the byte padding datapath logic for the sha256
processor.
• The padding of the last block is controlled by the byte lane
selectors and the last words selectors.
• These control signals are generated at the Control Logic block
of the SHA256 processor.
• A consistency check error signal is generated, to flag illegal
control conditions.
• This block is a fully combinational circuit, with 2 layers of
logic.
byte padding datapath logic
Sg_sha_regs
The regs block has the output result registers for the SHA256
processor.
• It is a single-cycle 256bit Accumulator for the block
hash results, and can be implemented as a 32bit MUX
and a 32bit carry chain for each register.
Registers for the SHA256
Conclusion
The future cryptographic hash standard SHA-2 should
be sensible and versatile for a broad assortment of
usages, featuring meanwhile a perfect security quality.
In this work, we showed an aggregate hardware
depiction of the SHA-2. A round rescheduling
technique and an exceptional reason memory design
are moreover proposed. Post-mix comes to fruition , a
low-control littler utilization of SHA-2 has been
Implemented. I assume that a similar approach for
littler VLSI use of cryptographic traditions is a
productive choice to reduce the area and power use of
the organized circuit.
The foremost purpose of this proposition was to
propose improvements in the SHA256 hashing
estimation. This point was roused by the manner in
which that various hardware based progressions in
SHA256 gear executions have quite recently been
suggested anyway they have been away for the
SHA256 hashing estimation when all is said in done.
In perspective of that, the fundamental responsibility
made in this suggestion has been the SHA256 figuring
headway proposition that are specific to decreasing
hardware. The hypothesis in like manner made an
undertaking to deal with establishment information and
moreover the related information which would be
required with a particular true objective to totally
acknowledge what was being prescribed. A talk has
moreover been impacted concerning the precision of
the Savings To factor and the necessity for executing
and taking a gander at as performed by off-the-rack
SHA256 and the propelled version of SHA256 for a
more exact assessment of this Factor. The prerequisite
for an essential examination of the figuring
improvements' closeness with existing gear headways
has furthermore been discussed. It is assumed that the
prescribed improvements will accomplish radical
throughput overhauls in devices.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1719
The extensive variety of achieved displays makes prepared for
the utilization of the SHA-2 ability to various hardware use.
Shutting remarks are,
• 'Lightweight' is the rising star of cryptography. The
term 'lightweight' alone covers a wide range, for
instance, lightweight to the extent zone, speed, control
usage, essentialness use, or a blend of these, dependent
upon the specific application.
• This ask about solely centers around the lightweight for
zone, which in like manner realizes lightweight for
typical power usage in numerous applications.
• The usage of square memories is avoided for similitude
on different stages.
• We have been productive in accomplishing our target
of most lessened entryway count, and even made sense
of how to beat a segment of the starting late proposed
lightweight hash fills in to the extent littleness and
throughput.
REFERENCES
[1.] William Stallings, “Cryptography and Network
Security,Principles and Practices” Fourth Edition, 2005.
[2.] NIST “SECURE HASH STANDARD”, Federal
Information Processing Standards Publication 180-1,
August 1995.
[3.] NIST “SECURE HASH STANDARD”, Federal
Information Processing Standards Publication 180-2,
August 2002.
[4.] NIST “SECURE HASH STANDARD”, Federal
Information Processing Standards Publication 180-3,
August 2008.
[5.] Harris E. Michail, Athanasios S. Milidonis, “A Top-
Down Design Methodology for Ultrahigh-Performance
Hashing Cores” IEEE transaction on Dependable and
Secure computing, vol. 6, No. 4, October-December
2009.
[6.] N. Skluvos, G. Dimitroulakos, and O. Koufopavlou,“An
Ultra HighSpeed Architecture for VLSI Implementation
of HashFunctions”, Electronics, Circuits and Systems,
2003.
[7.] Marco Macchetti, Luigi Dadda, “Quasi-Pipelined Hash
Circuits”, Proceedings of the 17th IEEE Symposium on
Computer Arithmetic 2005.
[8.] Robert P. McEvoy, Francis M. Crowe, Colin C. Murphy
and William P. Marnane, “Optimisation of the SHA-2
Family of Hash Functions on FPGAs”, IEEE Computer
Society Annual Symposium onEmerging VLSI
Technologies and Architectures, 2006.
[9.] Hoang Anh Tuan, Katsuhiro Yamazaki, Shigeru
Oyanagi,“Three-stage Pipeline Implementation for SHA-
2 using data forwarding”, International Conference on
Field Programmable Logic and Applications, 2008.
[10.] Ricardo Chaves,Georgi Kuzmanov,Leonel
Sousa,Stamatis Vassiliadis,“Cost-Efficient SHA
Hardware Accelerators”, IEEE Transaction onVery Large
Scale Integration (VLSI) systems, Vol. 16, No. 8, Aug
2008.
[11.] M. McLoone, J. V. McCanny, “Efficient Single-Chip
Implementation of SHA-384 &SHA-512”, IEEE
International Conference on Field-Programmable
Technology, 2002.
[12.] Shay Gueron, Vlad Krasnov, “Parallelizing message
schedules to accelerate the computations of hash
functions”, Journal of Cryptographic Engineering,
Volume 2, Issue 4, pp 241-253,November 2012.
[14.] Chanjuan Li1, Qingguo Zhou2, Yuli Liu2, Qi,
“Costefficient Data Cryptographic Engine Based on
FPGA”,Fourth International Conference on Ubi-Media
2011.
[13.] Helion Technology Ltd. (http://www.heliontech.com)
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1720

More Related Content

PPT
Network Layer
PDF
The Internet Protocol version 4 (IPv4)
PPTX
Ch 18 intro to network layer - section 4
PPTX
Internet Protocol Version 4
PDF
Ipv4 packet structure
PPTX
IPv4 Addressing
PDF
PPTX
CN Unit 3
Network Layer
The Internet Protocol version 4 (IPv4)
Ch 18 intro to network layer - section 4
Internet Protocol Version 4
Ipv4 packet structure
IPv4 Addressing
CN Unit 3

What's hot (20)

PDF
Computer network
PPT
IPV4 Frame Format
PPTX
PPT
Internet Protocol
PPTX
Et3003 sem2-1314-5 network layers ii (intro to routing)
PDF
Cs8591 Computer Networks
PPT
Chap 03
PPTX
COMPUTER NETWORKS UNIT 2
PPT
Network layer
PPTX
Data streaming algorithms
PPTX
Ch 19 Network-layer protocols Section 1
PPTX
Transport layer
PDF
IP Datagram Structure
PPTX
Ch 19 Network-layer protocols - section 2
PPTX
Network Layer
PPTX
Ipv4 header
PPTX
Lecture 3 network layer
PDF
Multicast address
PPT
Chapter 5 : Link Layer
PDF
20CS2007 Computer Communication Networks
Computer network
IPV4 Frame Format
Internet Protocol
Et3003 sem2-1314-5 network layers ii (intro to routing)
Cs8591 Computer Networks
Chap 03
COMPUTER NETWORKS UNIT 2
Network layer
Data streaming algorithms
Ch 19 Network-layer protocols Section 1
Transport layer
IP Datagram Structure
Ch 19 Network-layer protocols - section 2
Network Layer
Ipv4 header
Lecture 3 network layer
Multicast address
Chapter 5 : Link Layer
20CS2007 Computer Communication Networks
Ad

Similar to IRJET- Low Power and Simple Implementation of Secure Hashing Algorithm (SHA-2) using VHDL Implemented on FPGA of SHA-224/256 Core (20)

PDF
PERFORMANCE ANALYSIS OF SHA-2 AND SHA-3 FINALISTS
PDF
PERFORMANCE ANALYSIS OF SHA-2 AND SHA-3 FINALISTS
PDF
PERFORMANCE ANALYSIS OF SHA-2 AND SHA-3 FINALISTS
PDF
Sha
PPT
ch12.ppt
PPT
HMAC&CMAC.ppt
PPTX
Cryptographic Hash Functions in Security.pptx
PDF
Algebraic Fault Attack on the SHA-256 Compression Function
PDF
SHA 1 Algorithm
PDF
Implementation of message authentication code using DNA-LCG key and a novel h...
PPT
Secure hashing algorithm
PDF
A Comparative Analysis between SHA and MD5 algorithms
PDF
Whitepaper on new research on message digest and hash techniques
PDF
Count based Secured Hash Algorithm.
PPTX
PPTX
Blockchain Technology Explained: A Beginner's Guide to the Future of the Inte...
PPTX
Information and network security 39 secure hash algorithm
PDF
icwet1097
PERFORMANCE ANALYSIS OF SHA-2 AND SHA-3 FINALISTS
PERFORMANCE ANALYSIS OF SHA-2 AND SHA-3 FINALISTS
PERFORMANCE ANALYSIS OF SHA-2 AND SHA-3 FINALISTS
Sha
ch12.ppt
HMAC&CMAC.ppt
Cryptographic Hash Functions in Security.pptx
Algebraic Fault Attack on the SHA-256 Compression Function
SHA 1 Algorithm
Implementation of message authentication code using DNA-LCG key and a novel h...
Secure hashing algorithm
A Comparative Analysis between SHA and MD5 algorithms
Whitepaper on new research on message digest and hash techniques
Count based Secured Hash Algorithm.
Blockchain Technology Explained: A Beginner's Guide to the Future of the Inte...
Information and network security 39 secure hash algorithm
icwet1097
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
PDF
Kiona – A Smart Society Automation Project
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
PDF
Breast Cancer Detection using Computer Vision
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Kiona – A Smart Society Automation Project
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
BRAIN TUMOUR DETECTION AND CLASSIFICATION
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Breast Cancer Detection using Computer Vision
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...

Recently uploaded (20)

PDF
PPT on Performance Review to get promotions
PPTX
Construction Project Organization Group 2.pptx
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PPTX
Welding lecture in detail for understanding
PPTX
web development for engineering and engineering
PPTX
CH1 Production IntroductoryConcepts.pptx
PDF
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
DOCX
573137875-Attendance-Management-System-original
PPTX
OOP with Java - Java Introduction (Basics)
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PPT
Project quality management in manufacturing
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PPTX
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PPTX
Sustainable Sites - Green Building Construction
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PDF
Well-logging-methods_new................
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PPT on Performance Review to get promotions
Construction Project Organization Group 2.pptx
UNIT-1 - COAL BASED THERMAL POWER PLANTS
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
Welding lecture in detail for understanding
web development for engineering and engineering
CH1 Production IntroductoryConcepts.pptx
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
573137875-Attendance-Management-System-original
OOP with Java - Java Introduction (Basics)
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
Model Code of Practice - Construction Work - 21102022 .pdf
Project quality management in manufacturing
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
CYBER-CRIMES AND SECURITY A guide to understanding
Sustainable Sites - Green Building Construction
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
Well-logging-methods_new................
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx

IRJET- Low Power and Simple Implementation of Secure Hashing Algorithm (SHA-2) using VHDL Implemented on FPGA of SHA-224/256 Core

  • 1. implemented on FPGA of SHA-224/256 core. Ms.Shreshtha Mishrs (Garg) M Tech (Embedded System and VLSI Design) Prof. Rishi Jha Deptt of Electronics and Communication Abstract- Cryptography plays an important role in the security of data. Even though the data is encrypted it can be altered while transmitting on the network so data should be verified using a digital signature. Hashing algorithms are used to create these digital signatures for verification of the data received. Hashing algorithm like Secure Hash Algorithm-2 (SHA-2(224/256)) is designed which has a fixed output length of 512-bits. Then to improve on power a low-power technique such as latch based clock gating technique is used. Introduction- In this research paper several hardware optimization techniques for the SHA-2 hashing functions were explored. A new architecture i.e., Round Pipelined Technique was proposed for the SHA-2 core, which eliminates the data dependency between iteration using data forwarding to improve the throughput per area. The fully iterative and Round Pipelined Techniques were investigated and developed using HDL. A comparison with other published results depicts 57% improvement in throughput per slice for SHA-256 and 17% improvement in throughput per slice for SHA-512. Implementation results indicate that the Round Pipelined technique can help to achieve good tradeoff between throughput and area. As future work, implementations of the SHA-2 core may be attempted by adopting various other design techniques like a design optimized for area using better resource sharing or loop unrolling techniques. Another optimization effort could be to increase the depth of the pipeline stages to increase the throughput. The aim of the project is to implement VHDL of the Hash Algorithm version 2 (SHA-2) hashing cryogenic algorithm, which will only provide partial and useful information to understand my work. From an arbitrary length message (<264), the algorithm generates a fixed length hash, equal to the number present in the name of the four variants of the SHA-224, SHA-256, SHA-384, SHA-512 algorithm. A preprocessing (or padding) phase is provided in which the message is bit 1 and so many bits 0 to its length to a value divided by 512 for the rest 448 (i.e., 512 to 64) and then accodated The length of the original message (64 bits). The message is divided into 512-bit blocks and the algorithm is applied sequentially to cascade on the blocks, using the result obtained from the previous block as the input data for the next calculation. Each block passes through three phases: data expansion, compression cycle, and hash processing (or intermediate digestion). The most critical step for implementation is compression that consists of a 64-step loop (80 for SHA-384/512) that is dependent on each other and therefore not parallelizable. Even expansion in the official drafting of the algorithm is a loop, however, it can be carried out without loosing clock cycles while loading the block through a buffer and a pointer over it. We performed other modifications and optimizations (rescheduling, pipelines between the stages of the different blocks, redundancies) to reduce the total delay, some of the many documents available on the net, the main ones in the bibliography (2) (3), others introduced by Tailored to my design. I have examined and discarded many other possibilities (unrolling, quasi-pipeline, special additions) because they are not suitable for our architecture. Cryptographic Hash Function Designs (SHA-2 ) Three new revised versions of SHA were added into SHA family by NIST in August 2002 as FIPS 180-2, known to be SHA-256, SHA-384, and SHA-512 with the respective hash value lengths of 256, 384, and 512 bits. Later in 2008, FIP PUB 180-3 was issued as a revised document, which added SHA-224 [RFC3874] into the family. Collectively, these algorithms are recognized as SHA2. Fig. One Iteration of Compression Function of SHA-2 Family SHA256 Algorithm The SHA256 hashing calculation task can be advantageously separated into three unmistakable activities. They are as per the following: • Pre-handling: Operation that performs cushioning rationale and parses the info message • Message scheduler: Function that creates sixty-four words from a 16 word input message square International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072 Low Power and Simple Implementation of Secure Hashing Algorithm (SHA-2) Using VHDL ---------------------------------------------------------------------***---------------------------------------------------------------------- © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1713
  • 2. • Compression work: Function that completes the genuine hashing activity of the message-subordinate word that leaves the message scheduler in each round SHA256 Pre-preparing The SHA256 pre-preparing is the underlying advance that should be performed before the message planning and the pressure capacity can be connected. The pre-preparing stage plays out the accompanying three undertakings all together: • Pad the message to make it a various of 512 bits, • Parse this message into 512 piece squares, and • Set the underlying hash esteem Padding the Message The message to be hashed should be cushioned first. Cushioning is done as such as to guarantee that the message to be hashed is a numerous of the square size for SHA256 i.e. 512 bits. Presently, in the event that we consider that the length of the message is l bits, the cushioning rationale is with the end goal that it attaches a bit "1" toward the finish of the real message which is then trailed by k number of zero bits. Here, k is the littlest, non negative answer for the accompanying condition [26]: l + 1 + k = 448 mod 512 Condition 5: SHA256 Padding Logic The condition above is such in light of the fact that SHA256 permits an info message to have a length of up to 264 bits. After the trail of 0 bits, a 64 bit square is attached toward the end that is equivalent to l spoke to in a double portrayal. Parsing the Padded Message After the message has been cushioned utilizing the rationale clarified above, it is parsed into N 512-piece squares with the goal that the message planning and hash calculation can be initiated. Setting the Initial Hash Value (H0) Prior to the hash calculation begins, the underlying hash esteem is set which comprises of the accompanying 32 bit words: H00 H01 H02 H03 H04 H05 H06 H07 0x6a09e66 7 0xbb67ae8 5 0x3c6ef37 2 0xa54ff53 a 0x510e52 7f 0x9b0568 8c 0x1f83d9a b 0x5be0cd1 9 Table 1: SHA256 Initialisation Vector. Source: [26] It is interesting to know the origin of this 8 word value. They were obtained by taking the first 32 bits of the fractional parts of the square roots of the first 8 prime numbers. This initial hash value acts as the IV for the SHA256 algorithm as explained in the earlier section. SHA256 Message Scheduler After the pre-processing stage is completed, the message schedule block takes the first 512 bit message block and outputs the message dependant words Wt. The 32 bit message- dependant words that that are output by the message scheduler for every round are labelled as W0, W1,…, W63 (for t=0 to 63) and they are calculated as follows: For 0 ≤ t ≤ 15, Wt = Mt For 16 ≤ t ≤ 63, Wt = 1(Wt-2) + Wt-7 + 0(Wt-5) + Wt-16 Equation 6: SHA256 Message Scheduler. Source: [26] Here, σ 0 and σ 1 are two logical functions specific to the SHA256 message scheduler that operate on a 32 bit word. The details of these functions are provided below: 0(x) = ROTR7(x) ⊕ ROTR18(x) ⊕ SHR3(x) 1(x) = ROTR17(x) ⊕ ROTR19(x) ⊕ SHR10(x) Equation 7: Logical Functions σ0 and σ1. Source: [26] The two consistent capacities 0 and 1 work on an expression of the information message and apply the above bitwise tasks to it. ROTRx remains for bitwise turn appropriate for x bits, SHRx remains for bitwise move right and ⊕ remains for the bitwise selective or. This message plan square is normally executed in equipment by utilizing 16 phases of 32 bit move registers and three 32 bit adders [33] for the 512 piece information square handling. Each round, the 32 bit estimation of Wt is moved to one side utilizing the move enrolls as past estimations of Wt are required to ascertain future estimations of Wt. It is engaging realize that two sensible capacities σ0, σ1 and the message plan rationale clarified don't become an integral factor until the point that the seventeenth round. The 512 piece input message is nourished as it is to the message pressure work for the initial 16 rounds. SHA256 Message Compression Function The message pressure work plays out the genuine hashing activity and is the principle task that authorizes the restricted property of SHA256. Other than the eight expressions of working factors A, B, C, D, E, F, G and H that are utilized and refreshed in each cycle, two transitory words T1 and T2 are additionally utilized by the message pressure work for calculation of the factors An and E in each round. The initial step that the message pressure work performs is that it initialises these 8 working factors with the IV (in the event that it is the principal square) or with the halfway hash of the past square (on the off chance that it isn't the primary square being hashed). Figure 3: SHA256 Message Compression Function (Above) and Message Scheduler (Below) The figure above demonstrates the common execution of the message pressure work and the message scheduler that work couple. It can be obviously observed from the figure over that at each cycle, 6 out of 8 estimations of A, B, C, and E, F, G are moved by one position to B, C, D and F, G, H separately. Wt is the 32 bit information figured by the message scheduler International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1714
  • 3. and bolstered to the pressure capacity and Kt is a round particular 32 bit steady whose round particular qualities are determined in addendum A. SHA256 utilizes 64 constants (1 for each round) that are 32 bit words and they have been gotten by taking the initial 32 bits of the partial parts of the block foundations of the initial 64 prime numbers. Factors An and E are reliant on all information esteems and are figured in each round utilizing conditions clarified straightaway. After the 8 working factors are initialised as clarified before, 64 rounds of the pressure work are connected to them and middle of the road round estimations of these factors are ascertained as takes after: T1 = H + Σ1(E) + Ch(E, F, G) + Kt + Wt T2 = Σ0(A) + Maj(A, B, C) H = G; G = F; F = E E = D + T1 = D + H + Σ1(E) + Ch(E, F, G) + Kt + Wt D = C; C = B; B = A A = T1 + T2 = H + Σ1(E) + Ch(E, F, G) + Σ0(A) + Maj(A, B, C) + Kt + Wt Equation 8: Message Compression Function. Source: [26] The four logical functions mentioned above perform the core operation of introducing the confusion and diffusion in Wt that enters in 32 bit words at each round. After applying the above equations to the working variables for 64 rounds, an appropriate level of the avalanche effect is observed. These 4 logical functions are now explained next. Ch(X, Y, Z) = (X ∧ Y)⊕(¬X ∧ Z) Maj(X, Y, Z)= (X ∧ Y)⊕(X ∧ Z)⊕(Y ∧ Z) Σ0(X) = ROTR2(X) ⊕ ROTR13(X) ⊕ ROTR22(X) Σ1(X) = ROTR6(X) ⊕ ROTR11(X) ⊕ ROTR25(X) Equation 9: Logical Functions Ch, Maj, Σ0 and Σ1. Here, logical functions Ch and Maj take 3 words as input and produce a single word output. ∧ stands for a 32 bit Bitwise AND operation while ¬ is the compliment operation. The Ch function always takes the working variables E, F and G as inputs while the Maj function always takes A, B and C as inputs. Variables A and E are the ones that need to be computed at each round. Functions Σ0 and Σ1 always take variables A and E as their input. We canthus see some sort of symmetry in the message compression function that divides it into two parts. This symmetry is evident from figure 3. Figure 4: SHA256 Compression Function Along with the Final Additions. The figure above represents a different look at the compression function but it conveys the same message. The point to take away from this figure is that after the compression function has been applied 64 times i.e. after the 64 rounds have been completed, the values contained in the working variables A to H are finally added to the 8 word data block that was fed to the compression function at the beginning. This value could either be the constant IV for SHA256 or an intermediate message digest. This is because of the fact that the SHA256 algorithm follows the Davies-Meyer construction where the input is added to the output at the end. Now, the intermediate/final hash is given by the following equation: H0 i+1 = A + H0 1 H1 i+1 = B + H1 1 H2 i+1 = C + H2 1 H3 i+1 = D + H3 1 H4 i+1 = E + H4 1 H5 i+1 = F + H5 1 H0 i+1 = G + H6 1 H0 i+1 = H + H71 Equation 10: Calculation of Intermediate/Final Hash Value After all the message blocks including the final Nth message block has been processed in this manner, the final hash i.e. the 256 bit message digest of the message is represented in the following manner: SHA256(M) = H0 N || H0 N || H0 N || H0 N || H0 N || H0 N || H0 N H0 N Equation 11: Resulting SHA256 Message Digest. Source: [26] This 8 word information square (H0 - H7i) is the default steady SHA256 initialisation vector (IV) if the message was not exactly or equivalent to 512 bits (counting the cushioning). In the event that the length of the message (counting cushioning) is more noteworthy than 512 bits, at that point this 8 word information square is the middle hash figured of the past 512 piece square. This game plan where the halfway hash estimation of the past square is nourished as IV to the hash International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1715
  • 4. calculation of the following square is known as the Merkle- Damgård development. SHA256 depends on this development called the Merkle-Damgård Paradigm and is worked to be impact safe as the hidden SHA256 pressure work is crash safe. Module Architecture of Proposed Design Sg_sha256 This is the sg_sha256 motor best level. • The sg_sha256 is a stream hash motor, i.e., the information words are hashed as a surge of words read from an info transport, with control contributions for BEGIN/END of the message information stream. The information transport is a 32bit word transport, with a byte path selector to signalize what number of bytes are legitimate in the last word. • The center is a basic coordination of the rationale obstructs for the SHA256 motor, with the inner information way and control-way wires.- • Written in synthesizable VHDL, the hash motor is a low asset, zone productive execution of the SHA256 hash calculation. • The information input port is sorted out as a 32bit word compose enroll, with stream control and start/end signals. • The 256bit outcome enlist is sorted out as 8 x 32bit registers that can be perused at the same time. • This execution is a traditionalist usage of the affirmed FIPS-180-4 calculation, with a reasonable bargain of assets, containing just 32 registers of 32bit words for the hash motor, with a solitary cycle combinational rationale for every calculation step. • The SG_SHA256 is a fundamental cryptographic square, utilized by all encryption and advanced mark plans. Sg_sha_control This is the control route method of reasoning for the sg_sha256 brisk engine. • It is a totally synchronous diagram, with all signs synchronous to the rising edge of the system clock. • The sequencer state machine controls the hash data way modules, making addresses for the coefficients ROM, stack/enable signs for the • message design, hash focus and yield registers circuit squares, and control signals for the data padding reason. Sg_sha_hash_core This is the 256bit single-cycle hash center preparing rationale for each of the 64 square advances. • The combinational profundity of this square is 8 layers of rationale and adders. Sg_sha_Ki_rom Starting qualities for the hash result registers. • This module is demonstrated as a settled esteem work. • It can be actualized as a nearby steady settled esteem. Sg_sha_kt_rom This is the 64 words coefficients rom for the square hash center. • It is displayed as an offbeat addressable ROM memory. • Depending on the manufacture procedure and innovation, this memory can be executed as an OTP, a MUX, a settled LUT or a combinational capacity. Sg_sha_msg_sch This is the message scheduler information way for the sha256 processor. Sg_sha_padding This is the byte cushioning datapath rationale for the sha256 processor. • The cushioning of the last square is controlled by the byte path selectors and the last words selectors. • These control signals are produced at the Control Logic square of the SHA256 processor. • A consistency check blunder flag is created, to hail illicit control conditions. • This square is a completely combinational circuit, with 2 layers of rationale. Sg_sha_regs The regs square has the yield result registers for the SHA256 processor. • It is a solitary cycle 256bit Accumulator for the square hash comes about, and can be actualized as a 32bit MUX and a 32bit convey chain for each enroll. The Hardware Implementations and Optimisations of SHA256 The only practical way of a high speed SHA256 engine is to implement it in hardware be it either FPFAs or the recent technology of ASICs. Software implementations of Bitcoin mining used in CPU or GPU mining have become obsolete as they simply cannot compete with the hashing power of hardware implementations. These hardware implementations truly serve the meaning of a fast implementation and various hardware optimisations have been proposed over the years in order to increase their throughput and to reduce their power consumption. These optimisations, however, are aimed at the hardware implementation of the SHA256 hashing algorithm in general rather than SHA256 employed for Bitcoin mining. Most of these optimisations are aimed towards the longest data path or the critical path in the SHA256 core which is the calculation of working variable A in the message compression function that involves mod 232 additions of 7 operands (see equation 8 in section 3.2.3). We shall now have a look at the various SHA256 hardware speedup proposals made. SHA256 Hardware Optimisations Many hardware implementations have been seen in the literature that are either FPGA [15] [16] [25] [30] [39] or ASIC designs [21] [22] [33] [37] [47]. These implementations designs contain one or a combination of the following optimisations so as to speed up the calculations i.e. the throughput of the SHA256 core. The main design difference for the hardware implementation of SHA256 lies in the trade- International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1716
  • 5. off between throughput and the area complexity which is measured in Gate Equivalents (GE). But, in our cases of Bitcoin mining, we typically have no area/space constrains and thus we shall concentrate on the throughput optimisations only. More the area, more is the throughput and lesser is the number of required clock cycles to perform the SHA256 computation. Use of Carry-Save Adders (CSAs) As mentioned before, the calculation of the working variable A for each round of the compression function forms the longest data path or the critical path in the SHA256 core. This involves mod 232 additions on 7 operands (Kt, Wt, H, Σ1(E), Ch(E, F, G), Σ0(A), Maj(A, B, C)). Architectures [21] [22] [35] [36] [39] employing Carry Save Adders (CSAs) minimise the delay caused by the carry propagation time by separating the sum and the carry paths. CSAs accept 3 operands as inputs and so, the working variable A can be computed using just 5 CSAs [22]. Having said that, CSAs require another 2-input adder for the recombination of the sum and carry paths. This 2 operand addition can either be performed by using CLAs i.e. Carry Look Ahead adders or by using CPAs i.e. Carry Propagation Adders. The net result of using CSAs for the critical path is that they reduce the carry propagation delay caused as compared to traditional CPAs used on the critical path. Unrolling Unrolled architectures [20] [35] [36] [39] reduce the number of clock cycles required to perform the SHA256 hash computation by implementing multiple rounds of the SHA256 compression function using combinational logic. These architectures help improve the throughput by optimising the data dependencies involved in the message compression function. Say if the SHA256 core was unrolled once, then this would effectively mean that the hash should be calculated in half the number of clock cycles. As a trade-off, unrolling the SHA256 core architecture comes at the cost of a decrease in the clock frequency and an increase in the area complexity. (Quasi-) Pipelining The goal of quasi-pipelining is to optimise the critical path and therefore increase the clock frequency. Quasi-pipelined SHA256 architectures [21] [22] [36] [39] use registers to break the long path or the critical path of the computation of the working variable A in the message compression function. Thus, such quasi-pipelines architectures allow higher data throughputs and higher frequencies of hash calculations by achieving very short critical paths. Pipelining is not as easy to achieve as it sounds due to the feedback associated due to the way in which the SHA256 compression function is designed. As a result, an external control circuitry is required such that the registers are enabled correctly. Delay Balancing Dadda et al. [22] have been the pioneers in hardware optimisations of SHA256 and they have also spent their research efforts on delay balancing along with the use of CSAs. Just as described earlier, a CLA is used to combine the sum and the carry paths output by the CSA but these sum and carry paths are first registered so that the CLA adder is removed from the critical path. This increases the throughput but this architecture requires additional control circuitry for the additional register introduced in the architecture. Addition of Kt and Wt Looking at figure 3 and figure 4, we can see that the addition of Kt and Wt can be performed independent of the message compression function. The architecture proposed in [52] uses this as an improvement by moving Kt + Wt to the message scheduler stage. This can be done because both Wt and Kt are available before and are independent of the other operands (see equation 8 in section 3.2.3). However, it is seen that quasi-pipelining architecture proposed by Dadda et. al. [21] [22] [36] performs a similar separation of the operands and the resulting critical path is even shorter than in [52]. Operation Rescheduling Architectures that employ operational rescheduling allow an efficient use of a pipelined structure without increasing the area complexity. This in turn allows higher throughputs. [15] [16] have claimed that they were able to reduce the critical path in a similar manner as unrolling techniques and gain a higher throughput without adding more area complexity. Module Architecture of Proposed Design SHA-2 hardware is architected by using iterative and round pipeline basis where iterative SHA224/256 takes 64 clock cycles and pipelined operation takes 68 clock cycles. SHA384/512 takes 80 clock cycles for iterative operation and 84 clock cycles for pipelined operation to calculate the final hash value. The aim is to implement the designed hash function core on VHDL. The whole package and separate modules were synthesized and analyzed using Xilinx ISE 12.1 tool for the targeted Virtex-VI FPGA. The VHDL implementation was divided into five modules: • Initial module: - It collects the serial input bits and sends 512 bit blocks to the next module. • Round module: - It performs the hashing calculations and operations on the input message block and previous hash output to generate a new hash value. • Last Block module: - At the end of the message bit stream the final message block of 512 bits has to be prepared by adding 64 bits of message length at the end of 448 bits of input message block, padded accordingly to suffice the word size requirement. This final message block does this function of preparing the last message block. • Final module: - This module computes the hash value by adding the previous hash value to the new hash value achieved from the Round module. Then it sends the 256 bit hash value, bit by bit (serially). • Top module: - This module is the control unit for controlling the functioning of the rest of the modules and to ensure that the SHA-2 algorithm flow is followed and maintained International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1717
  • 6. Complete Top Level Logic Design of SHA-256 Sg_sha256 This is the sg_sha256 engine top level. • The sg_sha256 is a stream hash engine, i.e., the data words are hashed as a stream of words read from an input bus, with control inputs for BEGIN/END of the message data stream. The input bus is a 32bit word bus, with a byte lane selector to signalize how many bytes are valid in the last word. • The core is a structural integration of the logic blocks for the SHA256 engine, with the internal data-path and control-path wires.- • Written in synthesizable VHDL, the hash engine is a low resource, area-efficient implementation of the SHA256 hash algorithm. • The data input port is organized as a 32bit word write register, with flow control and begin/end signals. • The 256bit result register is organized as 8 x 32bit registers that can be read simultaneously. • This implementation is a conservative implementation of the approved FIPS-180-4 algorithm, with a fair compromise of resources, comprising of only 32 registers of 32bit words for the hash engine, with a single-cycle combinational logic for each algorithm step. • The SG_SHA256 is a basic cryptographic block, used by almost all encryption and digital signature schemes. Sg_sha_control This is the control path logic for the sg_sha256 fast engine. • It is a fully synchronous design, with all signals synchronous to the rising edge of the system clock. • The sequencer state machine controls the hash data path modules, generating addresses for the coefficients ROM, load/enable signals for the • message schedule, hash core and output registers circuit blocks, and control signals for the input padding logic. control path logic Sg_sha_hash_core This is the 256bit single-cycle hash core processing logic for each of the 64 block steps. • The combinational depth of this block is 8 layers of logic and adders. 256bit single-cycle hash core processing logic Sg_sha_Ki_rom Initial values for the hash result registers. • This module is modelled as a fixed value function. • It can be implemented as a local constant fixed value. Sg_sha_kt_rom This is the 64 words coefficients rom for the block hash core. • It is modelled as an asynchronous addressable ROM memory. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1718
  • 7. • Depending on the fabrication process and technology, this memory can be implemented as a OTP, a MUX, a fixed LUT or a combinational function. Sg_sha_msg_sch This is the message scheduler data path for the sha256 processor. Sg_sha_padding This is the byte padding datapath logic for the sha256 processor. • The padding of the last block is controlled by the byte lane selectors and the last words selectors. • These control signals are generated at the Control Logic block of the SHA256 processor. • A consistency check error signal is generated, to flag illegal control conditions. • This block is a fully combinational circuit, with 2 layers of logic. byte padding datapath logic Sg_sha_regs The regs block has the output result registers for the SHA256 processor. • It is a single-cycle 256bit Accumulator for the block hash results, and can be implemented as a 32bit MUX and a 32bit carry chain for each register. Registers for the SHA256 Conclusion The future cryptographic hash standard SHA-2 should be sensible and versatile for a broad assortment of usages, featuring meanwhile a perfect security quality. In this work, we showed an aggregate hardware depiction of the SHA-2. A round rescheduling technique and an exceptional reason memory design are moreover proposed. Post-mix comes to fruition , a low-control littler utilization of SHA-2 has been Implemented. I assume that a similar approach for littler VLSI use of cryptographic traditions is a productive choice to reduce the area and power use of the organized circuit. The foremost purpose of this proposition was to propose improvements in the SHA256 hashing estimation. This point was roused by the manner in which that various hardware based progressions in SHA256 gear executions have quite recently been suggested anyway they have been away for the SHA256 hashing estimation when all is said in done. In perspective of that, the fundamental responsibility made in this suggestion has been the SHA256 figuring headway proposition that are specific to decreasing hardware. The hypothesis in like manner made an undertaking to deal with establishment information and moreover the related information which would be required with a particular true objective to totally acknowledge what was being prescribed. A talk has moreover been impacted concerning the precision of the Savings To factor and the necessity for executing and taking a gander at as performed by off-the-rack SHA256 and the propelled version of SHA256 for a more exact assessment of this Factor. The prerequisite for an essential examination of the figuring improvements' closeness with existing gear headways has furthermore been discussed. It is assumed that the prescribed improvements will accomplish radical throughput overhauls in devices. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1719
  • 8. The extensive variety of achieved displays makes prepared for the utilization of the SHA-2 ability to various hardware use. Shutting remarks are, • 'Lightweight' is the rising star of cryptography. The term 'lightweight' alone covers a wide range, for instance, lightweight to the extent zone, speed, control usage, essentialness use, or a blend of these, dependent upon the specific application. • This ask about solely centers around the lightweight for zone, which in like manner realizes lightweight for typical power usage in numerous applications. • The usage of square memories is avoided for similitude on different stages. • We have been productive in accomplishing our target of most lessened entryway count, and even made sense of how to beat a segment of the starting late proposed lightweight hash fills in to the extent littleness and throughput. REFERENCES [1.] William Stallings, “Cryptography and Network Security,Principles and Practices” Fourth Edition, 2005. [2.] NIST “SECURE HASH STANDARD”, Federal Information Processing Standards Publication 180-1, August 1995. [3.] NIST “SECURE HASH STANDARD”, Federal Information Processing Standards Publication 180-2, August 2002. [4.] NIST “SECURE HASH STANDARD”, Federal Information Processing Standards Publication 180-3, August 2008. [5.] Harris E. Michail, Athanasios S. Milidonis, “A Top- Down Design Methodology for Ultrahigh-Performance Hashing Cores” IEEE transaction on Dependable and Secure computing, vol. 6, No. 4, October-December 2009. [6.] N. Skluvos, G. Dimitroulakos, and O. Koufopavlou,“An Ultra HighSpeed Architecture for VLSI Implementation of HashFunctions”, Electronics, Circuits and Systems, 2003. [7.] Marco Macchetti, Luigi Dadda, “Quasi-Pipelined Hash Circuits”, Proceedings of the 17th IEEE Symposium on Computer Arithmetic 2005. [8.] Robert P. McEvoy, Francis M. Crowe, Colin C. Murphy and William P. Marnane, “Optimisation of the SHA-2 Family of Hash Functions on FPGAs”, IEEE Computer Society Annual Symposium onEmerging VLSI Technologies and Architectures, 2006. [9.] Hoang Anh Tuan, Katsuhiro Yamazaki, Shigeru Oyanagi,“Three-stage Pipeline Implementation for SHA- 2 using data forwarding”, International Conference on Field Programmable Logic and Applications, 2008. [10.] Ricardo Chaves,Georgi Kuzmanov,Leonel Sousa,Stamatis Vassiliadis,“Cost-Efficient SHA Hardware Accelerators”, IEEE Transaction onVery Large Scale Integration (VLSI) systems, Vol. 16, No. 8, Aug 2008. [11.] M. McLoone, J. V. McCanny, “Efficient Single-Chip Implementation of SHA-384 &SHA-512”, IEEE International Conference on Field-Programmable Technology, 2002. [12.] Shay Gueron, Vlad Krasnov, “Parallelizing message schedules to accelerate the computations of hash functions”, Journal of Cryptographic Engineering, Volume 2, Issue 4, pp 241-253,November 2012. [14.] Chanjuan Li1, Qingguo Zhou2, Yuli Liu2, Qi, “Costefficient Data Cryptographic Engine Based on FPGA”,Fourth International Conference on Ubi-Media 2011. [13.] Helion Technology Ltd. (http://www.heliontech.com) International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 04 | Apr 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1720