Proceedings of the International Conference on Emerging Trends in Engineering and Management (ICETEM14)
30 – 31, December 2014, Ernakulam, India
AES ENCRYPTION ENGINE FOR MANY CORE
PROCESSOR ARRAYS FOR ENHANCED SECURITY
Dhanya Pushkaran¹, Neethu Bhaskar²
¹ M.Tech, VLSI and Embedded System, ECE Department, SNGCE, Kolenchery
² Assistant Professor, ECE, SNGCE, Ernakulam, India
ABSTRACT
With the development of networking technology, hardware encryption will become an irreplaceable security
technology. This paper presents the design of a very high throughput AES processor with a 128-bit key on an FPGA.
To protect the encrypted data from power analysis, a high-throughput Advanced Encryption Standard (AES) engine
with a masked S-Box is proposed. To analyse the throughput, we map two implementations of the AES cipher with
online key expansion onto a fine-grained many-core system.
Keywords: Advanced Encryption Standard (AES), Differential Power Analysis (DPA), Field Programmable Gate Array
(FPGA), Fine-Grained, Many-Core, Parallel Processor
INTRODUCTION
With the development of information technology, protecting information through encryption has become very
important in day-to-day life. In 2001, the National Institute of Standards and Technology (NIST) selected the Rijndael
algorithm as the Advanced Encryption Standard (AES) [1], replacing the Data Encryption Standard (DES). AES has been
used in many applications, such as secure communication systems, digital video/audio recorders, RFID tags and smart
cards. One of the main advantages of the Rijndael algorithm is that it suits both hardware and software implementation.
To serve these applications, numerous hardware implementations of AES have been reported that achieve high
throughput, even though they are time consuming and costly. One of the main blocks of AES is the SubBytes
transformation [1], which typically uses an S-Box look-up table stored in memory. Data held in such storage is at risk of
information leakage in embedded applications. The differential power analysis (DPA) attack [2], which exploits the power
consumption of the device, has emerged as one of the most effective power analysis attacks, so protecting data from DPA
is very important. For that reason, a masked S-Box is implemented instead of an S-Box look-up table. We perform the
masked S-Box computation mainly over GF(2⁴). Therefore, we only need to transform the input values from GF(2⁸) to
GF(2⁴) and transform the output values back from GF(2⁴) to GF(2⁸), which reduces the hardware resources.
This paper presents two AES implementations with online key expansion on a fine-grained many-core system to
achieve high performance and high throughput per unit of chip area.
INTERNATIONAL JOURNAL OF ELECTRONICS AND
COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)
ISSN 0976 – 6464(Print)
ISSN 0976 – 6472(Online)
Volume 5, Issue 12, December (2014), pp. 106-111
© IAEME: http://www.iaeme.com/IJECET.asp
Journal Impact Factor (2014): 7.2836 (Calculated by GISI)
www.jifactor.com
AES ALGORITHM
AES is a key-iterated block cipher that applies several rounds of transformations to the state. It is a symmetric
encryption algorithm that, in this design, uses a 128-bit key to generate the output ciphertext. It operates on 128-bit data
blocks, and each block is treated as a 4-by-4 array of bytes called the state. The number of rounds in AES, Nr, is
determined by the key length; Nr = 10 for a 128-bit key.
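As a small illustration of the state layout, the sketch below loads a 16-byte block into a 4-by-4 array and back. The column-by-column fill order follows FIPS-197 [1]; the function names `bytes_to_state` and `state_to_bytes` are hypothetical and are not taken from the design described in this paper.

```python
def bytes_to_state(block):
    """Arrange a 16-byte block as a 4x4 state, filled column by column."""
    if len(block) != 16:
        raise ValueError("an AES block is exactly 16 bytes (128 bits)")
    # state[r][c] holds input byte r + 4*c, the ordering used in FIPS-197.
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def state_to_bytes(state):
    """Flatten a 4x4 state back into 16 bytes, reading column by column."""
    return bytes(state[r][c] for c in range(4) for r in range(4))
```

With this ordering, the first four input bytes form the first column of the state rather than the first row, which matters for ShiftRows and MixColumns.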
Fig 1: Block Diagram of AES Algorithm
Figure 1 shows the basic steps of the AES algorithm with online key expansion. The steps include:
1. SubBytes: A nonlinear byte transformation that replaces each input byte with the corresponding byte value from the
substitution box. The substitution box is explained in the MASKED S-BOX section.
2. ShiftRows: Each row of the state is cyclically shifted left by an amount given by its row number. The first row is not
shifted, the second row is shifted by 1 byte, and so on.
3. MixColumns: Each column of the state is considered as a polynomial with coefficients in GF(2⁸) and is multiplied
modulo x⁴+1 by the fixed polynomial given in equation (1). Note that x⁴+1 is not irreducible; the fixed polynomial
is chosen so that it is invertible modulo x⁴+1 [1].
A(x) = {03}x³ + {01}x² + {01}x + {02} (1)
4. AddRoundKey: A simple bitwise XOR of the state with the expanded round key. The key expansion consists of the
following steps:
1. KeySubWord: Each byte of the last key word is replaced with the value from the substitution box.
2. KeyRotWord: That word is cyclically shifted left by 1 byte.
3. KeyXor: The first row w[0] is XORed with the substituted and rotated word (and a round constant [1]); each
following row w[i] is XORed with the preceding new row w'[i-1] to form the new row w'[i].
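The round steps and one step of online key expansion can be sketched in software as follows. This is a minimal reference sketch, not the hardware design of this paper: the S-box is computed by brute-force inversion plus the FIPS-197 affine map, the round constant `rcon` is taken from FIPS-197 [1], and all function names are hypothetical. The state is indexed as `state[r][c]` and key words are lists of four bytes.

```python
def xtime(b):
    """Multiply a GF(2^8) element by x modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b


def gmul(a, b):
    """Shift-and-add multiplication in GF(2^8)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a = xtime(a)
        b >>= 1
    return product


def sbox(b):
    """Reference S-box: field inverse (0 maps to 0), then the FIPS-197 affine map."""
    inv = next((c for c in range(1, 256) if gmul(b, c) == 1), 0)
    out = 0
    for i in range(8):
        bit = 0x63 >> i
        for shift in (0, 4, 5, 6, 7):
            bit ^= inv >> ((i + shift) % 8)
        out |= (bit & 1) << i
    return out


def shift_rows(state):
    """Rotate row r of the state left by r bytes."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def mix_column(col):
    """Multiply one column [s0, s1, s2, s3] by A(x) modulo x^4 + 1."""
    s0, s1, s2, s3 = col
    return [
        gmul(2, s0) ^ gmul(3, s1) ^ s2 ^ s3,
        s0 ^ gmul(2, s1) ^ gmul(3, s2) ^ s3,
        s0 ^ s1 ^ gmul(2, s2) ^ gmul(3, s3),
        gmul(3, s0) ^ s1 ^ s2 ^ gmul(2, s3),
    ]


def add_round_key(state, round_key):
    """XOR the state with a round key laid out in the same 4x4 shape."""
    return [[s ^ k for s, k in zip(srow, krow)]
            for srow, krow in zip(state, round_key)]


def next_round_key(words, rcon):
    """Derive the next four key words from the current four (128-bit key)."""
    temp = words[3][1:] + words[3][:1]   # KeyRotWord on the last word
    temp = [sbox(b) for b in temp]       # KeySubWord
    temp[0] ^= rcon                      # round constant (FIPS-197)
    new_words = []
    for w in words:                      # KeyXor chain: w'[i] = w[i] ^ w'[i-1]
        temp = [a ^ b for a, b in zip(w, temp)]
        new_words.append(temp)
    return new_words
```

Because each round key depends only on the previous one, `next_round_key` can run alongside the data path one round at a time, which is the idea behind online key expansion. KeySubWord and KeyRotWord commute, so their order does not affect the result.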
Proceedings of the International Conference on Emerging Trends in Engineering and Management (ICETEM14)
30 – 31, December 2014, Ernakulam, India
108
MASKED S-BOX
In the SubBytes transformation, each byte is replaced with a value from the S-Box. Since a byte has only 256
possible values, the S-Box can be implemented as a look-up table, which reduces power and time consumption. However,
this makes the design vulnerable to differential power analysis (DPA) attacks [3] [4].
The S-Box can instead be computed using Galois field arithmetic to counter DPA attacks. It is obtained by taking
the multiplicative inverse of the input and then applying the affine transformation. Calculating the multiplicative inverse
directly in GF(2⁸) is very expensive, so the masked S-Box calculates the multiplicative inverse in GF(2⁸) using
arithmetic in GF(2⁴). The input byte is mapped to two elements of GF(2⁴), the multiplicative inverse is computed using
GF(2⁴) operations, and the two resulting elements are then mapped back to GF(2⁸). Figure 2 shows the steps of the
masked S-Box.
Multiplicative inverse
For hardware implementation, a far better suited representation is to view the field GF(2⁸) as a quadratic
extension of the field GF(2⁴). In this case, an element a ∈ GF(2⁸) is represented as a linear polynomial with coefficients
in GF(2⁴):
map(a) = ah x + al, a ∈ GF(2⁸); ah, al ∈ GF(2⁴)
For hardware implementation, the bit-level equations for map(a) are shown in equation 2, where ai denotes bit i of a.
ah x + al = map(a), ah, al ∈ GF(2⁴), a ∈ GF(2⁸) (2)
aA = a1 ⊕ a7, aB = a5 ⊕ a7, aC = a4 ⊕ a6
al0 = aC ⊕ a0 ⊕ a5, al1 = a1 ⊕ a2
al2 = aA, al3 = a2 ⊕ a4
ah0 = aC ⊕ a5, ah1 = aA ⊕ aC
ah2 = aB ⊕ a2 ⊕ a3, ah3 = aB
Fig 2: Block diagram of masked S-Box
After the multiplicative inverse has been computed using GF(2⁴) arithmetic, the two-term polynomial ah x + al is
converted back to an element of GF(2⁸). The equations for map⁻¹ are shown in equation 3.
map⁻¹(ah x + al) = a, ah, al ∈ GF(2⁴), a ∈ GF(2⁸) (3)
aA = al1 ⊕ ah3, aB = ah0 ⊕ ah1
a0 = al0 ⊕ ah0, a1 = aB ⊕ ah3
a2 = aA ⊕ aB, a3 = aB ⊕ al1 ⊕ ah2
a4 = aA ⊕ aB ⊕ al3, a5 = aB ⊕ al2
a6 = aA ⊕ al2 ⊕ al3 ⊕ ah0, a7 = aB ⊕ al2 ⊕ ah3
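The two mappings can be checked against each other in software. The sketch below transcribes equations 2 and 3 bit by bit, taking aA = al1 ⊕ ah3 and aB = ah0 ⊕ ah1 in equation 3, and assuming that bit i of a value is the coefficient with index i (bit 0 is the least significant bit). The function names are hypothetical.

```python
def bits(value, n):
    """Return the n low-order bits of value, least significant first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bit_list):
    """Pack a list of bits (least significant first) into an integer."""
    return sum(b << i for i, b in enumerate(bit_list))


def map_to_gf16(a):
    """Equation 2: map a byte in GF(2^8) to the pair (ah, al) in GF(2^4)."""
    a0, a1, a2, a3, a4, a5, a6, a7 = bits(a, 8)
    aA, aB, aC = a1 ^ a7, a5 ^ a7, a4 ^ a6
    al = [aC ^ a0 ^ a5, a1 ^ a2, aA, a2 ^ a4]
    ah = [aC ^ a5, aA ^ aC, aB ^ a2 ^ a3, aB]
    return from_bits(ah), from_bits(al)


def map_from_gf16(ah, al):
    """Equation 3: map the pair (ah, al) back to a byte in GF(2^8)."""
    ah0, ah1, ah2, ah3 = bits(ah, 4)
    al0, al1, al2, al3 = bits(al, 4)
    aA, aB = al1 ^ ah3, ah0 ^ ah1
    a = [
        al0 ^ ah0,
        aB ^ ah3,
        aA ^ aB,
        aB ^ al1 ^ ah2,
        aA ^ aB ^ al3,
        aB ^ al2,
        aA ^ al2 ^ al3 ^ ah0,
        aB ^ al2 ^ ah3,
    ]
    return from_bits(a)
```

Both functions use only XOR, so each is a linear map over the bits. Looping over all 256 byte values and comparing `map_from_gf16(*map_to_gf16(a))` with `a` is a quick way to confirm that the two sets of equations undo each other.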
Multiplication in GF(2⁴) corresponds to multiplication of polynomials modulo an irreducible polynomial of degree 4. The
irreducible polynomial is
m₄(x) = x⁴ + x + 1.
For hardware implementation, the bit-level multiplication is given in equation 4.
q(x) = a(x) · b(x) mod m₄(x), a(x), b(x), q(x) ∈ GF(2⁴) (4)
aA = a0 ⊕ a3, aB = a2 ⊕ a3
q0 = a0b0 ⊕ a3b1 ⊕ a2b2 ⊕ a1b3
q1 = a1b0 ⊕ aAb1 ⊕ aBb2 ⊕ (a1 ⊕ a2)b3
q2 = a2b0 ⊕ a1b1 ⊕ aAb2 ⊕ aBb3
q3 = a3b0 ⊕ a2b1 ⊕ a1b2 ⊕ aAb3
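To make equation 4 concrete, the sketch below implements the bit-level formulas next to a plain shift-and-add multiplier reduced by x⁴ + x + 1, so that the two can be compared. Products of bits are AND operations, the last term of q1 is read as (a1 ⊕ a2)b3, and bit i is assumed to be the coefficient of xⁱ. Function names are hypothetical.

```python
def gf16_mul_bits(a, b):
    """Equation 4: GF(2^4) multiplication written as AND/XOR on single bits."""
    a0, a1, a2, a3 = [(a >> i) & 1 for i in range(4)]
    b0, b1, b2, b3 = [(b >> i) & 1 for i in range(4)]
    aA, aB = a0 ^ a3, a2 ^ a3
    q0 = (a0 & b0) ^ (a3 & b1) ^ (a2 & b2) ^ (a1 & b3)
    q1 = (a1 & b0) ^ (aA & b1) ^ (aB & b2) ^ ((a1 ^ a2) & b3)
    q2 = (a2 & b0) ^ (a1 & b1) ^ (aA & b2) ^ (aB & b3)
    q3 = (a3 & b0) ^ (a2 & b1) ^ (a1 & b2) ^ (aA & b3)
    return q0 | (q1 << 1) | (q2 << 2) | (q3 << 3)


def gf16_mul_ref(a, b):
    """Reference: shift-and-add multiplication modulo x^4 + x + 1 (0b10011)."""
    product = 0
    for _ in range(4):
        if b & 1:
            product ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:      # degree reached 4: subtract the modulus
            a ^= 0x13
    return product
```

The bit-level form needs only 2-input AND gates and XOR gates, which is why it maps well to hardware, while the reference form is easier to check by hand.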
The multiplicative inverse can be found using the extended Euclidean algorithm. It can be derived by solving the equation
a(x) · a⁻¹(x) mod m₄(x) = 1. The solution is shown in equation 5.
q(x) = a(x)⁻¹ mod m₄(x), q(x), a(x) ∈ GF(2⁴) (5)
aA = a1 ⊕ a2 ⊕ a3 ⊕ a1a2a3
q0 = aA ⊕ a0 ⊕ a0a2 ⊕ a1a2 ⊕ a0a1a2
q1 = a0a1 ⊕ a0a2 ⊕ a1a2 ⊕ a3 ⊕ a1a3 ⊕ a0a1a3
q2 = a0a1 ⊕ a2 ⊕ a0a2 ⊕ a3 ⊕ a0a3 ⊕ a0a2a3
q3 = aA ⊕ a0a3 ⊕ a1a3 ⊕ a2a3
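Equation 5 can be checked by multiplying every nonzero element by its computed inverse. The sketch below transcribes the bit-level formulas and uses a simple shift-and-add multiplier modulo x⁴ + x + 1 as the reference. As before, bit i is assumed to be the coefficient of xⁱ, and the function names are hypothetical.

```python
def gf16_inv_bits(a):
    """Equation 5: GF(2^4) inverse written as AND/XOR on single bits."""
    a0, a1, a2, a3 = [(a >> i) & 1 for i in range(4)]
    aA = a1 ^ a2 ^ a3 ^ (a1 & a2 & a3)
    q0 = aA ^ a0 ^ (a0 & a2) ^ (a1 & a2) ^ (a0 & a1 & a2)
    q1 = (a0 & a1) ^ (a0 & a2) ^ (a1 & a2) ^ a3 ^ (a1 & a3) ^ (a0 & a1 & a3)
    q2 = (a0 & a1) ^ a2 ^ (a0 & a2) ^ a3 ^ (a0 & a3) ^ (a0 & a2 & a3)
    q3 = aA ^ (a0 & a3) ^ (a1 & a3) ^ (a2 & a3)
    return q0 | (q1 << 1) | (q2 << 2) | (q3 << 3)


def gf16_mul(a, b):
    """Shift-and-add multiplication modulo x^4 + x + 1 (0b10011)."""
    product = 0
    for _ in range(4):
        if b & 1:
            product ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return product
```

For example, the element x (value 2) has inverse x³ + 1 (value 9), since x · (x³ + 1) = x⁴ + x, which reduces to 1 modulo x⁴ + x + 1. The formulas also map 0 to 0, the convention the AES S-Box uses for the element without an inverse.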
Affine Transformation
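The affine transformation is given by a' = M · a ⊕ v,
where M is the fixed 8×8 bit matrix and v is the constant vector specified in FIPS-197 [1]. With bit 0 as the least
significant bit, v = x⁶ + x⁵ + x + 1 ({63}), which is why bits 0, 1, 5 and 6 are complemented in the forward
transformation. The equations for hardware implementation are given in equation 6.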
q = aff_trans(a)               q = aff_trans⁻¹(a)          (6)
aA = a0 ⊕ a1                   aA = a0 ⊕ a5
aB = a2 ⊕ a3                   aB = a1 ⊕ a4
aC = a4 ⊕ a5                   aC = a2 ⊕ a7
aD = a6 ⊕ a7                   aD = a3 ⊕ a6
q0 = ā0 ⊕ aC ⊕ aD              q0 = ā5 ⊕ aC
q1 = ā5 ⊕ aA ⊕ aD              q1 = a0 ⊕ aD
q2 = a2 ⊕ aA ⊕ aD              q2 = ā7 ⊕ aB
q3 = a7 ⊕ aA ⊕ aB              q3 = a2 ⊕ aA
q4 = a4 ⊕ aA ⊕ aB              q4 = a1 ⊕ aD
q5 = ā1 ⊕ aB ⊕ aC              q5 = a4 ⊕ aC
q6 = ā6 ⊕ aB ⊕ aC              q6 = a3 ⊕ aA
q7 = a3 ⊕ aC ⊕ aD              q7 = a6 ⊕ aB
Here āi denotes the complement of bit ai.
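The affine transformation and its inverse can be sketched with the shared terms aA to aD. The forward equations below are written so that they reproduce the FIPS-197 affine map with constant {63}; in particular q1 uses the complement of a5 and q4 = a4 ⊕ aA ⊕ aB, which is what that map requires. A complemented bit is written as an XOR with 1, and the function names are hypothetical.

```python
def _bits8(a):
    """Return the eight bits of a byte, least significant first."""
    return [(a >> i) & 1 for i in range(8)]


def _pack(bit_list):
    """Pack bits (least significant first) into an integer."""
    return sum(b << i for i, b in enumerate(bit_list))


def affine(a):
    """Forward affine transformation of the AES S-Box (constant {63})."""
    a0, a1, a2, a3, a4, a5, a6, a7 = _bits8(a)
    aA, aB, aC, aD = a0 ^ a1, a2 ^ a3, a4 ^ a5, a6 ^ a7
    return _pack([
        a0 ^ 1 ^ aC ^ aD,   # q0
        a5 ^ 1 ^ aA ^ aD,   # q1
        a2 ^ aA ^ aD,       # q2
        a7 ^ aA ^ aB,       # q3
        a4 ^ aA ^ aB,       # q4
        a1 ^ 1 ^ aB ^ aC,   # q5
        a6 ^ 1 ^ aB ^ aC,   # q6
        a3 ^ aC ^ aD,       # q7
    ])


def affine_inv(a):
    """Inverse affine transformation (constant {05})."""
    a0, a1, a2, a3, a4, a5, a6, a7 = _bits8(a)
    aA, aB, aC, aD = a0 ^ a5, a1 ^ a4, a2 ^ a7, a3 ^ a6
    return _pack([
        a5 ^ 1 ^ aC,        # q0
        a0 ^ aD,            # q1
        a7 ^ 1 ^ aB,        # q2
        a2 ^ aA,            # q3
        a1 ^ aD,            # q4
        a4 ^ aC,            # q5
        a3 ^ aA,            # q6
        a6 ^ aB,            # q7
    ])
```

As a check against FIPS-197, the inverse of {53} in GF(2⁸) is {ca}, and applying the forward transformation to {ca} gives the S-Box output {ed}.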
FINE GRAINED MANY CORE ARCHITECTURE
The performance of a processor core is roughly proportional to the square root of its complexity (Pollack's rule).
Making a single core more complex therefore gives diminishing returns: doubling its logic area yields only about a 1.4
times gain in performance. A many-core architecture spends the same area on many simple cores instead of a single
complicated core, which increases the aggregate performance for workloads that can be parallelised.
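A rough numeric illustration of this trade-off is sketched below. It assumes that performance scales exactly as the square root of core area and that the workload parallelises perfectly across cores; both are idealisations, and the area units and function names are hypothetical rather than measurements from this design.

```python
import math


def single_core_performance(area):
    """Relative performance of one core under the square-root rule."""
    return math.sqrt(area)


def many_core_performance(total_area, cores):
    """Aggregate performance when the same area is split evenly across cores."""
    return cores * math.sqrt(total_area / cores)
```

Under these assumptions, 16 area units spent on one core give a relative performance of 4, while the same area split into 16 unit-size cores gives an aggregate of 16. Real designs fall short of this because of communication overhead and serial portions of the workload.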
AES IMPLEMENTATION
This paper presents two different AES implementations with online key expansion, and the throughput of each
design is measured.
One task one processor (OTOP)
Each step in the AES algorithm is considered as a task, as shown in the dataflow diagram in figure 3. Each task is
mapped onto one processor of the many-core array, so we call this implementation One Task One Processor. A single
iteration requires about 10 cores, and after the first iteration completes, the same cores are reused for the following
iterations.
Figure 3: OTOP dataflow diagram
Loop unrolled nine times
To enhance the throughput, a second design is implemented as shown in figure 4. Here each iteration of the loop is
executed by its own set of cores. Unrolling the loop nine times removes the need to reuse the same cores for successive
iterations, so the design can work on multiple data blocks at once. About 60 cores are required to implement this design.
Figure 4: Loop unrolled nine times dataflow diagram
RESULT
I have implemented the proposed design in a hardware description language, synthesized it using Xilinx ISE 14.1
and ported it to a Spartan-6 LX45 FPGA. Table 1 shows the throughput obtained from the two designs. The loop
unrolled nine times design achieves roughly 43 times the throughput of the one task one processor design.
Table 1: Throughput of the two implementations
Implementation              Throughput
One Task One Processor      1.98 Gbps
Loop Unrolled Nine Times    85.15 Gbps
CONCLUSION
Securing data at rest and enhancing throughput are important factors for large data transformation systems, so
modern systems shift data encryption from a software platform to a hardware platform. However, hardware-based
encryption still faces the possibility of DPA attacks. To address this, an AES engine with a masked S-Box has been
proposed to resist DPA attacks with acceptable area on an FPGA. The proposed masked S-Box maps the input values
from GF(2⁸) to GF(2⁴) once at the beginning of the operation and maps the result back from GF(2⁴) to GF(2⁸) once at
the end of the operation, which reduces area resources by about 20%.
ACKNOWLEDGMENT
I would like to express my heartfelt gratitude to my guide, Ms. Neethu Bhaskar, Assistant Professor, Dept. of
Electronics and Communication Engineering, SNGCE Kadayiruppu, under whose guidance I could complete the thesis
work to the level I had planned, and for her regular reviews and suggestions. It gives me great pleasure to thank her for
the conviction she brought to selecting the topic of work, and for the technical and literary guidance she imparted
through the different stages of its execution.
REFERENCES
[1] Advanced Encryption Standard (AES), FIPS-197, Nat. Inst. of Standards and Technol., 2001.
[2] P. Kocher, J. Jaffe, and B. Jun, “Differential power analysis,” in Proc. CRYPTO, 1999, vol. LNCS 1666,
pp. 388–397.
[3] L. Goubin and J. Patarin, “DES and differential power analysis (the ‘duplication’ method),” in Proc. CHES
LNCS, 1999, vol. 1717, pp. 158–172.
[4] S. Messerges, “Securing the AES finalists against power analysis attacks,” in Proc. FSE LNCS, 2000, vol. 1978,
pp. 150–164.
[5] S.K. Mathew, F. Sheikh, M. Kounavis, S. Gueron, A. Agarwal, S.K. Hsu, H. Kaul, M.A. Anders, and R.K.
Krishnamurthy, “53 Gbps Native GF((2⁴)²) Composite-Field AES-Encrypt/Decrypt Accelerator for
Content-Protection in 45 nm High-Performance Microprocessors,” IEEE J. Solid-State Circuits, vol. 46, no. 4,
pp. 767-776, Apr. 2011.
[6] A. Hodjat and I. Verbauwhede, “A 21.54 gbits/s Fully Pipelined AES Processor on FPGA,” Proc. IEEE 12th
Ann. Symp. Field-Programmable Custom Computing Machines, pp. 308-309, Apr. 2004.
[7] C.-J. Chang, C.-W. Huang, K.-H. Chang, Y.-C. Chen, and C.-C. Hsieh, “High Throughput 32-Bit AES
Implementation in FPGA,” Proc. IEEE Asia Pacific Conf. Circuits and Systems, pp. 1806-1809, Nov. 2008.
[8] M. McLoone and J. V. McCanny, “Rijndael FPGA implementations utilizing look-up tables,” in Proc. IEEE
Workshop Signal Process. Syst., Antwerp, Belgium, 2001, pp. 349–360.
[9] V. Rijmen, “Efficient Implementation of the Rijndael S-Box,” Dept. ESAT., Katholieke Universiteit Leuven,
Leuven, Belgium, 2006. [Online] Available: http://www.networkdls.com/Articles/sbox.pdf
[10] A. Hodjat and I. Verbauwhede, “A 21.54 Gbits/s fully pipelined processor on FPGA,” in Proc. IEEE 12th Annu.
Symp. Field-Programm. Custom Comput. Mach., 2004, pp. 308–309.
[11] S. Mangard, N. Pramstaller, and E. Oswald, “Successfully attacking masked AES hardware implementations,”
in Proc. CHES LNCS, 2005, vol. 3659, pp. 157–171.
[12] E. Oswald, S. Mangard, N. Pramstaller, and V. Rijmen, “A side-channel analysis resistant description of the
AES S-box,” in Proc. FSE LNCS, Setubal, Portugal, 2005, vol. 3557, pp. 413–423.
[13] H. Kim, S. Hong, and J. Lim, “A fast and provably secure higher-order masking of AES S-box,” in Proc. CHES
LNCS, Nara, Japan, 2011, vol. 6917, pp. 95–107. For masked AES implementation,” in Proc. IEEE 54th Int.
MWSCAS, Seoul, Korea, 2011, pp. 1–4.
[14] M. Alam, S. Ghosh, M. J. Mohan, D. Mukhopadhyay, D. R. Chowdhury, and I. S. Gupta, “Effect of glitches
against masked AES S-box implementation and countermeasure,” IET Inf. Security, vol. 3, no. 1, pp. 34–44,
Feb. 2009.
[15] E. Trichina, T. Korkishko, and K. H. Lee, “Small size, low power, side channel-immune AES coprocessor:
Design and synthesis results,” in Proc. AES LNCS, 2005, vol. 3373, pp. 113–127.
[16] S. K. Mathew, F. Sheikh, M. Kounavis, S. Gueron, A. Agarwal, S. K. Hsu, H. Kaul, M. A. Anders and R. K.
Krishnamurthy, “53 Gbps native GF(2⁴)² composite-field AES-encrypt/decrypt accelerator for
content-protection in 45 nm high-performance microprocessors,” IEE.
[17] Anubhav Gupta and Harish Bansal, “Design of Area Optimized AES Encryption Core using Pipelining
Technology”, International Journal of Electronics and Communication Engineering & Technology (IJECET),
Volume 4, Issue 2, 2013, pp. 308 - 314, ISSN Print: 0976- 6464, ISSN Online: 0976 –6472.

More Related Content

PDF
A Cryptographic Hardware Revolution in Communication Systems using Verilog HDL
PDF
Al04605265270
PDF
EXTENDED K-MAP FOR MINIMIZING MULTIPLE OUTPUT LOGIC CIRCUITS
PPTX
Design of High Performance 8,16,32-bit Vedic Multipliers using SCL PDK 180nm ...
PDF
High Speed Memory Efficient Multiplier-less 1-D 9/7 Wavelet Filters Based NED...
PDF
International Journal of Engineering Research and Development (IJERD)
PDF
Paper id 37201520
PDF
A Cryptographic Hardware Revolution in Communication Systems using Verilog HDL
Al04605265270
EXTENDED K-MAP FOR MINIMIZING MULTIPLE OUTPUT LOGIC CIRCUITS
Design of High Performance 8,16,32-bit Vedic Multipliers using SCL PDK 180nm ...
High Speed Memory Efficient Multiplier-less 1-D 9/7 Wavelet Filters Based NED...
International Journal of Engineering Research and Development (IJERD)
Paper id 37201520

What's hot (19)

PPT
Ieee project reversible logic gates by_amit
PDF
A low power adder using reversible logic gates
PPTX
Seminar9
PDF
Reversible code converter
PDF
High Speed VLSI Architecture for AES-Galois/Counter Mode
PDF
Analysis of different multiplication algorithm and FPGA implementation of rec...
PDF
FPGA Implementation of A New Chien Search Block for Reed-Solomon Codes RS (25...
PDF
Implementation of Stronger S-Box for Advanced Encryption Standard
PDF
Algorithms explained
PDF
SFQ MULTIPLIER
PPTX
Computer Science Programming Assignment Help
PPTX
Vedic multiplier
PDF
F044062933
PPT
32-bit unsigned multiplier by using CSLA & CLAA
PDF
JOURNAL PAPER
PPT
KRUSKAL'S algorithm from chaitra
PDF
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
PPTX
Minimal spanning tree class 15
PDF
Dijkstra's Algorithm
Ieee project reversible logic gates by_amit
A low power adder using reversible logic gates
Seminar9
Reversible code converter
High Speed VLSI Architecture for AES-Galois/Counter Mode
Analysis of different multiplication algorithm and FPGA implementation of rec...
FPGA Implementation of A New Chien Search Block for Reed-Solomon Codes RS (25...
Implementation of Stronger S-Box for Advanced Encryption Standard
Algorithms explained
SFQ MULTIPLIER
Computer Science Programming Assignment Help
Vedic multiplier
F044062933
32-bit unsigned multiplier by using CSLA & CLAA
JOURNAL PAPER
KRUSKAL'S algorithm from chaitra
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
Minimal spanning tree class 15
Dijkstra's Algorithm
Ad

Viewers also liked (11)

PDF
Korean muslim
PDF
quest diagnostics pledge_external
PPTX
Jagueryand
PDF
Certification_1852219
PDF
aislacion electrica en motores
PDF
Cadillac Fairview Office Vacancy - May 2016
PDF
certificate_How to Build a Content Program From an Area Template
PDF
Badanie przyjazności serwisów względem wyszukiwarek
PDF
Tantra chinois1 (1)
PPT
Wi max
Korean muslim
quest diagnostics pledge_external
Jagueryand
Certification_1852219
aislacion electrica en motores
Cadillac Fairview Office Vacancy - May 2016
certificate_How to Build a Content Program From an Area Template
Badanie przyjazności serwisów względem wyszukiwarek
Tantra chinois1 (1)
Wi max
Ad

Similar to Aes encryption engine for many core processor arrays for enhanced security (20)

PDF
Hardware implementation of aes encryption and decryption for low area & power...
PDF
PDF
Design and Implementation A different Architectures of mixcolumn in FPGA
PDF
FPGA Implementation of SubByte & Inverse SubByte for AES Algorithm
PPTX
A HIGH THROUGHPUT AES DESIGN
PDF
FPGA Implementation of Mix and Inverse Mix Column for AES Algorithm
PDF
Ijmsr 2016-05
PDF
A High Throughput CFA AES S-Box with Error Correction Capability
PPT
Chiffremtn asymetriqye AES Introduction.ppt
PDF
CFA based SBOX and Modified Mixcolumn Implementation of 8 Bit Datapath for AES
PDF
Aes128 bit project_report
PDF
E04612529
PPTX
A Hybrid Approach to Advanced ES Design.pptx
PPT
AES (2).ppt
PPTX
Fault Detection AES
PDF
High throughput FPGA Implementation of Advanced Encryption Standard Algorithm
PDF
Design and Implementation of Area Efficiency AES Algoritham with FPGA and ASIC
PDF
An Efficient FPGA Implementation of the Advanced Encryption Standard Algorithm
PDF
Design of advanced encryption standard using Vedic Mathematics
PDF
Hardware implementation of aes encryption and decryption for low area & power...
Design and Implementation A different Architectures of mixcolumn in FPGA
FPGA Implementation of SubByte & Inverse SubByte for AES Algorithm
A HIGH THROUGHPUT AES DESIGN
FPGA Implementation of Mix and Inverse Mix Column for AES Algorithm
Ijmsr 2016-05
A High Throughput CFA AES S-Box with Error Correction Capability
Chiffremtn asymetriqye AES Introduction.ppt
CFA based SBOX and Modified Mixcolumn Implementation of 8 Bit Datapath for AES
Aes128 bit project_report
E04612529
A Hybrid Approach to Advanced ES Design.pptx
AES (2).ppt
Fault Detection AES
High throughput FPGA Implementation of Advanced Encryption Standard Algorithm
Design and Implementation of Area Efficiency AES Algoritham with FPGA and ASIC
An Efficient FPGA Implementation of the Advanced Encryption Standard Algorithm
Design of advanced encryption standard using Vedic Mathematics

More from IAEME Publication (20)

PDF
IAEME_Publication_Call_for_Paper_September_2022.pdf
PDF
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
PDF
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
PDF
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
PDF
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
PDF
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
PDF
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
PDF
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
PDF
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
PDF
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
PDF
GANDHI ON NON-VIOLENT POLICE
PDF
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
PDF
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
PDF
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
PDF
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
PDF
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
PDF
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
PDF
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
PDF
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
PDF
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
IAEME_Publication_Call_for_Paper_September_2022.pdf
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
GANDHI ON NON-VIOLENT POLICE
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT

Recently uploaded (20)

PDF
Machine learning based COVID-19 study performance prediction
PDF
Encapsulation theory and applications.pdf
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PDF
Review of recent advances in non-invasive hemoglobin estimation
PDF
Empathic Computing: Creating Shared Understanding
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
cuic standard and advanced reporting.pdf
PPTX
Cloud computing and distributed systems.
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PPTX
Spectroscopy.pptx food analysis technology
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PPTX
sap open course for s4hana steps from ECC to s4
PDF
NewMind AI Weekly Chronicles - August'25 Week I
Machine learning based COVID-19 study performance prediction
Encapsulation theory and applications.pdf
Mobile App Security Testing_ A Comprehensive Guide.pdf
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
The Rise and Fall of 3GPP – Time for a Sabbatical?
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Review of recent advances in non-invasive hemoglobin estimation
Empathic Computing: Creating Shared Understanding
Advanced methodologies resolving dimensionality complications for autism neur...
cuic standard and advanced reporting.pdf
Cloud computing and distributed systems.
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Chapter 3 Spatial Domain Image Processing.pdf
Dropbox Q2 2025 Financial Results & Investor Presentation
“AI and Expert System Decision Support & Business Intelligence Systems”
Digital-Transformation-Roadmap-for-Companies.pptx
Spectroscopy.pptx food analysis technology
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
sap open course for s4hana steps from ECC to s4
NewMind AI Weekly Chronicles - August'25 Week I

Aes encryption engine for many core processor arrays for enhanced security

  • 1. Proceedings of the International Conference on Emerging Trends in Engineering and Management (ICETEM14) 30 – 31, December 2014, Ernakulam, India 106 AES ENCRYPTION ENGINE FOR MANY CORE PROCESSOR ARRAYS FOR ENHANCED SECURITY Dhanya Pushkaran¹, Neethu Bhaskar² 1 M.Tech, VLSI and Embedded system, ECE Department, SNGCE, Kolenchery 2 Assistant Professor, ECE, SNGCE, Ernakulam, India, ABSTRACT With the development of networking technology, Hardware encryption technology will become an irreplaceable safety technology become an irreplaceable safety technology. In this paper, I present the design of a very high throughput AES processor with 128 bit key on an FPGA. In order to protect the encrypted data from Power Analysis, High- throughput advanced encryption standard (AES) engine with masked S-Box is proposed. In order to analyse the throughput, we map 2 implementations of an Advanced Encryption Standard (AES) cipher with online key expansion on a fine-grained many-core system. Keywords: Advanced Encryption Standard (AES), Differential Power Analysis (DPA), Field Programmable Gate Array (FPGA), Fine-Grained, Many-Core, Parallel Processor INTRODUCTION With the development of information technology, protection of information through encryption is very important in day to day life. In 2001, national institute of standard and technology replaces the data encryption standard and select the Rijndael algorithm as the advanced encryption standard (AES) [1]. AES has been used in many applications, such as secure communication system, digital video/audio recorder, RFID tags and smart cards etc. One of the main advantage of Rijndael algorithm is that it can be used for both hardware and software implementation. To satisfy many application numerous hardware implementation of AES has been reported to achieve high throughput even though time consuming and costly. 
One of the main block of AES is the SubByte transformation [1] which uses S-box look-up table that is stored in memory. This data stored in storage are under the risk of information leakage in embedded applications. The differential power analysis (DPA) attack[2] was further developed as one of the most promising power analysis attacks which is related to the power consumption. So the protection of data from DPA is very important. For that instead of using S-Box lookup table masked S-Box is being implemented. We perform the masked S-Box mainly over GF(2⁴). Therefore, we only need to transform the input values from GF(2⁸) to GF(2⁴) and transform the output values back from GF(2⁴) to GF(2⁸) which reduces the hardware resources. This paper present the online expansion of two type AES implementation on a fine grained many core system to achieve high performance and throughput per unit of chip. INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET) ISSN 0976 – 6464(Print) ISSN 0976 – 6472(Online) Volume 5, Issue 12, December (2014), pp. 106-111 © IAEME: http://www.iaeme.com/IJECET.asp Journal Impact Factor (2014): 7.2836 (Calculated by GISI) www.jifactor.com IJECET © I A E M E
  • 2. Proceedings of the International Conference on Emerging Trends in Engineering and Management (ICETEM14) 30 – 31, December 2014, Ernakulam, India 107 AES ALGORITHM AES is a key iterated block cipher that contains several round of transformation on the state. It is a symmetric encryption algorithm uses 128 bit key to generate output cipher text. It takes 128 bits of data block and each 128-bit data block is considered as a 4-by-4 array of bytes, called the state. The number of iteration in the AES, Nr, is defined by the length of the round key, which are 10 for key lengths of 128 bits. Fig 1: Block Diagram of AES Algorithm The figure 1 shows the basic steps of AES algorithm with online key expansion. The steps include: 1. SubBytes: Nonlinear bite transformation which replace each input byte with the byte value from the substitution box. Substitution box is explained in section 2. ShiftRow: Each row of the state is left shifted according to the row number. First row no shifting is done, for 2nd row 1byte shifting is done and so on. 3. MixColumn: Each column of the array is considered as a polynomial over GF(2⁸) and modular multiplication is done with irreducible polynomial x⁴+1. The resulting polynomial is then multiplied with a fixed polynomial given in equation (1). A(x) = {03}x³+ {01}x²+{01}x+{02} (1) 4. AddRoundKey: Simple bitwise XOR operation of the state with the key expanded value is done. The key expansion is done by the following steps: 1. KeySubWord: Each byte of the key value is replaced with the values from the substitution box. 2. KeyRotWord: Each row is done a 1 byte shifting to the left. 3. KeyXor: Each row w[i] is XORed with the previous row w[i-1] to form a new row w'[i].
  • 3. Proceedings of the International Conference on Emerging Trends in Engineering and Management (ICETEM14) 30 – 31, December 2014, Ernakulam, India 108 MASKED S-BOX In SubByte transformation, each byte is replaced with a value from S-Box. Since there are only 256 representation of 1 byte, a lookup table of S-Box can be implemented. So the power and time consumption is reduced. But this result in differential power analysis (DPA) attach [3] [4]. So here S-Box using galois field can be implemented to avoid DPA attach. It can be implemented by taking the multiplicative inverse and apply the affine transformation. But calculating the multiplicative inverse in GF(2⁸) is very expensive. So masked S-Box is implemented that calculates multiplicative inverse of GF(2⁸) using GF(2⁴). The input byte is mapped to two elements of GF(2⁴) and then find out the multiplicative inverse using GF(2⁴). After that the two elemnts inverse mapping to GF(2⁸) is done. Figure 2 shows the steps to find out the masked s-box. Multiplicative inverse For hardware implementation far better suited representation is to see field GF(2ˆ8) as a quadratic extension of the field GF(2ˆ4). In this case, an element a є GF(2ˆ8) is represented as the linear polynomial with coefficient in GF(2ˆ4) Map(a)= aһ x + al, a є GF (2ˆ8); ah, al є GF(2⁴) For hardware implementation, the equation for map (a) is shown in equation 2. ah x + al = map (a), ah, al є GF(2⁴), a є GF(2⁸) (2) aA = a1⊕ a7, aB= a5 ⊕ a7, aC= a4 ⊕ a6 al0= ac ⊕ a0 ⊕ a5, al1= a1 ⊕ a2, al2= aA, al3= a2 ⊕ a4 ah0= ac ⊕ a5, ah1= aA ⊕ aC, ah2= aB ⊕ a2 ⊕ a3, ah3= aB Fig 2: Block diagram of masked S-Box After finding out the multiplicative inverse in GF (2⁴), two term polynomial ah x + al converted back to element in GF(2⁸). The equation for map¯¹ is shown in equation 3.
  • 4. map⁻¹(ah x + al) = a, ah, al ∈ GF(2⁴), a ∈ GF(2⁸) (3)
aA = al1 ⊕ ah3, aB = ah0 ⊕ ah1
a0 = al0 ⊕ ah0
a1 = aB ⊕ ah3
a2 = aA ⊕ aB
a3 = aB ⊕ al1 ⊕ ah2
a4 = aA ⊕ aB ⊕ al3
a5 = aB ⊕ al2
a6 = aA ⊕ al2 ⊕ al3 ⊕ ah0
a7 = aB ⊕ al2 ⊕ ah3

Multiplication in GF(2⁴) corresponds to multiplication of polynomials modulo an irreducible polynomial of degree 4. The irreducible polynomial is m₄(x) = x⁴ + x + 1. For hardware implementation, the bit-level multiplication is given in equation (4).

q(x) = a(x) · b(x) mod m₄(x), a(x), b(x), q(x) ∈ GF(2⁴) (4)
aA = a0 ⊕ a3, aB = a2 ⊕ a3
q0 = a0b0 ⊕ a3b1 ⊕ a2b2 ⊕ a1b3
q1 = a1b0 ⊕ aAb1 ⊕ aBb2 ⊕ (a1 ⊕ a2)b3
q2 = a2b0 ⊕ a1b1 ⊕ aAb2 ⊕ aBb3
q3 = a3b0 ⊕ a2b1 ⊕ a1b2 ⊕ aAb3

The multiplicative inverse can be found using the extended Euclidean algorithm. It can be derived by solving the equation a(x) · a⁻¹(x) mod m₄(x) = 1. The solution is shown in equation (5).

q(x) = a(x)⁻¹ mod m₄(x), q(x), a(x) ∈ GF(2⁴) (5)
aA = a1 ⊕ a2 ⊕ a3 ⊕ a1a2a3
q0 = aA ⊕ a0 ⊕ a0a2 ⊕ a1a2 ⊕ a0a1a2
q1 = a0a1 ⊕ a0a2 ⊕ a1a2 ⊕ a3 ⊕ a1a3 ⊕ a0a1a3
q2 = a0a1 ⊕ a2 ⊕ a0a2 ⊕ a3 ⊕ a0a3 ⊕ a0a2a3
q3 = aA ⊕ a0a3 ⊕ a1a3 ⊕ a2a3

Affine Transformation
The affine transformation is given by q(x) = m(x) · a(x) ⊕ v(x) mod (x⁸ + 1), where m(x) = x⁴ + x³ + x² + x + 1 and v(x) = x⁶ + x⁵ + x + 1, as defined in [1]. The equations for hardware implementation of the affine transformation and its inverse are given in equation (6), where ā denotes the complement of bit a.

q = aff_trans(a) (6)
aA = a0 ⊕ a1, aB = a2 ⊕ a3, aC = a4 ⊕ a5, aD = a6 ⊕ a7
q0 = ā0 ⊕ aC ⊕ aD
q1 = ā5 ⊕ aA ⊕ aD
q2 = a2 ⊕ aA ⊕ aD
q3 = a7 ⊕ aA ⊕ aB
q4 = a4 ⊕ aA ⊕ aB
q5 = ā1 ⊕ aB ⊕ aC
q6 = ā6 ⊕ aB ⊕ aC
q7 = a3 ⊕ aC ⊕ aD

q = aff_trans⁻¹(a) (6)
aA = a0 ⊕ a5, aB = a1 ⊕ a4, aC = a2 ⊕ a7, aD = a3 ⊕ a6
q0 = ā5 ⊕ aC
q1 = a0 ⊕ aD
q2 = ā7 ⊕ aB
q3 = a2 ⊕ aA
q4 = a1 ⊕ aD
q5 = a4 ⊕ aC
q6 = a3 ⊕ aA
q7 = a6 ⊕ aB

FINE GRAINED MANY CORE ARCHITECTURE
The performance of an architecture is roughly proportional to the square root of its complexity.
A simpler core therefore gives up less performance than it saves in complexity, although using many such cores may increase the total logic area. Instead of a single complicated core, a many-core architecture uses many simple cores, which increases the overall performance.

  • 5. AES IMPLEMENTATION
In this paper I present two different AES implementations with online key expansion, and the throughput of each design is measured.

One task one processor (OTOP)
Each step in the AES algorithm is considered as a task, as shown in the dataflow diagram in figure 3. Each task is mapped onto one processor of the many-core array, so this implementation is called One Task One Processor. A single iteration requires about 10 cores, and after the first iteration completes, the same cores are reused for the following iterations.

Figure 3: OTOP dataflow diagram

Loop unrolled nine times
To enhance the throughput, a new design is implemented as shown in figure 4. Here each iteration of the loop is executed by a separate set of cores. Unrolling the loop nine times therefore breaks the data dependency between iterations and allows the design to work on multiple data blocks at once. About 60 cores are required to implement this design.

Figure 4: Loop unrolled nine times dataflow diagram

RESULT
I have implemented the proposed design in a hardware description language, synthesized it using Xilinx ISE 14.1, and ported the design to a Spartan-6 LX45 FPGA. Table 1 shows the throughput obtained from the two designs. From this table it is clear that the loop unrolled nine times design is much faster than the one task one processor design.

Table 1
Implementation              Throughput
One Task One Processor      1.98 Gbps
Loop Unrolled Nine Times    85.15 Gbps
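The gap between the two designs can be made concrete with a short calculation. The Python sketch below is only a worked example on the two throughput figures reported above; it assumes that 1 Gbps means 10⁹ bits per second and that every block is 128 bits, and the function and variable names are hypothetical.

```python
BLOCK_BITS = 128  # AES block size in bits

def blocks_per_second(throughput_gbps):
    """Convert a throughput in Gbps (assumed 10**9 bits/s) to AES blocks/s."""
    return throughput_gbps * 1e9 / BLOCK_BITS

otop_gbps = 1.98        # One Task One Processor, reported throughput
unrolled_gbps = 85.15   # Loop Unrolled Nine Times, reported throughput

speedup = unrolled_gbps / otop_gbps
print(f"speedup: {speedup:.1f}x")
print(f"OTOP: {blocks_per_second(otop_gbps):.3g} blocks/s")
print(f"unrolled: {blocks_per_second(unrolled_gbps):.3g} blocks/s")
```

With these figures the unrolled design is about 43 times faster than the OTOP design, which corresponds to roughly 6.65 × 10⁸ blocks per second against roughly 1.55 × 10⁷.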
  • 6. CONCLUSION
Securing “data-at-rest” and enhancing the throughput are important factors for large data transformation systems, so modern systems shift data encryption from a software platform to a hardware platform. However, hardware-based encryption still faces the possibility of DPA attacks. For this case, an AES with a masked S-Box has been proposed to resist DPA attacks with acceptable area on an FPGA. The proposed masked S-Box maps the input values from GF(2⁸) to GF(2⁴) once at the beginning of the operation and maps the result back from GF(2⁴) to GF(2⁸) once at the end of the operation, which reduces the area resources by about 20%.

ACKNOWLEDGMENT
I would like to express my heartfelt gratitude and thanks to my beloved guide Ms. Neethu Bhaskar, Assistant Professor, Dept. of Electronics and Communication Engineering, SNGCE Kadayiruppu, with whose guidance I could complete the thesis work to the level I had planned, and for her regular reviews and suggestions. It gives me great pleasure to thank her for the conviction she brought into selecting the topic of work, and for the technical and literary guidance she imparted through the different stages of its execution.

REFERENCES
[1] Advanced Encryption Standard (AES), FIPS-197, Nat. Inst. of Standards and Technol., 2001.
[2] P. Kocher, J. Jaffe, and B. Jun, “Differential power analysis,” in Proc. CRYPTO, 1999, vol. LNCS 1666, pp. 388–397.
[3] L. Goubin and J. Patarin, “DES and differential power analysis (the ‘duplication’ method),” in Proc. CHES LNCS, 1999, vol. 1717, pp. 158–172.
[4] S. Messerges, “Securing the AES finalists against power analysis attacks,” in Proc. FSE LNCS, 2000, vol. 1978, pp. 150–164.
[5] S. K. Mathew, F. Sheikh, M. Kounavis, S. Gueron, A. Agarwal, S. K. Hsu, H. Kaul, M. A. Anders, and R. K. Krishnamurthy, “53 gbps Native GF(2⁴)² Composite-Field AES-Encrypt/Decrypt Accelerator for Content-Protection in 45 nm High-Performance Microprocessors,” IEEE J. Solid-State Circuits, vol. 46, no. 4, pp. 767–776, Apr. 2011.
[6] A. Hodjat and I. Verbauwhede, “A 21.54 gbits/s Fully Pipelined AES Processor on FPGA,” Proc. IEEE 12th Ann. Symp. Field-Programmable Custom Computing Machines, pp. 308–309, Apr. 2004.
[7] C.-J. Chang, C.-W. Huang, K.-H. Chang, Y.-C. Chen, and C.-C. Hsieh, “High Throughput 32-Bit AES Implementation in FPGA,” Proc. IEEE Asia Pacific Conf. Circuits and Systems, pp. 1806–1809, Nov. 2008.
[8] M. McLoone and J. V. McCanny, “Rijndael FPGA implementations utilizing look-up tables,” in Proc. IEEE Workshop Signal Process. Syst., Antwerp, Belgium, 2001, pp. 349–360.
[9] V. Rijmen, “Efficient Implementation of the Rijndael S-Box,” Dept. ESAT., Katholieke Universiteit Leuven, Leuven, Belgium, 2006. [Online] Available: http://www.networkdls.com/Articles/sbox.pdf
[10] A. Hodjat and I. Verbauwhede, “A 21.54 Gbits/s fully pipelined processor on FPGA,” in Proc. IEEE 12th Annu. Symp. Field-Programm. Custom Comput. Mach., 2004, pp. 308–309.
[11] S. Mangard, N. Pramstaller, and E. Oswald, “Successfully attacking masked AES hardware implementations,” in Proc. CHES LNCS, 2005, vol. 3659, pp. 157–171.
[12] E. Oswald, S. Mangard, N. Pramstaller, and V. Rijmen, “A side-channel analysis resistant description of the AES S-box,” in Proc. FSE LNCS, Setubal, Portugal, 2005, vol. 3557, pp. 413–423.
[13] H. Kim, S. Hong, and J. Lim, “A fast and provably secure higher-order masking of AES S-box,” in Proc. CHES LNCS, Nara, Japan, 2011, vol. 6917, pp. 95–107. For masked AES implementation,” in Proc. IEEE 54th Int. MWSCAS, Seoul, Korea, 2011, pp. 1–4.
[14] M. Alam, S. Ghosh, M. J. Mohan, D. Mukhopadhyay, D. R. Chowdhury, and I. S. Gupta, “Effect of glitches against masked AES S-box implementation and countermeasure,” IET Inf. Security, vol. 3, no. 1, pp. 34–44, Feb. 2009.
[15] E. Trichina, T. Korkishko, and K. H. Lee, “Small size, low power, side channel-immune AES coprocessor: Design and synthesis results,” in Proc. AES LNCS, 2005, vol. 3373, pp. 113–127.
[16] S. K. Mathew, F. Sheikh, M. Kounavis, S. Gueron, A. Agarwal, S. K. Hsu, H. Kaul, M. A. Anders and R. K. Krishnamurthy, “53 Gbps native GF(2⁴)² composite-field AES-encrypt/decrypt accelerator for content-protection in 45 nm high-performance microprocessors,” IEE.
[17] Anubhav Gupta and Harish Bansal, “Design of Area Optimized AES Encryption Core using Pipelining Technology”, International Journal of Electronics and Communication Engineering & Technology (IJECET), Volume 4, Issue 2, 2013, pp. 308–314, ISSN Print: 0976-6464, ISSN Online: 0976-6472.