Low-Complexity Low-Latency Architecture for Matching of Data Encoded With Hard Systematic Error-Correcting Codes
By Bhavya K V
M.Tech, 1st Sem
JSSATE
Overview
• Abstract
• Introduction
• Decode and compare architecture
• Encode and compare architecture
• Proposed architecture
• Evaluation
• Conclusion
Abstract
• A new architecture for matching data protected with an error-correcting code (ECC) is presented to reduce latency and complexity.
• The code word of an ECC is usually represented in a systematic form consisting of the raw data and the parity information generated by encoding.
• Exploiting this form, the proposed architecture parallelizes the comparison of the data and that of the parity information.
• To further reduce the latency and complexity, a new butterfly-formed weight accumulator (BWA) is proposed for efficient computation of the Hamming distance.
Introduction
• Data comparison is widely used in computing systems, for example in tag matching in a cache memory and in virtual-to-physical address translation in a translation lookaside buffer (TLB).
• The comparison circuit usually lies on the critical path of such components, so it must be designed to have as low a latency as possible; otherwise the components fail to serve their purpose and the performance of the whole system deteriorates severely.
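Decode and compare architecture
• Recent computers employ error-correcting codes (ECCs) to protect data and improve reliability.
• With ECC-protected data, the conventional approach decodes the retrieved code word before comparing it. This complicated decoding procedure must precede the data comparison, which lengthens the critical path and increases complexity.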
Encode and compare architecture
• This architecture encodes the incoming data and then compares it with the retrieved data, which is stored in encoded form.
• The method therefore eliminates the complex decoding from the critical path.
• In performing the comparison, the method does not examine whether the retrieved data is exactly the same as the incoming data. Instead, it checks whether the retrieved data lies within the error-correctable range of the code word corresponding to the incoming data.
Let d denote the Hamming distance between X, the code word of the incoming data, and Y, the retrieved code word, and let tmax and rmax denote the maximum numbers of correctable and detectable errors, respectively. The cases are summarized as follows.
1) If d = 0, X matches Y exactly.
2) If 0 < d <= tmax, X will match Y provided at most tmax errors in Y are corrected.
3) If tmax < d <= rmax, Y has detectable but uncorrectable errors. In this case, the cache may issue a system fault so that the central processing unit can take proper action.
4) If rmax < d, X does not match Y.
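The four cases can be sketched as a small classification routine. This is only an illustration in Python: the names MatchResult, hamming_distance, and classify are assumptions made for this sketch, and code words are modelled as plain integers rather than hardware signals.

```python
from enum import Enum


class MatchResult(Enum):
    """Hypothetical labels for the four cases listed above."""
    EXACT = "exact match"
    CORRECTABLE = "match after correcting Y"
    FAULT = "detectable but uncorrectable errors"
    MISMATCH = "no match"


def hamming_distance(x: int, y: int) -> int:
    """Number of bit positions in which x and y differ."""
    return bin(x ^ y).count("1")


def classify(x: int, y: int, t_max: int, r_max: int) -> MatchResult:
    """Map the distance d between code words x and y onto the four cases."""
    d = hamming_distance(x, y)
    if d == 0:
        return MatchResult.EXACT
    if d <= t_max:
        return MatchResult.CORRECTABLE
    if d <= r_max:
        return MatchResult.FAULT
    return MatchResult.MISMATCH
```

For example, with tmax = 1 and rmax = 2, a retrieved code word that differs from the incoming one in a single bit falls under case 2, while one that differs in three bits falls under case 4.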
• To compute the Hamming distance, X and Y are first XORed bit by bit to form a difference vector.
• Half adders (HAs) count the number of 1's in each pair of adjacent bits of the vector. These counts are then accumulated by passing them through a tree of saturating adders (SAs).
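A behavioural sketch of this counting scheme is given here, assuming that every saturating adder caps its output at rmax + 1 (any larger value leads to the same "no match" decision). The function names and the integer representation of code words are hypothetical; the real circuit is a gate-level tree.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) of two bits."""
    return a ^ b, a & b


def saturating_add(a: int, b: int, cap: int) -> int:
    """Add two partial counts but never report more than cap."""
    return min(a + b, cap)


def sa_tree_distance(x: int, y: int, n: int, r_max: int) -> int:
    """Hamming distance of two n-bit code words (n >= 1), saturated at r_max + 1."""
    cap = r_max + 1
    diff = x ^ y                                  # XOR bank: 1 where bits differ
    bits = [(diff >> i) & 1 for i in range(n)]
    if len(bits) % 2:
        bits.append(0)                            # pad so every HA has two inputs

    # HA stage: each half adder counts the 1's in one pair of adjacent bits.
    counts = []
    for i in range(0, len(bits), 2):
        s, c = half_adder(bits[i], bits[i + 1])
        counts.append(min(2 * c + s, cap))

    # SA tree: pairwise saturating accumulation until one count remains.
    while len(counts) > 1:
        merged = [saturating_add(counts[i], counts[i + 1], cap)
                  for i in range(0, len(counts) - 1, 2)]
        if len(counts) % 2:
            merged.append(counts[-1])             # odd element passes through
        counts = merged
    return counts[0]
```

Because the result is only compared against tmax and rmax, saturating at rmax + 1 loses no information needed for the decision.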
Disadvantages of encode and compare architecture
• The range check necessitates an additional circuit: a saturating adder (SA) block is included to calculate the Hamming distance.
• The architecture does not exploit the fact that a practical ECC code word is usually represented in a systematic form in which the data and parity parts are completely separated from each other.
• In addition, because each SA always forces its output to be no greater than rmax + 1, one more than the number of detectable errors, it adds to the complexity of the entire circuit.
Proposed architecture
• This paper presents a new architecture that can reduce the latency and complexity of the data comparison by using the characteristics of systematic codes.
A. Datapath Design for Systematic Codes
B. Architecture for Computing the Hamming Distance
C. General Expressions for the Complexity and the Latency
Datapath Design for Systematic Codes
• In the SA-based architecture, the comparison of two code words is invoked after the incoming tag is encoded. The critical path therefore consists of the encoding followed by the n-bit comparison, as shown in the figure.
• However, this method does not consider the fact that the ECC code word is in a systematic form in which the data and parity parts are completely separated, as shown in the figure.
• Because the data part of a systematic code word is exactly the same as the incoming tag field, it is immediately available for comparison, while the parity part becomes available only after the encoding is completed.
• The comparison of the k-bit tags can therefore start before the remaining (n - k)-bit comparison of the parity bits, in parallel with the encoding.
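The split can be illustrated with a toy systematic code. The sketch assumes a (7, 4) Hamming-style parity function as a stand-in encoder (toy_parity is hypothetical and is not one of the codes used in the evaluation). It shows that the distance over the whole code word is the sum of a data-part distance, which needs no encoding, and a parity-part distance, which needs the encoder output.

```python
def popcount(v: int) -> int:
    """Number of 1 bits in v."""
    return bin(v).count("1")


def toy_parity(data: int) -> int:
    """Hypothetical encoder: 3 parity bits for a 4-bit data word."""
    d1, d2, d3, d4 = ((data >> i) & 1 for i in (3, 2, 1, 0))
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return (p1 << 2) | (p2 << 1) | p3


def split_distance(incoming_tag: int, stored_data: int, stored_parity: int) -> int:
    """Distance computed on two paths that could run in parallel."""
    # Data path: the incoming tag is usable immediately, no encoding needed.
    d_data = popcount(incoming_tag ^ stored_data)
    # Parity path: must wait for the encoder before the XOR can start.
    d_parity = popcount(toy_parity(incoming_tag) ^ stored_parity)
    return d_data + d_parity


def full_distance(incoming_tag: int, stored_data: int, stored_parity: int) -> int:
    """Reference: encode first, then compare the whole 7-bit code word."""
    x = (incoming_tag << 3) | toy_parity(incoming_tag)
    y = (stored_data << 3) | stored_parity
    return popcount(x ^ y)
```

Both functions return the same value for every input; only the point at which the data-part comparison can begin differs.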
Proposed architecture optimized for systematic code words
Architecture for Computing the Hamming Distance
• The datapath contains multiple butterfly-formed weight accumulators (BWAs), proposed to improve the latency and complexity of the Hamming distance computation.
• The basic function of the BWA is to count the number of 1's among its input bits. It consists of multiple stages of HAs, as shown in Fig. (a).
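The exact butterfly wiring is given in the figure and is not reproduced in the text, so the following is only a behavioural model of counting 1's with half adders alone. It assumes that each HA output carries a weight: the sum bit keeps the weight of the HA's inputs and the carry bit has twice that weight. The name weight_accumulate and the column-by-column ordering are choices made for this sketch, not the circuit's actual structure.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) of two bits."""
    return a ^ b, a & b


def weight_accumulate(bits: list[int]) -> int:
    """Count the 1's in bits using only half adders (behavioural model)."""
    columns = {1: list(bits)}        # weight -> bits that carry this weight
    weight, total = 1, 0
    while columns.get(weight):
        column = columns[weight]
        # Pair up bits of equal weight until a single bit of this weight remains.
        while len(column) > 1:
            reduced = []
            for i in range(0, len(column) - 1, 2):
                s, c = half_adder(column[i], column[i + 1])
                reduced.append(s)                               # same weight
                columns.setdefault(2 * weight, []).append(c)    # doubled weight
            if len(column) % 2:
                reduced.append(column[-1])                      # unpaired bit
            column = reduced
        total += weight * column[0]
        weight *= 2
    return total
```

Each half adder preserves the weighted sum of its inputs (a + b = sum + 2 × carry), which is why the final weighted bits add up to the number of 1's.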
• Since what we need is not the precise Hamming distance but the range it belongs to, it is possible to simplify the circuit.
• In that case, several HAs can be replaced with a simple OR-gate tree, as shown in Fig. (b).
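One way to picture this simplification, as a sketch rather than the circuit in the figure: keep exact half-adder counting only for weights that are needed to tell values up to rmax apart, and collapse every bit of higher weight into a single overflow flag with an OR. The threshold choice (the smallest power of two above rmax) and the names coarse_count and exceeds are assumptions of this model.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) of two bits."""
    return a ^ b, a & b


def coarse_count(bits: list[int], r_max: int) -> tuple[int, bool]:
    """Return (low, overflow): exact low-weight count and an OR-ed overflow flag."""
    limit = 1
    while limit <= r_max:
        limit *= 2                   # smallest power of two greater than r_max
    columns = {1: list(bits)}
    weight, low, overflow = 1, 0, False
    while columns.get(weight):
        column = columns[weight]
        if weight >= limit:
            # Any set bit here already means "more than r_max": OR instead of HAs.
            overflow = overflow or any(column)
        else:
            while len(column) > 1:
                reduced = []
                for i in range(0, len(column) - 1, 2):
                    s, c = half_adder(column[i], column[i + 1])
                    reduced.append(s)
                    columns.setdefault(2 * weight, []).append(c)
                if len(column) % 2:
                    reduced.append(column[-1])
                column = reduced
            low += weight * column[0]
        weight *= 2
    return low, overflow


def exceeds(bits: list[int], r_max: int) -> bool:
    """True if the number of 1's in bits is greater than r_max."""
    low, overflow = coarse_count(bits, r_max)
    return overflow or low > r_max
```

When the overflow flag is clear, low is the exact count, so the comparison against tmax and rmax is unaffected by the simplification.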
General Expressions for the Complexity and the Latency
• The complexity of the proposed architecture, C, can be expressed in terms of CXOR, CENC, C2nd, CDU, and CBWA(n), which are the complexities of the XOR banks, an encoder, the second-level circuits, the decision unit, and a BWA for n inputs, respectively.
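Assuming one first-level BWA over the k data bits and another over the n - k parity bits (an assumption drawn from the datapath description, not a quotation of the original expression), a form consistent with the terms listed above would be the plain sum of the component complexities:

```latex
C = C_{\mathrm{XOR}} + C_{\mathrm{ENC}} + C_{\mathrm{2nd}} + C_{\mathrm{DU}}
    + C_{\mathrm{BWA}}(k) + C_{\mathrm{BWA}}(n-k)
```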
• The latency of the proposed architecture, L, can be expressed in terms of LXOR, LENC, L2nd, LDU, and LBWA(n), which are the latencies of an XOR bank, an encoder, the second-level circuits, the decision unit, and a BWA for n inputs, respectively.
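Since the data part and the parity part are compared in parallel, the slower of the two paths determines when the second level can start. Under the same assumption of one BWA for the k data bits and one for the n - k parity bits, a form consistent with the listed terms would be:

```latex
L = \max\bigl[\, L_{\mathrm{XOR}} + L_{\mathrm{BWA}}(k),\;
              L_{\mathrm{ENC}} + L_{\mathrm{XOR}} + L_{\mathrm{BWA}}(n-k) \,\bigr]
    + L_{\mathrm{2nd}} + L_{\mathrm{DU}}
```

Only the parity path includes the encoder latency, which is the saving over encoding first and then comparing all n bits.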
• Because one of the BWAs at the first level finishes earlier than the other, some components at the second level may start earlier.
• Similarly, some BWAs or the OR-gate tree at the second level may deliver their outputs to the decision unit early, so that the unit can begin its operation without waiting for all of its inputs.
• In such cases, L2nd and LDU can be partially hidden by the critical path of the preceding circuits, and L becomes shorter than the given expression.
Evaluation
For a set of four codes including the (31, 25) code, the table shows the latencies and hardware complexities resulting from three architectures:
1) the conventional decode-and-compare;
2) the SA-based direct compare; and
3) the proposed one.
• As shown in the table, the proposed architecture is effective in reducing the latency as well as the hardware complexity, even when practical factors are taken into account.
• The latencies of the SA-based architecture and the proposed one are dominated by SAs and HAs, respectively.
• As the bit-width doubles, at least one more stage of SAs or HAs needs to be added.
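The growth with bit-width can be checked with a short calculation. The sketch assumes an idealized balanced binary tree in which every stage halves the number of partial counts; tree_stages is a hypothetical helper, and a real circuit may need more stages than this lower bound.

```python
def tree_stages(n_inputs: int) -> int:
    """Stages of a balanced binary tree reducing n_inputs values to one.

    Equals ceil(log2(n_inputs)) for n_inputs >= 1, computed with integers only.
    """
    if n_inputs < 1:
        raise ValueError("n_inputs must be at least 1")
    return (n_inputs - 1).bit_length()


if __name__ == "__main__":
    for width in (8, 16, 32, 64):
        print(width, tree_stages(width))
```

In this model, doubling the number of inputs always adds exactly one stage, which matches the "at least one more stage" observation.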
Conclusion
• The proposed architecture examines whether the incoming data matches the stored data if a certain number of erroneous bits are corrected.
• To reduce the latency, the comparison of the data is parallelized with the encoding process that generates the parity information.
• Because the proposed architecture considerably reduces both the latency and the complexity, it can be regarded as a promising solution for the comparison of ECC-protected data.
• Though this brief focuses only on the tag match of a cache memory, the proposed method is applicable to other applications that need such comparison.
THANK YOU

