IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 20, NO. 2, FEBRUARY 2012 371 
(MAD) algorithm introduced in [1], [2] is used for coefficient quantization.
The subfilter is based on a canonical signed digit (CSD) structure,
and carry-save adders are used. Tables III, IV, and V show the area,
power, and critical path delay results, synthesized by Design Compiler
[10] in a 45-nm technology.
VI. CONCLUSION 
In this paper, we have presented new parallel FIR filter structures
that are beneficial to symmetric convolutions when the number of
taps is a multiple of 2 or 3. Multipliers account for the major portion of
the hardware consumption in a parallel FIR filter implementation. The
proposed new structures exploit the nature of even symmetric coefficients
and save a significant number of multipliers at the expense of additional
adders. Since multipliers outweigh adders in hardware cost, it is
profitable to exchange multipliers for adders. Moreover, the number
of additional adders stays constant as the length of the FIR filter
grows, whereas the number of multipliers saved increases with
the filter length. Consequently, the longer the FIR filter,
the more hardware the proposed structures save compared with the existing FFA
structures. Overall, in this paper, we
have provided new parallel FIR structures consisting of advantageous
polyphase decompositions that handle symmetric convolutions
better than the existing FFA structures in terms of hardware
consumption.
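The trade described above (fewer multipliers in exchange for a few more adders) rests on even symmetry, h[k] = h[N-1-k]: two mirrored input samples can be added first and then share a single multiplication. The Python sketch below illustrates only that basic folding idea for a single-rate filter. It is not the parallel FFA structure of the paper, and the function names are hypothetical.

```python
def sample(x, i):
    """x[i] for 0 <= i < len(x), else 0 (zero initial conditions)."""
    return x[i] if 0 <= i < len(x) else 0


def fir_direct(h, x):
    """Direct form: len(h) multiplications per output sample."""
    return [sum(h[k] * sample(x, n - k) for k in range(len(h))) for n in range(len(x))]


def fir_symmetric(h, x):
    """Even-symmetric form (h[k] == h[N-1-k]): pre-add mirrored samples, then multiply.

    Uses ceil(N/2) multiplications and floor(N/2) extra pre-additions per output.
    """
    N = len(h)
    if any(h[k] != h[N - 1 - k] for k in range(N)):
        raise ValueError("coefficients are not even symmetric")
    y = []
    for n in range(len(x)):
        acc = sum(h[k] * (sample(x, n - k) + sample(x, n - (N - 1 - k)))
                  for k in range(N // 2))
        if N % 2:  # odd length: the centre tap has no mirror partner
            acc += h[N // 2] * sample(x, n - N // 2)
        y.append(acc)
    return y


def multiplier_counts(N):
    """(direct-form, symmetric-form) multiplications per output sample."""
    return N, (N + 1) // 2
```

For a 6-tap symmetric filter the folded form needs 3 multiplications per output instead of 6, at the cost of 3 pre-additions.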
REFERENCES 
[1] D. A. Parker and K. K. Parhi, “Low-area/power parallel FIR digital 
filter implementations,” J. VLSI Signal Process. Syst., vol. 17, no. 1, 
pp. 75–92, 1997. 
[2] J. G. Chung and K. K. Parhi, “Frequency-spectrum-based low-area 
low-power parallel FIR filter design,” EURASIP J. Appl. Signal 
Process., vol. 2002, no. 9, pp. 444–453, 2002. 
[3] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation.
New York: Wiley, 1999.
[4] Z.-J. Mou and P. Duhamel, “Short-length FIR filters and their use in 
fast nonrecursive filtering,” IEEE Trans. Signal Process., vol. 39, no. 
6, pp. 1322–1332, Jun. 1991. 
[5] J. I. Acha, “Computational structures for fast implementation of L-path
and L-block digital filters,” IEEE Trans. Circuits Syst., vol. 36, no. 6, pp.
805–812, Jun. 1989.
[6] C. Cheng and K. K. Parhi, “Hardware efficient fast parallel FIR filter 
structures based on iterated short convolution,” IEEE Trans. Circuits 
Syst. I, Reg. Papers, vol. 51, no. 8, pp. 1492–1500, Aug. 2004. 
[7] C. Cheng and K. K. Parhi, “Further complexity reduction of parallel
FIR filters,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS 2005),
Kobe, Japan, May 2005.
[8] C. Cheng and K. K. Parhi, “Low-cost parallel FIR structures with 
2-stage parallelism,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 
54, no. 2, pp. 280–290, Feb. 2007. 
[9] I.-S. Lin and S. K. Mitra, “Overlapped block digital filtering,” IEEE 
Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 43, no. 8, 
pp. 586–596, Aug. 1996. 
[10] “Design Compiler User Guide,” ver. B-2008.09, Synopsys Inc., Sep. 
2008. 
Low-Power and Area-Efficient Carry Select Adder 
B. Ramkumar and Harish M Kittur 
Abstract—Carry Select Adder (CSLA) is one of the fastest adders used
in many data-processing processors to perform fast arithmetic functions.
From the structure of the CSLA, it is clear that there is scope for reducing
the area and power consumption in the CSLA. This work uses a simple and
efficient gate-level modification to significantly reduce the area and power
of the CSLA. Based on this modification, 8-, 16-, 32-, and 64-b square-root
CSLA (SQRT CSLA) architectures have been developed and compared with
the regular SQRT CSLA architecture. The proposed design has reduced
area and power compared with the regular SQRT CSLA, with only a
slight increase in delay. This work evaluates the performance of the
proposed designs in terms of delay, area, power, and their products, by
hand with logical effort and through custom design and layout in a 0.18-μm
CMOS process technology. The analysis of the results shows that the proposed
CSLA structure is better than the regular SQRT CSLA.
Index Terms—Application-specific integrated circuit (ASIC), area-efficient,
CSLA, low power.
I. INTRODUCTION 
The design of area- and power-efficient high-speed data path logic systems
is one of the most substantial areas of research in VLSI system
design. In digital adders, the speed of addition is limited by the time 
required to propagate a carry through the adder. The sum for each bit 
position in an elementary adder is generated sequentially only after the 
previous bit position has been summed and a carry propagated into the 
next position. 
The CSLA is used in many computational systems to alleviate the
problem of carry propagation delay by independently generating multiple
carries and then selecting a carry to generate the sum [1]. However,
the CSLA is not area efficient because it uses multiple pairs of Ripple
Carry Adders (RCA) to generate the partial sum and carry by considering
carry inputs Cin = 0 and Cin = 1; the final sum and carry are then
selected by multiplexers (mux).
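As a behavioural illustration of the carry-select principle (not a gate-level model of any design in this brief), the sketch below computes each block's sum twice, once assuming Cin = 0 and once assuming Cin = 1, and lets the incoming carry pick one result. The group widths passed to the hypothetical `carry_select_add` helper are arbitrary; here every block uses the select structure, whereas a practical SQRT CSLA typically starts with a plain RCA.

```python
def ripple_add(a, b, cin, width):
    """Ripple-carry addition of two width-bit operands; returns (sum, carry_out)."""
    total, carry = 0, cin
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total, carry


def carry_select_block(a, b, cin, width):
    """Both candidates are computed in parallel; the incoming carry only drives the mux."""
    candidate0 = ripple_add(a, b, 0, width)  # RCA assuming Cin = 0
    candidate1 = ripple_add(a, b, 1, width)  # RCA assuming Cin = 1
    return candidate1 if cin else candidate0


def carry_select_add(a, b, group_widths):
    """Chain carry-select blocks from LSB to MSB; returns (sum, carry_out)."""
    result, carry, shift = 0, 0, 0
    for w in group_widths:
        mask = (1 << w) - 1
        s, carry = carry_select_block((a >> shift) & mask, (b >> shift) & mask, carry, w)
        result |= s << shift
        shift += w
    return result, carry
```

An exhaustive run over all 8-bit operand pairs with widths (2, 3, 3) confirms that the selected result always equals a + b.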
The basic idea of this work is to use a Binary to Excess-1 Converter
(BEC) instead of the RCA with Cin = 1 in the regular CSLA to achieve
lower area and power consumption [2]–[4]. The main advantage of this
BEC logic comes from its smaller number of logic gates compared with the n-bit
Full Adder (FA) structure. The details of the BEC logic are discussed
in Section III.
This brief is structured as follows. Section II deals with the delay 
and area evaluation methodology of the basic adder blocks. Section III 
presents the detailed structure and the function of the BEC logic. The 
SQRT CSLA has been chosen for comparison with the proposed design
as it has a more balanced delay and requires lower power and
area [5], [6]. The delay and area evaluation methodologies of the regular
and modified SQRT CSLA are presented in Sections IV and V, respectively.
The ASIC implementation details and results are analyzed in 
Section VI. Finally, the work is concluded in Section VII. 
Manuscript received May 12, 2010; revised October 28, 2010; accepted December
15, 2010. Date of publication January 24, 2011; date of current version 
January 18, 2012. 
The authors are with the School of Electronics Engineering, VIT University, 
Vellore 632 014, India (e-mail: ramkumar.b@vit.ac.in; kittur@vit.ac.in). 
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVLSI.2010.2101621 
1063-8210/$26.00 © 2011 IEEE
Fig. 1. Delay and Area evaluation of an XOR gate. 
Fig. 2. 4-b BEC. 
Fig. 3. 4-b BEC with 8:4 mux. 
II. DELAY AND AREA EVALUATION METHODOLOGY OF THE BASIC 
ADDER BLOCKS 
The AND, OR, and Inverter (AOI) implementation of an XOR gate is
shown in Fig. 1. The gates between the dotted lines operate
in parallel, and the number shown on each gate indicates
the delay contributed by that gate. The delay and area evaluation
methodology considers all gates to be made up of AND, OR, and Inverter
gates, each having a delay of 1 unit and an area of 1 unit. We
then add up the number of gates in the longest path of a logic block,
which determines the maximum delay. The area evaluation is done by
counting the total number of AOI gates required for each logic block.
Based on this approach, the CSLA adder blocks of 2:1 mux, Half Adder
(HA), and FA are evaluated and listed in Table I.
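The counting rule above can be mechanised. The sketch below describes each block as a list of AND, OR, and NOT gates in topological order and applies the stated unit delay and unit area per gate. The AOI structures chosen for the XOR, 2:1 mux, HA, and FA are assumptions (sum-of-products forms, with the FA carry written as (a AND b) OR (cin AND (a XOR b))), and the helper names are hypothetical; Table I of the original remains the authoritative source.

```python
def evaluate(gates, inputs):
    """Unit-gate model: each (output, op, operands) gate adds 1 area unit and 1 delay unit.

    `gates` must be in topological order; returns (longest-path delay, gate count).
    """
    arrival = {name: 0 for name in inputs}
    for out, _op, operands in gates:
        arrival[out] = max(arrival[x] for x in operands) + 1
    return max(arrival[out] for out, _op, _ in gates), len(gates)


def xor_aoi(a, b, out):
    """XOR as (a AND NOT b) OR (NOT a AND b)."""
    return [
        (out + "_na", "NOT", (a,)),
        (out + "_nb", "NOT", (b,)),
        (out + "_t1", "AND", (a, out + "_nb")),
        (out + "_t2", "AND", (out + "_na", b)),
        (out, "OR", (out + "_t1", out + "_t2")),
    ]


def mux_aoi(d0, d1, sel, out):
    """2:1 mux as (d0 AND NOT sel) OR (d1 AND sel)."""
    return [
        (out + "_ns", "NOT", (sel,)),
        (out + "_t0", "AND", (d0, out + "_ns")),
        (out + "_t1", "AND", (d1, sel)),
        (out, "OR", (out + "_t0", out + "_t1")),
    ]


BLOCKS = {
    "XOR": (xor_aoi("a", "b", "y"), ["a", "b"]),
    "2:1 mux": (mux_aoi("d0", "d1", "s", "y"), ["d0", "d1", "s"]),
    "HA": (xor_aoi("a", "b", "sum") + [("carry", "AND", ("a", "b"))], ["a", "b"]),
    "FA": (xor_aoi("a", "b", "p") + xor_aoi("p", "cin", "sum")
           + [("g", "AND", ("a", "b")), ("pc", "AND", ("p", "cin")),
              ("cout", "OR", ("g", "pc"))],
           ["a", "b", "cin"]),
}

for name, (gates, inputs) in BLOCKS.items():
    delay, area = evaluate(gates, inputs)
    print(f"{name:8s} delay={delay} area={area}")
```

With these structures the script reports (delay, area) of (3, 5) for the XOR, (3, 4) for the 2:1 mux, (3, 6) for the HA, and (6, 13) for the FA.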
TABLE I 
DELAY AND AREA COUNT OF THE BASIC BLOCKS OF CSLA 
TABLE II 
FUNCTION TABLE OF THE 4-b BEC 
III. BEC 
As stated above, the main idea of this work is to use the BEC instead of
the RCA with Cin = 1 in order to reduce the area and power consumption
of the regular CSLA. To replace the n-bit RCA, an (n + 1)-bit BEC
is required. The structure and the function table of a 4-b BEC are shown
in Fig. 2 and Table II, respectively.
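Why the BEC needs one more bit than the RCA it replaces can be checked exhaustively: the Cin = 1 result a + b + 1 must be derived from the Cin = 0 result, which consists of n sum bits plus a carry-out. The sketch below models the BEC behaviourally as "add one, modulo 2^width"; the function names are illustrative only.

```python
def bec(value, width):
    """Binary to Excess-1 Converter on `width` bits: adds one, wrapping modulo 2**width."""
    return (value + 1) & ((1 << width) - 1)


def bec_replaces_cin1_rca(n, bec_width):
    """True if a bec_width-bit BEC applied to the Cin = 0 result of an n-bit RCA
    (its n sum bits plus the carry-out) always equals the Cin = 1 result a + b + 1."""
    mask = (1 << bec_width) - 1
    return all(
        bec((a + b) & mask, bec_width) == a + b + 1
        for a in range(1 << n)
        for b in range(1 << n)
    )


for n in range(1, 6):
    print(n, bec_replaces_cin1_rca(n, n + 1), bec_replaces_cin1_rca(n, n))
```

For n = 1 to 5 it prints True for an (n + 1)-bit BEC and False for an n-bit one, because an n-bit converter cannot produce the carry-out when a + b + 1 overflows n bits.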
Fig. 3 illustrates how the basic function of the CSLA is obtained by
using the 4-bit BEC together with the mux. One input of the 8:4 mux
receives (B3, B2, B1, and B0), and the other input of the mux is the
BEC output. This produces the two possible partial results in parallel,
and the mux selects either the BEC output or the direct inputs
according to the control signal Cin. The importance of the BEC logic
stems from the large silicon area reduction when CSLAs with a large
number of bits are designed. The Boolean expressions of the 4-bit BEC
are listed below (note the functional symbols ~ NOT, & AND, ^ XOR):
X0 = ~B0
X1 = B0 ^ B1
X2 = B2 ^ (B0 & B1)
X3 = B3 ^ (B0 & B1 & B2)
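The BEC and the 8:4 mux of Fig. 3 can be modelled bit by bit. The sketch assumes the usual increment equations X0 = ~B0, X1 = B0 ^ B1, X2 = B2 ^ (B0 & B1), and X3 = B3 ^ (B0 & B1 & B2), and assumes that Cin = 1 selects the BEC output (the BEC stands in for the Cin = 1 RCA). It prints a 16-row function table that can be compared with Table II; the helper names are hypothetical.

```python
def bec4(b3, b2, b1, b0):
    """4-bit Binary to Excess-1 Converter written as gate equations."""
    x0 = b0 ^ 1               # X0 = ~B0
    x1 = b0 ^ b1              # X1 = B0 ^ B1
    x2 = b2 ^ (b0 & b1)       # X2 = B2 ^ (B0 & B1)
    x3 = b3 ^ (b0 & b1 & b2)  # X3 = B3 ^ (B0 & B1 & B2)
    return (x3, x2, x1, x0)


def mux_8to4(direct, converted, cin):
    """8:4 mux built from four 2:1 muxes sharing the select line Cin."""
    return tuple(c if cin else d for d, c in zip(direct, converted))


def to_bits(v):
    """Integer 0..15 -> (B3, B2, B1, B0)."""
    return tuple((v >> i) & 1 for i in (3, 2, 1, 0))


def from_bits(bits):
    """(B3, B2, B1, B0) -> integer."""
    return sum(bit << i for bit, i in zip(bits, (3, 2, 1, 0)))


# Function table: B3..B0 -> X3..X0 (excess-1, with 1111 wrapping to 0000).
for v in range(16):
    b = to_bits(v)
    print("".join(map(str, b)), "->", "".join(map(str, bec4(*b))))
```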
IV. DELAY AND AREA EVALUATION METHODOLOGY OF REGULAR 
16-B SQRT CSLA 
The structure of the 16-b regular SQRT CSLA is shown in Fig. 4. It
has five groups of RCAs of different sizes. The delay and area evaluation of
each group are shown in Fig. 5, in which the numerals within [] specify
the delay values, e.g., sum2 requires 10 gate delays. The steps leading 
to the evaluation are as follows. 
1) The group2 [see Fig. 5(a)] has two sets of 2-b RCA. Based on
the consideration of delay values of Table I, the arrival time of
selection input    of 6:3 mux is earlier than   
 and later than   Thus,    is summation of 
 and    and
is summation of  and 
mux.
Fig. 4. Regular 16-b SQRT CSLA. 
Fig. 5. Delay and area evaluation of regular SQRT CSLA: (a) group2, (b) 
group3, (c) group4, and (d) group5. F is a Full Adder. 
2) Except for group2, the arrival time of the mux selection input is always
greater than the arrival time of the data outputs from the RCAs.
Thus, the delays of group3 to group5 are determined, respectively, as
follows:
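The timing rules in steps 1) and 2) can be sketched as an arrival-time calculation. Several assumptions here are not taken from the figures: group widths of 2, 2, 3, 4, and 5 bits for group1 to group5; operand bits arriving at t = 0; AOI full adders whose sum is ready 3 units and carry 2 units after the later of (a XOR b) and the carry-in; and a 2:1 mux whose output appears 3 units after the later of its data and select inputs. Under these assumptions the group2 outputs sum2, sum3, and the selected carry come out at 10, 11, and 10, which agrees with the sum2 = 10 figure quoted above. The function names are hypothetical.

```python
MUX_DELAY = 3  # 2:1 mux in AOI form: NOT -> AND -> OR


def full_adder_times(t_cin):
    """(sum, carry-out) arrival times of an AOI full adder whose operands arrive at t = 0."""
    p = 3                     # p = a XOR b (NOT -> AND -> OR)
    t = max(p, t_cin)
    return t + 3, t + 2       # sum = p XOR cin; carry = (a AND b) OR (p AND cin)


def rca_times(width, lsb_full_adder):
    """Sum and carry-out arrival times of a width-bit RCA with a constant carry-in.

    With Cin = 0 the LSB full adder reduces to a half adder (sum at 3, carry at 1).
    """
    s, c = full_adder_times(0) if lsb_full_adder else (3, 1)
    sums = [s]
    for _ in range(width - 1):
        s, c = full_adder_times(c)
        sums.append(s)
    return sums, c


def regular_group_delays(widths=(2, 2, 3, 4, 5)):
    """Max output arrival of group2..group5, plus the per-output times of group2."""
    _, carry = rca_times(widths[0], lsb_full_adder=True)  # group1: plain RCA fed by Cin
    all_outs = []
    for w in widths[1:]:
        s0, c0 = rca_times(w, lsb_full_adder=False)       # copy computed with Cin = 0
        s1, c1 = rca_times(w, lsb_full_adder=True)        # copy computed with Cin = 1
        data = [max(x, y) for x, y in zip(s0 + [c0], s1 + [c1])]
        outs = [max(t, carry) + MUX_DELAY for t in data]  # later of data and select, plus mux
        carry = outs[-1]                                  # selected carry drives the next group
        all_outs.append(outs)
    return [max(o) for o in all_outs], all_outs[0]
```

It returns [11, 13, 16, 19] for the latest output of group2 to group5: from group3 onward each group finishes one mux delay after its selection carry, as step 2) states.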

3) The one set of 2-b RCA in group2 has 2 FA for Cin = 1, and the other set has 1 FA and 1 HA for Cin = 0. Based on the area count of Table I, the total gate count of group2 is determined as follows:
Gate count
TABLE III
DELAY AND AREA COUNT OF REGULAR SQRT CSLA GROUPS
4) Similarly, the estimated maximum delay and area of the other groups in the regular SQRT CSLA are evaluated and listed in Table III.
V. DELAY AND AREA EVALUATION METHODOLOGY OF MODIFIED 16-B SQRT CSLA
The structure of the proposed 16-b SQRT CSLA, which uses BEC in place of the RCA with Cin = 1 to optimize the area and power, is shown in Fig. 6. We again split the structure into five groups. The delay and area estimation of each group are shown in Fig. 7. The steps leading to the evaluation are given here.
1) The group2 [see Fig. 7(a)] has one 2-b RCA, which has 1 FA and 1 HA for Cin = 0. Instead of another 2-b RCA with Cin = 1, a 3-b BEC is used, which adds one to the output from the 2-b RCA. Based on the consideration of delay values of Table I, the arrival time of the selection input    of the 6:3 mux is earlier than the    and    and later than the   . Thus, the sum3 and final    (output from mux) depend on    and mux, and on partial    (input to mux) and mux, respectively. The sum2 depends on    and mux.
2) For the remaining groups, the arrival time of the mux selection input is always greater than the arrival time of the data inputs from the BECs. Thus, the delay of the remaining groups depends on the arrival time of the mux selection input and the mux delay.
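The same style of arrival-time sketch can be applied to the modified structure, with the Cin = 1 RCA of each group replaced by a (w + 1)-bit BEC fed by the Cin = 0 RCA's sum and carry outputs. The assumptions are not taken from the figures: group widths 2, 2, 3, 4, and 5; unit-delay AOI gates with operands arriving at t = 0; the AND term of each BEC output treated as a single gate; and a mux output ready 3 units after its latest input. The function names are hypothetical.

```python
MUX_DELAY = 3  # 2:1 mux in AOI form: NOT -> AND -> OR


def full_adder_times(t_cin):
    """(sum, carry-out) arrival times of an AOI full adder whose operands arrive at t = 0."""
    t = max(3, t_cin)         # a XOR b is ready at t = 3
    return t + 3, t + 2


def rca_times(width, lsb_full_adder):
    """Sum and carry-out times of a width-bit RCA; Cin = 0 turns the LSB FA into an HA."""
    s, c = full_adder_times(0) if lsb_full_adder else (3, 1)
    sums = [s]
    for _ in range(width - 1):
        s, c = full_adder_times(c)
        sums.append(s)
    return sums, c


def bec_times(t_in):
    """Output times of a BEC: X0 = ~B0, Xi = Bi ^ (B0 & ... & B(i-1))."""
    out = [t_in[0] + 1]                                   # X0 = NOT B0
    for i in range(1, len(t_in)):
        # AND of B0..B(i-1); for i == 1 this is just B0 and needs no gate.
        t_and = t_in[0] if i == 1 else max(t_in[:i]) + 1
        out.append(max(t_in[i], t_and) + 3)               # XOR adds three levels
    return out


def modified_group_delays(widths=(2, 2, 3, 4, 5)):
    """Latest output arrival of group2..group5 in the BEC-based SQRT CSLA."""
    _, carry = rca_times(widths[0], lsb_full_adder=True)  # group1: plain RCA fed by Cin
    delays = []
    for w in widths[1:]:
        s0, c0 = rca_times(w, lsb_full_adder=False)       # the only RCA kept: Cin = 0
        direct = s0 + [c0]                                # w sum bits plus carry = w + 1 bits
        excess = bec_times(direct)                        # (w + 1)-bit BEC replaces Cin = 1 RCA
        outs = [max(d, x, carry) + MUX_DELAY for d, x in zip(direct, excess)]
        carry = outs[-1]
        delays.append(max(outs))
    return delays
```

It returns [13, 16, 19, 22]. In this model the BEC outputs of group2 arrive after the selection carry, so that group is limited by the BEC path, as step 1) describes, while groups 3 to 5 are limited by the selection carry plus one mux delay, as in step 2).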
Fig. 6. Modified 16-b SQRT CSLA. The parallel RCA with Cin = 1 is replaced with BEC.
Fig. 7. Delay and area evaluation of modified SQRT CSLA: (a) group2, (b) group3, (c) group4, and (d) group5. H is a Half Adder.
3) The area count of group2 is determined as follows:
Gate count
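The group2 gate counts of both the regular and the modified structures can be tallied from unit-gate areas of the AOI blocks. The block areas used here (XOR 5, 2:1 mux 4, HA 6, FA 13) are derived from sum-of-products structures rather than copied from Table I, and a 6:3 mux is assumed to be three 2:1 muxes. Only group2 is checked; the way Tables III and IV count the wider groups is not reproduced in this copy, so the general formulas below may differ from them. All names are hypothetical.

```python
# Unit-gate areas (every AND/OR/NOT = 1): XOR = 2 NOT + 2 AND + 1 OR,
# 2:1 mux = 1 NOT + 2 AND + 1 OR, HA = XOR + AND, FA = 2 XOR + 2 AND + 1 OR.
NOT, AND = 1, 1
XOR, MUX2 = 5, 4
HA = XOR + AND
FA = 2 * XOR + 3


def bec_area(width):
    """width-bit BEC (width >= 2): X0 = ~B0, X1 = B0 ^ B1, and one AND plus one XOR
    for every higher bit Xi = Bi ^ (B0 & ... & B(i-1)), reusing the previous AND term."""
    return NOT + XOR + (width - 2) * (AND + XOR)


def regular_group_area(w):
    """Cin = 1 RCA (w FAs) + Cin = 0 RCA ((w - 1) FAs + 1 HA) + (w + 1) 2:1 muxes."""
    return w * FA + (w - 1) * FA + HA + (w + 1) * MUX2


def modified_group_area(w):
    """Cin = 0 RCA + (w + 1)-bit BEC + (w + 1) 2:1 muxes."""
    return (w - 1) * FA + HA + bec_area(w + 1) + (w + 1) * MUX2


print("group2 regular:", regular_group_area(2), "modified:", modified_group_area(2))
```

For group2 it gives 57 gates for the regular structure (three FAs, one HA, and a 6:3 mux) and 43 for the modified one (one FA, one HA, a 12-gate 3-b BEC, and a 6:3 mux).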
TABLE IV
DELAY AND AREA COUNT OF MODIFIED SQRT CSLA
4) Similarly, the estimated maximum delay and area of the other groups of the modified SQRT CSLA are evaluated and listed in Table IV. Comparing Tables III and IV, it is clear that the proposed modified SQRT CSLA requires 113 fewer gate areas than the regular SQRT CSLA, at the cost of only 11 additional gate delays. To further evaluate the performance, we have resorted to ASIC implementation and simulation.
VI. ASIC IMPLEMENTATION RESULTS
The design proposed in this paper has been developed using Verilog-HDL and synthesized in Cadence RTL Compiler using typical libraries of TSMC 0.18-μm technology. The synthesized Verilog netlist and the respective design constraints file (SDC) are imported into Cadence SoC Encounter and used to generate an automated layout from standard cells through placement and routing [7]. Parasitic extraction is performed using Encounter's Native RC extraction tool, and the extracted parasitic RC (SPEF format) is back-annotated to the Common Timing Engine in the Encounter platform for static timing analysis. For each word size of the adder, the same value change dump (VCD) file is generated for all possible input conditions and imported into Cadence Encounter Power Analysis to perform the power simulations. The same design flow is followed for both the regular and the modified SQRT CSLA. Table V exhibits the simulation results of both CSLA structures in terms of delay, area, and power. The area indicates the total cell area of the design, and the total power is the sum of the leakage power, internal power, and switching power. The percentage reduction in the cell area, total power, power–delay product, and area–delay product as a function of the bit size is shown in Fig. 8(a). The percentage delay overhead is plotted in Fig. 8(b). It is clear that the area of the 8-, 16-, 32-, and 64-b proposed SQRT CSLA is reduced by 9.7%, 15%, 16.7%, and 17.4%, respectively.
The total power consumed shows a similar trend, with the reduction increasing with the bit size: 7.6%, 10.56%, 13.63%, and 15.46%. Interestingly, the delay overhead also exhibits a similarly decreasing trend with bit size.
TABLE V
COMPARISON OF THE REGULAR AND MODIFIED SQRT CSLA
Fig. 8. (a) Percentage reduction in the cell area, total power, power–delay product, and area–delay product. (b) Percentage of delay overhead.
The delay overhead for the 8-, 16-, and 32-b designs is 14%, 9.8%, and 6.7%, respectively, whereas for the 64-b design it drops to only 3.76%. The power–delay product of the proposed 8-b design is higher than that of the regular SQRT CSLA by 5.2%, and the area–delay product is lower by 2.9%. However, the power–delay product of the proposed 16-b SQRT CSLA is reduced by 1.76%, and that of the 32-b and 64-b designs by as much as 8.18% and 12.28%, respectively. Similarly, the area–delay product of the proposed design for 16, 32, and 64 b is reduced by 6.7%, 11%, and 14.4%, respectively.
VII. CONCLUSION
A simple approach is proposed in this paper to reduce the area and power of the SQRT CSLA architecture. The reduced number of gates in this work offers a great advantage in reducing both the area and the total power. The comparison shows that the modified SQRT CSLA has a slightly larger delay (only 3.76%), but the area and power of the 64-b modified SQRT CSLA are significantly reduced, by 17.4% and 15.4%, respectively. The power–delay product and the area–delay product of the proposed design both decrease for the 16-, 32-, and 64-b sizes, which indicates the success of the method rather than a mere tradeoff of delay for power and area. The modified CSLA architecture is therefore low-area, low-power, simple, and efficient for VLSI hardware implementation. It would be interesting to test the design of a modified 128-b SQRT CSLA.
ACKNOWLEDGMENT
The authors would like to thank S. Sivanantham, P. MageshKannan, and S. Ravi of the VLSI Division, VIT University, Vellore, India, for their contributions to this work.
REFERENCES
[1] O. J. Bedrij, “Carry-select adder,” IRE Trans. Electron. Comput., pp. 340–344, 1962.
[2] B. Ramkumar, H. M. Kittur, and P. M. Kannan, “ASIC implementation of modified faster carry save adder,” Eur. J. Sci. Res., vol. 42, no. 1, pp. 53–58, 2010.
[3] T. Y. Chang and M. J. Hsiao, “Carry-select adder using single ripple carry adder,” Electron. Lett., vol. 34, no. 22, pp. 2101–2103, Oct. 1998.
[4] Y. Kim and L.-S. Kim, “64-bit carry-select adder with reduced area,” Electron. Lett., vol. 37, no. 10, pp. 614–615, May 2001.
[5] J. M. Rabaey, Digital Integrated Circuits—A Design Perspective. Upper Saddle River, NJ: Prentice-Hall, 2001.
[6] Y. He, C. H. Chang, and J. Gu, “An area efficient 64-bit square root carry-select adder for low power applications,” in Proc. IEEE Int. Symp. Circuits Syst., 2005, vol. 4, pp. 4082–4085.
[7] Cadence, “Encounter user guide,” Version 6.2.4, March 2008.