Prof.Dipak Mahurkar Department of E&Computer
Engineering
Sanjivani College of Engineering, Kopargaon
Department of Electronics & Computer Engineering
(An Autonomous Institute)
Affiliated to Savitribai Phule Pune University
Accredited ‘A’ Grade by NAAC
________________________________________________________________________________________
Subject: Digital Logic Design and HDL (EC203)
UNIT-1
Topic: Computer Arithmetic (Fixed and Floating Point)
There are two major approaches to storing real numbers (i.e., numbers with a fractional component) in modern computing: (i) fixed-point notation and (ii) floating-point notation. In fixed-point notation there is a fixed number of digits after the radix (decimal or binary) point, whereas floating-point notation allows the number of digits after the radix point to vary.
Prof.Dipak Mahurkar Department of E&Tc Engineering 2
• Fixed-Point Representation −
This representation has a fixed number of bits for the integer part and for the fractional part. For example, if the fixed-point format is IIII.FFFF (four decimal digits before and four after the decimal point), then the smallest positive value you can store is 0000.0001 and the largest is 9999.9999. A fixed-point number representation has three parts: the sign field, the integer field, and the fractional field.
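To make the IIII.FFFF example concrete, here is a minimal Python sketch (assuming decimal digits, as in the example) that stores such a value as an integer count of 0.0001 units. The helper names `to_fixed` and `from_fixed` are hypothetical; they only illustrate why the smallest and largest positive values are 0000.0001 and 9999.9999.

```python
from decimal import Decimal

INT_DIGITS = 4                 # digits before the decimal point (IIII)
FRAC_DIGITS = 4                # digits after the decimal point (FFFF)
SCALE = 10 ** FRAC_DIGITS      # one stored unit = 0.0001

def to_fixed(text: str) -> int:
    """Convert a decimal string to its scaled integer, rejecting values that do not fit."""
    scaled = Decimal(text) * SCALE
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{text} needs more than {FRAC_DIGITS} fractional digits")
    value = int(scaled)
    if abs(value) >= 10 ** (INT_DIGITS + FRAC_DIGITS):
        raise ValueError(f"{text} needs more than {INT_DIGITS} integer digits")
    return value

def from_fixed(value: int) -> str:
    """Render a scaled integer back in IIII.FFFF form (sign shown only if negative)."""
    sign = "-" if value < 0 else ""
    whole, frac = divmod(abs(value), SCALE)
    return f"{sign}{whole:0{INT_DIGITS}d}.{frac:0{FRAC_DIGITS}d}"

smallest = 1                                      # only the last digit set
largest = 10 ** (INT_DIGITS + FRAC_DIGITS) - 1    # every digit is 9
print(from_fixed(smallest))   # 0000.0001
print(from_fixed(largest))    # 9999.9999
```

Because every value is an integer multiple of 0.0001, addition and subtraction reduce to plain integer arithmetic, which is where the performance advantage of fixed point comes from.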
We can represent these numbers (using k bits) in:
• Sign-magnitude representation: range from $-(2^{k-1}-1)$ to $2^{k-1}-1$.
• 1’s complement representation: range from $-(2^{k-1}-1)$ to $2^{k-1}-1$.
• 2’s complement representation: range from $-2^{k-1}$ to $2^{k-1}-1$.
2’s complement representation is preferred in computer systems because it is unambiguous (zero has a single representation, whereas sign-magnitude and 1’s complement have both +0 and -0) and because it makes arithmetic operations easier.
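A short Python sketch can tabulate these ranges and show how the same negative value is encoded in each scheme. The function names `ranges` and `encode` are illustrative, not from the slides, and k = 8 is an arbitrary choice.

```python
def ranges(k: int) -> dict:
    """(min, max) of a k-bit signed integer in each representation."""
    half = 2 ** (k - 1)
    return {
        "sign-magnitude": (-(half - 1), half - 1),   # +0 and -0 both exist
        "1's complement": (-(half - 1), half - 1),   # +0 and -0 both exist
        "2's complement": (-half, half - 1),         # single zero, one extra negative
    }

def encode(value: int, k: int, scheme: str) -> str:
    """Return the k-bit pattern of `value` in the chosen representation."""
    lo, hi = ranges(k)[scheme]                 # KeyError for an unknown scheme
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {k}-bit {scheme}")
    if scheme == "sign-magnitude":
        return ("1" if value < 0 else "0") + format(abs(value), f"0{k - 1}b")
    if scheme == "1's complement":
        # Negative values: invert every bit of |value|, i.e. (2^k - 1) - |value|.
        return format(value if value >= 0 else (2 ** k - 1) + value, f"0{k}b")
    # 2's complement: negative values become 2^k + value, which the mask computes.
    return format(value & (2 ** k - 1), f"0{k}b")

for scheme, (lo, hi) in ranges(8).items():
    print(f"{scheme:15s} range {lo:5d} .. {hi:4d}   -5 -> {encode(-5, 8, scheme)}")
```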
Example − Assume a number uses a 32-bit format that reserves 1 bit for the sign, 15 bits for the integer part, and 16 bits for the fractional part. Then -43.625 is represented as follows:
1 | 000000000101011 | 1010000000000000
• Here 0 represents + and 1 represents -. 000000000101011 is the 15-bit binary value of decimal 43, and 1010000000000000 is the 16-bit binary value of the fraction 0.625 (0.625 = 0.5 + 0.125 = $(0.101)_2$).
• The advantage of fixed-point representation is performance; the disadvantage is the relatively limited range of values it can represent. It is therefore usually inadequate for numerical analysis, because it does not provide enough range and accuracy. A number whose exact representation needs more than the 32 available bits must be stored inexactly (rounded or truncated).
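The -43.625 example can be reproduced with exact rational arithmetic. This is a minimal sketch assuming the 1 + 15 + 16 sign-magnitude layout described above; `encode_fixed` and `decode_fixed` are hypothetical helper names. Values that need more than 16 fraction bits (such as 0.1) are rejected rather than rounded, which is exactly the "stored inexactly" case.

```python
from fractions import Fraction

INT_BITS, FRAC_BITS = 15, 16        # 1 sign + 15 integer + 16 fraction bits = 32

def encode_fixed(x: Fraction) -> str:
    """Sign-magnitude fixed-point pattern of x, fields separated by spaces."""
    scaled = abs(x) * 2 ** FRAC_BITS
    if scaled.denominator != 1:
        raise ValueError(f"{x} is not exact with {FRAC_BITS} fraction bits")
    if scaled >= 2 ** (INT_BITS + FRAC_BITS):
        raise OverflowError(f"{x} needs more than {INT_BITS} integer bits")
    magnitude = format(int(scaled), f"0{INT_BITS + FRAC_BITS}b")
    sign = "1" if x < 0 else "0"
    return f"{sign} {magnitude[:INT_BITS]} {magnitude[INT_BITS:]}"

def decode_fixed(pattern: str) -> Fraction:
    """Inverse of encode_fixed: rebuild the exact value from the three fields."""
    sign, int_part, frac_part = pattern.split()
    value = Fraction(int(int_part + frac_part, 2), 2 ** FRAC_BITS)
    return -value if sign == "1" else value

print(encode_fixed(Fraction("-43.625")))   # 1 000000000101011 1010000000000000
```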
• The smallest and largest positive numbers that can be stored in the 32-bit format above follow directly from its field widths. The smallest positive number is $2^{-16} \approx 0.000015$, the largest positive number is $(2^{15}-1)+(1-2^{-16}) = 2^{15}-2^{-16} \approx 32768$, and the gap between consecutive representable numbers is always $2^{-16}$.
• If the radix point is allowed to move left or right so that the integer field holds only a single 1, we obtain the normalized form used by floating-point representation.
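The limits of the same 32-bit fixed-point format can be checked exactly with Python's `fractions` module. The sketch below (assuming the 1 + 15 + 16 layout) confirms that the largest value is $2^{15} - 2^{-16}$, just below 32768, and that all values are evenly spaced $2^{-16}$ apart.

```python
from fractions import Fraction

INT_BITS, FRAC_BITS = 15, 16        # 1 sign + 15 integer + 16 fraction bits

gap = Fraction(1, 2 ** FRAC_BITS)             # weight of the last fraction bit
smallest = gap                                # only the last bit set
largest = (2 ** INT_BITS - 1) + (1 - gap)     # every magnitude bit set

# Every representable value is an integer multiple of `gap`, so spacing is uniform.
print("smallest:", float(smallest))
print("largest: ", float(largest), largest == 2 ** INT_BITS - gap)
print("positive values:", largest / gap)      # one per nonzero 31-bit magnitude
```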
• Floating-Point Representation −
• This representation does not reserve a specific number of bits for the integer part or the fractional part. Instead, it reserves a certain number of bits for the digits of the number (called the mantissa or significand) and a certain number of bits to say where within that number the radix point sits (called the exponent).
• The floating-point representation of a number therefore has two parts. The first part is a signed fixed-point number called the mantissa. The second part designates the position of the decimal (or binary) point and is called the exponent. The fixed-point mantissa may be a fraction or an integer. A floating-point number is always interpreted as representing a number of the form $m \times r^e$, where m is the mantissa, r is the radix (base), and e is the exponent.
Only the mantissa m and the exponent e (including their signs) are physically stored in the register; the radix r is implied by the format. A binary floating-point number is represented in the same way, except that it uses base 2 (r = 2). A floating-point number is said to be normalized if the most significant digit of the mantissa is 1.
• The value of a stored number is therefore $(-1)^s (1+m) \times 2^{e-\text{Bias}}$, where s is the sign bit, m is the stored fractional part of the mantissa, e is the stored exponent value, and Bias is the bias number.
• Signed integers and exponents can in general be represented in sign-magnitude, 1’s complement, or 2’s complement form; the formula above instead stores the exponent in biased (excess) form, so the true exponent is e - Bias.
• The floating-point representation is more flexible. Any nonzero number x can be written in the normalized form $\pm(1.b_1b_2b_3\ldots)_2 \times 2^n$.
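The normalized form can be computed with `math.frexp`, which returns a mantissa in [0.5, 1); shifting it by one place gives the $1.b_1b_2b_3\ldots$ form used here. A minimal sketch, with `normalize` and `fraction_bits` as illustrative helper names:

```python
import math

def normalize(x: float) -> tuple:
    """Return (s, significand, n) with x == (-1)**s * significand * 2**n, 1 <= significand < 2."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("only finite, non-zero numbers have a normalized form")
    m, e = math.frexp(abs(x))                  # abs(x) == m * 2**e with 0.5 <= m < 1
    return (1 if x < 0 else 0), m * 2, e - 1   # shift so one 1 sits before the point

def fraction_bits(significand: float, count: int) -> str:
    """First `count` binary digits b1 b2 ... after the point of a significand in [1, 2)."""
    f, digits = significand - 1, []
    for _ in range(count):
        f *= 2
        digits.append("1" if f >= 1 else "0")
        f -= int(f)                            # keep only the part after the point
    return "".join(digits)

s, significand, n = normalize(-53.5)
print(s, "1." + fraction_bits(significand, 6), n)   # 1 1.101011 5
```

For -53.5 this gives sign 1, fraction bits 101011 and exponent 5, matching the worked example in the next slide.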
Example − Suppose a number uses a 32-bit format: 1 sign bit, 8 bits for a signed exponent, and 23 bits for the fractional part. The leading bit 1 is not stored (since it is always 1 for a normalized number) and is referred to as the “hidden bit”.
Then -53.5 is normalized as $-53.5 = (-110101.1)_2 = (-1.101011)_2 \times 2^5$, which is represented as follows:
1 | 00000101 | 10101100000000000000000
• Here 00000101 is the 8-bit binary value of the exponent +5 (i.e., $2^5$).
• Note that the 8-bit exponent field is used to store integer exponents $-126 \le n \le 127$.
• The smallest normalized positive number that fits into 32 bits is $(1.00000000000000000000000)_2 \times 2^{-126} = 2^{-126} \approx 1.18 \times 10^{-38}$, and the largest normalized positive number that fits into 32 bits is $(1.11111111111111111111111)_2 \times 2^{127} = (2^{24}-1) \times 2^{104} \approx 3.40 \times 10^{38}$. These numbers are represented as shown in the following figure.
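The sketch below encodes -53.5 in the example's layout and, for comparison, in real IEEE 754 single precision via the standard `struct` module. The example stores the exponent +5 directly as 00000101, whereas IEEE 754 adds a bias of 127 (so +5 becomes 10000100). The handling of negative exponents in the hypothetical `slide_format` helper (2's complement) is an assumption, since the slides only show a positive one. The last two lines check the extreme normalized values quoted above.

```python
import math
import struct

def slide_format(x: float) -> str:
    """Fields of x in the example's layout: sign | 8-bit exponent | 23-bit fraction.

    Assumptions: the exponent is stored unbiased (as 00000101 for +5), negative
    exponents use 2's complement, and fraction bits past the 23rd are truncated.
    """
    if x == 0 or not math.isfinite(x):
        raise ValueError("only finite non-zero values are normalized")
    m, e = math.frexp(abs(x))                 # abs(x) == m * 2**e, 0.5 <= m < 1
    significand, n = m * 2, e - 1             # 1 <= significand < 2
    if not -126 <= n <= 127:
        raise OverflowError(f"exponent {n} outside -126..127")
    frac = int((significand - 1) * 2 ** 23)   # drop the hidden 1, keep 23 bits
    sign = "1" if x < 0 else "0"
    return f"{sign} {n & 0xFF:08b} {frac:023b}"

def ieee754_single(x: float) -> str:
    """Fields of x as an IEEE 754 binary32 value (exponent biased by 127)."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{word:032b}"
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"

def from_bits(word: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 binary32 value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

print(slide_format(-53.5))     # 1 00000101 10101100000000000000000
print(ieee754_single(-53.5))   # 1 10000100 10101100000000000000000
print(from_bits(0x00800000) == 2.0 ** -126)                  # True: smallest normalized
print(from_bits(0x7F7FFFFF) == (2 ** 24 - 1) * 2.0 ** 104)   # True: largest normalized
```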
• The precision of a floating-point format is the number of bit positions reserved for the fraction plus one (for the hidden bit). In the examples considered here the precision is 23 + 1 = 24.
• The gap between 1 and the next normalized floating-point number is known as machine epsilon. For the format above the gap is $(1+2^{-23})-1 = 2^{-23}$, but this is not the same as the smallest positive floating-point number, because floating-point numbers are not uniformly spaced, unlike in the fixed-point scenario.
• Note that numbers with non-terminating binary expansions cannot be represented exactly in floating-point representation; e.g., $1/3 = (0.010101\ldots)_2$ is not a floating-point number because its binary representation is non-terminating.
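Machine epsilon and the non-uniform spacing can be observed directly by stepping to the next binary32 bit pattern. This is a sketch using only the standard `struct` module; `to_float32` and `next_up32` are hypothetical helpers, and because Python's own floats are binary64, values are first rounded to binary32.

```python
import struct
from fractions import Fraction

def to_float32(x: float) -> float:
    """Round a Python float (IEEE 754 binary64) to the nearest binary32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def next_up32(x: float) -> float:
    """Next binary32 value above a positive, finite binary32 value x."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return struct.unpack(">f", struct.pack(">I", word + 1))[0]

eps = next_up32(1.0) - 1.0
print(eps == 2.0 ** -23)              # machine epsilon for 24-bit precision
print(next_up32(1024.0) - 1024.0)     # gap near 1024 is 2**-13, much wider
print(2.0 ** -126 < eps)              # smallest normalized value is far below eps

third = to_float32(1 / 3)
print(Fraction(third), Fraction(third) == Fraction(1, 3))   # nearest value, not 1/3
```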
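IEEE Floating-Point Number Representation −
IEEE (Institute of Electrical and Electronics Engineers) has standardized floating-point representation in IEEE 754, whose formats consist of a sign field, an exponent field, and a mantissa (fraction) field, as shown in the following diagram.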
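• So, the value of a normalized number is $(-1)^s (1+m) \times 2^{e-\text{Bias}}$, where s is the sign bit, m is the mantissa (fraction field), e is the stored exponent value, and Bias is the bias number. The sign bit is 0 for positive numbers and 1 for negative numbers. Exponents are stored in biased form rather than in 2’s complement: the true exponent is e - Bias, with Bias = 127 for single precision and 1023 for double precision.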
• According to the IEEE 754 standard, floating-point numbers are represented in the following formats:
• Half precision (16 bits): 1 sign bit, 5-bit exponent, and 10-bit mantissa
• Single precision (32 bits): 1 sign bit, 8-bit exponent, and 23-bit mantissa
• Double precision (64 bits): 1 sign bit, 11-bit exponent, and 52-bit mantissa
• Quadruple precision (128 bits): 1 sign bit, 15-bit exponent, and 112-bit mantissa
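All of these formats share one decoding rule, differing only in field widths and in Bias = $2^{E-1} - 1$ for an E-bit exponent. The minimal decoder below is a sketch that also covers the special encodings (an all-zero exponent for zero and subnormals, an all-ones exponent for infinity and NaN), which the formula above does not; it is cross-checked against `struct`'s 'e', 'f' and 'd' formats. `decode` and `bits_of` are illustrative names.

```python
import struct

# (exponent bits, fraction bits) for the formats listed above.
# binary128 (15, 112) follows the same rule, but its values do not fit in a Python float.
FORMATS = {16: (5, 10), 32: (8, 23), 64: (11, 52)}

def decode(word: int, width: int) -> float:
    """Decode an IEEE 754 bit pattern of the given width into a Python float."""
    exp_bits, frac_bits = FORMATS[width]
    bias = 2 ** (exp_bits - 1) - 1                    # 15, 127, 1023
    sign = (word >> (width - 1)) & 1
    e = (word >> frac_bits) & ((1 << exp_bits) - 1)   # stored (biased) exponent
    m = word & ((1 << frac_bits) - 1)                 # stored fraction bits
    if e == (1 << exp_bits) - 1:                      # all ones: infinity or NaN
        return float("nan") if m else (-1) ** sign * float("inf")
    if e == 0:                                        # zero or subnormal: no hidden bit
        value = (m / 2 ** frac_bits) * 2.0 ** (1 - bias)
    else:                                             # normalized: (1 + m) x 2^(e - Bias)
        value = (1 + m / 2 ** frac_bits) * 2.0 ** (e - bias)
    return -value if sign else value

def bits_of(x: float, width: int) -> int:
    """Encode x with the struct module so decode() can be checked against it."""
    code = {16: ">e", 32: ">f", 64: ">d"}[width]
    return int.from_bytes(struct.pack(code, x), "big")

for width in FORMATS:
    word = bits_of(-53.5, width)
    print(f"{width:3d}-bit: {word:#0{width // 4 + 2}x} -> {decode(word, width)}")
```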