FLOATING POINT REPRESENTATION
AND ARITHMETIC
Presentation Topic
B.Sc. CS (H) I Year, Computer Organization and Architecture
SIR CHHOTU RAM ENGG. INSTITUTE & TECH. CCS UNIV. CAMPUS, MEERUT
INTRODUCTION
•Objective: To understand how to represent floating point numbers
in the computer and how to perform arithmetic with them.
•Approximate arithmetic
–Finite Range
–Limited Precision
•Topics
–IEEE format for single and double precision floating point numbers
–Floating point addition
FLOATING POINT
• An IEEE floating point representation consists of
–A Sign Bit (no surprise)
–An Exponent (“times 2 to the what?”)
–Mantissa (“Significand”), which is assumed to be 1.xxxxx
(thus, one bit of the mantissa is implied as 1)
–This is called a normalized representation
• So a mantissa = 0 really is interpreted to be 1.0, and a
mantissa of all 1111 is interpreted to be 1.1111
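The implied leading 1 can be checked with a minimal Python sketch. It assumes single precision and an exponent field of 127 (actual exponent 0); the helper name `float_from_bits` is hypothetical and not part of the slides.

```python
import struct

def float_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Sign 0, exponent field 127 (actual exponent 0), mantissa field all zeros:
# the stored mantissa is 0, but the value is 1.0 because of the implied 1.
one = float_from_bits(0x3F800000)

# Same sign and exponent, mantissa field all ones (23 of them):
# interpreted as 1.111...1 in binary, the largest significand below 2.
almost_two = float_from_bits(0x3FFFFFFF)

print(one, almost_two)
```

This only covers normalized numbers; the special exponent patterns (all zeros, all ones) are outside the scope of the sketch.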
FLOATING POINT STANDARD
• Defined by IEEE Std 754-1985
• Developed in response to divergence of representations
–Portability issues for scientific code
• Now almost universally adopted
• Two representations
–Single precision (32-bit)
–Double precision (64-bit)
IEEE Floating-Point Format
• S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)
• Normalize significand: 1.0 ≤ |significand| < 2.0
– Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly
(hidden bit)
– Significand is Fraction with the “1.” restored
• Exponent: excess representation: actual exponent + Bias
– Ensures exponent is unsigned
– Single: Bias = 127; Double: Bias = 1023
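A normalized number in this format has the value x = (–1)^S × (1 + Fraction) × 2^(Exponent – Bias). The following Python sketch evaluates that expression from the three fields. The function name `value_from_fields` and its parameters are illustrative assumptions, and the sketch handles normalized numbers only.

```python
def value_from_fields(sign: int, exponent: int, fraction: int,
                      frac_bits: int = 23, bias: int = 127) -> float:
    """Evaluate (-1)^S * (1 + Fraction) * 2^(Exponent - Bias).

    `fraction` is the raw integer stored in the fraction field;
    dividing by 2**frac_bits turns it into the value 0.xxxxx.
    Defaults describe single precision (23 fraction bits, bias 127).
    """
    significand = 1 + fraction / (1 << frac_bits)   # restore the hidden "1."
    return (-1) ** sign * significand * 2.0 ** (exponent - bias)

# Single precision: exponent field 129, fraction 0.625 -> 1.625 * 2^2
print(value_from_fields(0, 129, 5 << 20))

# Double precision: 52 fraction bits, bias 1023
print(value_from_fields(1, 1022, 1 << 51, frac_bits=52, bias=1023))
```

Passing the field width and bias as parameters lets the same function serve both single and double precision.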
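• Field layout: S | Exponent | Fraction
– S: 1 bit
– Exponent: single: 8 bits; double: 11 bits
– Fraction: single: 23 bits; double: 52 bits
• Value: $x = (-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$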
FLOATING-POINT EXAMPLE
• Represent –0.75
– –0.75 = $(-1)^{1} \times 1.1_2 \times 2^{-1}$
– S = 1
– Fraction = $1000\ldots00_2$
– Exponent = –1 + Bias
• Single: –1 + 127 = 126 = $01111110_2$
• Double: –1 + 1023 = 1022 = $01111111110_2$
• Single: 1 01111110 1000…00
• Double: 1 01111111110 1000…00
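The bit patterns for –0.75 can be reproduced with Python's standard `struct` module. This is a small sketch; the helper names `single_bits` and `double_bits` are hypothetical.

```python
import struct

def single_bits(x: float) -> str:
    """Return 'S EEEEEEEE FFF...F' (1 + 8 + 23 bits) for x in single precision."""
    raw = struct.unpack(">I", struct.pack(">f", x))[0]
    b = format(raw, "032b")
    return f"{b[0]} {b[1:9]} {b[9:]}"

def double_bits(x: float) -> str:
    """Return 'S EEEEEEEEEEE FFF...F' (1 + 11 + 52 bits) for x in double precision."""
    raw = struct.unpack(">Q", struct.pack(">d", x))[0]
    b = format(raw, "064b")
    return f"{b[0]} {b[1:12]} {b[12:]}"

print(single_bits(-0.75))
print(double_bits(-0.75))
```

The printed strings separate sign, exponent, and fraction with spaces so they can be compared field by field with the hand-worked encoding.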
FLOATING-POINT EXAMPLE
• What number is represented by the single-precision float
1 10000001 01000…00
– S = 1
– Fraction = $01000\ldots00_2$
– Exponent = $10000001_2$ = 129
• $x = (-1)^{1} \times (1 + 0.01_2) \times 2^{(129 - 127)}$
$= (-1) \times 1.25 \times 2^{2}$
$= -5.0$
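The same decoding can be done mechanically. The sketch below parses a 32-bit pattern written as a string, splits it into the three fields, and applies the single-precision bias of 127. The function name `decode_single` is an assumption for illustration, and only normalized numbers are handled.

```python
import struct

def decode_single(bit_string: str) -> float:
    """Decode a 32-character bit pattern (spaces allowed) as a normalized single."""
    bits = bit_string.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected exactly 32 bits")
    sign = int(bits[0], 2)
    exponent = int(bits[1:9], 2)              # biased exponent field
    fraction = int(bits[9:], 2) / 2 ** 23     # 0.xxxxx part of the significand
    return (-1) ** sign * (1 + fraction) * 2.0 ** (exponent - 127)

pattern = "1 10000001 01" + "0" * 21
x = decode_single(pattern)

# Cross-check against the hardware interpretation of the same 32 bits.
reference = struct.unpack(">f", struct.pack(">I", int(pattern.replace(" ", ""), 2)))[0]
print(x, reference)
```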
Representation of Floating Point Numbers
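•IEEE 754 double precision (bit positions within the high-order 32-bit word)
– Bit 31: Sign
– Bits 30–20: Biased exponent
– Bits 19–0: Normalized mantissa, high-order 20 bits (the remaining 32 mantissa bits occupy the low-order word; the leading 1 is the implicit 53rd bit)
– Value: $(-1)^{s} \times F \times 2^{E-1023}$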
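As a rough sketch of this layout, a double can be split into its high-order and low-order 32-bit words and the fields read from bits 31, 30–20, and 19–0 of the high word. The function names below are hypothetical, and F is taken to be 1 plus the 52-bit stored mantissa scaled by 2^-52.

```python
import struct

def high_word_fields(x: float):
    """Return (sign, biased_exponent, mantissa) of a double, read via its two 32-bit words."""
    high, low = struct.unpack(">II", struct.pack(">d", x))
    sign = high >> 31                          # bit 31 of the high word
    exponent = (high >> 20) & 0x7FF            # bits 30..20 of the high word
    mantissa = ((high & 0xFFFFF) << 32) | low  # bits 19..0, then the whole low word
    return sign, exponent, mantissa

def rebuild(sign: int, exponent: int, mantissa: int) -> float:
    """Apply (-1)^s * F * 2^(E - 1023) with F = 1 + mantissa / 2^52 (normalized only)."""
    f = 1 + mantissa / 2 ** 52
    return (-1) ** sign * f * 2.0 ** (exponent - 1023)

fields = high_word_fields(-0.75)
print(fields, rebuild(*fields))
```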
FLOATING POINT ARITHMETIC
• fl(x) = nearest floating point number to x
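• Relative error (precision = s digits, base β)
– $|x - fl(x)| / |x| \le \tfrac{1}{2}\beta^{1-s}$; for β = 2 this is $2^{-s}$
• Arithmetic
– $x \oplus y = fl(x + y) = (x + y)(1 + \varepsilon)$ for $|\varepsilon| < u$
– $x \odot y = fl(x \circ y) = (x \circ y)(1 + \varepsilon)$ for $|\varepsilon| < u$, where ∘ is a basic arithmetic operation and u is the unit roundoff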
ULP (Unit in the Last Place): the spacing between adjacent floating point numbers near a given value, i.e. the smallest increment or decrement
that can be made at that magnitude using the machine's FP arithmetic.
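These bounds can be checked numerically. The sketch below assumes Python floats are IEEE 754 doubles (s = 53 binary digits, so the bound is 2^-53) with round-to-nearest, and uses exact rational arithmetic from `fractions` to measure the error. The name `relative_error` is illustrative.

```python
import sys
from fractions import Fraction

# Bound 2^-s for binary doubles with s = 53 significant bits.
u = Fraction(1, 2 ** 53)

def relative_error(exact: Fraction, computed: float) -> Fraction:
    """Exact relative error |exact - computed| / |exact| as a rational number."""
    return abs(exact - Fraction(computed)) / abs(exact)

# Representation error: fl(1/10) versus the true value 1/10.
err_repr = relative_error(Fraction(1, 10), 0.1)

# Arithmetic error: the machine sum a + b versus the exact sum of the two stored operands.
a, b = 0.1, 0.2
err_add = relative_error(Fraction(a) + Fraction(b), a + b)

# The ULP at 1.0 is the machine epsilon: adding it changes 1.0, adding half of it does not.
eps = sys.float_info.epsilon
print(float(err_repr), float(err_add), eps)
print(1.0 + eps > 1.0, 1.0 + eps / 2 == 1.0)
```

Note that `err_add` measures only the rounding of the addition itself; the operands 0.1 and 0.2 already carry their own representation errors.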
FLOATING POINT ADDITION
• Consider a 4-digit decimal example
– $9.999 \times 10^{1} + 1.610 \times 10^{-1}$
• 1. Align decimal points
– Shift number with smaller exponent
– $9.999 \times 10^{1} + 0.016 \times 10^{1}$
• 2. Add significands
– $9.999 \times 10^{1} + 0.016 \times 10^{1} = 10.015 \times 10^{1}$
• 3. Normalize result & check for over/underflow
– $1.0015 \times 10^{2}$
• 4. Round and renormalize if necessary
– $1.002 \times 10^{2}$
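The four steps can be sketched in Python for positive decimal operands. This is an illustrative model, not a hardware algorithm: significands are held as integers scaled by 10^(digits – 1) (so 9.999 is stored as 9999), digits shifted out during alignment are dropped as in the example, rounding is round-half-up, and the over/underflow check is omitted. The function name `fp_add` is hypothetical.

```python
def fp_add(sig_a: int, exp_a: int, sig_b: int, exp_b: int, digits: int = 4):
    """Add two positive decimal floating point numbers sig * 10^(exp - (digits - 1)).

    Returns (significand, exponent) with the significand again holding `digits` digits.
    """
    # Make operand a the one with the larger exponent.
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a

    # 1. Align: shift the significand with the smaller exponent to the right.
    sig_b //= 10 ** (exp_a - exp_b)

    # 2. Add significands.
    total, exp = sig_a + sig_b, exp_a

    # 3. Normalize and 4. round; repeat in case rounding overflows the digit count.
    limit = 10 ** digits
    while total >= limit:
        total, dropped = divmod(total, 10)
        if dropped >= 5:          # round half up on the digit shifted out
            total += 1
        exp += 1
    return total, exp

# 9.999 x 10^1 + 1.610 x 10^-1
print(fp_add(9999, 1, 1610, -1))
```

The returned pair (1002, 2) stands for 1.002 × 10^2, matching the hand calculation.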

Aviraj --floating point representation and arithmetic.pptx
