1. Fixed-Point Format
2. Double-Precision Fixed-Point Format
3. Floating-Point Format
4. Block Floating-Point Format
5. Dynamic Range and Precision
1 Fixed-Point Format:
1. The simplest scheme for number representation.
Uses a fixed number of bits to represent an integer or fraction.
2. Fixed-Point Signed Integer Representation
An n-bit fixed-point signed integer follows the format:
$$x = -s \cdot 2^{n-1} + b_{n-2} \cdot 2^{n-2} + \dots + b_1 \cdot 2^1 + b_0 \cdot 2^0$$
where:
s represents the sign of the number:
s = 0 for positive numbers and
s = 1 for negative numbers
The range of signed integer values:
$$-2^{n-1} \le x \le 2^{n-1} - 1$$
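As an illustration of the signed integer format, the sketch below decodes a bit string and lists the representable range. It assumes the two's-complement convention, in which the sign bit s carries a weight of -2^(n-1); the helper names `decode_signed_int` and `signed_int_range` are hypothetical and exist only for this example.

```python
def decode_signed_int(bits: str) -> int:
    """Decode a two's-complement bit string (MSB first) into an integer."""
    n = len(bits)
    s = int(bits[0])                  # sign bit, weight -2**(n-1)
    lower = bits[1:]                  # remaining n-1 bits, weights 2**(n-2) .. 2**0
    rest = int(lower, 2) if lower else 0
    return -s * 2 ** (n - 1) + rest


def signed_int_range(n: int) -> tuple:
    """Smallest and largest values an n-bit two's-complement integer can hold."""
    return -(2 ** (n - 1)), 2 ** (n - 1) - 1


print(decode_signed_int("0111"))   # 7
print(decode_signed_int("1000"))   # -8
print(signed_int_range(16))        # (-32768, 32767)
```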
3. Fixed-Point Signed Fraction Representation
A fraction can also be represented using a fixed number of bits.
An n-bit signed fraction has an implied binary point after the sign bit.
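The value is given by:
$$x = -s \cdot 2^{0} + b_{-1} \cdot 2^{-1} + b_{-2} \cdot 2^{-2} + \dots + b_{-(n-1)} \cdot 2^{-(n-1)}$$
so that $-1 \le x \le 1 - 2^{-(n-1)}$.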
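A minimal sketch of the signed fraction format for n = 16 (often called Q15) is given here. It assumes two's-complement storage and saturation at the ends of the range; the function names `float_to_q15` and `q15_to_float` are hypothetical.

```python
def float_to_q15(x: float) -> int:
    """Quantize x to a 16-bit signed fraction, saturating outside [-1, 1)."""
    scaled = int(round(x * 2 ** 15))           # binary point sits after the sign bit
    return max(-2 ** 15, min(2 ** 15 - 1, scaled))


def q15_to_float(q: int) -> float:
    """Interpret a 16-bit signed integer as a fraction with 15 fractional bits."""
    return q / 2 ** 15


print(float_to_q15(0.5))      # 16384
print(q15_to_float(-32768))   # -1.0
print(q15_to_float(32767))    # 0.999969482421875
```

The largest positive value falls one step short of +1, which is why +1.0 saturates to 32767 in this sketch.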
4. Handling Multiplication in Fixed-Point Format
Multiplication of integers may produce results requiring more bits.
If the result must fit in the original bit width, overflow (wraparound) occurs.
Fractional representation can mitigate this issue:
Multiplying two fractions produces another fraction.
The result can retain the same bit width by discarding less significant
bits.
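The contrast between integer and fractional multiplication can be sketched as follows. The example assumes 16-bit two's-complement words and Q15 fractions; `wrap_to_bits` and `q15_mul` are hypothetical helpers, and truncation is used to discard the low bits.

```python
def wrap_to_bits(value: int, n: int) -> int:
    """Keep only the low n bits of value and reinterpret them as two's complement."""
    value &= (1 << n) - 1
    return value - (1 << n) if value & (1 << (n - 1)) else value


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 fractions and drop the 15 least significant product bits."""
    return (a * b) >> 15


# Integer case: 300 * 200 = 60000 does not fit in 16 bits and wraps around.
print(wrap_to_bits(300 * 200, 16))              # -5536

# Fraction case: 0.5 * 0.25 = 0.125 stays inside [-1, 1).
a, b = 16384, 8192                              # 0.5 and 0.25 in Q15
print(q15_mul(a, b), q15_mul(a, b) / 2 ** 15)   # 4096 0.125
```

One case still needs care in a real implementation: (-1) x (-1) gives +1, which is just outside the Q15 range, so practical code usually saturates that product.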
2 Double-Precision Fixed-Point Format
1. Definition
To increase the range of numbers in fixed-point format, one approach
is to double the bit size.
2. Benefits of Double-Precision Format
Extends the range of representable numbers.
Provides higher precision.
3. Drawbacks of Double-Precision Format
Requires double the storage space.
May double the number of memory accesses when the DSP device has
3 Floating-Point Format
1. Purpose and Need for Floating-Point Representation
In DSP applications, some algorithms involve the summation of a
large number of products.
A large number of bits is required to represent signals and allow for
adequate signal growth.
Since processors have a limited number of bits, some DSP processors
use a floating-point format for computations.
2. Floating-Point Representation
Mantissa (Mx) – Represents the significant digits.
Exponent (Ex) – Determines the scaling factor.
The value of a floating-point number x is given by:
$$x = M_x \cdot 2^{E_x}$$
3. Multiplication of Floating-Point Numbers
If two floating-point numbers, x and y, are multiplied:
$$xy = M_x M_y \cdot 2^{E_x + E_y}$$
The multiplication process requires:
A multiplier for the mantissas.
An adder for the exponents.
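The two hardware operations listed above can be mimicked in a few lines. This is only a sketch: `fp_multiply` is a hypothetical helper, and Python's `math.frexp` is used to obtain a mantissa in [0.5, 1) and an integer exponent, which is an assumption about the mantissa convention rather than something the format requires.

```python
import math


def fp_multiply(mx: float, ex: int, my: float, ey: int):
    """Multiply x = mx * 2**ex by y = my * 2**ey; return (mantissa, exponent)."""
    return mx * my, ex + ey        # one multiplier for mantissas, one adder for exponents


mx, ex = math.frexp(6.0)           # 6.0 = 0.75 * 2**3
my, ey = math.frexp(0.5)           # 0.5 = 0.5  * 2**0
mz, ez = fp_multiply(mx, ex, my, ey)
print(mz, ez, math.ldexp(mz, ez))  # 0.375 3 3.0
```

The product mantissa 0.375 has dropped below 0.5, so a real floating-point unit would follow this with a renormalization step (shift the mantissa, adjust the exponent).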
4. Addition of Floating-Point Numbers
Before adding floating-point numbers, their exponents must be
made equal (normalized).
This normalization step shifts the mantissa of the number with the smaller
exponent to the right until its exponent matches the larger one.
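The alignment step before addition can be sketched in the same mantissa/exponent notation. The helper `fp_add` is hypothetical, and the right shift of the mantissa is modeled as a division by a power of two.

```python
import math


def fp_add(mx: float, ex: int, my: float, ey: int):
    """Add x = mx * 2**ex and y = my * 2**ey after aligning to the larger exponent."""
    if ex < ey:                          # make x the operand with the larger exponent
        mx, ex, my, ey = my, ey, mx, ex
    my_aligned = my / 2 ** (ex - ey)     # right-shift the smaller operand's mantissa
    return mx + my_aligned, ex


m, e = fp_add(0.75, 3, 0.5, 1)           # 6.0 + 1.0
print(m, e, math.ldexp(m, e))            # 0.875 3 7.0
```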
5. IEEE-754 Single-Precision Floating-Point Representation
A commonly used representation is IEEE-754 single-precision format.
The value of a number in this format is given by:
$$x = (-1)^{S} \times 2^{E - 127} \times 1.F$$
where:
S is the sign bit (0 for positive, 1 for negative).
F is the fractional part of the mantissa.
E is the exponent, stored as an integer with a bias.
The range of fractional numbers that can be represented in the
mantissa is:
$$0 \le F \le 1 - 2^{-23}$$
so the mantissa 1.F lies between 1 and $2 - 2^{-23}$.
6. Bias and Exponent Range
The bias depends on the number of bits reserved for the exponent:
In single-precision (8-bit exponent):
Bias = 127
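Stored exponent range: E = 0 to 255, so E - 127 spans -127 to 128.
(IEEE-754 reserves E = 0 and E = 255 for special values, which leaves -126 to 127 for normalized numbers.)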
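To make the roles of S, E, F and the bias concrete, the sketch below splits a single-precision value into its three fields and rebuilds it. It uses Python's standard `struct` module; `decode_float32` is a hypothetical helper, and the reconstruction covers normalized numbers only (0 < E < 255).

```python
import struct


def decode_float32(x: float):
    """Split x (stored as IEEE-754 single precision) into S, E, F and rebuild it."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31                  # 1 sign bit
    e = (bits >> 23) & 0xFF         # 8-bit exponent, stored with a bias of 127
    f = bits & 0x7FFFFF             # 23-bit fraction field
    # Normalized numbers only: value = (-1)^S * 1.F * 2^(E - 127)
    value = (-1) ** s * (1 + f / 2 ** 23) * 2.0 ** (e - 127)
    return s, e, f, value


print(decode_float32(-6.5))         # (1, 129, 5242880, -6.5)
```

Here -6.5 = -1.625 x 2^2, so the stored exponent is 2 + 127 = 129 and the fraction field holds 0.625 x 2^23.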
7. Disadvantages of Floating-Point Format
While the floating-point format increases the range of representable
numbers, it reduces speed because of the additional
computational steps:
Multiplication requires exponent addition and mantissa multiplication.
Addition requires normalization before performing mantissa addition.
4 Block Floating-Point Format
1. Purpose
Increases the range and precision of fixed-point numbers without
requiring full floating-point hardware.
2. Representation
A block of fixed-point numbers shares a common exponent, stored
separately.
Each number in the block has a different mantissa, stored in fixed-
point format.
3. Computation
The common exponent is determined by the smallest number
of leading zeros in the block, that is, by the largest-magnitude value.
All numbers in the block are shifted left by that amount to maximize
precision within the given fixed-point format.
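The computation above can be sketched as follows, assuming 16-bit signed integer mantissas and a shared exponent stored as the negative of the common left shift. The names `block_float_encode` and `block_float_decode` are hypothetical.

```python
def block_float_encode(block, n_bits=16):
    """Shift every value in the block left by one shared amount."""
    largest = max(abs(v) for v in block)
    # Unused leading bits of the largest magnitude (sign bit excluded);
    # this is the smallest leading-zero count found in the block.
    shift = max(0, (n_bits - 1) - largest.bit_length())
    mantissas = [v << shift for v in block]
    return mantissas, -shift             # value = mantissa * 2**exponent


def block_float_decode(mantissas, exponent):
    """Recover the original values from the mantissas and the common exponent."""
    return [m * 2.0 ** exponent for m in mantissas]


mantissas, exponent = block_float_encode([3, -100, 25, 7])
print(mantissas, exponent)                       # [768, -25600, 6400, 1792] -8
print(block_float_decode(mantissas, exponent))   # [3.0, -100.0, 25.0, 7.0]
```

The largest value, 100, needs 7 magnitude bits, so 8 of the 15 available bits are free and every value in the block is shifted left by 8.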
5 Dynamic Range and Precision
1. Dynamic Range
Definition: The ratio of the largest to the smallest nonzero magnitude that can be
represented in a given number representation format.
Impact of Bits: More bits increase the dynamic range.
Formula: Dynamic range in dB = $20 \log_{10}(\text{max}/\text{min})$; every additional bit increases it by about 6 dB ($20 \log_{10} 2 \approx 6.02$).
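The 6 dB per bit rule can be checked numerically. The sketch assumes an n-bit signed fixed-point format in which the largest magnitude is 2^(n-1) - 1 steps and the smallest nonzero magnitude is 1 step; `dynamic_range_db` is a hypothetical helper.

```python
import math


def dynamic_range_db(n_bits: int) -> float:
    """Dynamic range in dB, taken as 20*log10(largest / smallest nonzero magnitude)."""
    largest = 2 ** (n_bits - 1) - 1      # in units of the smallest step
    return 20 * math.log10(largest)


print(round(dynamic_range_db(16), 1))                          # 90.3
print(round(dynamic_range_db(17) - dynamic_range_db(16), 2))   # 6.02
```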
2. Resolution
Definition: The smallest value that can be represented in a given format.
Formula: For an n-bit signed fraction, resolution = $2^{-(n-1)}$.
Higher bits → Better resolution.
3. Trade-Off Between Precision and Speed
More bits → Higher precision → Slower processing.
Truncation vs. Rounding:
Truncation: Simply cuts off extra bits → Faster but less accurate.
Rounding: Adjusts values properly → More accurate but slower.
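The difference between the two can be seen when the low bits of a wider result are discarded. This is a sketch with hypothetical helpers `truncate` and `round_nearest`; rounding is implemented as round-half-up, which is one common choice among several.

```python
def truncate(value: int, drop: int) -> int:
    """Discard the `drop` least significant bits with a single shift."""
    return value >> drop


def round_nearest(value: int, drop: int) -> int:
    """Add half of the last kept bit's weight before shifting (round half up)."""
    return (value + (1 << (drop - 1))) >> drop


product = 0b0110_1110                    # 110, i.e. 6.875 after dropping 4 bits
print(truncate(product, 4), round_nearest(product, 4))   # 6 7
```

Rounding costs one extra addition per result, which is the speed penalty mentioned above; truncation always errs in the same direction, so its error accumulates as a bias over many operations.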