Mathematical Fundamentals
numbers. Often, the rational numbers are considered to be normalized to one,
i.e., to be limited to the range [-1, 1). In such a case, the decimal point is placed
before the leftmost binary digit.
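As an illustration of numbers normalized to [-1, 1), here is a minimal sketch. It assumes a 16-bit two's-complement word with 15 fractional bits; the text does not fix a word length. The names `to_fixed` and `from_fixed` are hypothetical helpers introduced only for this example.

```python
def to_fixed(x, bits=16):
    """Quantize x in [-1, 1) to a signed integer code (assumed two's complement)."""
    scale = 1 << (bits - 1)              # 2**15 for a 16-bit word
    code = int(round(x * scale))         # nearest representable code
    # Clamp to the representable range [-scale, scale - 1].
    return max(-scale, min(scale - 1, code))


def from_fixed(code, bits=16):
    """Map an integer code back to the rational number it represents."""
    return code / (1 << (bits - 1))


print(to_fixed(0.5))                     # 16384
print(from_fixed(to_fixed(-1.0)))        # -1.0
print(from_fixed(to_fixed(1.0)))         # largest code, just below 1
```

The value 1.0 itself is not representable, which is why the range is half-open: the sketch clamps it to the largest code, one quantization step below 1.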
For the floating-point representation we can follow different conventions. In
particular, the IEEE 754 single-precision floating-point numbers obey the
following rules:
the number is represented as
$1.xx \ldots x \times 2^{yy \ldots y}$
where x are the binary digits of the mantissa and y are the binary digits
of the exponent
the number is represented on 32 bits according to the following layout:
bit 31: sign bit
bits 23-30: exponent yy...y in biased representation, from the
most negative 00...0 to the most positive 11...1
bits 0-22: mantissa in unsigned binary representation
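The 32-bit layout just described can be inspected with a short sketch like the following. It uses Python's standard `struct` module to get the raw bits. The helper names `float32_fields` and `float32_value` are hypothetical. The reconstruction formula holds only for normalized numbers, that is, when the exponent field is neither all zeros nor all ones.

```python
import struct


def float32_fields(x):
    """Return (sign, biased_exponent, mantissa) of x stored in single precision."""
    # Pack as a big-endian 32-bit float, then reinterpret as an unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                    # bit 31
    exponent = (bits >> 23) & 0xFF       # bits 23-30, biased by 127
    mantissa = bits & 0x7FFFFF           # bits 0-22
    return sign, exponent, mantissa


def float32_value(sign, exponent, mantissa):
    """Rebuild a normalized value as (-1)^sign * 1.mantissa * 2^(exponent - 127)."""
    return (-1.0) ** sign * (1.0 + mantissa / 2.0**23) * 2.0 ** (exponent - 127)


print(float32_fields(-6.5))              # (1, 129, 5242880)
print(float32_value(1, 129, 5242880))    # -6.5
```

Here -6.5 equals -1.625 times 2 to the power 2, so the stored exponent is 2 + 127 = 129 and the mantissa field holds the fractional part 0.625 scaled by 2 to the power 23.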
The IEEE 754 double-precision floating-point format uses 11 bits
for the exponent and 52 bits for the mantissa which, together with the sign bit,
gives 64 bits in total.
It should be clear that both the fixed- and the floating-point representations
cover only a subset of the rational numbers. Fixed-point numbers are equally spaced
between the minimum and the maximum representable value, with a quantization
step equal to $2^{-d}$, where d is the number of digits to the right of the decimal
point. Floating-point numbers are unevenly distributed, being sparser
for large values of the exponent and denser for small exponents. Floating-point
numbers can represent a large range, from $2^{-126} \approx 10^{-38}$ to
$2^{128} \approx 10^{38}$ in single precision, and from $2^{-1022} \approx 10^{-308}$
to $2^{1024} \approx 10^{308}$ in double precision.
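The difference between uniform and non-uniform spacing can be checked numerically. The sketch below assumes d = 15 fractional digits for the fixed-point case, a value chosen only for illustration. It relies on `math.ulp`, available from Python 3.9, which returns the gap between a double-precision number and the next larger one.

```python
import math
import sys

# Fixed point: the step is the same everywhere in the range.
d = 15                                   # assumed number of fractional digits
fixed_step = 2.0 ** -d

# Floating point (double precision): the gap grows with the exponent.
gaps = {x: math.ulp(x) for x in (1.0, 1024.0, 2.0**52)}
for x, gap in gaps.items():
    print(f"gap after {x:g} is {gap:g}")

# Range limits of double precision for normalized numbers.
smallest_normal = sys.float_info.min     # equals 2**-1022
largest = sys.float_info.max             # just below 2**1024
print(fixed_step, smallest_normal, largest)
```

The gap after 1.0 is 2 to the power -52, while the gap after 2 to the power 52 is already 1.0, so integers beyond that point are no longer all representable.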
Therefore, it is possible to do many computations without worrying about errors
due to overflow. Moreover, the high density of small numbers reduces the problems
due to the quantization step. This is paid in terms of a more complicated
The bias is 127. Therefore, the exponent 1 is coded as 1 + 127 = 128 = $10000000_2$. The
biased representation simplifies the bit-oriented sorting operations.
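The note on sorting can be illustrated with a small sketch. Because the biased exponent grows monotonically and sits above the mantissa, the raw bit patterns of positive single-precision numbers, compared as unsigned integers, are ordered like the numbers themselves. The helpers `biased_exponent` and `float32_bits` are hypothetical names, and the check is restricted to positive values, where the sign bit plays no role.

```python
import struct


def biased_exponent(e, bias=127):
    """Return the 8-bit code of exponent e in biased representation."""
    return format(e + bias, "08b")


def float32_bits(x):
    """Return the single-precision bit pattern of x as an unsigned integer."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits


print(biased_exponent(1))                # 10000000
print(biased_exponent(-126))             # 00000001

# Sorting positive numbers by their bit patterns gives the numeric order.
values = [0.75, 3.0, 0.001, 1024.0, 1.5]
by_bits = sorted(values, key=float32_bits)
print(by_bits == sorted(values))         # True
```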