
without any precision loss. If successive operations use operands represented
with b bits, it is clear that the least-significant bits must be eliminated,
thus introducing a quantization. The effects of these quantizations can be
studied by resorting to the additive white noise model, where the noises are
injected at the points where the quantization actually occurs.
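Under this model, the quantizer can be simulated and the injected error measured directly. The following sketch (the bit width and the test signal are illustrative choices, not taken from the text) compares the measured error variance with the value q²/12 predicted for an error uniformly distributed in [-q/2, q/2]:

```python
import numpy as np

b = 8                       # bits per operand (illustrative choice)
q = 2.0 ** (-(b - 1))       # quantization step for signals in [-1, 1)

def quantize(x, step):
    """Round to the nearest multiple of `step` (rounding quantizer)."""
    return step * np.round(x / step)

# An illustrative test signal, well inside the representable range.
x = 0.7 * np.sin(2 * np.pi * 440 * np.arange(48000) / 48000)

e = quantize(x, q) - x      # the "noise" injected by the additive model

# With rounding, the error never exceeds q/2 in magnitude, and its
# variance is close to the q**2 / 12 predicted by the white-noise model.
print(e.var(), q**2 / 12)
```

The point of injection here is the quantizer itself; in a filter structure, one such noise source would be placed at each point where a wordlength reduction occurs.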

Both phenomena can be expressed as nonzero signals that are maintained even
when the system has stopped producing useful signals. Limit cycles are usually
small oscillations due to the fact that, because of rounding, the sources of
quantization noise determine a local amplification or attenuation of the
signal (see fig. 4). If the signals within the system have a physical meaning
(e.g., they are propagating waves), limit cycles can be avoided by forcing a
lossy quantization, which always truncates numbers toward zero. This operation
corresponds to introducing a small numerical dissipation. Overflow
oscillations are more serious, because they produce signals as large as the
maximum amplitude that can be represented. They can be produced by operations
whose results exceed the largest representable number, so that the result is
wrapped around into the legal range of two's complement numbers. Such a
destructive oscillation can be avoided by using overflow-protected operations,
which saturate the result to the largest representable number (or to the most
negative representable number).
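The difference between plain wraparound and an overflow-protected operation can be sketched with two's complement integer addition (the 16-bit word length is an illustrative choice):

```python
B = 16                              # word length (illustrative)
LO, HI = -2**(B - 1), 2**(B - 1) - 1   # two's complement range

def wrap_add(a, b):
    """Plain two's complement addition: overflow wraps around."""
    return (a + b - LO) % 2**B + LO

def sat_add(a, b):
    """Overflow-protected addition: the result saturates at the range ends."""
    return max(LO, min(HI, a + b))

# Adding two large positive numbers overflows the 16-bit range:
print(wrap_add(30000, 10000))   # wraps to a large negative value: -25536
print(sat_add(30000, 10000))    # clamps to the largest value: 32767
```

The wrapped result jumps from near the positive extreme to near the negative extreme, which is exactly the kind of full-scale signal that can sustain an overflow oscillation in a recursive structure; the saturated result stays as close as possible to the true sum.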

Both phenomena are nonlinearities, since any linear and stable system cannot
give a persistent nonzero output with a zero input.
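An illustrative sketch of the limit-cycle mechanism (the coefficient, initial state, and quantization step are hypothetical choices, not from the text): a first-order recursion with zero input sustains a nonzero output when rounding is used, while magnitude truncation toward zero dissipates it:

```python
def round_q(x, step):
    """Rounding quantizer: nearest multiple of `step`."""
    return step * round(x / step)

def trunc_q(x, step):
    """Lossy quantizer: always truncates toward zero."""
    return step * int(x / step)    # int() truncates toward zero in Python

def run(quantizer, a=0.95, y0=0.5, step=2**-7, n=200):
    """First-order recursion y[n] = Q(a * y[n-1]) with zero input."""
    y = y0
    for _ in range(n):
        y = quantizer(a * y, step)
    return y

print(run(round_q))   # settles on a small nonzero value: a limit cycle
print(run(trunc_q))   # magnitude truncation dissipates it: exactly 0.0
```

With rounding, once the state is small enough, the rounding step exactly cancels the attenuation of the coefficient and the output never decays to zero; truncation toward zero always removes a little energy, so the recursion dies out.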


linearly in the amplitude range. The idea, reminiscent of the
quasi-logarithmic sensitivity of the ear, is to have many more levels where
signals are small and a coarser quantization for large amplitudes. This is
justified if the signals being quantized do not have a uniform statistical
distribution but tend to assume small amplitudes more often than large ones.
Usually the distribution of levels is exponential, in such a way that the
intervals between points increase exponentially with magnitude. This kind of
quantization is called logarithmic because, in practical realizations, a
logarithmic compressor precedes a linear quantization stage [69].
Floating-point quantization can be considered as a piecewise-linear
logarithmic quantization, where each linear piece corresponds to a value of
the exponent.
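One classic realization of such a compressor is the mu-law characteristic used in telephony; the sketch below uses it as an illustrative choice (it is not necessarily the compressor of [69]). The signal is compressed, quantized linearly, and expanded, so the relative error stays of similar order at small and large amplitudes:

```python
import numpy as np

def mu_law_quantize(x, mu=255.0, bits=8):
    """Logarithmic quantization: compress, quantize linearly, expand."""
    # Compressor: maps [-1, 1] to [-1, 1] with finer resolution near 0.
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    # Uniform quantizer on the compressed signal.
    step = 2.0 ** (-(bits - 1))
    yq = step * np.round(y / step)
    # Expander: exact inverse of the compressor.
    return np.sign(yq) * ((1 + mu) ** np.abs(yq) - 1) / mu

# The relative error is of comparable order at both scales, whereas a
# uniform quantizer with the same number of levels would be far worse
# at the small amplitude.
for x in (0.01, 0.5):
    xq = float(mu_law_quantize(np.array(x)))
    print(x, abs(xq - x) / x)
```

The same mechanism explains why floating-point quantization behaves like a piecewise-linear approximation of this curve: within each exponent value the mantissa is quantized uniformly, and the step size doubles from one exponent to the next.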