
Figure 8: Degree of consonance between two sine tones as a function of their
frequency distance, measured as a fraction of the critical bandwidth
(measurements by Plomp and Levelt (1965), also reported in [105]).

threshold level in order to become audible in the presence of the masker. This
phenomenon is called masking, and it is sketched in figure 9. Masking is,
however, ill-defined in the immediate proximity of the masker, because there
the presence of beats may make the interference between masker and masked tone
apparent.

more effective towards high frequencies (note also the log scale in frequency).
Second, high-intensity maskers spread their effects even more towards high
frequencies. The latter phenomenon is called upward spread of masking, and it is
due to the nonlinear behavior of the outer hair cells of the cochlea, whose
stiffness depends on the excitation they receive [4]. A high-frequency cell,
excited by a lower-frequency tone, increases its stiffness and becomes less
sensitive to components at its characteristic frequency.
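
The asymmetry and the upward spread of masking can be made concrete with a toy
spreading function. The sketch below is only an illustration: the name
spread_db, the triangular shape on a critical-band (Bark) axis, and all the
slope constants are assumptions chosen for the example, not values taken from
the text or from a specific psychoacoustic model.

```python
def spread_db(dz_bark, masker_db, lower_slope=27.0):
    """Masking threshold (dB) at a distance dz_bark (in Bark) from a masker.

    Toy triangular model. The skirt towards low frequencies (dz_bark < 0)
    has a fixed slope, while the skirt towards high frequencies becomes
    shallower as the masker level grows. All constants are illustrative
    assumptions.
    """
    if dz_bark < 0:
        slope = lower_slope
    else:
        # Hypothetical level dependence: louder maskers give a flatter upper skirt.
        slope = max(5.0, 24.0 - 0.2 * masker_db)
    return masker_db - slope * abs(dz_bark)


for level in (40.0, 60.0, 80.0):
    # Threshold drop relative to the masker level, two Bark above and below it.
    above = spread_db(2.0, level) - level
    below = spread_db(-2.0, level) - level
    print(f"masker at {level:.0f} dB: {above:.1f} dB above, {below:.1f} dB below")
```

With these assumed constants the drop on the low-frequency side is the same
for every masker level, whereas on the high-frequency side it shrinks as the
masker gets louder, which is the qualitative behavior described above.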

cerned, so that it may well happen that, in a tone with a few dozen partials,
only five or six emerge from the collective masking threshold. In a sound coding
task, it makes sense to spend all our resources (i.e., the bits) on encoding
those partials, neglecting the components that are masked. This idea is the
basis for perceptual audio coding, as found in the MPEG-1 standard [69].
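
The selection of the partials that emerge from a collective masking threshold
can be sketched as follows. This is a minimal illustration under stated
assumptions, not the procedure of MPEG-1 or of any real coder: the function
names, the triangular spreading with fixed slopes, the 10 dB offset, and the
use of the maximum to combine individual thresholds are all choices made for
the example. The Hz-to-Bark conversion uses the common Zwicker-Terhardt
approximation, which is also not part of the text.

```python
import math


def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the critical-band (Bark) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def masking_level(masker_hz, masker_db, probe_hz,
                  offset_db=10.0, lower_slope=27.0, upper_slope=10.0):
    """Threshold (dB) that a single masker imposes at probe_hz.

    Toy triangular model in Bark; offset and slopes are assumed values.
    The shallower upper slope makes masking more effective towards high
    frequencies.
    """
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    slope = upper_slope if dz >= 0 else lower_slope
    return masker_db - offset_db - slope * abs(dz)


def audible_partials(partials):
    """Keep the (frequency_hz, level_db) pairs that exceed the collective
    threshold produced by all the other partials (combined here with max)."""
    kept = []
    for i, (f, level) in enumerate(partials):
        threshold = max(
            (masking_level(fm, lm, f)
             for j, (fm, lm) in enumerate(partials) if j != i),
            default=float("-inf"),
        )
        if level > threshold:
            kept.append((f, level))
    return kept


# A strong partial at 1 kHz, a weak neighbor at 1.1 kHz, a weak distant one at 5 kHz.
tone = [(1000.0, 80.0), (1100.0, 40.0), (5000.0, 30.0)]
print(audible_partials(tone))
```

In this example the weak partial next to the strong one falls below the
threshold, while the equally weak but distant partial survives. A coder built
on this idea would spend its bits only on the surviving partials.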

the effects of masking extend into the future for up to 40 ms (forward masking),
and into the past for up to 10 ms (backward masking). These temporal effects
may occur because the brain integrates sound information over time, and there
are inherent delays in this operation. Therefore, a soft tone preceding a louder
tone by a couple of milliseconds is likely to be simply erased from our
perception.
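
The temporal windows quoted above can be turned into a simple check. The
sketch below is deliberately crude and rests on assumptions: the function name
may_be_masked is hypothetical, the windows are treated as hard limits (in
reality the effect decays gradually), and the frequency distance between the
two tones is ignored.

```python
FORWARD_MS = 40.0   # masking extends up to 40 ms after the masker (from the text)
BACKWARD_MS = 10.0  # and up to 10 ms before it (from the text)


def may_be_masked(probe_time_ms, masker_time_ms, probe_db, masker_db):
    """Return True if a softer probe falls inside the temporal masking window.

    Simplifying assumptions: hard window edges, no frequency dependence,
    and any louder sound counts as a potential masker.
    """
    if probe_db >= masker_db:
        return False
    lag = probe_time_ms - masker_time_ms  # negative: probe precedes the masker
    return -BACKWARD_MS <= lag <= FORWARD_MS


# A soft tone 2 ms before a loud one (backward masking) and 30 ms after it
# (forward masking), followed by two cases outside the windows.
print(may_be_masked(98.0, 100.0, 30.0, 70.0))
print(may_be_masked(130.0, 100.0, 30.0, 70.0))
print(may_be_masked(80.0, 100.0, 30.0, 70.0))
print(may_be_masked(150.0, 100.0, 30.0, 70.0))
```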