D. Rocchesso: Sound Processing
where $A(m)e^{j\varphi(m)}$ contains the amplitude and instantaneous phase of the sinusoid that falls within the $k$-th bin, and $W\left(\frac{2\pi}{N}k - \omega_i(m)\right)$ is the window transform.
If we have access to the instantaneous phase, we can deduce the instantaneous
frequency by a backward difference between two adjacent frames. This can be done as
long as we deal with the problem of phase unwrapping, which arises because the
phase is known only modulo $2\pi$.
It can be shown [52, pp. 287-288] that phase unwrapping is unambiguous under the following assumption.
Assumption 2 Let $H$ be the hop size and $\frac{2\pi}{N}$ the separation between adjacent bins, and let
\[ \frac{2\pi}{N} H < \pi . \tag{26} \]
Assumption 2 holds for rectangular windows and imposes $H < \frac{N}{2}$. For
Hann or Hamming windows the hop size must be such that $H < \frac{N}{4}$ (75%
overlap). Therefore, the frame rate to be used for accurate partial estimation is
higher than the minimal frame rate needed for perfect reconstruction.
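The backward-difference estimate of the instantaneous frequency can be sketched as follows. This is a minimal illustration, not the procedure of [52]: the function names and the test values are assumptions made here, and the phases are simulated directly from a known sinusoid rather than taken from an actual short-time Fourier transform. The sketch assumes that the deviation of the true frequency from the bin center, multiplied by the hop size, is smaller than $\pi$ in magnitude, so that wrapping the phase difference is unambiguous.

```python
import math

def principal_argument(phase):
    """Wrap a phase value into the interval [-pi, pi)."""
    return (phase + math.pi) % (2.0 * math.pi) - math.pi

def instantaneous_frequency(phase_prev, phase_curr, k, n_fft, hop):
    """Estimate the frequency (radians per sample) seen in bin k.

    phase_prev and phase_curr are the wrapped phases of bin k in two
    adjacent frames that are hop samples apart.
    """
    bin_freq = 2.0 * math.pi * k / n_fft      # center frequency of bin k
    expected_advance = bin_freq * hop         # phase advance of a sinusoid at the bin center
    # Deviation from the expected advance, wrapped to remove the 2*pi ambiguity
    deviation = principal_argument(phase_curr - phase_prev - expected_advance)
    return bin_freq + deviation / hop

# Hypothetical example: N = 64, H = N/4 = 16, sinusoid at 0.3 rad/sample (near bin 3)
n_fft, hop, k = 64, 16, 3
true_freq = 0.3
phase_prev = (true_freq * 5 * hop) % (2.0 * math.pi)   # wrapped phase at frame 5
phase_curr = (true_freq * 6 * hop) % (2.0 * math.pi)   # wrapped phase at frame 6
estimate = instantaneous_frequency(phase_prev, phase_curr, k, n_fft, hop)
```

With a larger hop size the wrapped deviation may no longer correspond to the true one, which is why the hop size has to be bounded.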
4.2 Linear predictive coding
(with Federico Fontana)
The analysis/synthesis method known as linear predictive coding (LPC) was
introduced in the sixties as an efficient and effective means of synthesizing
speech and transmitting speech signals [92]. The efficiency of the method is
due to the speed of the analysis algorithm and to the low bandwidth required
for the encoded signals. The effectiveness is related to the intelligibility of the
decoded vocal signal.
LPC implements a type of vocoder [10], which is an analysis/synthesis
scheme where the spectrum of a source signal is weighted by the spectral components
of the target signal that is being analyzed. The phase vocoder of figures 2
and 5 is a special kind of vocoder where the amplitude and phase information of the
analysis channels is retained and can be used as weights for complex sinusoids
in the synthesis stage.
In the standard formulation of LPC, the source signal is either white
noise or a pulse train, thus resembling unvoiced or voiced excitations of the
vocal tract, respectively.
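The two kinds of source signal can be sketched as follows. This is only an illustration: the function names, the Gaussian distribution of the noise, and the fixed seed are assumptions made here, not part of the LPC formulation. A periodic pulse train stands for the voiced excitation, and white noise for the unvoiced one.

```python
import random

def pulse_train(length, period):
    """Unit pulses every `period` samples (voiced-like excitation)."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]

def white_noise(length, seed=0):
    """Zero-mean Gaussian white noise (unvoiced-like excitation).

    The seed is fixed only to make the example reproducible.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

voiced_source = pulse_train(10, 4)
unvoiced_source = white_noise(10)
```

The period of the pulse train sets the pitch of the synthetic voiced sound.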
The basic assumption behind LPC is the correlation between the n-th sample
and the P previous samples of the target signal. Namely, the n-th signal sample
is represented as a linear combination of the previous P samples, plus a residual
representing the prediction error:
\[ x(n) = -a_1 x(n-1) - a_2 x(n-2) - \dots - a_P x(n-P) + e(n) . \tag{27} \]
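Solving (27) for $e(n)$ gives the prediction error for a given set of coefficients. The sketch below computes it directly; the function name is hypothetical, and samples before the start of the signal are assumed to be zero, which is a choice made here for simplicity.

```python
def prediction_residual(x, a):
    """Return e(n) = x(n) + a_1 x(n-1) + ... + a_P x(n-P) for every n.

    x is the target signal and a = [a_1, ..., a_P] are the predictor
    coefficients. Samples before the start of x are taken as zero.
    """
    order = len(a)
    residual = []
    for n in range(len(x)):
        e = x[n]
        for k in range(1, order + 1):
            if n - k >= 0:
                e += a[k - 1] * x[n - k]
        residual.append(e)
    return residual

# Signal generated by x(n) = 0.5 x(n-1) + e(n) with a unit impulse as e(n),
# that is, a first-order model with a_1 = -0.5
x = [1.0, 0.5, 0.25, 0.125]
a = [-0.5]
residual = prediction_residual(x, a)   # [1.0, 0.0, 0.0, 0.0]
```

When the coefficients match the model that generated the signal, the residual reduces to the original excitation, here a single impulse.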
Equation (27) is an autoregressive formulation of the target signal, and the
analysis problem is equivalent to the identification of the coefficients $a_1, \dots, a_P$
of an all-pole filter. If we try to minimize the error in a mean square sense, the
problem translates into a set of $P$ equations
\[ \sum_{k=1}^{P} a_k \sum_{n} x(n-k)\, x(n-i) = -\sum_{n} x(n)\, x(n-i) , \tag{28} \]