
D. Rocchesso: Sound Processing


where $A(m)e^{j\varphi(m)}$ contains the amplitude and instantaneous phase of the sinusoid that falls within the $k$-th bin, and $W\!\left(\frac{2\pi}{N}k - \omega_i(m)\right)$ is the window transform.

If we have access to the instantaneous phase, we can deduce the instantaneous frequency from the backward difference of the phase between two adjacent frames. This can be done as long as we deal with the problem of phase unwrapping, which arises because the phase is known only modulo $2\pi$.

It can be shown [52, pp. 287–288] that phase unwrapping is unambiguous under

Assumption 2 Let $H$ be the hop size and $\frac{2\pi}{N}$ the separation between adjacent bins, and let

$$\frac{2\pi}{N} H < \pi . \qquad (26)$$

Assumption 2 holds for rectangular windows and imposes $H < \frac{N}{2}$. For Hann or Hamming windows the hop size must be such that $H < \frac{N}{4}$ (75% overlap). Therefore the frame rate to be used for accurate partial estimation is higher than the minimal frame rate needed for perfect reconstruction.
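The frequency estimate described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure of [52]: the function names `principal_value` and `instantaneous_frequency` and their parameters are assumptions introduced here. The idea is to remove the phase advance expected for a sinusoid at the bin centre, wrap what remains into $[-\pi, \pi)$, and convert it back into a frequency deviation. The wrapped deviation is the true one only if it stays below $\pi$ in magnitude, which is what the bound on the hop size is meant to guarantee.

```python
import math


def principal_value(phase):
    """Wrap a phase in radians into the interval [-pi, pi)."""
    return (phase + math.pi) % (2.0 * math.pi) - math.pi


def instantaneous_frequency(phase_prev, phase_curr, k, n_fft, hop):
    """Estimate the frequency (radians per sample) of the partial in bin k
    from the phases measured in two adjacent frames, `hop` samples apart."""
    bin_freq = 2.0 * math.pi * k / n_fft   # centre frequency of bin k
    expected = bin_freq * hop              # phase advance of a bin-centred sinusoid
    # Backward phase difference, minus the expected advance, wrapped to [-pi, pi)
    deviation = principal_value(phase_curr - phase_prev - expected)
    return bin_freq + deviation / hop


# Hypothetical test case: a partial 0.3 bins above the centre of bin 5.
N, H, K = 64, 16, 5
true_freq = 2.0 * math.pi * (K + 0.3) / N
phase_a = 0.7
phase_b = principal_value(phase_a + true_freq * H)  # phase is only known modulo 2*pi
estimate = instantaneous_frequency(phase_a, phase_b, K, N, H)
```

Here `estimate` should agree with `true_freq` up to rounding, because the deviation accumulated over one hop is well below $\pi$.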

4.2 Linear predictive coding

(with Federico Fontana)

The analysis/synthesis method known as linear predictive coding (LPC) was introduced in the sixties as an efficient and effective means of achieving synthetic speech and speech signal communication [92]. The efficiency of the method is due to the speed of the analysis algorithm and to the low bandwidth required for the encoded signals. The effectiveness is related to the intelligibility of the decoded vocal signal.
LPC implements a type of vocoder [10], which is an analysis/synthesis scheme where the spectrum of a source signal is weighted by the spectral components of the target signal that is being analyzed. The phase vocoder of figures 2 and 5 is a special kind of vocoder where amplitude and phase information of the analysis channels is retained and can be used as weights for complex sinusoids in the synthesis stage.
In the standard formulation of LPC, the source signal is either white noise or a pulse train, resembling the unvoiced or voiced excitation of the vocal tract, respectively.

The basic assumption behind LPC is the correlation between the n-th sample

and the P previous samples of the target signal. Namely, the n-th signal sample

is represented as a linear combination of the previous P samples, plus a residual

representing the prediction error:

$$x(n) = -a_1 x(n-1) - a_2 x(n-2) - \dots - a_P x(n-P) + e(n) . \qquad (27)$$
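As an illustration of equation (27), the following minimal sketch runs an excitation through the recursion and then recovers it as the prediction error. The function names `synthesize` and `residual`, the second-order coefficients, and the pulse period are hypothetical values chosen for the example; samples before the start of the signal are assumed to be zero.

```python
def synthesize(excitation, a):
    """Generate x(n) = -a[0] x(n-1) - ... - a[P-1] x(n-P) + e(n),
    taking x(n) = 0 for n < 0."""
    x = []
    for n, e_n in enumerate(excitation):
        past = sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        x.append(e_n - past)
    return x


def residual(x, a):
    """Prediction error e(n) = x(n) + a[0] x(n-1) + ... + a[P-1] x(n-P)."""
    return [
        x[n] + sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        for n in range(len(x))
    ]


# Hypothetical coefficients a_1, a_2: a conjugate pole pair of radius 0.9.
a = [-1.2728, 0.81]
# Pulse train with a period of 20 samples, used as the source signal.
excitation = [1.0 if n % 20 == 0 else 0.0 for n in range(100)]

x = synthesize(excitation, a)   # target-like signal produced by the recursion
e = residual(x, a)              # prediction error computed from x and a
```

When the coefficients used in `residual` are the same as those used in `synthesize`, the prediction error coincides with the excitation up to rounding. In the analysis problem the coefficients are unknown and have to be estimated from the signal.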

Equation (27) is an autoregressive formulation of the target signal, and the analysis problem is equivalent to the identification of the coefficients $a_1, \dots, a_P$ of an all-pole filter. If we try to minimize the error in a mean square sense, the problem translates into a set of $P$ equations

$$\sum_{k=1}^{P} a_k \sum_{n} x(n-k)\,x(n-i) = -\sum_{n} x(n)\,x(n-i) , \qquad (28)$$