9.2. Linear Predictive Coding

In order to understand LPC and many other signal processing techniques it is necessary to look at the mathematics behind the z-transform. Unfortunately this is quite a hairy topic for a non-mathematical audience and so can be a hurdle in properly understanding these techniques. In our book we try to give a simple step-by-step coverage of the z-transform and how it relates to digital filtering and I would like to walk through the major points in these notes. The aim of this exercise is so that you can look at, for example, equation 18 in Picone's paper and not only stay upright but maybe even say, "hmm, I see".

The story begins back in Chapter 6 on page 178. If you remember back to 801, we talked about frequency domain filtering as a multiplication operation between the spectra of the source and filter. In this section we develop a new way of talking about the frequency domain, the z-transform, such that we can characterise a signal and a filter as a polynomial, and perform the filtering operation by multiplying these together.

9.2.1. The Spectrum and the Z-transform

Section 6.6.1 explains how the amplitude and phase of a digital signal can be expressed as a vector of complex numbers and in particular that the spectrum of an impulse [1 0 0 0 ...] can be expressed as the vector X[k] = 1 + 0i for all k. This corresponds to saying that the unit impulse is the sum of sinusoids at all (digital) frequencies with amplitude 1 and phase 0 radians. Following this we see that the spectrum of a time shifted version of this impulse signal differs only in the phase component and can be neatly expressed via Euler's relation as X[k] = A exp(-iWkp) (where exp denotes the exponential, the signal has an amplitude of A and it has been shifted by p points relative to the original unit impulse). We then define a new variable z = exp(iWk) so that the shifted spectrum can be written as X(z) = A z^-p -- a polynomial in z, or the z-transform of the shifted impulse signal.
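The shifted-impulse result can be checked numerically. This is only a minimal sketch: it assumes W = 2*pi/N for an N-point DFT (the notes do not spell this out), and the helper name dft and the values of N, A and p are made up for illustration.

```python
import cmath
import math

def dft(x):
    """Direct N-point DFT: X[k] = sum_n x[n] * exp(-i*W*k*n), assuming W = 2*pi/N."""
    N = len(x)
    W = 2 * math.pi / N
    return [sum(x[n] * cmath.exp(-1j * W * k * n) for n in range(N))
            for k in range(N)]

# Illustrative values (assumptions, not from the book).
N, A, p = 8, 3.0, 2

# An impulse of amplitude A delayed by p samples: [0 0 3 0 0 0 0 0].
shifted = [0.0] * N
shifted[p] = A

W = 2 * math.pi / N
spectrum = dft(shifted)

# The closed form from the text: A * z**(-p) with z = exp(i*W*k).
predicted = [A * cmath.exp(-1j * W * k * p) for k in range(N)]
```

Every bin of spectrum has magnitude A; only the phase changes with k, which is the point the paragraph above makes.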

We next note that any signal can be seen as the sum of weighted, shifted, impulse signals ([2 3 4 5] = 2*[1 0 0 0] + 3*[0 1 0 0] etc). Hence (since the transform is linear) the z-transform of an arbitrary signal can be derived from the z-transforms of the shifted impulses.

x[n] = [4 2 1 3 0 0 0 0]

X(z) = 4 + 2z^-1 + z^-2 + 3z^-3

(note that there's a typo in this example in the book (top of p185) where the exponent of the third term is -3 instead of -2.)

So, now we know what a z-transform is (a way of writing the spectrum of a signal) and we can derive the z-transform for any digital signal we come across. Note that by substituting z = exp(iWk), where k is a vector of digital frequency values, into any z-transform we can get back to the DFT of a signal.
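As a rough illustration of that last point, the sketch below evaluates the z-transform of the example signal [4 2 1 3 0 0 0 0] at z = exp(iWk) and compares the result with a directly computed DFT. The function names are hypothetical and W = 2*pi/N is again an assumption.

```python
import cmath
import math

def z_transform(x, z):
    """Evaluate X(z) = sum_n x[n] * z**(-n) at a single complex point z."""
    return sum(xn * z ** (-n) for n, xn in enumerate(x))

def dft(x):
    """Direct N-point DFT, used here only as the reference result."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

x = [4, 2, 1, 3, 0, 0, 0, 0]        # the example signal from the text
N = len(x)
W = 2 * math.pi / N                 # assumed spacing of the digital frequencies

# Substitute z = exp(i*W*k) for each bin k: points on the unit circle.
on_unit_circle = [z_transform(x, cmath.exp(1j * W * k)) for k in range(N)]
direct = dft(x)
```

The two lists agree bin by bin, so the DFT is just the z-transform sampled at N equally spaced points around the unit circle.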

9.2.2. Convolution as Multiplication

We know that we can apply a digital filter either by convolving a time domain signal with the impulse response of the filter or by multiplying the Fourier spectra of the signal and the filter. Section 6.6.2 describes how we can also apply a filter by multiplying the z-transforms of the signal and the filter. In 6.6.3 we see that writing the source-filter equation in terms of z-transforms:

Y(z)A(z) = B(z)X(z)

allows us to combine the recursive (A(z)) and non-recursive (B(z)) parts of the filter into a single transfer function H(z) = B(z)/A(z), which is the z-transform of the impulse response of the filter -- that is, the z-transform of the signal you would get if you passed a unit impulse [1 0 0 0 ...] through the filter. H(z) is in general a complicated ratio of polynomials in z, but knowing that we can characterise any filter by a single function H(z) is the important point to take away.
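To make the transfer function concrete, here is a small sketch that passes a unit impulse through a filter defined by Y(z)A(z) = B(z)X(z) and then checks that the impulse response h satisfies H(z)A(z) = B(z). It assumes a[0] = 1, and the coefficient values and function names are invented for the example.

```python
def lti_filter(b, a, x):
    """Apply the filter Y(z)A(z) = B(z)X(z) sample by sample, assuming a[0] == 1.

    y[n] = sum_j b[j]*x[n-j] - sum_{i>=1} a[i]*y[n-i]
    """
    y = []
    for n in range(len(x)):
        acc = sum(b[j] * x[n - j] for j in range(len(b)) if n - j >= 0)
        acc -= sum(a[i] * y[n - i] for i in range(1, len(a)) if n - i >= 0)
        y.append(acc)
    return y

def poly_mul(p, q):
    """Multiply two polynomials in z**-1; identical to convolving the coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

# Hypothetical filter: B(z) = 1 + 0.5 z^-1, A(z) = 1 - 0.9 z^-1.
b = [1.0, 0.5]
a = [1.0, -0.9]

impulse = [1.0] + [0.0] * 7
h = lti_filter(b, a, impulse)       # first 8 samples of the impulse response

# H(z) * A(z) should give back B(z), up to the point where h was truncated.
check = poly_mul(h, a)[:len(h)]
```

The first two entries of check reproduce b and the rest are zero, which is H(z) = B(z)/A(z) written the other way round.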

Figure 6.16 summarises the relationship between signals, spectra and z-transforms. The impulse response and the spectrum are two ways of characterising the effect of a filter; either can be derived from the convolution equation and we can transform between them using the DFT and IDFT.
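The equivalence of the two routes can be sketched in a few lines: convolve in the time domain, or take DFTs, multiply, and come back with the IDFT. The signal and impulse response values are made up, and both are zero padded to N = 8 so that the circular convolution implied by the DFT does not wrap around.

```python
import cmath
import math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def convolve(x, h):
    """Time domain convolution of two finite sequences."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out

def zero_pad(s, N):
    return s + [0.0] * (N - len(s))

x = [4.0, 2.0, 1.0, 3.0]            # illustrative signal
h = [1.0, 0.5, 0.25]                # illustrative impulse response
N = 8                               # at least len(x) + len(h) - 1

via_time = convolve(x, h)
product = [a * b for a, b in zip(dft(zero_pad(x, N)), dft(zero_pad(h, N)))]
via_freq = [v.real for v in idft(product)]
```

The first six samples of via_freq match via_time and the remaining two are zero.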

9.2.3. LPC Analysis

Moving now to Chapter 8 we will begin to look at Linear Predictive Coding. LPC is another method of separating out the effects of source and filter from a speech signal; similar in intention to cepstral analysis but using quite different methods. One way of thinking about LPC is as a coding method -- a way of encoding the information in a speech signal into a smaller space for transmission over a restricted channel. LPC encodes a signal by finding a set of weights on earlier signal values that can predict the next signal value:

y[n] = a[1]y[n-1] + a[2]y[n-2] + a[3]y[n-3] + e[n]

If values for a[1..3] can be found such that e[n] is very small for a stretch of speech (say one analysis window), then we can transmit only a[1..3] instead of the signal values in the window. The speech frame can be reconstructed at the other end by using a default e[n] signal and predicting subsequent values from earlier ones. Clearly this relies on being able to find these values of a[1..k] but there are a couple of algorithms which can do this (one is covered in the book). The result of LPC analysis then is a set of coefficients a[1..k] and an error signal e[n]. The error signal will be as small as possible and represents the difference between the predicted signal and the original.
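One common way to find the coefficients is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch below uses that approach; I am not claiming it is the algorithm the book covers. The function names, the synthetic test frame and the "true" coefficients are all assumptions made for the example, and the predictor convention y[n] = a[1]y[n-1] + ... + e[n] from above is used.

```python
def autocorrelation(y, max_lag):
    """r[lag] = sum_n y[n] * y[n + lag] for lag = 0..max_lag."""
    return [sum(y[n] * y[n + lag] for n in range(len(y) - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for the predictor weights.

    Returns (a, err) where a[i-1] weights y[n-i] and err is the
    remaining prediction error energy.
    """
    a = [0.0] * (order + 1)          # a[0] is unused padding
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for this model order.
        k = (r[m] - sum(a[i] * r[m - i] for i in range(1, m))) / err
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err

def lpc(frame, order):
    return levinson_durbin(autocorrelation(frame, order), order)

# Synthetic frame: a single impulse through a known 2nd order recursive filter.
true_a = [1.3, -0.8]                 # made-up, stable coefficients
frame = []
for n in range(200):
    e = 1.0 if n == 0 else 0.0
    past = sum(true_a[i] * frame[n - 1 - i]
               for i in range(len(true_a)) if n - 1 - i >= 0)
    frame.append(past + e)

a_hat, err = lpc(frame, 2)           # a_hat should be close to true_a
```

Because the frame really was produced by a second order all-pole filter excited by one impulse, the analysis recovers the coefficients almost exactly. Real speech frames will not fit this well, and are normally windowed before analysis.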

There is an obvious parallel between the LPC equation and that of a recursive filter (y*a = x):

y[n] = -a[1]y[n-1] - a[2]y[n-2] - a[3]y[n-3] + ... + x[n]

where we have rearranged the terms as in Equation 8.9 in the text. The LPC coefficients correspond to those of a recursive filter and the error signal corresponds to a source signal. Moreover, the conditions under which the error signal is minimised in LPC analysis mean that the error signal will have a flat spectrum and hence that the error signal will approximate either an impulse train or a white noise signal. This is a very close match to our source-filter model of speech production where we excite a vocal tract filter with either a voiced signal (which looks like a series of impulses) or a noise source. So, LPC analysis has the wonderful property of finding the coefficients of a filter which will convert either noise or an impulse train into the original frame of speech.
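The source and filter roles can be seen in a short round trip: inverse filter a frame with the predictor to obtain the error signal, then drive the recursive filter with that error to rebuild the frame. This is a sketch under assumptions: it uses the predictor sign convention (the filter form above simply flips the signs of the coefficients), and the coefficients and the crude impulse train standing in for voiced excitation are invented.

```python
def residual(y, a):
    """Error signal e[n] = y[n] - sum_i a[i] * y[n-i], with a[i-1] weighting y[n-i]."""
    return [y[n] - sum(a[i - 1] * y[n - i]
                       for i in range(1, len(a) + 1) if n - i >= 0)
            for n in range(len(y))]

def synthesise(e, a):
    """Recursive filter driven by e: y[n] = sum_i a[i] * y[n-i] + e[n]."""
    y = []
    for n in range(len(e)):
        y.append(e[n] + sum(a[i - 1] * y[n - i]
                            for i in range(1, len(a) + 1) if n - i >= 0))
    return y

a = [1.3, -0.8]                                   # made-up, stable coefficients

# Crude stand-in for a voiced source: one impulse every 20 samples.
pulses = [1.0 if n % 20 == 0 else 0.0 for n in range(100)]

speech_like = synthesise(pulses, a)               # "filter" applied to "source"
e = residual(speech_like, a)                      # analysis: recover the source
rebuilt = synthesise(e, a)                        # resynthesis from e and a
```

Here e comes back as the impulse train and rebuilt matches speech_like, because the coefficients used for analysis are exactly those that generated the frame.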

The result isn't quite perfect; as pointed out on page 214 the filter coefficients derived by LPC analysis contain information about the glottal source filter, the lip radiation/preemphasis filter and the vocal tract itself. However, since the first two are much less variable than the vocal tract filter, we can factor them out in practice (e.g. by preemphasis before LPC analysis).

9.2.4. Formants and Smooth Spectra

Why did we need to know about z-transforms to cover LPC analysis? Well, if this were as far as we were going then we didn't need z, but LPC is really just a way in to some more interesting signal analysis techniques.

The LPC coefficients make up a model of the vocal tract shape that produced the original speech signal. A spectrum generated from these coefficients would show us the properties of the vocal tract shape without the interference of the source spectrum. From our earlier discussion we know that we can take the spectrum of the filter in various ways, for example by passing an impulse through the filter and taking its DFT, or by substituting z = exp(iWk) in the z-transform of the filter. Either way, the result can be quite useful in signal analysis.
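Both routes to the filter spectrum can be sketched as follows. The code assumes the filter-form convention A(z) = 1 + a[1]z^-1 + a[2]z^-2 + ..., so the filter is H(z) = 1/A(z). The sampling rate, the single 500 Hz resonance and all function names are made up for illustration.

```python
import cmath
import math

def all_pole_response(a, w):
    """H(z) = 1/A(z) at z = exp(i*w), where A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ..."""
    z = cmath.exp(1j * w)
    A = 1 + sum(ai * z ** (-(i + 1)) for i, ai in enumerate(a))
    return 1 / A

def impulse_response(a, length):
    """Pass a unit impulse through y[n] = -a[0]*y[n-1] - a[1]*y[n-2] - ... + x[n]."""
    h = []
    for n in range(length):
        x = 1.0 if n == 0 else 0.0
        h.append(x - sum(a[i] * h[n - 1 - i]
                         for i in range(len(a)) if n - 1 - i >= 0))
    return h

fs = 10000.0                                   # assumed sampling rate in Hz
r, theta = 0.95, 2 * math.pi * 500 / fs        # pole radius and angle (500 Hz)
a = [-2 * r * math.cos(theta), r * r]          # one resonance, illustrative only

N = 512
freqs = [2 * math.pi * k / N for k in range(N // 2 + 1)]

# Route 1: substitute z = exp(i*w) directly.
by_substitution = [all_pole_response(a, w) for w in freqs]

# Route 2: DFT of the impulse response at the same frequencies.
h = impulse_response(a, N)
by_dft = [sum(h[n] * cmath.exp(-1j * w * n) for n in range(N)) for w in freqs]

magnitudes = [abs(H) for H in by_substitution]
peak_bin = max(range(len(freqs)), key=lambda k: magnitudes[k])
peak_hz = peak_bin * fs / N
```

The two spectra agree closely because the impulse response has decayed to practically nothing within 512 samples, and the magnitude peaks close to the 500 Hz resonance built into the coefficients.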

Looking at an LPC smoothed spectrum of voiced speech we can clearly see the formant peaks; they tend to be much better defined than in a cepstrally smoothed spectrum. As discussed on p223, we can use the z-transform notation to find the locations of these formant peaks for a given set of LPC coefficients, corresponding to the points at which A(z) is zero. This is the key to automatic formant tracking of speech signals -- derive the LPC coefficients, solve the z-transform equation and record the resulting formant positions. Unfortunately since the LPC model isn't a perfect fit to real speech production (it assumes a lossless, all pole model, for example) this method will derive spurious formants; most of the work in a good formant tracking program is working out which of the candidate formants is the real thing.
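For a second order A(z) the roots can be found with the quadratic formula, which makes for a small worked example. This is a sketch only: it assumes A(z) = 1 + a1 z^-1 + a2 z^-2, a made-up sampling rate of 10 kHz, and coefficients constructed to put a resonance at 500 Hz. The function names are hypothetical.

```python
import cmath
import math

def second_order_pole(a1, a2):
    """Upper-half-plane root of A(z) = 1 + a1*z**-1 + a2*z**-2.

    Multiplying through by z**2 gives z**2 + a1*z + a2 = 0.
    """
    disc = cmath.sqrt(a1 * a1 - 4 * a2)
    roots = [(-a1 + disc) / 2, (-a1 - disc) / 2]
    return max(roots, key=lambda z: z.imag)

def pole_to_hz(pole, fs):
    """Convert the angle of a root (radians per sample) to a frequency in Hz."""
    return cmath.phase(pole) * fs / (2 * math.pi)

fs = 10000.0                                        # assumed sampling rate
a1 = -2 * 0.95 * math.cos(2 * math.pi * 500 / fs)   # illustrative coefficients
a2 = 0.95 ** 2

pole = second_order_pole(a1, a2)
formant_hz = pole_to_hz(pole, fs)                   # candidate formant frequency
radius = abs(pole)                                  # closer to 1 means a sharper peak
```

For the higher orders used on real speech the quadratic formula no longer applies and a numerical polynomial root finder is needed (numpy.roots is one option). Each complex-conjugate pair of roots then gives one candidate formant, which still has to be screened as described above.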

LPC coefficients can also be used to derive cepstral coefficients and area functions as described in the remainder of Chapter 6. LPC is a powerful signal modelling technique and is very important in speech recognition and speech analysis.