Table of Contents
The Fourier transform transforms a time domain signal into a frequency domain representation of that signal. This means that it generates a description of the distribution of the energy in the signal as a function of frequency. This is normally displayed as a plot of frequency (x-axis) against amplitude (y-axis) called a spectrum.
In digital signal processing the Fourier transform is almost always performed using an algorithm called the Fast Fourier Transform or FFT. This is, as its name suggests, a quick way of performing this transform and it gives the same results as the slower Discrete Fourier Transform (DFT) would. We needn't worry here about any of the details of how the transform is carried out, we are interested only in the results. One consequence of using the FFT algorithm is that the length of the signal being analysed must be a power of two, eg. 128, 256, 512, 1024 and so on (although many modern implementations get around this limitation).
Applying the FFT algorithm to a 512 point window of data taken from a speech signal results in a vector of 512 numbers corresponding to the energy at 512 frequencies spanning the range from 0Hz to the sampling frequency. From Fourier's theorem we know that adding together 512 sinusoids at these frequencies will exactly reconstruct the original signal. From Nyquist's theorem we know that the largest frequency component in the original signal must be half the sampling frequency, all frequencies above this are aliased onto lower frequencies. It follows that the second half of the spectrum is the mirror image of the first half:
In fact, the first point in the spectrum isn't reflected and the frequency range from 0 to half the sampling frequency is covered by (N/2)+1 points where N is the size of the window. So from a 512 point fft of speech sampled at 10000Hz we get 257 unique spectral points covering the range 0 to 5000Hz.