The root mean square (RMS) amplitude is a measure of the energy in a speech signal. Applied to successive short windows (frames) of a speech signal, it shows how the amplitude changes over time.
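A minimal sketch of frame-by-frame RMS in Python with NumPy. The helper names `frame_signal` and `rms`, the 16 kHz sampling rate, and the 25 ms frame / 10 ms hop are illustrative assumptions, not taken from any particular toolkit. The example tone is ten times quieter in its first half, so the RMS contour steps up halfway through.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one per row.

    Trailing samples that do not fill a whole frame are dropped."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])


def rms(frames):
    """Root mean square amplitude of each frame (row)."""
    return np.sqrt(np.mean(frames ** 2, axis=1))


# Example: 1 s of a 100 Hz tone at an assumed 16 kHz sampling rate,
# ten times quieter in the first half than in the second.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 100 * t)
x[: fs // 2] *= 0.1
frames = frame_signal(x, frame_len=400, hop=160)  # 25 ms frames, 10 ms hop
contour = rms(frames)  # one value per frame: low, then about 10x higher
```

Overlapping frames are used so that the contour changes smoothly from frame to frame; the exact frame and hop sizes are a tuning choice.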
The zero crossing rate (ZCR) is the number of times a signal crosses the zero line per unit of time. It gives a rough measure of the dominant frequency in a signal, that is, the frequency with the largest amplitude: a pure tone at f Hz crosses zero 2f times per second. In vowels this is often correlated with the first formant. ZCR is useful for distinguishing voiced from unvoiced sounds, since unvoiced sounds tend to have a high ZCR while voiced sounds have a lower one.
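A sketch of ZCR for a single frame, assuming NumPy and a 16 kHz sampling rate. Treating samples that are exactly zero as positive is a simplifying assumption of this sketch; other conventions exist. The example compares a 500 Hz tone, which should give about 2 × 500 = 1000 crossings per second, with white noise standing in for a noise-like unvoiced sound.

```python
import numpy as np


def zero_crossing_rate(frame, fs):
    """Number of zero crossings per second in one frame.

    A crossing is counted whenever two consecutive samples differ in sign;
    samples that are exactly zero are treated as positive."""
    frame = np.asarray(frame, dtype=float)
    signs = np.where(frame >= 0, 1, -1)
    crossings = np.count_nonzero(np.diff(signs))
    return crossings * fs / len(frame)


fs = 16000
t = np.arange(fs) / fs  # 1 s of signal
tone = np.sin(2 * np.pi * 500 * t + 0.3)  # phase offset avoids exact zeros
noise = np.random.default_rng(0).standard_normal(fs)

zcr_tone = zero_crossing_rate(tone, fs)    # close to 2 * 500 = 1000
zcr_noise = zero_crossing_rate(noise, fs)  # far higher, near fs / 2
```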
A combination of RMS and ZCR can be used to make a simple speech/non-speech decision. ZCR tends to be high in unvoiced sounds and RMS is high in voiced sounds, so a suitably weighted sum of the two will be high for any speech sound and low for non-speech.
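One way the weighted sum could be sketched, assuming NumPy and frames already cut from the signal (one per row). Each feature is divided by its maximum over all frames so that the two are on a comparable scale; that normalization, the equal weights and the 0.3 threshold are hypothetical choices that would need tuning on real recordings. The sketch also assumes non-speech frames are close to silence: low-level background noise has a high ZCR, which is why practical detectors usually also require the RMS to exceed a floor.

```python
import numpy as np


def speech_score(frames, w_rms=0.5, w_zcr=0.5):
    """Weighted sum of normalized RMS and ZCR for each frame (row).

    Both features are scaled by their maximum across frames so they lie in
    [0, 1]; the default weights are illustrative, not tuned values."""
    frames = np.asarray(frames, dtype=float)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    signs = np.where(frames >= 0, 1, -1)
    zcr = np.count_nonzero(np.diff(signs, axis=1), axis=1) / frames.shape[1]
    eps = 1e-12  # avoids division by zero if every frame is silent
    return w_rms * rms / (rms.max() + eps) + w_zcr * zcr / (zcr.max() + eps)


def is_speech(frames, threshold=0.3, **weights):
    """Hypothetical decision rule: speech wherever the score exceeds threshold."""
    return speech_score(frames, **weights) > threshold


# Three 25 ms frames at an assumed 16 kHz: voiced-like, unvoiced-like, silence.
fs, n = 16000, 400
t = np.arange(n) / fs
rng = np.random.default_rng(1)
frames = np.stack([
    np.sin(2 * np.pi * 150 * t),   # loud, low-frequency: high RMS
    0.2 * rng.standard_normal(n),  # quiet, noise-like: high ZCR
    np.zeros(n),                   # silence: both features zero
])
decisions = is_speech(frames)
```

Here the voiced-like frame scores mainly through its RMS and the unvoiced-like frame mainly through its ZCR, so both clear the threshold while the silent frame does not.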
Autocorrelation can be used to find the fundamental frequency, or pitch, of a signal. The technique relies on measuring the correlation between a signal and a delayed (lagged) version of itself. If the delay corresponds to exactly one pitch period, the signal and the delayed version co-vary: when one goes up, the other goes up too, and we say that they are highly correlated. For a sinusoid, if the delay corresponds to half a pitch period, the two move in opposite directions and are negatively correlated. If we plot the degree of correlation (which varies between 1 and -1) against the lag between the two signals, we get an autocorrelation curve, as shown in the figure. The curve has a peak at a lag of one pitch period, so the position of that peak gives the pitch period (and its reciprocal the fundamental frequency); hence autocorrelation can be used to determine the pitch of a signal.
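A minimal sketch of autocorrelation pitch estimation in Python with NumPy. Dividing every lag by the energy of the whole frame (rather than computing a separate correlation coefficient over each overlap) is a design choice of this sketch: it gives 1 at lag 0, keeps every value between -1 and 1, and makes peaks shrink slightly with lag, so later multiples of the period do not outscore the first one. The 50 to 500 Hz search range, the function names and the 200 Hz test tone are assumptions for illustration.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag, normalized so r[0] == 1.

    Each lag sums products over the overlapping samples only, then divides
    by the total energy, so |r[k]| <= 1 and peaks decay slowly with lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    energy = np.dot(x, x)
    if energy == 0:  # silent frame: no correlation structure to measure
        return np.zeros(max_lag + 1)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) / energy for k in range(max_lag + 1)])


def estimate_pitch(frame, fs, fmin=50.0, fmax=500.0):
    """Return (f0 in Hz, autocorrelation curve) for one frame.

    The pitch period is taken as the lag of the highest autocorrelation
    value between fs/fmax and fs/fmin samples."""
    min_lag, max_lag = int(fs / fmax), int(fs / fmin)
    r = autocorrelation(frame, max_lag)
    lag = min_lag + int(np.argmax(r[min_lag : max_lag + 1]))
    return fs / lag, r


fs = 16000
t = np.arange(1024) / fs             # 64 ms frame
frame = np.sin(2 * np.pi * 200 * t)  # pitch period = 80 samples
f0, r = estimate_pitch(frame, fs)
# r peaks near lag 80 (one period) and dips towards -1 near lag 40 (half a period)
```

The frame must be longer than the largest lag searched (here fs/fmin = 320 samples), otherwise the lowest pitches cannot be detected.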
Real speech doesn't give as clean a curve as the example above (which was computed from a sinusoid). But, as the example here shows, we can often see clear peaks in the autocorrelation curve of voiced speech (right). Unvoiced speech (left) has a very different autocorrelation pattern, without the regular peaks of voiced speech, so autocorrelation is another way of making the voiced/unvoiced decision.
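A sketch of an autocorrelation-based voiced/unvoiced decision, assuming NumPy. It rests on an assumption of this sketch: that the difference between the two patterns can be summarized by the height of the largest autocorrelation value in the pitch-lag range, which is close to 1 for strongly periodic frames and close to 0 for noise-like ones. The 0.5 threshold, the 50 to 500 Hz range and the synthetic test signals are hypothetical.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Energy-normalized autocorrelation for lags 0..max_lag (r[0] == 1)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    energy = np.dot(x, x)
    if energy == 0:  # silent frame: no correlation structure to measure
        return np.zeros(max_lag + 1)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) / energy for k in range(max_lag + 1)])


def is_voiced(frame, fs, fmin=50.0, fmax=500.0, threshold=0.5):
    """Hypothetical rule: voiced if the highest autocorrelation value at a
    lag between fs/fmax and fs/fmin samples exceeds threshold.
    Returns (decision, peak value)."""
    r = autocorrelation(frame, int(fs / fmin))
    peak = float(r[int(fs / fmax) :].max())
    return peak > threshold, peak


fs = 16000
rng = np.random.default_rng(0)
t = np.arange(1024) / fs
voiced_like = np.sin(2 * np.pi * 150 * t) + 0.1 * rng.standard_normal(t.size)
unvoiced_like = rng.standard_normal(t.size)  # white noise as a stand-in
print(is_voiced(voiced_like, fs), is_voiced(unvoiced_like, fs))
```

Returning the peak value alongside the decision makes it easy to inspect how close a frame was to the threshold when tuning it on real speech.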